Exploring Explainability in the Evaluation of Korean L2 Learners’ Three-Way Stop Pronunciation Errors Using the ChatGPT Realtime API

doi:10.18627/jslg.41.3.202511.245

2025 Vol. 41, Issue 3

Research Article

Noh, Sunju¹*

¹Konkuk University

30 November 2025. pp. 245-278

Abstract

This study, motivated by the limitations of transcription-based evaluation for articulatory guidance, explores the explainability of pronunciation assessment using a GPT-4o Realtime API–based system, focusing on Korean L2 learners’ three-way stop consonants. A total of 120 utterances from 40 Indonesian and Vietnamese learners were analyzed, and expert ratings, Azure scores, and Realtime API assessments were collected. Realtime API outputs were annotated for error location, cause, correction, and encouragement. We examined explanation consistency, performed k-means clustering, and compared score distributions using descriptive statistics and Bland–Altman analysis. The Realtime API and Azure produced higher scores than experts and showed similar distributional patterns. The Realtime API accurately pinpointed erroneous syllables and provided concrete correction suggestions, although causal explanations were often vague, showing only partial consistency. All responses included encouragement (e.g., praise, acceptance, clarity emphasis, practice prompts). Focusing on Korean three-way stops, this study suggests that the Realtime API can deliver actionable, real-time, speech-based explanatory feedback; future work should validate acoustic-phonetic causes and explore classroom applicability.
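The Bland–Altman comparison mentioned above can be illustrated with a minimal sketch. It assumes the conventional formulation (bias as the mean of paired differences, limits of agreement at bias ± 1.96 sample standard deviations); the function name `bland_altman` and all scores below are hypothetical placeholders, not the study's code or data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    bias   : mean of the paired differences (method_a - method_b)
    limits : bias +/- 1.96 * sample standard deviation of the differences
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores for five utterances (NOT data from the study).
api_scores = [85.0, 90.0, 78.0, 92.0, 88.0]
expert_scores = [80.0, 82.0, 75.0, 85.0, 80.0]

bias, lower, upper = bland_altman(api_scores, expert_scores)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A positive bias in such a comparison would correspond to the automated system scoring higher than the experts on average, which is the direction the abstract reports; the width of the limits indicates how much individual utterance scores may diverge.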

Keywords

ChatGPT Realtime API

explainable pronunciation assessment

three-way stops

pronunciation errors in L2 Korean

automated pronunciation evaluation


Information

Publisher: The Modern Linguistic Society of Korea
Publisher (Ko): 한국현대언어학회
Journal Title: The Journal of Studies in Language
Journal Title (Ko): 언어연구
Volume: 41
No: 3
Pages: 245-278
DOI: https://doi.org/10.18627/jslg.41.3.202511.245

The Journal of Studies in Language (언어연구), ISSN: 1225-4770 (Print), 2671-6151 (Online)