Analyzing and Embedding Multi-Word Expressions and Acronyms in ChatGPT-related Paper Abstracts

doi:10.18627/jslg.41.2.202508.109

2025 Vol.41, Issue 2

Research Article

Korean title: 챗GPT 관련 논문 요약문에 나타난 다단어 표현과 두문자어의 분석 및 임베딩 연구

Koo, Yuson (구유선)¹, Lee, Byeong-Hee (이병희)²*

¹Chungnam National University
²University of Science and Technology

31 August 2025. pp. 109-126

Abstract

This computational linguistics study investigates how multi-word expressions (MWEs), which play an important role in disambiguating text and in communicating a paper’s content effectively, are represented as word embeddings in large language models (LLMs) and used in context. Through phrase mining of scientific papers, we identify MWEs such as “artificial intelligence” and “machine learning”, extract acronyms such as “AI” and “GPT”, and analyze the characteristics of technical terms using MWE word embeddings. To this end, we collect 41,230 abstracts from recent international academic papers related to ChatGPT and extract and analyze the MWEs they contain. The study also extends the application of MWE word embeddings in natural language processing and seeks to clarify how natural language processing technology is integrated with the workings of generative AI.
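The two extraction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regular expression for acronyms, the frequency threshold, the stopword list, and every function name below are assumptions made for illustration, and simple bigram counting stands in for whatever phrase-mining method the paper actually uses.

```python
import re
from collections import Counter

# Assumption: an acronym is a standalone run of two or more capital letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")
# Assumption: a tiny illustrative stopword list, not a real resource.
STOPWORDS = {"a", "an", "and", "are", "in", "of", "the", "we", "with"}


def tokenize(text):
    """Split text into alphabetic tokens (hyphens allowed inside a token)."""
    return re.findall(r"[A-Za-z][A-Za-z\-]*", text)


def extract_acronyms(text):
    """Count candidate acronyms such as "AI" or "GPT"."""
    return Counter(ACRONYM_RE.findall(text))


def mine_bigrams(abstracts, min_count=2):
    """Keep adjacent word pairs (no stopwords) seen at least min_count times."""
    counts = Counter()
    for text in abstracts:
        tokens = [t.lower() for t in tokenize(text)]
        for a, b in zip(tokens, tokens[1:]):
            if a not in STOPWORDS and b not in STOPWORDS:
                counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def merge_phrases(text, phrases):
    """Join mined pairs into single tokens so an embedding model sees one unit."""
    tokens = [t.lower() for t in tokenize(text)]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Made-up sample sentences, not data from the study.
abstracts = [
    "Artificial intelligence (AI) and machine learning are changing research.",
    "We study machine learning and artificial intelligence with GPT models.",
]
phrases = mine_bigrams(abstracts)
acronyms = extract_acronyms(" ".join(abstracts))
merged = merge_phrases(abstracts[0], phrases)
```

Joining a mined pair into one token such as `artificial_intelligence` is one common way to give an MWE its own vector when a word-embedding model is later trained on the merged token stream. Whether the study does this is not stated in the abstract.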

Keywords

LLM, Multi-word Expression, Phrase Mining, Acronym, Word-embedding

Information

Publisher: The Modern Linguistic Society of Korea
Publisher (Ko): 한국현대언어학회
Journal Title: The Journal of Studies in Language
Journal Title (Ko): 언어연구
Volume: 41
No: 2
Pages: 109-126
DOI: https://doi.org/10.18627/jslg.41.2.202508.109

The Journal of Studies in Language ISSN:1225-4770(Print) 2671-6151(Online) 언어연구
