ChatGPT-related Technology Neologism Extraction and Word Embedding in Paper Abstracts

doi:10.18627/jslg.40.2.202408.109


2024 Vol.40, Issue 2

Research Article

Koo, Yuson¹ and Lee, Byeong-Hee²*

¹Chungnam National University
²University of Science and Technology

31 August 2024. pp. 109-125


Abstract

This study analyzes recent innovative changes in large language models (LLMs) related to ChatGPT through neologism extraction and word embedding analysis. To this end, it collects 26,010 abstracts of international academic papers related to ChatGPT, covering November 2022 to April 2024. It extracts and analyzes the neologisms appearing in these abstracts, focusing on acronyms, which are frequently used in scientific texts (for example, AI or ChatGPT). The paper also analyzes the characteristics of word embedding, which quantifies the semantic similarity of words in the abstracts, and seeks to understand and utilize how new concepts and terms are introduced and spread in the field of natural language processing. The approach is presented as one suitable methodology for knowledge discovery.
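To make the two steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that an acronym candidate can be approximated as a short token containing at least two uppercase letters, and that semantic similarity between embedded words is measured with cosine similarity. The pattern `ACRONYM_RE` and the function names `extract_acronyms`, `count_acronyms`, and `cosine_similarity` are hypothetical and chosen only for illustration; the paper's actual extraction rules and embedding model are not stated here.

```python
import math
import re
from collections import Counter

# Assumed heuristic (not the paper's rule): a candidate is a token of
# 2-10 characters that starts with a letter and may contain letters,
# digits, or hyphens, e.g. "LLM", "ChatGPT", "GPT-4".
ACRONYM_RE = re.compile(r"\b[A-Za-z][A-Za-z0-9-]{1,9}\b")


def extract_acronyms(text):
    """Return candidate acronyms: tokens with at least two uppercase letters."""
    return [
        token
        for token in ACRONYM_RE.findall(text)
        if sum(ch.isupper() for ch in token) >= 2
    ]


def count_acronyms(abstracts):
    """Count candidate acronyms over an iterable of abstract strings."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(extract_acronyms(abstract))
    return counts


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy abstracts written for this example; they are not from the corpus.
    sample = [
        "ChatGPT is an LLM. LLM tools use AI.",
        "AI and ChatGPT differ.",
    ]
    print(count_acronyms(sample).most_common())
    # Toy 2-dimensional vectors standing in for learned word embeddings.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

A capitalization heuristic like this over-generates (any mixed-case product name qualifies), so in practice the candidates would still need filtering against a reference vocabulary or date of first appearance before being treated as neologisms.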

Keywords

ChatGPT

LLM (Large Language Model)

acronym

neologism extraction

word embedding


Information

Publisher: The Modern Linguistic Society of Korea
Publisher (Ko): 한국현대언어학회
Journal Title: The Journal of Studies in Language
Journal Title (Ko): 언어연구
Volume: 40
No: 2
Pages: 109-125
DOI: https://doi.org/10.18627/jslg.40.2.202408.109

The Journal of Studies in Language (언어연구), ISSN: 1225-4770 (Print), 2671-6151 (Online)
