
2021 Vol.37, Issue 2

Research Article

31 August 2021. pp. 127-148
Abstract
This study examines the frequencies of basic words in Korean and English and compares their distributions quantitatively. A correlation analysis shows that the frequencies of basic words in the two languages are very highly correlated, which implies that linguistic and cultural differences between Korean and English rarely affect how basic words are used. Plots of the frequency distributions show heavy tails in both languages and appear to follow a power-law distribution. Quantitative analysis, however, indicates that the frequencies follow a log-normal distribution rather than a power-law distribution. This result suggests that the probability distribution best fitted to frequency data should be selected with well-established statistical procedures, not by visual inspection of plotted distributions alone.
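To illustrate the kind of comparison the abstract describes, the following is a minimal Python sketch of testing a power-law fit against a log-normal fit with a likelihood-ratio (Vuong-style) test. It is not the study's own procedure. The data are synthetic, the function names (`fit_power_law`, `fit_log_normal`, `vuong_test`) are hypothetical, and several simplifications are assumed: frequencies are treated as continuous, the lower cutoff `xmin` is fixed at the smallest observation instead of being estimated, and the log-normal is not truncated at `xmin`.

```python
import math
import random
import statistics


def fit_power_law(xs, xmin):
    """MLE of a continuous power law p(x) ~ x^(-alpha) for x >= xmin.

    Returns alpha and the log-density of every observation.
    """
    logs = [math.log(x / xmin) for x in xs]
    alpha = 1.0 + len(xs) / sum(logs)
    const = math.log((alpha - 1.0) / xmin)
    return alpha, [const - alpha * lg for lg in logs]


def fit_log_normal(xs):
    """MLE of an (untruncated) log-normal; returns mu, sigma, log-densities."""
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # MLE uses the population std. dev.
    const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    dens = [const - lg - (lg - mu) ** 2 / (2.0 * sigma ** 2) for lg in logs]
    return mu, sigma, dens


def vuong_test(ll_a, ll_b):
    """Likelihood-ratio test for two non-nested models.

    ratio > 0 favours model A, ratio < 0 favours model B; the two-sided
    p-value says whether the sign of the ratio is statistically reliable.
    """
    diffs = [a - b for a, b in zip(ll_a, ll_b)]
    n = len(diffs)
    ratio = sum(diffs)
    z = ratio / (statistics.pstdev(diffs) * math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return ratio, z, p_value


def main():
    # Synthetic stand-in for word frequencies (assumption: not real data).
    random.seed(0)
    freqs = [random.lognormvariate(5.0, 1.5) for _ in range(200)]

    xmin = min(freqs)  # simplification: cutoff fixed, not estimated
    alpha, ll_power = fit_power_law(freqs, xmin)
    mu, sigma, ll_lognorm = fit_log_normal(freqs)
    ratio, z, p_value = vuong_test(ll_power, ll_lognorm)

    print(f"power law: alpha = {alpha:.3f}")
    print(f"log-normal: mu = {mu:.3f}, sigma = {sigma:.3f}")
    print(f"log-likelihood ratio (power law vs. log-normal) = {ratio:.2f}")
    print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")


if __name__ == "__main__":
    main()
```

A negative ratio with a small p-value would favour the log-normal model, even when a log-log plot of the same data looks roughly linear. A fuller analysis would also estimate `xmin` and restrict both models to the same tail, which this sketch omits.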
Information
  • Publisher: The Modern Linguistic Society of Korea
  • Publisher (Ko): 한국현대언어학회
  • Journal Title: The Journal of Studies in Language
  • Journal Title (Ko): 언어연구
  • Volume: 37
  • No: 2
  • Pages: 127-148