2021, Vol. 37, Issue 3

Research Article

30 November 2021. pp. 291-305
Both tree models and logistic regression models are widely used to analyze multifactorial data in recent corpus studies. Drawing on my previous corpus study of relative clauses, this paper argues that tree models have difficulty capturing the integrated effect of multiple linguistic factors, namely a three-way interaction among non-syntactic factors that affect the preference for particular relative clause types. This integrated interaction effect cannot be captured simply by adding interaction terms to a logistic regression model; instead, it can be captured by suppressing the intercept and creating a single variable that combines all three factors. Finally, a mixed-effects logistic regression analysis is implemented by adding register as a random effect, a factor that has been ignored in the corpus-linguistic literature on relative clauses.
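The intercept-free, single-combined-variable coding described above can be sketched as follows. This is a minimal pure-Python illustration, not the author's analysis: the three factors (A, B, C), their levels, and all counts are hypothetical, and the model is fitted with a hand-written Newton-Raphson routine rather than R's `glm()`. Because every observation falls into exactly one level of the combined variable and there is no intercept, each coefficient is the log-odds of the target relative clause type for that particular factor combination rather than a contrast against a reference level; for a model containing only this variable, the estimates reproduce each cell's empirical log-odds, log(k / (n - k)).

```python
import math

# Hypothetical counts for illustration only (not data from the paper).
# Key: levels of three factors (A, B, C); value: (number of relative
# clauses of the target type, total relative clauses in that cell).
COUNTS = {
    ("a1", "b1", "c1"): (30, 40),
    ("a1", "b1", "c2"): (12, 40),
    ("a1", "b2", "c1"): (25, 50),
    ("a1", "b2", "c2"): (8, 32),
    ("a2", "b1", "c1"): (18, 24),
    ("a2", "b1", "c2"): (5, 25),
    ("a2", "b2", "c1"): (40, 45),
    ("a2", "b2", "c2"): (10, 60),
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(x_rows, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    p = len(x_rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # X'WX with W = diag(mu(1 - mu))
        for xi, yi in zip(x_rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def fit_cell_means(counts):
    """Fit logit P(target) ~ 0 + combo, where combo crosses all three factors."""
    cells = sorted(counts)
    x_rows, y = [], []
    for idx, cell in enumerate(cells):
        k, n = counts[cell]
        # One indicator column per factor combination and no intercept column.
        row = [1.0 if j == idx else 0.0 for j in range(len(cells))]
        x_rows.extend([row] * n)
        y.extend([1] * k + [0] * (n - k))
    beta = fit_logistic(x_rows, y)
    return {".".join(cell): b for cell, b in zip(cells, beta)}


if __name__ == "__main__":
    for label, b in fit_cell_means(COUNTS).items():
        print(f"{label}: log-odds = {b:+.3f}, P(target) = {1 / (1 + math.exp(-b)):.3f}")
```

In R, the same fixed-effects coding would typically be written with an intercept-free formula such as `glm(y ~ 0 + combo, family = binomial)`, where `combo` (a hypothetical name) is built from the three factors with `interaction()`. The design choice is interpretive: each coefficient reads directly as one combination's log-odds, so a combination that favors the target type gets a positive estimate and one that disfavors it gets a negative estimate.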
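The mixed-effects step can be written as a model equation. This is a sketch under the assumption that register enters only as a random intercept and that the fixed part keeps the intercept-free combined variable; the symbols are introduced here for illustration and are not taken from the paper.

```latex
\operatorname{logit} \Pr(y_{ij} = 1)
  = \sum_{c=1}^{C} \beta_c \, x_{ijc} + u_j,
  \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here y_ij = 1 if the i-th relative clause in register j is of the target type, x_ijc = 1 if that clause falls into combination c of the three factors (0 otherwise), β_c is the log-odds for combination c in a register with u_j = 0, and u_j is register j's deviation on the logit scale. No separate intercept appears because exactly one x_ijc equals 1 for each observation, so an intercept would be redundant with the β_c. In lme4 notation this would typically be `glmer(y ~ 0 + combo + (1 | register), family = binomial)`, with `combo` and `register` as hypothetical variable names.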
  • Publisher: The Modern Linguistic Society of Korea
  • Publisher (Korean): 한국현대언어학회
  • Journal Title: The Journal of Studies in Language
  • Journal Title (Korean): 언어연구
  • Volume: 37
  • No.: 3
  • Pages: 291-305