Natural Syntax, Artificial Intelligence and Language Acquisition
Abstract
1. Introduction
The key point that we want to make is that the output of [large language models] is, almost without exception, grammatical. Even when examining experiments that are designed to exhibit some of the model’s shortcomings …, one cannot help but notice that the content is grammatically correct….[1] (p. 3)
[LLMs] are trained only on text prediction. This means that the models form probabilistic expectations about the next word in a text and they use the true next word as an error signal to update their latent parameters.[2] (p. 5)
A learner makes predictions. If these predictions are not borne out, the difference between the predicted and actual input serves as implicit corrective feedback, pushing the learner more towards the target language.[4] (p. 4)
… if a learner can use his current knowledge of the language and the context to predict how the sentence will unfold, then the comparison of what he expects with what actually occurs could provide valuable additional information.[5]
[The model] learns from its predictions at each word position. Incremental prediction increases the amount of information that can be used for learning and focuses learning on the particular representations that made the incorrect prediction. The prediction-based approach is thus a potential solution to the problem of learning from sparse input.[6] (p. 263)
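Taken together, these passages describe a single learning loop: predict the next word, compare the prediction with the word that actually occurs, and adjust the learner's internal state in response to the mismatch. The sketch below is our own minimal illustration of that loop, not the architecture of any actual language model; the toy corpus, the bigram "perceptron-style" learner, and the update rule are assumptions made purely for exposition.

```python
# Toy illustration (not an actual LLM) of error-driven next-word prediction:
# the learner guesses the next word, compares the guess with the word that
# actually occurs, and nudges its weights toward the observed word.
from collections import defaultdict

def train(corpus, epochs=10, lr=1.0):
    # weights[prev][nxt] is the learner's current expectation score
    weights = defaultdict(lambda: defaultdict(float))
    vocab = {w for sent in corpus for w in sent}
    for _ in range(epochs):
        for sent in corpus:
            for prev, actual in zip(sent, sent[1:]):
                # 1. Predict the next word from the current word.
                predicted = max(vocab, key=lambda w: weights[prev][w])
                # 2. The mismatch between prediction and input serves as
                #    implicit corrective feedback.
                if predicted != actual:
                    weights[prev][actual] += lr      # strengthen what occurred
                    weights[prev][predicted] -= lr   # weaken the wrong guess
    return weights

corpus = [
    "the girl did not buy anything".split(),
    "the boy did not see anything".split(),
]
w = train(corpus)
# After training, the learner expects a verb such as 'buy' or 'see' after 'not'.
print(max(w["not"], key=w["not"].get))
```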
… crucial components of our tacit linguistic knowledge are not learned through experience but are given by our biological/genetic specifications [Universal Grammar].[7] (p. 151)
The success of large language models is a failure for generative theories because it goes against virtually all of the principles these theories have espoused. In fact, none of the principles and innate biases that Chomsky and those who work in his tradition have long claimed necessary needed to be built into these models.[2] (pp. 14–15)
We suggest that the most recent generation of Large Language Models (LLMs) might finally provide the computational tools to determine empirically how much of the human language ability can be acquired from linguistic experience.[1] (p. 1)
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language.[8] (p. 18)
2. Natural Syntax
The advantages of cost minimization are both obvious and widely acknowledged.
The algorithms that map form onto meaning (and vice versa) are shaped by forces that seek to minimize processing costs.
Speed in communicating the intended message from speaker to hearer and minimal processing effort in doing so are the two driving forces of efficiency…
… the [processor] should operate in the most efficient manner possible, promptly resolving dependencies so that they do not have to be held any longer than necessary. This is a standard assumption in work on processing, where it is universally recognized that sentences are built in real time under conditions that favor quickness.[13] (p. 7)
Efficiency in communication means that successful communication can be achieved with minimal effort on average by the sender and receiver.[14]
… the avoidance [and minimization] of cognitive demand is a fundamental principle of cognition…
A spate of recent work demonstrates that humans seek to avoid the expenditure of cognitive effort, much like physical effort …[17] (p. 92)
The Syntax of Negation
Jane didn’t buy a Tesla. (=‘There was no buying event involving Jane and a Tesla.’)
Jane bought nothing.
- Double Negation
- Once again, you did nothing.
- I didn’t do nothing—I washed the car! (not nothing = something)
- Negative Concord
Korean
Kunye-nun amwukesto ha-ci anh-ass-ta.
she-top nothing do-comp not-past-decl
‘She did nothing.’

Middle English
Ther nys nat oon kan war by other be.
There not-is not one can aware by other be
= ‘There is no one who can be warned by another.’
(Chaucer: Troilus 1, 203; cited by [21] (p. 86))

Non-standard Modern English
Don’t blame me. I didn’t do nothing wrong.
= ‘I did nothing wrong.’ / ‘I didn’t do anything wrong.’
- The sentential negative signals the non-occurrence of an event, just as it does in sentences such as I didn’t go and their equivalents in other languages;
- The negative pronoun has the same null-set interpretation that it has in stand-alone contexts (e.g., What did you see? Nothing!).
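The difference between these two interpretive options can be made concrete with a deliberately simplified sketch (ours, not a formal semantics): in a double-negation grammar each negative element contributes its own logical negation, so two negatives cancel out, whereas in a negative-concord grammar any number of negative elements jointly expresses a single negation. The word list and interpretation rules below are assumptions made for illustration only.

```python
# Toy contrast between double negation and negative concord.
NEGATIVES = {"not", "n't", "nothing", "nobody", "never"}

def interpret(words, grammar):
    n_neg = sum(w in NEGATIVES for w in words)
    if grammar == "double-negation":
        # Each negative contributes a logical negation; an even number cancels out.
        negated = n_neg % 2 == 1
    elif grammar == "negative-concord":
        # Any number of negative elements expresses a single negation.
        negated = n_neg >= 1
    else:
        raise ValueError(grammar)
    return "no event occurred" if negated else "an event occurred"

# 'didn't' is written as two tokens so the clitic negative is visible to the toy tokenizer.
sentence = "I did n't do nothing".split()
print(interpret(sentence, "double-negation"))   # an event occurred ('I did something')
print(interpret(sentence, "negative-concord"))  # no event occurred ('I did nothing')
```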
3. The Acquisition of Negation
3.1. An Early Preference for Negative Concord
Samples of negative concord in spontaneous speech by Adam:
I didn’t do nothing. (file 63, age 3;5)
I didn’t call him nothing. (file 72, age 3;8)
Because nobody didn’t broke it. (file 107, age 4;5)
The girl who skipped didn’t buy nothing.
3.2. A Natural Explanation
3.3. A Proposal for Future Research
Step 1: In the relatively early stages of acquisition (ages 3 to 5), utterances containing a sentential negative and a negative pronoun have a negative concord interpretation. (That is, a sentence such as ‘I didn’t do nothing’ is incorrectly used to mean ‘I did nothing.’)
Step 2: At a later stage, utterances containing a sentential negative and a negative pronoun have the correct double-negative interpretation. (That is, a sentence such as ‘I didn’t do nothing’ is used to mean ‘I did something.’)
… improving our ability to train [language models] on the same types and quantities of data that humans learn from will give us greater access to more plausible cognitive models of humans and help us understand what allows humans to acquire language so efficiently.[30] (p. 1)
… simulating the first few years of child language acquisition in its entirety is a realistic goal for an ambitious and well-funded research team.[31] (p. 4.2)
- A set that simulates the input for 4-year-olds, containing between 8 million and 28 million words (the minimum and maximum for that age according to [32]’s estimate);
- A set that simulates the input for 12-year-olds, containing between 24 million and 84 million words (the estimated minimum and maximum for that age).
- The Type A language model accurately mirrors children’s two-stage acquisition process, with the data set for 4-year-olds leading to the generation of negative concord patterns and the data set for older children correcting that tendency (one possible probe is sketched after this list);
- The Type A model fails, but the Type B language model, with its processing cost bias, accurately mirrors the two-stage process that we see in child language acquisition. (For now, we leave to the side the technical issue of precisely how a bias for processing cost should be incorporated into the language model.)
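One way the two outcomes might be probed in practice (our sketch, not part of any published protocol) is to compare the probability that a trained model assigns to a negative pronoun versus a negative-polarity item in a negated clause: a model in the negative-concord stage should prefer ‘nothing’, while a target-like model should prefer ‘anything’. The code below assumes a Hugging Face causal language model; the model name is a placeholder standing in for the Type A or Type B models, and the probe sentence is our own.

```python
# Sketch of a probe comparing 'nothing' vs. 'anything' after a negated clause.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder for a model trained on the simulated child input
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def continuation_logprob(prefix, continuation):
    """Summed log-probability of `continuation` given `prefix`."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cont_ids = tok(" " + continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total = 0.0
    for i in range(cont_ids.shape[1]):
        pos = prefix_ids.shape[1] + i - 1  # logits at pos predict the token at pos + 1
        total += logprobs[0, pos, input_ids[0, pos + 1]].item()
    return total

prefix = "The girl who skipped didn't buy"
concord = continuation_logprob(prefix, "nothing")
target = continuation_logprob(prefix, "anything")
print("concord-like: prefers 'nothing'" if concord > target else "target-like: prefers 'anything'")
```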
4. Conclusions
I went away from the Symposium with a strong conviction, more intuitive than rational, that human experimental psychology, theoretical linguistics, and computer simulation of cognitive processes were all pieces of a larger whole, and that the future would see progressive elaboration and coordination of their shared concerns …
5. Post-Script—A Syntax for AI?
Grammars generate sentences whose properties reflect preferences relating to the ease and efficiency of processing.[10] (p. 3)
Just as the conceptual components of language may derive from cognitive content, so might the computational facts about language stem from … the multitude of competing and converging constraints imposed by perception, production, and memory for linear forms in real time.[37] (pp. 189–190)
… in very significant ways, language is a radically new behavior. At a phenomenological level, it is quite unlike anything else that we (or any other species) do. It has features that are remarkable and unique. The crucial difference between this view and the view of language as a separable domain-specific module … is that the uniqueness emerges out of an interaction involving small differences in domain-nonspecific behaviors.[38] (p. 25)
In order to qualify as emergentist, an account of language functioning must tell us where a language behavior “comes from”. In most cases, this involves accounting for a behavior in a target domain as emerging from some related external domain.[39] (p. xii)
The mapping between a sentence’s form and its meaning is direct; there is no role for syntactic structure. Instead, forms (strings of words) are mapped directly onto ‘proto-semantic’ representations consisting largely of predicate–argument complexes.
The mapping between form and meaning is regulated by algorithms that operate in real time in the course of speaking and understanding.
The algorithms involved in mapping form onto meaning (and vice versa) are largely shaped by forces that contribute to processing efficiency.
- Natural Syntax does not presuppose or make use of syntactic structure; sentences are just what they appear to be—strings of words, as also assumed in language models;
- The algorithms of Natural Syntax are designed to operate in a way compatible with real-time processing—from the first word to the last, a common premise in language models as well;
- The algorithms of Natural Syntax are shaped by processing considerations that (depending on the outcome of experiments such as the one suggested here) could open the door to a deeper understanding of the syntax of human language in general, including language acquisition and language change—also a goal of contemporary AI research.
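To make the comparison concrete, here is a toy illustration (our own, highly simplified) of the direct, incremental form-to-meaning mapping just described: each incoming word immediately updates a predicate–argument representation, with no intermediate syntactic structure. The lexical entries and the example sentence are assumptions made for the sketch.

```python
# Toy incremental mapping from a word string to a predicate-argument complex.
LEXICON = {
    "Jane":   {"role": "argument", "value": "JANE"},
    "didn't": {"role": "negation"},
    "buy":    {"role": "predicate", "value": "BUY"},
    "a":      {"role": "skip"},
    "Tesla":  {"role": "argument", "value": "TESLA"},
}

def map_to_meaning(words):
    """Build a proto-semantic representation word by word, left to right."""
    rep = {"predicate": None, "args": [], "negated": False}
    for w in words:
        entry = LEXICON[w]
        if entry["role"] == "predicate":
            rep["predicate"] = entry["value"]
        elif entry["role"] == "argument":
            rep["args"].append(entry["value"])  # resolved as soon as it appears
        elif entry["role"] == "negation":
            rep["negated"] = True
    return rep

print(map_to_meaning("Jane didn't buy a Tesla".split()))
# {'predicate': 'BUY', 'args': ['JANE', 'TESLA'], 'negated': True}
```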
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Contreras Kallens, P.; Kristensen-McLachlan, R.D.; Christiansen, M.H. Large language models demonstrate the potential of statistical learning in language. Cogn. Sci. 2023, 47, e13256.
- Piantadosi, S. Modern Language Models Refute Chomsky’s Approach to Language. Unpublished ms. Department of Psychology, University of California, Berkeley, CA, USA. 2023. Available online: http://colala.berkeley.edu/ (accessed on 1 May 2023).
- Hoff, E.; Core, C.; Place, S.; Rumiche, R.; Señor, M.; Parra, M. Dual language exposure and bilingual acquisition. J. Child Lang. 2012, 39, 1–27.
- Kaan, E.; Grüter, T. Prediction in second language processing and learning: Advances and directions. In Prediction in Second Language Processing and Learning; Kaan, E., Grüter, T., Eds.; John Benjamins: Amsterdam, The Netherlands, 2021; pp. 1–24.
- Phillips, C.; Ehrenhofer, L. The role of language processing in language acquisition. Linguist. Approaches Biling. 2015, 5, 409–453.
- Chang, F.; Dell, G.S.; Bock, K. Becoming syntactic. Psychol. Rev. 2006, 113, 234–272.
- Legate, J.; Yang, C. Empirical re-assessment of stimulus poverty arguments. Linguist. Rev. 2002, 19, 151–162.
- Warstadt, A.; Bowman, S. What artificial neural networks can tell us about human language acquisition. In Algebraic Structures in Natural Language; Lappin, S., Bernardy, J.-P., Eds.; Taylor & Francis Group: New York, NY, USA, 2023; pp. 17–59.
- Hawkins, J. Efficiency and Complexity in Grammars; Oxford University Press: Oxford, UK, 2004.
- Hawkins, J. Cross-Linguistic Variation and Efficiency; Oxford University Press: Oxford, UK, 2014.
- O’Grady, W. Natural Syntax: An Emergentist Primer, 3rd ed.; 2022; Available online: http://ling.hawaii.edu/william-ogrady (accessed on 12 August 2022).
- O’Grady, W. Working memory and natural syntax. In The Cambridge Handbook of Working Memory and Language; Schwieter, J., Wen, Z., Eds.; Cambridge University Press: Cambridge, UK, 2022; pp. 322–342.
- O’Grady, W. Syntactic Carpentry: An Emergentist Approach to Syntax; Erlbaum: Mahwah, NJ, USA, 2005.
- Gibson, E.; Futrell, R.; Piantadosi, S.; Dautriche, I.; Mahowald, K.; Bergen, L.; Levy, R. How efficiency shapes human language. Trends Cogn. Sci. 2019, 23, 389–407.
- Christie, S.T.; Schrater, P. Cognitive cost as dynamic allocation of energetic resources. Front. Neurosci. 2015, 9, 289.
- Kool, W.; McGuire, J.; Rosen, Z.; Botvinick, M. Decision making and the avoidance of cognitive demand. J. Exp. Psychol. Gen. 2010, 139, 665–682.
- Otto, A.R.; Daw, N. The opportunity cost of time modulates cognitive effort. Neuropsychologia 2019, 123, 92–105.
- Dressler, W. What is natural in natural morphology? Prague Linguist. Circ. Pap. 1999, 3, 135–144.
- Zeijlstra, H. Sentential Negation and Negative Concord. Ph.D. Thesis, University of Amsterdam, Amsterdam, The Netherlands, 2004.
- Robinson, M.; Thoms, G. On the syntax of English variable negative concord. Univ. Pa. Work. Pap. Linguist. 2021, 27, 24.
- Fischer, O.; van Kemenade, A.; Koopman, W.; van der Wurff, W. The Syntax of Early English; Cambridge University Press: Cambridge, UK, 2001.
- Bellugi, U. The Acquisition of the System of Negation in Children’s Speech. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1967.
- Thornton, R.; Notley, A.; Moscati, V.; Crain, S. Two negations for the price of one. Glossa 2016, 1, 45.
- Moscati, V. Children (and some adults) overgeneralize negative concord: The case of fragment answers to negative questions in Italian. Univ. Pa. Work. Pap. Linguist. 2020, 26, 169–178.
- Zhou, P.; Crain, S.; Thornton, R. Children’s knowledge of double negative structures in Mandarin Chinese. J. East Asian Ling. 2014, 23, 333–359.
- van Schijndel, M.; Mueller, A.; Linzen, T. Quantity doesn’t buy quality syntax with neural language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5831–5837.
- Linzen, T. How can we accelerate progress toward human-like linguistic generalization? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5210–5217.
- Zhang, Y.; Warstadt, A.; Li, H.-S.; Bowman, S. When do you need billions of words of pretraining data? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand, 1–6 August 2021; Volume 1, pp. 1112–1125.
- Hosseini, E.; Schrimpf, M.; Zhang, Y.; Bowman, S.; Zaslavsky, N.; Fedorenko, E. Artificial Neural Network Language Models Align Neurally and Behaviorally with Humans Even after a Developmentally Realistic Amount of Training. Unpublished ms. 2022. Available online: https://www.biorxiv.org/content/10.1101/2022.10.04.510681v1 (accessed on 1 May 2023).
- Warstadt, A.; Choshen, L.; Mueller, A.; Wilcox, E.; Williams, A.; Zhuang, C. Call for Papers—The BabyLM Challenge: Sample-Efficient Pretraining on a Developmentally Plausible Corpus. 2023. Available online: https://babylm.github.io/ (accessed on 1 May 2023).
- Ambridge, B. A computational simulation of children’s language acquisition. In Proceedings of the Third Conference on Language, Data and Knowledge (LDK 2021), Zaragoza, Spain, 1–3 September 2021; pp. 4:1–4:3.
- Gilkerson, J.; Richards, J.; Warren, S.; Montgomery, J.; Greenwood, C.; Oller, D.K.; Hansen, J.; Paul, T.D. Mapping the early language environment using all-day recordings and automated analysis. Am. J. Speech Lang. Pathol. 2017, 26, 248–265.
- Miller, G. The cognitive revolution: A historical perspective. Trends Cogn. Sci. 2003, 7, 141–144.
- Minsky, M. The Society Theory. In Artificial Intelligence: An MIT Perspective; Winston, P.H., Brown, R.H., Eds.; MIT Press: Cambridge, MA, USA, 1979; Volume 1, pp. 423–450.
- Gardner, H. The Mind’s New Science: A History of the Cognitive Revolution; Basic Books: New York, NY, USA, 1985.
- Stampe, D. A Dissertation on Natural Phonology. Ph.D. Thesis, Department of Linguistics, University of Chicago, Chicago, IL, USA, 1973.
- Bates, E. Bioprograms and the innateness hypothesis. Behav. Brain Sci. 1984, 7, 188–190.
- Elman, J. The emergence of language: A conspiracy theory. In The Emergence of Language; MacWhinney, B., Ed.; Erlbaum: Mahwah, NJ, USA, 1999; pp. 1–27.
- MacWhinney, B. Preface. In The Emergence of Language; MacWhinney, B., Ed.; Erlbaum: Mahwah, NJ, USA, 1999; pp. ix–xvii.