An NLP-Based Exploration of Variance in Student Writing and Syntax: Implications for Automated Writing Evaluation
Abstract
1. Introduction
1.1. Recognizing Writing Variance in Automated Writing Evaluation
1.2. A Focus on Syntax
“Syntactic complexity refers to the formal characteristics of syntax (e.g., the amount of subordination) […]. In contrast, syntactic sophistication refers to the relative difficulty of learning particular syntactic structures […], which (from a usage-based perspective) is related to input frequency and contingency. The term sophistication […] refers to less frequent words as more sophisticated because they tend to be produced by more proficient writers”.
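The usage-based logic of this definition can be made concrete with a toy computation. The sketch below is illustrative only and is not the study’s code: the frequency table is invented, standing in for per-million lemma counts from a large reference corpus such as COCA, and `mean_log_frequency` is a hypothetical helper mirroring the logic of frequency-based sophistication indices such as TAASSC’s all_av_lemma_freq.

```python
# Toy illustration of frequency-based sophistication (not the study's code).
# Frequencies are invented stand-ins for per-million counts from a reference
# corpus such as COCA.
import math

REFERENCE_FREQ_PER_MILLION = {
    "be": 42000.0, "have": 12000.0, "make": 3800.0,
    "offer": 310.0, "interview": 120.0, "salary": 45.0,
}

def mean_log_frequency(lemmas, ref=REFERENCE_FREQ_PER_MILLION, floor=0.1):
    """Average log10 reference frequency of a text's lemmas; lower values
    indicate rarer, and in this framework more sophisticated, usage."""
    return sum(math.log10(ref.get(lemma, floor)) for lemma in lemmas) / len(lemmas)

print(mean_log_frequency(["be", "have", "make"]))            # common lemmas -> higher
print(mean_log_frequency(["offer", "interview", "salary"]))  # rarer lemmas -> lower
```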
1.3. Research Questions
- What variance do student writers display with regard to syntactic sophistication and complexity? AWE algorithms typically capture syntactic variance along a single dimension that runs linearly from “lower” to “higher” sophistication and complexity. In practice, students may enact complexity in different ways, revealed by variance in distinct syntactic features or constructions;
- What are the primary linguistic features (i.e., NLP metrics) that characterize patterns of syntactic sophistication and complexity? If there are multiple patterns or profiles, these patterns should exhibit distinct defining characteristics. In turn, these patterns of NLP metrics could directly inform future AWE algorithms;
- How does variance in syntactic sophistication and complexity relate to writing quality (i.e., human-assigned scores)? The purpose of this research is neither to analyze how syntactic measures predict writing quality nor to create AWE scoring algorithms. Nonetheless, it is useful to consider whether and how variance in syntactic patterns is associated with variance in writing quality. Greater sophistication and complexity may be predictive of higher quality. Similarly, if distinct patterns are observed, certain patterns may be associated with higher quality whereas others are associated with lower quality. Alternatively, variance in writing quality may be observed across all patterns; that is, observed patterns may represent truly distinct ways of writing, each of which can be navigated successfully or unsuccessfully. These generalizations may be applicable to the development of AWE algorithms that are responsive or personalized to students’ variability in writing.
2. Method
2.1. Essay Corpus and Preparation
2.1.1. Initial Corpus
2.1.2. Syntactic NLP Features
- 1. Anna is [delighted with her new job].
- 2. Anna is [delighted that the people who interviewed her last week have made an offer and the salary is what she had hoped for].
- 3. Ryan is a [teacher of Portuguese].
- 4. Ryan is a [teacher who really likes doing fun activities and creating fun lesson plans for his students every semester].
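Indices of this kind are computed from dependency parses. As a rough illustration, here is a minimal sketch, assuming spaCy and its en_core_web_sm model, of one phrasal index (“dependents per nominal”, cf. TAASSC’s av_nominal_deps); it is an approximation for exposition, not the TAASSC implementation used in this study. Run on examples (3) and (4), it yields a higher value for (4), whose noun phrase carries far more elaboration despite an identical clause count.

```python
# Approximation of a "dependents per nominal" index from a dependency parse.
# Assumes spaCy with the en_core_web_sm model installed; an illustrative
# sketch, not the TAASSC implementation.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependents_per_nominal(text: str) -> float:
    """Average number of syntactic dependents attached to each nominal
    (noun, proper noun, or pronoun) in the text."""
    doc = nlp(text)
    nominals = [tok for tok in doc if tok.pos_ in {"NOUN", "PROPN", "PRON"}]
    if not nominals:
        return 0.0
    return sum(len(list(tok.children)) for tok in nominals) / len(nominals)

print(dependents_per_nominal("Ryan is a teacher of Portuguese."))
print(dependents_per_nominal(
    "Ryan is a teacher who really likes doing fun activities and "
    "creating fun lesson plans for his students every semester."))
```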
2.1.3. Corpus Filtering and Analysis
2.2. Analysis
2.2.1. K-Means Clustering
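The clustering itself was conducted in R (see the R Core Team citation in the References). The following Python/scikit-learn sketch illustrates the same step under stated assumptions: the random matrix is a placeholder for the real 36,207 × 18 essay-by-index matrix, the indices are z-scored because they sit on very different scales, and k = 4 matches the four-cluster solution reported in the Results.

```python
# Illustrative k-means step (the study itself used R). The random matrix
# stands in for the 36,207 x 18 matrix of TAASSC indices.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 18))                # placeholder essay-by-index data
X_scaled = StandardScaler().fit_transform(X)  # z-score: indices differ in scale

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
print(np.bincount(kmeans.labels_))            # essays per syntactic profile
```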
2.2.2. Discriminant Function Analysis
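Likewise, a hedged sketch of the discriminant step, with scikit-learn’s LinearDiscriminantAnalysis standing in for the DFA reported below and toy data throughout: four clusters permit at most three discriminant functions, and averaging function scores within each cluster reproduces the kind of group centroids tabled in the Results.

```python
# Sketch of the discriminant function analysis on toy data: extract up to
# three functions separating four clusters, then compute group centroids.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(500, 18)))
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

lda = LinearDiscriminantAnalysis(n_components=3)  # 4 groups -> 3 functions max
scores = lda.fit_transform(X, labels)

centroids = np.vstack([scores[labels == k].mean(axis=0) for k in range(4)])
print(centroids)  # analogous to the group-centroid table in the Results
```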
2.2.3. Analysis of Variance (ANOVA) and Linear Regression
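Finally, a minimal sketch of the two score-related analyses, again with invented data: a one-way ANOVA testing whether human scores differ across the four clusters, and a separate regression of score on the 18 indices within each cluster, paralleling the per-cluster β tables in the Results.

```python
# Illustrative ANOVA and per-cluster regressions on invented data.
import numpy as np
from scipy.stats import f_oneway
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 18))           # standardized syntactic indices
labels = rng.integers(0, 4, size=500)    # cluster assignments
score = rng.normal(3.5, 1.0, size=500)   # stand-in for human holistic scores

# One-way ANOVA: do mean scores differ among the four clusters?
F, p = f_oneway(*[score[labels == k] for k in range(4)])
print(f"F = {F:.2f}, p = {p:.3f}")

# Within-cluster regressions: which indices predict quality in each profile?
for k in range(4):
    reg = LinearRegression().fit(X[labels == k], score[labels == k])
    print(k, round(reg.score(X[labels == k], score[labels == k]), 3))  # R^2
```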
3. Results
3.1. Variance in Syntactic Sophistication and Complexity among Student Writers
3.2. Summary of Clusters
3.3. Relationships between Clusters, Syntactic Sophistication, and Writing Quality
- 5. Venus is the most comparable planet to earth, and sometimes, the closest in distance.
- 6. The Earth is old and has many different kinds of things living on Earth.
- 7. Each time a person gets into a car, they put themselves at the risk of being killed or severly injured in a car accident from the second they turn the ignition to the moment they put the car back in “park”. Traffic accidents claim the lives of countless innocent people each and every day.
- 8. driverless cars should not be made or thought about personal. Also in the reading it states that driverless cars arent fully driverless some of them need to have the hands on the sensors on the steering wheel and the seats will vibrate when something is wrong and the car cant take control of it and you have to control the car yourself.
- 9. With car companies such as [company name] already planning the release of these self-driving cars, this future of transportation will increase safety, efficiency, and entertainment for humans going from one place to another and eventually make standard automobiles obsolete.
- 10. I never want there to be flying cars because thats when people get lazy and the cars would be useless i want to be able to hop in my cars and go race around and not hop in it and read a book and watch the car drive.
- 11. Since automobiles were first invented, they have been continuously updated in all aspects of the car, it’s design, how aerodynamic it is, the amount of cylinders an engine can have, the fuel efficiency, and a large variety of other properties.
- 12. Self-driving cars could be a more productive way for transportation and could also save a lot of lives in the process, a long with making the common person’s life just a bit easier in this hard world.
4. Discussion
4.1. Syntactic Variance in Writing
4.2. Style and Score
4.3. Implications for Automated Writing Evaluation
4.4. AWE Development
4.5. Implications for Instruction with AWE
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Biber, D.; Conrad, S. Register, Genre, and Style; Cambridge University Press: Cambridge, UK, 2019.
- Nesi, H.; Gardner, S. Genres across the Disciplines: Student Writing in Higher Education; Cambridge University Press: Cambridge, UK, 2012.
- Allen, L.K.; Likens, A.D.; McNamara, D.S. Writing flexibility in argumentative essays: A multidimensional analysis. Read. Writ. 2019, 32, 1607–1634.
- Collins, P.; Tate, T.P.; Won Lee, J.; Krishnan, J.A.; Warschauer, M. A multi-dimensional examination of adolescent writing: Considering the writer, genre and task demands. Read. Writ. 2021, 34, 2151–2173.
- Graham, S.; Harris, K.R.; Fishman, E.; Houston, J.; Wijekumar, K.; Lei, P.W.; Ray, A.B. Writing skills, knowledge, motivation, and strategic behavior predict students’ persuasive writing performance in the context of robust writing instruction. Elem. Sch. J. 2019, 119, 487–510.
- Wijekumar, K.; Graham, S.; Harris, K.R.; Lei, P.W.; Barkel, A.; Aitken, A.; Ray, A.; Houston, J. The roles of writing knowledge, motivation, strategic behaviors, and skills in predicting elementary students’ persuasive writing from source material. Read. Writ. 2019, 32, 1431–1457.
- Crossley, S.A.; Roscoe, R.D.; McNamara, D.S. What is successful writing? An investigation into the multiple ways writers can write successful essays. Writ. Commun. 2014, 31, 184–214.
- Attali, Y. A comparison of newly-trained and experienced raters on a standardized writing assessment. Lang. Test. 2016, 33, 99–115.
- Raczynski, K.R.; Cohen, A.S.; Engelhard, G., Jr.; Lu, Z. Comparing the effectiveness of self-paced and collaborative frame-of-reference training on rater accuracy in a large-scale writing assessment. J. Educ. Meas. 2015, 52, 301–318.
- Denessen, E.; Hornstra, L.; van den Bergh, L.; Bijlstra, G. Implicit measures of teachers’ attitudes and stereotypes, and their effects on teacher practice and student outcomes: A review. Learn. Instr. 2022, 78, 101437.
- Quinn, D.M. Experimental evidence on teachers’ racial bias in student evaluation: The role of grading scales. Educ. Eval. Policy Anal. 2020, 42, 375–392.
- Kellogg, R.T.; Whitehead, A.P. Training advanced writing skills: The case for deliberate practice. Educ. Psychol. 2009, 44, 250–266.
- Stevenson, M.; Phakiti, A. Automated feedback and second language writing. In Feedback in Second Language Writing: Contexts and Issues; Hyland, K., Hyland, F., Eds.; Cambridge University Press: Cambridge, UK, 2019; pp. 125–142.
- Wilson, J.; Myers, M.C.; Potter, A. Investigating the promise of automated writing evaluation for supporting formative writing assessment at scale. Assess. Educ. Princ. Policy Pract. 2022, 29, 183–199.
- Dodigovic, M.; Tovmasyan, A. Automated writing evaluation: The accuracy of Grammarly’s feedback on form. Int. J. TESOL Stud. 2021, 3, 71–88.
- Ferrara, S.; Qunbar, S. Validity arguments for AI-based automated scores: Essay scoring as an illustration. J. Educ. Meas. 2022, 59, 288–313.
- Shermis, M.D.; Burstein, J. Handbook of Automated Essay Evaluation: Current Applications and New Directions; Routledge: New York, NY, USA, 2013.
- Strobl, C.; Ailhaud, E.; Benetos, K.; Devitt, A.; Kruse, O.; Proske, A.; Rapp, C. Digital support for academic writing: A review of technologies and pedagogies. Comput. Educ. 2019, 131, 33–48.
- Yan, D.; Rupp, A.A.; Foltz, P.W. (Eds.) Handbook of Automated Scoring: Theory into Practice, 1st ed.; CRC Press: Boca Raton, FL, USA, 2020.
- Chen, C.E.; Cheng, W. Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Lang. Learn. Technol. 2008, 12, 94–112.
- Fu, Q.K.; Zou, D.; Xie, H.; Cheng, G. A review of AWE feedback: Types, learning outcomes, and implications. Comput. Assist. Lang. Learn. 2022, 37, 179–221.
- Li, Z.; Feng, H.H.; Saricaoglu, A. The short-term and long-term effects of AWE feedback on ESL students’ development of grammatical accuracy. CALICO J. 2017, 34, 355–375.
- Link, S.; Mehrzad, M.; Rahimi, M. Impact of automated writing evaluation on teacher feedback, student revision, and writing improvement. Comput. Assist. Lang. Learn. 2022, 35, 605–634.
- Stevenson, M.; Phakiti, A. The effects of computer-generated feedback on the quality of writing. Assess. Writ. 2014, 19, 51–65.
- Zhang, Z.V.; Hyland, K. Fostering student engagement with feedback: An integrated approach. Assess. Writ. 2022, 51, 100586.
- Grimes, D.; Warschauer, M. Utility in a fallible tool: A multi-site case study of automated writing evaluation. J. Technol. Learn. Assess. 2010, 8, 4–42.
- Lv, X.; Ren, W.; Xie, Y. The effects of online feedback on ESL/EFL writing: A meta-analysis. Asia-Pac. Educ. Res. 2021, 30, 643–653.
- Nunes, A.; Cordeiro, C.; Limpo, T.; Castro, S.L. Effectiveness of automated writing evaluation systems in school settings: A systematic review of studies from 2000 to 2020. J. Comput. Assist. Learn. 2022, 38, 599–620.
- Zhang, Z.V. Engaging with automated writing evaluation (AWE) feedback on L2 writing: Student perceptions and revisions. Assess. Writ. 2020, 43, 100439.
- Deane, P.; Quinlan, T. What automated analyses of corpora can tell us about students’ writing skills. J. Writ. Res. 2010, 2, 151–177.
- Hyland, K.; Hyland, F. Feedback on second language students’ writing. Lang. Teach. 2006, 39, 83–101.
- McCaffrey, D.F.; Zhang, M.; Burstein, J. Across performance contexts: Using automated writing evaluation to explore student writing. J. Writ. Anal. 2022, 6, 167–199.
- Perelman, L. When “the state of the art” is counting words. Assess. Writ. 2014, 21, 104–111.
- Hoang, G.T.L.; Kunnan, A.J. Automated essay evaluation for English language learners: A case study of MY Access. Lang. Assess. Q. 2016, 13, 359–376.
- Anson, C.M. Assessing writing in cross-curricular programs: Determining the locus of activity. Assess. Writ. 2006, 11, 100–112.
- Bai, L.; Hu, G. In the face of fallible AWE feedback: How do students respond? Educ. Psychol. 2017, 37, 67–81.
- Chen, D.; Hebert, M.; Wilson, J. Examining human and automated ratings of elementary students’ writing quality: A multivariate generalizability theory application. Am. Educ. Res. J. 2022, 59, 1122–1156.
- Crossley, S.A. Linguistic features in writing quality and development: An overview. J. Writ. Res. 2020, 11, 415–443.
- Crossley, S.A.; Kyle, K. Assessing writing with the tool for the automatic analysis of lexical sophistication (TAALES). Assess. Writ. 2018, 38, 46–50.
- Crossley, S.A.; McNamara, D.S. Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 2016, 7, 351–370.
- Gardner, S.; Nesi, H.; Biber, D. Discipline, level, genre: Integrating situational perspectives in a new MD analysis of university student writing. Appl. Linguist. 2019, 40, 646–674.
- Graham, S. A revised writer(s)-within-community model of writing. Educ. Psychol. 2018, 53, 258–279.
- Biber, D.; Conrad, S. Variation in English: Multi-Dimensional Studies; Routledge: London, UK, 2014.
- Friginal, E.; Weigle, S.C. Exploring multiple profiles of L2 writing using multi-dimensional analysis. J. Second Lang. Writ. 2014, 26, 80–95.
- Goulart, L. Register variation in L1 and L2 student writing: A multidimensional analysis. Regist. Stud. 2021, 3, 115–143.
- Goulart, L.; Staples, S. Multidimensional analysis. In Conducting Genre-Based Research in Applied Linguistics; Kessler, M., Polio, C., Eds.; Routledge: New York, NY, USA, 2023; pp. 127–148.
- McNamara, D.S.; Graesser, A.C. Coh-Metrix: An automated tool for theoretical and applied natural language processing. In Applied Natural Language Processing and Content Analysis: Identification, Investigation, and Resolution; McCarthy, P.M., Boonthum-Denecke, C., Eds.; IGI Global: Hershey, PA, USA, 2012; pp. 188–205.
- McNamara, D.S.; Graesser, A.C.; McCarthy, P.M.; Cai, Z. Automated Evaluation of Text and Discourse with Coh-Metrix; Cambridge University Press: New York, NY, USA, 2014.
- Butterfuss, R.; Roscoe, R.D.; Allen, L.K.; McCarthy, K.S.; McNamara, D.S. Strategy uptake in Writing Pal: Adaptive feedback and instruction. J. Educ. Comput. Res. 2022, 60, 696–721.
- McCarthy, K.S.; Roscoe, R.D.; Allen, L.K.; Likens, A.D.; McNamara, D.S. Automated writing evaluation: Does spelling and grammar feedback support high-quality writing and revision? Assess. Writ. 2022, 52, 100608.
- McNamara, D.S.; Crossley, S.A.; Roscoe, R. Natural language processing in an intelligent writing strategy tutoring system. Behav. Res. Methods 2013, 45, 499–515.
- Roscoe, R.D.; Allen, L.K.; Weston, J.L.; Crossley, S.A.; McNamara, D.S. The Writing Pal intelligent tutoring system: Usability testing and development. Comput. Compos. 2014, 34, 39–59.
- Weston-Sementelli, J.L.; Allen, L.K.; McNamara, D.S. Comprehension and writing strategy training improves performance on content-specific source-based writing tasks. Int. J. Artif. Intell. Educ. 2018, 28, 106–137.
- Jagaiah, T.; Olinghouse, N.G.; Kearns, D.M. Syntactic complexity measures: Variation by genre, grade-level, students’ writing abilities, and writing quality. Read. Writ. 2020, 33, 2577–2638.
- Song, R. A scientometric review of syntactic complexity in L2 writing based on Web of Science (2010–2022). Int. J. Linguist. Lit. Transl. 2022, 5, 18–27.
- Staples, S.; Egbert, J.; Biber, D.; Gray, B. Academic writing development at the university level: Phrasal and clausal complexity across level of study, discipline, and genre. Writ. Commun. 2016, 33, 149–183.
- Abbott, R.D.; Berninger, V.W.; Fayol, M. Longitudinal relationships of levels of language in writing and between writing and reading in grades 1 to 7. J. Educ. Psychol. 2010, 102, 281–298.
- Berninger, V.W.; Mizokawa, D.T.; Bragg, R.; Cartwright, A.; Yates, C. Intraindividual differences in levels of written language. Read. Writ. Q. 1994, 10, 259–275.
- Wilson, J.; Roscoe, R.D.; Ahmed, Y. Automated formative writing assessment using a levels of language framework. Assess. Writ. 2017, 34, 16–36.
- Lu, X. Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 2017, 34, 493–511.
- Mostafa, T.; Crossley, S.A. Verb argument construction complexity indices and L2 writing quality: Effects of writing tasks and prompts. J. Second Lang. Writ. 2020, 49, 100730.
- Kyle, K.; Crossley, S.; Verspoor, M. Measuring longitudinal writing development using indices of syntactic complexity and sophistication. Stud. Second Lang. Acquis. 2021, 43, 781–812.
- Kyle, K.; Crossley, S.A. Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. Mod. Lang. J. 2018, 102, 333–349.
- Wang, S.; Xu, T.; Li, H.; Zhang, C.; Liang, J.; Tang, J.; Yu, P.S.; Wen, Q. Large language models for education: A survey and outlook. arXiv 2024, arXiv:2403.18105.
- Ramesh, D.; Sanampudi, S.K. An automated essay scoring systems: A systematic literature review. Artif. Intell. Rev. 2022, 55, 2495–2527.
- Janda, H.K.; Pawar, A.; Du, S.; Mago, V. Syntactic, semantic and sentiment analysis: The joint effect on automated essay evaluation. IEEE Access 2019, 7, 108486–108503.
- Crossley, S.A.; Baffour, P.; Tian, Y.; Picou, A.; Benner, M.; Boser, U. The persuasive essays for rating, selecting, and understanding argumentative and discourse elements (PERSUADE) corpus 1.0. Assess. Writ. 2022, 54, 100667.
- Uccelli, P.; Dobbs, C.L.; Scott, J. Mastering academic language: Organization and stance in the persuasive writing of high school students. Writ. Commun. 2013, 30, 36–62.
- Davies, M. The Corpus of Contemporary American English (COCA): 560 Million Words, 1990–Present; Brigham Young University: Provo, UT, USA, 2008.
- Larsson, T.; Kaatari, H. Syntactic complexity across registers: Investigating (in)formality in second-language writing. J. Engl. Acad. Purp. 2020, 45, 100850.
- Deane, P.; Wilson, J.; Zhang, M.; Li, C.; van Rijn, P.; Guo, H.; Roth, A.; Winchester, E.; Richter, T. The sensitivity of a scenario-based assessment of written argumentation to school differences in curriculum and instruction. Int. J. Artif. Intell. Educ. 2021, 31, 57–98.
- Clarke, N.; Foltz, P.; Garrard, P. How to do things with (thousands of) words: Computational approaches to discourse analysis in Alzheimer’s disease. Cortex 2020, 129, 446–463.
- Bernius, J.P.; Krusche, S.; Bruegge, B. Machine learning based feedback on textual student answers in large courses. Comput. Educ. Artif. Intell. 2022, 3, 100081.
- Whitelock-Wainwright, A.; Laan, N.; Wen, D.; Gašević, D. Exploring student information problem solving behaviour using fine-grained concept map and search tool data. Comput. Educ. 2020, 145, 103731.
- Mizumoto, A. Calculating the relative importance of multiple regression predictor variables using dominance analysis and random forests. Lang. Learn. 2023, 73, 161–196.
- Sinharay, S.; Zhang, M.; Deane, P. Prediction of essay scores from writing process and product features using data mining methods. Appl. Meas. Educ. 2019, 32, 116–137.
- Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-means clustering algorithm. J. R. Stat. Soc. Ser. C Appl. Stat. 1979, 28, 100–108.
- Crowther, D.; Kim, S.; Lee, J.; Lim, J.; Loewen, S. Methodological synthesis of cluster analysis in second language research. Lang. Learn. 2021, 71, 99–130.
- Wu, J. Advances in K-Means Clustering: A Data Mining Thinking; Springer: Berlin/Heidelberg, Germany, 2012.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013. Available online: https://www.R-project.org/ (accessed on 31 May 2024).
- Talebinamvar, M.; Zarrabi, F. Clustering students’ writing behaviors using keystroke logging: A learning analytic approach in EFL writing. Lang. Test. Asia 2022, 12, 6.
- Meyers, L.S.; Gamst, G.; Guarino, A.J. Applied Multivariate Research: Design and Interpretation, 3rd ed.; Sage: Thousand Oaks, CA, USA, 2016.
- Tabachnick, B.G.; Fidell, L.S. Using Multivariate Statistics, 7th ed.; Pearson: Boston, MA, USA, 2018.
- Dikli, S. An overview of automated scoring of essays. J. Technol. Learn. Assess. 2006, 5, 4–34.
- Shermis, M.D.; Burstein, J.; Elliot, N.; Miel, S.; Foltz, P.W. Automated writing evaluation: An expanding body of knowledge. In Handbook of Writing Research; MacArthur, C.A., Graham, S., Fitzgerald, J., Eds.; Guilford Press: New York, NY, USA, 2016; pp. 395–409.
- Johnson, A.C.; Wilson, J.; Roscoe, R.D. College student perceptions of writing errors, text quality, and author characteristics. Assess. Writ. 2017, 34, 72–87.
- Jarvis, S.; Grant, L.; Bikowski, D.; Ferris, D. Exploring multiple profiles of highly rated learner compositions. J. Second Lang. Writ. 2003, 12, 377–403.
- Tywoniw, R.; Crossley, S. The effect of cohesive features in integrated and independent L2 writing quality and text classification. Lang. Educ. Assess. 2019, 2, 110–134.
- Andrade, H.L.; Du, Y.; Mycek, K. Rubric-referenced self-assessment and middle school students’ writing. Assess. Educ. Princ. Policy Pract. 2010, 17, 199–214.
- Ghaffar, M.A.; Khairallah, M.; Salloum, S. Co-constructed rubrics and assessment for learning: The impact on middle school students’ attitudes and writing skills. Assess. Writ. 2020, 45, 100468.
- Panadero, E.; Jonsson, A. The use of scoring rubrics for formative assessment purposes revisited: A review. Educ. Res. Rev. 2013, 9, 129–144.
- Knight, S.; Shibani, A.; Abel, S.; Gibson, A.; Ryan, P.; Sutton, N.; Shum, S. AcaWriter: A learning analytics tool for formative feedback on academic writing. J. Writ. Res. 2020, 12, 141–186.
- McNamara, D.S.; Crossley, S.A.; McCarthy, P.M. Linguistic features of writing quality. Writ. Commun. 2010, 27, 57–86.
- Kim, H. Profiles of undergraduate student writers: Differences in writing strategy and impacts on text quality. Learn. Individ. Differ. 2020, 78, 101823.
- Van Steendam, E.; Vandermeulen, N.; De Maeyer, S.; Lesterhuis, M.; Van den Bergh, H.; Rijlaarsdam, G. How students perform synthesis tasks: An empirical study into dynamic process configurations. J. Educ. Psychol. 2022, 114, 1773–1800.
- Kyle, K.; Crossley, S. Assessing syntactic sophistication in L2 writing: A usage-based approach. Lang. Test. 2017, 34, 513–535.
- Crossley, S.A.; Kyle, K.; Dascalu, M. The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behav. Res. Methods 2019, 51, 14–27.
- Benetos, K.; Bétrancourt, M. Digital authoring support for argumentative writing: What does it change? J. Writ. Res. 2020, 12, 263–290.
- Bowen, N.E.J.A.; Thomas, N.; Vandermeulen, N. Exploring feedback and regulation in online writing classes with keystroke logging. Comput. Compos. 2022, 63, 102692.
- Correnti, R.; Matsumura, L.C.; Wang, E.L.; Litman, D.; Zhang, H. Building a validity argument for an automated writing evaluation system (eRevise) as a formative assessment. Comput. Educ. Open 2022, 3, 100084.
- Goldshtein, M.; Alhashim, A.G.; Roscoe, R.D. Automating bias in writing evaluation: Sources, barriers, and recommendations. In The Routledge International Handbook of Automated Essay Evaluation; Shermis, M.D., Wilson, J., Eds.; Routledge: London, UK, 2024; pp. 421–445.
- Yan, L.; Sha, L.; Zhao, L.; Li, Y.; Martinez-Maldonado, R.; Chen, G.; Li, X.; Jin, Y.; Gašević, D. Practical and ethical challenges of large language models in education: A systematic scoping review. Br. J. Educ. Technol. 2024, 55, 90–112.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Pack, A.; Barrett, A.; Escalante, J. Large language models and automated essay scoring of English language learner writing: Insights into validity and reliability. Comput. Educ. Artif. Intell. 2024, 6, 100234.
- Xiao, C.; Ma, W.; Xu, S.X.; Zhang, K.; Wang, Y.; Fu, Q. From automation to augmentation: Large language models elevating essay scoring landscape. arXiv 2024, arXiv:2401.06431.
- Aull, L. Student-centered assessment and online writing feedback: Technology in a time of crisis. Assess. Writ. 2020, 46, 100483.
- MacSwan, J. Academic English as standard language ideology: A renewed research agenda for asset-based language education. Lang. Teach. Res. 2020, 24, 28–36.
- Kay, J. Foundations for human-AI teaming for self-regulated learning with explainable AI (XAI). Comput. Hum. Behav. 2023, 147, 107848.
Index | Description (TAASSC Software Label) |
---|---|
Clause Complexity | |
1. Adjective Complements (avg) | average number of adjective complements per clause (acomp_per_cl) |
2. Nominal Complements (avg) | average number of nominal complements per clause (ncomp_per_cl) |
Noun Phrase Complexity | |
3. Dependents per Nominal (avg) | average number of dependents per nominal (av_nominal_deps) |
4. Dependents per Object (avg) | average number of dependents per direct object (av_dobj_deps) |
5. Dependents per Preposition (avg) | average number of dependents per object of the preposition (av_pobj_deps) |
6. Dependents per Nominal (stdev) | dependents per nominal, standard deviation (nominal_deps_stdev) |
7. Dependents per Subject (stdev) | dependents per nominal subject, standard deviation (nsubj_stdev) |
8. Dependents per Object (stdev) | dependents per direct object, standard deviation (dobj_stdev) |
9. Dependents per Preposition (stdev) | dependents per object of the preposition, standard deviation (pobj_stdev) |
10. Determiners per Nominal (avg) | average number of determiners per nominal (det_all_nominal_deps_struct) |
11. Prepositions per Nominal (avg) | average number of prepositions per nominal (prep_all_nominal_deps_struct) |
12. Adjectival Modifiers (avg) | average number of adjectival modifiers per direct object (amod_dobj_deps_struct) |
13. Prepositions per Preposition (avg) | average number of prepositions per object of the preposition (prep_pobj_deps_struct) |
Syntactic Sophistication | |
14. Lemmas (avg) | average frequency of lemmas (all_av_lemma_freq) |
15. Lemma Construction Combinations (avg) | average frequency of lemma construction combinations (all_av_lemma_construction_freq) |
16. Constructions (log) | average frequency of constructions, log transform (all_av_construction_freq_log) |
17. Constructions in Reference (prop) | proportion of constructions in reference corpus (all_construction_attested) |
18. Combinations in Reference (prop) | proportion of lemma construction combinations in reference corpus (all_lemma_construction_attested) |
Student Information | n (% of Essays) |
---|---|
Grade | |
6th | 6212 (17.2%) |
8th | 8072 (22.3%) |
10th | 21,923 (60.5%) |
Gender | |
Male | 17,659 (48.8%) |
Female | 18,548 (51.2%) |
Race/Ethnicity | |
American Indian/Alaskan Native | 102 (0.3%) |
Asian/Pacific Islander | 689 (1.9%) |
Black/African American | 3353 (9.3%) |
Hispanic/Latino | 3073 (8.5%) |
Two or more races/Other | 1512 (4.2%) |
White | 27,478 (75.9%) |
ELL | |
Yes | 1104 (3.0%) |
No | 35,103 (97.0%) |
Disability | |
Not identified as having disability | 34,073 (94.1%) |
Identified as having disability | 2134 (5.9%) |
Economic disadvantage | |
Economically disadvantaged | 15,305 (42.3%) |
Not economically disadvantaged | 20,902 (57.7%) |
Essay Score | n (% of Essays) |
---|---|
2 | 5003 (13.8%) |
3 | 13,847 (38.2%) |
4 | 13,356 (36.9%) |
5 | 3453 (9.5%) |
6 | 548 (1.5%) |
Syntactic Measure | Cluster 1 (n = 8730) | Cluster 2 (n = 9608) | Cluster 3 (n = 6306) | Cluster 4 (n = 11,563) | F(3, 36203) | p | η² |
---|---|---|---|---|---|---|---|
Clausal Complexity | |||||||
Adjective Complements (avg) | 0.12 (0.06) | 0.07 (0.04) | 0.07 (0.05) | 0.08 (0.04) | 2061.28 | <0.001 | 0.15 |
Nominal Complements (avg) | 0.12 (0.06) | 0.07 (0.04) | 0.17 (0.07) | 0.08 (0.04) | 6251.43 | <0.001 | 0.34 |
Noun Phrase Complexity | |||||||
Dependents per Nominal (avg) | 0.95 (0.11) | 0.80 (0.11) | 1.19 (0.14) | 1.06 (0.11) | 15,611.47 | <0.001 | 0.56 |
Dependents per Object (avg) | 1.15 (0.29) | 1.04 (0.25) | 1.48 (0.37) | 1.30 (0.27) | 3244.24 | <0.001 | 0.21 |
Dependents per Preposition (avg) | 1.07 (0.21) | 1.06 (0.22) | 1.26 (0.24) | 1.27 (0.19) | 2817.51 | <0.001 | 0.19 |
Dependents per Nominal (stdev) | 1.07 (0.12) | 1.01 (0.12) | 1.29 (0.18) | 1.14 (0.12) | 6010.14 | <0.001 | 0.33 |
Dependents per Subject (stdev) | 0.83 (0.18) | 0.70 (0.20) | 0.95 (0.26) | 0.88 (0.19) | 2311.62 | <0.001 | 0.16 |
Dependents per Object (stdev) | 0.95 (0.22) | 0.96 (0.20) | 1.14 (0.29) | 1.07 (0.22) | 1165.04 | <0.001 | 0.09 |
Dependents per Preposition (stdev) | 0.92 (0.17) | 0.94 (0.19) | 1.13 (0.20) | 1.06 (0.19) | 2207.78 | <0.001 | 0.15 |
Determiners per Nominal (avg) | 0.32 (0.07) | 0.25 (0.07) | 0.34 (0.08) | 0.33 (0.07) | 4056.27 | <0.001 | 0.25 |
Prepositions per Nominal (avg) | 0.11 (0.04) | 0.09 (0.04) | 0.18 (0.05) | 0.14 (0.04) | 5953.60 | <0.001 | 0.33 |
Adjectival Modifiers (avg) | 0.21 (0.14) | 0.18 (0.11) | 0.28 (0.17) | 0.24 (0.13) | 848.93 | <0.001 | 0.07 |
Prepositions per Preposition (avg) | 0.10 (0.06) | 0.09 (0.06) | 0.16 (0.08) | 0.15 (0.07) | 2113.29 | <0.001 | 0.15 |
Syntactic Sophistication | |||||||
Lemma Frequency (avg) | 2,189,536.92 (441,103.99) | 1,499,780.58 (393,182.18) | 2,254,996.81 (526,383.97) | 1,572,929.61 (397,670.37) | 7278.08 | <0.001 | 0.38 |
Combinations Frequency (avg) | 219,078.92 (71,150.95) | 116,342.16 (52,686.20) | 127,725.66 (54,329.51) | 219,198.11 (84,868.54) | 6633.29 | <0.001 | 0.35 |
Constructions Frequency (log) | 4.79 (0.21) | 4.67 (0.22) | 4.82 (0.22) | 4.61 (0.20) | 1844.56 | <0.001 | 0.13 |
Lemmas in Reference (prop) | 0.95 (0.03) | 0.94 (0.04) | 0.95 (0.04) | 0.92 (0.04) | 1399.68 | <0.001 | 0.10 |
Combinations in Reference (prop) | 0.86 (0.05) | 0.83 (0.06) | 0.86 (0.06) | 0.80 (0.06) | 2408.76 | <0.001 | 0.17 |
Syntactic Variable | Function 1 | Function 2 | Function 3 |
---|---|---|---|
Dependents per Nominal (avg) | 0.706 | −0.327 | |
Dependents per Nominal (stdev) | 0.444 | 0.306 | |
Prepositions per Nominal (avg) | 0.434 | ||
Determiners per Nominal (avg) | 0.364 | −0.345 | |
Dependents per Direct Object (avg) | 0.319 | ||
Lemma Frequency (avg) | 0.350 | 0.573 | |
Lemma Construction Combinations (avg) | 0.321 | 0.563 | |
Lemmas in Reference (prop) | 0.432 | 0.322 | |
Dependents per Preposition | −0.357 | ||
Construction Frequency (log) | 0.340 | ||
Combinations in Reference (prop) | 0.328 | ||
Adjective Complements (avg) | −0.785 | ||
Nominal Complements (avg) | 0.461 |
Cluster | Function 1 | Function 2 | Function 3 |
---|---|---|---|
Cluster 1 | 0.02 | 1.40 | −0.39 |
Cluster 2 | −2.05 | 0.03 | 0.37 |
Cluster 3 | 2.67 | 0.22 | 0.48 |
Cluster 4 | 0.24 | −1.20 | −0.28 |
Syntactic Measure (standardized β) | Entire Corpus | Cluster 1 (n = 8730) | Cluster 2 (n = 9608) | Cluster 3 (n = 6306) | Cluster 4 (n = 11,563) |
---|---|---|---|---|---|
Clausal Complexity | |||||
Adjective Complements (avg) | 0.11 | 0.01 | 0.08 | 0.15 | 0.14 |
Nominal Complements (avg) | 0.04 | 0.02 | 0.07 | −0.01 | 0.07 |
Noun Phrase Complexity | |||||
Dependents per Nominal (avg) | 0.13 | 0.07 | 0.07 | 0.01 | 0.13 |
Dependents per Object (avg) | −0.10 | −0.05 | −0.03 | −0.12 | −0.12 |
Dependents per Preposition (avg) | −0.12 | −0.07 | −0.02 | −0.20 | −0.15 |
Dependents per Nominal (stdev) | −0.17 | −0.08 | −0.06 | −0.19 | −0.14 |
Dependents per Subject (stdev) | 0.12 | 0.11 | 0.13 | 0.10 | 0.08 |
Dependents per Object (stdev) | 0.15 | 0.17 | 0.15 | 0.06 | 0.13 |
Dependents per Preposition (stdev) | 0.13 | 0.16 | 0.10 | 0.12 | 0.10 |
Determiners per Nominal (avg) | 0.12 | 0.09 | 0.09 | 0.13 | 0.08 |
Prepositions per Nominal (avg) | 0.05 | 0.05 | 0.04 | −0.02 | 0.07 |
Adjectival Modifiers (avg) | 0.05 | 0.06 | 0.05 | 0.00 | 0.05 |
Prepositions per Preposition (avg) | 0.06 | 0.08 | 0.07 | 0.05 | 0.02 |
Syntactic Sophistication | |||||
Lemma Frequency (avg) | −0.19 | −0.18 | −0.10 | −0.20 | −0.11 |
Combinations Frequency (avg) | 0.03 | −0.01 | 0.08 | −0.04 | 0.09 |
Constructions Frequency (log) | −0.08 | −0.12 | −0.03 | −0.10 | −0.04 |
Lemmas in Reference (prop) | −0.09 | −0.11 | −0.10 | −0.07 | −0.05 |
Combinations in Reference (prop) | 0.16 | 0.12 | 0.18 | 0.07 | 0.19 |
Model Statistics | R² = 0.11 | R² = 0.14 | R² = 0.09 | R² = 0.14 | R² = 0.13 |
 | F(18, 36206) = 256.38 | F(18, 8729) = 79.13 | F(18, 9607) = 53.42 | F(18, 6305) = 55.58 | F(18, 11562) = 70.48 |
 | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).