1. Introduction
The rapid advancement of technology has had a transformative impact on various aspects of people’s lives, including education (
Hsu et al. 2021;
Pavlik 2023;
Khan et al. 2023). This impact was particularly evident during and after the COVID-19 pandemic, as technology played a crucial role in facilitating remote learning and addressing educational challenges. Technology not only increased students’ motivation (
Lee et al. 2022) and engagement (
Huang et al. 2023) but also contributed to empowering K–12 students and reducing educational inequities (
Gibbons 2021).
AI-powered dialogue systems, commonly referred to as chatbots, have emerged as human-like conversational agents (
Zhai and Wibowo 2023). A chatbot, which is a computer program, responds like a smart entity when conversed with through text or speech and understands one or more human languages through natural language processing (
Khanna et al. 2015). In November 2022, OpenAI released ChatGPT, a publicly available, cutting-edge chatbot that can generate human-like conversation through text-to-text or text-to-speech prompts.
With the growing interest in leveraging AI for educational purposes as well as the increasing accessibility of and advancements in AI technologies, educators have been exploring ways to integrate AI into the language learning classroom (
Bandameedi 2022;
Yang et al. 2022). Previous studies have demonstrated the benefits of using AI in language learning in terms of four aspects: learners’ listening, speaking, reading, and writing (
Kim 2022;
Chen et al. 2022). By leveraging the interactional function of large language models, language learners can engage in interactive communication (
Yan and Xia 2023;
Yang et al. 2022) and receive near-immediate feedback (
Britt et al. 2004) and error correction (
Yan 2023) from applications such as ChatGPT, thereby improving their writing skills and enlarging their vocabulary.
Educational policymakers and administrators are increasingly recognizing globalization as a future trend. As a result, a growing number of schools, including private schools, public schools, and charter schools, have started providing foreign language classes (
Caruana 2017). In the United States, a report from the US Department of Education (
Mitchell 2021) shows that Mandarin is the second most popular dual-language program offered by individual states. While this presents a positive opportunity for students to learn Mandarin at school, it also raises concerns for students from low-income families. Learning Chinese can be challenging for those whose first language is English because Mandarin uses syllabic characters, while English is based on the Roman alphabet. Therefore, Mandarin learners may require additional support and assistance outside of school hours. Although students attending private schools may have greater access to the financial resources that can help them afford a Mandarin tutor, those from low-income families may need a more cost-effective option. This disparity creates an educational inequality that ChatGPT has the potential to mitigate.
This research adopted a single-case design to explore the effectiveness of the after-school use of ChatGPT in improving Chinese language learners’ (CLLs) writing skills. By piloting applications of ChatGPT, our research triangulates three data sources (writing samples, writing scores, and learners’ reflections) to enhance the trustworthiness of our data and provide insights into the following research question: Is there a functional relation between Chinese language learners from low-income families using ChatGPT after school twice a week and improvements in their Chinese writing scores?
The significance of this study lies in its theoretical and practical contribution toward promoting educational equity and equality for second language learners from low-income backgrounds.
2. AI in Second Language Writing
The current research highlights the potential of ChatGPT to improve users’ writing skills (
Stokel-Walker 2022;
Dergaa et al. 2023). However, a limited body of research addresses the limitations of ChatGPT in the context of foreign language writing. Therefore, this paper aimed to explore this question specifically from the perspective of using large language models to support second language (L2) writing.
AI has a long history of helping language learners improve their writing skills by providing immediate feedback. As early as 2004, Britt et al. studied the effectiveness of AI in academic writing. They used an application called the Sourcer’s Apprentice Intelligent Feedback mechanism (SAIF) as an intervention. SAIF can immediately identify instances of plagiarism and unquoted sentences, provide feedback for revision, and guide students to revise their essays accordingly. The results revealed that students using SAIF demonstrated stronger academic writing skills than their counterparts.
Another AI system, Writing Pal (W-Pal), has been developed to support students in various stages of the writing process, such as the introductory, body, and concluding paragraphs (
Johnson et al. 2017). Because such a tool breaks down the writing process and provides scaffolding, learners who engage with it develop stronger writing skills (
Roscoe et al. 2011) and adopt sophisticated writing strategies (
Roscoe et al. 2013).
In addition, students have shown improvements in paragraph writing when requesting feedback from an AI. Forty students from Al-Balqa Applied University used Paragraph Punch Software (PPS) as a writing intervention. PPS aids writers in developing well-structured paragraphs by assisting them with topic sentences, supporting evidence, and wording, as well as by correcting grammatical errors. The results indicated that students’ paragraph writing had improved significantly with more precise sentences, sounder structures, and fewer grammatical errors (
Alotaibi and Alzu’bi 2022).
ChatGPT, a newer type of AI chatbot with a radically different underpinning, has gained popularity and acceptance for various purposes (
Sharma and Sharma 2023;
Short and Short 2023;
Pavlik 2023). Students have also exhibited a positive attitude toward ChatGPT in L2 writing. In an exploratory study by
Yan (
2023), students expressed surprise at ChatGPT’s speedy responses, high-quality content, and variety of text styles. The students perceived ChatGPT as a potent tool for L2 writing.
3. Research Question
Is there a functional relation between Chinese language learners from low-income families using ChatGPT after school twice a week and improvements in their Chinese writing?
4. Methodology
This research received approval from Fordham University’s Institutional Review Board (IRB). The study adopted an ABA design for participants (
Kratochwill et al. 2010), where A corresponds to the baseline phase and B refers to the intervention phase. The last A represents the reversal phase, presenting the data after the treatment is taken away. ABA is one type of single case design (i.e., single subject research) to evaluate the effectiveness of a treatment or intervention (
Kazdin 2011). In this study, the initial A signifies the baseline, for which we adopted each participant’s last three Chinese writing scores. “B” corresponds to the intervention phase, in which we evaluated the participants’ scores when they used ChatGPT at home. The second “A” designates the reversal phase, in which the students were prohibited from using ChatGPT, allowing us to observe changes in their scores. The researchers studied the three summative scores after the treatment.
The researchers could determine if the treatment was effective based on the data trend in the baseline, intervention, and reversal phases. The researchers also combined qualitative research to acquire an in-depth understanding of the students’ use of ChatGPT to support and enhance their writing skills while learning from home.
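The phase-by-phase logic of an ABA design can be illustrated with a short sketch. It is only a simplified numerical stand-in for the visual trend analysis described above, not the procedure used in this study; the function names, phase labels, and scores are hypothetical.

```python
from statistics import mean


def phase_means(scores_by_phase):
    """Return the mean score for each phase label ('A1', 'B', 'A2')."""
    return {phase: mean(scores) for phase, scores in scores_by_phase.items()}


def suggests_functional_relation(scores_by_phase):
    """True when the mean rises from baseline (A1) to intervention (B)
    and falls again once the intervention is withdrawn (A2)."""
    m = phase_means(scores_by_phase)
    return m["B"] > m["A1"] and m["A2"] < m["B"]


# Hypothetical scores on the 1-8 rubric, NOT data from this study.
example = {
    "A1": [2, 2, 3],           # baseline
    "B": [3, 4, 4, 5, 5, 5],   # intervention
    "A2": [4, 3, 3],           # reversal
}

print(phase_means(example))
print(suggests_functional_relation(example))
```

A rise during B followed by a drop in the second A phase is the pattern that supports attributing the change to the intervention rather than to time or practice alone.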
We acknowledge the use of ChatGPT 3.5 (OpenAI,
https://chat.openai.com/) to proofread our initial draft. ChatGPT helped us correct grammar and unclear sentence structures. We invited three native speakers to proofread the final version of our article. They revised the wording and grammar and offered suggestions on unclear content. Based on these suggestions and recommendations, we finished the published version of this article.
4.1. Study Design
4.1.1. Setting
This study occurred in a high school with the International Baccalaureate Program (IB) in the Bronx, NY, USA, from April 2023 to May 2023, spanning a duration of three weeks. The student population at this high school is predominantly Hispanic-American. Participants were from Hispanic families but were born and raised in the United States as native English speakers. However, no state-wide standardized test data showed their English Language Arts (ELA) and Math levels because these assessments were not administered in 2020 and were optional for students to take in 2021 due to the COVID-19 pandemic. Their ELA results showed an average English level appropriate for 9th graders.
4.1.2. Participants
Four participants were recruited from the Mandarin class, as having multiple study replications enhances the credibility of the findings (
Kratochwill et al. 2010). The participants were 9th-grade students aged 14 to 16. They had no background in Mandarin and were learning it for the first time. They were all emergent Mandarin learners but had different proficiency levels. They were fluent in English and Spanish. At the onset of the study, the participants struggled with Chinese writing because they lacked support and immediate feedback at home and needed someone to assist them with their Mandarin writing when questions arose. Many of the students came from low-income families, making it challenging for them to afford additional tutoring outside of school hours. Under New York government policy, a family of four with an income below $68,720 is considered low-income.
4.1.3. Intervention and Procedure
The intervention procedure is outlined in
Figure 1. The intervention began with the teacher introducing ChatGPT to the class, demonstrating how to create an account and how to prompt ChatGPT. The teacher planned five 45-minute lessons to teach students how to use ChatGPT. The lesson details are in
Figure 2.
Before students started using ChatGPT as a tool to help with their writing at home, the teacher gave students a level-appropriate in-class writing assignment to assess their mastery of their new skills. The teacher observed the students’ ability to compose clear, context-rich prompts in Mandarin to obtain cogent ChatGPT responses. Additionally, the teacher provided one-on-one feedback to each student and encouraged peer evaluation. Furthermore, the teacher analyzed students’ writing through grammar, vocabulary, and comprehension exercises to determine whether or not students could effectively use ChatGPT. If so, students would use ChatGPT after school to assist with their Chinese writing. If not, the teacher would provide more one-on-one modeling and feedback to students. The teacher constantly monitored participants’ skills throughout the study by checking in weekly and scheduling individual meetings with them.
The teacher also conducted ongoing workshops, which took place for 30 minutes during Mandarin class every week. Once students had started using ChatGPT at home, these workshops were designed to help them use it effectively for their Chinese writing homework.
The detailed workshop content is in
Figure 3.
While the intervention was available for the entire class, this study aimed to examine the empowering effects of ChatGPT on students’ second language writing skills. Thus, students who expressed interest in writing and demonstrated a desire to use ChatGPT at home volunteered to participate in the study. They were selected during the series of five sessions. Although the participants had different proficiency levels in Mandarin, they were all emergent Mandarin learners.
4.2. Measure
The baseline for students’ writing scores was established using three summative writing scores per participant taken 15 days before the intervention. These scores were reflected in session 1, session 2, and session 3.
During the intervention, participants were assigned writing homework twice a week so that the teacher could closely analyze the effectiveness of ChatGPT. The writing homework included the topics of self-introduction (greeting, name, age, grade, and date), family members (creative writing, such as songs and poems), and interviews (interviewing classmates and writing down the dialogue). These three topics were assigned to students in the first, second, and third weeks, respectively.
Participants were asked to write on Google Docs one day and on paper the other day. The total duration of the intervention was three weeks, and a total of six writing assignments were distributed.
Participants were allowed to use ChatGPT to develop ideas for writing and to revise their work. However, copying and pasting from ChatGPT was not permitted, and the teacher emphasized the importance of maintaining academic integrity. Furthermore, although it may appear to work like a calculator, ChatGPT predicts likely answers rather than giving definitive results. Therefore, at its current level of sophistication, a cut-and-paste approach is unlikely to produce correct phrases. After the three weeks of intervention, students were not allowed to use ChatGPT for their assigned writing homework. The teacher worked with IT support to block ChatGPT on the students’ Chromebooks.
The same grading rubric (see
Appendix A) that we used for the baseline was employed. Because IB schools have adopted IB criteria to score students, this study used the IB writing rubric to assess students. Another Mandarin teacher with seven years of IB experience who taught full-time in another school graded their writing in order to avoid a conflict of interest and enhance the data’s trustworthiness. The Mandarin teacher is a first-generation Chinese immigrant in the United States. She is fluent in listening, speaking, reading, and writing Mandarin.
During the intervention phase, participants were also asked to write weekly reflections on their feelings about using ChatGPT, either on paper or on Google Docs. They shared their reflections with the teacher and turned them in to the teacher.
4.3. Data Analysis
To ensure the data’s trustworthiness and credibility, the data were triangulated by incorporating the participants’ writing samples, writing scores, and reflections. This process helped the researchers better understand how ChatGPT empowered students from low-income families to learn a new language, Mandarin.
The teacher collected the writing samples from the participants throughout the intervention phase and then evaluated the samples using the writing rubric (see
Appendix A) consistently applied during the baseline and intervention phases. The rubric scores ranged from 1 (the lowest) to 8 (the highest), with a score of three considered passing. A score of three means that learners can use a basic range of vocabulary and grammatical structures with some errors and can communicate some relevant information. The criteria assessed in the rubric included vocabulary, grammatical structure, organization, and genre. In addition to the writing samples, students’ writing scores were compared before, during, and after the intervention, enabling a quantitative assessment of the impact of ChatGPT on their writing abilities. The researchers used
Miller’s (
1985) analysis method to analyze the data. This method requires analyzing only the last three data points from each phase because participants need time to receive the treatment and for the treatment to take effect.
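The "last three data points per phase" rule, as the text describes it, can be sketched as follows. This is an illustrative reading of that description, not a reproduction of the original method; the helper names and the scores are hypothetical, and only the passing threshold of three comes from the rubric described above.

```python
from statistics import mean

PASSING_SCORE = 3  # passing threshold on the 1-8 rubric, as stated in the text


def last_three_mean(scores):
    """Mean of the final three data points of one phase."""
    if len(scores) < 3:
        raise ValueError("each phase needs at least three data points")
    return mean(scores[-3:])


def summarize(phases):
    """Map each phase name to (rounded mean of its last three points,
    whether that unrounded mean reaches the passing score)."""
    summary = {}
    for name, scores in phases.items():
        m = last_three_mean(scores)
        summary[name] = (round(m, 1), m >= PASSING_SCORE)
    return summary


# Hypothetical scores, NOT data from this study.
phases = {
    "baseline": [1, 2, 2],
    "intervention": [3, 3, 4, 4, 4, 5],
    "reversal": [3, 2, 2],
}

print(summarize(phases))
```

Restricting each phase to its final three points discards the early sessions in which a participant is still adjusting to the new condition.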
Furthermore, students’ reflections on their experience using ChatGPT as an intervention tool were collected and analyzed. These reflections would provide valuable insights into the students’ perspectives, allowing researchers to understand how ChatGPT may affect their language learning journey.
5. Results and Findings
5.1. Writing Scores
The sample comprised four individuals, including two females and two males. They were all emergent Chinese language learners with different proficiency levels in Mandarin. Throughout the intervention, all the participants demonstrated varying degrees of improvement in their writing scores. The scoring Mandarin teacher is an IB teacher and is very familiar with the grading rubric. The following figures demonstrate the four participants’ writing scores before, during, and after the intervention.
The y-axis score ranges from 0 to 8. Three is considered a passing score. A score below three means failing. The higher the score, the more sophisticated Mandarin writing skills the student demonstrated.
Appendix A shows the rubric and its descriptors.
As
Figure 4 shows, the data from Participant 1 show a mean of 5.5, ranging between 5 and 6 during the baseline phase. During the intervention phase, his mean score increased to 7.8, with a range from 5 to 8, which indicates a significant improvement in his writing scores. His writing score somewhat decreased to a mean of 6.3, with a range from 6 to 7 during the reversal phase. However, his mean score during the reversal phase is higher than the baseline. Overall, the data obtained from Participant 1 indicate a functional relationship between his writing scores and use of ChatGPT.
The data from Participant 2 show a mean of 1.7, ranging from 1 to 2 during the baseline phase. During the intervention phase, the mean score climbed to 3.8, with a range from three to five, showing a noticeable increase in his writing scores. However, the participant’s writing score showed a downward trend to a mean of 2.3 with a range from 2 to 3 during the reversal phase. His mean score during the reversal phase is still higher than the baseline. Thus, the data gained from Participant 2 demonstrate a functional relationship between his writing scores and use of ChatGPT.
Participant 3’s baseline data show a mean of 1.3 with a range between 1 and 2. Notably, during the intervention phase, his mean score grew to 3.5, ranging from 3 to 4. During the reversal phase, the participant’s writing scores dropped to between 2 and 3, with a mean of 2.3. However, it is important to highlight that Participant 3’s mean score in the reversal phase still surpassed the baseline mean score. Therefore, the data collected from Participant 3 illustrate a functional relationship between his writing scores and use of ChatGPT.
Participant 4’s baseline data show a mean of 4, ranging between 5 and 6. A noteworthy rising trend was observed during the intervention, with a mean score of 6.5, ranging from 5 to 8. Following the removal of the intervention, Participant 4’s scores declined, ranging from 5 to 6 with a mean of 5.7. It is important to note that despite the decrease in scores during the reversal phase, Participant 4’s mean writing score is higher than the baseline. Thus, the data collected from Participant 4 display a functional relationship between the participant’s writing scores and use of ChatGPT.
In conclusion, our findings indicate that all four participants improved their writing scores during the intervention, using ChatGPT as a tool for support at home. The participants with lower baseline scores exhibited a significant and immediate improvement in writing when the intervention was implemented. After receiving the one-week intervention, participants with higher baseline scores demonstrated a more gradual and modest improvement in their writing. While their progress may have been less dramatic, it still indicates that they benefited from the intervention. However, all participants’ writing scores decreased somewhat after the intervention was removed. Despite this decline, their scores in the reversal phase remained higher than their respective baselines. This trend suggests that the participants could sustain some of the benefits gained during the intervention period.
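The pattern summarized above can be tabulated directly from the phase means given in Section 5.1. The sketch below takes those means exactly as reported; the function name and the derived quantities (gain, retained gain, share retained) are illustrative additions, not measures used in the study.

```python
# Phase means as reported in Section 5.1: (baseline, intervention, reversal).
reported_means = {
    "Participant 1": (5.5, 7.8, 6.3),
    "Participant 2": (1.7, 3.8, 2.3),
    "Participant 3": (1.3, 3.5, 2.3),
    "Participant 4": (4.0, 6.5, 5.7),
}


def summarize_changes(means):
    """For each participant, compute the gain during the intervention,
    the gain still present in the reversal phase, and its share of the gain."""
    rows = {}
    for name, (baseline, intervention, reversal) in means.items():
        gain = round(intervention - baseline, 1)
        retained = round(reversal - baseline, 1)
        rows[name] = {
            "gain": gain,
            "retained": retained,
            "share_retained": round(retained / gain, 2),
        }
    return rows


for name, row in summarize_changes(reported_means).items():
    print(name, row)
```

For every participant the retained gain is positive but smaller than the intervention gain, which matches the description of a partial decline in the reversal phase.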
5.2. Writing Sample
Analyzing the students’ writing samples shows that they can correct errors and use well-constructed sentences in their Chinese writing with ChatGPT’s assistance. For example, Participant 2 and Participant 4 notably improved their writing by utilizing more comprehensive and detailed sentences, moving beyond simple words or short phrases. As seen in
Figure 5, Participant 2 made several errors in the original version. For example, Participant 2 wrote “什么” as “么吗”. He also omitted the question word “几” in the question “你有几个姐姐?” After Participant 2 used ChatGPT, there were only a few minor errors in the writing shown in
Figure 6. For instance, Participant 2 wrote “玛丽” as “吗丽”. However, there is an obvious improvement in sentence structure: Participant 2 used more complex sentence structures after using ChatGPT. In
Figure 7, Participant 4 could only write isolated words that did not form meaningful sentences, and some of them were wrong. However, in
Figure 8, Participant 4 was able to write some sentences with the help of ChatGPT, although there were a few wrong words.
In addition to being able to use well-structured sentences, the online writing samples from Participant 1 and Participant 3 illustrate their capability to rectify incorrect characters and produce cogent compositions with the guidance of ChatGPT. The participants may have been unaware that certain characters were incorrect because some characters share the same pronunciation. However, after receiving ChatGPT’s feedback, the participants could self-correct these errors. For example, in
Figure 9, Participant 1 typed “鸡肉炒饭” as “鸡肉炒反”. Participant 1 revised it after using ChatGPT and made the conversation stronger by adding a closing. In
Figure 10, Participant 3 typed “再见” as “在天” in the left chart. Participant 3 noticed and revised this typo after using ChatGPT in the right chart.
5.3. Students’ Reflection
In the students’ reflections on using ChatGPT for L2 writing, there was a shared sense of excitement about having a chatbot that closely resembled human interaction and could provide immediate feedback on their work. One participant expressed their enthusiasm as follows:
ChatGPT is such a great tool to use after school. I enjoyed how it quickly responded to my questions. I feel like my Mandarin teacher is next to me to help me with my Mandarin homework and writing. I used to need to save my questions and wait until the day that I had a Mandarin class to ask my Mandarin teacher. However, I usually forgot my questions when I saw my Mandarin teacher because the time passed. This does not happen to me anymore with the help of ChatGPT. I think I am making progress in Mandarin writing because of ChatGPT.
(Participant 1)
In addition, the participants expressed a sense of empowerment in their foreign language acquisition journey through ChatGPT. They appreciated having a tool they could rely on when feeling lost or encountering difficulties. Two participants reflected this sentiment:
To be honest, I have no idea what to do for Mandarin writing homework. I don’t know where to start because I don’t know anything. No one can help me at home since my parents don’t speak Mandarin … I initially felt frustrated because I wanted to improve my Mandarin scores, but I didn’t know what to do. I appreciate my Mandarin teacher introducing ChatGPT to me. It is beneficial! I don’t feel frustrated now and feel more confident in my Mandarin work.
(Participant 4)
I love ChatGPT! It is my best friend now. ChatGPT empowers me. I have a friend who is from Manhattan, and he is also learning Mandarin. His parents hire a tutor for him once a week to assist him in his Mandarin learning. Nevertheless, I am not jealous about it at all because I also have my private tutor-ChatGPT! It always answers all the questions I ask. I love it.
(Participant 2)
6. Discussion
This study aimed to understand whether ChatGPT can empower students from low-income families to learn a new language by improving their writing scores and skills after school without incurring additional costs. One should note that OpenAI ChatGPT version 3.5 and later versions are not indefinitely ‘free’, and prompting it enough times will trigger a request to purchase ‘compute time’. It takes ‘compute time’ to predict what is the most appropriate response to a given prompt. Three important findings emerged from the study:
ChatGPT’s effectiveness in improving writing skills: The findings revealed that ChatGPT can assist Chinese language learners in enhancing their writing skills by correcting erroneous characters and developing well-structured sentences.
Empowerment through the use of ChatGPT: Participants expressed a sense of empowerment in using ChatGPT at home, as it provided them with valuable assistance when they encountered difficulties or felt lost in their language learning journey.
Writing score improvement after using ChatGPT: All of the participants showed an increase in their writing scores during the intervention. Participants with lower scores at baseline demonstrated immediate and significant improvements in their Chinese writing scores, while participants with higher scores at baseline presented a gradual and more moderate improvement in their writing scores.
Although the participants were able to rectify character errors in their online writing, their paper-based writing still contained numerous mistakes. The participants could copy and paste their online writing content to ChatGPT to check their writing for accuracy. Yet, they needed help putting their writing into ChatGPT when they were writing on paper.
Participants with lower baseline scores benefited from ChatGPT’s assistance immediately because ChatGPT scaffolded the participants to be able to write correct characters and construct words into coherent sentences, addressing their common struggles in character writing and sentence formation. Participants with higher levels of proficiency in Mandarin needed more time to familiarize themselves with using ChatGPT and incorporate its suggestions and feedback into their writing. Because ChatGPT provides likely responses to a prompt, the student must be able to recognize when parts of a response or the whole response are incorrect. It is important to acknowledge that the benefits of ChatGPT extend beyond students from low-income backgrounds. While this study specifically focused on students from low-income families, the findings suggest that ChatGPT can be a valuable tool for students across different socioeconomic backgrounds and academic levels.
Maintaining academic integrity is a legitimate concern for teachers and parents. Ensuring that ChatGPT serves as a helper instead of an “author” is essential when students use it. Teaching students to use proper citation techniques can be a solution in this regard. Some chatbots built on OpenAI’s models, such as Microsoft’s Bing Chat, provide references in their responses.
Some limitations are acknowledged in this study. First, when educators face difficulties in precisely gauging students’ ELA and Math levels, it becomes challenging for researchers to obtain a clear and accurate representation of their learning aptitudes. This ambiguity of assessment hampers their ability to design effective interventions and educational programs that cater to the diverse needs of students. Second, the short treatment period of three weeks may have limited the observed impact on participants’ writing skills.
The current findings have significant implications for future research and practice in language learning, educational technology, and education equity and equality. First, future studies can explore the scalability and applicability of ChatGPT in different educational contexts and with larger sample sizes. Replicating and expanding this research will help validate the effectiveness of large language models in improving writing skills across diverse student populations. Second, it is important to assess long-term impacts to determine the sustained effects of using ChatGPT-like applications on language acquisition and writing skills. Examining the persistence of improvements over an extended period will provide valuable insights into the long-term benefits of incorporating ChatGPT as a tool for language learning.
Last, the potential of ChatGPT-like applications to narrow educational inequity and inequality is critical. By providing instant low-cost learning support, accessible language practice, and one-on-one tailored learning experiences, AI-driven language learning can revolutionize education for students from low-income families. As we continue to harness the capabilities of ChatGPT-like applications, it is crucial for us to recognize their potential to create a more equitable and inclusive education landscape for all.
Furthermore, future research should focus on developing strategies and guidelines for integrating large language models into the curriculum effectively. This includes exploring ways to ensure students use large language models as supportive tools, rather than relying solely on them, while emphasizing the importance of academic integrity and proper citation practices. Lastly, additional studies should explore the potential of ChatGPT-like applications in other aspects of language skills beyond writing, such as speaking and listening. Investigating their effectiveness in these areas will contribute to a more comprehensive understanding of AI’s impact on overall language proficiency. This study paves the way for future research to explore the broader applications and benefits of ChatGPT-like applications in language learning and to further enhance their effectiveness as supportive tools for students from low-income backgrounds.