Article

Retelling of Stories with Common Phrasal Expressions by High-Proficiency Learners: Implications for Learning and High-Stakes Testing

by
David Gregory Coulson
Graduate School of Language Education and Information, Ritsumeikan University, Kyoto 603-8577, Japan
Languages 2024, 9(11), 337; https://doi.org/10.3390/languages9110337
Submission received: 29 July 2024 / Revised: 21 October 2024 / Accepted: 22 October 2024 / Published: 29 October 2024

Abstract

The goal of this research was to investigate how well L2 English speakers of different proficiency levels were able to perform on a test of auditory memory like that used in the Wechsler Memory Scale (WMS). In this test, participants must retell heard stories. While the validity of elicited imitation tests is well established in second-language acquisition research, the effectiveness of using retelling as a neuropsychological measurement when a language other than the test taker’s L1 is used is unclear. Further, due to their importance in memory function, this study also analyzed the role of common phrasal expressions in how well participants retold stories in three groups of different proficiency. The results indicated that the increase in scores in the retelling of stories aligned with the proficiency level of the non-native participants. Higher-proficiency NNSs were also able to use more of the commonly used spoken phrases in their retelling. Further, there was no difference in this measure between the higher-proficiency Second-Language English and First-Language English participants. While the effectiveness of this test method as a neuropsychological tool when a language other than the test taker’s L1 is used is unclear, these results indicate significant progress by the higher-proficiency participants. Nevertheless, given that this test is often used in this context with immigrants and minority language patients, doubts remain about its suitability for less proficient individuals.

1. Introduction

Working memory (WM) is an essential mechanism for language use that allows humans to process briefly stored information. Through focused attention, it is possible to manipulate information necessary for the completion of particular tasks and to suppress that which is not relevant to the task in hand. According to Haarman (2013), WM can be used with “tasks that require immediate recall of a list of items, either in the form of a simple span or complex span task” (p. 696). This can be assessed through both reading and listening, during which participants are presented with information and are then asked to recall it. The ability to recall information is helped by the presence of prefabricated chunks of language that facilitate the production and recall of information. However, as Siyanova-Chanturia and Janssen (2018, p. 2010) explained, the benefit of these chunks is much greater for native speakers than non-native speakers, for whom phrases are frequently unfamiliar. Formulaic phrases are generally processed faster than novel phrases by native speakers. These help speakers to bypass short-term memory, and this leads to faster, more automatic processing. This characteristic of first-language skill is due to the massive amount of input that is available to people who have grown up with a particular language. This also implies that non-native speakers are at a disadvantage in the processing of these phrases.
As a widely used test of neuropsychological function, the Wechsler Memory Scale (WMS) measures various aspects of short-term, long-term and working memory ability. Subtests include auditory, visual, immediate and delayed memory. One of the tasks, called the Logical Memory Subtest (LMS), involves story retelling, during which a clinician reads short stories to the patient, who is then asked to retell each story immediately. These stories are significantly longer than the texts used in Elicited Imitation (EI) tests, which have been widely researched in second-language learning and found to be reliable, valid measures of second-language proficiency (e.g., McManus and Liu 2020).
Phonological short-term memory (PSTM) refers to the ability to temporarily store and recall speech sounds for a short period of time (Baddeley 2015). This cognitive function is essential for the manipulation of auditory information involved in the understanding and production of language. Usually conducted in a speaker’s L1, the Wechsler Memory Scale (WMS) can be used to evaluate PSTM by having participants listen to and verbally recall stories told during the test (Gallagher and Azuma 2018). This provides insights into an individual’s memory retention and linguistic capabilities, which are crucial for diagnosing cognitive impairments, language disorders and other neuropsychological conditions.
Research on second-language acquisition has shown that PSTM also plays a significant role in L2 acquisition, supporting not only vocabulary acquisition but also general proficiency. For example, Kormos and Sáfár (2008) suggested that the ability to access and encode vocabulary quickly, and to fluently form lexical chunks, is related to phonological short-term memory. During the listening to, recalling and retelling of stories, L1 speakers have distinct advantages over L2 speakers. This is because listening relies much more on top-down processing than reading (Qian and Lin 2020) due to its transient nature. Understanding of spoken language requires the activation of short-term working memory, and L1 speakers have a much more strongly developed command of the language. Further, L1 speakers can draw upon a more extensive, and deeply entrenched, lexical and syntactic knowledge base, which allows them to encode and retrieve information more efficiently. They are also typically more familiar with the cultural and contextual nuances embedded in stories, aiding their recall and retelling abilities. Importantly, due to their large and strongly automatized phrasal knowledge, L1 speakers devote fewer cognitive resources to processing language. Formulaic language of all kinds offers a clear advantage over non-formulaic language in the speed of understanding (e.g., Carrol and Conklin 2020), freeing up more working memory (the fundamental cognitive apparatus for the storage and processing of information), which can be used to recall the details of a story. In contrast, L2 speakers often need to allocate more cognitive resources to understand and produce the language itself, leaving less available for recalling story details and structuring a coherent retelling.
This study examines the effects that the factors described above may have on L2 users at different proficiency levels. This is important since it is not uncommon for cognitive tests such as the WMS to be given to immigrants or minority language patients in their L2. Therefore, this study examines how the ability to use multiword expressions (MWEs) assists in the retelling of stories used in this kind of assessment on different groups of participants, including L1, medium- and high-proficiency L2 speakers.

2. Literature Review

2.1. Memory and Formulaic Language in Native Speakers and Second-Language Users

Research consistently shows that a substantial portion of language is composed of MWEs that frequently co-occur in both spoken and written texts. For example, a review of relevant research on the proportion of formulaic phrases in spoken and written discourse by Conklin and Schmitt (2012) indicated that they constitute up to about half of such texts.
The processing of both words and phrases in memory is essential to language use. However, non-native speakers are often at a disadvantage in the amount of exposure to a second language they receive, whereas monolinguals are able to process language faster thanks to far richer exposure which results in “lexical entrenchment” (Conklin and Thul 2023, p. 208). In particular, during listening and reading, the higher the frequency of an MWE, the faster and more fully it is processed. Conversely, non-native speakers often do not process MWEs as units, mainly due to a lack of encounters. It is reported that even highly proficient learners do not derive as large a benefit from formulaic language as L1 speakers. For example, research by Lundell and Lindqvist (2013) with advanced French L2 learners examined whether lexical ability continues to develop after reaching a very high level of proficiency. Although their Swedish participants had lived, on average, 14.5 years in France, the results of their study showed no significant correlation between lexical ability and the length of residence, although several L2 subjects had reached a score similar to that of the NS subjects. Similar results are also found in more recent research. The results of a meta-analysis on the processing advantage of multiword expressions by Yi and Zhong (2023) also supported the centrality of formulaic language. Both L2 and L1 speakers derived considerable benefit from the processability of such language. The authors stated that in certain tasks, including natural reading, “L2 speakers may also engage in implicit processing of MWSs” (p. 19), although the advantage was slightly greater for L1 speakers. Nevertheless, less proficient L2 learners usually process formulaic language on a word-by-word basis.
A significant benefit of formulaic language is that it can increase the amount of linguistic information that a speaker can deal with. Yang (2015) described how advanced Chinese L1 speakers of English were able to recall more words than mid-proficiency participants in a reading span task. Nevertheless, L2 speakers do not enjoy as strong an implicit lexical ability as L1 speakers (e.g., Conklin and Schmitt 2012).
Research by Sonbul (2015) showed a significant difference in the speed of processing of higher and lower-frequency collocations both in L1 and high-proficiency L2 speakers. Schmitt et al. (2004) compared the ability of L1 speakers, L2 post-graduate students and overseas scholars (with very advanced English proficiency). The test involved the participants listening to stretches of dictations which were beyond regular short-term memory span. Even some of the First-Language English (FLE) participants were unable to express the content of the texts with all the formulaic phrases intact. However, their ability was beyond that of the Second-Language English (SLE) participants. The latter participants recreated the texts by focusing on key content words, which was probably due to their lack of knowledge of recurrent clusters that had been derived from a corpus for the study.
Finally, and of particular relevance to the focus of this study, research by Kisser et al. (2012) examined the performance of FLE and advanced SLE participants on various neuropsychological tests. All were university students in America. Their results showed that while the results of non-language dependent tasks were at a similar level, there was a small but statistically significant gap between the two groups on tests involving language processing.
Taken together, the role of MWEs, including phrases, lexical bundles and collocations, can be clearly understood as facilitative of language processing, assisting with the understanding and production of language. Nevertheless, in second-language acquisition, a persistent gap often remains between L1 speakers and even very advanced L2 language users in receptive and productive skill associated with MWEs.

2.2. The Phrasal Expressions List

Collocations are “notoriously difficult for learners” according to Paquot and Granger (2012, p. 130) and “pose considerable difficulties, even for the advanced learner” (Nesselhauf 2005, p. 2). Studies have shown that L2 learners are just as likely to produce incorrectly collocated bundles as they are to produce correct collocations because there is a tendency on the part of L2 users to focus on the key words of the collocations rather than the whole formulaic expression (Hoang and Boers 2016). Because of this, it can be challenging to decide what constitutes formulaic language. In order to create a useful test of implicit collocational skill, it is crucial to select phrases that have been carefully validated. To this end, the Phrasal Expressions List (PHRASE list), created by Martinez and Schmitt (2012), is a highly valuable resource. It contains 505 MWEs that are used in spoken, written and academic English. The phrases mainly consist of words from the most frequent 2000 words. At least one of the following criteria for the inclusion of phrases in the list had to be met: (a) whether they were Morpheme Equivalent Units, meaning a phrase that is processed as one unit with a single meaning; (b) whether the expressions were semantically transparent; (c) whether they were ‘deceptively transparent’ (p. 308). In particular, the authors considered that it would not be useful to include phrases whose meanings can be easily guessed from their surface form. Rather, more opaque phrases such as every so often (which might be misunderstood by learners as meaning often whereas it actually means sometimes) were included.
In the PHRASE list, the typical discourse genre in which each of the 505 phrases is used is shown. These are either Spoken general, Written general or Written academic. Table 1 shows two examples from the list: the most common phrase (appearing over 83 thousand times per 100 million words) is have to. It is commonly used in Spoken general discourse, as indicated by the three stars (***) in that column. The phrase as long as is used commonly in both spoken and written genres, as also shown by the three stars in those columns. However, neither of these expressions is used commonly in written academic texts, as indicated by the single star (*) in that column.
As described below, for this research, phrases with a three-star rating on the Spoken general category were selected for use in the creation of the short stories for the EI retelling task. Phrases from the Spoken category were considered to be the most appropriate because the stories would be given orally and also because these are the phrases the L2 learners would be more likely to acquire in their daily lives in New Zealand.

2.3. Testing of Implicit Knowledge of Second Languages

Elicited Imitation (EI) is a testing method in which participants listen to a stimulus and are then instructed to repeat it as accurately as possible (Kostromitina and Plonsky 2022). It has been used widely as a methodology for assessing the implicit ability of second-language learners and users, particularly in the area of grammatical accuracy. According to Solon et al. (2019, p. 1030), the demands placed on non-native speakers by EI are greater than those on native speakers. The authors stated that native speakers can rely on automatic processing during EI, whereas the burden on the working memory capacity of SLEs may be higher. Kostromitina and Plonsky (2022) reported in their meta-analysis that the longer the sentences were, the greater the cognitive load for participants became. As such, their reliance on working memory decreased, and they had to rely more on their linguistic knowledge. Further, Solon and Park (2024) pointed out that while elicited imitation has a solid research background, more validation is necessary to investigate its complex nature. Particularly concerning the focus of this study, more research is necessary to clarify whether EI is equally effective for learner populations beyond the usual focus on higher-proficiency participants. Concerning this, in research by Stålhammar et al. (2022), the ability of both native and non-native, but fluent, Swedish speakers was assessed using a neuropsychological test. It was found that the native speakers performed significantly better than the non-native participants on subtests with a verbal component. These individuals were from 34 different countries and so represented a wide range of language backgrounds. This is important as participants whose first language was further from Swedish scored lower (p. 832).
Research, such as that by Deygers (2020), has shown that native speakers are better able to achieve high scores on EI tests than non-native speakers, due to their solid command of the language that the EI test is given in. In particular, working memory span in humans is not long, and if the length of the test item exceeds this, testees have to rely more on using implicit linguistic knowledge to provide as accurate a response as possible. As Deygers reported (p. 934), “the rationale for the use of EI as a measure of language proficiency relies on the consensus within the field of neuropsychology that working memory cannot hold on to aural information for very long”. This is relevant to the present study as there is a clear overlap between the EI methodology and the LMS test used in the WMS.

3. Background to the Current Study

The author once needed to be clinically assessed using the Japanese version of the WMS. I was living in Japan at the time, and the assessment could only be conducted in Japanese, my third language, albeit one that I had been using for over 25 years.
As a non-native speaker of Japanese, I found that the subtest which caused the most consternation was the LMS, in which verbatim recall of several stories, one by one, is required. It became clear while trying to repeat the content of these stories that the clinician was discreetly marking the number of correctly recalled parts (“idea units” as termed below) of the story. The stories correspond to a length of about 50 words in English. Therefore, this can be considered a prolonged form of the EI test format.
As seen in the review of related literature, this retelling approach is very challenging for non-native speakers, even for those with advanced command of the language being used. Upon receiving an initial explanation of the test content from the clinician, I expressed concern that having to recall the content in my third language was unlikely to result in a reliable assessment of my memory ability. While this concern about the validity of the WMS assessment was acknowledged, there was no alternative but to submit to this assessment as this was a Japanese-medium clinical environment.
As such, the clinical practice of assessing non-native speakers in their second language for memory skill, even for those of high proficiency, raises concerns about the validity of this approach. Concerning this, the main goal of this paper was to conduct an investigation into how well FLEs perform on such an assessment compared with SLEs. This issue is particularly relevant in an era of large-scale immigration in countries across the globe, which is only likely to increase in the near future.
In particular, as the focus of this assessment was to investigate short-term memory ability during retelling, it is clear that the role of MWEs is a very pertinent variable. As such, the research questions are:
  • How do SLE participants of different proficiency levels score on retelling stories of increasing length, compared to FLE participants?
  • How do high-frequency phrases contribute to the ability to retell the four stories of increasing length?
  • What is the contribution of the number of phrases used in each story to the score, and how does this relate to the proficiency level of the participants?

4. Participants

The SLE participants in this research were 42 L1 speakers of Mandarin who were enrolled as students in a university in New Zealand. There were two groups. Group 1 was composed of 20 undergraduate students with either IELTS 5.5 or 6.0 proficiency (referred to as ‘modest’ to ‘competent’ ability in IELTS bands). Their mean length of residence in New Zealand was 0.63 years (SD 1.19). Group 2 was composed of 20 mainly post-graduate students with IELTS 6.5 and above (‘competent’ to ‘good’ in IELTS). Their mean length of residence was 1.55 years (SD 1.34). Group 3 was composed of 12 FLEs.

5. Methodology

Four short stories were created for the story-retelling task (Appendix A). They were all 52 or 53 words in length. To investigate the retelling ability of participants at different levels of proficiency, the length of the EI sections increased across the four stories. Finally, story 4 approximately matched the length of the story retelling text used in the LMS.
Further, to investigate the contribution of phrasal knowledge to the recall of the stories, six MWEs from the PHRASE list were included in each of the four stories. To build confidence, and hopefully not overwhelm the participants, the first two stories were told in short sections, five in story 1 and four in story 2. The first story (five sections) is shown below. The pauses between the EI sections are marked by //, and the six phrases are highlighted in grey.
Story 1
Last year, John started university. // His mother said he should make meals on his own. // To make sure he did this, she taught him some recipes. // He managed to continue for a few weeks, but he soon gave this up. // In the end, he only ate high-calorie dishes in the campus cafeteria.
(52 words)
Story 3 (Medium) was told in two sections of 27 and 25 words. Finally, story 4 (53 words in length) was told with no pause. It is this fourth story which is closest to the conditions under which the story recall task in the LMS is conducted.
The six phrases in each story were selected from the ‘Spoken general’ column (each with three stars) in the PHRASE list. The average frequency of the phrases is 4939 per 100 million words. In addition, the four EI stories were written in simple English, with 88% of the words in the story coming from the first 1000-word band of the COCA list. The aim of this design was to strengthen the focus on the ability to retell the content and to assess the ability of the participants to productively recall the six phrases in each story.
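The lexical coverage figure mentioned above (the share of running words that fall within a frequency band) can be illustrated with a minimal Python sketch. The tokenizer, the function names and the tiny `toy_band` set are assumptions made for illustration; a real check would load the actual first 1000-word band of the frequency list, which is not reproduced here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Hyphenated forms such as "high-calorie" are kept as one token.
    """
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def band_coverage(text, band_words):
    """Return the share of running words (tokens) found in band_words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in band_words)
    return hits / len(tokens)


# Hypothetical stand-in for a frequency band, NOT the real word list.
toy_band = {"last", "year", "started", "his", "mother", "said", "he"}
sentence = "Last year, John started university."

# 3 of the 5 tokens are in the toy band.
print(round(band_coverage(sentence, toy_band), 2))
```

With the real band list in place of `toy_band`, the same function could be applied to each full story text to reproduce a coverage percentage of the kind reported.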
The texts used in the four EI retelling tests for this research were written based on the general characteristics of the content of the Logical Memory Subtest. This is part of the WMS and is used in the assessment of memory disorders and cognitive impairment. Participants are asked to listen to short stories and then recall the content in as much detail as possible, both immediately and after a delay. The purpose is to assess attention, comprehension and working memory. The clinician briefly explains the procedure and simply asks that as much of the content as possible be repeated. Having taken the same LMS several times, I was able to construct similar texts with specific characteristics, described above, that were necessary for this research. It is important to state that these texts are completely new and do not infringe upon the copyright of the original tests.
The tests were conducted one-to-one in a quiet room. Participants’ responses to the EI tests were recorded. The participants were fully informed about the nature and purpose of the research. They all signed a consent form and were paid the equivalent of six US dollars for their time. Victoria University of Wellington, where the testing was conducted, gave the research procedure ethical approval. Short story 1 was read first and the participants’ recall was recorded. After breaks of about 20 seconds, stories 2, 3 and 4 were read. Participants’ recorded responses were subsequently checked carefully.
The test answers were assessed in the following way: the total number of “idea units” that could be repeated by the participants from the four texts was counted. Here, “idea units” refers to the discrete ideas contained in each of the texts to be used for the retelling task. The total maximum score for the four stories was 100 (Appendix B). In addition, the number of MWEs from the PHRASE list that were accurately recalled during the retelling was counted, with a maximum score of 24 (six in each story).
In summary, the retelling test developed for this study consists of four stories which were read out in sections of increasing length until the final uninterrupted whole story of 53 words was read. The participants were assessed on two measures: their immediate recall of the 100 idea units, and their accurate use of the 24 phrases while retelling the four stories.
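The second measure, the count of target phrases reproduced intact in a retelling, can be sketched as a simple matching routine. This is only an illustration under stated assumptions: the study scored recordings by careful manual checking, the function names are hypothetical, and the `targets` list is a placeholder, since this text does not identify which six expressions were highlighted in each story. Idea units require human judgment and are not modeled here.

```python
import re


def normalize(text):
    """Lowercase the text and reduce it to space-separated word tokens."""
    return " ".join(re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower()))


def count_recalled_phrases(retelling, target_phrases):
    """Count the target phrases that appear intact in the retelling.

    Padding with spaces ensures that only whole-word matches count.
    """
    padded = f" {normalize(retelling)} "
    return sum(
        1 for phrase in target_phrases
        if f" {normalize(phrase)} " in padded
    )


# Placeholder targets for illustration only (not confirmed by the source).
targets = ["on his own", "make sure", "managed to", "a few", "in the end"]
retelling = (
    "His mother said he should cook on his own, "
    "but in the end he ate at the cafeteria."
)

# Two of the placeholder targets occur intact: "on his own", "in the end".
print(count_recalled_phrases(retelling, targets))
```

Exact string matching of this kind would miss discontinuous uses of an expression (for example, a phrasal verb split by its object), which is one reason a human rater remains necessary.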

6. Analysis and Results

Concerning the first research question (How do SLE participants of different proficiency score on retelling stories of increasing length, compared to FLE participants?), the results for the three groups (moderate, competent and FLE) on the three story lengths are shown in Figure 1. It can be seen that the scores for the moderate group fell most sharply after the short story, whereas the competent group showed a less pronounced decline. The scores for the competent and FLE groups showed a complex pattern. On the medium-length text (Story 3), the competent group scored higher than the FLE group. This did not occur for the long story 4. A more detailed analysis on this is provided below.
Multivariate Analysis of Variance (MANOVA) was conducted to examine the effects of proficiency levels and story length on students’ scores. There was a significant main effect for proficiency level, F(2, 235) = 120.681, p < 0.001, and a significant main effect for story length, F(2, 235) = 323.824, p < 0.001, on the total scores. However, the interaction between proficiency level and story length was not statistically significant, F(4, 235) = 1.968, p = 0.100. This indicates that while both proficiency level and story length independently affected students’ scores, the effect of story length on scores did not significantly differ across proficiency levels. That is, the change in total scores with increasing proficiency was consistent regardless of whether the story was short, medium or long.
Post hoc comparison tests were conducted with a Bonferroni correction to compare the mean differences in total scores across proficiency levels for each story length. The post hoc tests revealed that total scores significantly improved for all story lengths as proficiency levels increased: IELTS 5.5 and 6, IELTS 6.5 and above, and the FLE level. A significant improvement in total scores was observed as the proficiency level increased from 6.5 and above to FLE level for short story lengths, but no such significant improvement was recorded for the medium and long story lengths (Table 2).
Concerning the second research question (How do high-frequency phrases contribute to the ability to retell the four stories of increasing length?), a post hoc comparison test with a Bonferroni correction was conducted to examine whether the total number of phrases that participants used significantly increased as the proficiency level increased. The results showed that the number of phrases used increased significantly at the 0.01 level for all story lengths between the two IELTS groups (5.5 and 6 and 6.5 and above), and the IELTS 5.5 and 6 group and the FLE group. However, there was no significant increase in the number of phrases used by students between the 6.5 and above, and FLE proficiencies across all story lengths (see Table 3).
These results indicate that as proficiency level increased, students tended to use a greater number of phrases. This trend was consistent across all story lengths when comparing the IELTS groups 5.5 and 6 and 6.5 and above, and the IELTS 5.5 and 6 group and the FLE group. The fact that the IELTS competent group actually repeated more phrases in the long story than the FLE group (3.14 versus 2.80) suggests that while the FLE group may have been able to recall the events of the story better than the IELTS competent participants, this may have been due to the need to allocate fewer cognitive resources to process the linguistic features of the story. As a result, they may have been better able to recall the story correctly without the use of MWEs.
Concerning the third research question (What is the contribution of the number of phrases used in each story to the score, and how does this relate to the proficiency level of the participants?), in order to analyze the contribution of the number of phrases used in each story to the total idea unit score, a Linear Regression analysis was conducted utilizing Hayes’ Process v3.5. The aim of this analysis was to investigate the relationship between the number of phrases used in each story and the total idea unit score attained by the participants and to examine whether the proficiency level moderated this relationship.
For the first short story, the number of phrases explained 89.9% of the variability in scores (R² = 0.899, F(1,59) = 522.50, p < 0.001). When both the number of phrases and proficiency level were included as predictors, they explained 93.6% of the variability (R² = 0.936, F(3,57) = 279.75, p < 0.001). Similarly, for the second short story, the number of phrases explained 60.5% of the variability in scores (R² = 0.605, F(1,59) = 90.448, p < 0.001), and the combined model with both predictors explained 72.3% of the variability (R² = 0.723, F(3,57) = 49.557, p < 0.001). However, the interaction term between the number of phrases and the proficiency level was not significant for either of these stories, indicating that proficiency level did not significantly moderate the contribution of the number of phrases to the final score that participants received for retelling the four stories.
For the medium story, the regression analysis showed that the number of phrases accounted for 59.3% of the variability in scores (R² = 0.593, F(1,59) = 86.06, p < 0.001). Including proficiency level as a predictor increased the explained variability to 78.2% (R² = 0.782, F(5,55) = 39.520, p < 0.001). The interaction term was significant (∆R² = 0.0877, ∆F(2,55) = 11.073, p = 0.0001), indicating that proficiency level significantly moderated the contribution of the number of phrases to the score for the medium-length story. For the long story, the regression analysis showed that the number of phrases explained 67.0% of the variability in scores (R² = 0.670, F(1,59) = 119.89, p < 0.001). The combined model with both predictors accounted for 79.4% of the variability (R² = 0.794, F(5,55) = 42.51, p < 0.001). Similar to the medium-length story, the interaction term for the long story was significant (∆R² = 0.0826, ∆F(2,55) = 11.05, p = 0.0001), showing that the level of proficiency significantly moderated the relationship between the number of phrases and the score for the long story.
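The F statistics reported above are linked to the R² values by standard regression identities, and a short Python sketch can show the arithmetic. The function names are hypothetical, and the sketch assumes that the interaction test is the usual R²-change (hierarchical) F test. Because the inputs are the rounded R² values printed in the text, the results match the reported statistics only approximately.

```python
def f_from_r2(r2, df_model, df_error):
    """Overall model F statistic implied by R-squared and its degrees of freedom."""
    return (r2 / df_model) / ((1.0 - r2) / df_error)


def f_change(delta_r2, r2_full, df_added, df_error):
    """F statistic for the R-squared gained by adding df_added terms."""
    return (delta_r2 / df_added) / ((1.0 - r2_full) / df_error)


# Second short story, phrases only: the text reports F(1,59) = 90.448.
print(f_from_r2(0.605, 1, 59))

# Medium story, interaction terms: the text reports delta-F(2,55) = 11.073.
print(f_change(0.0877, 0.782, 2, 55))

# Long story, interaction terms: the text reports delta-F(2,55) = 11.05.
print(f_change(0.0826, 0.794, 2, 55))
```

The small gaps between these values and the reported ones are what rounding R² to three decimal places would be expected to produce.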
Overall, the regression analyses indicated that the number of phrases used in the retelling of the stories significantly contributed to the total scores across all story lengths (see Figure 2). The moderation effect of proficiency level was not significant for the short stories but was significant for the medium and long stories. This suggests that higher proficiency learners were better able to use phrases effectively in more complex storytelling contexts, leading to higher scores. The number of recalled phrases was lowest for the long story, indicating that even the highest-scoring participants did not repeat all phrases verbatim. Conversely, on the short stories, accurate recall of the phrases was highest, almost certainly due to the lower burden on working memory.

7. Discussion

Irrespective of the length of the four stories in this experiment, scores on the recall tests generally increased with proficiency across the three groups in this research. The FLE participants scored higher and recalled more of the phrasal expressions used in the elicited texts. Concerning the SLE participants, the score on idea units of the moderate group on long story 4 was significantly lower than that of the higher-proficiency competent group (4.00 vs. 10.91). This clearly shows that in a neuropsychological test such as the Logical Memory Test, there can be very significant gaps between what non-native speakers of varying second-language ability may be capable of. This is despite the fact that all the participants in this research were capable enough in English to be studying in an overseas university. This is a crucial finding that should be considered carefully in clinical assessment circles. Conversely, unexpected results also appeared. In particular, the gap between the higher-proficiency competent group and the FLE group on the long recall test (shown in Figure 1) was only 0.69 points (10.91 versus 11.60), which is notably small.
In sum, unless an individual has very well-developed ability in a second language, including highly developed receptive and productive collocational skills, assessment by clinical tests such as the WMS needs to be examined more carefully to ensure that the complex interaction between explicit knowledge and implicit skill can be reliably interpreted.
Concerning the gap in proficiency between the two SLE groups in this study, their average length of residence in New Zealand had not been long. The modest proficiency group had stayed 0.63 years, while the competent proficiency group had stayed 1.54 years. This signifies that while medium-term residence in a new country may sometimes indicate advanced proficiency, the reality may be much more complex. Similarly, some long-term immigrants never achieve such a high level of proficiency in the language of the country they live in despite living there for many years. In this way, clinicians should not make assumptions of memory skill based merely on length of residence.
Concerning the methodology employed in this study, the use of elicited imitation (listening to and then recalling four stories: two told in short sections, one told in two halves and one told without pauses) is supported by previous similar research. Although their design was somewhat different from that of the present study, Janssen and Barber (2012) had their Spanish first-language participants recall various kinds of n-grams, and it was found that the speed of production was faster for the higher-frequency phrases. This indicates a processing advantage for more common phrases over less frequent ones. The importance of high-frequency MWEs, such as those in the PHRASE list, is clearly re-emphasized by such research. Siyanova-Chanturia and Janssen (2018) stated that more research is needed to ascertain whether this finding is typical. The present study has tried to address this issue, and the results align with their findings.
Further, this study has also provided grounds for believing that the use of the Logical Memory Subtest (LMS) from the Wechsler Memory Scale for the assessment of non-native speakers may have some validity. This was shown by the somewhat unexpected ability of the competent group in this test to score higher than the FLE group on the medium-length story. This is impressive since the participants were speakers of Mandarin Chinese, which is a distinct language, grammatically and lexically, from English. However, the use of the LMS for neuropsychological assessment undoubtedly demands careful consideration of language proficiency. Within clinical assessment practice, such tests should be adapted to match the language and cultural background of the testees. In this way, test makers and clinicians should be aware of the potential biases present in these testing tools, and greater attention should be paid to how to interpret the results of these tests to provide a more accurate and reliable evaluation of memory function.
Research reviewed in this paper (e.g., Stålhammar et al. 2022) indicated a persistent gap between native and non-native speakers on neuropsychological tests with a verbal component. In that research, speakers of languages further from the target language, Swedish, scored lower. The results of this study do not align with their findings in that Chinese L1 speakers started to come close to the “idea units” scores of the FLE participants despite the gap between these languages. Similarly, the research reported above by Lundell and Lindqvist (2013) showed that, although their participants had lived in France for many years, there was no correlation between length of residence and lexical ability. The results of this study do not refute this, but although the SLE participants had lived only around 1.5 years in New Zealand, it was members of this group who started to approach the idea unit scores of the FLE participants, and actually surpassed them, as seen in Table 2, with their answers to the medium Story 3 (17.96 vs. 16.55).
An important limitation of this study is that, as an exploratory task design, the EI stories were written with intentionally simple English, both in the general text and the choice of MWEs. It would be very useful to conduct the experiment again with more advanced texts. Further, if texts composed of less frequent vocabulary were used, the hypothesis that a wider divergence would emerge between the FLEs and even advanced non-native participants could be investigated.
In addition, the present study also has significance beyond the goal of neuropsychological assessment. While conducting the study, feedback from the Mandarin native-speaker participants was informally gathered. As one significant insight, concerning the need to recall heard information in an academic setting, one participant commented,
I meet the same problem when I take part in course (sic). I can’t remember clearly. I just remember some words and I don’t remember the whole sentence.
It is clearly highly challenging for many students involved in academic study in their second or other language to keep in memory essential details and concepts during both listening and reading. In this way, it is essential to assess, without any implications for their grades, the ability of second-language users to function well in foreign-language study environments. As the findings by Siyanova-Chanturia and Janssen (2018) showed, how FLEs and SLEs can use phrasal language needs much more investigation. Future research could explore this issue, possibly through an elicited imitation task similar to the one used in this study.
Concerning implications for second-language learning, the results of this study indicate the need to reconsider the rate at which lexical skills of SLEs are thought to develop. In the EI method used in this study, points are allocated for recalled idea units rather than verbatim repetition of the language used. As reviewed above, while Kostromitina and Plonsky (2022) stated that longer texts place a heavier burden on test takers, these do illustrate the implicit ability of second-language learners. Indeed, in the long story, the competent SLE group used more of the phrasal expressions than the FLE group. These results suggest that the SLE participants had developed a high awareness of common phrases in English, and this enabled a more accurate recall of them. However, this observation cannot be taken as evidence that the SLEs’ English proficiency approximated that of the FLEs, who more frequently re-expressed the text meanings in summarized form. Rather, it likely reflects a tendency among the more proficient SLE group to pay greater attention to the lexical forms of the texts. Indeed, research by Lei and Yan (2022) showed this to be the case. They investigated the kinds of strategies that learners of Chinese and L1 speakers of Chinese used on EI tasks. Through interviews, it was found that higher-proficiency learners used far more strategies than both native speakers and lower-proficiency participants. They used cognitive strategies such as paying attention to the meaning of words, phrases or syntactic patterns in the texts, or the use of translation. However, the degree of variation in the EI scores was accounted for more by factors such as the length of the items on the test than by the use of strategies. Nevertheless, the authors stated that greater use of strategies would lead to higher scores on EI tests.
Prior to participation in this study, the SLE students benefited from living and studying in English-medium academic institutions (albeit only for up to a year and a half), and their scores indicated that their productive recall of common phrasal expressions improved as the length of the stories increased. Their proficiency was high, but in addition, investigating what forms of attention they paid to learning would be very useful.
There are important implications for language education from this study. The importance of MWEs needs to be more widely acknowledged by teachers and teacher trainers as these results show that higher proficiency coincides with the acquisition of common, useful phrases. While MWEs are essential in any language, the acquisition of such phrases is nevertheless slow and incremental. MWEs include various subtypes (Wolter 2020), some of which are less important at early stages of learning, and without good awareness of which to focus on, they tend to be underemphasized in second-language teaching. Also, the sheer number of such phrases without adequate recycling slows learning. Therefore, it is essential that language syllabi provide learners with exposure to MWEs, both to deepen learners’ knowledge and to build awareness of how to go on learning them strategically. Peters (2014) found that learning activities that provided repeated encounters with MWEs allowed learners to acquire more of them, although the acquisition of words was stronger than that of MWEs in her study.
As such, a suitable approach to syllabus design is a combination of reading activities that promote incidental learning along with explicit instruction. A well-regarded approach is Nation’s Four Strands approach (e.g., Nation and Yamamoto 2012), in which learning activities can be characterized as meaning-focused input, meaning-focused output, language-focused learning, fluency development, or a combination of these four. This means that just learning through one method is inadequate to help learners acquire words. Nation and Yamamoto stress that how learning occurs is more important than how teaching is done. As such, intentional learning of words should not use more than 25% of the time available for learning collocations, in this case. Rather, reading, listening and writing activities that support the development of lexical knowledge should be prioritized. In this way, Extensive Reading is a key strand in meaning-focused input.
Other innovative teaching methods are also possible. Although the six phrases used in each of the four EI stories in this study are high frequency, the entrenchment in memory of such MWEs may remain shallow until learners become highly competent. To address this pedagogically, teachers can employ classroom activities such as dictogloss, in which input through listening, output through writing and group discussion, and feedback from the teacher are all available. These fulfill at least three of the four strands. One such example of a pedagogic innovation is the modified dictogloss activity (Lindstromberg et al. 2016), whereby learners are given a list showing the MWE phrases to appear in the text before it is read out by the teacher, and the learners recreate the text. In doing so, the authors showed that learners were better able to use these phrases than a control group in posttests. Based on this approach, teachers could create dictogloss texts containing the most frequent MWEs in the PHRASE list, like those used in this study, which were shown to become productively available to learners in the EI task. This could be subsequently tested with longer forms of elicited imitation than usually employed to investigate whether such important, useful phrases become productively available to learners earlier than usual.

8. Conclusions

Finally, while collocations and phrasal knowledge are central to language ability and second-language learning, this research has shown that the ability to recall heard phrasal expressions (even the comparatively common ones used in this research) is not assured. Clearly, acquisition of a second language and its lexical richness, among other features, requires a very large amount of exposure to the target language, as Conklin and Schmitt (2012) pointed out. The learning practices best suited to develop rich knowledge of a second language need greater emphasis in both schools and teacher training courses. In addition to these, there is also the need for a wider understanding of how second-language acquisition functions, and in this research, how second-language ability can influence potential outcomes in clinical neuropsychology assessments was investigated. It is hoped that this research will lead to further investigations in this area.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by Victoria University of Wellington Standing Committee of the Human Ethics Committee (approval Code 0000028128, approved 26 November 2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

The four stories used for the Elicited Imitation test are shown below. The six phrases in each story are highlighted in grey.
Story (1) Short Told in five short sections (// shows the sections)
Last year, John started university. // His mother said he should make meals on his own. // To make sure he did this, she taught him some recipes. // He managed to continue for a few weeks, but he soon gave this up. // In the end, he only ate high-calorie dishes in the campus cafeteria.
(52 words; Total score 25)
Story (2) Short Told in four short sections (// shows the sections)
There used to be a cake shop in my town. // The lady in charge always came up with new cakes. // No matter the weather, I went there to buy the new cake every weekend. // I kept on doing this until I left home, and I’ll remember it as long as I live.
(52 words; Total score 26)
Story (3) Medium Told in two halves
British people are fond of keeping pets. They get a sense of well-being from taking care of them. They say that pets don’t take them for granted. // Dogs especially can tell when their owners are sad. So, it’s no wonder that some people even sleep in the same bed with their pets.
(52 words; Total score 22)
Story (4) Long Told with no pauses
The Jones family went camping. On the way, their car ran out of gasoline. Instead of going where they planned, they camped in a small forest. They ended up having a BBQ and the children danced in front of the fire. It was a beautiful place, and it turned into a wonderful vacation.
(53 words; Total score 24)

Appendix B

Idea units in the four stories, listed in story order and separated by slashes.
Story 1
last year / John / started / university / his mother / said / he should / make meals / on his own / to make sure / he did this / she taught him / some recipes / he managed to / continue / for a few weeks / he soon / gave this up / in the end / he ate / only / high-calorie / dishes / campus / in the cafeteria
       / 25
Story 2
there ~ be / used to / a cake shop / in my town / the lady / in charge / always / came up with / new / cakes / no matter / the weather / I went / there / to buy / the new / cake / every / weekend / I kept on doing this / until / I left home / I’ll / remember it / as long as / I live
       / 26
Story 3
British people / are fond of / keeping / pets / they get / a sense of / well-being / taking care of / them / they say that / pets / don’t take for granted / them / dogs / especially / can tell / when their owners / are sad / it’s no wonder / some people / even / sleep / in the same bed / with their / pets
       / 25
Story 4
Jones / family / went camping / on the way / their car / ran out of / gasoline / instead of / going / where they planned / they camped / in a ~ forest / small / they ended up / having a BBQ / the children / danced / in front of / the fire / beautiful / place / turned into / wonderful / vacation
       / 24

References

  1. Baddeley, Andrew D. 2015. Working memory in second language learning. In Working Memory in Second Language Acquisition and Processing. Edited by Zhisheng Wen, Mailce Borges Mota and Arthur McNeill. Bristol: Multilingual Matters, pp. 17–28.
  2. Carrol, Gareth, and Kathy Conklin. 2020. Is all formulaic language created equal? Unpacking the processing advantage for different types of formulaic sequences. Language and Speech 63: 95–122.
  3. Conklin, Kathy, and Norbert Schmitt. 2012. The Processing of Formulaic Language. Annual Review of Applied Linguistics 32: 45–61.
  4. Conklin, Kathy, and Rüdiger Thul. 2023. Word and multiword processing. In The Routledge Handbook of Second Language Acquisition and Psycholinguistics. Edited by Aline Godfroid and Holger Hopp. London: Routledge, pp. 203–15.
  5. Deygers, Bart. 2020. Elicited imitation: A test for all learners? Examining the EI performance of learners with diverging educational backgrounds. Studies in Second Language Acquisition 42: 933–57.
  6. Gallagher, Karen, and Tamiko Azuma. 2018. Analysis of story recall in military veterans with and without mild traumatic brain injury: Preliminary results. American Journal of Speech-Language Pathology 27: 485–94.
  7. Haarman, Henk. 2013. Working Memory. 2012. In Routledge Encyclopedia of Second Language Acquisition. Edited by Peter Robinson. London: Routledge, pp. 694–99.
  8. Hoang, Hien, and Frank Boers. 2016. Re-telling a story in a second language: How well do adult learners mine an input text for multiword expressions? Studies in Second Language Learning and Teaching 6: 513–35.
  9. Janssen, Niels, and Horacio A. Barber. 2012. Phrase frequency effects in language production. PLoS ONE 7: e33202.
  10. Kisser, Jason E., Carrington R. Wendell, Robert J. Spencer, and Shari R. Waldstein. 2012. Neuropsychological performance of native versus non-native English speakers. Archives of Clinical Neuropsychology 27: 749–55.
  11. Kormos, Judit, and Anna Sáfár. 2008. Phonological short-term memory, working memory and foreign language performance in intensive language learning. Bilingualism: Language and Cognition 11: 261–71.
  12. Kostromitina, Maria, and Luke Plonsky. 2022. Elicited imitation tasks as a measure of L2 proficiency: A meta-analysis. Studies in Second Language Acquisition 44: 886–911.
  13. Lei, Yuyun, and Xun Yan. 2022. An Exploratory Study of Strategy Use on Elicited Imitation Tasks. Frontiers in Psychology 13: 917168.
  14. Lindstromberg, Seth, June Eyckmans, and Rachel Connabeer. 2016. A modified dictogloss for helping learners remember L2 academic English formulaic sequences for use in later writing. English for Specific Purposes 41: 12–21.
  15. Lundell, Fanny Forsberg, and Christina Lindqvist. 2013. Lexical aspects of very advanced L2 French. The Canadian Modern Language Review 70: 28–49.
  16. Martinez, Ron, and Norbert Schmitt. 2012. A phrasal expressions list. Applied Linguistics 33: 299–320.
  17. McManus, Kevin, and Yingying Liu. 2020. Using elicited imitation to measure global oral proficiency in SLA research. A close replication study. Language Teaching 55: 116–35.
  18. Nation, I. S. Paul, and Azusa Yamamoto. 2012. Applying the four strands to language learning. International Journal of Innovation in English Language Teaching and Research 1: 167–81.
  19. Nesselhauf, N. 2005. Collocations in a Learner Corpus. Amsterdam: John Benjamins Publishing Company.
  20. Paquot, Magali, and Sylviane Granger. 2012. Formulaic language in learner corpora. Annual Review of Applied Linguistics 32: 130–49.
  21. Peters, Elke. 2014. The effects of repetition and time of posttest administration on EFL learners’ form recall of single words and collocations. Language Teaching Research 18: 75–94.
  22. Qian, David D., and Linda H. F. Lin. 2020. The relationship between vocabulary knowledge and language proficiency. In The Routledge Handbook of Vocabulary Studies. Edited by S. Webb. London: Routledge, pp. 66–80.
  23. Schmitt, Norbert, Sarah Grandage, and Svenja Adolphs. 2004. Are corpus-derived recurrent clusters psycholinguistically valid? In Formulaic Sequences. Edited by Norbert Schmitt. Amsterdam: John Benjamins, pp. 127–51.
  24. Siyanova-Chanturia, Anna, and Niels Janssen. 2018. Production of familiar phrases: Frequency effects in native speakers and second language learners. Journal of Experimental Psychology: Learning, Memory, and Cognition 44: 2009–18.
  25. Solon, Megan, and Hae In Park. 2024. Elicited imitation in second language acquisition research: New insights to advance methodological rigor (Introduction to the special issue). Research Methods in Applied Linguistics 3: 1–4.
  26. Solon, Megan, Hae In Park, Carly Henderson, and Marzieh Dehghan-Chaleshtori. 2019. Revisiting the Spanish elicited imitation task: A tool for assessing advanced language learners? Studies in Second Language Acquisition 41: 1027–53.
  27. Sonbul, Suhad. 2015. Fatal mistake, awful mistake, or extreme mistake? Frequency effects on off-line/on-line collocational processing. Bilingualism: Language and Cognition 18: 419–37.
  28. Stålhammar, Jacob, Per Hellström, Carl Eckerström, and Anders Wallin. 2022. Neuropsychological test performance among native and non-native Swedes: Second language effects. Archives of Clinical Neuropsychology 37: 826–38.
  29. Wolter, Brent. 2020. Key Issues in teaching multiword items. In The Routledge Handbook of Vocabulary Studies. Edited by Stuart Webb. London: Routledge, pp. 493–510.
  30. Yang, Pi-Lan. 2015. Interaction of working memory capacity in foreign language proficiency. Concentric: Studies in Linguistics 41: 95–115.
  31. Yi, Wei, and Yanlu Zhong. 2023. The processing advantage of multiword sequences: A meta-analysis. Studies in Second Language Acquisition 46: 1–26.
Figure 1. Scores on idea units from retelling at different story lengths.
Figure 2. Relationship between the number of phrases and scores for different story lengths.
Table 1. Examples from Martinez and Schmitt’s PHRASE List.
| Phrase | Frequency (per 100 Million) | Spoken General | Written General | Written Academic | Example |
| --- | --- | --- | --- | --- | --- |
| HAVE TO | 83902 | *** | ** | * | I exercise because I have to. |
| AS LONG AS | 5084 | *** | *** | * | It makes no difference as long as it’s done. |
Table 2. Total score across different proficiency levels.
| Story Length | Mean Total Score (SE): IELTS Modest | Mean Total Score (SE): IELTS Competent | Mean Total Score (SE): FLE | Change in Total Score: 5.5~6 to 6.5 and Above | Change in Total Score: 6.5 and Above to FLE | Change in Total Score: 5.5~6 to FLE |
| --- | --- | --- | --- | --- | --- | --- |
| Short | 16.55 (0.53) | 23.18 (0.49) | 25.23 (0.52) | 6.629 ** | 2.043 * | 8.672 ** |
| Medium | 9.89 (0.75) | 17.96 (0.70) | 16.55 (0.73) | 8.060 ** | −1.405 | 6.655 ** |
| Long | 4.00 (0.75) | 10.91 (0.70) | 11.60 (0.73) | 6.909 ** | 0.691 | 7.600 ** |
Note: Numbers in parentheses are standard errors. * p < 0.05 (Bonferroni correction) and ** p < 0.001 (Bonferroni correction).
Table 3. Numbers of phrases used across different proficiency levels.
| Story Length | Mean Number of Phrases (SE): Modest | Mean Number of Phrases (SE): Competent | Mean Number of Phrases (SE): FLE | Change in Number of Phrases: Modest to Competent | Change in Number of Phrases: Competent to FLE | Change in Number of Phrases: Modest to FLE |
| --- | --- | --- | --- | --- | --- | --- |
| Short | 4.26 (0.18) | 5.64 (0.16) | 6.00 (0.17) | 1.373 ** | 0.364 | 1.737 ** |
| Medium | 1.58 (0.25) | 3.41 (0.23) | 3.65 (0.24) | 1.830 ** | 0.241 | 2.071 ** |
| Long | 0.58 (0.25) | 3.14 (0.23) | 2.80 (0.24) | 2.557 ** | −0.336 | 2.221 ** |
Note: Numbers in parentheses are standard errors. * p < 0.05 (Bonferroni correction) and ** p < 0.001 (Bonferroni correction).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Coulson, D.G. Retelling of Stories with Common Phrasal Expressions by High-Proficiency Learners: Implications for Learning and High-Stakes Testing. Languages 2024, 9, 337. https://doi.org/10.3390/languages9110337
