Comparing Different Methods That Measure Bilingual Children’s Language Environment: A Closer Look at Audio Recordings and Questionnaires
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for submitting this very interesting paper which I consider to be worthy of publication, after some minor amendments. The paper is well written and covers an area of international interest, adding something to the field in this area. My comments on the small things that I feel need to be tweaked slightly prior to publication are as follows:
The introduction needs to discuss study in the past tense – by the time this is published the study has been done so needs discussing throughout the paper in past tense.
The literature review would benefit from some exploration of the term “vocabulary scores” – what do you mean by this term? Quantity of utterances? Quality of vocabulary? What does the literature say about ways to record and analyse this? It is noted that you begin to talk about this in the methods section but it is not until here that we understand what you mean by this. It needs addressing earlier in the paper to provide clarity.
How did you sample your families?
Ethics section would benefit from discussion around conflict between confidentiality and safeguarding. What would you have done if concerns for child welfare had arisen? How did you address this complexity with participants? Also the intrusion into a family's home requires sensitivity – how did you deal with power issues and reassure participants of the “do no harm” principles?
Some clarity required about the presentation of the data - Table 4 would benefit from a key to what M, SD, min and max refer to and in Table 5 the r and p would benefit from some explanation to make it really clear as to what you are referring to. Tables 7 and 8 talk about t and p and z and p respectively and it is not clear what these refer to and why they differ from one table to the next. The tables all therefore need some kind of key or further explanation.
These small corrections, I believe, would enhance your paper and once undertaken I feel that this will be of interest to many. Thank you.
Author Response
Reviewer 1:
Thank you for submitting this very interesting paper which I consider to be worthy of publication, after some minor amendments. The paper is well written and covers an area of international interest, adding something to the field in this area. My comments on the small things that I feel need to be tweaked slightly prior to publication are as follows:
RESPONSE: We would like to thank the reviewer for their comments and general enthusiasm towards the study. Below, we will address each of the individual comments.
The introduction needs to discuss study in the past tense – by the time this is published the study has been done so needs discussing throughout the paper in past tense.
RESPONSE: The past tense has been applied throughout the paper.
The literature review would benefit from some exploration of the term “vocabulary scores” – what do you mean by this term? Quantity of utterances? Quality of vocabulary? What does the literature say about ways to record and analyse this? It is noted that you begin to talk about this in the methods section but it is not until here that we understand what you mean by this. It needs addressing earlier in the paper to provide clarity.
RESPONSE: To provide clarity to the reader we have added a brief introduction to the term vocabulary scores to the introduction. Here we also mention that we distinguish between expressive and receptive vocabulary and give a short description of how both modalities are measured (55-60).
How did you sample your families?
RESPONSE: Because we wanted to keep our dataset as homogeneous as possible and measure the vocabulary scores in both languages of the bilingual child, we limited the second languages spoken to only two languages. We chose Polish and Turkish because these communities are well represented in the Netherlands, making it feasible to reach a larger sample size.
Additionally, participants in this study are part of a larger overarching project that investigates language mixing behavior in both parents and children. As Turkish and Polish are both typologically different from Dutch and from each other, they pose an interesting group to investigate language mixing behavior. Furthermore, vocabulary tasks and research assistants were available in both Polish and Turkish to help gather the data. A brief explanation of these two points has been added to the text (204-208).
We sampled children aged three to five years old because, within this age range, bilingual children often experience a shift in their language input. Children in the Netherlands start school at the age of four. As a consequence, their input in the majority language increases drastically. Future research within the overarching project might examine whether children who already attend school differ from those who do not.
Families were recruited via schools, (local) events, online calls on social media platforms and personal networks.
Ethics section would benefit from discussion around conflict between confidentiality and safeguarding. What would you have done if concerns for child welfare had arisen? How did you address this complexity with participants? Also the intrusion into a family's home requires sensitivity – how did you deal with power issues and reassure participants of the “do no harm” principles?
RESPONSE: To safeguard the privacy of the participants, only 30-second fragments were listened to. Consequently, the researcher was unable to gain an understanding of the full context of the conversations. It was made clear to the participants that the focus of the study was on the languages being spoken, rather than on the content of the conversations. Furthermore, parents were given the opportunity to have parts of the audio removed before anyone listened to it. One family made use of this option and had two hours of audio deleted. This has been clarified in the Procedures (section 2.2, 402-408).
Luckily, no situation of concern was encountered in our data. However, we do have the following policy stated in our Data Protection Impact Assessment regarding the possible concerns for a child’s welfare:
- It is possible that methods of research may reveal information that suggests violence or (child) abuse. In those cases, we will determine the urgency within the core group of researchers. If deemed necessary the researchers will call with a confidential doctor of “Veilig Thuis” (“Safe Home”), which is an organization in the Netherlands that focuses on addressing and preventing domestic violence and child abuse, for advice about the next steps, without sharing any information about the identity of the participant.
- It may be necessary for the lawfulness of the processing of such accidental findings to find a supplementary legal basis. Depending on the situation this might be consent, or vital interest of the abused person or otherwise. The research team will act to the best of its conviction.
Some clarity required about the presentation of the data - Table 4 would benefit from a key to what M, SD, min and max refer to and in Table 5 the r and p would benefit from some explanation to make it really clear as to what you are referring to. Tables 7 and 8 talk about t and p and z and p respectively and it is not clear what these refer to and why they differ from one table to the next. The tables all therefore need some kind of key or further explanation.
RESPONSE: Clarifications of abbreviations have been added to the table notes. Table 7 reports z values, because the comparison is between two independent groups, which uses Pearson & Filon’s z-test. Table 8 reports t values, because the comparison is between two dependent groups, which uses Williams’s t. We have added this information to the respective table notes.
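For readers unfamiliar with the t statistic mentioned in this response, the following is a minimal sketch of the textbook form of Williams's t for comparing two dependent correlations that share one variable (for example, two input measures each correlated with the same vocabulary score). Whether the authors used exactly this variant is an assumption; the function name `williams_t` and its parameter names are hypothetical and do not come from the manuscript.

```python
import math


def williams_t(r_xy, r_xz, r_yz, n):
    """Williams's t for H0: r_xy == r_xz, all measured on the same n cases.

    r_xy, r_xz: the two correlations being compared (they share variable x).
    r_yz: the correlation between the two non-shared variables.
    Returns (t, degrees_of_freedom). Illustrative sketch only.
    """
    # Determinant of the 3x3 correlation matrix of x, y and z.
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2
    numerator = (n - 1) * (1 + r_yz)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_yz) ** 3
    t = (r_xy - r_xz) * math.sqrt(numerator / denominator)
    return t, n - 3


# Hypothetical values, not taken from the study.
t_value, df = williams_t(r_xy=0.6, r_xz=0.4, r_yz=0.3, n=50)
print(t_value, df)
```

The statistic is referred to a t distribution with n - 3 degrees of freedom; a comparison between independent groups would instead call for a z-based test on the two correlations.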
These small corrections, I believe, would enhance your paper and once undertaken I feel that this will be of interest to many. Thank you.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for the opportunity to review this manuscript. The study examines three different methods of measuring language input in bilingual children and their potential relationship with vocabulary scores in both languages. Method 1 is a parent questionnaire (Q-Bex) in which parents report the time spent in each language, method 2 is a naturalistic recording method that records daily live interactions within the family's home, and method 3 is a combined method from methods 1 and 2, with some variations.
This is an important contribution to the study of bilingualism, specifically in a difficult area: measuring bilingual input.
However, some minor changes need to be made before accepting it for publication.
- The literature review covered in the introduction is not complete. Only results from previous studies are stated, but no details are provided. Providing more details on how these studies were conducted would give the reader a stronger background on the importance of the present study. I suggest that the author provide details on the most relevant studies in each section (Q-BEx and LENA).
- 105-107: Expand on this last sentence. At first, it sounds contradictory. Are you speculating that fewer parents would want to participate by increasing the number of days?
- 170-171: This hypothesis is not directly related to the research question, as you only stated that you would compare the correlations of the 3 methods with vocabulary scores. You did not mention majority vs minority scores.
- Also, you may need to reformulate your research question. It states PREDICTS, but you did not run any regression analysis to test for prediction. You ran correlations that show potential predictions, but this is not always the case.
- 179, participants: either say something about the final total number here or add the word FINAL to the last sentence. The reader needs to go to the table to verify the final N.
- How was language dominance determined?
- Table 3. What exactly does context mean in the combined method step 2?
- 380-381. Specify if the correlations stated here (1,2,3) correspond to the numbers in Figure 1 and what numbers correspond to what language.
- Page 13, there is no need to have footnote 1 as this information can be incorporated in the text.
Author Response
Thank you for the opportunity to review this manuscript. The study examines three different methods of measuring language input in bilingual children and their potential relationship with vocabulary scores in both languages. Method 1 is a parent questionnaire (Q-Bex) in which parents report the time spent in each language, method 2 is a naturalistic recording method that records daily live interactions within the family's home, and method 3 is a combined method from methods 1 and 2, with some variations.
This is an important contribution to the study of bilingualism, specifically in a difficult area: measuring bilingual input.
However, some minor changes need to be made before accepting it for publication.
RESPONSE: We want to thank the reviewer for their useful comments. Below, we will address the individual comments.
- The literature review covered in the introduction is not complete. Only results from previous studies are stated, but no details are provided. Providing more details on how these studies were conducted would give the reader a stronger background on the importance of the present study. I suggest that the author provide details on the most relevant studies in each section (Q-BEx and LENA).
RESPONSE: We are not sure which sections are being referred to, as the Q-BEx and LENA sections belong to the methods and not to the introduction. We interpreted this comment as referring to section 1.3, which discusses the results of studies that previously compared parental questionnaires to audio recordings. To provide the reader with a stronger background, we have added more details about the studies that have correlated estimates from parental reports with estimates observed with LENA (page 3). However, these do not include details about the Q-BEx specifically, as to date no studies have correlated estimates from the Q-BEx questionnaire with estimates from daylong audio recordings or vocabulary scores.
- 105-107: Expand on this last sentence. At first, it sounds contradictory. Are you speculating that fewer parents would want to participate by increasing the number of days?
RESPONSE: We agree with the reviewer and think the way it was previously phrased was indeed unclear. We have now stated it more directly: “it is conceivable that the increased effort and invasiveness for families can lead to smaller sample sizes (e.g. Casillas et al., 2020; Cychosz et al., 2021; Marchman et al., 2017).” (113-114).
- 170-171: This hypothesis is not directly related to the research question, as you only stated that you would compare the correlations of the 3 methods with vocabulary scores. You did not mention majority vs minority scores.
RESPONSE: We have added an additional research question to account for the hypotheses regarding the majority vs minority language input (180-181).
- Also, you may need to reformulate your research question. It states PREDICTS, but you did not run any regression analysis to test for prediction. You ran correlations that show potential predictions, but this is not always the case.
RESPONSE: We have changed the terms of prediction to correlation.
- 179, participants: either say something about the final total number here or add the word FINAL to the last sentence. The reader needs to go to the table to verify the final N.
RESPONSE: We have added an extra sentence to clarify the final sample size (217).
- How was language dominance determined?
RESPONSE: Dominance was determined based on the proportion of current overall language exposure according to the Q-BEx questionnaire. This explanation has been added to the text (219-220).
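As an illustration of a proportion-based grouping, the sketch below applies the cut-offs quoted in the Reviewer 3 summary later in this record (>60% Dutch, 40-60% Dutch, <40% Dutch) to a questionnaire-derived exposure proportion. The group labels, the function name `dominance_group`, and the treatment of values exactly at 40% or 60% are assumptions made for this example, not details confirmed by the authors.

```python
def dominance_group(prop_dutch):
    """Classify a child by proportion of current overall exposure to Dutch.

    prop_dutch: value between 0 and 1 (assumed to come from the questionnaire).
    Labels and boundary handling are hypothetical choices for illustration.
    """
    if not 0.0 <= prop_dutch <= 1.0:
        raise ValueError("prop_dutch must lie between 0 and 1")
    if prop_dutch > 0.60:
        return "Dutch-dominant"
    if prop_dutch < 0.40:
        return "heritage-dominant"
    # Values from 0.40 to 0.60 inclusive fall in the middle group.
    return "balanced"


print(dominance_group(0.72))
```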
- Table 3. What exactly does context mean in the combined method step 2?
RESPONSE: A context is a language environment with a unique combination of speakers. We argue in section 2.2.2.4 that the language environment could differ in these various contexts. For example, the language environment of a child alone with their mother might differ from the language environment when the child is with both their mother and sibling. We distinguish between nine possible contexts in which we expect the language environment could differ: mother; father; mother + sibling; father + sibling; mother + father; mother + father + sibling; sibling; school; community. We have now specified these nine contexts in the text (367-369).
- 380-381. Specify if the correlations stated here (1,2,3) correspond to the numbers in Figure 1 and what numbers correspond to what language.
RESPONSE: The numbers 1, 2 and 3 indeed correspond to the numbers in Figure 1. We have added that for clarity in the text (430-431). All numbers are available in the majority language (Dutch) and minority language (Polish/Turkish) separately.
- Page 13, there is no need to have footnote 1 as this information can be incorporated in the text.
RESPONSE: We have incorporated the footnote in the general text.
Reviewer 3 Report
Comments and Suggestions for Authors
Comments for author File: Comments.pdf
Author Response
The manuscript compares different methods that are used to measure the quantity of input in bilingual children’s environments. As the link between input frequency and lexical development in particular has been well established, a comparison of measures to determine input frequency is important as different measures are used by different studies.
The study specifically compares the use of three measures: parental questionnaires, audio recordings and a combination of both. These measures are the most commonly used when estimating input quantity in both languages of a bilingual. The instruments included here (Q-Bex and LENA) are recently developed and are validated tools.
The three measures are correlated with children’s receptive and expressive vocabulary scores. There is one research question which focuses on the best input measure for predicting children’s vocabulary scores. However, hypotheses are given at other points in the text (p. 2 states the hypothesis that the combined method would give the strongest correlations, p. 9 that estimates are better for minority than majority language). The research question, together with hypotheses relating to productive and receptive vocabulary in the majority vs minority language could have been outlined in the same place so the reader knows better what to expect.
RESPONSE: We have added an additional research question to account for our hypothesis regarding the majority vs minority language input (180-181), and we briefly introduce the corresponding hypothesis earlier in the paper (page 2, 67-70). There were no hypotheses relating to expressive and receptive vocabulary, as we conducted these analyses only on an exploratory basis, following the outcomes of the first research question. We have therefore not added this aspect of the paper to the introduction.
More information about the participants would have been useful. They were multilingual 3- to 5-year-olds living in the Netherlands who were exposed to Turkish and Dutch or Polish and Dutch. The children were grouped by dominance (>60% Dutch, 40-60% Dutch and <40% Dutch). It is not clear whether all children were exposed to both languages from birth and whether all children had exposure to both languages in the home.
RESPONSE: We agree that more information about the participants is of added value to the reader. We have added the following text to section 2.1 (220-226): “In most families (n = 31), children were first exposed to Dutch in the home environment. The other children were first exposed to Dutch at pre-school (n = 23), school (n = 3) or another location (n = 1). All children were first exposed to Turkish or Polish in their home environment. While most families used both Dutch and the heritage language in the home environment at time of testing (n = 42), there were some families that used only the heritage language at home (n = 10).” We hope that this provides sufficient in-depth information about the language situation of the participants.
Vocabulary was measured in both languages using the Cross-Linguistic Lexical Task (CLT) and different scores (out of 64) for receptive and productive vocabulary were calculated. While the Q-Bex questionnaire includes details of language exposure during a typical week, weekends as well as holidays, the LENA measure was based on 270 30-second segments of recordings for one day (weekend). Table 4 gives the weighted proportions for each language for each participant for the Q-Bex and LENA. The combined measure on the other hand includes the quantitative measure ‘adult word count’ combined with the hours of input in each language and is given as a figure. Could this be converted to the proportion of the total adult word count for both majority and minority language to enable a direct comparison with the other two measures?
RESPONSE: We understand that the values for the combined method were difficult to interpret the way they were presented in table 4. We agree that a converted proportion is insightful to allow comparison between the combined method and the LENA and Q-BEx. We have added a row in the table where we converted the combined method into proportions for majority and minority language based on the total adult word count. We also kept the original information of the combined method in the table since this was the variable with which the analyses were conducted. We have added an explanation for the addition of the proportion in the table note.
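The conversion described here amounts to dividing each language's adult word count by the total across both languages. A minimal sketch follows; the function name `to_proportions` and the numbers in the usage line are hypothetical and are not values from Table 4.

```python
def to_proportions(majority_words, minority_words):
    """Convert per-language adult word counts into proportions of the total.

    Returns (majority_proportion, minority_proportion), which sum to 1.
    Illustrative sketch; the inputs are assumed to be non-negative counts.
    """
    total = majority_words + minority_words
    if total == 0:
        raise ValueError("total adult word count is zero; proportions are undefined")
    return majority_words / total, minority_words / total


# Hypothetical counts for one child.
print(to_proportions(300, 100))
```

Expressed this way, the combined estimate sits on the same 0 to 1 scale as the LENA and Q-BEx proportions, while the raw counts remain the variable used in the analyses.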
Table 4 shows that the different measures give different estimates of input quantity for the majority and minority language. This is acknowledged by the authors and section 4.1 gives the explanation that the recordings ‘might not have been representative of the average input during an entire week’. Are there other possible reasons?
RESPONSE: Section 4.1 previously only addressed possible reasons why the combined method did not work better than LENA and Q-BEx (as this was our hypothesis). We have now also added text that explains the difference between the proportional estimates between the LENA and Q-BEx (539-544). The different estimates are likely due to the fact that the LENA recording only measures the language input in the home environment. Language input outside of the home environment will most likely consist of relatively more majority language input. The Q-BEx does include the language input outside of the home environment in its estimate and hence likely results in a larger proportion of majority language input compared to the LENA recording.
Table 5 shows the correlations between the different input measures and expressive and receptive vocabulary scores. These show stronger correlations for the minority language. This corresponds to the hypothesis but reasons for this result could be discussed.
RESPONSE: We believe these reasons are discussed in section 4.2. Here we mention that the correlations are stronger in the minority language because we expected parents to be better at estimating the language input in the minority language than in the majority language. This is due to the fact that children receive a relatively large amount of input in the majority language from sources outside of their home environment, which complicates tracking the input quantity (564-567).
The finding that correlations were stronger for expressive vs receptive vocabulary is explained in terms of the Weaker Links Hypothesis. While all three measures showed significant correlations between input quantity and lexical levels, the parental questionnaire on its own was more strongly correlated with vocabulary scores than the other measures. Possible reasons for this could be discussed more. Is it the fact that the questionnaire was more representative of the average input over a longer period of time or are there other reasons?
RESPONSE: If we understand correctly, you are referring to possible reasons why the parental questionnaire correlated more strongly with vocabulary scores than the other two methods. First, we think it is important to note that the parental questionnaire did not correlate significantly more strongly with vocabulary scores than the other two methods (despite numeric differences in the correlation coefficient). Second, we have added possible reasons for this difference in correlational strength in section 4.1 (as mentioned in the previous comment).