Compactness of Native Vowel Categories in Monolingual, Bilingual, and Multilingual Speakers: Is Category Compactness Affected by the Number of Languages Spoken?
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
It is a very well-written paper, on which I have only minor comments or suggestions.
In the introduction (lines 26-39), and perhaps in the abstract as well, one would expect to see references to papers on category density or at least a promise to deliver them in subsequent paragraphs. Otherwise, the overview of previous literature is very informative and to the point.
More information on the language history profiles of the participants would be helpful. The reader does not know whether the participants learnt the L2 or Ln in a classroom or an immersion setting, which might influence new vowel category formation, as formal learners are often exposed to accented input in the foreign language because a large portion of their interactions is with a non-native teacher and fellow students.
What was the amplitude contour of the synthesized stimuli?
Lines 352-354: I have doubts regarding your method of determining the number of phonemes in a given participant's shared vowel space. Certainly the acoustic realization of a vowel may vary from language to language, but many foreign-language vowels undergo equivalence classification, which leads to merged categories. Then, instead of having more vowel categories in the system, a learner actually has a category that is less compact. Would it be at all possible for you to try and find out how the vowels of the languages your participants spoke are usually assimilated to L1 Spanish vowels? I assume that would be possible for English, but I am not sure whether one can find perceptual assimilation data for other languages.
Line 356 et seq. -- Was it an online experiment without supervision by the experimenters?
I'd consider moving lines 332-339 to section 3.1.
Lines 401-406 -- What predictions do you have with regard to participant gender and age and their relationship to the effect of the size of the inventory and the overall compactness index that made you use these as the fixed effects?
Lines 411-412 -- I'd delete the comment about a "suggestive trend with personality factor" -- it simply isn't significant.
Lines 415-416 -- same as above.
Lines 462-471 -- you could also discuss how the methodology of the comparison can influence the results. The participants listened to isolated synthesized vowels where there was probably little undershoot, whereas the vowels that are produced are rarely produced in isolated citation form.
Also, there is an issue of the methodology of the decision in the perception task -- the participants were not asked to rate the goodness of fit and we do not know whether they were more or less tolerant of the variants they accepted. It seems that listeners tend to accept more variation than they actually produce, as you point out in lines 472-476.
Lines 535-540: see also Wrembel (2015), In search of a new perspective: Cross-linguistic influence in the acquisition of third language phonology, for an earlier reference on VOT specifications in multilinguals.
Line 557 - I think many studies suggest that.
The paper concludes with well-thought-out remarks, including limitations of the study.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript examines the compactness of phonetic categories in the psycho-acoustic space of native Spanish speakers who also speak other languages, and assesses whether the number of phonemes a speaker has across the languages she speaks affects the compactness of the native categories. The study addresses an interesting research question, is very well written, and covers the relevant literature. However, a number of shortcomings prevent me from recommending this manuscript for publication in its current form: in the framing of participants and tested variables, in the statistical analyses, which do not control for the rate of false positives, and in the interpretation of the results. My main points are enumerated below and are followed by minor suggestions.
Abstract & Introduction
The authors hypothesize that bilinguals should have more compact categories, as they would need to fit their phonemic categories in the same space. However, there is abundant research indicating that bilinguals re-use phonemic categories of one language to produce identical and/or similar sounds in another language, which would predict no differences in compactness between bilinguals and monolinguals. The ‘more compact categories in bilinguals’ hypothesis rests on the assumption that bilinguals produce all the phonemic categories of their L1 and L2 differently. This is related to how the authors operationalize the number of phonemes in their participants' inventories. They assume that L2 speakers have distinct phonemic representations for each sound of the languages they speak; yet this has not been tested in the current participants and might not hold even for proficient early bilinguals. Merged categories remain a possibility, especially since the participants in the current study are not very proficient speakers (cf. my comment below). I would like the authors to comment on this, adjust the text accordingly, and acknowledge it in the limitations: the main result rests on the assumption that participants in the current study do have these shared phonemic representations, which is uncertain given how little information we have about the participants and their language use.
The authors defined bilinguals as individuals who reported their language skills at B2 level. Although this level implies good mastery of a language, and hence good L2 skills, it is not sufficiently high to meet a criterion for bilingualism, which would be C1-C2 in my opinion. There is a huge gap between B2 and C2 in the Common European Framework of Reference for Languages (CEFR). A person at C2 level, for example, “has no difficulty in understanding any kind of spoken language, whether live or broadcast, even when delivered at fast native speed”, whereas a person at B2 level “can understand most TV news and current affairs programmes. I can understand the majority of films in standard dialect.” The skills of a B2 speaker are relatively limited in all domains, including reading, writing, listening, and speaking. A bilingual individual is a person who masters two languages and uses them on a daily basis, while B2 accurately describes an intermediate level of L2 knowledge. I am not sure that even individuals with B2 level in another language would consider themselves bilinguals. I urge the authors to reconsider the terminology they use, as it is confusing and might not represent a bilingual person as defined in research (including the studies the authors cite, for example those with Spanish-Catalan or Spanish-Basque bilinguals). The individuals examined in the current research are L1-Spanish speakers of an L2/Ln.
It would be important to clarify, throughout the manuscript, that compactness can be in perception and in production and that the current work examines the compactness of perceptual representations, given a predefined psycho-acoustic space.
In general, the authors use the term ‘L2 acquisition’. Given the specificity of this type of research, it would be important and helpful to know when the authors refer to L2 perception, to L2 production, or to both (acquisition).
The conclusion in the abstract ‘This finding suggests the existence of composite or merged categories among bilingual and multilingual speakers’ is unclear.
In the Introduction, several claims lack references. For example, lines 27-48 have no references to support the text on factors explaining L1 compactness, future research on individual differences in compactness, the relationship with general auditory capacity, etc.
Do the authors refer to L1 acquisition in infants here? (“Each individual naturally assimilates the phonetic patterns of their particular speech”). What is meant by “idiosyncratic anatomical, cognitive, and neural attributes into the process”? Maybe the authors could expand on those?
The authors contrast prototype and exemplar-based theories, but the former are not sufficiently covered. What framework supports the DIVA model?
It would be useful to have the hypotheses and more information about the current study before the Methods.
Methods: If my understanding is accurate, the task was run in blocks, per vowel. However, I wonder why the authors did not opt for a 3-alternative forced-choice identification task, where participants would be exposed to all the stimuli and would have to choose the best /i/, /e/ and /a/, or “none of the three”. This would provide a better assessment of phonetic boundaries (similar to other classical 2-alternative forced-choice discrimination tasks), as compared to a blocked design, where the participants are perceptually biased to “look for” the target vowel they are asked about. I see the advantage of the design the authors opted for, but I would be curious to hear the authors’ thoughts on the choice.
It is unclear how the category compactness index was computed.
Analyses: The inclusion of many factors in the analyses is not motivated, and this needs to be amended. For example, the variable “foreign countries experience” comes as a surprise in the analyses. It needs to be motivated. The variables openness, conscientiousness, extroversion, agreeableness, and neuroticism are included with no clear explanation. The number of variables is too high given the small amount of data per participant, which might lead to low statistical power, in addition to collinearity. Also, a high number of variables might increase the rate of false positives. If the authors want to keep all the variables in the analyses, then I strongly encourage them to perform a full-null model comparison, where the full model would contain all the factors the authors are interested in and the null model would exclude the main factor of interest, which is the number of phonemes in the shared inventories. If the anova(full,null) is significant, it would mean that the number of phonemes does significantly affect compactness in the native language, in addition to the other factors the authors want to control for.
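To illustrate the full-null comparison suggested above, here is a minimal sketch in Python. It is not the authors' analysis: it assumes plain least squares instead of a mixed model, a single control variable, and invented data. The two nested models are compared with a likelihood-ratio test on one degree of freedom, which is analogous in spirit to an anova(full, null) call on nested models. All names (ols_rss, full_null_test, age, n_phonemes, compactness) are hypothetical.

```python
import math

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    k = len(X[0])
    # Build the normal equations (X'X) beta = X'y as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (M[i][k] - tail) / M[i][i]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def full_null_test(control, predictor, y):
    """Likelihood-ratio test: does adding `predictor` improve on the null model?"""
    n = len(y)
    rss_null = ols_rss([[1.0, c] for c in control], y)
    rss_full = ols_rss([[1.0, c, p] for c, p in zip(control, predictor)], y)
    # For Gaussian errors, -2 * (logLik_null - logLik_full) = n * ln(RSS_null / RSS_full).
    lr = max(n * math.log(rss_null / rss_full), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Invented data (assumption): one row per participant.
age = [20, 25, 30, 35, 40, 45, 22, 28, 33, 38, 43, 48]
n_phonemes = [5, 5, 5, 5, 16, 16, 16, 16, 24, 24, 24, 24]
noise = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1, 2, -2]
compactness = [100 + 0.5 * a + 3 * p + e for a, p, e in zip(age, n_phonemes, noise)]

lr, p_value = full_null_test(age, n_phonemes, compactness)
print(f"LR = {lr:.2f}, p = {p_value:.3g}")
```

With several variables of interest, the null model would drop all of them at once and the test would use the corresponding number of degrees of freedom, so a single comparison replaces many separate coefficient tests.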
I am surprised that vowel was not included as a fixed factor in the model, and I am curious to hear the authors’ thoughts on that. Given the within-subject nature of the design, it is not statistically sound to run separate analyses for each vowel. This needs to be clarified.
The authors included proficiency level in the analyses, but what does it refer to? Is it proficiency in Spanish? Or in other languages? If so, then how was it computed if several languages were reported?
How might the variable “living in other countries” affect compactness in a native language? This needs to be clearly motivated.
Discussion: The conclusions of the paper need to be toned down, as the study only tested the compactness of three Spanish categories, which does not indicate how these categories are organized in relation to the phonemic categories of the other languages the participants speak, or to their compactness. Some of the assumptions need to be justified, for example “the relationship between compactness and the number of phonetic categories within an individual's vowel space”, as the latter were not assessed specifically. The current research lacks precise measures of language proficiency and language use to ascertain that the L2 speakers tested in the study indeed had separate phoneme categories.
Minor points:
This statement “more compact native (L1) categories can enhance the discrimination of non-native sounds” needs to be amended, as this implies a causal relationship, and we don’t know that yet. A softer version “more compact native (L1) categories have been related to the discrimination of non-native sounds” would be more accurate.
latter scenario => latter hypothesis?
The subheading “Individual differences in the same L1” is unclear: individual differences in what?
The authors mention (line 133) that “Their findings indicated that only acoustic memory significantly contributed to the L2 discrimination task interacting with L1 compactness”, could they clarify how specifically they interacted?
What determined the size of the sample?
Was each stimulus (out of 32 per vowel) presented only once?
Was the order of vowels assessed in the task counterbalanced?
It would be useful to have more bins in the histograms to better capture the distribution of the scores.
Comments on the Quality of English Language
Very good.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
See attached PDF file.
Comments for author File: Comments.pdf
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The current version of the manuscript addresses a number of issues raised in the previous review, e.g., it introduces a new variable (speaker type) that was recommended by reviewers and reports the results of new analyses. The discussion is richer and there are a number of improvements. So bravo to the authors. However, there are several issues that prevent me from recommending the current manuscript for publication and that I would like the authors to address in a revision. I list them below.
Abstract
The authors write in the abstract that “This finding suggests the existence of composite or merged categories in bilinguals”. However, doesn’t this contradict the hypothesis posited by Flege and Bohn (2021), which would predict more compact categories in bilinguals? This is unclear. I suggest either including the hypothesis and then stating that it was not confirmed, or removing this sentence, as, without any extra information, it is an overinterpretation of the current data. In addition, if bilinguals had merged categories, then we would also expect merged categories in multilinguals; however, this was not found. So, I would recommend dropping the “merged categories” assumption here.
Introduction:
The current version of the first section looks weak to me in terms of theoretical framing; it was useful to have the DIVA model and related studies on L1 production that suggested a mechanism for more compact/distinct L1 categories.
The two paragraphs in section 1.1 have almost exactly the same closing sentences.
Section 1.4 – I am not sure that this section is about bilinguals’ compactness, but rather about bilinguals’ phonetic production (mean F1/F2, mean VOTs, phonetic drift, etc.). The relevance of this section to the bigger questions is a bit unclear and needs to be either dropped or edited to make it more relevant.
It was useful to have separate sections for compactness in production and perception. I recommend adding a concluding sentence for the production section.
The division bilingual/multilingual is unclear: the authors mention that the smallest number of languages in the multilingual group was two, but why, then, were these participants not included in the bilingual group?
To assess vowel compactness, the authors asked participants whether a given vowel token belonged to a given category. Other research studies used a goodness-of-fit task. I wonder whether the authors could comment a bit on the choice of this instruction, as I assume that if they had asked participants whether a given token was a good example of /i/ (for example), then the compactness measure would be lower (more compact vowel categories). This would also align with the compactness measure in production, where a standard deviation over the mean of a category is typically computed (not all tokens). This does not necessarily weaken the design, but I believe it provides a slightly different measure of compactness, and it would be great to bring readers’ attention to this here (and probably in the discussion).
In order to compute the global compactness score, the authors opted for a sum of three vowel categories. It would be great to have a brief explanation of the choice, as compared to, for example, a mean score. Given that the analyses are performed on the compactness index individually for each vowel, the relevance of the global score is unclear. I recommend dropping it or explaining where it is used.
In the statistical analyses, the factors “speaker type” and “number of languages spoken” are highly correlated and redundant; one of them needs to be excluded from the model. It would be useful to have the levels of “speaker type” included in parentheses, as some readers might have forgotten them by then. Are they monolingual, bilingual and multilingual? This needs to be clarified.
Oh, now I see that you used the global score here. However, it was not clear to me what variables and which factors were included in which model. It looks like the global compactness score was included in the main analyses, but then the factor “vowel” would be irrelevant there, as the score would be a sum of the scores across the three vowels.
The inclusion of additional factors (language proficiency, age of acquisition, mode of language instruction, and experience in the target language country) in the last model (line 575) is not motivated well enough; the reason for including them needs to be provided. Also, it is unclear how the mode of language instruction in L2/Ln might impact compactness in L1.
I must confess that it was not easy (and confusing at some times) to follow the text with the changes being tracked and all being marked in the same blue ink. It would have been useful to have a final clean version.
I recommend that the authors refrain from discussing non-significant results with the RT. Regarding this analysis, it is unclear what the model was and what the results (i.e., coefficients) were, even in the absence of significance.
Line 40: “endeavors seek to” – maybe choose one of them?
Line 78: can lead …. Leading -> please revise to avoid redundancy
Line 81: “different processing strategies in application to novel” -> consider changing to “differences in the acquisition of novel”
Line 161: “speakers of the same L1 differed in their judgments of which stimuli made up this category.” -> this is unclear; maybe replace with “the extent to which a given sound token maps onto a category”.
Line 269: “It would be particularly intriguing to explore the distinctions in L1 category compactness among individuals with phonological systems of varying sizes, such as monolingual versus multilingual speakers.” -> the authors need to strengthen the motivation here, why is it intriguing?
Line 300: the authors mention that participants would “intentionally compact” their categories; but I am not sure whether this was examined in the cited studies. But I might be wrong. It would be great to double-check this.
Line 381: please check the sentence, it is difficult to parse
Line 396: “the average age of acquisition (AoA)” of what? Of L2 in bilinguals, I guess. But what about multilinguals? Does it refer to L2 or L3/L4? It would be useful to have this information separately for L2, L3 and L4 (when relevant).
Line 410: consider replacing “To assess the perceptual space's compactness of the native Spanish vowels /i/, /e/, and /a/” with “To assess the compactness of the native Spanish vowels /i/, /e/, /a/ in perceptual space”, as it is a bit easier to parse.
Line 615: “the size or compactness of their native vowel categories” -> consider changing to “the size or compactness of their native vowel categories in perceptual space.” (and then you can drop the next sentence).
Line 622: “perceptual compactness” sounds a bit odd, maybe compactness of vowel categories in perception? Similar concern applies to “perceptual range”.
Line 674: vowel spaces.
Line 740: this statement “This finding is unexpected, as one would anticipate individuals at this proficiency level to have a relatively high mastery of L2 phonology and a distinct set of L2 vowels formed apart from L1” is an overinterpretation of the results and must be toned down.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
I thank the authors for addressing my comments. I am mostly satisfied with the answers and would like to congratulate the authors on making the amendments. The manuscript is stronger and provides a sound contribution to the understanding of speech categories in monolinguals and multilinguals. However, there are still minor issues that need to be resolved.
1. The DIVA model is misinterpreted, and some of the claims need to be revised and/or removed. For example, to the best of my knowledge, the model does not claim that articulation is based on “of abstract motor plans”, or that “individuals with more sharply defined or compact auditory spaces have a heightened ability to discriminate sounds and produce more distinct sound differences.” This oversimplifies the model, which is more complex and includes a number of subsystems (feedforward, feedback, sensory error component, etc.). I believe the authors did imply a tight relationship between speech sound realizations and the auditory maps (speech sound maps), but did not necessarily make predictions about the size (or compactness) of the sound maps and discrimination/production. This needs to be amended. Lastly, what are “higher-level cortical areas”? The cortical levels are the highest levels of the brain structure.
2. Line 106 “perceptual system is finely tuned to distinguish between acoustically similar sounds” – this is a very strong and general claim that needs support. We are not yet sure what underlies compactness or whether it is due to the perceptual system being superior or finely tuned. Rather, when L1 categories are compact, there is a greater chance that new categories will fall within unoccupied spaces, relatively ‘far’ from native sounds, resulting in less assimilation and facilitating discrimination, so that ‘similar L2 sounds’ would act as ‘new’ ones. I would therefore just remove what is on line 106 and keep the rest.
3. The following claims are rather strong and need to be carefully revised: “without conflicting with existing native categories. Therefore, individuals with more “empty” acoustic spaces in between their native categories may have an advantage in accurately perceiving and producing unfamiliar L2 sounds because they have more room in their perceptual system to accommodate and distinguish these new sounds.” We do know, from the SLM, that when a phonemic difference between L1 and L2 sounds is perceived, individuals “create” new categories. There are different mechanisms for L2 category creation, including drift of L1 categories or accommodation, but I am not sure that “conflict” is an appropriate word. Although the “room” metaphor is beautiful, it might be slightly misleading. I would rather argue that with more compact L1 categories, the likelihood that a similar L2 sound is perceived as not belonging to the L1 category is higher, facilitating L1-L2 discrimination and the establishment of L2 speech sounds for perception and production.
4. It would be helpful to report the compactness score in kHz² to get rid of the extra digits, e.g., 211542.4 Hz = 211.5 kHz². Note that the compactness score (as a measure of an area) is in squared units.
5. Section 3.2 is not clear to me. If there was only one value per participant (global compactness score), how is it possible to estimate the intercept in model 1, which requires at least 2 data points per participant? The same applies to the second model, which therefore should not be called a mixed-effects model.
6. Regarding the statistical analyses, I think it is a pity to compute a global score when the authors have tested the compactness of three separate vowels. It would have been statistically sound to examine compactness as a function of speaker_type and vowel (and maybe their interaction), to account for differences in compactness between vowels, and also to include speaker-specific variability across vowels (this would be an intercept). These are my two cents; I do not mean that the authors need to amend their analyses.
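On point 4, a minimal sketch of the rescaling, assuming the compactness score is an area measured in Hz² (the manuscript's actual units are not restated in this report). Because 1 kHz² equals 1,000,000 Hz², an area is divided by 10^6 rather than 10^3, so the rescaled value in the example of point 4 is worth double-checking. The function name hz2_to_khz2 is hypothetical.

```python
# Assumption: the score is an area in Hz^2, so the scale factor is (1000 Hz)^2.
HZ2_PER_KHZ2 = 1000.0 ** 2

def hz2_to_khz2(area_hz2):
    """Convert an area from Hz^2 to kHz^2."""
    return area_hz2 / HZ2_PER_KHZ2

score_khz2 = hz2_to_khz2(211542.4)
print(round(score_khz2, 4))  # 0.2115
```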
Author Response
Please see the attachment.
Author Response File: Author Response.docx