2. Experiment 1
The first experiment consisted of a categorization task with goodness-of-fit rating. This experiment was designed to provide information on how Hungarian listeners map the Portuguese [ɐ] and [ɨ] into their L1 system. To obtain a complete picture and identify possible cases of perceptual overlap, we included the nine EP oral vowels: [a], [ɐ], [ɛ], [e], [ɨ], [i], [ɔ], [o], and [u]. Hungarian participants were presented with auditory tokens of these vowels and asked to identify each of them as an instance of a Hungarian vowel category, represented orthographically. As described before, Hungarian contains seven short vowels and seven long vowels, which correspond to only nine different vocalic qualities: [ɒ], [aː], [ɛ], [eː], [i], [o], [ø], [u], and [y]. Considering that our goal was to observe perception based on vowel quality, and not on quantity, we only included the nine Hungarian vowels that differ in this respect.
2.1. Method
2.1.1. Participants
We recruited Hungarian native speakers with no previous or present contact with EP as listeners in our experiment through a link shared in Hungarian universities. To select participants, we collected sociolinguistic data by means of a questionnaire. We included 72 Hungarian speakers in the data analysis. These participants were aged between 18 and 45, and 47 were female speakers. All participants reported intermediate or advanced proficiency in English. German was the second most frequently reported additional language, also at an intermediate or advanced level (n = 22), followed by French (n = 10), Spanish (n = 9), Italian (n = 6), and Russian (n = 3).
To control for stimuli nativeness, we ran the tests with 30 EP native speakers (21 female) as well. They were all from the standard dialectal area³ and had no previous contact with Hungarian. Participants in this group were also aged between 18 and 45.
Neither the Hungarian nor the Portuguese speakers reported hearing problems.
2.1.2. Stimuli
Auditory stimuli comprised the target EP oral vowels inserted in a CV structure. We opted to use vowels in context because vowels are better perceived in consonantal context than in isolation (Deme 2014; Rakerd 1984) and, crosslinguistically, the CV structure is the most frequent unmarked syllable structure (Rice 2007). To select the consonant, sequential searches were run in the Hungarian corpus “Szószablya” (Halácsy et al. 2003) and the Portuguese corpus “FrePoP” (Frota et al. 2010) in order to find a CV syllable that complies with both the L1 (Hungarian) and the target L3 (EP) phonotactics and has no meaning in either language. This way, we could prevent word meaning from affecting our results. Using this method, we selected the structure [ɡV]. We also recorded stimuli for the familiarization tasks, produced in the L1 of the participants (Hungarian and EP). For these stimuli, we selected a highly frequent consonant in both languages, [t], and created [tV]-shaped items.
We recruited three female EP native speakers (aged 42, 42, and 45), also from the standard dialectal area of Lisbon, to record the EP stimuli. The target [ɡV] and [tV] syllables were recorded in carrier sentences to control for vowel quality and intonation. The speakers were also provided examples of real words containing the target syllables, as exemplified in (1). The tokens selected for the experiment were the last in each phrase so that they had similar intonation patterns (a falling pitch contour) and speech rate.
(1) | a. | GUERRA [ɡɛ]. Diga [ɡɛ], por favor: [ɡɛ]. |
| | “WAR [ɡɛ]. Say [ɡɛ], please: [ɡɛ]”. |
| b. | TELA [tɛ]. Diga [tɛ], por favor: [tɛ]. |
| | “CANVAS [tɛ]. Say [tɛ], please: [tɛ]”. |
Recording sessions took place individually at each speaker’s home, due to the lockdown restrictions imposed at the time of the recordings. A TASCAM DR-05 V2 digital recorder and a Beyerdynamic MCE 85 BA condenser microphone were used. The file format was set to .wav, with a sampling frequency of 44,100 Hz, mono, and 32-bit depth. Background noise was eliminated individually for each recording with Audacity (Audacity Team 2020). Special attention was given to normalization of vowel duration, since Hungarian listeners display sensitivity to vowel length. Vowel length was manipulated using Praat (Boersma and Weenink 2020). Mean intensity averaged over the total course of each syllable was equalized across tokens.
Other than the EP stimuli, we also recorded [tV] syllables in Hungarian for the familiarization task presented to the Hungarian participants. These tokens were recorded by three female Hungarian native speakers (aged 27, 35, and 47), and similar preparation steps were taken as for the EP stimuli.
The final set of stimuli consisted of 81 tokens: 54 produced by female Portuguese speakers ((9 [ɡV] for the main trials + 9 [tV] for the familiarization trials with the Portuguese participants) × 3 speakers) and 27 produced by female Hungarian speakers (9 [tV] for the familiarization trials with the Hungarian participants × 3 speakers).
Additionally, we looked into the EP tokens’ acoustic parameters to investigate the relationship between the Hungarian and the EP acoustic vowel space. Vowel onset was identified at the first voicing bar of a complete formant structure. Intensity, fundamental frequency (f0), and the first two formant frequencies (F1 and F2) of each vowel were calculated from the median of the values measured in a 10% window in the midsection of the vowel. For f0 measures, the pitch floor was set to 100 Hz and the ceiling to 500 Hz, as recommended for female voices. To estimate formant frequencies, we used the Burg algorithm and set the maximum number of formants to 5 below 5500 Hz, while we used the default settings for every other parameter (Boersma and Weenink 2020). A summary of the acoustic data for the stimuli included in the main trials ([ɡV]) is presented in Table 1.
The orthographic stimuli consisted of sets of real words written in the L1 of the participants. To select the Hungarian words, we carried out a sequential search in the Hungarian corpus “Szószablya” (Halácsy et al. 2003). We opted to use words that had a [ɡV.CVC] structure and were similar in token frequency. To ensure comparable familiarity across words presented to Hungarian listeners, a questionnaire was conducted with 25 Hungarian native speakers (aged 18–45), who rated how familiar they were with different words with a [ɡV.CVC] structure on a scale from 1 (not familiar at all) to 4 (very familiar). All words selected for the main trials had a median rating of 4 (very familiar). In the familiarization trials, we used real words with a [tV.CVC] structure, matching the structure of the words in the main trials. The set of words we presented to the Portuguese participants to test stimuli nativeness included real words with [ɡV] or [tV] as a first syllable that were believed to be easily identifiable by Portuguese listeners. Appendix A displays the words presented to both groups of participants.
We placed each set of words in 3 × 3 grids, with the first (target) syllables highlighted in yellow, as exemplified in Figure 4.
2.1.3. Procedure
The experiment was built in PsychoPy, version 2020.2.3 (Peirce et al. 2019), and hosted online by pavlovia.org. Two versions were created, one for the Hungarian listeners and one for the Portuguese listeners. The two versions had similar structure and design, differing only in the language of the instructions and auditory stimuli for the familiarization trials, which were presented in the respective L1.
We asked the participants to match the EP auditory stimuli to L1 real words, displayed in the grids. Participants were presented with each token twice. After the first audition, listeners were asked to choose a word from the grid. The same token was then played a second time, and participants had to rate the goodness-of-fit of the auditory stimulus on a four-point scale (1 = very bad example, 2 = bad example, 3 = good example, 4 = very good example). By using a four-point Likert scale, our goal was to force participants to express a clearly positive (3 and 4) or negative (1 and 2) judgment and to avoid the “face-saving don’t know” effect (Sturgis et al. 2014). Before the main task, participants had to complete two familiarization tasks with nine trials each, with one trial for each [tV] stimulus. The first consisted of identification with feedback (“Correct!” or “Incorrect… try again”). In the case of an incorrect response, participants repeated the same trial immediately after. No limit on attempts was set. The second familiarization task was similar, with [tV] stimuli and nine trials, but it included a goodness-of-fit rating, and no feedback was given to the participants. The main task consisted of 27 trials with the [ɡV] stimuli (9 EP oral vowels × 3 speakers), randomized across participants. No feedback was provided in these trials.
2.1.4. Analyses
Raw data were processed and further analyzed in R (R Core Team 2021). We collected participants’ answers (number of times a real word was chosen for each EP vowel and respective goodness-of-fit rating).
For the confusion matrices, we calculated mean percentages of identification and medians of rating for each EP vowel per L1 group. We used one-sample Wilcoxon signed-rank tests to determine when these percentages were significantly above chance level (i.e., 11.1%, as calculated by dividing one EP vowel by nine possible Hungarian categories).
We also looked into possible speaker effects in the identification of the EP vowels using chi-square tests.
A further aim of our study was to observe whether knowledge of additional L2s has an effect on how Hungarians perceive the EP [ɐ] and [ɨ] vowels. Considering that all participants had experience with English, it was not possible to assess whether knowledge of this language influenced the results. Consequently, we only looked into the possible effect of knowledge of German, since 31% of the participants reported speaking this language. To this purpose, we also conducted chi-square tests.
2.2. Results
Results collected for the Portuguese listeners (Table 2) revealed that the EP vowels were identified as the expected EP category, except for [o], which was identified as /o/ (50.0%) or /u/ (46.7%). Furthermore, all tokens were perceived by Portuguese participants as being a good or very good example of the given category (median = 3 or 4).
Results for the Hungarian listeners, as we can see in Table 3, were more dispersed. Four EP vowels ([a], [e], [i], and [u]) were systematically identified as four Hungarian categories, /aː/, /eː/, /i/, and /u/, respectively. Similar to what was observed with the Portuguese listeners, [o] was identified as /o/ (57.4%) or /u/ (42.1%). Chi-square tests revealed that the perception of this EP vowel was affected by the identity of the speaker who produced the tokens, in the case of both Portuguese listeners (χ² = 45.067, p < 0.001) and Hungarian participants (χ² = 113.87, p < 0.001). Regarding the critical target vowels, [ɐ] and [ɨ], the first was identified as /ɛ/ (68.5%) or /ø/ (30.1%), while the identification of the second was split between /y/ (73.1%) and /ø/ (25.0%). As for the vowel [ɛ], Hungarian listeners identified this EP vowel as /ɛ/ (58.8%) or /eː/ (39.8%). In this case, the goodness-of-fit rating values (median = 2) suggest that the EP [ɛ] was not considered a good example of either the Hungarian /ɛ/ or the /eː/. For the remaining EP vowels, ratings for the more systematic choices were positive (median = 3), or even very positive (median = 4), as in the case of [a].
Table 2.
Mean percentage identifications and median of goodness-of-fit ratings for the Portuguese listeners (n = 30). Portuguese words are presented in capital letters with the correspondent phonemic notation of the first syllable above. Numbers in boldface represent the stimuli identified above chance level (11.1%), according to Wilcoxon signed-rank tests. The rectangles identify the expected answers.
| Auditory stimuli | /ɡa/ | /ɡɐ/ | /ɡɛ/ | /ɡe/ | /ɡɨ/ | /ɡi/ | /ɡɔ/ | /ɡo/ | /ɡu/ |
|---|---|---|---|---|---|---|---|---|---|
| Orthographic stimuli | GATO | GAVETA | GUERRA | GUÊ | GUERREAR | GUITO | GOLA | GOTA | GULA |
| [ɡa] | 98.9% (4) | 1.1% (2) | | | | | | | |
| [ɡɐ] | 2.2% (3) | 95.6% (4) | | | 2.2% (2.5) | | | | |
| [ɡɛ] | | | 90.0% (3) | 8.9% (3) | 1.1% (2) | | | | |
| [ɡe] | | | 4.4% (3) | 87.8% (3) | 5.6% (2) | 2.2% (2) | | | |
| [ɡɨ] | | | 2.2% (3.5) | 5.6% (2) | 92.2% (3) | | | | |
| [ɡi] | | | | 8.9% (2) | 1.1% (2) | 90.0% (3) | | | |
| [ɡɔ] | | | | | | | 94.4% (4) | 5.6% (3) | |
| [ɡo] | | | | | 1.1% (4) | | 2.2% (3) | 50.0% (4) | 46.7% (3) |
| [ɡu] | 1.1% (1) | | | 1.1% (3) | | | | 4.4% (3) | 93.3% (3) |
Table 3.
Mean percentage identifications and median of goodness-of-fit ratings for the Hungarian listeners (n = 72). Hungarian words are presented in capital letters with the correspondent phonemic notation of the first syllable above. Numbers in boldface represent the stimuli identified above chance level (11.1%), according to Wilcoxon signed-rank tests.
| Auditory stimuli | /ɡaː/ | /ɡɒ/ | /ɡɛ/ | /ɡeː/ | /ɡi/ | /ɡo/ | /ɡø/ | /ɡu/ | /ɡy/ |
|---|---|---|---|---|---|---|---|---|---|
| Orthographic stimuli | GÁBOR | GARÁZS | GERELY | GÉPÉSZ | GITÁR | GONOSZ | GÖRÉNY | GULYÁS | GÜGYÖG |
| [ɡa] | 97.7% (4) | 2.3% (3) | | | | | | | |
| [ɡɐ] | 0.5% (2) | | 68.5% (3) | | | | 30.1% (3) | 0.5% (2) | 0.5% (4) |
| [ɡɛ] | | | 58.8% (2) | 39.8% (2) | | | 1.4% (1) | | |
| [ɡe] | | | 1.4% (3) | 81.9% (3) | 16.2% (2) | | 0.5% (2) | | |
| [ɡɨ] | | | | 1.9% (1.5) | | | 25.0% (2) | | 73.1% (3) |
| [ɡi] | | | | 1.9% (1) | 98.1% (3) | | | | |
| [ɡɔ] | | 78.7% (3) | | | | 21.3% (3) | | | |
| [ɡo] | | | | | | 57.4% (3) | 0.5% (3) | 42.1% (3) | |
| [ɡu] | | | | | | 10.6% (3) | | 89.4% (3) | |
To investigate the relationship between the results for the Hungarian listeners and the acoustic distance between L1 and EP vowels, we plotted the F1 × F2 vowel space for Hungarian vowels in perception; EP [ɐ], [ɛ], [e], and [ɨ] tokens produced for the experiment; and EP reference values obtained from a standard variety of EP for production (Figure 5).
As we can see from Figure 5, [ɐ] tokens used in the experiment (colored triangles) are closer to the Hungarian /ɛ/, which explains this L1 category as first choice (68.5%) for the Hungarian listeners when perceiving the EP [ɐ]. However, the identification of [ɐ] as /ø/ in 30.1% of cases cannot be explained by the distance between the two vowels. Furthermore, even though the EP [ɨ] seems to be located near the Hungarian /eː/ in the acoustic vowel space, it was identified as /y/ in 73.1% of cases and as /ø/ in 25.0% of cases.
A possible explanation may be that tokens recorded for the experiment (colored triangles) deviated from the EP vowels we expect in female speakers (grey triangles). However, as shown in Figure 5, this explanation does not hold up, as the vowels in our tokens align well with the reference vowels in the acoustic vowel space. An alternative explanation may be related to the Hungarian perceptual map. The F1 and F2 frequency values for the tokens used in the experiment (produced by female speakers) seem to be shifted towards higher F1 and F2 frequency values compared to values for the Hungarian vowels in the reference map we presented in Figure 5. If the values for the Hungarian vowels in perception were estimated based on measures taken from one (or several) male Hungarian speaker(s), this would mean that, due to the larger vocal tract males generally have, we would find lower formant frequencies for the Hungarian vowels than for the Portuguese tokens, which might provide an explanation for the results. To test this, we calculated formant dispersion (FD) values (calculated for individual speakers and averaged in the case of the 3 Portuguese speakers), as these are indicators of body size (Fitch 1997).⁴ The results (see Data Availability Statement) support our assumption: The FD value calculated for the Hungarian vowels in perception (1017 Hz) was lower than for the Portuguese tokens (1226 Hz), which indicates a longer vocal tract, that is, a male speaker, in the case of Hungarian. If we place the reference values for vowels produced by male EP speakers in the vowel space in Figure 5 (grey squares), we see that [ɐ] and [ɨ] produced by male EP speakers fall between /ɛ/ and /ø/, and /y/ and /ø/, respectively, which explains the unexpected results we obtained in our study.
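The formant dispersion measure used in this comparison can be sketched as follows. This is a minimal illustration, assuming the usual definition of FD as the mean spacing between adjacent formants; the formant values and the number of formants in the example are hypothetical and are not the measurements reported in the study.

```python
def formant_dispersion(formants_hz):
    """Mean distance (Hz) between adjacent formants of one vowel or speaker.

    `formants_hz` is a sequence such as (F1, F2, F3, F4); which formants
    entered the original analysis is not stated, so this is an assumption.
    """
    if len(formants_hz) < 2:
        raise ValueError("at least two formants are needed")
    ordered = sorted(formants_hz)
    gaps = [upper - lower for lower, upper in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def mean_dispersion(speakers):
    """Average FD across speakers, each given as a sequence of formants."""
    values = [formant_dispersion(formants) for formants in speakers]
    return sum(values) / len(values)


# Hypothetical formant sets (Hz) for two illustrative speakers.
example_speakers = [(500, 1500, 2500, 3500), (600, 1800, 3000, 4200)]
print(mean_dispersion(example_speakers))
```

A lower value indicates more closely spaced formants, which is the pattern expected for a longer vocal tract.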
The high frequency of [ɛ] in Hungarian may provide an explanation of the finding that the EP [ɐ] was more systematically identified as /ɛ/ than as /ø/ (68.5% vs. 30.1%). According to Gósy (2004), [ɛ] is the most frequent Hungarian vowel, with an occurrence of 11.4%, while [ø] has a frequency of only 1.1%. This suggests that vowel perception of Hungarian speakers may be biased towards [ɛ] (p. 89). Furthermore, contrary to /ø/, neither the EP [ɐ] nor the Hungarian /ɛ/ is rounded; hence, these latter two may be considered more similar to each other (than to Hungarian /ø/).
Lastly, the identification of the EP [ɛ] was balanced between /ɛ/ and /eː/. As shown in Figure 5, reference F1 and F2 frequency values for EP [ɛ] produced by male speakers (grey square) are in the vowel space between the Hungarian values for /ɛ/ and /eː/, but closer to /eː/. Recall that the median goodness-of-fit rating in the identification of the EP vowel [ɛ] was 2, both as /ɛ/ and /eː/, suggesting that Hungarian listeners did not consider EP [ɛ] to be a good fit for the Hungarian /ɛ/ or /eː/, the two most frequently picked candidates.
Considering that 22 participants reported having an intermediate or advanced level in German, we also investigated the possible effect of this knowledge on the categorization of the two EP vowels absent from the participants’ L1, [ɐ] and [ɨ]. Additionally, due to the difficulties observed with the perception of the EP /ɛ/ and the acoustic distance between this vowel and the corresponding Hungarian category, we also analyzed the data for the categorization of [ɛ]. Chi-square tests run on each vowel’s trials revealed no significant effect of German knowledge in any of the cases.
2.3. Perceptual Overlap and Predictions for Discrimination
The data collected in the categorization experiment allowed us to put forward predictions for the discrimination, by Hungarian listeners, of EP vowel contrasts involving [ɐ] and [ɨ] directly or indirectly. Following Guion et al. (2000), we calculated the fit index (proportion of categorization × median of goodness-of-fit rating) for the identification of each EP vowel (Table 4).⁵ Although the calculation of the fit index raises some methodological questions (Tyler 2021), and the information displayed in Table 4 should therefore be interpreted cautiously, it remains an effective way of showing problematic contrasts.
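As a brief illustration of the fit index calculation, the sketch below applies the stated formula (proportion of categorization × median rating) to a few cells copied from Table 3. The function and variable names are illustrative only, and the resulting values are computed from rounded percentages, so they may differ slightly from those reported in Table 4.

```python
# Selected cells from Table 3 (Hungarian listeners):
# EP vowel -> {Hungarian category: (percent identified, median rating)}
TABLE_3_CELLS = {
    "a": {"aː": (97.7, 4)},
    "ɐ": {"ɛ": (68.5, 3), "ø": (30.1, 3)},
    "ɛ": {"ɛ": (58.8, 2), "eː": (39.8, 2)},
    "ɨ": {"y": (73.1, 3), "ø": (25.0, 2)},
    "i": {"i": (98.1, 3)},
}


def fit_index(percent, median_rating):
    """Proportion of categorization multiplied by the median rating."""
    return percent / 100 * median_rating


for vowel, categories in TABLE_3_CELLS.items():
    for category, (percent, rating) in categories.items():
        print(f"[{vowel}] as /{category}/: {fit_index(percent, rating):.2f}")
```

On a four-point rating scale the index can range from 0 to 4, so a vowel identified consistently but rated poorly (such as [ɛ] here) scores lower than one identified less consistently but rated well.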
As shown in Table 4 below, the EP vowels [a] and [i] (and, to a lesser extent, [e]) reached the highest fit indexes. This suggests that these were accepted as good instances of the respective Hungarian category, and conversely, the remaining EP vowels, [ɐ], [ɛ], and [ɨ], which presented the lowest fit index, were perceived as “non-native” or deviant categories. We can also observe several situations of perceptual overlap, that is, when two EP vowels are perceived as a single Hungarian category: [ɐ] and [ɛ] (identified as /ɛ/), [ɛ] and [e] (identified as /eː/), [ɐ] and [ɨ] (identified as /ø/), and [e] and [i] (identified as /i/).
With respect to predictions for EP contrast discrimination, we calculated the perceptual overlap scores following Faris et al. (2018). To this purpose, we took the sum of the smaller percentages when both EP vowels in a given contrast were identified as the same L1 category, excluding identification rates below chance level, since these are not systematic choices and might correspond to involuntary responses (due to tiredness, for example). These perceptual overlap scores were calculated based on group means (displayed in Table 3). For example, the EP [ɛ] was identified by the Hungarian listeners as /ɛ/ (in 58.8% of the trials) or as /eː/ (in 39.8% of the trials). The EP [e] was also identified by the Hungarian listeners as /ɛ/ or as /eː/. However, given that identification of [e] as /ɛ/ was below chance level, the probability of both the EP [ɛ] and [e] being perceived as /ɛ/ should not be considered for the calculation of the perceptual overlap score. Therefore, only the probability of the [ɛ]-[e] contrast being perceived as /eː/ was considered. The overlap score for this contrast was 39.8%, corresponding to the lower of 39.8% (identification of [ɛ] as /eː/) and 81.9% (identification of [e] as /eː/). In other words, there was a 39.8% probability that the two vowels of the [ɛ]-[e] contrast would be perceived as a single vowel by the Hungarian listeners: /eː/. Table 5 displays the perceptual overlap scores for each of the contrasts in focus.
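The group-level calculation described above can be sketched as follows, using the percentages from Table 3 for the five vowels involved in the overlapping contrasts. This is an illustrative reading of the procedure (sum, over shared L1 categories, of the smaller percentage, skipping any category that is below chance for either vowel); the names are hypothetical, and results computed from rounded table values may differ in the last decimal from the published scores.

```python
CHANCE = 100 / 9  # about 11.1%: one of nine Hungarian response categories

# Percent identification per EP vowel, copied from Table 3.
IDENTIFICATION = {
    "ɐ": {"aː": 0.5, "ɛ": 68.5, "ø": 30.1, "u": 0.5, "y": 0.5},
    "ɛ": {"ɛ": 58.8, "eː": 39.8, "ø": 1.4},
    "e": {"ɛ": 1.4, "eː": 81.9, "i": 16.2, "ø": 0.5},
    "ɨ": {"eː": 1.9, "ø": 25.0, "y": 73.1},
    "i": {"eː": 1.9, "i": 98.1},
}


def overlap_score(vowel_a, vowel_b, data=IDENTIFICATION, chance=CHANCE):
    """Sum of the smaller percentage for every L1 category that both
    vowels were assigned to above chance level."""
    score = 0.0
    for category, percent_a in data[vowel_a].items():
        percent_b = data[vowel_b].get(category, 0.0)
        if percent_a > chance and percent_b > chance:
            score += min(percent_a, percent_b)
    return score


for pair in [("ɐ", "ɛ"), ("ɛ", "e"), ("ɐ", "ɨ"), ("e", "i")]:
    print(pair, round(overlap_score(*pair), 1))
```

For [ɛ]-[e], only /eː/ passes the chance filter for both vowels, so the score reduces to the smaller of 39.8 and 81.9.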
Other than observing performance at the group level, we also aimed at investigating how the scores calculated based on group means relate to perceptual overlap at the individual level, and which of the two measures (based on group means or based on individual performances) is a better predictor for L3 vowel discrimination. To this purpose, we grouped Hungarian participants by levels of perceptual overlap scores for each EP contrast. The levels of overlap scores were calculated in the following way: For each pair of EP vowels, each participant completed six trials (2 vowels × 3 tokens). If, in the six trials, the participant identified both vowels of the contrast as one L1 category, it means that 100% overlap occurred. If, in six trials, the two vowels were identified five times as one Hungarian category, then 83.3% overlap occurred, and so on. We then determined the proportion of Hungarian listeners for each level, in each EP contrast. For example, in the case of the EP [ɐ] and [ɛ], only one participant always identified these vowels as /ɛ/; that is, we observed a total (100%) perceptual overlap in only 1.4% of the participants. On the other hand, 65 participants (90.3%) identified the two EP vowels as /ɛ/ in only three of the six trials (i.e., 50% of the cases). Therefore, for most of the participants, a 50% perceptual overlap was observed. Results for perceptual overlap scores based on individual performances are presented in Table 6.
As shown in Table 5 and Table 6, the values obtained from individual scores are not entirely in line with the scores obtained from group mean results. First, according to the overlap scores based on group means, we obtained the value of 58.8% for the [ɐ]-[ɛ] contrast, which was the highest perceptual overlap score based on group means. However, according to individual results, 90.3% of the participants identified both [ɐ] and [ɛ] as /ɛ/ in only half of the trials. Regarding the [ɛ]-[e] contrast, when both vowels were identified as /eː/, we obtained a 39.8% score based on group means. However, 33.3% of the participants revealed a 50% perceptual overlap, and 43.1% of the participants showed an 83.3% overlap score. This suggests that, if we consider individual answers, the EP [ɛ]-[e] contrast may be, in fact, more problematic than the [ɐ]-[ɛ] contrast. A similar situation was observed for the [ɐ]-[ɨ] and [e]-[i] contrasts. If we consider overlap scores based on group means, we can conclude that the [ɐ]-[ɨ] contrast should pose more problems to Hungarian listeners than the [e]-[i] contrast, since the former exhibited a higher perceptual overlap score than the latter (25.0% against 16.1%). However, when we look at the proportion of participants for each level of overlap, we should conclude that the [e]-[i] contrast will be more problematic for Hungarian listeners than [ɐ]-[ɨ]. For [e]-[i], the overlap level was equal to or higher than 50% for 95.8% of the participants. As for [ɐ]-[ɨ], only 4.2% of the participants displayed an overlap level of 50%, while the remaining participants displayed lower levels of overlap.
In summary, it follows from both measures that the [ɐ]-[ɛ] and [e]-[ɛ] EP contrasts pose more difficulties in discrimination for Hungarian listeners than the [ɐ]-[ɨ] and [e]-[i] contrasts. According to group means’ results, we obtained the following hierarchy for the predicted discrimination difficulties (from most difficult to easiest): [ɐ]-[ɛ] > [ɛ]-[e] > [ɐ]-[ɨ] > [e]-[i]. However, according to individual results, the hierarchy is as follows: [ɛ]-[e] > [ɐ]-[ɛ] > [e]-[i] > [ɐ]-[ɨ]. To assess whether perceptual overlap affects EP contrast perception and which perceptual overlap measure (scores based on group means or on individual results) more accurately predicts discrimination difficulties, we designed an oddity discrimination task, which we describe next (Experiment 2).
3. Experiment 2
Experiment 2 consisted of an oddity discrimination task in which participants were presented with the EP vowel contrasts where perceptual overlap was observed in Experiment 1: [ɐ]-[ɛ], [ɛ]-[e], [ɐ]-[ɨ], and [e]-[i]. To test the effect of perceptual overlap, we added three contrasts in which Hungarian listeners did not display perceptual overlap: [a]-[ɐ], [ɛ]-[ɨ], and [e]-[ɨ]. These contrasts correspond to the relevant EP stressed–unstressed pairs involving the target EP vowels, [ɐ] and [ɨ]. The trials consisted of sequences of three EP vowels, two vowels of the same category and one of a different category (the odd). For example, for the [ɐ]-[ɛ] contrast, the following sequences were created: [ɐ]-[ɛ]-[ɛ], [ɐ]-[ɐ]-[ɛ], [ɐ]-[ɛ]-[ɐ], [ɛ]-[ɛ]-[ɐ], [ɛ]-[ɐ]-[ɛ], and [ɛ]-[ɐ]-[ɐ]. Other than change trials, we included catch trials in the material as well, in which participants were presented with sequences of three tokens from the same category (e.g., [ɐ]-[ɐ]-[ɐ] and [ɛ]-[ɛ]-[ɛ]). Consequently, in each trial, participants had two tasks: decide whether the three tokens belong to a unique category or not, and if not, identify the odd token.
3.1. Method
3.1.1. Participants
For the second experiment, a second group of Hungarian participants was recruited. Considering that this experiment was part of a wider project in perceptual training for L3 Portuguese, participants were recruited from beginner Portuguese language courses in different universities across Hungary at the onset of learning. Seventy participants completed the experiment and reported not having attended a Portuguese course before or having any significant (previous or present) contact with Portuguese. Participants completed the experiment in the first week of the course. They were aged 18 to 44 (mean age = 22.1), and 51 were female speakers. All participants reported intermediate or advanced knowledge of English. Other languages reported at an intermediate or advanced level were Spanish (n = 21), German (n = 18), Italian (n = 12), French (n = 7), Romanian (n = 3), Slovak, Catalan, and Dutch (n = 1 for each of these languages).
Furthermore, a second group of EP native speakers was also recruited to serve as a baseline group. The Portuguese group included 13 participants (10 female), all from the dialectal area of Lisbon, and aged 21 to 43 (mean age = 25.1).
Neither the Hungarian nor the Portuguese participants selected reported hearing problems.
3.1.2. Stimuli
The stimuli were the same as in Experiment 1, and each trial included a sequence of three tokens, one from each of the three Portuguese female speakers. Following previous studies (Escudero and Wanrooij 2010; Flege and Mackay 2004), we set the interstimulus interval (ISI) to 1200 ms. By combining a longer ISI with speaker variability, we aimed at promoting high-level acoustic processing (categorical processing) of speech sounds, rather than low-level phonetic discrimination of stimuli (that also includes speaker identity) during task completion.
3.1.3. Procedure
The experiment was built and conducted online in Gorilla Experiment Builder (Anwyl-Irvine et al. 2020). Similar to Experiment 1, we built two versions, one presented to the Hungarian participants and another to the baseline group, with Portuguese native speakers. These versions were identical in structure and design, except for the instructions and auditory tokens for the familiarization tasks, which were in the L1 of each group.
The main trials were preceded by a familiarization task, in which participants were presented with eight trials containing [tV] tokens produced in the participant’s L1. The familiarization task included sequences that were easier to discriminate than others (e.g., [to]-[ti]-[to]), as well as sequences that were acoustically more similar than others (e.g., [tɛ]-[te]-[tɛ]). Immediate feedback was provided after each trial in written form (“Correct!” or “Incorrect… try again”). If the answer was incorrect, participants had to repeat the same trial immediately after, until they reached the correct answer. No limit of attempts was set. The main task consisted of the 48 trials with the EP [ɡV] tokens: 42 change trials (7 contrasts × 6 orders: AAB/ABA/BAA/BBA/BAB/ABB) and 6 catch trials (6 vowels × 1 order). These trials were presented in a single block and were randomized between participants. No feedback was provided in the main trials.
Figure 6 displays a computer screen of one trial.
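The composition of the main task (42 change trials plus 6 catch trials) can be reproduced with a short sketch. The contrast list is taken from the description of Experiment 2; the data layout, the function names, and the fixed random seed are assumptions made for illustration and do not reflect how the trials were actually configured in Gorilla.

```python
import random
from itertools import permutations

# The seven contrasts tested in Experiment 2.
CONTRASTS = [("ɐ", "ɛ"), ("ɛ", "e"), ("ɐ", "ɨ"), ("e", "i"),
             ("a", "ɐ"), ("ɛ", "ɨ"), ("e", "ɨ")]


def change_sequences(a, b):
    """The six orders AAB/ABA/BAA/BBA/BAB/ABB for one contrast."""
    orders = set(permutations((a, a, b))) | set(permutations((a, b, b)))
    return sorted(orders)


def build_trials(contrasts=CONTRASTS, seed=0):
    trials, vowels = [], []
    for a, b in contrasts:
        for sequence in change_sequences(a, b):
            trials.append({"type": "change", "sequence": sequence})
        for vowel in (a, b):
            if vowel not in vowels:
                vowels.append(vowel)
    # One catch trial (three tokens of the same category) per vowel.
    for vowel in vowels:
        trials.append({"type": "catch", "sequence": (vowel,) * 3})
    random.Random(seed).shuffle(trials)  # single randomized block
    return trials


trials = build_trials()
print(len(trials))
```

In the experiment itself, the three positions of each sequence were additionally filled with tokens from the three different speakers, which this sketch does not model.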
3.1.4. Analyses
Data were processed and analyzed with the software R version 4.4.1 (R Core Team 2021). We collected correct answers and analyzed the results in two ways. First, we calculated A-prime (A’) scores for each participant and each contrast, integrating results from catch trials and change trials. With this, we aimed at obtaining “an unbiased measure of perceptual sensitivity by taking into account the responses to the different trials and the catch trials” (Guion et al. 2000, p. 2718). A’ scores were calculated from the proportions of hits and false alarms (correct and incorrect selection of an odd item, respectively), with 0.5 = chance-level discrimination and 1 = perfect discrimination (Makowski 2018).⁶ We then ran linear mixed-effect models with the LMER function (lme4 package, Bates et al. 2015) on these scores. Second, we analyzed responses in change trials and catch trials separately by building linear mixed-effect logistic models (correct answer = 1, incorrect answer = 0) with the GLMER function (lme4 package, Bates et al. 2015). For both analyses (A’ scores and responses in change and catch trials), we created null models (with the participant as the random effect) and conducted successive ANOVA tests on the log-likelihood ratio to assess whether adding the fixed effects significantly contributed to explaining the variance. In Appendix B, we report the best fitting models obtained. We also conducted pairwise comparisons of least-square means with the EMMEANS function (emmeans package, Lenth 2024), with Bonferroni corrections.
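For readers unfamiliar with A’, the sketch below shows one widely used non-parametric formulation based on hit and false-alarm rates. It is given only as an illustration: the scores in this study were obtained in R, and the exact variant and any corrections applied there are not reproduced here. The example rates are hypothetical.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Non-parametric sensitivity index A' from hit and false-alarm rates.

    Both arguments are proportions between 0 and 1. Equal rates give 0.5;
    a hit rate of 1 with no false alarms gives 1.
    """
    h, f = hit_rate, false_alarm_rate
    if not (0 <= h <= 1 and 0 <= f <= 1):
        raise ValueError("rates must lie between 0 and 1")
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Mirror-image formula for below-chance performance.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical listener: odd item correctly picked in 80% of change trials,
# an odd item wrongly reported in 20% of catch trials.
print(a_prime(0.8, 0.2))
```

Because the false-alarm rate comes from the catch trials, a listener who reports an odd item on every trial is not credited with high sensitivity even if all change trials are answered correctly.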
3.2. Results
A-prime scores are presented in Table 7. While in some contrasts the score was near 1 (indicating a very good discrimination), in other cases it was below 0.7 or even close to 0.5 (indicating limited sensitivity to the contrast, to different extents). Furthermore, although Portuguese listeners outperformed the Hungarian listeners in every contrast, in some cases differences between Portuguese and Hungarian listeners were more pronounced than in others, especially in two contrasts: [ɐ]-[ɨ] and [ɐ]-[ɛ].
We first compared results while taking into consideration the presence/absence of perceptual overlap, that is, comparing A’ scores for the [ɐ]-[ɛ], [ɛ]-[e], [ɐ]-[ɨ], and [e]-[i] contrasts to scores for the [a]-[ɐ], [ɛ]-[ɨ], and [e]-[ɨ] contrasts (Figure 7). We ran a linear mixed-effect model with “contrast type” (contrasts with perceptual overlap vs. contrasts without perceptual overlap) and “L1” (Hungarian vs. Portuguese) as the fixed effects and found a significant contrast type × L1 interaction (χ²(2) = 30.871, p < 0.001).
Pairwise comparisons revealed that Hungarian listeners performed significantly worse than Portuguese listeners in both the “contrasts with perceptual overlap” and “contrasts without perceptual overlap” conditions (p < 0.001 in both cases). Furthermore, the Hungarian group displayed a significantly lower accuracy in the “contrasts with perceptual overlap” condition than in the “contrasts without perceptual overlap” condition (p < 0.001).
Regarding discrimination of vowel contrasts with perceptual overlap (
Figure 8), linear mixed-effect modeling revealed a vowel contrast×L1 interaction (
χ2(4) = 42.878,
p < 0.001). The pairwise comparisons showed that in two EP contrasts, [ɐ]-[ɛ] and [ɐ]-[ɨ], Hungarian listeners displayed significantly lower accuracy in discrimination than the Portuguese participants (
p = 0.0074 and
p < 0.001, respectively). As for differences between contrast discrimination, for the Hungarian listeners, pairwise comparisons showed that the [ɐ]-[ɨ] contrast had a significantly lower accuracy than the remaining contrasts ([ɐ]-[ɨ] vs. [ɐ]-[ɛ]:
p = 0.0075; [ɐ]-[ɨ] vs. [ɛ]-[e]:
p = 0.0048; [ɐ]-[ɨ] vs. [e]-[i]:
p < 0.001). Moreover, the [e]-[i] contrast was significantly higher in accuracy than the [ɛ]-[e] and [ɐ]-[ɛ] contrasts (
p = 0.0058 and
p < 0.001, respectively).
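The Bonferroni correction applied to pairwise comparisons such as those above can be sketched as follows. This is an illustrative Python sketch only, not the procedure as implemented in the emmeans package: the function name `bonferroni` and the raw p-values are hypothetical, and the sketch assumes that each raw p-value is multiplied by the number of comparisons in the family and capped at 1.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons.

    Each raw p-value is multiplied by the number of comparisons and
    capped at 1, so that the family-wise error rate is controlled.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for six pairwise comparisons among four contrasts.
raw = [0.0005, 0.0012, 0.0100, 0.0300, 0.2000, 0.6000]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

In this hypothetical family, a raw p-value of 0.03 would no longer fall below 0.05 after adjustment, which illustrates why the correction is conservative when many contrasts are compared.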
Other than looking into the
A’ scores, we also investigated the mean accuracy of Hungarian listeners for change and catch trials separately (
Figure 9).
A linear mixed-effect logistic model showed that the trial type (change vs. catch) significantly affected mean identification accuracy (%) (χ2(1) = 32.216, p < 0.001), with participants showing more difficulties in the change trials than in the catch trials. In the change trials, Hungarian listeners displayed a significantly lower accuracy in the [ɐ]-[ɛ] and [ɛ]-[e] contrasts compared to [ɐ]-[ɨ] and [e]-[i] (p < 0.001 in all pairwise comparisons). Regarding the catch trial results, we observed that accuracy in the perception of the vowel [ɨ] was significantly lower (below 40%) than for the remaining vowels ([ɨ] vs. [ɐ]: p = 0.0026; [ɨ] vs. [ɛ]: p < 0.001; [ɨ] vs. [e]: p < 0.001; [ɨ] vs. [i]: p < 0.001). The EP [ɐ] also posed problems for the Hungarian participants, although to a lesser extent ([ɐ] vs. [ɛ]: p = 0.0018; [ɐ] vs. [ɨ]: p = 0.0026; [ɐ] vs. [i]: p = 0.0062).
Additionally, we looked at the effect of non-nativeness (absence or presence of the vowels in the L1 vocalic system), comparing accuracy in discrimination between trials that included the EP vowels [ɐ] or [ɨ] with the remaining trials. In the analysis of the A’ scores, we found a significant effect of non-nativeness (χ2(1) = 22.821, p < 0.001); that is, Hungarian listeners had significantly more difficulties in the perception of contrasts that included [ɐ] or [ɨ] than in other contrasts. As for the results of the separate change trials and catch trials, we did not find any significant effect of non-nativeness when analyzing change trials only. However, in the catch trials, comparing the accuracy rates for [ɐ] and [ɨ] with the accuracy rates for the vowels [ɛ], [e], and [i], we found a significant effect (χ2(1) = 56.182, p < 0.001).
Finally, similar to the analysis of the results collected in Experiment 1, we also investigated a possible effect of other languages spoken by the participants. In the present experiment, 30% of the participants reported knowledge of Spanish and 26% knowledge of German. Regarding the effect of knowledge of Spanish, we did not find any significant result. As for the effect of German, results from a linear mixed-effect model in the
A’ scores showed a significant interaction, L3 German knowledge×non-nativeness, in discrimination of the EP contrasts (
χ2(2) = 22.861,
p < 0.001). Pairwise comparisons showed that while participants without knowledge of German had significantly more difficulties in the contrasts with vowels that are absent from the Hungarian vowel system than in contrasts with familiar vowels, this was not observed in the results of participants with German (
Figure 10, on the left). Furthermore, we also investigated the results of the catch trials for the EP [ɐ] and [ɨ]. We found a significant effect of L3 German knowledge in the perception of [ɐ] (
χ2(1) = 4.1741,
p = 0.041;
Figure 10, on the right side), but not in the case of [ɨ].
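The χ2 statistics reported in this section come from likelihood-ratio comparisons between nested models. The following Python sketch shows how such a statistic maps onto a p-value. It is an illustration under stated assumptions rather than the analysis code used here: the function names `lrt_statistic` and `chi2_sf` and the log-likelihood values are hypothetical, and the closed-form tail probabilities cover only the degrees of freedom that appear in this section (1, 2, and 4).

```python
import math


def lrt_statistic(loglik_null: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable for df in {1, 2, 4}."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch only handles df = 1, 2, or 4")


# Hypothetical log-likelihoods chosen to reproduce a statistic of 4.1741.
stat = lrt_statistic(loglik_null=-100.0, loglik_full=-97.91295)

# Statistics reported in this section.
p_german_catch = chi2_sf(4.1741, 1)   # roughly 0.041
p_type_by_l1 = chi2_sf(30.871, 2)     # far below 0.001
p_contrast_by_l1 = chi2_sf(42.878, 4) # far below 0.001
```

Under these assumptions, the tail probability for χ2(1) = 4.1741 is approximately 0.041, in agreement with the value reported for the effect of German knowledge on the perception of [ɐ] in the catch trials.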
4. General Discussion
The connection between the findings from the categorization and the discrimination tasks conducted in the present study provides insight into how perceptual overlap can predict discrimination in L3. Our first aim was to observe whether perceptual overlap affects discrimination in L3 Portuguese. The data analysis revealed that Hungarian listeners display significantly more difficulties when discriminating EP contrasts that are perceived with overlap than contrasts in which perceptual overlap is absent. Such results are in line with previous research on L2 speech perception (
Flege and Mackay 2004;
Tyler et al. 2014;
Faris et al. 2018;
Elvin et al. 2021). Additionally, we aimed at investigating the extent to which the amount of perceptual overlap relates to difficulties in the discrimination of L3 Portuguese contrasts. To this purpose, we compared two measures for perceptual overlap scores: (i) We calculated the overlap scores considering group means (
Table 5), and (ii) we determined the possible levels of overlap scores for each EP contrast, calculating the proportion of participants for each case (
Table 6). With the latter, we aimed to account for intersubject variability. According to both approaches, the [ɐ]-[ɛ] and [ɛ]-[e] contrasts displayed higher perceptual overlap than [ɐ]-[ɨ] and [e]-[i], and consequently, we expected the former two contrasts to cause more discrimination difficulties for Hungarian listeners. However, a comparison between the two approaches revealed the following discrepancy: If we consider perceptual overlap scores that are measured based on group means, [ɐ]-[ɛ] should present more difficulties than [ɛ]-[e], and [ɐ]-[ɨ] should present more difficulties than [e]-[i]. If, instead, our predictions are based on the measures of perceptual overlap calculated from individual results, [ɛ]-[e] should present more difficulties than [ɐ]-[ɛ], and [e]-[i] should present more difficulties than [ɐ]-[ɨ].
A-prime scores obtained in the oddity discrimination task were not completely in line with either measure. First, based on the data collected, we obtained the following hierarchy of discrimination difficulties (from most difficult to easiest): [ɐ]-[ɨ] > [ɐ]-[ɛ] = [ɛ]-[e] > [e]-[i]. Accuracy for [ɛ]-[e] was not statistically different from that obtained in [ɐ]-[ɛ], and [e]-[i] was discriminated significantly better than [ɐ]-[ɨ]. Second, in the [ɛ]-[e] and [e]-[i] contrasts, Hungarian listeners did not perform differently than Portuguese listeners. In summary, the predictions for discrimination based on perceptual overlap scores calculated from the categorization task results were not confirmed by the A’ scores.
The analysis of mean accuracy for separate change and catch trials may provide further insight into the results. Accuracy rates for the change trials were in line with the predictions established from the categorization task, considering both group means and individual performances: [ɐ]-[ɛ] and [ɛ]-[e] formed one group, with higher perceptual overlap scores and a lower accuracy in discrimination, while [ɐ]-[ɨ] and [e]-[i] constituted another group, with a lower level of perceptual overlap and higher accuracy rates. The results for the catch trials are less straightforward. The high accuracy rate for [i] was in line with the high categorization score obtained in Experiment 1 (98.1%). After [i], [ɛ] reached the highest accuracy, followed by [e]. These results contrast with the poor discrimination abilities in the [ɛ]-[e] contrast in the change trials. Considering that, in the categorization task, nearly 40% of the trials with [ɛ] and [e] were identified by the Hungarian participants as /eː/, low accuracy in the change trials and high accuracy in the catch trials were to be expected. As for the two remaining vowels, [ɐ] and [ɨ], they exhibited the lowest discrimination accuracy rates in the catch trials. Recall that these two vowels are the EP vowels that are absent from the Hungarian listeners’ L1, and, in our analyses, there was a significant effect of non-nativeness. Moreover, in the case of [ɨ], the mean accuracy was only 32.9%, indicating a categorization problem. Although this vowel was identified as /y/ in 73.1% of the trials, this result merely informs us that, for Hungarian listeners, /y/ was the closest category to the EP vowel. If participants had been presented with a “not a speech sound” option, for example, identification as /y/ might not have been so robust.
In addition to observing the relation between perceptual overlap, categorization, and discrimination, we also aimed at investigating the effect of knowledge of other languages in the perception of the EP [ɐ] and [ɨ] vowels. Other than English, a considerable number of participants in our study reported knowledge of two other languages. In Experiment 1, German was the second most frequently spoken additional language. The analysis of the results showed that the identification of EP vowels into Hungarian categories was not influenced by knowledge of German. Regarding Experiment 2, Spanish was the most frequently reported additional spoken language, followed by German. The analysis revealed that knowledge of Spanish did not affect the
A’ scores. However, we found that knowledge of German positively influenced the perception of the EP contrasts that included non-native vowels ([ɐ]-[ɛ] and [ɐ]-[ɨ]). The vowels [ɐ] and [ɨ] are absent from participants’ L1 system, as well as from English (
Carr 2019) and from Spanish (
Torres-Tamarit 2020). However, [ɐ] is present in the German vowel inventory (
Wiese 2000), and better accuracy in the perception of vowel [ɐ] was observed in the participants with German knowledge (
Figure 10), suggesting that these participants may have used their previous experience with this vowel. It is, however, important to keep in mind that, even though specific vowels are transcribed with the same IPA symbol, they may have slightly different realizations in different languages. Likewise, vowel qualities transcribed using different symbols may be more similar in their quality than the phonetic notation may suggest. As a result, we should be cautious when drawing conclusions based on the IPA symbols. Furthermore, since we did not collect data on the perception of German vowels by the Hungarian participants, we cannot assume that participants who reported knowledge of German have in fact established a target-like perceptual representation of the German [ɐ]. Nevertheless, the improved performance in discrimination of the [ɐ]-[ɛ] and [ɐ]-[ɨ] contrasts, as well as for the vowel [ɐ], suggests that Hungarian participants with some knowledge of German may have already created a new category for a sound close to the EP [ɐ]. This, in turn, may have helped these participants to identify and distinguish EP [ɐ] better than those who did not have previous experience with German.
One question arises: Why was there an effect of knowledge of German in the discrimination task but not in the identification task? These results may be due to the activation of different language modes (
Grosjean 2001). Evidence of language modes in L2 was found in
Yazawa et al. (
2020). In this study, Japanese native speakers who were learners of English L2 were tested in the discrimination of the contrast /iː/-/ɪ/. The results showed that participants relied more on duration (a key feature in their L1) when they were told they would hear Japanese sounds, and on spectral information (as in the L2) when instructed that they would hear English sounds, even though the stimulus set was identical in the two sessions. In our study, in the identification task, participants were not only presented with instructions in their L1, but they also had to categorize the EP vowels in real words from the L1. Thus, it is logical to assume that they activated their L1, and consequently, they resorted to the L1 inventory in that task. As for the discrimination task, although the instructions were also given in Hungarian, in the main trials, the participants’ L1 was absent from the stimuli. It is possible, then, that knowledge of other languages, namely, their knowledge of German vowels, was more readily available. In other words, the activation of the L1 or L2, or the L1 and L2, may be related to the nature of the task: A task using L1 references may induce the activation of the L1, and a discrimination task may promote the use of previously learned languages other than the L1. An example of the latter is the study by
Luo et al. (
2020). When conducting an AXB task with native Mandarin speakers who had English as the L2, the authors found a combined L1/L2 transfer in the perception of L3 Cantonese vowel length contrasts.
Another interesting point in our results is the fact that the advantages displayed by Hungarian participants with knowledge of German were only observed for the EP vowel [ɐ], which, as we reported, also exists in the German inventory. The fact that those participants did not perform better than the others in the perception of [ɨ] contradicts
Onishi’s (
2016) assertion that the knowledge of an L2 entails a general advantage in perceiving novel L3 contrasts.
In summary, the results of the present study suggest that, although perceptual overlap affects L3 perception, the amount of perceptual overlap may not explain, per se, the differences among L3 contrast discrimination, and other factors may also play a role. First, the robustness of categorization can affect perception, even when the vowels exist in the L1. For example, [i], [e], and [ɛ] were categorized by the Hungarians into the corresponding L1 categories in 98.1%, 81.9%, and 58.8% of the trials, respectively. The results from the discrimination task showed that Hungarians were able to discriminate the [e]-[i] contrast significantly better than [ɛ]-[e]. Second, non-nativeness, that is, the absence of the vowels in the L1 system, also affects perception, hindering discrimination. In our study, participants displayed significantly more difficulties in discriminating contrasts with the EP vowels [ɐ] and [ɨ], which are absent from their L1, than other contrasts. Lastly, knowledge of other previously acquired languages should also be accounted for in L3 discrimination. In the discrimination task, Hungarian speakers with knowledge of German, a language that has the vowel [ɐ] in its system, performed significantly better in trials with this vowel.
A possible limitation of the present study is that we did not test and quantify participants’ knowledge of additional spoken languages. For the present sample, administering a language proficiency test for each reported language would have been impossible, as the participants reported too many languages. Moreover, a general proficiency test might not even capture the relevant phonological functioning of the L2s. To assess the possible effects of the representation of L2 phonology, one should test perception (and production) in these additional languages as well (
Cabrelli 2013). However, similar to testing language proficiency, this was not feasible in the present experiment and could serve as a possible line of inquiry for future research.
A second remark regarding the design of our study is related to the auditory stimuli. Although we introduced speaker variability in the stimuli, vowels were produced in one context, [ɡV]. It would be valuable to collect perception data from EP vowels produced in other consonantal contexts, since this can be a significant factor in non-native vowel perception (
Bohn and Steinlen 2003).
Third, previous studies have pointed out the importance of accounting for individual differences. In our study, we attempted to include intersubject variability by establishing a measure for perceptual overlap based on individual performances. However, this measure was not a better predictor for discrimination difficulties than the measure based on group means. This may result from the number of trials each participant completed for the categorization of each EP vowel, which was only three. A higher number of repetitions may contribute to more robust results at the individual level, hence leading to more reliable predictions for discrimination based on these results. Further investigations considering the limitations mentioned above would greatly contribute to our understanding of perception processes in the L3 context.
Lastly, different groups of participants were used for Experiment 1 and Experiment 2. Experiment 1 was designed as a preliminary test to identify problematic contrasts that were then tested in Experiment 2. Therefore, there was a time interval between the completion of the two experiments, and due to the recruiting method and anonymization of data, it was not possible to contact and ask the participants who completed the identification task to complete a second experiment. However, we believe that this option did not compromise the results, due to the robust number of participants in each group. Moreover, the use of different groups prevented a potential learning effect.