1. Introduction¹
Descriptive accounts of Australian English vowels are commonly framed with reference to the lexical sets devised by Wells (1982). For example, there are discussions of a chain shift involving the short front vowels kit, dress and trap (e.g., Cox and Palethorpe 2008; Docherty et al. 2019; Grama et al. 2019) and of emergent regional variation in the realisation of the lexical sets goat, near, goose and thought (Cox and Palethorpe 2019). The lexical set categories are typically presented as the basic components of a vowel system and are analysed and distributed accordingly in two-dimensional acoustic or auditory space.
However, although the Wells lexical sets are widely used and in many ways convenient, it should be borne in mind that they were not initially conceived as equivalent to phonological categories. They were devised primarily as a heuristic to counter “the incoherent mess of symbols used in … contemporary publications” at the time (Wells 2010). The lexical sets have several limitations. For example, they are based on the standard accents of the UK and US and have been acknowledged to need modification to serve as an adequate frame of reference for other varieties (e.g., D’Onofrio et al. 2019; Grama et al. 2019). Furthermore, with the exception of comma, letter and happy, they refer only to the vowel bearing primary stress in the citation form of a word. Although these issues are widely, if tacitly, acknowledged, the lexical set system is frequently invoked in studies of phonological variation and change (e.g., Kerswill et al. 2008; Sóskuthy et al. 2015; Loakes et al. 2017; Penney et al. 2023), leaving a number of questions open. Among these are the following: To what extent do lexical sets capture the cognitive and phonetic categories used by speakers and listeners? Are the lexical sets adequate to capture the systematic influences on phonetic variation that speaker–listeners experience? To what extent is sound change in vowels enacted through phonological categories equivalent to lexical sets? How does word frequency interact with macro-level category labels such as lexical sets? Intersecting with this last point, how do grammatical (or function) words behave relative to lexical words? There is no prima facie reason why grammatical words such as it, them and that should not be included within the kit, dress and trap lexical sets, respectively. However, as they are typically produced in unaccented form, they are generally excluded from variationist and corpus-based phonetic studies. They have therefore almost entirely fallen out of the scope of previous investigations (some exceptions are discussed in Section 2 below). The fact that they are also often highly frequent potentially obscures the role that lexical frequency or predictability might play in sound change, for example.
Recent work has argued persuasively that speaker–listeners’ representation and processing of phonological categories are influenced by a wide range of factors woven through their experience as participants in natural spoken communication, such as lexical frequency, predictability, phonological environment and social indexicality (e.g., Cohen Priva 2017; Foulkes and Docherty 2006; Foulkes and Hay 2015; Shaw and Kawahara 2018). It therefore seems prudent to question the extent to which monolithic analytic categories such as lexical sets, especially when applied across large and diverse corpora, can provide the explanatory power that is needed in accounts of variation and change. In this article, we illustrate such issues with reference to the case of kit in Australian English. In particular, we consider the relationship between kit and /ə/ (i.e., comma and letter in the Wells lexical set system) and whether they display a merger in certain contexts.
In the next section (Section 2), we summarise what is known about the kit–schwa relationship in Australian English. We highlight a number of complexities that cause us to look beyond the lexical set as an analytic category, in particular with reference to the place of grammatical words in broader sound changes. Section 3 summarises the aims of this study in detail. In Section 4, we outline the methodology adopted to analyse kit in a corpus gathered in Perth, Western Australia. We consider various sub-categories of kit, including the pronoun it, and compare them to other vowels as reference points (fleece and /ə/ in a range of phonological contexts). The Perth data are then illustrated and discussed in Section 5. The implications of the data are discussed in Section 6, and concluding comments are drawn in Section 7.
2. kit–Schwa in Australian English
Conventional accounts of Australian English vowels (e.g., Cox and Palethorpe 2007; Cox and Fletcher 2017; Cox 2019) state uncontroversially that there is a class of lexical items in that variety that are realised with [ɪ] in the accented nucleus (words such as bid, bitter, skill). Collectively, this set of words makes up the kit lexical set. It is also often observed that the [ɪ] realisation is not found in unstressed syllables in Australian varieties. Cox (2019, p. 22), for instance, notes that “schwa /ə/ is most commonly used, and it does not functionally contrast with kit in this context”. Examples of words that would have this more central unstressed nucleus include rabbit, waited, races, etc.
Wells (1982, p. 167) labels this absence of contrast between /ə/ and /ɪ/ in unstressed syllables as “Weak Vowel Merger”. This yields pairs of items that are not contrastively differentiated in Australian English (although they might be for many speakers of other varieties, such as some in the UK; Fabricius 2002; Tasker 2020; Butcher and Stoakes 2024): for example, Rosa’s v. roses, Lennon v. Lenin, boxers v. boxes. Wells’ use of the term “Merger” in this context is unpacked to a degree by Trudgill (2006, p. 117ff.), who notes that Australia’s early colonial settlers included speakers from areas in the British Isles marked by the presence of an /ə/-/ɪ/ contrast (in unstressed syllables) and others from areas where that contrast was absent, with the latter vowel configuration becoming prevalent in the variety that became established across the settler population.
The quality of the vowel in unstressed syllables is not phonetically [ə] in every instance, however. A close front [ɪ] realisation is found in unstressed nuclei preceding a velar or palato-alveolar coda (e.g., panic, earwig, kidding, radish, cabbage; Wells 1982, p. 601; Cox 2019, p. 23), all contexts where those historical varieties with an unstressed /ə/-/ɪ/ contrast would presumably have retained an unstressed /ɪ/ in the final syllable. The examples of this exceptional condition provided by Cox (2019) also include the item stomach, suggesting that the [ɪ] realisation might also be found in items that, in other varieties of English, would have an unstressed /ə/ as the coda nucleus. Thus, the picture that emerges from descriptive accounts is one of schwa being the norm for all unstressed nuclei, irrespective of any historical differentiation, apart from in the exceptional pre-velar and palato-alveolar contexts, where a closer [ɪ] realisation is found.
Our understanding of these vowel alternations in Australian English is limited by the fact that they have been subject to relatively little quantitative investigation. Cox and Palethorpe (2018) focus on the acoustic characteristics of word-final schwa realisations, comparing F1 and F2 of (a) word-final tokens of lexically determined schwa (e.g., Rosa), (b) tokens with lexical coda schwa that were followed by a possessive suffix (e.g., Rosa’s), and (c) other tokens with an unaccented coda plural suffix /-əz/ (e.g., roses). This last category reflects the context in which the progenitor varieties of Australian English would typically have a realisation akin to [ɪ] (Fabricius 2002; Tasker 2020) but in which conventional accounts of Australian English varieties would predict a more schwa-like realisation. Cox and Palethorpe noted variable realisations across the three experimental conditions but reported no significant difference between the latter two contexts (Rosa’s and roses), a finding that is in line with the predictions of a loss of contrast in those environments. However, their findings also suggest that the nucleus in those two contexts had a more [ɨ]-like realisation, which differed significantly from that of the word-final lexical schwa tokens (i.e., the Rosa context). Cox and Palethorpe’s findings are echoed in an acoustic study by Butcher and Stoakes (2024). Also based on F1 and F2 estimates of unstressed schwa nuclei, this study found no significant difference between “lexical” and “affix” schwa realisations (both with a following consonant). Butcher and Stoakes also identified a more open and retracted realisation of schwa when it occurs in word-final position (e.g., in butter as opposed to buttered).
The results of both studies suggest that the vowel realisation patterns in the unstressed contexts investigated may not be quite as straightforward as the conventional account would have it. For example, while Cox and Palethorpe can point to a neutralisation of the kit/schwa contrast in the Rosa’s/roses contexts, the realisation is not the same as that found for /-ə/ in word-final/pre-pausal contexts (Rosa); indeed, the [ɨ] realisation highlighted by Cox and Palethorpe suggests a quality that is closer to that found in the realisation of kit (but since the scope of that study did not extend to kit, it is not possible to know for sure). Of course, as a phonological category, /ə/ is well known for its variable realisation (e.g., Van Bergem 1994; Bates 1995; Flemming 2009), and Cox and Palethorpe rightly note the possibility that the closer realisation found in the Rosa’s/roses contexts is driven by the coronal place of articulation of the surrounding consonants. Likewise, they note (as do Butcher and Stoakes 2024) that the more open realisation in the word-final Rosa context is consistent with reports that phrase-final comma and letter in some varieties of Australian English can be realised with a vowel variant that is rather more open than [ə], towards [ɐ] (e.g., Cox 2019, p. 22; Grama et al. 2020).
Cox and Palethorpe’s study did not consider the realisation of pre-velar unstressed tokens. However, Butcher and Stoakes suggest that in this context, too, the realisation of the unstressed nucleus may be less stable than conventionally thought: panic was consistently realised with what they describe as [ɛ], but there was considerable inter-speaker variability in the unstressed nucleus of paddock and stomach. This, in turn, raises the question of whether the more advanced and possibly also raised pre-velar tokens are simply contextually conditioned variants of a neutralised kit/schwa contrast or whether they are genuine exceptions to the general merger process.
Many of the questions emerging from this existing work are reminiscent of those arising in studies of unstressed vowels in other varieties of English (Fabricius 2002; Flemming and Johnson 2004; Gordon et al. 2004; Tasker 2020). In the context of work on variable /ɪ/~/ə/ neutralisation in unstressed syllables in UK varieties of English, Tasker (2020, p. 27) notes that “[d]escriptions of variation in unstressed vowels have … implied that it is categorical; that speakers always aim for either /ɪ/ or /ə/. This idea implies that all other vowel differences are completely neutralised, and does not consider the idea that there could be any other intermediate variants. These ideas are based more on native speaker intuition than empirical evidence, and there is reason to suppose that there could be more fine-grained variation in unstressed vowels”. In this light, alongside the aforementioned questions arising from Cox and Palethorpe (2018) and Butcher and Stoakes (2024), the present investigation focuses on the nature of those unaccented realisations in Australian speakers of English and their relationship to the realisations found in accented syllables (to which they are related at least historically, if not synchronically).
As a further dimension to this study, we note that previous studies of these vowel alternations in Australian English have not considered a range of other factors that have elsewhere been shown to play a role in the development of sound changes. These factors include lexical frequency and grammatical word class. While lexical frequency has generated considerable debate (e.g., Dinkin 2008; Hay and Foulkes 2016; Hay et al. 2015; Labov 1994; Phillips 2006; Pierrehumbert 2001), very few studies have considered the place of grammatical words in sound change. Exceptions include Bybee (2002, 2017) and Phillips (1983, with reference to text-based evidence regarding distant historical changes). A number of other works have analysed phonetic variation within grammatical words (e.g., Bell et al. 2003; Grama et al., submitted; Shi et al. 2005). Working within an exemplar-based framework, Bybee (2002, 2017) hypothesises that sound change should operate faster on words or phrases that occur frequently in the favouring context. The change is then generalised to other words and contexts. This could in principle include grammatical words if they are biased to occur in favouring contexts. In support of this hypothesis, Bybee (2002) shows that /-t, -d/ deletion in English coda obstruent clusters applies to reduced negatives (e.g., didn’t) more frequently than to phonologically parallel lexical words (such as student). This is not simply because negatives are frequent but because they tend to occur in the context most conducive to deletion, i.e., preceding consonants. In her data, negative verb forms occurred far more often before consonants than did the other lexical classes analysed (80% of tokens against 42–47%) and had the highest deletion rate.
Phillips (1983) offers a more nuanced view, concluding that frequency and grammatical word class act independently as factors in sound change. In her historical (text-based) data, grammatical words are sometimes affected early in sound change but sometimes late, and she relates this timing to the type of change involved. Interestingly, she notes that cases where grammatical words are in the vanguard of a change are invariably weakening processes, precisely the kind of change that has been described as (historical) weak vowel merger for kit/schwa (see also Phillips 2006 for a more detailed argument on the role of frequency in change). We are unaware of any quantitative studies of kit which include high-frequency grammatical items such as it, this and with. Such words are typically excluded from variationist and corpus-based phonetic studies (as they were in Docherty et al. 2019 and Grama et al. 2019, for example). In connected speech, these words are typically produced as unaccented syllables, which justifies their exclusion in an analysis based on lexical sets. However, they can also be produced accented (e.g., for the purposes of emphasis, as in that’s it! or this is the one), in which case they would potentially meet the criteria for inclusion and would unambiguously have an [ɪ] realisation.
One further point of comparison relevant to contextualising our understanding of the realisation of short vowels in the close front quadrant of the Australian English vowel space is the close [i] variant found in unaccented syllables in the so-called happy-tensing environments, such as happy, movie and ready. These are characterised by Wells (1982, p. 602) and Cox (2019) as being phonetically aligned to the /i:/ phoneme or the fleece lexical set. However, the fact that they are not prone to diphthongisation in the same way as other fleece nuclei and are deemed “metrically weak” (Cox and Fletcher 2017, p. 119) suggests that they should perhaps not be classified straightforwardly as members of the more general /i:/ vowel category associated with the fleece lexical set in this variety, although they are rarely, if ever, discussed in that light. An alternative account is offered by Butcher and Stoakes (2024), who suggest that the close variant arising from happy-tensing can best be described as an allophonic variant of the unstressed nucleus, although this is beyond the scope of their empirical study. More generally, the configuration of the kit and fleece lexical sets is also of interest for other reasons. Some accounts suggest that kit retains a relatively raised quality (“a fronter, higher position than the nucleus of fleece”; Purser et al. 2020, p. 278) or is differentiated in quality from the long close front fleece vowel primarily in that the target of the latter is reached later, following an on-glide (e.g., Cox 2019). Several other studies have reported lowering of kit as part of the short front vowel chain shift (e.g., Cox and Palethorpe 2008; Grama et al. 2019).
In sum, while overview accounts of Australian English /ɪ/ and its relationship to schwa in unstressed contexts typically give the impression that there is relatively little complexity, the sparse quantitative data that are available suggest that the picture may in fact not be quite so straightforward. Further investigation is therefore warranted, not only to enhance our understanding of short front vowel variability in this variety but also to contribute to the exploration of the superordinate questions discussed above regarding the explanatory power of lexical set-based analytic categories and the position of grammatical words in sound changes.
3. Aims of This Study
In this study, we aim to shed new light on variation and potential change in the realisations of /ɪ/ through an analysis of the conversational speech of speakers from Perth, Western Australia. Our focus is on comparing vowel realisations across two kinds of items: those that are uncontroversially classified as /ɪ/ and have retained a realisation around [ɪ], and those where the contrast between /ɪ/ and /ə/ is conventionally said to have been lost but where the limited quantitative data that exist suggest greater complexity.
Our study departs in a number of ways from the two previous studies with a specific focus on this topic, Cox and Palethorpe (2018) and Butcher and Stoakes (2024). First, we compare the contexts thought to be associated with the loss of contrast with other contexts where the contrast is maintained (not simply with /ə/, as in both previous studies; Butcher and Stoakes do briefly refer to the acoustic properties of /ɪ/ but with data from an earlier study with different participants). We also report comparisons with the realisations of the fleece lexical set and with the unstressed coda nuclei of items with word-final happy-tensing. Second, our data are drawn from conversational speech. Previous work has largely focused on analysis of /hVd/ keywords or, in the case of Cox and Palethorpe (2018), on a set of 12 isolated words identified to represent the phonological environments under investigation. Likewise, Butcher and Stoakes’ (2024) dataset consists of repetitions of 18 lexical items produced as the final item in a short carrier sentence. There is value in investigating highly controlled contexts such as these, but doing so unavoidably limits the dataset to contexts bearing a phrase-level accent and/or to citation-form data, thereby limiting the phonetic variation that can potentially be observed across different phonological environments. In order to understand variation and change in general, and of /ɪ/ versus /ə/ in particular, it is valuable to see what happens to a vowel when its prosodic and phonological circumstances vary. Third, we have based this study on a sample of speakers of English from Perth, a location which has not previously been studied in relation to the variation in focus here (indeed, there has been very little research on this variety at all, but see, for example, Docherty et al. 2015, 2018, 2019; Cox and Palethorpe 2019). Finally, we include tokens of the high-frequency grammatical item it in order to test the extent to which grammatical words participate in the loss of contrast with /ə/. As noted above, grammatical words are typically excluded from studies of phonological variation and change. This is despite their high frequency and the fact that, in principle, they meet the same structural conditions determining variation as are found in lexical words. We decided to consider a single grammatical word at this stage of our investigation in order to test how such items pattern in relation to the alternation in focus and thereby assess the value of sampling from a wider range of grammatical items in future work.
The item it was chosen because it was the most frequent grammatical word in the kit category. It also posed fewer segmentation difficulties than other candidates such as in or with, where the target vowel is flanked by approximants, liquids or nasals. Furthermore, it is not subject to the types of categorical reduction processes (potentially including full vowel deletion) that affect some other grammatical words to the extent that they are reflected in spelling (e.g., will > -’ll, is > -’s). Extracting reliable vowel measurements in such cases would therefore depend on the precision or otherwise of the segmentation used as the basis for the forced alignment. Note that in this study, all of the alignment was manually checked (see further in Section 4).
The three questions we address are the following:
- (1)
What is the relationship between the acoustic properties of the vowels associated with the fleece and kit lexical sets for contemporary speakers of English in Perth?
- (2)
Is the realisation of unstressed vowels consistent with conventional accounts of a loss of contrast between /ɪ/ and /ə/ arising from a process of “weak vowel merger”?
- (3)
Does the grammatical or lexical status of the word significantly affect the realisation of the unaccented syllables? To what extent do grammatical words participate in the putative vowel weakening?
4. Materials and Methods
The material analysed in this study was drawn from a corpus of recordings collected in Perth in 2014–2016. It consists of unscripted, same-sex conversations between twenty pairs of young speakers (aged 18–22), each lasting around 30 min. There were equal numbers of males and females, and all of the participants had been fully schooled in Perth (from age 5). While social class is not a focus of the investigation reported here, twenty of the speakers were residents of neighbourhoods ranked by the Australian Bureau of Statistics in the top socio-economic decile, and the remaining twenty were from neighbourhoods with a lower socio-economic ranking. (Social class effects on short front vowel realisations are reported in Docherty et al. 2018, 2019.) The majority of speaker pairs knew each other in advance, though to varying degrees. A fieldworker was present in the same room as the participants in order to initiate and conclude the recording process but intervened only if the participants’ conversation subsided and they needed a prompt.
The recordings (44 kHz, 16-bit) were made using Sennheiser EW112-P-G3 lapel microphones and an Edirol R44 digital recorder. Conversations were transcribed with ELAN (2022), starting five minutes into each recording, thus skipping over the initial negotiation of the nature of the task and allowing the participants to relax into the conversation. The transcripts were then force-aligned within LaBB-CAT (Fromont and Hay 2012) using HTK (Young et al. 2006), with manual correction of misaligned segment boundaries.
To address the research questions itemised above, we created a subset of the corpus containing tokens from a range of relevant contexts. Each context is identified henceforth by the label given in capitals at the start of the corresponding item in the list below.
Three of these were contexts where conventional accounts would predict an [ɪ] realisation and where we did not expect to encounter any centralisation of the vowel nucleus in focus:
- (a)
MONO: nucleus of a monosyllabic content word (e.g., bid, trip).
- (b)
POLY_ACC: accented nucleus of a polysyllabic content word (e.g., bitter, issue).
- (c)
PREVEL_UNACC: unaccented nucleus in polysyllabic lexical items with a following velar or post-alveolar context (e.g., panic, earwig, kidding, radish).
A fourth context comprised tokens in unstressed environments where previous accounts typically indicate that there has been a loss of contrast between /ɪ/ and /ə/:
- (d)
UNACC: unaccented nuclei contained within a polysyllabic lexical item (e.g., massive², rabbit, races). Note that this condition excludes any pre-velar contexts, as these are covered by condition (c).
Two subsets of the grammatical word it were generated. They were differentiated because initial auditory analysis of the data suggested that it in phrase-final position might be associated with greater levels of reduction in the quality of the vowel nucleus.
- (e)
PHRINT_IT: tokens of the grammatical item it occurring phrase-internally (e.g., because it was a cool name).
- (f)
PREPAUS_IT: tokens of the grammatical item it occurring phrase-finally (in most, but not all, cases also pre-pausally). Such cases are typically either clitics (e.g., then you can do it #) or tags (e.g., it will be like permanently cancelled won’t it #; NB # is used here to indicate a pause).
Three further conditions were investigated in order to provide points of reference for our comparative analysis of the /ɪ/ conditions:
- (g)
FLEECE: tokens of the fleece lexical set realised in monosyllabic lexical words (e.g., beach, keep, see).
- (h)
HAPPY_T: tokens where we expected that the vowel nucleus would be raised and fronted as per the pattern of unstressed happy-tensing reported for this and other varieties of English (e.g., city, movie, ready).
- (i)
SCHWA: tokens of lexical /-ə/ in unstressed syllables of polysyllabic content words, equating to the comma and letter lexical sets (undifferentiated in Australian English, e.g., bitter, wonder, pasta, Asia, society, fatigued).
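The assignment of tokens to conditions (a) to (i) can be thought of as a simple decision procedure. The sketch below is a hypothetical illustration only: the Token record, its field names and the label values ("kit", "fleece", "happy", "schwa") are our own assumptions rather than the corpus schema, and in practice the accent, phrase-position and lexical-set judgements would come from annotation. The order of the checks matters: grammatical it is split off first, so that it can never fall into MONO, and other grammatical words are set aside.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical token record; field names are illustrative, not the corpus schema.
@dataclass
class Token:
    word: str
    vowel_class: str          # analyst label: "kit", "fleece", "happy" or "schwa"
    n_syllables: int
    accented: bool            # does the target nucleus carry the accent?
    following_place: str      # place of the following consonant, e.g. "velar"
    phrase_final: bool = False
    grammatical: bool = False


PRE_VELAR_PLACES = {"velar", "postalveolar"}


def assign_condition(tok: Token) -> Optional[str]:
    """Map a token to one of the nine analysis conditions, or None if unused."""
    if tok.grammatical:
        # Only the pronoun "it" is sampled among grammatical words: (e) and (f).
        if tok.word == "it" and tok.vowel_class == "kit":
            return "PREPAUS_IT" if tok.phrase_final else "PHRINT_IT"
        return None
    if tok.vowel_class == "fleece":
        return "FLEECE" if tok.n_syllables == 1 else None   # (g)
    if tok.vowel_class == "happy":
        return "HAPPY_T"                                     # (h)
    if tok.vowel_class == "schwa":
        # (i) unstressed schwa in polysyllabic content words
        return "SCHWA" if tok.n_syllables > 1 and not tok.accented else None
    if tok.vowel_class != "kit":
        return None
    if tok.n_syllables == 1:
        return "MONO"                                        # (a)
    if tok.accented:
        return "POLY_ACC"                                    # (b)
    if tok.following_place in PRE_VELAR_PLACES:
        return "PREVEL_UNACC"                                # (c)
    return "UNACC"                                           # (d)
```

For example, the unaccented second syllable of panic (following velar) maps to PREVEL_UNACC, whereas that of rabbit (following alveolar) maps to UNACC.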
Using default settings in Praat (Boersma and Weenink 2018), F1 and F2 values were estimated for each monophthong at the temporal midpoint of each token (see Cox and Docherty 2023 for an overview of the caveats that apply to this static approach to vowel description). As an exception to this measurement protocol, F1 and F2 estimates for fleece were extracted at the point 80% of the way through the token. This allows for the on-glide that tends to characterise realisations of this vowel for Australian English speakers (Cox et al. 2014) and ensures that the estimates were taken close to where F2 reached its peak. Pre-/l, w, j/, pre-nasal and post-/w, j, r/ environments were excluded, as were post-nasal and post-lateral tokens where segmentation could not be undertaken reliably. In order to allow for comparison with Cox and Palethorpe (2018) and Butcher and Stoakes (2024), the formant estimates were not normalised, and consequently the findings for male and female speakers are reported separately below (note that for both of the aforementioned previous studies, the speech sample consisted exclusively of female speakers). Vowel duration measurements were also extracted, but, in contrast to the analysis of kit tokens presented by Docherty et al. (2019), statistical analysis revealed no correlation between duration and vowel quality in the set of /ɪ/ tokens.³ For the sake of exposition, we therefore ignore duration in the data and discussion that follow.
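The sampling and exclusion rules just described can be expressed as two small helpers. This is a sketch under stated assumptions: segments are represented as plain phone strings (a hypothetical convention), the post-nasal/post-lateral exclusion is modelled as a boolean flag standing in for the analyst's judgement of boundary reliability, and the returned time would then be passed to a formant tracker such as Praat's, which is not shown.

```python
# Hypothetical phone labels; the corpus may use a different transcription scheme.
NASALS = {"m", "n", "ŋ"}
EXCLUDED_FOLLOWING = {"l", "w", "j"} | NASALS   # pre-/l, w, j/ and pre-nasal
EXCLUDED_PRECEDING = {"w", "j", "r"}            # post-/w, j, r/
RISKY_PRECEDING = NASALS | {"l"}                # post-nasal and post-lateral


def keep_token(prev_seg: str, next_seg: str, boundary_reliable: bool = True) -> bool:
    """Return False for tokens in the excluded segmental environments."""
    if next_seg in EXCLUDED_FOLLOWING or prev_seg in EXCLUDED_PRECEDING:
        return False
    # Post-nasal and post-lateral tokens are dropped only when the vowel onset
    # could not be segmented reliably (an analyst judgement, here a flag).
    if prev_seg in RISKY_PRECEDING and not boundary_reliable:
        return False
    return True


def measurement_time(start_s: float, end_s: float, condition: str) -> float:
    """Time at which F1/F2 are sampled: 80% through FLEECE tokens, midpoint otherwise."""
    if end_s <= start_s:
        raise ValueError("token must have a positive duration")
    proportion = 0.8 if condition == "FLEECE" else 0.5
    return start_s + proportion * (end_s - start_s)
```

For a vowel spanning 1.00 to 1.10 s, a MONO token would be sampled at 1.05 s and a FLEECE token at 1.08 s.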
Table 1 provides a summary of the dataset, itemising the number and percentage of tokens in each category.
5. Findings
Figure 1 shows, separately for female and male speakers, the distribution in F1/F2 space (Hz) of the realisations corresponding to all of the conditions set out above. The condition labels are centred on the mean F1/F2 values, with ellipses plotted at ±1 standard deviation.
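The centres in a plot of this kind are per-condition means. As a minimal sketch of the underlying summary, the function below groups (condition, F1, F2) triples and returns the token count, mean and sample standard deviation on each axis; the input format is a hypothetical assumption. It is a simplification: it treats F1 and F2 independently, whereas an ellipse that follows the orientation of each cloud would also require the F1–F2 covariance. The counts are also the kind of information summarised in Table 1.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_condition(tokens):
    """Group (condition, F1_Hz, F2_Hz) triples and summarise each condition.

    Returns {condition: {"n", "f1_mean", "f1_sd", "f2_mean", "f2_sd"}}.
    Sample SDs need at least two tokens; otherwise they are reported as NaN.
    """
    groups = defaultdict(list)
    for condition, f1, f2 in tokens:
        groups[condition].append((f1, f2))
    summary = {}
    for condition, pairs in groups.items():
        f1s = [f1 for f1, _ in pairs]
        f2s = [f2 for _, f2 in pairs]
        enough = len(pairs) >= 2
        summary[condition] = {
            "n": len(pairs),
            "f1_mean": mean(f1s),
            "f1_sd": stdev(f1s) if enough else float("nan"),
            "f2_mean": mean(f2s),
            "f2_sd": stdev(f2s) if enough else float("nan"),
        }
    return summary
```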
An immediate observation is that a representation in F1/F2 space makes it difficult to visualise the relationship between the individual conditions. This is due not only to the number of conditions contained within the plots but also, to a large extent, to the considerable overlap of the realisations across those distributions, which was expected given that the material analysed is conversational speech, where the factors driving token-to-token variation are more prevalent than in isolated word tokens. Therefore, for our analysis of the acoustic data, we adopted an approach previously deployed by Labov et al. (2013) and Grama et al. (2019), calculating F2 − (2 × F1) as a derived indicator of relative location along the front diagonal of the vowel space. The F2 − (2 × F1) value provides a single metric of the acoustic properties of each token and, in the process, yields clearer comparative visualisations of the various conditions under investigation. Use of this unitary metric also allows us to measure the dimension that we are interested in without having to carry out separate quantitative analyses of F1 and F2 that assume no covariance between them, when in fact both are closely determined by the overall shape of the vocal tract. We refer to this measure henceforth as F2deriv (Hz). As explained in detail by Labov et al. (2013, p. 40), higher values of F2deriv equate to a relatively closer and fronter vowel quality, precisely the dimension that is the focus of this study, as is evident from the overall distributions in Figure 1.
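As a worked illustration of the metric: because F1 is weighted twice, a 50 Hz drop in F1 (a closer vowel) raises F2deriv by the same 100 Hz as a 100 Hz rise in F2 (a fronter vowel). The formant values below are invented for illustration and are not measurements from the Perth corpus.

```python
def f2_deriv(f1_hz: float, f2_hz: float) -> float:
    """Front-diagonal index F2 - (2 x F1), in Hz.

    Higher values indicate a closer and/or fronter vowel quality.
    """
    return f2_hz - 2.0 * f1_hz


# Illustrative (invented) formant values, not corpus measurements.
closer_fronter = f2_deriv(f1_hz=400, f2_hz=2200)   # 2200 - 800 = 1400.0
more_central = f2_deriv(f1_hz=550, f2_hz=1700)     # 1700 - 1100 = 600.0

# Lowering F1 by 50 Hz has the same effect as raising F2 by 100 Hz.
assert f2_deriv(350, 2200) == f2_deriv(400, 2300)
```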
Figure 2 shows violin plots of the distributional density of F2deriv for each of the full set of conditions making up the dataset (females in the top panel, males in the lower panel). Viewing the data through the F2deriv lens in this way provides a more tractable means of addressing our research questions.
While the overlap across conditions that was evident in Figure 1 is still readily apparent within the violin plots, Figure 2 brings to the foreground a degree of clustering that substantially aligns with previous accounts of variation across the different contexts investigated. Thus, for female speakers, the FLEECE and HAPPY_T tokens (plots 1 and 2 at the extreme left of Figure 2, top panel) have the highest F2deriv values, with a second, somewhat more open cluster formed by the three conditions in which the kit/schwa contrast is said to be maintained (MONO, POLY_ACC and PREVEL_UNACC tokens; plots 3, 4 and 5). Less close and front realisations are found for the three conditions associated with a loss of the kit/schwa contrast (UNACC, PHRINT_IT and PREPAUS_IT; plots 6, 7 and 8), and the lowest, but also the most variable, F2deriv distribution is found for SCHWA (plot 9 at the extreme right). For male speakers, the patterns are largely the same. The principal difference concerns the PREPAUS_IT tokens: the nucleus of phrase-final it has a distribution that is skewed somewhat lower in F2deriv than in the other two environments associated with loss of the kit/schwa contrast (UNACC and PHRINT_IT). This distribution is also somewhat lower than that found for the female speakers and appears to reflect a higher proportion of PREPAUS_IT tokens lying closer to the centre of gravity of the SCHWA condition than is found in the PHRINT_IT condition.
The most strongly anticipated contrast-loss condition, UNACC, shows a good deal of overlap with SCHWA (compare plots 6 and 9 in each panel). However, with most of the UNACC tokens falling within the higher end of the range of the SCHWA distribution and many SCHWA tokens falling at the lower end of, or outside, the UNACC range, it seems unlikely (at least based on visual scrutiny of the plots) that the UNACC and SCHWA tokens are components of a single distribution, as might be expected if this alternation could faithfully be referred to as a weak vowel merger. This difference could reflect the fact that a good number of SCHWA tokens occur phrase-finally, eliciting a vowel variant that is more open than [ə], towards [ɐ]. (We did not explore this issue quantitatively within our dataset, but the spread of data in Figure 2 suggests that a good number of SCHWA tokens were relatively open and also slightly back; our impression from auditory analysis is that open variants are found variably across speakers and perhaps less consistently than reported for other Australian varieties.) A further potential factor here is the impact on the SCHWA realisations of differences in the place of articulation of adjacent consonants; in this regard, we note Penney et al.’s (2021) finding that /ə/ is somewhat retracted under the influence of adjacent bilabial plosives.
In order to gauge the contribution of the various conditions to the overall distribution of F2deriv across the full dataset, we fitted linear mixed-effects models using the lmer function of the lme4 package (Bates et al. 2015) in R (2020). Probability values were obtained using the lmerTest package (Kuznetsova et al. 2017). F2deriv was the dependent variable, speaker and word were included as random effects, and condition (with its nine levels) was included as a fixed effect. MONO was chosen as the reference level for the model, as it is the archetypal context in which [ɪ] is encountered and therefore provides a useful basis for statistical comparisons across conditions. The data for females and males were modelled separately.
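Under treatment coding, each fixed-effect estimate is the difference between a condition and the reference level (here MONO). The sketch below illustrates this with a hypothetical helper, `treatment_contrasts`, that simply subtracts the reference mean from each condition mean. It is not the authors' analysis: it ignores the speaker and word random effects, so with real, unbalanced conversational data its values would differ from the lme4 estimates. The token values are invented.

```python
from statistics import mean


def treatment_contrasts(tokens, reference="MONO"):
    """Difference between each condition's mean and the reference mean.

    tokens: iterable of (condition, f2deriv_hz) pairs.
    Simplification: fixed effect only; the speaker and word random
    effects of the actual lme4 models are ignored here.
    """
    by_condition = {}
    for condition, value in tokens:
        by_condition.setdefault(condition, []).append(value)
    if reference not in by_condition:
        raise ValueError(f"no tokens for reference level {reference!r}")
    reference_mean = mean(by_condition[reference])
    return {
        condition: mean(values) - reference_mean
        for condition, values in by_condition.items()
        if condition != reference
    }


# Invented F2deriv values (Hz); these are not the study's data.
tokens = [
    ("MONO", 2000), ("MONO", 2100),
    ("FLEECE", 2400), ("FLEECE", 2500),
    ("UNACC", 1750), ("UNACC", 1850),
]
print(treatment_contrasts(tokens))  # → {'FLEECE': 400, 'UNACC': -250}
```

A positive value corresponds to the blue, above-zero estimates in Figure 3 and a negative value to the red, below-zero ones.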
The parameters of the resulting models can be found in Appendix A. For ease of interpretation, they are depicted visually in Figure 3 (drawn using sjPlot; Lüdecke 2018). The quantity shown for each condition in Figure 3 is the F2deriv estimate, i.e., the difference between that condition and the reference condition MONO (kit in stressed monosyllabic words), which is represented in Figure 3 by the vertical zero line. The length of the horizontal line for each estimate indicates the 95% confidence interval (CI), full details of which are provided in Appendix A. Estimates that fall below and above the reference intercept are shown in different colours. Thus, for example, the F2deriv estimate for FLEECE for the female speakers was 336 Hz higher (shown in blue) than the reference value for MONO, while the estimate for POLY_ACC was 16 Hz lower (shown in red). The asterisks indicate the level of the probability value associated with each predictor (* p < 0.05, ** p < 0.01, *** p < 0.001).
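The CI lines and asterisks in Figure 3 can be related to an estimate, its standard error and its p-value. The sketch below assumes a simple Wald-type interval (estimate ± 1.96 × SE). The exact interval method used by lmerTest and sjPlot may differ, so treat this as an approximation. The helper names are hypothetical, and only the 336 Hz FLEECE estimate comes from the text; the standard error and p-value are invented.

```python
def wald_ci(estimate, se, z=1.96):
    """Approximate 95% Wald interval: estimate +/- z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


def significance_stars(p):
    """Asterisk label following the thresholds given for Figure 3."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


# 336 Hz is the female FLEECE estimate from the text; se and p are invented.
low, high = wald_ci(336.0, se=50.0)
print(round(low, 1), round(high, 1), significance_stars(0.0004))  # → 238.0 434.0 ***
```

Under this approximation, a 95% interval that excludes zero roughly corresponds to p < 0.05.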
The statistical analysis clearly indicates that the MONO, FLEECE and SCHWA conditions diverge, notwithstanding the overlap in F2deriv distributions evident in Figure 2. The comparison of MONO with FLEECE is in line with accounts of change in the Australian English short front vowels that point to the kit lexical set lowering and moving away from the fleece lexical set, although the extent of overlap evident in Figures 1 and 2 suggests that in this variety, the change is at a relatively early stage: Many MONO and FLEECE tokens yield similar F1/F2 values. Note also that, in line with the clustering shown in Figure 2, the estimates for HAPPY_T pattern closely with FLEECE. This does not indicate that the two vowels are identical, however; recall that values for HAPPY_T were taken at the vowel midpoint, while those for FLEECE were taken at 80% of the vowel's duration in order to avoid possible onglides. There is also, as expected, a difference in duration, with HAPPY_T shorter than FLEECE overall, although the margin is relatively modest (mean durations of 104 ms vs. 125 ms for females and 83 ms vs. 113 ms for males).
The estimates for the two conditions where [ɪ] is anticipated (POLY_ACC and PREVEL_UNACC) do not diverge significantly from the MONO reference intercept. This is not surprising in the case of POLY_ACC, since tokens in this category, along with those in the MONO context, sit transparently within the kit lexical set. The result is arguably more interesting in the case of PREVEL_UNACC: Despite being unstressed, tokens in this category also appear to align phonetically with vowels in the kit lexical set, suggesting the retention of an underlying /ɪ/ in those items. This finding supports previous accounts (e.g., Cox 2019) indicating that the PREVEL_UNACC condition is a straightforward exception to the historical unstressed centralisation/merger process.
The modelling of the two it conditions paints a more complex picture. While the distribution of PHRINT_IT tokens (i.e., phrase-internal it) is skewed lower than MONO, as can be seen in both Figure 2 and Figure 3, the modelling suggests that this difference is not significant for either males or females. This seems largely attributable to the much lower precision of the estimate for PHRINT_IT, which in turn suggests high variability in the realisation of that condition, possibly because this condition does not differentiate among the various functions of phrase-internal it (see below for further discussion). For PREPAUS_IT tokens (phrase-final it), on the other hand, the model yields a significant difference from MONO, reflecting the somewhat lower distribution of F2deriv values also visible in Figure 2.
Finally, in order to test the statistical relationship between the SCHWA and UNACC conditions, we ran an additional mixed-effects model with the same specification as described above, but in this case with SCHWA as the reference level. The data for females and males were again modelled separately. The parameters of the resulting models can be found in Appendix B. The comparison of the F2deriv estimates for the reference vowel (SCHWA) with the UNACC condition yielded a significant difference for both males and females, suggesting that despite the substantial overlap in their distributions, the two conditions show divergent patterns of realisation. The difference is also clearly visible in Figure 2. This divergence is in the same direction as that found in the earlier study by Cox and Palethorpe (2018), with the UNACC tokens tending to be closer and fronter overall than the SCHWA tokens. The SCHWA-referenced model also shows that the estimates for PREPAUS_IT tokens do not differ significantly from those of the SCHWA condition, in line with the somewhat lower F2deriv values for PREPAUS_IT tokens noted above. This is the case for both male and female speakers.
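Re-running the model with SCHWA as the reference level reparameterises the same fixed effects rather than changing the model. In principle (for an otherwise identical lme4 specification, up to optimiser tolerance), the point estimate for any condition relative to a new reference equals its old estimate minus the new reference's old estimate. The sketch below illustrates this with a hypothetical helper, `rereference`, using the female estimates relative to MONO quoted in this section. Because the SCHWA estimate is not quoted in the text, UNACC serves as the illustrative new reference.

```python
def rereference(estimates, new_reference):
    """Re-express treatment-coded point estimates relative to a new reference.

    estimates: {condition: estimate relative to the old reference}, with the
    old reference itself included at 0. Only point estimates carry over;
    standard errors and p-values need the refitted model (or the full
    covariance matrix of the estimates).
    """
    shift = estimates[new_reference]
    return {c: est - shift for c, est in estimates.items() if c != new_reference}


# Female F2deriv estimates (Hz) relative to MONO, as quoted in the text.
female = {
    "MONO": 0, "FLEECE": 336, "POLY_ACC": -16,
    "UNACC": -284, "PHRINT_IT": -155, "PREPAUS_IT": -312,
}
print(rereference(female, "UNACC"))
# → {'MONO': 284, 'FLEECE': 620, 'POLY_ACC': 268, 'PHRINT_IT': 129, 'PREPAUS_IT': -28}
```

In this illustration, PREPAUS_IT lies 28 Hz below UNACC and PHRINT_IT 129 Hz above it, consistent with the intermediate position of PHRINT_IT described in the text. Whether such differences are significant can only be established from the refitted model.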
6. Discussion
In this study, we set out to address three questions:
- (1) What is the relationship between the fleece and kit lexical sets for contemporary speakers of English in Perth?
Our data suggest that there is only a modest lowering and retracting of kit vis-à-vis fleece in this variety. The F1/F2 values for kit (as represented by the MONO and POLY_ACC conditions) largely overlap those of fleece, but with kit realisations concentrated in the lower end of the fleece distribution. Despite this overlap, statistical comparison of these distributions suggests that the realisations corresponding to the two lexical sets are not samples of the same distribution. This result is broadly in line with comparisons of kit vis-à-vis fleece made in some previous studies (e.g., Billington 2011; Cox and Palethorpe 2008; Grama et al. 2019). These studies report varying degrees of divergence between the two, with kit positioned in a lower (more open) area of F1/F2 space while retaining a high level of acoustic proximity (and presumably substantial overlap, although that is not easy to discern when visualisations focus solely on F1/F2 means). One caveat here is the need to consider whether the 80% measurement point used for FLEECE tokens is the optimal basis on which to gauge the quality of that vowel category. Comparisons between kit and fleece depend heavily on the assumptions made about the nature of fleece realisations. For example, fleece is classified as a monophthong in the “HCE” vowel taxonomy that has become the de facto standard for describing Australian English (Harrington et al. 1997), but as a diphthong in some other studies (e.g., Elvin et al. 2016; Grama et al. 2021; Penney et al. 2023), where it is analysed as such. This inconsistency can make direct comparison with kit problematic: When fleece is treated as a diphthong, the acoustic measures capturing the starting point and trajectory of its glide do not provide a consistent basis for comparison with the midpoints of monophthongs.
- (2) Is the realisation of unstressed vowels consistent with conventional accounts of a loss of contrast between /ɪ/ and /ə/ arising from a process of “weak vowel merger”?
In general, the findings are in line with existing accounts of the conditions under which the contrast between /ɪ/ and /ə/ is said to diminish, and with the findings of the previous acoustic study by Cox and Palethorpe (2018). MONO and POLY_ACC tokens are, as predicted, fronter and closer overall than those found in unaccented syllables of the UNACC content words (the anticipated exception being the PREVEL_UNACC condition, whose realisations align with those of MONO and POLY_ACC tokens).
The comparison between the UNACC and SCHWA conditions directly addresses Cox and Palethorpe’s (2018) observation that the reduction of contrast in unstressed syllables often results in a vowel with more of an [ɨ] quality than [ə]. While there is a good deal of overlap in the vowel realisations across these two conditions, the statistical analysis points to a significant difference between the UNACC and SCHWA distributions, with the former associated with a closer vowel quality overall. As noted above, one potential explanatory factor for this difference is the extent to which the realisation of the SCHWA condition is itself influenced by a tendency towards a more open vowel in word-final/pre-pausal contexts (reported for many speakers of Australian English, e.g., by Grama et al. 2020, as well as by Cox and Palethorpe 2018 and Butcher and Stoakes 2024). Further investigation is needed to ascertain the extent to which this is a factor in the current dataset, especially given that in the latter two studies, the word-final tokens are also pre-pausal, making it difficult to discriminate between explanations based on lexical vs. phrasal phonological context. Overall, however, the present findings suggest that caution is needed in conceiving of the unstressed realisations as a simple merger of /ɪ/ and /ə/. If the contrast is functionally absent, as suggested by Cox (2019), but subject to some predictable allophonic variation, as proposed by Butcher and Stoakes (2024), then a number of factors are evidently at play in determining the phonetic properties of the phonological category arising from the fusion of /ɪ/ and /ə/.
One of the most prominent features of the results is the extent of variability and overlap across all of the conditions (not least across conditions that are differentiated statistically). For both female and male speakers, there is a substantial range of F2deriv values shared by all of the conditions investigated. This is perhaps not surprising given that the data were sampled from conversational speech, which is inherently more variable than the controlled isolated-word material prevalent in previous work, and given the known tendency of /ə/ variants to be strongly context-dependent (Cohen Priva and Strand 2023; Penney et al. 2021; Tasker 2020). These overlapping distributions do, however, raise questions for further research regarding the factors that drive this variability and their impact on the likelihood of a token being closer and fronter. It would be instructive to consider not only the immediate phonological environment, as tested to an extent in this study, but also wider factors such as speaker, speech rate and prosodic conditioning. We should also note that while we have referred to “conversational” speech (in contrast to the read passages or isolated word styles that characterise much existing work in this area), each of our conversations in fact comprises a number of sub-styles (e.g., “banter” between the interlocutors, narrative story-telling, sharing of information). It is important not to assume that conversational speech is a unitary style, and the validity of pooling tokens across long conversations such as those used in this study is a matter for further investigation.
- (3) Does the status of the word as grammatical or lexical impact significantly on the realisation of the unaccented syllables? To what extent do grammatical words participate in the putative vowel weakening?
Tokens of the grammatical word it do appear to pattern overall with the F2deriv distribution found in the UNACC condition, where a reduced nucleus is found, although the quantitative modelling is not conclusive about whether the it distributions diverge from that of the MONO condition. The model estimates for phrase-final it (PREPAUS_IT in Figure 3) reach significance, while those for phrase-internal it (PHRINT_IT) do not, although the tendency observable in this data set is clearly intermediate between MONO and the significantly weakened conditions. In both cases, the estimates are relatively imprecise (as reflected by the CIs in Figure 3). As noted above, this may reflect complexity within the distribution of realisations arising from the range of grammatical functions performed by it; for example, tokens include it acting as a subject pronoun, thus bearing a degree of stress, and others where it is cliticised and unstressed, as in “I chose to believe it” or “you’ve gotta love it”.
What do the data suggest about the place of it in relation to the overall unstressed “merger” process? Recall that Bybee (2002, 2017) argues that change should operate faster on units (sounds, words or phrases) that occur frequently in favouring contexts. In principle, this could include grammatical words. Adopting a more nuanced position, Phillips (1983, 2006) suggests that grammatical words are likely to be affected early only by weakening or lenition changes. A similar interpretation might be made of our Perth data. Of the categories that we analysed for signs of participation in the putative weak vowel “merger”, the most frequent is unaccented /ɪ/ (UNACC). As Table 1 shows, this category accounts for 16.5% of the female data analysed and 14.3% of the male data. Tokens in our PREPAUS_IT condition are much less frequent (just under 4% for both sexes) and are also less frequent than it in phrase-internal contexts (PHRINT_IT). Note, however, that apart from the phrase boundary, the UNACC and PREPAUS_IT categories largely share the same prosodic structure: In both cases, the vowel in focus is unaccented and in the second (weak) syllable of a trochaic foot (e.g., ˈraces, ˈdo it #). These two categories also show the lowest F2deriv estimates in our statistical model apart from lexical SCHWA, to which they are closest (Figure 3). Frequency alone might therefore help explain the development of a partial weak vowel merger, as the /ɪ/-/ə/ contrast diminishes in the very frequent unaccented position within the foot. The grammatical word it clearly follows the general pattern, occurring with high frequency in unaccented positions. It can thus be seen to participate in the weakening process, and its high overall item frequency may further contribute to the spread of the reduced variant.
Of interest, too, is the relative position of it in phrase-internal contexts (PHRINT_IT). Although the F2deriv estimates for this category were not significantly different from the MONO reference condition, their values are intermediate between MONO and the two clearly differentiated categories just discussed. For the female speakers, for example, the F2deriv estimate for PHRINT_IT is −155 Hz (cf. −284 Hz and −312 Hz for UNACC and PREPAUS_IT, respectively). The estimate for PHRINT_IT also has a relatively wide confidence interval (shown by the long horizontal lines in Figure 3). This wide CI likely reflects prosodic variability in the underlying data: Tokens in the PHRINT_IT condition were not differentiated by stress in the analysis and therefore combine stressed and unstressed examples, the latter typically occurring in the weak position within the foot. Although we did not test for this explicitly, it is reasonable to assume that the unstressed tokens contain more centralised, schwa-like variants.
It is not possible, given the data analysed here, to draw any firm inferences about cause and effect in how the weakening process has spread. One possibility is that the foot structure alone is responsible for the weakening: Any weak environment is conducive to the weakening of any /ɪ/ vowel, including in grammatical words such as it. Another intriguing possibility, though, is that the most frequent words occurring in that prosodic context make a contribution. Although token numbers for the PREPAUS_IT condition are relatively small in our data set, if we consider the trochees in aggregate across the PREPAUS_IT and UNACC conditions, the word it is the most frequent lexical item in these categories. This leaves open the possibility that it is in fact a driver of the weakening change, since its frequent occurrence in unstressed, phrase-final contexts is especially conducive to reduction. Once weakening had taken hold in that context, the reduced form might then have spread to it in other contexts (i.e., those in the PHRINT_IT category, whose diverse functions provide a less consistently trochaic metrical setting for it), subsequently generalising to other words in the same prosodic context (UNACC). It is also noteworthy that the collapse of this contrast could be facilitated by the scarcity of minimal pairs of the type market~mark it, planet~plan it and fillet~fill it, which minimises the risk of misinterpretation by listeners. While this account is speculative and would require testing through further analysis, the participation of it in the change accords with Bybee (2002, 2017) and Phillips (1983): The grammatical word occurs frequently in the context conducive to change, and as this is a weakening change, we can expect grammatical words to be affected early in the life cycle of the change.
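The frequency argument rests on two counts: each condition's share of the analysed tokens (as reported in Table 1) and the most frequent word when the trochaic PREPAUS_IT and UNACC conditions are pooled. A minimal sketch of both counts follows. The helper names and the (condition, word) token list are hypothetical, invented for illustration, and do not reproduce the study's data or proportions.

```python
from collections import Counter


def condition_shares(tokens):
    """Percentage of all tokens that fall in each condition."""
    counts = Counter(condition for condition, _ in tokens)
    total = sum(counts.values())
    return {condition: 100 * n / total for condition, n in counts.items()}


def most_frequent_word(tokens, conditions):
    """Most frequent word (and its count) when the given conditions are pooled."""
    pooled = Counter(word for condition, word in tokens if condition in conditions)
    if not pooled:
        raise ValueError("no tokens in the requested conditions")
    return pooled.most_common(1)[0]


# Invented (condition, word) tokens; these are not the study's data.
tokens = (
    [("PREPAUS_IT", "it")] * 4
    + [("UNACC", "races")] * 2
    + [("UNACC", "wanted"), ("UNACC", "market"), ("UNACC", "houses")]
    + [("MONO", "bit")] * 8
    + [("PHRINT_IT", "it")] * 3
)
print(condition_shares(tokens))
# → {'PREPAUS_IT': 20.0, 'UNACC': 25.0, 'MONO': 40.0, 'PHRINT_IT': 15.0}
print(most_frequent_word(tokens, {"PREPAUS_IT", "UNACC"}))  # → ('it', 4)
```

The pooled count deliberately excludes PHRINT_IT, mirroring the aggregation over PREPAUS_IT and UNACC described above: Even though UNACC contains more tokens overall, they are spread across many content words, so a single frequent grammatical word can top the pooled list.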
As a general point, it is certainly of interest to consider the potential role of grammatical words as participants in, or even drivers of, change. A further possibility is that other frequent grammatical words also contribute to the change; this question, too, remains for future work. Crucially, however, eliminating grammatical words a priori from analysis within a lexical set-based framework would preclude consideration of such factors. Even if we are unable to determine its precise role, our data clearly show that the word it is participating in the change from /ɪ/ to [ə].