Article

The Presence and Progression of Rare Vocabulary in Texts Across Elementary Grades and Between Genres

1 TextProject, Santa Cruz, CA 95060, USA
2 Department of Educational Psychology, Neag School of Education, University of Connecticut, Storrs, CT 06269, USA
* Author to whom correspondence should be addressed.
Educ. Sci. 2024, 14(12), 1314; https://doi.org/10.3390/educsci14121314 (registering DOI)
Submission received: 30 June 2024 / Revised: 8 November 2024 / Accepted: 13 November 2024 / Published: 29 November 2024
(This article belongs to the Special Issue Building Literacy Skills in Primary School Children and Adolescents)

Abstract

This study analyzed the prevalence and characteristics of low-frequency and rare words, together described as rarer words, in elementary-level texts, examining both narrative and expository materials to assess their vocabulary demands. By mapping the nature of shifts in rarer words across grade levels and text types, this research aimed to better understand the lexical challenges students face as they progress as readers. Analyzing a corpus of 300,000 words from narrative and expository texts at grades 1, 3, and 5, the research employed both quantitative and qualitative methods. Quantitatively, a digital program categorized words into four frequency groups—high, medium, low, and rare—while examining features of word length and age of acquisition that can influence word meaning recognition. Qualitatively, the analysis classified rarer words into 13 lexical categories and assessed their morphological family membership. The findings reveal an increase in total rarer words from 5.7% to 8.7% across grades, alongside a major rise in unique rarer words (32% to 43%). The complexity of features predicting word recognition—word length and age of acquisition—also escalates with grade level. Notably, 23% of rarer words are forms typically not taught in vocabulary instruction, such as proper names, while 76% of rare words belong to morphological families in the high-, medium-, and low-frequency word zones. These results highlight the need for targeted vocabulary instruction that incorporates the complexities of rarer word usage in authentic texts, ultimately aiming to enhance students’ reading comprehension.

1. The Presence and Progression of Rare Vocabulary in Texts Across Elementary Grades and Between Genres

As in most languages, a small percentage of the English lexicon accounts for the majority of words in texts [1]. The remaining words in the dictionary appear rarely. In a corpus of 20 billion words [2], 83% of the approximately 500,000 unique words appeared less than once per 10 million words, and another 11% appeared only once in samples of 1 to 9 million words.
Individually, rare words may not account for high percentages of total words in texts, but their sheer number and ubiquitous presence make them a central concern in reading instruction. Alongside syntax, the number of rare words has been one of the most consistently used factors in text complexity formulas [3,4]. The presence of rare words figures prominently in teachers’ perceptions of what makes students unable or reluctant to read texts [5]. Yet the current model of vocabulary selection in American reading education provides little support to educators, whether classroom teachers or curriculum designers, on how to treat rare words in texts.
In another field of education oriented to the teaching of non-native-English-speaking young adults, English as a second language (ESL) lessons and texts address bands of vocabulary that account for significant percentages of the words in texts. Scholarship from this field has informed the word zone framework [6,7]. While this framework shares similarities with the word bands used in ESL [8], it differs in its foundation. Rather than relying on corpora derived from adult-oriented texts, as is common in ESL, this approach is based on word occurrences in school texts. This approach marks a departure from the original source for school vocabulary, Thorndike’s [9] The Teacher’s Word Book. Thorndike’s work prioritized words deemed important for students to learn, drawing from sources like the Bible and the Farmer’s Almanac. However, only 20% of the words in that compilation came from student-oriented texts, and many of those were from books aimed at advanced readers capable of tackling full-length novels. The database underlying the word zones [10] represents a significant shift. It was created from a representative sample of texts spanning from first grade through early college. Importantly, each grade level is given equal representation in this database, which provides a grade-level-aligned representation of the vocabulary students encounter throughout their school careers.
An investigation of the distribution and features of rarer words over the elementary grades (the period when foundational reading proficiency is a priority) is especially critical in light of the shift in genre emphasis in U.S. educational policies. The National Assessment of Educational Progress (NAEP) [11] exemplifies this change, moving from a primary focus on narrative texts to an equal balance between narrative and expository texts. This shift underscores the need for a more nuanced understanding of vocabulary expectations. To truly grasp the current vocabulary demands in American schools, we need a deeper examination of the distribution and characteristics of the words in texts. An emphasis on rarer words is especially crucial because these words often pose challenges for beginning and struggling readers [12]. By exploring these patterns across elementary school grades and text genres, we can gain a clearer picture of the evolving vocabulary landscape that students navigate throughout their education.
In this study, we examine the distribution and features of rare words in three grade levels of school texts—the end of grades 1, 3, and 5—and in two genres, narrative and expository. The need for such an analysis becomes clear when we examine the “authentic text” perspective that has dominated text selection in U.S. education over the past four decades. This review will first explore the authentic text movement within U.S. reading instruction, providing a context for current practices. Following this, we will discuss how the word zone framework can contribute to our understanding of vocabulary expectations in education. This approach offers valuable insights into potential directions for curriculum development and instructional strategies in vocabulary learning.

2. The Current Perspective on Vocabulary Learning and Instruction

The Content of Authentic Text

The current model of vocabulary instruction emanates from the authentic texts that are prominent in American reading instruction. The authentic text perspective emerged as a reaction to the long-standing promotion of texts where vocabulary was controlled to comply with word lists and readability formulas. Beginning in the 1930s [13] and extending into the 1980s, American reading texts at the elementary level drew heavily on Thorndike’s [9] analysis of the frequency of individual words. Thorndike’s efforts spawned numerous lists aimed at educators, the most enduring of which is Dolch’s [14] compilation of the most frequent words in beginning basal readers.
By the early 1980s, research based on cognitive science showed that manipulating texts to comply with readability formulas resulted in lackluster vocabularies [15]. When this research was disseminated in a national report [16], the nation’s two largest states mandated that only authentic texts would be acceptable in the programs adopted for instruction [17,18]. Authentic texts were defined as those produced by the trade divisions of publishers, not their educational divisions. The vocabulary of school texts would no longer be dictated by algorithmic applications of vocabulary lists. Since that shift, the vocabulary in American reading texts has reflected the choices made by writers of children’s texts and magazines.
Comparing excerpts from the same publisher’s third-grade programs—one from 1974 before the shift to authentic texts, and another from 1997 after mandates for such texts—illustrates differences in vocabulary of controlled and authentic texts.
That afternoon Stevie went back to visit Billy. Billy was cross. He couldn’t find anything to do. “This old cast is heavy”, he complained. “I can hardly drag it around”. Then he told Stevie that his brother Ed and some of his friends had written their names on it. “I wish you could write yours”, Billy said. [19]
The tank was made of big sheets of metal fastened together with rivets. Patrick could see the large, round heads of the rivets pounded in neat rows along the seams. Painted in big letters on the round sides of the tank were the words, PURITY DISTILLING COMPANY. This giant tank was filled with molasses. [20]
The text from the 1974 program that was written specifically for instruction uses mostly common words: 79% of its words are from the 1000 most frequent English words and only one word is rare—Stevie. The text from a trade book, Patrick and the Great Molasses Explosion [21], has 10% fewer high-frequency words than the 1974 excerpt and five rare words—molasses, purity, seams, rivets, and distilling. These examples instantiate Gardner’s [22] observation that “authentic reading materials are fairly unpredictable in terms of the language demands they place on readers” (p. 98).
The Common Core State Standards (Common Core) [23] significantly influenced education policies across the U.S. [24] and further promoted the use of authentic texts in classrooms. Developed through collaboration between U.S. governors and chief state education officers, the Common Core aimed to establish uniform learning goals and rigorous standards for students nationwide. These standards were then implemented in assessments. Initially, all but four states adopted the standards and assessments. Over the following decade, more than half of the states chose to develop their own assessments. Nonetheless, the overall impact of the Common Core has been to elevate student proficiency expectations [24].
One aspect of the Common Core that continued the previous approach of minimal control over vocabulary is its recommendation of text complexity systems, such as the Lexile Framework [25], that use a global measure of vocabulary. Word complexity within the Lexile Framework is measured by the mean log word frequency (MLWF), which is derived from computing the frequency of each word in a text sample [4]. Even after a logarithmic transformation for word frequency, the inclusion of all words in the calculation results in limited variability in the MLWF relative to sentence length, the other variable in the calculation. As a result, the MLWF fails to predict the nature of vocabulary across different grades or content areas [26].
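To make the measure concrete, the following minimal sketch computes a mean log word frequency for a short sample. The frequency values, the floor applied to unlisted words, and the function name are illustrative assumptions; the sketch does not reproduce the Lexile Framework's actual computation.

```python
import math

def mean_log_word_frequency(tokens, freq_per_million, floor=0.01):
    """Average log10 frequency over every token in a sample.

    freq_per_million: hypothetical lookup of word -> appearances per million.
    floor: assumed frequency for words missing from the lookup.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    logs = [math.log10(freq_per_million.get(t.lower(), floor)) for t in tokens]
    return sum(logs) / len(logs)

# Invented frequencies, for illustration only.
freqs = {"the": 10000.0, "tank": 10.0, "was": 1000.0,
         "filled": 100.0, "with": 1000.0, "molasses": 1.0}
tokens = "the tank was filled with molasses".split()
mlwf = mean_log_word_frequency(tokens, freqs)
print(round(mlwf, 2))
```

Because every token contributes to the average, the highly frequent function words in any sample pull the mean toward a narrow range, which is consistent with the limited variability described above.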

3. An Alternative Perspective on Vocabulary

3.1. The Word Zone Framework

The purpose of the word zone framework is to assist educators in assessing the reading proficiency required for specific texts, not to remove challenging words from texts. It also aids in choosing materials that help students build the foundational skills needed for successful reading across various texts. A key principle of the word zone approach is that establishing thresholds of vocabulary that need to be recognized for comprehension is crucial for the design of effective instruction and selection of texts for students.
Betts [27] proposed three levels of word recognition for comprehension: independent (95–100%), instructional (90–94%), and frustration (less than 90%). These levels were based on his clinical work and a single study involving 41 students reading both aloud and silently [28]. Betts’s model has been criticized for lacking robust empirical support and oversimplifying the complex relationship between word recognition and comprehension [29]. Beyond Betts’s initial proposal, the concept of vocabulary thresholds has not been a major focus in research on reading pedagogy or psychology.
Because reading researchers have given little attention to the thresholds of vocabulary required for automaticity and comprehension, we draw on the work of researchers within the field of ESL to establish these thresholds. Laufer’s [30] lexical threshold hypothesis exemplifies this view, arguing that a critical percentage of the lexicon needs to be known to comprehend texts. Her research showed that university students who were learning English as a second language had significantly better reading comprehension scores when they had 95% lexical coverage of a text than those who did not attain this vocabulary threshold. Proficiency with 95% of the vocabulary in a text may be considered a threshold for basic comprehension, but this level does not guarantee effortless reading [31]. Nation and colleagues [32,33] argue that readers should be familiar with 98% of a text’s vocabulary for fluent reading. In Hu and Nation’s [33] study of adults attending a pre-university English course, no students with 80% vocabulary knowledge achieved adequate comprehension, and only a minority did so with 90 to 95% vocabulary knowledge. Only at 98% vocabulary recognition did participants comprehend proficiently.
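As an illustration of how lexical coverage relates to the 95% and 98% thresholds, the sketch below computes the share of running words a reader knows and labels the result. The function names, the band labels, and the sample data are assumptions made for this example, not part of the cited studies.

```python
def lexical_coverage(tokens, known_words):
    """Share of running words (tokens) in a text that the reader knows."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    known = sum(1 for t in tokens if t.lower() in known_words)
    return known / len(tokens)

def coverage_band(coverage):
    """Label a coverage value against the 95% and 98% thresholds."""
    if coverage >= 0.98:
        return "at or above 98% (fluent reading)"
    if coverage >= 0.95:
        return "95% to under 98% (basic comprehension)"
    return "below 95%"

# Invented sample: 100 running words, 4 of them unknown to the reader.
sample_tokens = ["word"] * 96 + ["rivets"] * 4
coverage = lexical_coverage(sample_tokens, known_words={"word"})
print(coverage, coverage_band(coverage))
```

Coverage is computed over total words rather than unique words, so a single unknown word that repeats often lowers coverage more than several unknown words that each appear once.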
The word zone perspective builds on the work of ESL researchers, but its purpose is to identify the critical thresholds of vocabulary required to comprehend texts in different zones of school texts [6]. Features of words, such as morphological family membership and age of acquisition, are also considerations in identifying vocabulary thresholds at different grade levels. Descriptions of the two dimensions of the word zone framework follow: (a) distributions of words in texts by frequency zones and (b) features of words within word zones.

3.2. Distributions of Words in Texts

The word zones draw on an analysis of the words in a large sample of texts (17.5 million words) used in American schools from grades 1 to college entry [10]. Predictions of the number of appearances of a word in a million-word sample of school texts, the U function, were used to create four groups, as shown in Table 1 [6,7,34].
A distinguishing feature of the word zone construct is its orientation around morphological families. Numerous features can influence word complexity [35] and can be used to identify words for instruction from different word zones, as described in the subsequent section. However, one feature is critical for the formation of the word zones: the presence of morphological families. Nagy and Anderson [35] identified this feature as foundational for understanding the vocabulary demands of texts. Based on a sample of words in school texts for grades 3–9, Nagy and Anderson estimated that approximately 609,606 graphically distinct words could be grouped into 88,533 word families. Even most adults are unlikely to encounter all the distinct word families in an entire lifetime of reading. For readers facing such a gargantuan lexicon, guidance in recognizing morphological relatives can aid in establishing the meaning of unknown words.
As is evident in Table 1, the four word zones vary considerably in the numbers of unique words and their overall appearances in texts. The number of expected appearances of words in the first two word zones is close to the 95% level that has been proposed as essential for comprehension [30]. To achieve the 98% recognition of word meaning that Hu and Nation [33] described as needed for high levels of comprehension, proficiency with at least some of the words in the low-frequency and rare word zones is needed.
Words may move about in their rankings within a word zone, but we can anticipate that there will be consistency among the words that make up the high- and medium-frequency zones. In Hiebert et al.’s [34] comparison of the rankings of words in the high- and medium-frequency zones with the rankings of those words in three databases of written language corpora, correlations were consistently at 0.95.
The contents of the third word zone, that of low-frequency words, would be expected to exhibit considerably greater variation than the first two word zones. Words are likely to decrease or increase in their ranking in the low-frequency zone as a function of the size and content of the sample (e.g., text genres, complexity levels of texts, and publication dates). For example, the proper nouns Beezus and Ribsy are in the low-frequency zone of the Zeno et al. [10] database, rather than in the rare word zone where they would be expected to appear. The higher-than-expected frequency of these proper nouns suggests that narratives written by Beverly Cleary were prominent in the text sample.
New words to the lexicon (e.g., blog) and new meanings for words (e.g., streaming) can also be expected to influence the word rankings within the low-frequency zone. A comparison of the words with predicted frequencies of one to nine in the COCA database [36], which sampled texts in general, not only school texts, produced a correlation of 0.54 with the low-frequency words in the word zone framework. The content and generalizability of the low-frequency word zone require further description—one of the goals of the current study.
The size and variability within the fourth zone of rare words are considerable. Many rare words appear only a single time in a text [37], meaning that even a small percentage of rare words in a text can entail many unique words. Tabulating rare words from a sample of texts provides only a glimpse at the rare words that students can be expected to confront across their school careers. However, analyzing rare words at varying levels of text complexity can provide insight into the percentage of rare words and features of rare words at different levels and in different text genres.

3.2.1. Developmental Differences

Those who learned to read with the basal readers that were prominent in the middle of the 20th century in U.S. schools, where characters Dick, Jane, Janet, and Mark were prominent, might assume that rare words in early texts are few and far between. These texts contained relatively few challenging words—a finding confirmed by researchers [15]. This research catalyzed an important shift in educational materials toward authentic texts, leading to significant changes in vocabulary complexity in beginning readers. In analyzing first-grade reading materials from a leading educational publisher over a 60-year period, Fitzgerald et al. [38] found that the rareness of words increased during the latter half of the study period, coinciding with mandates emphasizing authentic texts in reading instruction.
A critical factor in judging the rareness of a word, however, is its frequency across developmental points. The overall frequency of a word is calculated based on the number of times a word appears in a corpus. But a word’s frequency can differ remarkably as a function of the developmental level of texts. In the overall lexicon, for example, words such as kiss and mommy have relatively low frequency indices but in texts for young children can be frequent. Juhasz et al. [39] have labeled these variations in frequency the trajectories of words. As texts become more advanced, simple, common words like get give way to more precise alternatives like acquire or attain. Similarly, general terms are replaced by specialized academic vocabulary like specific or process. Along with these semantic changes come shifts in word structure as simple root words develop more complex forms with prefixes and suffixes.
The use of word frequency lists often overlooks this important factor of developmental trajectory. Simply knowing a word’s overall frequency does not provide information on the relevance of a word for learners at specific points in reading development. Some words that appear frequently in general language may be rare in early reading materials, such as system and government, which are among the first 300 words in the Zeno et al. [10] frequency list. Conversely, some words that are less frequent overall (e.g., kiss and mommy) may be essential in early reading materials but become less relevant in later grades. This underscores why grade-level considerations are crucial when analyzing word frequency data.

3.2.2. Genre Differences

Expository and narrative texts exhibit substantial differences in features such as content and textual structures [40]. These differences also would be expected to extend to the rare vocabularies found within each genre. For instance, certain words such as echolocation and refraction are unlikely to appear in narrative texts, just as shamefaced and scrumptious would not typically be found in expository texts.
The assignment of words to word zones in the Hiebert et al. [34] study revealed similar distributions overall for narrative and expository texts. However, a notable difference was the presence of more unique words in narrative than in expository texts. Hiebert and Cervetti [41] found that 6% of the words in the narrative texts came from the low-frequency and rare zones, compared to 4% in the expository texts. A significant distinction between the two genres lies in the number of different words sourced from the rare zone. Narrative texts contain a higher number of unique words from the rare zone and exhibit few repetitions of these words. In contrast, expository texts in the Hiebert and Cervetti sample had fewer rare words and, when these words appeared, they were repeated an average of five times each.
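The contrast in repetition can be expressed as a simple ratio of total rare-word appearances to unique rare words. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical and the word lists are invented, not drawn from the Hiebert and Cervetti sample.

```python
from collections import Counter

def repetition_rate(rare_tokens):
    """Mean number of appearances per unique rare word (tokens / types)."""
    counts = Counter(t.lower() for t in rare_tokens)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)

# Invented lists: each narrative rare word appears once, whereas each
# expository rare word recurs five times.
narrative_rare = ["shamefaced", "scrumptious", "rivets", "molasses"]
expository_rare = ["echolocation"] * 5 + ["refraction"] * 5

print(repetition_rate(narrative_rare))   # 1.0
print(repetition_rate(expository_rare))  # 5.0
```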

3.2.3. Word Features

Analyses of the distributions of the unique and total words in the low-frequency and rare word zones provide insight into the overall challenges readers face. However, features other than frequency contribute to the complexity of recognizing the meaning of individual words. As demonstrated by Lawrence et al. [35] in their analysis of factors affecting students’ understanding of academic vocabulary, stronger readers were less affected by word frequency than weaker ones. In the case of rare words, which account for a high proportion of the lexicon, identification of their features can be useful for establishing which words may be especially challenging for beginning and struggling readers.
The length of a word has long been acknowledged as a key factor influencing both the speed and accuracy with which individuals access word meanings and decode words [42]. This relationship is closely tied to the number of phonemes and morphemes within a word. However, the number of letters in a word has consistently emerged as a reliable indicator of word complexity in both decoding and meaning retrieval [42].
Several additional variables have been found to predict word recognition. A critical factor is the prior exposure students have to words in their oral language environments. A variable that is based on predictions of when words enter individuals’ oral language contexts, age of acquisition (AoA) [43], has been shown to influence word recognition beyond age-related variables such as word frequency [39].
Other lexical elements relevant to the study of rare vocabulary include onomatopoeia, abbreviations, and words from other languages. Proper names would also be anticipated to be prominent among rare words; for example, Nagy and Anderson [44] predicted that about 20% of the lexicon would comprise proper names.

4. The Current Study

This study is an analysis of rarer words in narrative and expository texts at three elementary school levels: the ends of grades 1, 3, and 5. Rarer words are those in the low-frequency word zone, where the U function (the predicted number of appearances of a word per million words of text) ranges from one to nine, and the rare word zone, where the U function of a word is less than one appearance per million. Morphological families within the low-frequency word zone have previously been identified [7], but the features and lexical classifications of words in this zone have not been established. Estimates of how much students read in school [45] lead to predictions that low-frequency words may prove a challenge to many elementary students whose reading exposure has been limited. Consequently, in the current analysis, we cluster the features of the words in the low-frequency group with words in the rare word zone, rather than with the high- and medium-frequency words.
Some aspects of this examination of the distribution and features of rarer words include descriptions of high- and medium-frequency words. Our aim was not to perform a comprehensive review of all words in texts, including the high- and medium-frequency ones, but to provide a backdrop against which the characteristics of low-frequency and rare words can be meaningfully understood. This comparative approach is essential for three key reasons. First, by quantifying how much of typical texts is covered by high- and medium-frequency words, we can establish the relative presence and importance of rarer words. This information provides a crucial context for understanding just how “rare” these rare words are. Second, common words serve as an established baseline for comparing linguistic features like word length, age of acquisition, and morphological complexity. These comparisons help illuminate what makes rare words distinctively challenging for students. Third, high- and medium-frequency words represent the vocabulary foundation that students are expected to master. Understanding this baseline proficiency level helps contextualize the additional challenges posed by rarer words that students encounter less frequently.
In selecting texts that represent the reading diet of American students, our study required a systematic approach to analyzing low-frequency and rare words. The Lexile Framework emerged as our means of organization, given its pervasive influence in the American educational system, including educational assessments [46], and even its influence on how some community libraries are asked to organize their youth collections [47].
Moreover, this approach enabled us to describe the nature of rare vocabulary associated with different Lexile bands—an essential contribution given a critical limitation in the Framework’s treatment of vocabulary, as previously described. To illustrate the disconnect between vocabulary complexity and a Lexile designation, we compare three sentences from two eighth-grade items from the 2022 NAEP [11], one in science and one in civics.
Science: Goldfish take in oxygen (breathe) by moving water across their gills when they open and close their mouths. The breathing rate is determined by how often the goldfish opens and closes its mouth. A class set up an investigation to study how temperature affects the breathing rates of goldfish.
Civics: The Declaration of Independence identified several problems with the governance of the American colonies by Great Britain. When the design for the new government of the United States was being planned, the framers of the Constitution included solutions to the problems listed in the Declaration of Independence. Match the grievance from the Declaration of Independence to the corresponding sections of the United States Constitution where those grievances are addressed.
The Lexile Framework judges the vocabulary in the science quotation as more difficult (scoring 3.34) than the civics quotation (scoring 3.47, with higher scores indicating easier vocabulary). This evaluation appears to stem from the presence of four rare or infrequent words in the science passage: three appearances of goldfish and one of gills. Yet these are words that most eighth graders can readily decode. The civics passage also contains four words in the infrequent-to-rare category—corresponding, grievance, grievances, and governance—but these present a markedly different challenge in that the words have the polysyllabic structure that often stymies middle-school students’ reading [48].
The Lexiles assigned to the two excerpts track sentence length rather than vocabulary complexity. The science excerpt, despite being rated as having the more difficult vocabulary, has fewer words per sentence (16.3 compared to the civics passage’s 23), resulting in a 1070 Lexile level (upper elementary to middle school), while the civics passage receives a 1250 Lexile score (upper high school). The current dominance of Lexiles in American education cannot be overstated. We aim to establish clear patterns in the types of rarer vocabulary associated with different instructional levels in U.S. educational materials. By mapping vocabulary complexity across Lexile bands, our study provides essential information that complements and clarifies the Framework’s existing metrics.
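The words-per-sentence figures for the two NAEP excerpts can be checked with a short script. This is a minimal sketch that assumes naive sentence splitting on end punctuation and counts runs of letters as words; it illustrates mean sentence length only and is not the Lexile Framework's own procedure.

```python
import re

def mean_sentence_length(text):
    """Mean number of words per sentence, using naive punctuation splitting."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(word_counts) / len(word_counts)

science = (
    "Goldfish take in oxygen (breathe) by moving water across their gills "
    "when they open and close their mouths. The breathing rate is determined "
    "by how often the goldfish opens and closes its mouth. A class set up an "
    "investigation to study how temperature affects the breathing rates of "
    "goldfish."
)
civics = (
    "The Declaration of Independence identified several problems with the "
    "governance of the American colonies by Great Britain. When the design "
    "for the new government of the United States was being planned, the "
    "framers of the Constitution included solutions to the problems listed "
    "in the Declaration of Independence. Match the grievance from the "
    "Declaration of Independence to the corresponding sections of the United "
    "States Constitution where those grievances are addressed."
)

print(round(mean_sentence_length(science), 1))  # 16.3
print(round(mean_sentence_length(civics), 1))   # 23.0
```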
The research questions (RQs) for this study are as follows:
  • How do texts in three grades and two genres compare regarding the distribution of unique and total number of words that fall into four frequency zones: high, medium, low, and rare?
  • How do word-level features of words in the low-frequency and rare word zones compare with those in the two higher-frequency zones across the three grades and two genres?
  • What lexical categories of words are represented within the low-frequency and rare word zones?
  • How many of the words in the rare word zone fall into existing morphological families (high-, medium-, and low-frequency zones)? How many new morphological families are in the rare word zone?

5. Methods

5.1. Selection of Texts and Words

The texts came from a database of 12,000 digitized school texts (8.5 million words), the TextBase (TextProject.org). Researchers at TextProject have gathered the texts over a 20-year period. The TextBase includes texts from school programs (reading and content areas) and trade books that are aimed at students from kindergarten through college- and career-ready levels. The TextBase contains all exemplar texts, both narrative and expository, identified by Common Core developers [23], which have been particularly influential in text selection for reading instruction over the past 15 years in the U.S. The TextBase also includes all selections from literature anthologies published after the Common Core’s introduction by the three major U.S. educational textbook companies (for details, see [49]). As a whole, the TextBase provides a comprehensive collection of texts that have been used for instruction and assessment across the U.S. in the 40-year period since the mandates of large American states for authentic text. As such, it provides a unique resource for analyzing the features of texts used in U.S. schools.
The Lexile of each text in the TextBase has been established [4]. The Lexile Framework assigns a difficulty score to a text from below 0 (beginning reading level) to above 2000 (adult level). The score represents an analysis of two components of the text: sentence length (mean sentence length) and semantic complexity (the previously described MLWF). Texts chosen for this study came from three Lexile bands that correspond to end-of-year levels for grades 1, 3, and 5 identified by Common Core developers [25]: 310–400 Lexile (L), 610–700 L, and 910–1000 L, respectively.
For each of the three grade levels, we randomly identified 100,000-word samples of text from the TextBase. Half the words at each grade level came from narrative texts and half from expository texts. The sample size was determined based on two criteria: (a) equal word count across all six components (2 genres, 3 grade levels) and (b) adherence to the specified Lexile range for each grade. Expository texts at the first-grade level (310–400 L) were the smallest group, with 120,000 words. To ensure a random sample for all groups, including first-grade expository texts, 50,000 words were selected for each of the six components. A previous proof-of-concept study [50] analyzed 1.4 million words sampled from kindergarten through the first year of college. Samples in that study for third and fifth grades were considerably larger than in the current study (grade 3: 78,987 expository and 182,801 narrative; grade 5: 97,869 expository and 180,935 narrative). The results of that study align closely with the current findings for grades 3 and 5, validating our approach of using equal word counts across groups.
The variation in numbers of texts represented in grade levels and genres, evident in Table 2, where first-grade texts outnumber fifth-grade texts by a ratio of 3.6:1, reflects the significantly shorter nature of texts designed for beginning readers compared to those for more proficient readers. For inclusion in the sample, texts needed to have at least 100 words. Moreover, given that a single novel commonly assigned to fifth graders can approach or exceed the 50,000-word criterion for a component (e.g., The Secret Garden at 79,000 words; Holes at 48,000 words), it became necessary to establish guidelines limiting the contribution from any single book. This was done to prevent the vocabulary of a single text from dominating the results. Consequently, we implemented a criterion that no more than 5000 words could be sourced from any individual text. For texts exceeding this upper limit, we took a 5000-word excerpt from the middle of the text. Table 2 provides descriptive data for numbers of words and texts in the sample.
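The two inclusion rules described above (a 100-word minimum and a 5000-word cap drawn from the middle of longer texts) can be sketched as follows. This is a minimal illustration, not the procedure's actual implementation: the function names are hypothetical, and centering the excerpt exactly on the midpoint of the text is an assumption, since the text states only that the excerpt came "from the middle".

```python
def is_eligible(words, minimum=100):
    """A text qualifies for the sample only if it has at least `minimum` words."""
    return len(words) >= minimum


def middle_excerpt(words, cap=5000):
    """Return the whole text if it fits under the cap; otherwise return
    `cap` consecutive words centered on the middle of the text.
    (Exact centering is an assumption made for this sketch.)"""
    if len(words) <= cap:
        return list(words)
    start = (len(words) - cap) // 2
    return list(words[start:start + cap])


# Toy usage with a 12-"word" text and a cap of 4 words.
toy_text = [str(i) for i in range(12)]
excerpt = middle_excerpt(toy_text, cap=4)
```

Capping each text in this way keeps a single long novel from contributing more than one-tenth of a 50,000-word component.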
We included all independent groups of letters in the analyses of individual words and text profiles, except for internet addresses and Arabic numerals. Hyphenated words were treated in one of two ways: (a) by combining components when one component was a word part (e.g., coworkers for co-workers) or (b) by separating the components into individual words (e.g., low and key for low-key).
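The token-handling rules above can be illustrated with a short sketch. It assumes whitespace-delimited text, a simple pattern for internet addresses, and a small hypothetical list of word parts (`WORD_PARTS`); the study's actual list of word parts and its exact treatment of punctuation are not given in the text.

```python
import re

# Hypothetical list of word parts; the study's actual list is not reported.
WORD_PARTS = {"co", "non", "pre", "re", "anti"}

# Simplified pattern for internet addresses (an assumption for this sketch).
URL_PATTERN = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def tokenize(text):
    """Split text into words, dropping internet addresses and numerals and
    applying the two hyphenation rules described above."""
    tokens = []
    for chunk in text.split():
        if URL_PATTERN.match(chunk):
            continue  # internet addresses are excluded
        chunk = chunk.strip(".,;:!?\"'()[]")
        if not chunk or any(ch.isdigit() for ch in chunk):
            continue  # chunks containing digits are treated as numerals
        if "-" in chunk:
            parts = [p for p in chunk.split("-") if p]
            if any(p.lower() in WORD_PARTS for p in parts):
                tokens.append("".join(parts))  # rule (a): co-workers -> coworkers
            else:
                tokens.extend(parts)  # rule (b): low-key -> low, key
        else:
            tokens.append(chunk)
    return tokens


sample = "Her co-workers kept a low-key list of 25 sites at www.example.com."
words = tokenize(sample)
```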

5.2. Corpus Analysis

Two types of analyses were conducted on the corpus: quantitative and qualitative.

5.2.1. Quantitative Analyses

The Word Zone Profiler (WZP) [51] is a digital program that provides information on features of individual words in a text and on the frequency distributions of all words in a text. The WZP uses the U function (number of predicted appearances per million words of text; [10]) to place all words in a text into the four frequency groups in Table 1.
Both number of letters and AoA are also part of the output of the WZP for all of the words within a text sample. The AoA data came from the Kuperman et al. [43] database, which included 78% (n = 11,425) of the 14,660 unique words from all six samples. The remaining 22% of unique words lacked AoA data: 16% of all unique words were proper names, contractions, onomatopoeia, non-English words, interjections (e.g., hey), and abbreviations (e.g., Dr.), and the other 6% did not fit Kuperman et al.’s criterion of representativeness for inclusion in the AoA database. Words in the latter group can therefore be considered rare.

5.2.2. Qualitative Analyses

We applied two qualitative schemes to individual words in the low-frequency and rare word zones: (a) classifications of lexical categories and (b) membership in a morphological family. Coding of lexical categories and morphological family membership was completed by the second author.
To verify the qualitative coding, a research assistant was trained in the coding of unique words from the sample. The second author provided a data dictionary that included the lexical categories with examples and non-examples. The training materials also included a document that described the procedures used to establish the original word families in the high-, medium-, and low-frequency word zones (see [34] for full procedures). These materials included the Becker et al. [52] database of morphological families of 26,000 words, which was used for verification of membership assignment in the original studies [7,34] and in this study.
The two raters reviewed the documents; the research assistant was then provided examples of coding from the sample of texts and given the opportunity to complete assisted and independent practice. Finally, the research assistant coded a 2.5% random sample (n = 378) of words from the total set of unique words among the six samples. For lexical categories, inter-rater agreement was 95%, and for morphological family membership, agreement was 96%.
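The agreement figures above can be read as simple percent agreement, that is, the share of words to which both raters assigned the same code. The text does not name the agreement index, so this reading is an assumption, and the function name and category labels below are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items that two raters coded identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both raters must code the same set of words.")
    if not codes_a:
        raise ValueError("At least one coded word is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Toy usage with invented codes for four words.
rater_1 = ["root", "derivation", "compound", "proper name"]
rater_2 = ["root", "derivation", "compound", "root"]
agreement = percent_agreement(rater_1, rater_2)
```

Under this reading, 95% agreement on the 378-word verification sample would correspond to roughly 359 identically coded words.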

5.3. Lexical Categories

We identified a set of lexical categories that describe different kinds of words, using foundational sources for verifying both the comprehensiveness and content of the categories [53,54]. The first group relates to fundamental morphological forms such as roots, derivations, and compounds. The second group of words, which fulfill communicative functions in text, consists of interjections, oral language or dialectal forms, onomatopoeia, and coined words. The third group consists of words from languages other than English and spellings from different English variants (i.e., British and American dialects). The final group consists of proper names.
Appendix A contains definitions and examples of the 13 lexical categories. We considered roots, inflections, derivations, compounds and contractions, and proper names as primary vocabulary, whereas we treated the other categories as word variants and combined them into a single group.

5.4. Morphological Family Membership

A morphological or word family consists of morphologically related words (e.g., walk, walks, walked, walking, walker, and sidewalk). The member of a morphological family with the highest frequency in the Zeno et al. [10] database is described as the lead word. This analysis began with the 5500 lead words in the high-, medium-, and low-frequency groups that had been identified in previous analyses [7,34]. Based on the criteria used in the previous work, words within the rare zone were either assigned to one of the 5500 morphological families or a new morphological family was formed in cases where no match could be made to one of the existing 5500 lead words.
As was the case in the previous analyses of lead words and morphological family membership, we used Becker et al.’s [52] database as a source of corroboration. We placed all roots, inflections, derivations, compounds and contractions, and alternative spellings into morphological stages. Exclamations and onomatopoeia were placed into a morphological stage if the word appeared as a head word in www.dictionary.com (accessed on 3 August 2022) with the same meaning as in the text sample. Proper nouns, abbreviations, non-words, non-English words, words from oral language, and word parts were not assigned a morphological stage, nor were exclamations and onomatopoeia that were not found in a dictionary. These lexical elements were classified as “no morphological stage”.

5.5. Statistical Analysis

For RQ1, we conducted chi-square tests of independence to examine the distributions of unique and total words across samples: (a) genre: a comparison of the samples of both genres; (b) grade: a comparison of the samples from grades 1 and 3, the samples from grades 3 and 5, and the samples from grades 1 and 5; and (c) genre by grade: a comparison of the expository texts with the narrative texts in each of the three grades. We then conducted post-hoc analyses with standardized residuals to determine whether each frequency zone in each comparison differed significantly from expected frequencies. For RQ2, we conducted Mann–Whitney U tests by genre, grade, and genre by grade to determine whether age of acquisition and word length differed. To account for multiple comparisons, we used Bonferroni correction in each test.
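The RQ1 procedure can be illustrated with a short sketch that computes a chi-square statistic for a table of word counts (rows as samples, columns as frequency zones), the residual for each cell, and a Bonferroni-adjusted alpha. This is a minimal sketch rather than the analysis code used in the study: the function names are hypothetical, the counts in the usage lines are invented, and the adjusted form of the standardized residual is an assumption, since the text specifies only "standardized residuals".

```python
from math import sqrt


def chi_square_independence(table):
    """Return (chi2, df, residuals) for a contingency table of counts.
    Residuals are adjusted standardized residuals (an assumed variant).
    Every row and column total must be greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            residual_row.append((observed - expected) / sqrt(variance))
        residuals.append(residual_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, residuals


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


# Invented counts: rows are two genres; columns are the high, medium,
# low, and rare zones. These are NOT the study's data.
invented_counts = [[50, 30, 15, 5], [40, 30, 20, 10]]
chi2, df, residuals = chi_square_independence(invented_counts)
alpha_per_test = bonferroni_alpha(0.05, 3)  # e.g., three grade pairs
```

A genre-by-zone table (2 × 4) has 3 degrees of freedom and a grade-by-zone table (3 × 4) has 6, which matches the χ²(3) and χ²(6) values reported in the results.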

6. Results

RQ1: 
How do texts in three grades and two genres compare regarding the distribution of unique and the total number of words that fall into four frequency zones: high, medium, low, and rare?

6.1. Unique Words

Figure 1 shows the percentages of unique words in each frequency zone for all texts in each genre, regardless of grade, and all texts in each grade, regardless of genre. The distribution of unique words differed significantly between genres (χ²(3) = 113.59, p < 0.001). Expository texts had significantly more unique medium- and high-frequency words, while narrative texts had significantly more low-frequency and rare unique words (all ps < 0.001).
In comparisons of unique words by grade, chi-square analysis revealed significant differences in the distribution of unique words across grades (χ²(6) = 295.31, p < 0.001). Pairwise comparisons with Bonferroni correction showed significant differences between all grade pairs (grade 1 vs. 3: χ²(3) = 68.11; grade 1 vs. 5: χ²(3) = 277.12; grade 3 vs. 5: χ²(3) = 91.7; all ps < 0.001). Frequencies of unique words differed significantly from expectation in several cases. In grade 1, high-frequency words were overrepresented and low-frequency and rare words were underrepresented (ps < 0.001). In grade 3, there were fewer rare words than expected (p < 0.001). In grade 5, there were significantly fewer high-frequency (p < 0.001) and medium-frequency (p < 0.01) words than expected, and significantly more low-frequency and rare words (ps < 0.001). These differences resulted in significantly more high-frequency (p < 0.001) and fewer low-frequency (p < 0.01) and rare words (p < 0.001) in grade 1 than in grade 3, significantly more high-frequency (p < 0.001) and medium-frequency (p < 0.05) and fewer low-frequency (p < 0.05) and rare words (p < 0.001) in grade 3 than in grade 5, and significantly more high-frequency (p < 0.001) and medium-frequency (p < 0.05) and fewer low-frequency and rare words (ps < 0.001) in grade 1 than in grade 5.
Figure 2 presents the distributions of words within a genre for each individual grade. We found significant differences in word frequency distributions between genres at all grade levels (grade 1: χ²(3) = 22.13; grade 3: χ²(3) = 91.16; grade 5: χ²(3) = 64.68; all ps < 0.001). In grade 1, the comparison indicated significantly more high-frequency words and significantly fewer low-frequency and rare words (all ps < 0.001) in expository texts. In grade 3, expository texts included significantly more high-frequency (p < 0.001) and medium-frequency (p < 0.01) words and fewer low-frequency and rare words (ps < 0.001). The pattern from grade 3 was repeated in grade 5 expository and narrative texts (all ps < 0.001).

6.2. Total Words

Figure 3 shows the percentages of total words within each frequency zone for all texts by genre and grade. Similar to unique words, the distribution of total words differed significantly across genres (χ²(3) = 255.92, p < 0.001).
Grade-level differences in total word usage were significant (χ²(6) = 864.59, p < 0.001), with all pairwise comparisons showing significant differences (grade 1 vs. 3: χ²(3) = 233.39; grade 1 vs. 5: χ²(3) = 855.76; grade 3 vs. 5: χ²(3) = 211.53; all ps < 0.001). Among total words by grade, there were significantly more high-frequency words than expected in grade 1 and fewer in grade 5 (ps < 0.001), with significantly fewer medium-frequency (p < 0.05), low-frequency (p < 0.001), and rare words (p < 0.001) in grade 1 and more low-frequency and rare words (ps < 0.001) in grade 5. In pairwise comparisons, the post-hoc analyses showed that grade 1 had significantly more high-frequency words (p < 0.001) and fewer medium-frequency (p < 0.05), low-frequency (p < 0.001), and rare words than grade 3 and that both grade 1 and grade 3 had significantly more high-frequency words and fewer low-frequency and rare words (all ps < 0.001) than grade 5.
Figure 4 shows percentages of total words in each frequency zone by genre within each grade. Within-grade analyses of total words showed significant genre differences at all levels (grade 1: χ²(3) = 157.86; grade 3: χ²(3) = 75.61; grade 5: χ²(3) = 157.37; all ps < 0.001). Grade 1 expository texts had significantly more high-frequency words and fewer words from all other frequency zones (all ps < 0.001) than narrative texts. Grade 3 expository texts had significantly more medium-frequency words and fewer low-frequency and rare words (all ps < 0.001) than narrative texts. Finally, grade 5 expository texts included significantly fewer high-frequency and rare words (ps < 0.001) and more medium-frequency words (p < 0.001) than narrative texts.
RQ2: 
How do word-level features of words in the low-frequency and rare word zones compare with those in the two higher-frequency zones across the three grades and two genres?
Table 3 provides means and standard deviations for word-level features of the low-frequency and rare words, which will be described as rarer words, and of the high- and medium-frequency words, which will be described as frequent words.
Compared to narrative texts, expository texts had significantly higher mean AoAs for frequent words (p < 0.001), but the differences were not substantive, diverging by 0.2 years. The genre comparison for rarer words revealed no significant difference. Differences in word length by genre were significant for both frequent and rarer words (ps < 0.001). The means for both AoA and word length increased significantly by grade (all ps < 0.001) and were higher among the rarer-word sample than among the sample of frequent words.
For grade 1, words in the expository-text sample had significantly higher mean AoAs (all words: p < 0.001; rarer words: p < 0.01) than those in the narrative-text sample. No significant differences were detected across genres in the grades 3 and 5 samples of rarer words, though when comparing frequent words across genres, differences in both grades were significant (ps < 0.001).
At each grade level, words from expository texts were longer on average than words from narrative texts; the differences ranged from 0.2 letters to 0.3 letters among frequent words and from 0.3 to 0.4 letters among rarer words. All differences were significant (all ps < 0.01).
RQ3: 
What lexical categories of words are represented within the low-frequency and rare word zones?
As shown in Figure 5, the most common lexical categories in the expository and narrative samples were roots, inflections, and derivations, ranging from 21 to 25% of each sample. Proper names and compounds were the next most frequently occurring categories. The category that included all other word types comprised 3% of the expository samples and 7% of the narrative samples.
Within each grade, roots and inflections were the most frequently occurring word categories, ranging from 22 to 31% of words. Compounds and other word categories were reasonably consistent across grades, each comprising 5–11% of the unique words. Derivations differed greatly across grades, however, with marked increases in frequency from grade to grade. Compared to grade 1, proper names were less frequent in grades 3 and 5.
Figure 6 depicts the distributions of lexical categories for the six groups. In grade 1, the percentage of roots was substantially higher in expository texts than narrative ones. The pattern was reversed for grade 3, while the percentages of roots were the same across genres in grade 5.
Patterns for inflections were similar for both text genres at each grade level, but variations were considerable for derivations at different grade levels. In first grade, more derived words appeared in narrative than in expository texts, while in third grade the pattern was the opposite. In fifth grade, derived words were more common in narrative than in expository texts, but the difference was not as substantial as in the other two grades. Proper nouns and compounds tended to maintain a more consistent presence across genres within each grade.
RQ4: 
How many of the words in the rare word zone fall into existing morphological families (high-, medium-, or low-frequency zones)? How many new morphological families are in the rare word zone?
Because the morphological family membership of all low-frequency words in the Zeno et al. [10] database had already been established [7], the present analysis focused solely on the morphological family membership of rare words in the current text sample. More than half the words from both the expository and narrative samples and the samples from each grade had a low-frequency lead word, as evident in the data of Figure 7; these samples also had reasonably low percentages of words with high-frequency lead words. The percentage of words from new morphological families (that is, words with no established lead word in the previous three word zones) was highest in narrative texts and in grade 5 texts.
Across grade levels, as shown in Figure 8, differences in morphological family levels were not consistent. For instance, words with a low-frequency lead word occurred more often in expository texts than in narrative texts in grades 1 and 5, but the reverse was true in grade 3. Percentages of words from new morphological families were substantially higher in expository texts for grade 1, slightly higher in expository texts for grade 3, and substantially higher in narrative texts for grade 5.

7. Discussion

This study examined the percentage of low-frequency and rare words in elementary-level narrative and expository texts. We describe the patterns and features of rarer words across grade levels and genres and their implications for the design of reading instruction.

7.1. Profiles of Unique and Total Rarer Words

While rare words constitute a relatively small percentage of total words across grades one through five, their impact on text complexity and reading development warrants careful consideration. The progression of rare word usage across grade levels follows a clear developmental trajectory, increasing from 1.6% in first-grade texts to 3.4% in fifth-grade materials. Notably, this fifth-grade percentage exceeds the 2% of unfamiliar words allowed by the theoretical 98% word-recognition threshold considered optimal for reading comprehension.
The mere percentage of rare words tells only part of the story. Particularly striking is the relationship between rare words’ frequency and their contribution to lexical diversity. While rare words represent only a small percentage of total words, they account for a disproportionate share of unique words—ranging from 10% in first-grade texts to 17% in fifth-grade materials. This pattern highlights a crucial characteristic of rare words in elementary texts: their lack of repetition. The data reveal that 66% of rare words in our 300,000-word sample appeared only once, with non-proper names averaging just 1.8 occurrences (compared to 4.3 occurrences for proper names).
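The contrast between a zone's share of total words and its share of unique words can be made concrete with a small computation. The sketch below is illustrative only: `zone_profile` and the `zone_of` lookup are hypothetical names, and the toy sentence and its zone assignments are invented rather than drawn from the corpus.

```python
from collections import Counter


def zone_profile(tokens, zone_of):
    """Summarize each zone's share of total words (tokens) and unique
    words (types), plus how often its words appear only once."""
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    total_types = len(counts)
    profile = {}
    for word, n in counts.items():
        entry = profile.setdefault(
            zone_of(word), {"tokens": 0, "types": 0, "singletons": 0}
        )
        entry["tokens"] += n
        entry["types"] += 1
        if n == 1:
            entry["singletons"] += 1
    for entry in profile.values():
        entry["token_share"] = entry["tokens"] / total_tokens
        entry["type_share"] = entry["types"] / total_types
        entry["singleton_share"] = entry["singletons"] / entry["types"]
        entry["mean_occurrences"] = entry["tokens"] / entry["types"]
    return profile


# Invented example: one word is treated as rare, the rest as frequent.
def toy_zone(word):
    return "rare" if word in {"zebu"} else "frequent"


toy_tokens = "the cat saw the dog and the zebu".split()
profile = zone_profile(toy_tokens, toy_zone)
```

In the toy sentence the single rare word accounts for one of eight total words but one of six unique words, which is the same direction of difference described above: words that are seldom repeated weigh more heavily among unique words than among total words.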
Low-frequency words play a consistent and significant role in elementary school texts, constituting approximately 4.6% of total words, with a modest increase from 4.1% to 5.3% across grade levels. While these words represent about 25% of unique words in the texts, their usage patterns differ meaningfully from rare words. Only 49% of low-frequency words appear just once (compared to 66% for rare words), and they show a higher average repetition rate of 2.8 times, excluding proper names. Their accessibility to learners is further enhanced by morphological connections: many low-frequency words have relatives in high- and medium-frequency zones, creating links to previously learned vocabulary. When accounting for these morphological family relationships, the effective repetition rate more than doubles. These characteristics—higher repetition rates, strong morphological connections to familiar words, and more predictable patterns of occurrence—create distinct learning opportunities. Unlike rare words, low-frequency words offer students more systematic exposure and clearer connections to known vocabulary, making them more accessible despite their relatively infrequent appearance in texts.

7.2. Features of Words in the Low-Frequency and Rare Word Zones

For words in the low-frequency and rare word zones, the age of acquisition consistently aligns with the upper end of the grade-level age range. Additionally, these words tend to have a greater number of letters, indicative of their multisyllabic nature. This trend, marked by higher average age-of-acquisition ratings and extended word lengths, is evident in both expository and narrative texts. However, it is particularly pronounced in expository texts.
These differences in age of acquisition and word length underscore a potentially pivotal distinction between rarer words in expository versus narrative texts: their conceptual complexity. Nagy et al. [55] identified conceptual complexity as the most critical factor in students’ ability to learn unknown words in context. They defined conceptual complexity as the extent to which understanding a word’s meaning relies on grasping the meanings of related, intricate concepts. For example, the word barge, found in one of the first-grade texts in the current sample, can be easily explained with a picture of a flat-bottomed boat. In contrast, the word prism, also appearing in the sample of first-grade texts in this study, demands comprehension of multiple advanced ideas such as transparency, triangular surfaces, refraction, and the color spectrum. Nagy et al. concluded that students are less likely to understand or learn these conceptually complex words in context.
In the current study, we derived indicators of conceptual complexity from measures of age of acquisition and word length. Determining the conceptual complexity of words remains challenging. Even with only 200 target words in their experimental study, Nagy et al. [55] had difficulty achieving consensus among raters on a 4-point conceptual complexity scale, reaching only 57% agreement. However, with the rapid advancements in artificial intelligence, it may soon be possible to categorize words by their conceptual complexity on a larger scale.

7.3. Lexical Categories

At all grade levels, rarer words with inflected endings, derivations, and compounds outnumbered root words. Notable differences emerged across grade levels. Derivations were markedly less common in first-grade texts compared to third-grade texts, with their prevalence increasing by approximately 10% from one grade level to the next. This trend highlights the progressively complex morphological structures students encounter as they advance through school.
Moreover, the distribution of lexical categories varied between narrative and expository texts. Proper nouns were substantially more frequent in expository texts, whereas the “other” category, which includes abbreviations and onomatopoeia, was more prevalent in narrative texts. Proper nouns comprised 19% of the low-frequency and rare words in first-grade texts, decreasing to 14% by the fifth grade. Linguists have described proper nouns as having unique semantic and grammatical features [56]. Proper names also vary in their functions across different content areas. In historical fiction or expository texts, proper nouns can be central to the content. In most narratives, authors select character names based on factors like gender, ethnicity, and physical characteristics.
The “other” category grows from first to fifth grade, with its composition evolving over time. In first-grade texts, onomatopoeia is the most common. How young children, particularly those whose primary literacy experiences occur in school, perceive or respond to onomatopoeia remains uncertain. By fifth grade, especially in expository texts, the “other” category predominantly consists of abbreviations. Like onomatopoeia, the degree to which this lexical element influences students’ word recognition and comprehension has not been examined extensively; however, it merits further investigation because of the prevalence of the element in school texts.

7.4. Presence of Morphological Families

The majority of unique words within the rare zone (86%) were morphologically connected to family members in the larger lexicon. Although many rare words belonged to morphological families with lead words in the three previous word zones (high, medium, and low), we identified a number of new morphological families within the rare group. The size of morphological families with rare lead words averaged only 1.3 members. This is understandable, given that both the lead word and its family members are rare.
The membership of many rare words in word families with lead words of higher frequencies highlights the importance of this aspect of the English lexicon. But this prominence does not mean that the instruction and learning of morphological families is straightforward or simple. First, a sizable morphological family does not guarantee that rare family members will be recognized by readers who know a more frequent member of a family. The data show that, as words become rarer, the likelihood of their presence in students’ oral language decreases. Thus, the meanings of rarer members of morphological families may be difficult for students to access.
Often, morphological families are equated with inflected endings and derivations, but the study findings point to a third class of important morphological connections: compound words. In our sample, approximately 9% of the rare words were compounds. The meanings of compound words can often be idiosyncratic. Although terms like schoolboy and choirboy are straightforward, others like cowboy and bellboy are less so. Despite the unique and sometimes complex meanings of these compounds, compound words are not typically emphasized in either research or pedagogy. This situation needs to be ameliorated, especially because compounding has been used in generating many terms for modern inventions and experiences (e.g., mousepad and smartwatch).

7.5. Caveats and Questions

Establishing thresholds of vocabulary covered by different word zones, such as the emphasis on the rarer word zones in this study, can serve a function in designating text complexity. At the same time, the overinterpretations and misguided practices that can result from rigid applications of numeric indices (e.g., readability formulas) serve as a cautionary tale for the use of word zones to evaluate students’ proficiency. The goal is not to withhold texts from students but to accurately assess their proficiency with particular word zones.
Furthermore, this study did not delve into the orthographic patterns of words—a significant area for further research. Many of the rare words identified were multisyllabic, presenting additional challenges in terms of their orthographic and phonological complexity. Analyzing the patterns within these multisyllabic words is a complex undertaking, as noted by Vousden [57], and would require a detailed investigation into their structure and usage across different text genres and grade levels.
We also emphasize that neither the word zone perspective nor the findings of this study are intended to limit rare vocabulary exposure for elementary-level students. Incorporating challenging vocabulary into texts is widely acknowledged as crucial for intellectual development and vocabulary acquisition [58]. Exposure to unfamiliar and complex terms can broaden readers’ understanding. In the absence of variability in vocabulary, texts risk becoming monotonous, which could limit readers’ cognitive and educational growth. However, understanding how these challenging words affect comprehension, particularly among beginning and struggling readers, requires a more nuanced analysis than what is provided by descriptive data of word occurrence in texts.
The precise threshold at which rare or less common words begin to hinder comprehension is not well-defined. Typical texts may contain 2 rare words per 100 words, as found in this study, but the impact of this percentage can vary significantly depending on readers’ proficiency levels. Additionally, the function of these rare words in the text is crucial to comprehension. For instance, if unknown words are adjectives describing story elements or characters, readers may still understand the narrative’s gist even if they do not grasp the specific meanings of some words [59].
A less studied aspect of the percentage of rare words in texts is its long-term effect on students, especially those who are not proficient readers. While existing research has examined the immediate impact of challenging words on comprehension [60], the long-term effects of frequent exposure to difficult vocabulary are not well understood. For struggling readers, constant engagement with texts filled with challenging words could lead to frustration and reduced motivation. Over time, this persistent difficulty might increase disengagement and impede overall reading development [61]. To better support struggling readers and enhance their reading experiences, future research should investigate the long-term consequences of variations in vocabulary thresholds for students of different proficiency levels.
A question that we raise relates to the role of word repetition in the current model of authentic texts. Although the corpus analyzed in this study does not represent an instructional program, the transition to authentic texts has highlighted the high percentages of singly appearing words within program texts [37]. A morphological perspective in reading instruction may alleviate some of the burden of singly appearing words. However, as already raised, we do not know what is required for students, especially those who struggle with reading acquisition, to draw on their knowledge of the morphemes in previously encountered words. While current text complexity models have largely overlooked the role of word repetition since the shift toward authentic texts, research in reading development consistently emphasizes the importance of repeated exposure to words and their structural patterns for developing automatic word recognition and meaning extraction [62]. The precise number of exposures needed for students at different developmental stages to achieve automaticity remains undetermined, highlighting a critical gap in our understanding. These findings suggest that future research should prioritize investigating the optimal frequency of word exposure needed for proficient word recognition and meaning acquisition.

7.6. Applications to Instruction

Despite extensive research in corpus linguistics over the past two decades, vocabulary programs in American schools have not significantly benefitted from these advancements. In many classrooms, vocabulary instruction is limited to instruction of six to eight words for a week of study. Instruction in a small set of individual words has not proven to substantially enhance generalized comprehension [63] or vocabulary [64].
We believe this study underscores the need for a new model of vocabulary instruction: a three-pronged approach. This model addresses vocabulary experiences with the general academic words, topical words, and literary words within the first three word zones, as well as rare words. A substantial amount of research needs to be conducted on this model, but we believe that its design merits attention.
First, morphological connections need to be placed on an equal footing with orthography in reading instruction. The finding that 76% of the rare words were related to lead words in the first three word zones underscores the importance of integrating morphological knowledge into the vocabulary curriculum. This approach should be tailored to different grade levels, aligning with students’ developmental stages. In the primary grades, instruction should focus on inflected endings and compound words. As students progress through elementary school, the emphasis should shift to more complex morphological forms, such as derivations. This progression not only supports decoding and spelling but also fosters a deeper understanding of word meanings and relationships, ultimately improving overall literacy.
A second aim is to increase awareness of lexical elements. Proper nouns are an example of a lexical element that rarely receives attention in the American reading curriculum. Proper nouns vary considerably from common nouns in that they do not have a real meaning or definition, and they are not connected to what they designate by a semantic link but by a particular convention [65]. Proper nouns can also have inconsistent orthographic patterns. By explicitly teaching students to anticipate proper nouns and their unique characteristics, educators can bridge a critical gap in literacy instruction.
A third element concerns the distinct vocabularies of literary and expository texts. The unique words in expository and narrative texts differ, and so must instruction. This study’s finding that a higher percentage of words from the low-frequency and rare zones occur in narrative than in expository texts reflects storytellers’ use of synonyms to convey character traits or actions or features of the fictional context or problem [66]. Support for a semantic strategy with narrative texts comes from an investigation by Beck et al. [67], who taught groups of words for different components of story structure, including roles of characters (e.g., spectator and acquaintance), ways of looking (e.g., squint and gape), and traits related to dependability (e.g., diligent and independent). The Beck et al. study stands out for its effectiveness compared to other vocabulary interventions [63,64]: participants showed significant improvements in both comprehension and vocabulary on norm-referenced measures relative to participants who received typical instruction.
In expository texts, the critical vocabulary represents interconnected concepts, unlike the synonyms of narrative texts. Words in expository texts are conceptually linked, forming networks that support complex understanding. The list of topics that describe the phenomena of the natural and physical worlds can be excessive, but current technology supports the identification of key terms that may influence vocabulary. O’Reilly et al. [68] identified key terms related to a topic through natural language processing techniques. The key terms may not all have been present in the articles that students read, but knowledge of these terms had a strong positive effect on comprehension.
In summary, current vocabulary programs in American schools have not fully utilized advancements in corpus linguistics. By integrating morphological awareness, fostering awareness of lexical elements, and distinguishing between literary and expository vocabularies, we aim to redefine vocabulary instruction to better support students’ literacy development and overall comprehension.

8. Conclusions

Rare words add to the quality, content, and specificity of texts. In authentic texts, students are exposed to words that can extend their vocabulary and comprehension. At the same time, the number of rare words in authentic texts can increase the demands on students’ reading proficiency, requiring them to have automaticity with the majority of words in the high-, medium-, and low-frequency word zones and strategies to decipher rare words.
Policymakers frequently cited the national report Becoming a Nation of Readers [16] to mandate the use of authentic texts in reading programs. However, the report itself did not provide evidence supporting the efficacy of authentic texts in reading acquisition and development. Rather, the report called for “interesting, comprehensible, and natural-sounding selections, while at the same time constraining the vocabulary” [16] (p. 47). The creation of interesting, comprehensible, and natural-sounding texts that are based on well-conceived theoretical frameworks of vocabulary should be a priority for the educational community. Such work is essential if students are to develop the extensive lexicons and automaticity that underlie the comprehension required for full participation in the global–digital world.

Author Contributions

Conceptualization, E.H.H. and A.P.; methodology, E.H.H., A.P. and D.M.K.; software, A.P. and D.M.K.; validation, E.H.H. and A.P.; formal analysis, E.H.H. and A.P.; investigation, E.H.H. and A.P.; resources, E.H.H.; data curation, E.H.H. and A.P.; writing—original draft preparation, E.H.H. and A.P.; writing—review and editing, E.H.H. and A.P.; visualization, A.P.; supervision, E.H.H. and D.M.K.; project administration, E.H.H.; funding acquisition, not applicable. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Descriptions of word categories with examples.
| Category | Description | Examples |
|---|---|---|
| Abbreviations | Initialisms | DVD, m |
| | Acronyms | NASA, PETA |
| | Contractions that use a period | vs. |
| | Shortened forms that use a period | corp. |
| Alternative spellings | Spellings from other dialects | flavour, cosy |
| | Spellings that are rarely used or archaic | goodby, cocoanuts |
| Compounds and contractions | Compound words without spaces or hyphens | armchair, painstakingly |
| | Multiword contractions with apostrophe(s) | how’ll, more’n |
| Derivations | Unbound roots with derivational affixes | refill, sweetness |
| | Derived words as above that are inflected | energized, awaiting |
| | Gerunds | dwelling |
| Exclamations | Commonly used exclamations and interjections | hurray, boohoo |
| | Variations on spellings for common exclamations | shhhh, ooooh |
| Inflections | Root words (including clipped forms) with inflectional suffixes | limping, dived |
| | Irregular forms of verbs | clung, bade |
| | Foreign loanwords that use English inflections | siestas, sarongs |
| Non-words | Invented words that are not exclamations or onomatopoeia | purrfect, snapcrackers |
| | Word parts that are separated for effect | hon, nour (from on my hon-nour) |
| | Pronunciation guides or character pronunciations | sahd, ee (from Saade, a proper name) |
| Non-English words | Words from other languages that are not loanwords in English | ibn, certainement |
| | Expressions in non-English languages whose individual words are not loanwords | je, sais |
| Onomatopoeia | Words that imitate sounds | moo, clunk |
| Oral language | Changes in spelling for drama/effect | deeeeeeeelicious |
| | Changes in spelling to reflect oral language, including those with apostrophes | fellas, runnin’ |
| Proper nouns | Proper nouns | Baltimore’s, Buddhists |
| | Proper adjectives | Nigerian, Portuguese |
| | Alternative spellings of proper nouns | Dan’l (from Daniel) |
| | Proper forms of address in long form | Mister, Madame |
| Roots | English root words, including words with multiple unbound Greek or Latin roots | bliss, meter |
| | Words borrowed from other languages | safari, maharaja |
| | Common or literary clipped forms of longer words, including those with apostrophes | ad, o’er |
| Word parts | Affixes from words from which hyphens or numbers have been removed | micro, th (e.g., 4th) |
| | Individual letters (e.g., a character reciting the alphabet) | A, B, C |

References

  1. Zipf, G.K. The Psycho-Biology of Language: An Introduction to Dynamic Philology; Houghton Mifflin: Boston, MA, USA, 1935.
  2. Oxford English Dictionary. Frequency. 2024. Available online: https://www.oed.com/information/understanding-entries/frequency/?tl=true (accessed on 1 November 2024).
  3. Dale, E.; Chall, J.S. A formula for predicting readability: Instructions. Educ. Res. Bull. 1948, 27, 37–54.
  4. Stenner, A.J.; Burdick, H.; Sanford, E.E.; Burdick, D.S. How accurate are Lexile text measures? J. Appl. Meas. 2006, 7, 307.
  5. Murray, C.S.; Stevens, E.A.; Vaughn, S. Teachers’ text use in middle school content-area classrooms. Read. Writ. 2022, 35, 177–197.
  6. Hiebert, E.H. In pursuit of an effective, efficient vocabulary curriculum for elementary students. Teach. Learn. Vocab. Bringing Res. Pract. 2005, 1, 243–264.
  7. Hiebert, E.H. An Examination of Morphological Families in the Low-Frequency Word Zone (Reading Research Report 24.01); TextProject: Santa Cruz, CA, USA, 2024.
  8. Schmitt, N.; Schmitt, D. A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Lang. Teach. 2014, 47, 484–503.
  9. Thorndike, E.L. The Teacher’s Word Book; Teachers College, Columbia University: New York, NY, USA, 1927.
  10. Zeno, S.; Ivens, S.H.; Millard, R.T.; Duvvuri, R.; Rothkopf, E.Z.; Touchstone Applied Science Associates; National Institute of Child Health and Human Development (U.S.); New York State Science and Technology Foundation. The Educator’s Word Frequency Guide; Touchstone Applied Science Associates: Rosemont, NJ, USA, 1995; Available online: https://search.worldcat.org/title/The-Educator%27s-word-frequency-guide/oclc/33926219 (accessed on 1 November 2024).
  11. National Center for Education Statistics. The NAEP Reading Achievement Levels by Grade; U.S. Department of Education, Institute of Education Sciences: Washington, DC, USA, 2022. Available online: https://nces.ed.gov/nationsreportcard/reading/achieve.aspx#2009_grade8 (accessed on 1 November 2024).
  12. Amendum, S.J.; Conradi, K.; Hiebert, E. Does text complexity matter in the elementary grades? A research synthesis of text difficulty and elementary students’ reading fluency and comprehension. Educ. Psychol. Rev. 2018, 30, 121–151.
  13. Elston, W.H.; Gray, W.S. Elston Gray Basic Readers; Scott, Foresman, & Co.: Chicago, IL, USA, 1936.
  14. Dolch, E.W. A basic sight vocabulary. Elem. Sch. J. 1936, 36, 456–460.
  15. Davison, A.; Kantor, R.N. On the failure of readability formulas to define readable texts: A case study from adaptations. Read. Res. Q. 1982, 17, 187–209.
  16. Anderson, R.C.; Hiebert, E.H.; Scott, J.A.; Wilkinson, I.A.G. Becoming a Nation of Readers: The Report of the Commission on Reading; National Academy of Education: Washington, DC, USA, 1985.
  17. California Board of Education. English-Language Arts Framework for California Public Schools: Kindergarten Through Grade 12; Department of Education: Sacramento, CA, USA, 1988.
  18. Texas Educational Agency. Proclamation of the State Board of Education Advertising for Bids on Textbooks; Texas Education Agency: Austin, TX, USA, 1990.
  19. Durr, W.K.; LePere, J.M.; Niehaus, B. Houghton Mifflin Reading; Houghton Mifflin: Boston, MA, USA, 1974.
  20. Cooper, J.D.; Pikulski, J.J.; Au, K.; Calderon, M.; Comas, J.C.; Lipson, Y.; Mims, J.S.; Page, S.E.; Valencia, S.W.; Vogt, M. Invitations to Literacy; Houghton Mifflin Company: Boston, MA, USA, 1997.
  21. Stover, M. Patrick and the Great Molasses Explosion; Dillon Press: Scottsdale, AZ, USA, 1985.
  22. Gardner, D. Vocabulary recycling in children’s authentic reading materials: A corpus-based investigation of narrow reading. Read. Foreign Lang. 2008, 20, 92–122.
  23. National Governors Association Center for Best Practices & Council of Chief State School Officers (NGA/CBP & CCSSO). Common Core State Standards for English Language Arts and Literacy in History/Social Studies, Science, and Technical Subjects. 2010. Available online: https://www.thecorestandards.org/ELA-Literacy/ (accessed on 1 November 2024).
  24. Peterson, P.E.; Barrows, S.; Gift, T. After Common Core, states set rigorous standards. Educ. Next 2016, 16, 9–15.
  25. Nelson, J.; Perfetti, C.; Liben, D.; Liben, M. Measures of Text Difficulty: Testing Their Predictive Value for Grade Levels and Student Performance; Student Achievement Partners: New York, NY, USA; Council of Chief State School Officers: Washington, DC, USA, 2012.
  26. Cunningham, J.W.; Hiebert, E.H.; Mesmer, H.A. Investigating the validity of two widely used quantitative text tools. Read. Writ. 2018, 31, 813–833.
  27. Betts, E.A. Foundations of Reading Instruction, with Emphasis on Differentiated Guidance; American Book Company: Woodstock, GA, USA, 1946.
  28. Kilgallon, P.A. A Study of Relationships among Certain Pupil Adjustments in Language Situations. Unpublished Ph.D. Thesis, Pennsylvania State University, University Park, PA, USA, 1942.
  29. Halladay, J.L. Revisiting key assumptions of the reading level framework. Read. Teach. 2012, 66, 53–62.
  30. Laufer, B. What Percentage of Text-Lexis is Essential for Comprehension? In Special Language: From Humans Thinking to Thinking Machines; Multilingual Matters: Bristol, UK, 1989; p. 316.
  31. Gardner, D. Vocabulary input through extensive reading: A comparison of words found in children’s narrative and expository reading materials. Appl. Linguist. 2004, 25, 1–37.
  32. Hirsch, D.; Nation, P. What vocabulary size is needed to read unsimplified texts for pleasure. Read. Foreign Lang. 1992, 8, 689–696.
  33. Hu, M.H.; Nation, P. Unknown vocabulary density and reading comprehension. Read. Foreign Lang. 2000, 13, 403–430.
  34. Hiebert, E.H.; Goodwin, A.P.; Cervetti, G.N. Core vocabulary: Its morphological content and presence in exemplar texts. Read. Res. Q. 2018, 53, 29–49.
  35. Lawrence, J.F.; Knoph, R.; McIlraith, A.; Kulesz, P.A.; Francis, D.J. Reading comprehension and academic vocabulary: Exploring relations of item features and reading proficiency. Read. Res. Q. 2022, 57, 669–690.
  36. Davies, M. The Corpus of Contemporary American English as the first reliable monitor corpus of English. Lit. Linguist. Comput. 2010, 25, 447–464.
  37. Graves, M.F.; Elmore, J.; Fitzgerald, J. The vocabulary of core reading programs. Elem. Sch. J. 2019, 119, 386–416.
  38. Fitzgerald, J.; Elmore, J.; Relyea, J.E.; Hiebert, E.H.; Stenner, A.J. Has first-grade core reading program text complexity changed across six decades? Read. Res. Q. 2016, 51, 7–28.
  39. Juhasz, B.J.; Yap, M.J.; Raoul, A.; Kaye, M. A further examination of word frequency and age-of-acquisition effects in English lexical decision task performance: The role of frequency trajectory. J. Exp. Psychol. Learn. Mem. Cogn. 2019, 45, 82.
  40. Biber, D. Variation Across Speech and Writing; Cambridge University Press: Cambridge, UK, 1991.
  41. Hiebert, E.H.; Cervetti, G.N. What differences in narrative and informational texts mean for the learning and instruction of vocabulary. Vocab. Instr. Res. Pract. 2012, 2, 322–344.
  42. Brysbaert, M.; Mandera, P.; Keuleers, E. The word frequency effect in word processing: An updated review. Curr. Dir. Psychol. Sci. 2018, 27, 45–50.
  43. Kuperman, V.; Stadthagen-Gonzalez, H.; Brysbaert, M. Age-of-acquisition ratings for 30,000 English words. Behav. Res. Methods 2012, 44, 978–990.
  44. Nagy, W.E.; Anderson, R.C. How many words are there in printed school English? Read. Res. Q. 1984, 19, 304–330.
  45. Guthrie, J.T.; Schafer, W.D.; Huang, C.W. Benefits of opportunity to read and balanced instruction on the NAEP. J. Educ. Res. 2001, 94, 145–162.
  46. Koenig, J.A.; Edley, C., Jr. (Eds.) Evaluation of the Achievement Levels for Mathematics and Reading on the National Assessment of Educational Progress; National Academies Press: Washington, DC, USA, 2017.
  47. Mehra, B.; Davis, R. A strategic diversity manifesto for public libraries in the 21st century. New Libr. World 2015, 116, 15–36.
  48. Tortorelli, L.S.; Strong, J.Z.; Anderson, B.E. Multisyllabic decoding achievement and relation to vocabulary at the end of elementary school. J. Exp. Child Psychol. 2024, 246, 106018.
  49. Kearns, D.M.; Hiebert, E.H. The word complexity of primary-level texts: Differences between first and third grade in widely used curricula. Read. Res. Q. 2022, 57, 255–285.
  50. Pugh, A.; Hiebert, E.H. What is the task represented by rare words in text? In Proceedings of the Annual Meeting of the Literacy Research Association, Tampa, FL, USA, 4–7 December 2019.
  51. Hiebert, E.H. Word Zone Profiler; TextProject: Santa Cruz, CA, USA, 2012.
  52. Becker, W.C.; Dixon, R.; Anderson-Inman, L. Morphographic and Root Word Analysis of 26,000 High Frequency Words; Technical Report 1980-1; University of Oregon/Follow Through Project: Eugene, OR, USA, 1980.
  53. Baker, M. Lexical Categories: Verbs, Nouns, and Adjectives; Cambridge University Press: Cambridge, UK, 2003.
  54. Huddleston, R.; Pullum, G.K. The Cambridge Grammar of the English Language; Cambridge University Press: Cambridge, UK, 2002.
  55. Nagy, W.E.; Anderson, R.C.; Herman, P.A. Learning word meanings from context during normal reading. Am. Educ. Res. J. 1987, 24, 237–270.
  56. Jackendoff, R. Foundations of Language; Oxford University Press: Oxford, UK, 2002.
  57. Vousden, J.I. Units of English spelling-to-sound mapping: A rational approach to reading instruction. Appl. Cogn. Psychol. Off. J. Soc. Appl. Res. Mem. Cogn. 2008, 22, 247–272.
  58. Adams, M. The Challenge of Advanced Texts: The Interdependence of Reading and Learning. In Reading More, Reading Better: Are American Students Reading Enough of the Right Stuff? Hiebert, E.H., Ed.; Guilford Publications: New York, NY, USA, 2009; pp. 163–189.
  59. Freebody, P.; Anderson, R.C. Effects of vocabulary difficulty, text cohesion, and schema availability on reading comprehension. Read. Res. Q. 1983, 18, 277–294.
  60. Stahl, S.A.; Jacobson, M.G.; Davis, C.E.; Davis, R.L. Prior knowledge and difficult vocabulary in the comprehension of unfamiliar text. Read. Res. Q. 1989, 24, 27–43.
  61. Morris, D.; Meyer, C.; Trathen, W.; McGee, J.; Vines, N.; Stewart, T.; Gill, T.; Schlagal, R. The simple view, instructional level, and the plight of struggling fifth-/sixth-grade readers. Read. Writ. Q. 2017, 33, 278–289.
  62. Seidenberg, M. Language at the Speed of Sight: How We Read, Why So Many Can’t, and What Can Be Done About it; Basic Books: New York, NY, USA, 2017.
  63. Wright, T.S.; Cervetti, G.N. A systematic review of the research on vocabulary instruction that impacts text comprehension. Read. Res. Q. 2017, 52, 203–226.
  64. Cervetti, G.N.; Fitzgerald, M.S.; Hiebert, E.H.; Hebert, M. Meta-analysis examining the impact of vocabulary instruction on vocabulary knowledge and skill. Read. Psychol. 2023, 44, 672–709.
  65. Valentine, T.; Brennen, T.; Bredart, S. The Cognitive Psychology of Proper Names; Routledge: London, UK, 1996.
  66. Mandler, J.M. On the psychological reality of story structure. Discourse Process. 1987, 10, 1–29.
  67. Beck, I.L.; Perfetti, C.A.; McKeown, M.G. Effects of long-term vocabulary instruction on lexical access and reading comprehension. J. Educ. Psychol. 1982, 74, 506.
  68. O’Reilly, T.; Wang, Z.; Sabatini, J. How much knowledge is too little? When a lack of knowledge becomes a barrier to comprehension. Psychol. Sci. 2019, 30, 1344–1351.
Figure 1. Percentages of unique words in each frequency zone by genre and grade. Note. Expository = all samples of expository text (N = 9453); Narrative = all samples of narrative text (N = 9969); Grade 1, Grade 3, and Grade 5 = expository and narrative samples from grade 1 (N = 4882), grade 3 (N = 7879), and grade 5 (N = 10,259), respectively. Percentages may not sum to 100 due to rounding.
Figure 2. Percentages of unique words in each frequency zone by genre within each grade. Note. Ex = expository; nar = narrative. Each column label refers to unique words within that grade and genre combination. Ns from left to right: 3499; 3118; 4874; 5346; 6440; 6576. Percentages may not sum to 100 due to rounding.
Figure 3. Percentages of total words in each frequency zone by genre and grade. Note. Expository = all samples of expository text (N = 149,448); Narrative = all samples of narrative text (N = 150,892); Grade 1, Grade 3, and Grade 5 = expository and narrative samples from grade 1 (N = 100,810), grade 3 (N = 99,887), and grade 5 (N = 99,643), respectively. Percentages may not sum to 100 due to rounding.
Figure 4. Percentages of total words in each frequency zone by genre within each grade. Note. Ex = expository; nar = narrative. Each column label refers to total words within that grade and genre combination. Ns from left to right: 50,030; 49,952; 50,078; 50,091; 49,992; 50,005. Percentages may not sum to 100 due to rounding.
Figure 5. Percentages of unique rarer wordsᵃ in lexical categories by genre and grade. Note. Expository = unique rarer words from all expository texts (N = 3714); Narrative = unique rarer words from all narrative texts (N = 4661); Grade 1, Grade 3, and Grade 5 = unique rarer words from grade 1 (N = 1559), grade 3 (N = 2945), and grade 5 (N = 4414) texts, respectively. Percentages may not sum to 100 due to rounding. ᵃ Words in low-frequency and rare frequency zones.
Figure 6. Percentages of unique rarer wordsᵃ in lexical categories by genre within each grade. ᵃ Words in low-frequency and rare frequency zones.
Figure 7. Percentages of rare words at each morphological stage by genre and grade.
Figure 8. Percentages of rare words at each morphological stage by genre within each grade.
Table 1. Total and unique words and morphological families: four word zones based on U function ¹.
| Word Zone | Unique Words (#) ² | Word Families (#) ³ | Predicted Appearances per Million Words of Texts (U Function) | Total Words Accounted for in Texts ⁴ (%) |
|---|---|---|---|---|
| High-Frequency | 930 | 621 | 100+ | 78 |
| Medium-Frequency | 4655 | 1830 | 99–10 | 16 |
| Low-Frequency | 13,881 | 3040 | 9–1 | 4 |
| Rare | 124,405+ | | <1 | 2 |
¹ U function is the number of times a word is predicted to appear per million words of text [10]. ² Unique words are derived from [10]. ³ High- and medium-frequency families are based on [34]; low-frequency families are based on [7]. ⁴ This percentage is based on the entire Zeno et al. database [10].
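The zone boundaries in Table 1 can be illustrated with a short sketch. This is not the authors' software; it is a minimal illustration that assumes cut points at U = 100, 10, and 1, and it uses a made-up lookup table of U values rather than real frequency data. The names `word_zone`, `zone_profile`, and `u_lookup` are hypothetical, and treating unlisted words as rare is an assumption made for this example only.

```python
from collections import Counter

ZONES = ("high", "medium", "low", "rare")


def word_zone(u_value: float) -> str:
    """Map a U value (predicted appearances per million words) to a zone.

    Cut points follow Table 1 (100+, 99-10, 9-1, <1); handling of
    fractional values between the printed ranges is an assumption.
    """
    if u_value >= 100:
        return "high"
    if u_value >= 10:
        return "medium"
    if u_value >= 1:
        return "low"
    return "rare"


def zone_profile(tokens, u_lookup):
    """Return {zone: (% of total words, % of unique words)} for a text."""
    counts = Counter(token.lower() for token in tokens)
    total_by_zone = Counter()
    unique_by_zone = Counter()
    for word, n in counts.items():
        # Assumption: a word missing from the lookup is treated as rare.
        zone = word_zone(u_lookup.get(word, 0.0))
        total_by_zone[zone] += n   # every occurrence counts toward total words
        unique_by_zone[zone] += 1  # each distinct word counts once
    n_total = sum(counts.values())
    n_unique = len(counts)
    return {
        zone: (100 * total_by_zone[zone] / n_total,
               100 * unique_by_zone[zone] / n_unique)
        for zone in ZONES
    }


# Toy data with invented U values, for illustration only.
u_lookup = {"the": 50000.0, "dog": 50.0, "limped": 5.0}
tokens = ["The", "dog", "limped", "the", "dog", "snapcrackers"]
profile = zone_profile(tokens, u_lookup)
```

Keeping the two percentages separate mirrors the distinction the figures draw between total words and unique words: a zone can account for a small share of running text while contributing a much larger share of the distinct words a reader meets.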
Table 2. Texts at three grade levels and two genres: numbers of texts and words in texts.
| Sample | No. of Texts | Words in Sample | Text Length Mean (SD) |
|---|---|---|---|
| Grade 1 ex | 242 | 50,030 | 207 (130) |
| Grade 1 nar | 149 | 49,952 | 338 (477) |
| Grade 3 ex | 90 | 50,078 | 556 (505) |
| Grade 3 nar | 45 | 50,091 | 1113 (1172) |
| Grade 5 ex | 69 | 49,992 | 725 (797) |
| Grade 5 nar | 39 | 50,005 | 1282 (1410) |
Note. ex = expository; nar = narrative.
Table 3. Means (SDs) for age of acquisition and word length of unique words by genre, grade, and grade x genre.
| Sample | AoA: Frequent Words | AoA: Rarer Words | WL: Frequent Words | WL: Rarer Words ᵃ |
|---|---|---|---|---|
| Expository | 6.5 (2.0) *** | 8.8 (2.6) | 6.4 (2.1) *** | 7.3 (2.4) *** |
| Narrative | 6.3 (2.0) | 8.7 (2.6) | 6.2 (2.0) | 7.0 (2.2) |
| Grade 1 ᵇ | 5.5 (1.6) *** | 7.2 (2.2) *** | 5.6 (1.8) *** | 6.1 (2.1) *** |
| Grade 3 ᶜ | 6.2 (1.9) *** | 8.1 (2.4) *** | 6.2 (2.0) *** | 7.0 (2.2) *** |
| Grade 5 ᵈ | 6.6 (2.1) *** | 9.4 (2.6) *** | 6.5 (2.1) *** | 7.5 (2.3) *** |
| Grade 1 ex | 5.4 (1.6) *** | 7.3 (2.2) ** | 5.6 (1.8) *** | 6.2 (2.1) *** |
| Grade 1 nar | 5.2 (1.4) | 6.9 (2.1) | 5.3 (1.6) | 5.9 (2.1) |
| Grade 3 ex | 6.1 (1.9) *** | 8.1 (2.3) | 6.1 (2.0) *** | 7.1 (2.3) ** |
| Grade 3 nar | 5.8 (1.7) | 8.0 (2.4) | 5.9 (1.9) | 6.8 (2.2) |
| Grade 5 ex | 6.6 (2.1) *** | 9.4 (2.6) | 6.5 (2.1) *** | 7.7 (2.5) *** |
| Grade 5 nar | 6.3 (2.0) | 9.2 (2.6) | 6.2 (2.0) | 7.3 (2.2) |
Note. Expository = all samples of expository text; Narrative = all samples of narrative text; Grade 1, Grade 3, and Grade 5 = expository and narrative samples from grade 1, grade 3, and grade 5, respectively; AoA = age of acquisition; WL = word length. Word length is measured in number of letters. ᵃ Words in low-frequency and rare word zones. ᵇ Compared to grade 3 in Mann–Whitney U test. ᶜ Compared to grade 5. ᵈ Compared to grade 1. ** p < 0.01, *** p < 0.001.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hiebert, E.H.; Pugh, A.; Kearns, D.M. The Presence and Progression of Rare Vocabulary in Texts Across Elementary Grades and Between Genres. Educ. Sci. 2024, 14, 1314. https://doi.org/10.3390/educsci14121314
