1. Introduction and Theoretical Framework
Students’ science, technology, engineering, and mathematics (STEM)-related motivational beliefs have implications for their performance in individual courses as well as their long-term outcomes [1,2,3,4,5]. In particular, students’ motivational beliefs are correlated with their goals and tend to predict student learning outcomes [1,2,3,4,5,6,7]. There also tend to be differences between men’s and women’s motivational beliefs regarding physics, which have been linked to performance differences in physics courses [4,8,9,10,11]. Here, we focus on two motivational factors: test anxiety and self-efficacy. Test anxiety can affect students’ test performance and is more likely to affect women [12]. Self-efficacy in a given domain is a student’s belief in their ability to succeed at an activity or subject or complete a task [13].
Test anxiety can impact students’ cognition, physical body, and behavior [12,14]. When they experience test anxiety, students’ cognitive resources are not entirely devoted to the assessment, but can be taken up by worry and intrusive thoughts of failure [12]. Additionally, test anxiety can affect how students feel during an assessment. For example, they may experience a fast heartbeat or “butterflies in their stomach”. The behavioral aspect of test anxiety manifests in avoidance techniques, such as procrastination or interacting only with surface-level feedback after the exam (e.g., not examining mistakes closely to make a plan for future improvement) [12,15]. Other studies have found that test anxiety negatively affects student performance, especially on high-stakes assessments such as exams that are heavily weighted [16,17,18]. In addition, women are more likely to report test anxiety than men [12,13], so understanding test anxiety and how to minimize its effect on student success is vital in creating positive and equitable learning environments.
Self-efficacy [13,19] has been linked to positive learning outcomes for physics students [4,5,7,11]. Self-efficacy of students in a particular domain can be enhanced through several mechanisms. One way is by overcoming difficulties, such as succeeding in a challenging homework assignment [19]. Self-efficacy can also be formed through social means, for example, through observation of role models succeeding in the domain of interest or through receiving encouragement that allows students to measure their success through personal improvement [19]. The final mechanism is regulation of emotional states, such as management of anxiety [19]. Women may have fewer role models due to under-representation of women in physics [20,21], and they are less likely to receive encouragement that they can succeed in physics from their instructors and peers [22,23]. Because high self-efficacy allows students to develop coping mechanisms that could reduce test anxiety, we hypothesize that students with high self-efficacy are also likely to have low test anxiety [13].
In this research, we aim to investigate whether test anxiety and/or self-efficacy predict low- and high-stakes assessment outcomes for women and men in a novel context: a two-semester physics course sequence for bioscience students in which women outnumber men. Here, low-stakes assessments are those that individually make up a small portion of a student’s grade, such as homework. High-stakes assessments are individual assessments that make up a large portion of a student’s grade such as traditional exams [
16,
24,
25]. For example, while weekly homework assignments that add up to 10% of the total grade (with each of them counting toward less than one percent of the grade) are considered as low-stakes assessments, a heavily weighted final exam that individually accounts for 25% of the total grade is considered as a high-stakes assessment. Prior research in a physics course for physical science and engineering students and students in other STEM courses has shown that gender gaps are more prevalent in high-stakes than low-stakes assessments [
16,
24,
25]. This, combined with gender differences in physics self-efficacy and test anxiety among physical science and engineering students [
18], led us to hypothesize that physics test anxiety and self-efficacy may predict high-stakes but not low-stakes physics assessment performance and may more adversely affect women even in physics courses in which they outnumber men. Prior research has found that there is a relationship between test anxiety and high-stakes assessment grades in biology classrooms [
16]. Additionally, past research shows a relationship between self-efficacy, test anxiety, and high-stakes physics grades as well as gender differences disadvantaging women for students enrolled in introductory physics courses for engineering and physical science majors in which women are under-represented [
18]. Past research shows that in general, bioscience students differ from physical science and engineering majors in both interests and motivations [
26]. Therefore, it is very important to investigate how systemic these inequitable trends are across different contexts, e.g., for an entirely different student population within a given STEM discipline such as physics.
It is important to investigate whether the numerical under-representation of women in some physics courses is the main factor contributing to gender differences. For example, in physics, there are certain stereotypes about who can excel in the discipline and that one needs to be a genius to do well, which may disproportionately affect women and lead to greater anxiety than men even in physics courses in which they outnumber men. Thus, here we focus on these issues related to physics test anxiety and self-efficacy in high-stakes and low-stakes assessment for students who have never been the focus of this type of investigation in the past: female and male students in introductory physics for bioscience and health science-related majors where women are in the majority. Comparison of our findings, discussed in detail in the Results Section, with prior studies involving physical science and engineering students shows that although women outnumber men in physics courses for bioscience students and the career goals of bioscience students are very different from the earlier researched group, the trends hold even for this new population [
18,
27,
28]. Thus, the findings discussed here in the context of physics courses, involving bioscience students and other students with interest in health-related majors, are very important because they emphasize the deep-rooted nature of women being affected more adversely by anxiety in high-stakes assessments, which is a major impediment to creating equitable and inclusive learning environments.
Students pursuing bioscience and health science-related majors are generally required to take at least one physics course for their major (and many of them are required to take two physics courses). Women are not under-represented in these physics courses for bioscience and health science-related majors, but there may still be a gender gap in the motivational belief scores of students in the course. In particular, prior research has found that even in physics courses in which women are not under-represented, men tend to have higher grades and physics-specific motivational beliefs than women [
10,
29,
30,
31,
32,
33,
34,
35]. For example, women tend to have lower physics self-efficacy than men with the same grades in courses for engineering and physical science students as well as courses for students with interest in bioscience and health science-related professions [
9,
36].
Our goals for this research were to investigate the relationship between test anxiety and assessment outcomes in introductory physics courses, with a focus on gender differences in each construct, in this novel context of students in bioscience and health science-related majors. We hypothesized that test anxiety would predict high-stakes but not low-stakes assessment outcomes among students. We also hypothesized that despite women outnumbering men in these physics courses, women would still be more likely to experience higher levels of test anxiety partly due to the stereotype that physics requires innate genius and because women tend to stay away from disciplines linked to exceptional innate ability [
37]. We included self-efficacy in our investigation because the relationship between performance and self-efficacy is well documented in physics [
11,
38] and because management of anxiety is explicitly mentioned as a mechanism to enhance self-efficacy [
19].
Research Questions
With this background and goals in mind, we aim to answer the following research questions in the novel context of a two-semester physics course sequence for bioscience students and students interested in health professions:
- RQ1. Are there gender differences in students’ prior preparation, self-efficacy, or test anxiety?
- RQ2. Are there gender differences in students’ low- or high-stakes assessment scores?
- RQ3. Are there gender differences in terms of low- or high-stakes assessment scores for students with the same self-efficacy and/or test anxiety?
2. Methodology
2.1. Participants and Procedures
This study took place at a large research university in the United States. Participants were students enrolled in a Physics 1 or 2 course for bioscience and other health-related majors. Intended majors for students in our sample, aside from biological sciences, include but are not limited to microbiology, neuroscience, molecular biology, and chemistry with bioscience option; the common theme across these students is their interest in health-related professions. Previous research has shown that for those aiming for health-related careers and enrolled in science courses, there are various pathways that can lead to or away from their initially intended majors [39]. Therefore, the many minor distinctions within these majors make it challenging to analyze them separately with sufficient statistical power. We will use the term “bioscience majors” throughout for all students since they comprise the majority of students in both Physics 1 and Physics 2 and both these courses are mandatory for them. Students can only advance to Physics 2 if they have at least a C grade in Physics 1, equivalent to a 2.0 on a 4.0-point grade scale at this institution, with 3.0 corresponding to a B grade and 4.0 corresponding to an A grade. Some of the low-enrollment health-related majors only require Physics 1, so students may not take Physics 2 unless they want to take the MCAT exam for medical school admission.
The Physics 1 course primarily covered mechanics, though both thermodynamics and waves were also included. The Physics 2 course covered electricity and magnetism, geometrical optics, and physical optics. The courses included 2 weekly sessions of 75 min traditional lecture-based instruction led by the course instructors, along with smaller-sized 50 min recitation sessions taught by teaching assistants in which students worked collaboratively on solving physics problems. These courses did not include a laboratory component, although many of the students taking them end up taking laboratory courses as an elective later. The Physics 1 and Physics 2 student samples included sections taught by the same instructor, although the students were not necessarily the same across the two courses. This helped us ensure that no instructor-level effects went unaccounted for in our models. For both courses, midterm and final exams comprised 40% and 25% of the final course grade, respectively. We considered the midterm and final exams as high-stakes assessments as they were each highly weighted and together made up approximately two-thirds (65%) of the final course grade. Another 10% of the course grade was determined by homework problems, which were assigned weekly for students to complete on their own time, and the remaining 25% was participation grades based on completeness for students responding to clicker questions in the lectures or their work in groups on problems in the recitation sessions. Generally, we consider homework and participation as low-stakes assessments as each of them individually takes up a small portion of the total grade. For the analysis in this paper, however, we only used students’ homework grades as the low-stakes outcome, as those were the only low-stakes assessments that were graded for correctness and not completeness. Throughout this paper, we use the terms “test” and “exam” interchangeably.
All the midterm exams and final exams for the Physics 1 and Physics 2 courses were supervised, timed, and in-person, and while we mainly associate the exams being high-stakes with the grading weight, we recognize that these elements can also contribute to the anxiety levels in students. Midterm and final exams were interspersed uniformly throughout the semester, with midterm exams held around the one-, two-, and three-month marks after the start of classes. All exams were mainly multiple choice, with 1–3 open-ended questions.
Students were given extra credit as an incentive for taking the survey. The surveys were given during the first and last week of classes in the mandatory teaching assistant-led recitations. We call the first and final data sets “pre” and “post”, respectively. In Physics 1, 204 students took the pre-survey and 210 took the post-survey. In Physics 2, 185 students took the pre-survey and 89 took the post-survey. The post-surveys were taken in the final week of the classes, before students took their final exam.
For analysis, we included only students who successfully passed an attention check on the survey (a question that requested the students to select option “C”). Additionally, we included as many students as possible in each part of the analysis. For example, in a model that uses the average of one construct as well as students’ standardized test scores, we would exclude students who were missing SAT and ACT scores or were missing either pre- or post-survey results. One Physics 2 class section was not able to complete the post-survey and was therefore missing post-test anxiety and self-efficacy data. This resulted in a smaller sample size for post-test anxiety and self-efficacy, but students in this section had statistically indistinguishable prior preparation, pre-motivational factors, and assessment outcomes from other students in the sample, so they were included in the analysis where possible.
This research was carried out in accordance with the principles outlined in this institution’s Institutional Review Board ethical policy, and de-identified demographic data were provided through university records. For some variables, such as high school GPA, this approach allows us to rely on records that may be more accurate than students’ own recollection. However, it limits other measures such as student gender, for which students could only report either “male” or “female”. We acknowledge the potential limitations and harms that this method of data collection may cause [
40]. This institution recently began to implement more inclusive gender reporting methods for students, which are planned to be used once student samples are large enough to be meaningful for quantitative analysis. Demographic data indicated our Physics 1 sample was 67% women and our Physics 2 sample was 56% women. Students in Physics 1 identified with the following races/ethnicities: 66% White, 17% Asian, 7% African American/Black, 6% multiracial, 3% Hispanic/Latinx, and 1% unspecified. Students in Physics 2 identified with the following races/ethnicities: 63% White, 21% Asian, 6% African American/Black, 6% multiracial, 3% Hispanic/Latinx, and 1% unspecified.
2.2. Measures
2.2.1. Self-Efficacy and Test Anxiety
All test anxiety and self-efficacy survey items can be found in Table 1. The test anxiety survey questions were adapted from the previously validated Motivated Strategies for Learning Questionnaire [41,42]. To ensure we were measuring domain-specific constructs, we explicitly mentioned physics in the survey items, as seen in Table 1. For example, “I have an uneasy, upset feeling when I take an exam” became “I have an uneasy, upset feeling when I take a physics test”. Self-efficacy survey questions were constructed from other surveys and were previously validated [43]. Test anxiety items were on a five-point Likert scale (1—Not at all true, 2—A little true, 3—Somewhat true, 4—Mostly true, 5—Completely true), and self-efficacy items were on a four-point Likert scale (1—NO!, 2—no, 3—yes, 4—YES!). All responses were placed on a 0–1 scale to account for multiple Likert scales. Higher scores on the test anxiety items indicate higher test anxiety levels; therefore, an ideal course outcome would be high self-efficacy and low test anxiety scores on the surveys. We note that since students take the pre-surveys in the first week of their classes, their responses are based on anticipation for the course and reflect any prior physics course experience they have had. We further validated the survey through twenty one-hour student interviews to ensure that students interpreted questions as intended. These validations were incorporated early in the design of the survey before we started using the survey at this institution and included students all the way from introductory to upper-level physics courses.
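The text does not state how responses were mapped onto the 0–1 scale. The sketch below assumes a simple min-max rescaling, in which the lowest response option maps to 0 and the highest to 1; the function name `rescale_likert` is hypothetical and used only for illustration.

```python
def rescale_likert(response, scale_min, scale_max):
    """Map a Likert response onto [0, 1] by min-max rescaling.

    Assumption: the paper only says responses were "placed on a 0-1 scale";
    min-max rescaling is one common way to do that, not a confirmed detail.
    """
    if not scale_min <= response <= scale_max:
        raise ValueError("response lies outside the Likert scale")
    return (response - scale_min) / (scale_max - scale_min)


# Test anxiety items use a five-point scale (1-5);
# self-efficacy items use a four-point scale (1-4).
anxiety_item = rescale_likert(3, 1, 5)   # midpoint of the five-point scale
efficacy_item = rescale_likert(4, 1, 4)  # top option of the four-point scale
print(anxiety_item, efficacy_item)
```

Under this assumption, a "Somewhat true" anxiety response and a "YES!" self-efficacy response become directly comparable numbers on the same range, which is the stated purpose of the rescaling.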
Additionally, we performed confirmatory factor analysis using the student sample in this study as a check for continued validity. For both the pre- and post-surveys, the Comparative Fit Index (CFI) and Tucker Lewis Index (TLI) were ≥0.90 [44], and the Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR) were both ≤0.08 [45], which can be seen in Table A1 in the Appendix. Cronbach’s $\alpha$ was also consistently above 0.7, indicating good internal consistency across our measures [46]. Standardized factor loadings were all above 0.5 [44]. The square of a standardized factor loading gives the fraction of the variance of the observed variable that is explained by the latent variable, meaning that at least 25% of the variance in each of the survey items is explained by the respective construct.
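As an illustration of two of the quantities above, the following sketch computes Cronbach's alpha from item-level responses using the standard formula, and the fraction of item variance explained by a standardized loading. The response data are made up for the example and are not from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of responses per survey item, with respondents
    in the same order in every list.
    """
    k = len(items)
    # Total score per respondent, summed across items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def variance_explained(standardized_loading):
    """Fraction of an item's variance explained by its latent construct."""
    return standardized_loading ** 2


# Hypothetical responses: three perfectly consistent items, four respondents.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(cronbach_alpha(items))     # perfectly consistent items give alpha of about 1
print(variance_explained(0.5))   # a loading of 0.5 explains 25% of the variance
```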
2.2.2. Prior Academic Preparation
High school Grade Point Average (HS GPA) was reported using the weighted 0–5 scale, which is based on the standard 0 (failing)–4 (A) scale with adjustments for honors, Advanced Placement, and International Baccalaureate courses (these programs may offer a bonus as a reward for taking advanced courses, which can allow a GPA higher than 4.0). High school GPA is taken as a measure of general academic skills and generally is a strong predictor of early undergraduate course performance [
47].
Students’ Scholastic Assessment Test Math (SAT Math) scores are on a scale of 200–800 and are used as a predictor of performance on high-stakes assessments involving mathematical problem solving (e.g., physics exams) [47,48,49]. If a student took the American College Testing (ACT) examination, we converted ACT to SAT scores [50]. If a student took a test more than once, the school provided the highest section-level score for the SAT and the highest composite score for the ACT. If a student took both ACT and SAT tests, we used their SAT score.
2.2.3. Assessment Scores
Homework and exam grades were provided by the instructor and were de-identified by an honest broker before being included in analysis. If grades were not on a 0–100 scale, they were rescaled. For example, if homework was graded on a 10-point scale, all scores were multiplied by 10 for analysis.
2.3. Analysis
First, we report means and standard deviations of each variable separately for men and women. Next, to determine if there were gender differences in the means of self-efficacy, test anxiety, prior preparation, or assessment scores, we performed unpaired $t$-tests to measure the statistical significance of the differences [46] and Cohen’s $d$ to measure the size of the difference [51]. Cohen’s $d$ is calculated using:
$$d = \frac{\mu_1 - \mu_2}{\sqrt{(\sigma_1^2 + \sigma_2^2)/2}}$$
where $\mu_1$ and $\mu_2$ are the mean values of each group and $\sigma_1$ and $\sigma_2$ are the standard deviations of each group [51]. Group one was women and group two was men. Cohen’s $d$ is considered small if $d \sim 0.2$, medium if $d \sim 0.5$, and large if $d \sim 0.8$ [51]. We performed this analysis separately for Physics 1 and Physics 2 courses.
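A minimal sketch of the group comparison follows. It assumes the common form of Cohen's $d$ that divides the difference in means by the root mean square of the two group standard deviations, and it assumes Welch's unequal-variance form for the unpaired $t$ statistic, since the text does not say which variant was used. The scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Effect size: mean difference over the RMS of the two group SDs (assumed form)."""
    pooled_sd = sqrt((stdev(group1) ** 2 + stdev(group2) ** 2) / 2)
    return (mean(group1) - mean(group2)) / pooled_sd


def welch_t(group1, group2):
    """Unpaired t statistic, Welch's unequal-variance form (an assumption)."""
    standard_error = sqrt(
        stdev(group1) ** 2 / len(group1) + stdev(group2) ** 2 / len(group2)
    )
    return (mean(group1) - mean(group2)) / standard_error


# Hypothetical scores; group one is women and group two is men, as in the text.
women = [1, 2, 3]
men = [2, 3, 4]
print(cohens_d(women, men))  # negative: group one has the lower mean
print(welch_t(women, men))
```

With group one defined as women, a negative $d$ means that women scored lower on that variable and a positive $d$ means that they scored higher.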
To explore the predictive relationships between test anxiety and assessment outcomes, we used multiple regression analysis. For each regression model, we report the standardized $\beta$ coefficients, sample size, and adjusted R-squared. Standardized coefficients were used because they are in units of standard deviation and allow for direct comparison of effects [52]. We initially used gender, SAT Math scores, and HS GPA as predictors for low- and high-stakes assessment scores. Here, low-stakes assessment scores are the students’ average homework grades. High-stakes assessment scores are weighted so that 75% of the category is midterm exam grades and 25% is the final exam grade. This weighting was performed because the instructor gave three midterm exams and one final exam.
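The two outcome variables described above can be sketched as follows. The function names and the grades are hypothetical, and all scores are assumed to be on a 0–100 scale. Weighting the midterm average at 75% is equivalent to giving each of the three midterms and the final exam an equal 25% share.

```python
from statistics import mean


def high_stakes_score(midterm_grades, final_exam_grade):
    """Composite exam score: midterm average counts 75%, final exam 25%."""
    return 0.75 * mean(midterm_grades) + 0.25 * final_exam_grade


def low_stakes_score(homework_grades):
    """Low-stakes outcome: the average homework grade."""
    return mean(homework_grades)


# Hypothetical grades for one student, all on a 0-100 scale.
midterms = [80, 90, 70]
final_exam = 60
print(high_stakes_score(midterms, final_exam))  # same as the mean of all four exams
print(low_stakes_score([100, 90, 95]))
```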
After establishing baseline models, we introduced pre- or average test anxiety and self-efficacy as predictors. Average test anxiety/self-efficacy is the mean of pre- and post-scores and was used as a proxy for students’ test anxiety/self-efficacy while they were taking the course. For both courses, we introduced two models using pre-self-efficacy and pre-test anxiety scores and four models using the average scores. The first two models predicted high-stakes assessment scores using both or neither of the constructs, while the third and fourth models used either self-efficacy or test anxiety in addition to the baseline predictors. During regression analysis, we used combined assessment categories (e.g., low- and high-stakes assessments), but results were similar when the categories were separated. For example, the regression models predicting high-stakes assessment scores were similar to both the models predicting midterm exam grades and those predicting final exam grades.
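To make the modeling steps concrete, here is a small standard-library sketch of a regression with standardized coefficients and adjusted R-squared, using the mean of pre- and post-scores as the "average" predictor. The data are invented, and the outcome is constructed as an exact linear function of average test anxiety so that the result is easy to check by eye. The variable names and the 0/1 gender coding are assumptions, not the authors' code.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a variable to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_regression(predictors, outcome):
    """Return (standardized betas by name, adjusted R-squared)."""
    names = list(predictors)
    zx = [zscore(predictors[name]) for name in names]
    zy = zscore(outcome)
    n, p = len(zy), len(names)
    # Normal equations; no intercept is needed because z-scores have mean zero.
    xtx = [[sum(u * v for u, v in zip(zx[i], zx[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(zx[i], zy)) for i in range(p)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[j] * zx[j][k] for j in range(p)) for k in range(n)]
    ss_res = sum((y - f) ** 2 for y, f in zip(zy, fitted))
    ss_tot = sum(y ** 2 for y in zy)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return dict(zip(names, betas)), adj_r2


# Hypothetical data for eight students (gender coded 1 = woman, 0 = man).
gender = [0, 1, 0, 1, 0, 1, 0, 1]
pre_anxiety = [0.25, 0.5, 0.25, 0.75, 0.5, 0.5, 0.0, 1.0]
post_anxiety = [0.25, 0.25, 0.5, 0.75, 0.5, 0.75, 0.25, 0.75]
# Average test anxiety: the mean of each student's pre- and post-scores.
avg_anxiety = [(a + b) / 2 for a, b in zip(pre_anxiety, post_anxiety)]
# Outcome built to depend only on average anxiety, for easy checking.
exam_score = [90 - 30 * a for a in avg_anxiety]

betas, adj_r2 = standardized_regression(
    {"gender": gender, "test_anxiety": avg_anxiety}, exam_score
)
print(betas, adj_r2)
```

Because the outcome here depends only on average test anxiety, the standardized coefficient for test anxiety comes out near -1, the gender coefficient near 0, and the adjusted R-squared near 1. With real data, the relative sizes of the standardized coefficients are what allow a direct comparison of predictors.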
4. Conclusions, Limitations, and Future Directions
We have added to the research in this field by exploring gender differences in and the predictive power of self-efficacy and test anxiety for low- and high-stakes assessments for physics courses in which women are in the majority. While past research has identified gender differences in these factors and their relationship with low- and high-stakes assessment outcomes in courses where women are in the minority [
18], our results here demonstrate that the inequitable patterns hold even when women outnumber men. Using validated survey data and grade information from students in a two-semester introductory Physics course sequence for bioscience majors and other majors with students interested in health-related professions, we compared the predictive power of self-efficacy and test anxiety on female and male students’ performance on both low- and high-stakes assessments. We found that there are gender differences disadvantaging women in self-efficacy, test anxiety, and high-stakes (but not low-stakes) assessment outcomes in Physics 1. We also found that self-efficacy and/or test anxiety predicted only high-stakes assessment outcomes in both Physics 1 and Physics 2. Comparison of these findings in a novel context with prior studies involving physical science and engineering students shows that although women outnumber men in physics courses for bioscience students and the career goals of bioscience students are very different from those of physical science and engineering majors, most of the adverse trends are similar even for this new population. Therefore, these findings highlight the systemic nature of women being more adversely affected by anxiety in high-stakes assessments and the need for creating more equitable learning environments.
Instructors can help decrease test anxiety and its adverse impact on student performance directly by lowering the emphasis on high-stakes assessments and increasing the emphasis on low-stakes assessments in their courses. The careful choice of assessment tools can help create a more equitable classroom environment by minimizing fears that come with test anxiety related to high-stakes assessments (e.g., receiving a low course grade due to one bad exam score which counts for a significant portion of the course grade). Moreover, frequent low-stakes assessments can also give students many attempts to practice test taking and encourage spaced practice, which is more effective for the retention of knowledge and skill development than “cramming” before an exam [
59]. We recognize that instructors may want to include opportunities to assess students’ cumulative learning in a course. In particular, since physics is a hierarchical discipline, it may be useful to give students incentives and support to organize their knowledge hierarchically so that they focus on discerning the connections between the concepts in different chapters. However, these types of cumulative assessments can be made lower-stakes by offering more of them in the course and making each count towards less of a student’s grade [
60]. Additionally, implementing a range of formative assessments (such as clicker questions, homework, tutorials, projects, and other types of assessments), none of which counts for a very large portion of a student’s course grade, can help students develop a wider variety of skills without increasing their anxiety. In particular, providing students with these types of support via different types of frequent formative assessments can help students develop the desired knowledge structure and skills, including the ability to communicate science better, based upon the goals of the course in a low-anxiety and equitable learning environment.
In addition to recommending that instructors incorporate more frequent low-stakes assessments to reduce student anxiety and boost self-efficacy, rather than a few heavily weighted high-stakes exams, we also suggest the implementation of research-based interventions and activities designed to promote equity in college courses. An important implication is that course instructors should carefully consider how to foster equitable, inclusive, and low-anxiety learning environments and examine the role of high-stakes and low-stakes assessments in grading policies. Course instructors have a central role to play in creating an environment in which student anxiety is reduced during learning and assessment and self-efficacy is improved [61]. For example, prior research suggests that the gender gap in performance is eliminated when instructors use short activities at the beginning of their courses to emphasize that most students struggle in learning to solve challenging physics problems, that struggle is the stepping stone to learning physics, and that students should embrace it, while also talking about their own struggles when they were in similar courses [62]. When instructors implement these types of activities, they are also encouraged to reflect on their own mindset about whether all students can excel in their courses. This is important because instructors with a growth mindset about their students’ potential have significantly smaller performance differences between traditionally marginalized and dominant demographic groups in STEM courses, and students from traditionally marginalized groups report greater motivation in those courses [63].
One limitation of this investigation with regard to RQ3 is that our findings are correlational in nature and a correlation does not imply causation. Another limitation related to generalizability is that the research was carried out at a large research university in the US and results may not apply to other types of US institutions, e.g., two-year or four-year liberal arts colleges or higher education institutions in other countries. Moreover, the institution where this study was carried out is a predominantly White institution, and these findings may not apply to minority-serving institutions. Another limitation relates to the fact that the investigation only focused on gender differences and did not focus on other facets of students’ identities, e.g., based upon their race/ethnicity, first-generation college status, low-income status, etc., and the intersectionality of different identities and how they impact test anxiety and self-efficacy.
Future studies could focus on investigating similar issues at other types of institutions as well as for different student demographic groups and intersecting identities. It would also be valuable to conduct individual interviews or focus group discussions with students from various demographic groups because qualitative investigations can complement the quantitative study discussed here and shed light on the mechanisms behind, e.g., the observed gender differences.