1. Introduction
Much has been written and debated about business school rankings over the past several years, acknowledging their limitations and offering suggestions for improvement. In particular, the metrics used by traditional rankings have been found wanting. Perverse effects on faculty and decanal priorities have been identified, including incenting behaviors that are at odds with the achievement of the United Nations Sustainable Development Goals (SDGs) [
1,
2] and the 2030 agenda. In response to this recognition, a new rating system was launched in 2019 known as the Positive Impact Rating (PIR) [
3]. The PIR uniquely centers on students’ perceptions of their business school’s governance and culture, as well as the extent to which programs and learning methods have prepared them to pursue careers of purpose. The outcomes have also been uniquely tallied and presented, with business schools arranged into one of five tiers (from beginning to pioneering) as opposed to being ranked one against the other. This rating approach was intended to help foster collaboration and continuous improvement among business schools as opposed to competition.
According to its founders, the purpose of the PIR is to “help speed up the transformation” towards orienting “teaching, research and outreach activities towards social impact and sustainability” [
3] (p. 6). In order for the PIR to achieve its goals, including becoming broadly perceived as a reliable and valid assessment instrument, it is essential that its metrics and approach be held to a high standard, and further, that its various dimensions (currently energizing, educating, and engaging) are statistically supported.
This paper reports on the results of a study conducted at one business school, the Gordon S. Lang School of Business and Economics at the University of Guelph, Canada, during the 2020/21 academic year. In theory, the PIR scale, as a benchmark, recognizes the importance of business schools and their role in achieving the 17 SDGs [
4] and was therefore selected as an aligned measurement tool for Lang, given its Principles of Responsible Management Education (PRME) Champion School status. PRME Champion Schools are recognized by the United Nations Global Compact as leaders committed to responsible business education. The scale was designed to assess student perceptions of their business school’s positive impact on society across several factors, including school governance, culture, programs, learning methods, student support, the institution as a role model, and its level of public engagement [
3]. As a potential measurement tool that holds promise to overcome the perverse effects created by traditional business school ranking systems, it is essential that the metrics used be both reliable and valid, and be perceived as such. To this end, this study empirically analyzes the efficacy of the PIR scale to test the following hypotheses:
Hypothesis 1 (H1). The PIR is a reliable instrument as estimated by the coefficient alpha (Cronbach’s alpha).
Hypothesis 2 (H2). The confirmatory factor analysis supports the PIR as a valid instrument. A relationship exists between the observed responses and their underlying latent constructs; specifically, the responses to the survey support the model construct (Energizing, Engaging, Educating).
Hypothesis 3 (H3). There is a selection bias in channeling the data collection for the PIR through student organizations engaged in the sustainability field.
Hypothesis 4 (H4). There is a selection bias in collecting PIR data from students in courses linked to sustainability.
Hypothesis 5 (H5). Demographics or socio-cultural characteristics of the student influence PIR responses.
Our paper begins in
Section 2 with a brief synopsis of two events held in Davos during the 2019 and 2020 World Economic Forum (WEF) meetings, both of which focused on the shortcomings of traditional business school rankings. In
Section 3, the observations made at WEF are compared with support from the extant literature. Next in
Section 4, we summarize our methods and results. Our discussion in
Section 5 centers on the need to ensure that the measurement criteria for a chosen rating system support goals that enable a business school to achieve its vision and mission. When used effectively as a recruitment tool, such a system could help attract students whose goals and values are aligned with the school’s, strengthening organizational culture and further supporting the school’s ability to achieve its objectives. We also suggest that recruiters of recent graduates understand explicitly the criteria used to construct rankings and recognize that a graduate from a ‘top-ranked business school’ may not have the qualities assumed. In the spirit of future enhancements to this promising scale, we make the following recommendations: (1) a re-evaluation of the survey questions that inform the three separate elements (energizing, educating, and engaging) to ensure that these questions measure three distinct and separate aspects of a school’s positive societal impact; (2) a deliberate and broad-based distribution method for the survey to ensure participation by both highly engaged and less engaged students; and (3) an additional case study at a different school whose mission does not include the achievement of the 17 SDGs, in order to compare results based on contrasting goals and student demographics and to further confirm the scale’s reliability. Finally, we discuss the contributions to the literature and the efficacy of the PIR scale, the implications for business practitioners and business school leaders, the limitations of the current study, and future research enhancements.
2. Challenges Leveled at Traditional Business School Rankings
During the 2019 WEF, an event on business school rankings was convened at the invitation of Corporate Knights and the UN Global Compact (and supported by the Globally Responsible Leadership Initiative (GRLI) [
5]). Here, business school deans of PRME alongside deans and business leaders recognized by Corporate Knights, as globally leading sustainability champions, considered the need for significant change in business school rankings.
The event began with a presentation by the authors of the report ‘Business School Rankings for the 21st Century’ [
2]. Based on an extensive literature review and focus groups with key stakeholders, Pitt-Watson and Quigley [
2] suggested that business schools exert tremendous influence on society (through the knowledge and actions of their graduates). The priorities and actions of business schools, in turn, “appear to be greatly influenced by business school rankings” [
2] (p. 23). Yet, the metrics of business school rankings do not appear to be in alignment with the needs of society, including the development of “a sustainable, inclusive 21st century economy” [
2] (p. 23). More specifically, Pitt-Watson and Quigley concluded that the metrics used by traditional business school rankings fall short across several domains, including [
2] (p. 23) “(a) salary overemphasis; (b) business schools penalized in the rankings for turning out graduates who work for non-profits; (c) course content not evaluated; and (d) teaching quality, sustainability and business ethics minimized or absent.” They concluded with a call for the re-imagination of rankings to “encourage these institutions to educate managers equipped to address the challenges of this era” [
2] (p. 23).
Next was a presentation by Katrin Muff on the newly developed Positive Impact Rating (PIR), scheduled for release in 2020. Muff explained how the design of the PIR was intended to respond to the Pitt-Watson and Quigley [
2] critique, with its focus on the perceptions of current students, with respect to “the quality and nature of their learning experience and the culture of the business schools in which they study” [
5] (p. 1). Schools in attendance were encouraged to participate in the PIR and join the celebration of its release during the World Economic Forum in 2020.
Following this, discussion groups considered three questions: Why do rankings matter and to whom? What is most unfortunate about current rankings? And what is the desired future state? A synthesis of the perceptions of the participants suggested that:
“[R]anking systems (with notable exceptions–such as Corporate Knights) have had perverse (unintended) consequences on the focus of faculty research, curricular and pedagogical innovation, and the student experience (particularly for undergraduate students). Driven by the desire to be well-ranked (with the concomitant rewards that such rankings engender–such as significantly enhanced brand and credibility amongst potential donors, faculty, students and senior university administrators), business schools have been strongly incented to “play the game” and engineer results, particularly in the areas of student salaries and faculty research.”
Other observations included that business school rankings inordinately focus on MBA programs, which can deprive large undergraduate programs of needed attention and resources. Additionally, publication lists, such as the fifty journals included in the Financial Times ranking (the FT50), can influence who gets hired, as well as promotion and tenure decisions. The problem with the latter was underscored by Dyllick [
6], who reported that journal rankings such as the FT50 contain considerable bias, privileging English speakers from Europe and North America who provide disciplinary-based explanations of past development, as opposed to addressing pressing societal issues, including in interdisciplinary ways.
During the 2020 WEF, a second Deans Multi-Stakeholder Dialogue at Davos took place. That year’s event featured the launch of the PIR (with deans from top-rated schools in attendance), a panel discussion with representatives from Corporate Knights, the PIR, and the Financial Times, and group discussions centered around three key topics: (1) participant reaction to the PIR; (2) perceptions of changes to other rankings that are underway; and (3) creating a wish-list for further change [
7].
Findings from the discussions suggested that there was broad support for the PIR and its focus on student perceptions, as well as its five rating bands (beginning, emerging, progressing, transforming, and pioneering schools). Participants supported the potential for this approach to help foster collaboration amongst the rated schools. Some concern was also expressed about the potential replicability of the results, given the relatively low bar for the number of responses required (i.e., a minimum of 30 responses) and the method by which the survey was promoted to potential participants (via Oikos International, Net Impact, and local student leaders). The suggestion was made that future iterations should endeavor to ensure a demographically diverse group of students, from multiple programs and year levels, and from those in leadership positions and otherwise, to enhance the reliability of the results.
Observations and recommendations for improving traditional rankings included making them “more equitable and inclusive”, “embracing continuous improvement”, “valuing teaching and learning”, valuing “emerging inter-disciplinary journals and more accessible forms for research and dissemination”, and developing “mechanisms for reporting on contributions to the UN’s 2030 agenda” [
8] (p. 3).
4. Methodology
To collect data for this study, undergraduate and graduate business students from a public Canadian university (Gordon S. Lang School of Business and Economics) completed a questionnaire to test for possible selection biases resulting from the way the PIR rating responses were collected and to see how these responses were influenced by certain student demographic and socio-cultural factors. To ensure that the sample was representative of the target audience (Lang undergraduate and graduate students), students were recruited through various club associations and identified classes so as to include all cohorts and students with different extracurricular interests. Clubs and classes chosen for the study were identified as either sustainability-focused or not. The questionnaire was first distributed in the fall of 2020 and was then distributed a second time in winter 2021. In both cases, face-to-face classes and on-campus study were restricted due to COVID-19. A mobile-friendly version of the survey was added to leverage the ubiquity of smartphones, to better reach students in remote locations, and to simplify the survey completion task. Recognizing the potential for fatigue associated with a continuous lockdown order, we coded questionnaires to capture the difference in timing (i.e., fall versus winter). We highlight the potential impacts of COVID-19 on our survey in the discussion section.
The questionnaire consisted of 64 questions: twenty questions that form the PIR rating, assessing how students perceive their school’s current commitment to creating a positive impact; twenty-three socio-cultural and demographic questions; eleven attitude and behavior questions to establish a sustainability attitudes and behavior score; eight political questions to establish a political leaning score; and two questions on overall satisfaction with the academic journey. Three treatments were conducted: the first treatment placed the PIR survey questions first, and the second placed the PIR questions second, to test whether priming or framing effects from the other questions would influence the score. Both treatments were conducted in the fall of 2020. The third treatment had the PIR questions second (the same as treatment 2) but was executed in the winter of 2021. The electronic questionnaire took approximately 40 min to complete. A subset of the questions from the survey is attached in
Appendix A.
There are three parts to this analysis. In Part 1, we estimate the reliability of the PIR scale using the coefficient alpha (also known as Cronbach’s alpha) using R statistical programming language and the ‘psy’ package. Cronbach’s alpha [
48] measures the average covariance between the items in the scale, where the higher the coefficient alpha the more consistent the measure of the concepts that form the scale. Knekta et al. [
49] recommended Cronbach’s alpha for estimating reliability, but not validity, for an instrument with a single distribution. Although high reliability is necessary for valid interpretations, it is not sufficient as a validity measure, as it does not test the dimensionality of a scale. To test the dimensionality of a scale, a factor analysis is recommended.
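As a minimal sketch of this reliability step, assuming the 20 PIR item responses are held in a data frame named pir_items (a hypothetical name, one row per respondent and one column per item), the coefficient alpha can be estimated in R as follows:

# Reliability estimate for the 20 PIR items using the 'psy' package
# ('pir_items' is a hypothetical data frame of item responses)
library(psy)

alpha_est <- cronbach(pir_items)
alpha_est$alpha  # coefficient alpha; values closer to 1 indicate higher internal consistency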
Therefore, in Part 2 of the analysis, a confirmatory factor analysis (CFA) was conducted to verify the factor structure of the 20 observed responses that informed the PIR scale. A CFA was chosen as it tests a hypothesis-driven model. Specifically, for the PIR, the researchers pre-specified all aspects of the model. Questions in the survey were divided into three areas, and these were further divided into seven dimensions. Yong and Pearce [
50] and Kline [
51] recommended the use of CFA for questionnaires that exceed 4 items and that have at least three or more items for each subcategory identified. In our case, the PIR has 20 items, and there are at least three items associated with each of the three subcategories of engaging, energizing, and educating. Although larger sample sizes are generally better to increase the statistical power of the result [
52], minimum sample sizes of 100 to 200 can be utilized, as long as the free parameters are less than the known values, i.e., the model is over-identified [
53].
The CFA tested the hypothesis that a relationship exists between the observed responses and their underlying latent constructs, specifically the three areas and their constituent dimensions. The R statistical programming language and the ‘lavaan’ package were used to perform the CFA. Maximum likelihood estimation was chosen given normally distributed data. A covariance matrix explored the psychometric properties of the 20-item PIR survey. To determine model fit, we report the chi-square value, the comparative fit index (CFI), the Tucker–Lewis fit index (TLI), and the root mean square error of approximation (RMSEA), where CFI ≥ 0.90, TLI ≥ 0.95, and RMSEA < 0.08 would indicate a good fit.
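To make the estimation step concrete, a sketch of the three-factor CFA in lavaan is given below; the item names (q1 to q20) and their assignment to the energizing, educating, and engaging factors are placeholders and would follow the actual PIR item mapping:

# Three-factor CFA for the 20 PIR items; the item-to-factor assignment shown here
# is illustrative only and should follow the PIR codebook
library(lavaan)

pir_model <- '
  energizing =~ q1 + q2 + q3 + q4 + q5 + q6
  educating  =~ q7 + q8 + q9 + q10 + q11 + q12 + q13 + q14
  engaging   =~ q15 + q16 + q17 + q18 + q19 + q20
'
fit <- cfa(pir_model, data = pir_items, estimator = "ML")
fitMeasures(fit, c("chisq", "pvalue", "cfi", "tli", "rmsea"))  # model fit criteria reported in the text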
In Part 3, we conducted a bivariate (OLS) statistical analysis to understand the causes of the observed PIR scores. A bivariate (OLS) rather than multivariate statistical model was implemented, even though there is a trend toward multivariate analysis in this area [
54,
55], as the bivariate OLS model was the best fit given the theory and the research design for this study. Specifically, the model satisfied the seven assumptions necessary for a linear regression to give a valid result: the dependent variable was continuous; the independent variables were categorical; there was a linear relationship between the dependent and independent variables; there were no significant outliers; there was independence of observations; the data were homoscedastic; and the residuals (errors) of the regression line were approximately normally distributed. The dependent and explanatory variables are described below.
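As an illustrative sketch, with survey_data and the variable names standing in for the dependent and explanatory variables described in the following subsections, the benchmark regression can be estimated as:

# Benchmark OLS model; 'survey_data' and the explanatory variable names are illustrative stand-ins
ols_fit <- lm(pir_score ~ sust_course + club + faith + satisfaction + gender +
                program + coop + political + env_belief + consum,
              data = survey_data)
summary(ols_fit)  # coefficients, t-statistics, p-values, F-statistic, and R-squared
plot(ols_fit)     # residual diagnostics used to check the linear regression assumptions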
4.1. Dependent Variables
The overall PIR score and scores of three subcategories of the PIR scale (energizing, educating, and engaging) were calculated following the original methodology in the PIR report. Specifically, the arithmetic average over 20 PIR questions for each participant was defined as the overall PIR score. Three sub-scores were calculated as the arithmetic average over related PIR questions.
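A sketch of this scoring step, where pir_items again denotes the hypothetical item data frame and energizing_items, educating_items, and engaging_items are placeholder vectors of the column names in each subcategory:

# Overall PIR score and subcategory scores as arithmetic means of the relevant items
survey_data$pir_score  <- rowMeans(pir_items)                      # all 20 items
survey_data$energizing <- rowMeans(pir_items[, energizing_items])  # Energizing items
survey_data$educating  <- rowMeans(pir_items[, educating_items])   # Educating items
survey_data$engaging   <- rowMeans(pir_items[, engaging_items])    # Engaging items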
Table 1 reports the summary statistics for the aforementioned four PIR scores. The average PIR score was 7.43, positioning the Lang Business School as a transforming (7.4–8.7) business school on the scale’s tiered rating system. The scores for energizing and engaging subcategories were also transforming, with scores of 7.65 and 7.45, respectively. The educating subcategory scores were positioned lower at 7.29, placing Lang on the progressing (5.9–7.3) tier within the tiered system (see
Appendix B for details on PIR tiers).
Treatment 3 (winter 2021) was not significantly different from either Treatment 1 or 2 (fall 2020). As such, Treatments 2 and 3 were collapsed, as both had the PIR questions positioned second in the survey.
Table 2 reports the two-sided
t-test results on the equality of PIR and three subcategory scores from the two treatments.
p-values greater than 10% suggested that there are no statistically significant differences between the two treatments.
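A sketch of this comparison, assuming a hypothetical factor treatment that records whether the PIR questions appeared first or second in the survey:

# Two-sided test of equal mean PIR scores across the two treatment groups
# (Welch two-sample t-test by default; the paper does not specify the variance assumption)
t.test(pir_score ~ treatment, data = survey_data, alternative = "two.sided")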
4.2. Explanatory Variables
Thirteen (13) explanatory variables were constructed from the 44 non-PIR survey questions to test the influences of survey design, survey distribution methods, and student demographic and socio-cultural factors on PIR scores. A subset of the variables were direct-response categorical variables. These included whether a course requested that the student take the survey, whether the student belonged to a student club and, if so, whether the club had a sustainability focus, self-identified faith affiliation, overall satisfaction with the academic experience, gender, subject discipline, and co-op status.
An additional three explanatory variables were constructed indirectly based on a series of questions. In the first step, a continuous index was constructed (see details below). In the second step, a binary variable was constructed based on whether the score was below or above the median score among all participants.
4.2.1. Political Leaning Index (Political)
This index was based on 8 questions from a pre-existing Pew survey, where a lower index value suggests a more liberal-leaning (left-leaning) and a higher index value a more conservative-leaning (right-leaning) political view. As the political orientation of students varied little (a low standard deviation), we were interested in whether being relatively left-leaning versus relatively right-leaning influenced the score. To this end, we constructed a binary variable with two levels, with left-leaning as the reference point (see questions 1–8,
Appendix A).
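As a generic sketch of the two-step construction described above (shown here for the political leaning index; the same pattern applies to the other two derived variables), with pol_q1 to pol_q8 as hypothetical column names for the eight political items:

# Step 1: continuous index as the mean of the eight political items (hypothetical column names)
survey_data$political_index <- rowMeans(survey_data[, paste0("pol_q", 1:8)])
# Step 2: binary indicator, 1 if the respondent's index is above the sample median
# (left-leaning, i.e., below-median, serves as the reference level)
survey_data$political <- as.integer(survey_data$political_index >
                                      median(survey_data$political_index))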
4.2.2. Attitudes toward Sustainability and the Environment (Envir Belief)
An index score ranging from 1 to 5 was constructed per subject based on four questions related to attitudes toward sustainability and the environment. A higher score indicated that the participant was more concerned about sustainability. This index behaved similarly to the political leaning index, leading us to construct a binary variable with two levels, with ‘lower sustainability attitude’ as the reference point (see questions 11, 15, and 16,
Appendix A).
4.2.3. Consumption and Purchase Behavior (Consum)
A binary variable was created based on a series of four questions asking students about their consumption and purchase behavior. A value of zero was assigned when the reported purchasing and intended consumption behavior did not reflect sustainability concerns (see questions 9 and 12–14,
Appendix A).
5. Results
In total, we collected 156 PIR responses usable for reliability and validity testing and 143 usable surveys to conduct the bivariate analysis.
Table 3 reports the summary statistics for the political leaning score (political), the attitudes toward sustainability and environment score (env_belief), and the consumption and purchase behavior score (consum). The mean score of 0.30 indicated that participants at Lang were politically left-leaning, aligning more closely with liberal government policies. The high mean of 4.17 out of a possible score of 5 indicated a positive attitude toward sustainable business practices and the environment. Conversely, the mean score of 0.42, below the midpoint of 0.50, indicated an intended consumption and purchase behavior marginally away from environmentally sustainable products.
Table 4 reports the descriptive statistics for PIR scores (mean and standard deviation). The majority of participants identified as female (65%), were undergraduates (77%), were in a non-co-op program (63%), did not belong to a club (75%), identified as no faith (45%), and had a ‘relatively’ left (liberal) political leaning (57%). Close to 49% of participants were recruited from a course with a sustainability focus. With the exception of the high proportion of females (65% in the sample versus 38% in the Lang population), the sample was representative of the target population at Lang [
56].
All undergraduate participants were registered in one of 11 subprograms housed within the Bachelor of Commerce program. All graduate participants were registered as Lang Master’s students.
Table 5 reports the average PIR score for participants from different programs. The box plots in
Figure 1 show the distribution of scores by academic program.
5.1. Coefficient Alpha Estimate
The Cronbach alpha captured the average correlation among all 20 items contained in the survey. The coefficient alpha (α) was 0.949. In the absence of the same students repeating the survey multiple times, this high internal consistency provides a good measure of the scale’s ability to provide consistent results (reliability). Thus, there is support for Hypothesis 1: the PIR is a reliable instrument as estimated by the coefficient alpha (Cronbach’s alpha).
5.2. Confirmatory Factor Analysis
The CFA was first conducted using the latent variables (on the left in
Table 6 and
Table 7) comprised of the indicators (observed variables on the right in
Table 6 and
Table 7). These are the areas and dimensions with the associated questions as selected by the original creators of the PIR scale. The three model fit criteria for the CFA and coefficients can be found in
Table 6. The chi-square value (
p-value) = 0.00, the comparative fit index (CFI) = 0.835, the Tucker–Lewis fit index (TLI) = 0.812, and the RMSEA = 0.114 (see
Table 8). The chi-square result rejected the null hypothesis that the model fits the data. The CFI, TLI, and RMSEA values also indicated a poor fit between the model constructs and the observed data.
Next, we conducted a CFA using the seven dimensions of governance, culture, programs, learning methods, student support, the institution as a role model, and public engagement to determine whether this led to a better fitting model. The results of the second analysis can be found in
Table 7 and
Table 8. The chi-square (
p-value) = 0.00, the comparative fit index (CFI) = 0.869, the Tucker–Lewis fit index (TLI) = 0.833, and the RMSEA = 0.107. These values again indicated a poor fit between the model constructs and the observed data.
These two analyses indicated that the original categorized survey questions may not be gathering the correct information to measure their pre-specified themes. Using the covariance matrix that explored the psychometric properties of the 20-item PIR scale (see
Table 9), we constructed a new model by placing the responses with the highest covariances together to see if new latent variables emerged that could better explain the data. Specifically, we investigated whether the stronger covariance among items was potentially due to one common single factor. The covariance matrix informed a four-factor model (see
Table 8 and
Table 10). The chi-square value (
p-value) = 0.00, the comparative fit index (CFI) = 0.862, the Tucker–Lewis fit index (TLI) = 0.831, and the RMSEA = 0.119, which again indicated a poor fit. Thus, Hypothesis 2 is rejected. The confirmatory factor analysis does not support the PIR as a valid instrument.
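A sketch of this exploratory step, again assuming the items sit in the hypothetical pir_items data frame:

# Item covariance matrix (cf. Table 9), used to group highly covarying items together
round(cov(pir_items), 2)
# A re-specified (e.g., four-factor) model would then be fit in the same way as above:
# fit4 <- cfa(four_factor_model, data = pir_items)
# fitMeasures(fit4, c("chisq", "pvalue", "cfi", "tli", "rmsea"))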
5.3. OLS Regression Analysis
OLS regressions with no interaction terms (
Table 11) and with interaction terms (
Table 12), with either the continuous PIR Score or the three PIR subdimension scores as the dependent variable, were explored to test if there is a selection bias in channeling the data collection for the PIR through student organizations engaged in the sustainability field, from students in courses linked to sustainability, and other demographic or socio-cultural characteristics.
The benchmark model is Model (1) from
Table 11. The results of a multiple linear regression showed a collectively significant effect of all the independent variables, F(14, 128) = 3.196 and R² = 0.259. Specifically, 25.9% of the variance was explained by the model. Sustainability-focused courses (β = 0.625,
t = 2.199,
p = 0.030), academic evaluation above expectations (β = 0.702,
t = 3.113,
p = 0.003), and attitudes toward the environment (β = 0.409,
t = 1.770,
p = 0.080) were positive and significant in the model, while identifying with no faith (β = −0.426,
t = −1.978,
p = 0.051), academic evaluation below expectations (β = −1.072,
t = −2.477,
p = 0.015), and consumption behavior (β = −0.494,
t = −1.813,
p = 0.073) were negative and significant in the model. Students who were requested to complete the survey within a course that taught sustainability topics, students who rated their academic experience as exceeding expectations, and students who had a positive attitude toward the environment had higher PIR scores. Conversely, students who identified with ‘no faith’, students who had an academic evaluation below expectations, and students who identified with lower eco-conscious consumption and purchase behavior had lower PIR scores.
To study the effect of the explanatory variables on the three subdimensions of the PIR system, three more OLS regressions were run with the three PIR subdimension scores as the dependent variables. The general effect of the explanatory variables on these was similar to that on the overall PIR score, with several significant differences. Firstly, whether participants were from sustainability-focused courses had no significant effect on the energizing dimension (β = 0.383, t = 1.348, p = 0.181). On the other hand, the energizing dimension was the only one that was significantly affected by whether the participants were from co-op programs (β = 0.469, t = 1.946, p = 0.054). Secondly, faith and sustainability attitudes had no significant effect on educating. Thirdly, club membership had a significant and positive influence on the engaging score (β = 0.789, t = 2.038, p = 0.044).
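A sketch of these additional regressions, reusing the illustrative variable names from the benchmark model above:

# Re-run the benchmark specification with each PIR subdimension as the dependent variable
rhs_vars <- c("sust_course", "club", "faith", "satisfaction", "gender",
              "program", "coop", "political", "env_belief", "consum")
for (dv in c("energizing", "educating", "engaging")) {
  print(summary(lm(reformulate(rhs_vars, response = dv), data = survey_data)))
}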
In
Table 12, the interaction term between political leaning and environmental belief was negative and significant, except for Model (3). On average, participants with a political vision leaning to the right (aligned with conservative policies) and a more sustainability-focused environmental belief would have a significantly lower PIR score (β = −0.777,
t = −1.744,
p = 0.084). The magnitude of the influence from this interactive term on the energizing dimension (β = −0.815,
t = −1.832,
p = 0.070) and the engaging dimension (β = −0.915,
t = −1.767,
p = 0.080) was similar. The effect on the educating dimension was negative but not statistically significant (β = −0.699,
t = −1.416,
p = 0.160).
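The interaction specification can be sketched as follows (variable names again illustrative); the political:env_belief row of the summary corresponds to the interaction coefficients reported above:

# OLS with an interaction between political leaning and environmental belief
int_fit <- lm(pir_score ~ political * env_belief + sust_course + club + faith +
                satisfaction + gender + program + coop + consum,
              data = survey_data)
summary(int_fit)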
Hypothesis 3 was rejected; there was no selection bias in channeling the data collection for the PIR through student organizations engaged in the sustainability field. There was support for Hypotheses 4 and 5. There was a selection bias in collecting PIR data from students in courses linked to sustainability (H4). Demographics and socio-cultural characteristics of the student influenced PIR responses (H5).
6. Discussion
The PIR scale is a promising scale for selection by schools such as the Lang School of Business that ‘are committed to using business as a force for good to achieve the United Nation’s SDGs’ [
56]. In addition to providing a benchmark for a business school’s performance as perceived by students, arguably its most important stakeholders, it is a tool that could help ‘attract students and faculty who have a social conscience, an environmental sensibility, and a commitment to community involvement’ [
56]. Building an organizational culture that is aligned with the mission, vision, and core values of the institution is critical to achieving an organization’s intended goals. Given the perverse effects that traditional published ranking scales can cause, careful consideration is needed to ensure alignment. Confirming the reliability and validity of any chosen scale is essential. The PIR provides transparency in both the criteria used and methodologies employed. The creators are committed to developing a scale that helps the business community (including the academic community) realize the role it plays in ensuring a sustainable future for all stakeholders.
Given the power of published ratings and the intention of the PIR scale, we identify areas for consideration and improvement toward a statistically robust PIR scale and an execution strategy for the survey that could help mitigate unintended biases.
Firstly, the survey’s coefficient alpha was high, indicating the scale’s reliability, yet the CFA pointed to an invalid model. A survey’s coefficient alpha can be high and yet the survey instrument may still not measure what the researcher intended to measure [
57,
58,
59,
60]. In this case, the CFA analysis revealed that all questions that informed the survey were highly interrelated. Specifically, the observed responses for the latent variables (i.e., energizing, educating, and engaging as well as the seven dimensions) were too interconnected and were not separate enough to clearly measure three distinct themes or seven separate dimensions.
Table 13 and
Table 14 highlight the high correlations between the variables.
This interrelatedness among all questions suggested that a better-fitting model could be a one-factor model with 20 indicators. However, the CFA results for the one-factor model also indicated a poor fit (chi-square p-value = 0.00, CFI = 0.81, TLI = 0.78, RMSEA = 0.12), suggesting room for improvement.
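A sketch of this one-factor check, with the same placeholder item names used earlier:

# One-factor model: all 20 items load on a single latent PIR factor
one_factor <- paste("pir =~", paste0("q", 1:20, collapse = " + "))
fit1 <- cfa(one_factor, data = pir_items, estimator = "ML")
fitMeasures(fit1, c("chisq", "pvalue", "cfi", "tli", "rmsea"))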
However, the results of these CFA analyses are contestable on two fronts. First, the seven-factor CFA did not meet the specifications for a CFA: for some of the seven categories there were only two questions, whereas more than two questions are recommended for each identified theme so that patterns can emerge. Second, although a CFA is applicable to small samples (here, 156) where the free parameters are fewer than the known values (i.e., the model is over-identified), CFA and the general class of structural equation models are large-sample techniques; the larger the sample, the better. Kline [
51] recommends the N:q rule, whereby the sample size is determined by the number of free parameters q in the model, with a recommended ratio of 20:1. In our example, this would suggest a more appropriate sample size of approximately 1200.
To this end, our first recommendation is to conduct a CFA analysis with a larger data set to corroborate these initial findings. If the subsequent CFA shows similar results to the ones found in this study, we recommend a revision of the survey questions ensuring that the questions associated with each identified theme have a high covariance within each category and a lower covariance between the selected categories, indicating the measurement of distinct themes or concepts. Distinct themes help inform the participating institution on explicit areas to focus on for improvement. Additionally, if the CFI, TLI, and RMSEA results from the larger data set fail to reject the null, indicating it is not a bad model, we still cannot necessarily say it is the best model. Therefore, using the larger data set, we would further recommend testing for other latent constructs that may have emerged. Further, we recommend an exploratory factor analysis (EFA) or a clustering algorithm from an unsupervised machine learning technique to explore patterns underlying the data set to help develop new theories and identify items that do not empirically belong and should consequently be removed from the survey.
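One possible way to implement the recommended exploratory step (the authors do not prescribe a specific tool) is an EFA on the 20 items using base R’s factanal:

# Exploratory factor analysis on the 20 PIR items; the number of factors is a choice
# to be varied, and items with weak or cross-loadings are candidates for removal
efa_fit <- factanal(pir_items, factors = 4, rotation = "varimax")
print(efa_fit, cutoff = 0.3, sort = TRUE)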
Survey distribution methods and socio-cultural factors influenced student PIR scores. Survey distribution was not completely randomized. A subset of faculty willing to ask their students to complete the survey was selected, and leaders of extracurricular, school-sanctioned clubs were asked to distribute the survey to their members. Students who were asked to take the survey through a class that taught sustainability and/or corporate responsibility topics had significantly higher PIR scores than students who were asked by a professor of a course that did not teach these topics.
Students who evaluated their academic experience at Lang as ‘exceeding expectations’ had a higher PIR score than students who rated their experience as ‘meets’ or ‘below’ expectations. Although belonging to a student club was not significant, students who belonged to a club were more likely to select that their academic experience ‘exceeds expectations.’ Interestingly, this observation suggests that if the published rating attracts students who are aligned with the goals of the institution, and the institution does not live up to the student expectations, then the subsequent scores will be lower. The best way forward, therefore, is to have a high rating that is a true representation of the student experience, as this will lead to subsequent high ratings. These initial findings suggest that the student-driven survey, properly disseminated, has a built-in mechanism toward continuous improvement.
The significant influence of the survey response ‘academic experience exceeded expectation’ on the PIR score and the correlation of this factor with students belonging to clubs require further unpacking. The dominant theoretical framework in general education literature suggests that extracurricular activity (ECA) (i.e., belonging to a club) has a positive impact on academic performance [
61]. This literature indirectly connects higher academic performance with higher PIR scores. Establishing a direct and causal relationship between these two variables, in particular, that a higher PIR score signals higher academic performance by the students, could provide further benefits for schools who wish to participate in the rating and for employment recruiters of graduate students.
This study also tested explicitly for priming effects. In one survey treatment, socio-cultural, attitudinal, and political views were asked first before the PIR survey questions, and in the second treatment, these questions were asked in reverse. Although there was no significant difference in PIR scores between the two treatments, we cannot rule out a priming effect for students who were asked by a course instructor who teaches sustainability topics. Considerable experiments have shown how priming effects influence the behaviors of individuals [
62,
63,
64,
65].
Questions were included in the survey to identify faith affiliation, sustainable purchase and consumption behavior, and political orientation. These questions were included to understand the influence of pre-established North American values on North American business school PIR scores. It is important that subsequent studies wishing to test the influence of pre-established values adapt the questions to reflect the situational context of the different geographic, political, and social environments in which the study is executed. At Lang (Guelph, ON, Canada), students were mainly left-leaning (liberal), and political orientation had no impact on PIR scores. However, those that identified with ‘no faith’ had a lower PIR than students who identified with a faith. Students with higher environmental beliefs in terms of consumption and purchase behavior also had a higher PIR score. Literature has shown that sociocultural attributes could lead to biased results of surveys [
66,
67,
68,
69]. Although sociocultural differences are assumed in research involving humans, the results can be interpreted wrongly if there is no comparability [
66].
One idea for consideration given these results is to include a set of pre-established value questions (non-political, non-religion-based) in the PIR that assess the organizational culture (OC) of the student body, for example, ‘students in my program are more collaborative than competitive.’ Not only does this allow institutions to test the alignment of the OC with their core values, but it also allows students to identify a school more closely aligned with their own values. This selection criterion could continuously build student bench strength that allows a business school to deliver on its aligned vision.
6.1. Contributions to Knowledge and Literature
This study is phase one of a broader study toward uncovering embedded biases and incongruencies in methodological data collection procedures within business school ratings and rankings. While past research has focused on uncovering embedded biases and methodological flaws in what many have described as ‘rankings that cause perverse effects’, for this first study we chose a proactive approach by focusing on a new scale that is representative of students’ school experiences and of a school’s ability to have a positive impact on society beyond contributing to organizational profits and national GDP.
The scale was developed by academics, business school administrators, and students through a ‘multi-step proto-typing process’ [
3] (p. 6). The PIR was first administered to 2500 students in 2019, and the results of the survey were published in 2020 [
3]. In its first execution, student organizations such as Oikos International, Net Impact, AIESEC, the SOS (Students Organising for Sustainability, in the UK), and Studenten voor Morgen (in The Netherlands) solicited students to complete the questionnaire [
3]. Dyllick and Muff ‘raised the question of selection bias’ given that the student organizations executing the survey “are mainly active in the sustainability area” [
3] (p. 9). They further highlight that “one of the participating schools has decided to do a controlled study of this question, based on their PIR 2020 data” [
3] (p. 9). As the identified business school, we commend the PIR organization for inviting such a rigorous arm’s-length assessment and would encourage other ranking scales to follow suit. While we did not find selection biases from the execution of the study through student organizations, we did find that students participating in a sustainability-focused course, when asked by their instructor, had statistically significantly higher PIR scores. The influence of instructor authority may have played a role here; however, this would require further analysis. The reliability and validity tests (although not conclusive), the selection bias results, and the sociocultural influences on PIR scores provide direction toward enhancements to a rating scale that has the power to change the role that business schools play within society.
6.2. Implications for Business School Leaders and Business Practitioners
Business school ratings and rankings serve a dual purpose. Firstly, whether or not their metrics are legitimate, business school rankings signal to the community how the school is performing in comparison to others and therefore serve as a powerful student recruitment tool. Secondly, they have the potential to influence the strategic direction of a school and the priorities of its faculty. The former is driven by an external audience and is shaped, rightly or wrongly, by the media and by a general acceptance among business schools of a high ranking as a crowning achievement. The latter implies careful consideration of the right ‘measurement tool’, one that ensures the performance of the organization moves it toward its intended goals. These two purposes should be aligned, and arguably in reverse order of priority. Selecting the correct ranking and rating system to benchmark organizational performance, and ensuring a more valid and accurate ranking system, serve to enhance institutional legitimacy by promoting behaviors internally that align with the school’s vision, core values, and strategy.
For business practitioners who prefer to recruit from top-ranked business schools, understanding the criteria that underpin the traditional ranking scales is essential. In many cases, these criteria have no connection to the caliber of the student learning experience or the vision or core values espoused by the institution.
6.3. Limitations and Future Research Suggestions
The research was conducted during a global pandemic; it is difficult to determine how a student’s assessment of the school’s positive impact would be influenced by shut-down orders that forced them into an alternative learning environment. While upper-year students would have an institutional memory of lived experience on campus, first-year students would not. However, when comparing the PIR scores from this study with Lang’s first-time participation scores (2019), we found no significant differences in the rating. With the exception of the proportion of females completing the survey (65% versus 38% in the student body), the sample was representative of the Lang student body. Although the sample was larger than the number of Lang students that participated in the first PIR report (2020), it represented only 4% of the Lang student population.
Future research suggestions for the PIR scale specifically include: (1) conducting a confirmatory factor analysis (CFA) on a larger data set to determine the latent structure of the survey; (2) even if the subsequent CFA supports the model, conducting an EFA to explore potential themes that may have been missed and to drop questions that are not empirically supported; (3) identifying an additional set of potential questions for consideration that measure student values; and (4) repeating the study at another business school in close proximity to Lang with traditional business school values to observe PIR differences and to enhance the validity of the scale’s ability to measure a school’s positive social impact. This study is phase one of a broader research study that looks at methodological incongruencies and biases of ‘published rankings’, in particular rankings that influence the priorities of business schools, such as the FT50.
6.4. Conclusions
Published rankings, although beneficial for student recruitment, have caused unintended consequences. Perverse effects on faculty and decanal priorities have been identified, including incentivizing behaviors that are at odds with the achievement of the United Nations Sustainable Development Goals (SDGs) [
1,
2] and the 2030 agenda. The Positive Impact Rating scale was introduced in response to these observations with the stated purpose to “help speed up the transformation” towards orienting “teaching, research and outreach activities towards social impact and sustainability” [
3] (p. 6). For the PIR to achieve its goals, including becoming broadly perceived as a reliable assessment instrument, it is essential that its metrics and approach be held to a high standard and that its scale be statistically supported. The reliability of the scale was confirmed by the coefficient alpha. The validity of the scale was not conclusively established and requires further analysis with a larger data set, with a recommendation to apply other structural equation modeling analyses. There were selection biases in how the scale was distributed, and socio-cultural factors influenced PIR scores; these should be recognized when disseminating the survey and analyzing its results. Although these enhancements and considerations would improve the efficacy of the scale, the PIR scale continues to be a promising entrant to the ‘published ranking’ forum.