1. Introduction
We live in a cross-cultural, diverse world where globalization has become a fact of organizational life. Different countries’ cultures are being brought closer together than ever before in history [1]. This process is especially noticeable in Europe, where the European Union has brought about increased economic interdependency and cultural exchange among member states [2]. This demands greater cultural sensitivity on the part of our leaders, and also calls into question whether our leadership models, which have mostly been developed within an Anglo-Saxon context, hold across culturally diverse countries such as the members of the EU. Being mostly knowledge-based, European economies largely depend on bringing out the best in people. Paying attention to employees’ needs is a crucial factor in staying competitive and achieving long-term success [3]. The leadership theory that is most oriented towards the needs of employees is servant leadership [4], and the focus of servant leadership on service may also help to provide a bridge between culturally different worldviews [1].
For leadership scholars, it is important to develop an accurate understanding of the conceptual similarity of an instrument across different countries and languages. A major concern in this regard is the cross-cultural equivalence of an instrument’s measurement properties (see [5,6]). In fact, only with a measure that captures the same construct across different cultural contexts can we compare results across countries and truly gain valid insights into effective cross-national leadership. This is especially true within Europe, where each country has developed its own habits, norms and values [2] and where several different cultural clusters can be distinguished [7]. Within this paper, we are therefore particularly interested in the cross-cultural validity of a servant leadership instrument. The main aim of this paper is to study the measurement invariance of the Servant Leadership Survey (SLS) [8] in different countries. Interestingly, although leadership research is a global phenomenon, explicit invariance tests are scarce (with the exception of the GLOBE leadership study; Global Leadership and Organizational Behavior Effectiveness; House et al. [9]). Additional insight into which dimensions hold cross-culturally will contribute to the servant leadership literature specifically and to the leadership literature more generally.
1.1. The Construct of Servant Leadership and the Servant Leadership Survey (SLS)
Since Greenleaf [10] introduced the notion of servant leadership into the vocabulary of leadership research and practice, several conceptualizations of servant leadership have emerged, most over the past 15 years. One of the central features of servant leadership that has been established in its recent history is that servant leadership “places the good of those led over the self-interest of the leader, emphasizing leader behaviours that focus on follower development, and de-emphasizing glorification of the leader” [11] (p. 397).
Servant leadership has been shown to have a positive impact on follower well-being as well as the overall effectiveness of individuals and teams. Moreover, it has passed the test of showing incremental validity beyond what other popular leadership approaches, most notably transformational leadership, have offered (for a comprehensive overview, see [4,12]).
More recently, Van Dierendonck introduced an integrative model of servant leadership and developed the Servant Leadership Survey (SLS) [8], which represents a psychometrically sound measure covering the key aspects of servant leadership. The SLS is a 30-item, eight-dimensional scale of servant leadership, including empowerment, accountability, standing back, humility, authenticity, courage, forgiveness and stewardship. In essence, this measure emphasizes that servant leaders empower and develop people, are willing to retreat into the background and let others shine, hold followers accountable for their work, are willing to let bygones be bygones, dare to take risks, are willing to show what they stand for, have an openness to learn and a willingness to admit mistakes, and work for the good of the whole.
1.2. Cultural Influences on the Perception of Servant Leadership
The countries in our sample are quite diverse and representative of Europe and its broad range of national cultures. We were able to include data from the Netherlands, Portugal, Iceland, Italy, Finland, Germany, Turkey and Spain. These countries are all part of the European Region and each has its own culture, language and history.
With regard to the impact of national culture on servant leadership effectiveness, Van Dierendonck [4] suggested that individualism versus collectivism may be influential. Our sample includes collectivistic countries such as Portugal and Turkey on the one end of the spectrum and individualistic countries such as the Netherlands and Italy on the other end, with the remaining countries scoring in between [13]. With respect to power distance, the sample is likewise diverse: it includes countries with a high power distance such as Turkey and Portugal and those with a low power distance such as Finland and Germany. It should also be noted that each country has a different native language, which allows us to compare eight different language versions of the same measure. Translating words into different languages has its challenges, given that a literal translation is not always the best method of conveying meaning. What appear to be the same words may hold very different meanings [14]; this is certainly the case for the term ‘leader’, where the context often determines which term may be the best translation. There are also clear differences within Europe in what constitutes good leadership. According to the GLOBE leadership study (Global Leadership and Organizational Behavior Effectiveness; House et al. [9]), the Netherlands and Germany belong to the Germanic cluster, which scores high on future orientation and performance orientation and low on humane orientation and institutional collectivism. Italy, Spain and Portugal belong to the Latin Europe cultural cluster, which is characterized by a comparatively low value placed on humane orientation and on institutional collectivism. Turkey belongs to the Middle East cluster, which scores high on in-group collectivism and low on uncertainty avoidance. Finland (and probably Iceland, although no data are available) belongs to the Nordic European cluster, which scores high on future orientation and gender egalitarianism and low on assertiveness orientation and in-group collectivism.
This cultural diversity within Europe in general and the countries represented in our combined sample can also be illustrated by Ronen and Shenkar’s [7] recent cultural clustering of similarity and dissimilarity in work-related attitudes based on the most influential cultural models, including Hofstede’s model and the GLOBE study. According to their analysis, the Netherlands, Finland, and Iceland are part of the Nordic cluster, Germany belongs to the Germanic cluster, Portugal, Spain and Italy belong to the Latin Europe cluster, and Turkey is part of the Near East cluster.
Given the overall differences in culture and ideal leadership types in the eight countries, one could wonder to what extent a single leadership theory may be helpful in furthering our understanding of leadership in Europe. The GLOBE research program provides insight into leader behaviours that are perceived by almost everyone around the world as effective and desirable. This ideal leadership type represents values and behaviours that have a clear and wide overlap with servant leadership, such as high integrity, having strong interpersonal skills, and lacking egocentric and self-focused behaviours [9,15]. According to the GLOBE study, people throughout the world perceive trustworthiness, fairness and honesty as essential elements of good leadership. Thus, it seems that although the perception of leader effectiveness is influenced by national context and culture [16], globalization and interdependence between nations have led to a growing awareness of the importance of leaders who show in their everyday behaviour a focus on growth, development and innovation among their followers [17]. As such, it might be expected that the essence of servant leadership will be viewed similarly throughout Europe.
1.3. Establishing Measurement Equivalence and Invariance
Ideally, we would like to have a measure where the factor structure is fully equivalent, which means that the items are related to their respective dimensions in the same way and the interrelations between the dimensions, as exemplified by their relative contributions to one underlying servant leadership dimension, are identical. Several researchers in this field have stressed, however, that such full measurement invariance frequently does not hold, and thus represents too strict a requirement for group comparisons [18,19]. Especially with an eight-dimensional measure such as the SLS, it is unlikely that eight different survey versions will be realized and experienced identically. Hence, in our study we focus on partial measurement invariance, which allows some model parameters to vary across groups while still ensuring that the overall comparison is meaningful [18,19].
Combining the recommendations of Byrne [20] on testing cross-cultural multi-group equivalence with those of Chen et al. [21] on testing the measurement invariance of second-order factor models, we distinguish three levels in testing the measurement invariance of the SLS.
The first level is configural invariance, which means that the same items are indicators of a dimension in each group. For the SLS, this level has already been confirmed through confirmatory factor analysis for the eight-dimensional structure in the survey version for the Netherlands and the UK [8], Finland [22], Italy [23], Germany [24], Portugal [25], and Spain, Argentina and Mexico [26]. In the present research, we will test the configural equivalence of the instrument in eight European countries simultaneously, which we expect to be stable.
Hypothesis 1. The factor structure of the SLS is configurally equivalent across samples.
In addition to testing the factorial structure of the SLS, this model will also provide the baseline for the subsequent invariance models. The second level refers to measurement equivalence. This tests whether the factor loadings connecting the items with the dimensions are similar across groups. This level provides insight into the extent to which the items are interpreted in a similar way, and has also been called metric invariance. With metric invariance, the relations between the factors and external variables can be compared. Thus, it allows for the comparison of results between studies performed in different countries, given that the measurement unit underlying a factor is the same. This gives our second hypothesis.
Hypothesis 2. The factor loadings connecting the items to the dimensions will be invariant across countries, indicating measurement equivalence.
The third level is structural equivalence, which focuses on the unobserved (i.e., latent) variables. In our case, we focus on the model covering the eight servant leadership dimensions and their interrelations. The SLS covers different attributes that together represent servant leadership. Specifically, we aim to test whether the eight-dimensional structure that is assumed to underlie servant leadership as measured by the SLS is similar across countries. In other words, the parameters of interest are the relationships between the eight dimensions of the SLS as well as the factor loadings between the eight dimensions. In line with the different leadership preferences between cultures, which make full measurement equivalence unlikely, we focus here on partial metric invariance [18]. This requires cross-country invariance of most but not necessarily all intercorrelations in order for subsequent tests with the SLS to be meaningful.
Hypothesis 3. The SLS will show partial structural equivalence across countries.
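The three levels can be summarized in standard multi-group factor-analytic notation. The symbols below do not appear in the original text and are introduced here only as an illustrative assumption: for country g, x_g is the vector of item scores, Λ_g the matrix of loadings linking items to the eight dimensions, ξ_g the eight latent dimensions, and Φ_g their covariance matrix.

```latex
% Multi-group CFA for country g = 1, ..., 8 (notation assumed for illustration)
x_g = \tau_g + \Lambda_g \xi_g + \delta_g, \qquad \operatorname{Cov}(\xi_g) = \Phi_g

% Configural invariance: every \Lambda_g has the same pattern of zero and non-zero entries
% Measurement equivalence: \Lambda_1 = \Lambda_2 = \dots = \Lambda_8
% Structural equivalence: additionally \Phi_1 = \Phi_2 = \dots = \Phi_8
%   (partial equivalence if most, but not all, elements are held equal)
```

Each level adds equality constraints to the previous one, so the models are nested and their fit can be compared step by step.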
In their developmental article, Van Dierendonck and Nuijten [8] showed that their eight-dimensional instrument covers both the ‘leader’ side (i.e., empowerment, accountability, courage, stewardship) and ‘servant’ side elements of servant leadership (i.e., standing back, humility, authenticity). In the present research, we are also examining the possibility of shortening the measure so that the remaining items and dimensions are perceived as equivalent across countries, while simultaneously representing the ‘leader’ and the ‘servant’ aspects of the original SLS.
Taken together, this study provides insight into the extent to which servant leadership as a whole can be interpreted similarly across different samples from eight different countries. By explicitly testing three levels of measurement invariance, we are able to gain insight into the extent that empirical results of the SLS can be compared.
3. Results
Mplus 7.11 was used for data analysis [30]. As a preliminary step, data from all samples were combined and subjected to confirmatory factor analysis (CFA) to test the eight-dimensional model, allowing all dimensions to intercorrelate. The model fit the data well: χ² = 6296.351, df = 377, CFI = 0.94, TLI = 0.93, RMSEA = 0.06. With one underlying second-order factor, the fit was still good: χ² = 8220.788, df = 397, CFI = 0.92, TLI = 0.91, RMSEA = 0.06. This confirms the underlying eight-dimensional model for the sample as a whole.
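As a plausibility check, the degrees of freedom reported for the two models can be reproduced by counting parameters. The sketch below is not the authors' Mplus setup. It assumes a simple-structure CFA in which every item loads on exactly one dimension, there are no correlated residuals, and each factor is identified by fixing one marker loading to 1; the function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int, second_order: bool = False) -> int:
    """Degrees of freedom of a simple-structure CFA (covariance structure only).

    Assumptions (not taken from the paper): one loading per item, no
    correlated residuals, one marker loading per factor fixed to 1.
    """
    # Unique variances and covariances among the observed items.
    moments = n_items * (n_items + 1) // 2
    free_loadings = n_items - n_factors
    residual_variances = n_items
    if second_order:
        # Free second-order loadings (one marker fixed), first-order
        # disturbances, and the variance of the second-order factor.
        factor_parameters = (n_factors - 1) + n_factors + 1
    else:
        # Variances and covariances of freely correlating factors.
        factor_parameters = n_factors * (n_factors + 1) // 2
    return moments - (free_loadings + residual_variances + factor_parameters)


if __name__ == "__main__":
    print(cfa_degrees_of_freedom(30, 8))                     # 377
    print(cfa_degrees_of_freedom(30, 8, second_order=True))  # 397
```

Under these assumptions the counts agree with the reported values of 377 and 397, which suggests that no additional parameters such as cross-loadings were freed in the combined-sample models.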
Table 1 depicts the correlations between the eight dimensions for the overall sample. There may, however, still be differences between countries, which we tested subsequently.
We evaluated the measurement invariance across the different countries and languages in three steps (Table 2). First, we tested the configural invariance where the underlying model is the same in all countries, and all factor loadings were set to be free across the sub-groups (i.e., countries). Our results show that the eight-dimensional configural invariance model, allowing all latent dimensions to correlate, was clearly a better fit for the data than the one-dimensional model where all items load on the same factor. Accordingly, the configural invariance model obtained good relative fit indices with 0.92 for CFI (Bentler Comparative Fit Index), 0.91 for TLI (Tucker Lewis Index), 0.06 for RMSEA (Root Mean Square Error of Approximation) and 0.05 for SRMR (Standardized Root Mean Square Residual). This confirms Hypothesis 1, that the proposed factorial structure can be confirmed in the different samples.
In the next step, we tested for multi-group invariance by analysing the measurement equivalence model, which was then compared with the configural model. In the measurement equivalence model, not only is the underlying model the same, but the factor loadings linking the items to their latent factors are also set to be the same across sub-groups (i.e., countries). Results show the χ² difference to be statistically significant (Δχ² = 1643.203, p < 0.001), which is to be expected given that the chi-square is sensitive to sample size. With a sample size of 5201 persons, even very small differences are likely to become significant. More important, therefore, are the differences in comparative fit indices. These are limited: only 0.02 for CFI and TLI, similar for RMSEA and 0.03 for SRMR. This pattern indicates some differences (see [31]), but overall and in terms of absolute values, three of the four fit indices (CFI, RMSEA and SRMR) are still within established levels [20]. These results indicate that there are variations between the different samples with regard to the interpretation of some of the dimensions. We thus show only partial measurement invariance, and Hypothesis 2 could not be confirmed; however, the overall interpretation of servant leadership as measured by the SLS is still quite similar.
Finally, the structural model was analysed. This is an even stricter model given that the correlations between the latent factors are also set to be the same between the countries. Here, the fit indices fell outside the conventional thresholds (0.90 for CFI and TLI; 0.08 for SRMR), which indicates that the relationships between the dimensions are different between the countries. It should be noted, however, that full structural invariance seldom occurs and that partial invariance suffices to allow for a comparison of the findings between studies.
Given that partial invariance may be the best that can be reached with a multi-dimensional measure such as the SLS (in our case: measurement equivalence), we checked the relative contribution of the chi-square of the different samples for the structural equivalence model. A higher relative chi-square indicates that the poorer fit is due to some kind of misfit within the sample from that country as compared to the overall full-sample fit. The relative chi-square was highest for Iceland (2251.287), followed by Portugal (1438.097), Finland (1046.610), the Netherlands (967.304), Germany (859.880), Italy (852.997), Turkey (777.433) and Spain (687.681). We therefore checked for invariance by testing the model without the data from Iceland. The relative fit of the configural invariance model was similar (CFI = 0.92, TLI = 0.91, RMSEA = 0.05, SRMR = 0.05). Removing Iceland did, indeed, result in a better fit for the measurement equivalence model, with CFI = 0.91, TLI = 0.90, RMSEA = 0.05, SRMR = 0.07. The fit of the structural invariance model was similar to that obtained with the full dataset, with CFI = 0.87, TLI = 0.86, RMSEA = 0.06, SRMR = 0.08, thereby confirming partial measurement invariance, as stated in Hypothesis 3.
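The two checks in this paragraph can be laid out as a small calculation. The sketch below uses only the figures reported above. It treats the per-country values as additive contributions to one overall chi-square, which is an assumption about how they were obtained, and the function names are hypothetical.

```python
# Per-country chi-square values reported for the structural equivalence model.
CHI_SQUARE_BY_COUNTRY = {
    "Iceland": 2251.287,
    "Portugal": 1438.097,
    "Finland": 1046.610,
    "Netherlands": 967.304,
    "Germany": 859.880,
    "Italy": 852.997,
    "Turkey": 777.433,
    "Spain": 687.681,
}

# Fit indices reported for the three models after excluding Iceland.
FIT_WITHOUT_ICELAND = {
    "configural": {"CFI": 0.92, "TLI": 0.91, "RMSEA": 0.05, "SRMR": 0.05},
    "measurement": {"CFI": 0.91, "TLI": 0.90, "RMSEA": 0.05, "SRMR": 0.07},
    "structural": {"CFI": 0.87, "TLI": 0.86, "RMSEA": 0.06, "SRMR": 0.08},
}


def contribution_shares(contributions):
    """Share of the summed chi-square per country, largest first.

    Assumes the values are additive contributions (not stated in the paper).
    """
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda pair: pair[1], reverse=True)
    return {country: value / total for country, value in ranked}


def index_change(fit, baseline, constrained, index="CFI"):
    """Change in a fit index from the baseline to the more constrained model."""
    # Rounded to two decimals because the inputs are reported to two decimals.
    return round(fit[constrained][index] - fit[baseline][index], 2)


if __name__ == "__main__":
    for country, share in contribution_shares(CHI_SQUARE_BY_COUNTRY).items():
        print(f"{country:12s} {share:6.1%}")
    print("CFI change, configural -> measurement:",
          index_change(FIT_WITHOUT_ICELAND, "configural", "measurement"))
    print("CFI change, measurement -> structural:",
          index_change(FIT_WITHOUT_ICELAND, "measurement", "structural"))
```

On these figures Iceland accounts for roughly a quarter of the summed chi-square across eight samples. Without Iceland, the CFI falls by 0.01 when the loadings are constrained and by a further 0.04 when the correlations between the latent factors are constrained as well.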
Development of the SLS Short Version
Next, we examined the possibility of improving cross-cultural stability by shortening the SLS. The previous analysis confirmed configural invariance and partial measurement invariance. Structural invariance, however, remained below recommended values. We aimed for a measure that would fulfil all three invariance standards and where the subscales would be internally consistent with a Cronbach’s alpha of at least 0.70. Our basic premise was to choose those dimensions and items that best reflected the underlying servant leadership concept [8]. Thus, we aimed for cross-cultural stability by removing those dimensions and items that were interpreted differently between countries. We started by removing those dimensions that least reflected servant leadership with the greatest variance across countries. Next, within the remaining dimensions, items with a relatively low item-total correlation were removed to retain those items that best reflected the total underlying variance of a subscale. In this way, we aimed to generate a shorter measure that would reflect the core dimensions of servant leadership with a stable and invariant underlying factorial model across countries and with reliable subscales.
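The two item-level statistics mentioned above, Cronbach’s alpha and the item-total correlation, can be computed as in the following minimal sketch. The responses are hypothetical and are not study data. The text does not say whether the corrected item-total correlation was used, so correlating each item with the sum of the remaining items is an assumption here.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


def corrected_item_total(items, index):
    """Correlation of one item with the sum of the remaining items."""
    rest = [sum(row) - row[index] for row in zip(*items)]
    return pearson(items[index], rest)


# Hypothetical answers of six respondents to a three-item subscale.
toy_subscale = [
    [4, 5, 3, 2, 5, 4],
    [4, 4, 3, 2, 5, 5],
    [5, 4, 2, 3, 4, 4],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(toy_subscale):.2f}")
    for i in range(len(toy_subscale)):
        print(f"item {i + 1}: r = {corrected_item_total(toy_subscale, i):.2f}")
```

In a procedure like the one described, the item with the lowest item-total correlation within a subscale would be the first candidate for removal, provided alpha for the remaining items stays at or above 0.70.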
To gain better insight into the different interpretations of servant leadership across different countries, we used the configural invariance model with eight dimensions and one underlying second-order factor. The outcome provides the standardized factor loadings for the eight dimensions of servant leadership for all countries. This gave the following pattern: empowerment = 0.94 (range between 0.81 and 0.96); humility = 0.89 (range between 0.83 and 0.94); standing back = 0.84 (range between 0.82 and 0.97); stewardship = 0.94 (range between 0.44 and 0.99); authenticity = 0.83 (range between 0.68 and 0.98); courage = 0.34 (range between −0.33 and 0.75); forgiveness = 0.40 (range between 0.21 and 0.63); accountability = 0.64 (range between 0.23 and 0.84). From these average values and ranges, we may conclude that empowerment, humility, standing back, stewardship and authenticity are clearly incorporated within servant leadership across countries. Courage is a problematic dimension, as the overall factor loading is the lowest of all eight dimensions; the negative value is due to the results of the Icelandic sample, where the two items have a low intercorrelation (r = 0.11). We therefore decided to remove both courage items. Forgiveness was the other dimension with a low overall factor loading. In some countries, this value dropped below 0.30. The items belonging to this dimension were therefore also dropped. The third dimension we decided to remove was accountability, given that its average factor loading was lower than that of the other five and that its range was broader, with a lower limit of 0.23 in one of the countries, indicating that accountability was not perceived as a central part of servant leadership overall.
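The screening described above can be written out as a simple rule applied to the reported loadings. The authors describe a judgement based on the overall loading and its range, not a fixed cutoff, so the threshold of 0.70 below is a hypothetical value chosen only for illustration.

```python
# Reported second-order loadings: (overall, lowest country, highest country).
SECOND_ORDER_LOADINGS = {
    "empowerment": (0.94, 0.81, 0.96),
    "humility": (0.89, 0.83, 0.94),
    "standing back": (0.84, 0.82, 0.97),
    "stewardship": (0.94, 0.44, 0.99),
    "authenticity": (0.83, 0.68, 0.98),
    "courage": (0.34, -0.33, 0.75),
    "forgiveness": (0.40, 0.21, 0.63),
    "accountability": (0.64, 0.23, 0.84),
}


def screen_dimensions(loadings, min_overall=0.70):
    """Split dimensions into retained and dropped by their overall loading.

    `min_overall` is a hypothetical cutoff, not one stated by the authors.
    """
    retained = [name for name, (overall, _, _) in loadings.items()
                if overall >= min_overall]
    dropped = [name for name in loadings if name not in retained]
    return retained, dropped


if __name__ == "__main__":
    retained, dropped = screen_dimensions(SECOND_ORDER_LOADINGS)
    print("retained:", ", ".join(retained))
    print("dropped: ", ", ".join(dropped))
```

With this cutoff the rule reproduces the authors' decision: courage, forgiveness and accountability are dropped and the other five dimensions are retained. A rule based on the lowest country loading alone would not, because stewardship has a lower limit of 0.44 yet was kept.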
We aimed for a shorter scale that combined cross-cultural invariance with good internal consistency for the subscales. For the internal consistency, at least three items in each subscale are preferred. Two of the remaining five dimensions consisted of three items (i.e., standing back and stewardship) and were therefore retained in their original forms. For empowerment, one item’s factor loading was lower in several countries whereas the remaining six items loaded equally well; thus, this item was removed. Given that empowerment has two underlying aspects, namely encouragement of followers’ growth and providing autonomy and freedom to do one’s job, the remaining six items were kept to represent both elements within this subscale. The correlation between the seven- and six-item versions is 0.99. A similar process was used to decide which three of the five humility items best reflected the essence of this subscale. Here, the correlation between the five-item and three-item scale was 0.97. For authenticity, one item was removed to enhance the overall fit of this dimension with the other dimensions, resulting in a three-item scale with a correlation of 0.96 with the four-item scale. Together, the end result is a five-dimensional survey with 18 items (see Table 3).
Again, we conducted an overall test of this five-dimensional model by combining all samples from different countries into one overall sample. This yielded an excellent overall fit of χ² = 1749.325, df = 125, CFI = 0.97, TLI = 0.97, RMSEA = 0.05, SRMR = 0.02, which was better than the fit associated with the one-dimensional model with all 18 items loading on one factor (χ² = 8049.607, df = 152, CFI = 0.88, TLI = 0.87, RMSEA = 0.10, SRMR = 0.05).
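The RMSEA values above can be related to the reported χ² and df with the common single-group point-estimate formula. This is a sketch under assumptions: the sample size of 5201 is taken from the text, the N − 1 denominator is one common convention, and the estimator-specific computation in Mplus may differ in detail.

```python
from math import sqrt


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))), a common single-group
    formula; software may use n in place of n - 1 or apply corrections.
    """
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    n_total = 5201  # composite sample size reported in the paper
    five_dimensional = rmsea(1749.325, 125, n_total)
    one_dimensional = rmsea(8049.607, 152, n_total)
    print(f"five-dimensional model: {five_dimensional:.2f}")  # 0.05
    print(f"one-dimensional model:  {one_dimensional:.2f}")   # 0.10
```

The formula also shows why RMSEA is less affected by sample size than the chi-square itself: the excess of χ² over its degrees of freedom is divided by both df and N − 1.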
Table 4 shows the measurement invariance fit indices, which were excellent for the configural invariance model. The overall fit was still good for the five-dimensional model taking country into account as a subgroup and holding the factor loadings across countries constant (measurement equivalence). Even the structural equivalence model showed a good fit to the data. A comparison of the relative indices of the short version (Table 4) with those of the full version (Table 2) shows that the overall invariance fit increased considerably.
Not surprisingly, the overall five-dimensional model, including a second-order SLS factor using the full sample and not taking country into account, confirmed the excellent fit of the SLS short version (χ² = 2227.886, df = 130, CFI = 0.97, TLI = 0.96, RMSEA = 0.06, SRMR = 0.03), whereby all dimensions contributed strongly to the underlying servant leadership factor: 0.91 for empowerment, 0.91 for humility, 0.86 for standing back, 0.91 for stewardship and 0.89 for authenticity. The factor loadings of the items on the subscales can be found in Table 3. The internal consistencies of these subscales ranged between 0.71 and 0.92 (see Table 1). For the scale as a whole, the internal consistency is 0.95 with item-total correlations ranging between 0.58 and 0.79. We can therefore safely conclude that this 18-item version is factorially valid, internally consistent and holds cross-nationally.
4. Discussion
This paper set out to study the cross-cultural equivalence of the Servant Leadership Survey. With a composite sample of 5201 individuals, we were able to establish the factorial validity, configural invariance and measurement equivalence of the measure across eight countries and languages. Taken as a whole, the present paper contributes to the servant leadership literature because it shows that it is possible to measure servant leadership in a similar way, internationally and cross-culturally, with samples from a broad range of sectors and professions. Additionally, we also found indications that the relations underlying the original dimensions as formulated by Van Dierendonck and Nuijten [8] may differ between countries. To tackle this issue, we introduced a shorter version of the SLS that retains the essential elements of the longer version.
Cross-cultural research into servant leadership has been growing over the last few years, as has the number of measures available [4]. Servant leadership theory originated within the United States, as did most of the measurement instruments. Although research on servant leadership throughout the world is growing (for an overview, see [1]), our study is the first to explicitly explore and confirm that servant leadership can be measured across languages, nations and cultures in Europe. This paper builds on the encouraging results of the SLS in the original developmental paper [8]. Including the British sample in that article and the eight samples in the present paper, it has now been shown that servant leadership can be reliably measured with the SLS in nine languages.
The partial invariance of the measurement invariance model confirms the findings of factorial validity of the separate samples combined into our composite sample, as shown in recent articles [22,23,24,26]. Both the factorial structure and the individual item loadings are similar across most countries and languages. It took only the removal of one country (i.e., Iceland) to reach generally acceptable values. This encouraging result allows for future testing and interpretation of the relationships between servant leadership and other concepts across countries using the SLS. The lack of fit of the Icelandic sample may have been caused by the sample’s specific language and culture, but it is just as likely due to the specific characteristics of the sample, given that our composite sample consisted of convenience subsamples with different occupational backgrounds. Specific to the Icelandic sample are the relatively high percentage of women and the higher average age. Future research will have to determine to what extent the lack of fit for the Icelandic sample is due to cross-cultural factors or due to specific sample characteristics in terms of profession, sector, age and gender.
Regrettably, for the eight-dimensional model, the fit indices of the structural equivalence model were below recommended values. This may be due to several reasons. One is the broad spectrum measured by the dimensions of the SLS. Although aspects such as empowerment and humility are conceptually defined as part of servant leadership just like forgiveness and courage, their focus is clearly different. For example, empowerment can be seen as a more general approach to encouraging followers to become the best they can be, whereas forgiveness is more conditional in that it is only shown after followers make mistakes. It should also be acknowledged that even for a valid and reliable measure, structural differences between countries are usually real and interpretable given measurement invariance of the constructs, which is the case for the SLS. Another possible explanation is the specific content of the operationalization of the dimensions. With only three or two items, respectively, aspects such as accountability, forgiveness and courage cannot be measured in their full complexity. To acknowledge these aspects more comprehensively, it may be necessary to formulate additional items. Another option is to use already existing measures when one is interested in studying those particular dimensions, instead of the SLS. For example, for leadership forgiveness, researchers may want to review Verdoold and Van Dierendonck [32].
The results of the shorter, five-dimensional measure are very encouraging. The fit indices confirmed configural invariance, measurement equivalence and structural equivalence. This version allows us to test servant leadership in different countries with the same measure, knowing that the underlying concept is experienced in a similar way. It also allows for a comparison of mean levels. The latter is very helpful because it means that we can now study the extent to which servant leadership is experienced in different countries. One of the limitations of the present study is that we had to rely on convenience samples for our composite sample. The samples in this paper came from different occupations and sectors, so no mean values are given. As such, we cannot make any claim with regard to generalizability of the average values in the different countries, and we strongly encourage further research to overcome this limitation. It would be very insightful for a future study to measure servant leadership within the same sector(s) in different countries, similar to the setup used in the original study by Hofstede [33]. Another limitation is that we did not have similar outcome variables available across samples to test the similarity in predictive validity. Additionally, in some countries the sample size was relatively small compared with other countries. Nevertheless, by combining the different datasets, our analyses have the unique feature of being able to test the factorial validity across different countries, which has a strong additional value above separately testing the factorial validity in one single country.
In conclusion, this article builds on the earlier paper by Van Dierendonck and Nuijten [8] by also confirming the configural and measurement cross-cultural equivalence of the SLS. It is the first servant leadership measure where such an extensive insight is available, thus rendering this measure a serious contender for researchers throughout the world interested in using a valid and reliable measure to operationalize servant leadership.