The purpose of any psychometric scale, test, or inventory is to produce a valid score that reflects the degree to which a person possesses a given attribute [1]. For scores to be useful, they should reveal differences among test takers if those test takers indeed vary on the attribute. Because items are the building blocks of any instrument, their quality needs to be checked. That is, the items of any assessment should function similarly for different groups of individuals (e.g., groups defined by ethnicity, gender, school type, or university), assuming that the grouping variable is irrelevant to the attribute being measured. When this is not the case, the item introduces measurement error and is biased toward a particular group of respondents [2].
1.1. Measurement Invariance: Literature Review
Measurement invariance is a key concept in many scientific fields, such as psychology, education, and sociology. When comparisons among groups or populations are of major importance in a study, it is necessary to ensure that the measuring tool measures the same thing in the same way for all groups. If measurement invariance holds, the construct under investigation has the same meaning across the compared groups. Conversely, a lack of measurement invariance means that such comparisons are biased, since respondents in one group provide systematically different responses than respondents in another group even though they have the same level of the latent trait. In that case, the obtained differences among the groups are not meaningful, since they are the product of bias [4].
The most frequently used method to test measurement invariance across groups is multi-group confirmatory factor analysis (MGCFA) [5,6]. Within this approach, a series of nested models is tested by constraining different parameters (e.g., factor loadings, intercepts, residual variances), and the resulting loss of fit is compared. The most frequently examined models are the configural model (i.e., whether the scale’s structure is conceptualized similarly across groups), the metric or weak invariance model (i.e., whether each item contributes to the latent construct in the same manner and to the same degree across groups), and the scalar or strong invariance model (i.e., whether item intercepts are equivalent across groups, so that respondents with the same level of the latent construct have the same expected item responses regardless of group membership) [7]. It should be noted that although additional constraints on certain parameters (e.g., item residuals, factor variances, and covariances) could be imposed, these additional forms of invariance are useful only when specific hypotheses regarding the relationships among the dimensions of the construct being measured are of interest [8,9].
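As a schematic illustration (using the notation of the one-factor model presented in Section 1.2, where ν_pg and λ_pg denote the intercept and loading of item p in group g), the three nested models impose increasingly restrictive constraints on

y_{ipg} = \nu_{pg} + \lambda_{pg}\,\eta_{ig} + \varepsilon_{ipg}

Configural: \lambda_{pg} and \nu_{pg} free in every group g (only the pattern of loadings is common);
Metric (weak): \lambda_{pg} = \lambda_{p} for all g, with \nu_{pg} free;
Scalar (strong): \lambda_{pg} = \lambda_{p} and \nu_{pg} = \nu_{p} for all g.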
Finally, to compare group means at the latent level, scalar invariance must be supported, since only then can one be confident that any statistically significant differences in group means are not due to idiosyncratic scale characteristics (e.g., poor-quality items, low reliability, a vague factor structure) but reflect true mean differences across groups [8,9,10]. Conversely, if scalar invariance fails, multigroup equivalence cannot be assumed, and as a result, no comparisons at the mean level can be undertaken.
1.2. The Alignment Method
An important limitation of the MGCFA approach as a method of examining measurement invariance, especially in large-scale studies, is that it is extremely difficult to satisfy the assumption of scalar invariance when a large number of group comparisons are involved [11]. To overcome this problem and make group comparisons feasible, a new method was introduced for comparing latent variables across a large number of groups without requiring exact measurement invariance [12].
This is called the alignment method, and it has found wide application, especially in cross-cultural research in which many and widely diverse cultural groups are examined, e.g., [13,14]. Interestingly, previous research has shown that examining measurement invariance with the alignment method is feasible even when the number of groups is as large as 92 [15].
Unlike traditional measurement invariance testing, where both factor loadings and intercepts are constrained to equality, the alignment approach allows both of these parameters to vary across groups. This recognizes that variations in response styles or item biases may exist between groups. In practice, the alignment method involves first estimating a configural model that assumes the same overall factor structure for all groups and then aligning the group-specific parameter estimates of this model. To achieve this, the alignment technique minimizes an alignment optimization (simplicity) function, defined over the pairwise differences in item loadings and intercepts, with respect to the latent means and standard deviations (based on a component loss function, described below). With this method, all groups can be compared at once, and latent means can be estimated and compared even if some loadings and intercepts are non-invariant. The optimization favors solutions in which most non-invariances are small and only a few are large; its logic is similar to that of factor rotation toward simple structure. Once the groups have been aligned, the factor loadings and intercepts can be compared directly to assess measurement invariance. Moreover, the resulting aligned model can be used to compare factor means and variances across groups.
Mathematically, the goal of the alignment method is to align the factor loading matrices (e.g., λ1 and λ2) so that they are comparable across groups. To achieve this, the method attempts to align the groups on a common factor space using orthogonal Procrustes rotation. This involves finding a rotation matrix that minimizes the difference between the factor loadings of the items in the different groups while preserving the overall structure of the factor space. This can be expressed as:

\min_{R}\; \lVert \lambda_1 - \lambda_2 R \rVert^{2}  (1)

subject to the constraint that R′R = I, where I is the identity matrix.
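As a minimal numerical sketch of this step (assuming two loading matrices of the same dimensions; the function name align_loadings, the NumPy-based workflow, and the numerical values are illustrative and not part of any alignment software), the orthogonal Procrustes solution can be obtained from a singular value decomposition:

```python
import numpy as np

def align_loadings(lambda1: np.ndarray, lambda2: np.ndarray):
    """Orthogonal Procrustes alignment of lambda2 to lambda1.

    Returns (R, lambda2 @ R), where R (with R'R = I) minimizes
    ||lambda1 - lambda2 @ R||^2, i.e., Equation (1).
    """
    # SVD of the cross-product matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(lambda2.T @ lambda1)
    r = u @ vt                       # rotation matrix, R'R = I
    return r, lambda2 @ r            # aligned loading matrix lambda2'

# Illustrative two-factor loading matrices for two groups (made-up numbers).
lambda1 = np.array([[0.7, 0.1], [0.6, 0.2], [0.1, 0.8], [0.2, 0.7]])
lambda2 = np.array([[0.5, 0.5], [0.4, 0.5], [-0.5, 0.6], [-0.4, 0.6]])
R, lambda2_aligned = align_loadings(lambda1, lambda2)
```

The returned matrix lambda2_aligned plays the role of the aligned loading matrix λ2′ that is compared with λ1 in the next step.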
The degree of non-invariance in pattern coefficients between each pair of groups is estimated using a loss function, and Bayesian estimation is used to re-weight the estimates in the configural invariance model to minimize the non-invariance in the aligned model. Equation (1) is part of the loss function and is used to measure the degree of non-invariance between two groups. Specifically, Equation (1) measures the squared difference between the factor loading matrix for group 1 (λ1) and the factor loading matrix for group 2 (λ2) multiplied by the rotation matrix (R) that aligns the factor structures. The alignment method iteratively adjusts the rotation matrix to minimize the loss function and align the factor structures across all groups. Once we have found the Procrustes rotation matrix, R, we can apply it to the factor loading matrix, λ2, to obtain the aligned factor loading matrix, λ2′:

\lambda_{2}' = \lambda_{2} R  (2)

We can then compare the factor loadings of the items between the two groups by testing whether the aligned factor loading matrix, λ2′, is equal to the factor loading matrix, λ1, or whether it differs by a constant factor. If the factor loading matrices have the same pattern (i.e., are equal up to a constant factor), the measurement instrument exhibits configural invariance; if the factor loadings are equal in magnitude, it exhibits metric invariance; and if the factor loadings and the intercepts are both equal, it exhibits scalar invariance.
In its simplest application (i.e., a one-factor model), the alignment method can be mathematically illustrated as follows [16]:

y_{ipg} = \nu_{pg} + \lambda_{pg}\,\eta_{ig} + \varepsilon_{ipg}  (3)

where y_ipg is the pth observed variable for participant i in group g, ν_pg represents the intercept, λ_pg is the factor loading for the pth observed variable in group g, ε_ipg ~ N(0, θ_pg) represents the error term for individual i in group g, and η_ig is the factor score for individual i in group g, with factor mean α_g and factor variance ψ_g. The alignment method estimates all the parameters, including ν_pg, λ_pg, α_g, ψ_g, and θ_pg, as group-specific parameters. This means the method estimates the factor mean and variance separately for each group without assuming measurement invariance. In other words, the alignment method allows each group to have its own unique factor structure rather than assuming that all groups share the same structure.
The initial stage of the alignment method involves estimating the configural model, which assumes that all groups have the same overall structure. The configural model sets certain parameters in each group, g, to specific values (the factor mean is fixed at 0 and the factor variance at 1 in every group) and estimates group-specific values for all other parameters. The configural factor model is represented as follows:

y_{ipg} = \nu_{pg,0} + \lambda_{pg,0}\,\eta_{ig,0} + \varepsilon_{ipg}, \qquad \eta_{ig,0} \sim N(0, 1)  (4)

Since the aligned model has the same fit as the configural model, certain relationships must hold between these parameters:

\nu_{pg,0} + \lambda_{pg,0}\,\eta_{ig,0} = \nu_{pg} + \lambda_{pg}\!\left(\alpha_{g} + \sqrt{\psi_{g}}\,\eta_{ig,0}\right)  (5)

where ψ_g and α_g are the model-estimated variance and mean of the factor in group g. By imposing equality constraints on the constant terms and on the terms multiplying η_ig,0, Equation (5) will be:

\nu_{pg} = \nu_{pg,0} - \alpha_{g}\,\lambda_{pg}  (6)

\lambda_{pg} = \lambda_{pg,0}\,/\,\sqrt{\psi_{g}}  (7)

Putting the value of λ_pg obtained from Equation (7) into Equation (6),

\nu_{pg} = \nu_{pg,0} - \alpha_{g}\,\lambda_{pg,0}\,/\,\sqrt{\psi_{g}}  (8)
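For illustration (with made-up numbers, not estimates from the present data): if a configural loading is λ_pg,0 = 0.80 with configural intercept ν_pg,0 = 2.00, and the alignment chooses α_g = 0.50 and ψ_g = 1.44 for that group, then Equations (7) and (8) give

\lambda_{pg} = 0.80/\sqrt{1.44} \approx 0.67, \qquad \nu_{pg} = 2.00 - 0.50 \times 0.67 \approx 1.67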
To make this more precise, the total alignment (simplicity) function, F, is minimized with respect to α_g and ψ_g, where the aligned loadings and intercepts depend on α_g and ψ_g through Equations (7) and (8). This function takes into account all sources of non-invariance in the measurements:

F = \sum_{p}\sum_{g_{1}<g_{2}} w_{g_{1},g_{2}}\, f\!\left(\lambda_{pg_{1}} - \lambda_{pg_{2}}\right) + \sum_{p}\sum_{g_{1}<g_{2}} w_{g_{1},g_{2}}\, f\!\left(\nu_{pg_{1}} - \nu_{pg_{2}}\right)  (9)

where w_{g1,g2} = √(N_{g1} N_{g2}) is a weight reflecting the sizes of the two groups, N_g is the sample size of group g, and f is a component loss function:

f(x) = \sqrt{\sqrt{x^{2} + \varepsilon}}  (10)

where ε represents a small value, typically around 0.0001. The component loss function, f, is designed to be approximately equal to the square root of the absolute value of x. Using a small positive value for ε ensures that the function has a continuous first derivative, which makes the optimization easier and more stable.
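As a minimal computational sketch (using NumPy and SciPy; the array names, the function total_loss, and the identification choice are illustrative assumptions, not part of any alignment software), Equations (7)-(10) can be combined to evaluate and numerically minimize the simplicity function for a set of configural estimates:

```python
import numpy as np
from scipy.optimize import minimize

EPS = 1e-4  # small constant in the component loss function, Equation (10)

def clf(x):
    """Component loss function f(x) = sqrt(sqrt(x^2 + eps)), Equation (10)."""
    return np.sqrt(np.sqrt(x ** 2 + EPS))

def total_loss(params, lam0, nu0, n_sizes):
    """Simplicity function F of Equation (9) as a function of alpha_g and psi_g.

    params  : concatenated (alpha_2..alpha_G, log psi_2..log psi_G); group 1 is
              fixed at alpha_1 = 0, psi_1 = 1 for identification.
    lam0    : (G, P) configural loadings; nu0 : (G, P) configural intercepts.
    n_sizes : (G,) group sample sizes.
    """
    G, P = lam0.shape
    alpha = np.concatenate(([0.0], params[:G - 1]))
    psi = np.exp(np.concatenate(([0.0], params[G - 1:])))  # log-variances keep psi > 0
    lam = lam0 / np.sqrt(psi)[:, None]                      # Equation (7)
    nu = nu0 - alpha[:, None] * lam                         # Equation (8)
    F = 0.0
    for g1 in range(G):
        for g2 in range(g1 + 1, G):
            w = np.sqrt(n_sizes[g1] * n_sizes[g2])          # pairwise weight
            F += w * (clf(lam[g1] - lam[g2]).sum() + clf(nu[g1] - nu[g2]).sum())
    return F

# Illustrative configural estimates for 3 groups and 4 items (made-up numbers).
lam0 = np.array([[0.7, 0.6, 0.8, 0.5], [0.6, 0.6, 0.7, 0.5], [0.8, 0.7, 0.9, 0.6]])
nu0 = np.array([[2.0, 1.8, 2.1, 1.9], [2.2, 1.9, 2.0, 1.8], [1.9, 1.7, 2.2, 2.0]])
n = np.array([200, 250, 180])
start = np.zeros(4)  # alpha_2, alpha_3, log psi_2, log psi_3
result = minimize(total_loss, start, args=(lam0, nu0, n), method="Nelder-Mead")
```

This sketch is only intended to illustrate the logic of Equations (7)-(10); dedicated software implements the optimization, identification, and subsequent tests of (non-)invariant parameters in a more refined way.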
Previous studies have introduced the alignment method as a powerful tool for analyzing measurement invariance across multiple groups, especially when the number of compared groups is large, e.g., [11,12,15]. It appears to be a sophisticated method for examining measurement invariance in latent variable models because it provides researchers with a flexible way to align specific components of the measurement model while allowing intercepts to vary, thereby supporting meaningful and valid comparisons across different groups or time points. The alignment approach thus allows for a nuanced examination of measurement invariance by selectively aligning parameters.
From this perspective, this study aimed to demonstrate the empirical usefulness of this method when a large number of group comparisons is necessary. To this end, we examined the measurement invariance of an instrument measuring general cognitive ability across 36 universities in the Kingdom of Saudi Arabia. The findings from this study will help researchers evaluate the psychometric robustness of the method in examining measurement invariance across multiple groups using real-life data. Additionally, they will help researchers determine whether this strategy is useful in actual practice when meaningful comparisons between groups (e.g., universities) are important. For instance, the results of this study may provide a strong basis for a better understanding of students’ academic performance and score differences among universities, helping education policymakers and national governmental agencies explore the possible factors that might have caused these gaps.