In social science research, latent variables are abstract concepts that cannot be observed directly, such as attitudes, preferences, trust, loyalty, and honor. Latent variables are described or predicted by the observed variables that can be accessed directly. Traditional statistical methods, such as analysis of variance and multiple regression analysis, can only analyze the relationship between observed variables and cannot accurately measure and examine a model containing latent variables. This study used a structural equation model under maximum likelihood estimation for analysis. Structural equation models are a combination of confirmatory factor and causal models—known as the measurement and structural models, respectively—which are capable of analyzing complex situations involving both latent and observed variables. This comprehensive technique integrates multiple regression analysis, path analysis, and causal theory. Specifically, the measurement model is primarily used to conduct factor analysis and describe the relationship between latent and observed variables (i.e., factor loading). The structural model is mainly used for path analysis and to describe the relationships between latent variables (i.e., path coefficients). Overall, this study’s analysis comprised a test of the measurement and structural models, as well as a comparative experimental analysis to ensure the accuracy and stability of the model.
3.3.1. Measurement Model Testing
Confirmatory factor analysis (CFA) was used to test the measurement model. This is an important part of structural equation modeling and aims to evaluate and verify the fit of the measurement model and confirm whether the observed variables truly describe the corresponding latent variables [
30]. As Timothy et al. demonstrated, in empirical studies in the social sciences, especially psychology and behavioral science, conducting CFA is much more important for measurement models than it is for structural models [
31].
A form of theory-driven analysis, CFA is premised on the principle of using sample data to verify a preset factor structure hypothesis based on theoretical or prior knowledge [
32]. Based on existing theoretical knowledge and practical experience, researchers hypothesize that the observed variables are affected by common factors. On the basis of the theoretical hypothesis, a measurement model covariance matrix to be estimated is constructed, with the sample data obtained through a questionnaire survey constituting the actual sample covariance matrix [
33]. CFA is then conducted to determine whether the measurement model is consistent with the actual data by repeatedly comparing the differences between the two covariance matrices [
34]. Accordingly, measurement models are often described using a measurement equation. Consider a latent variable estimated from three observed variables as an example, which can be expressed as follows:

xi = λiX + εi,  i = 1, 2, 3,

where X is the latent variable, x1–x3 are the observed variables, λi is the factor loading, and εi is the measurement error.
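To make the measurement equation concrete, the following sketch (with hypothetical loadings, not the study's data) simulates two standardized indicators generated from one latent variable; under this model, the implied correlation between two indicators is the product of their loadings.

```python
import math
import random

random.seed(1)
lam1, lam2 = 0.8, 0.7   # hypothetical standardized factor loadings
n = 20000

x1s, x2s = [], []
for _ in range(n):
    X = random.gauss(0, 1)  # latent variable
    # x_i = lam_i * X + eps_i, with error variance 1 - lam_i^2
    x1s.append(lam1 * X + math.sqrt(1 - lam1 ** 2) * random.gauss(0, 1))
    x2s.append(lam2 * X + math.sqrt(1 - lam2 ** 2) * random.gauss(0, 1))

m1, m2 = sum(x1s) / n, sum(x2s) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1s, x2s)) / n
v1 = sum((a - m1) ** 2 for a in x1s) / n
v2 = sum((b - m2) ** 2 for b in x2s) / n
corr = cov / math.sqrt(v1 * v2)  # approaches lam1 * lam2 = 0.56
```

The simulated correlation between the two indicators converges on λ1λ2, which is exactly the structure CFA tests against the sample covariance matrix.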
Typically, standardized factor loadings in a measurement model should lie between 0.6 and 0.95: anything below the lower limit suggests a weak relationship, and anything above the upper limit indicates overfitting of a single observed variable. For self-developed scales, values above 0.5 are considered acceptable. The next step in constructing a complete measurement model is evaluating the identifiability of the initial model [
35]. If the model’s degrees of freedom are not less than 0, the model is identifiable and its parameter estimates have a unique solution [
36]. Only when the model is identified can the subsequent parameter estimation and model checking results be considered meaningful. The parameter estimation and fitting analysis of the measurement model are carried out on this basis [
37]. In general, obtaining optimal results from the initial measurement model is difficult; thus, the model must be adjusted according to the modification indices until reasonable results are achieved. Such refinement ensures that the scale structure is consistent with the theoretical conception and that the measurement model is accurate and reliable [
38].
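The identifiability rule above (degrees of freedom not less than 0) amounts to a simple count; the sketch below, with illustrative parameter tallies that assume a standardized factor, shows a just-identified and an over-identified one-factor model.

```python
def model_df(p: int, free_params: int) -> int:
    """Degrees of freedom of a covariance-structure model: distinct
    elements of the p x p sample covariance matrix, p(p + 1)/2, minus
    the number of freely estimated parameters."""
    return p * (p + 1) // 2 - free_params

# One factor, three indicators, factor variance fixed to 1:
# 3 loadings + 3 error variances = 6 free parameters -> df = 0 (just identified)
df3 = model_df(3, 6)
# Four indicators: 4 loadings + 4 error variances = 8 -> df = 2 (over-identified)
df4 = model_df(4, 8)
```

Only the over-identified case (df > 0) leaves room for the fit testing described in the following paragraphs.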
To guarantee the scientific rigor of the entire index system of tendentious information dissemination, we must not only optimize each single measurement model but also ascertain the reliability and validity of the overall measurement scale. Item reliability is an important indicator of the reliability and stability of a single item in a measurement tool, and is typically determined using the squared multiple correlation (
SMC). As per Formula (3), SMC is numerically equal to the squared standardized factor loading and indicates the latent variable’s explanatory power over the observed variable. Generally, values above 0.36 are acceptable and values above 0.5 are ideal [
30]. This study calculated SMC as follows:

SMCi = λi2,  (3)

wherein λi is the standardized factor loading of observed variable i.
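Formula (3) can be expressed directly; the loading values here are illustrative.

```python
def smc(loading: float) -> float:
    """Squared multiple correlation of an item (Formula (3)):
    the square of its standardized factor loading."""
    return loading ** 2

value = smc(0.6)   # 0.36 -> exactly the acceptability threshold above
```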
Composite reliability (
CR) reflects the degree of internal consistency among all observed variables under the same latent variable on the measurement scale. A higher
CR means that the items in the scale have a high degree of internal consistency in measuring the same latent variable, and that these items all point to and measure the same theoretical hypothesis and model structure.
CR is calculated based on the standardized covariance or correlation coefficient between the observed variables. Generally, the closer the
CR value is to 1, the higher the internal consistency of the scale and the more reliable the measurement model. Values above 0.6 are recommended as acceptable and values above 0.7 as ideal [
39]. This study calculated CR as follows:

CR = (Σλi)2 / [(Σλi)2 + Σεi],

wherein λi is the standardized factor loading of observed variable i, and εi is the measurement error variance corresponding to observed variable i.
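A quick numeric check of the CR formula, assuming standardized indicators (so that each error variance is 1 − λi2) and hypothetical loadings:

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error
    variances), with error variance 1 - lambda^2 for standardized items."""
    s = sum(loadings)
    theta = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + theta)

cr = composite_reliability([0.7, 0.8, 0.75])  # hypothetical loadings -> ~0.79
```

A value near 0.79 clears the 0.7 "ideal" threshold cited above.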
Convergent validity tests the degree of correlation and the consistency of items within the same latent variable on a measurement scale. Average variance extracted (
AVE) is typically used to evaluate the convergent validity of a measurement scale for latent variables.
AVE measures the average proportion of item variance explained by the latent variable, that is, the ratio of the variance attributable to the latent variable to the total variance of the items. Generally, a higher
AVE value indicates that the observed variables under the same latent variable are more convergent, and that the explanatory power of the latent variable to the observed variables is stronger. A value above 0.4 is recommended as acceptable, and a value above 0.5 is ideal [
39]. This study calculated AVE as follows:

AVE = Σλi2 / (Σλi2 + Σεi),

where λi is the standardized factor loading of observed variable i, and εi is the measurement error variance corresponding to observed variable i.
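The AVE formula can be sketched the same way; for standardized indicators it reduces to the mean squared loading. The loadings below are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE = sum(lambda^2) / (sum(lambda^2) + sum of error variances);
    for standardized indicators this is the mean squared loading."""
    sq = sum(l ** 2 for l in loadings)
    theta = sum(1 - l ** 2 for l in loadings)
    return sq / (sq + theta)

ave = average_variance_extracted([0.7, 0.8, 0.75])  # ~0.56 -> above 0.5, ideal
```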
Discriminant validity assesses whether there are real differences between the measured values of a measurement tool when measuring different concepts. In other words, discriminant validity measures whether a measurement scale can effectively distinguish between different dimensions and concepts. Discriminant validity can be evaluated using the correlation coefficient matrix of the index system. When the arithmetic square root of the
AVE value of a latent variable is greater than its Pearson correlations with the other latent variables, the latent variable is considered to have adequate discriminant validity [
40].
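This Fornell–Larcker-style comparison can be sketched as follows; the AVE and correlation values are hypothetical.

```python
import math

def discriminant_ok(ave_a: float, ave_b: float, corr_ab: float) -> bool:
    """Discriminant validity check: sqrt(AVE) of each latent variable
    must exceed the absolute correlation between the two variables."""
    r = abs(corr_ab)
    return math.sqrt(ave_a) > r and math.sqrt(ave_b) > r

# sqrt(0.56) ~ 0.748 and sqrt(0.61) ~ 0.781 both exceed 0.55 -> passes
ok = discriminant_ok(0.56, 0.61, 0.55)
```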
Based on the foregoing, this study conducted CFA on each of the seven indicators of tendentious information dissemination in cyberspace.
Figure 4 illustrates the measurement model using the latent variable of ATD as an example. In
Figure 4, x1–x6 represent the six observed variables measuring the latent variable (ATD), λ1–λ6 are the factor loadings of the observed variables, and ε1–ε6 are the measurement errors generated by each observed variable in estimating the latent variable (ATD). Similarly, eight observed variables were used to measure SND, IDB, ISO, and DTR, and seven observed variables were used to measure PBC and IDI. A total of 52 measurement items were used as observed variables in the questionnaire. The reliability and validity of the entire measurement questionnaire were tested based on a CFA of each single measurement model.
3.3.2. Structural Model Testing
Reasonable CFA results provide assurance for the effective evaluation of the subsequent structural model and the reliability of the statistical conclusions. Using a structural equation model comprising two exogenous latent variables and one endogenous latent variable as an example, the relationship between the latent variables can be described by a structural equation, expressed as follows:

Y = ΓX + e = γ1X1 + γ2X2 + e,

wherein Y is the endogenous latent variable; Γ is the matrix of path coefficients of the exogenous latent variables on the endogenous latent variable, comprising γ1 and γ2; X is the vector of exogenous latent variables, comprising X1 and X2; and e is the structural residual of the equation.
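As an illustration (the path coefficients and residual variance are hypothetical, not the study's estimates), the structural equation can be simulated to verify the implied variance of the endogenous variable:

```python
import random

random.seed(0)
g1, g2 = 0.5, 0.3   # hypothetical path coefficients gamma1, gamma2
sd_e = 0.7          # hypothetical residual standard deviation
n = 20000

ys = []
for _ in range(n):
    x1 = random.gauss(0, 1)           # exogenous latent variable X1
    x2 = random.gauss(0, 1)           # exogenous latent variable X2
    e = random.gauss(0, sd_e)         # structural residual
    ys.append(g1 * x1 + g2 * x2 + e)  # Y = gamma1*X1 + gamma2*X2 + e

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
# With independent standardized predictors, Var(Y) = g1^2 + g2^2 + sd_e^2 = 0.83
```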
Structural equation modeling is based on the analysis of covariance structures. Essentially, the structural equation model testing process involves a differential analysis between the model-implied covariance matrix (
E) obtained according to the model hypothesis relationship and the sample covariance matrix (
S) based on the actual collected data. Accordingly,
S − E can be used to measure how closely the hypothesized model approximates the real model. If there is no statistically significant difference between the two matrices, the hypothesized relationships of the model describe the real problem well. This comparison is operationalized by calculating and analyzing fit indices, which can be divided into absolute, relative, and parsimony fit indices according to their test criteria. The chi-square value (
χ2) is the basis of many fit index calculations: a smaller value indicates that the constructed model fits the sample data better, but it is easily inflated by sample size. Among the absolute fit indices, χ2/df corrects for the influence of the degrees of freedom, with values less than 5 deemed acceptable and values between 1 and 3 considered ideal [
41].
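The χ2/df ratio is a one-line computation; the fit statistics below are hypothetical.

```python
def normed_chi_square(chi2: float, df: int) -> float:
    """chi^2 per degree of freedom; < 5 acceptable, 1-3 ideal."""
    return chi2 / df

ratio = normed_chi_square(187.4, 84)   # hypothetical fit statistics, ~2.23
```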
Goodness of fit index (
GFI) and adjusted goodness of fit index (
AGFI) analyze the proportion of the sample covariance explained by the model covariance, with values above 0.8 and 0.9 considered acceptable and ideal, respectively [
42].
GFI and
AGFI are calculated using Equations (7) and (8), as follows:

GFI = 1 − tr[(E−1S − I)2] / tr[(E−1S)2],  (7)

wherein E is the model-implied covariance matrix, S is the sample covariance matrix, I is the identity matrix, and tr denotes the matrix trace; and

AGFI = 1 − [p(p + 1) / (2df)](1 − GFI),  (8)

where p is the number of observed variables, df is the degrees of freedom, and GFI is the goodness of fit index.
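Equation (8) can be checked numerically; the GFI, p, and df values below are hypothetical.

```python
def agfi(gfi: float, p: int, df: int) -> float:
    """Adjusted goodness of fit (Equation (8)): penalizes GFI by the
    ratio of covariance data points, p(p + 1)/2, to degrees of freedom."""
    return 1 - (p * (p + 1)) / (2 * df) * (1 - gfi)

value = agfi(0.95, 10, 35)   # hypothetical inputs -> ~0.921, above 0.9
```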
The root mean square error of the approximation (
RMSEA) is used to evaluate the model’s degree of fit from the perspective of the residual error. Values below 0.08 are considered acceptable, while values below 0.05 are ideal [
43,
44].
RMSEA is calculated as follows:

RMSEA = √[(χ2 − df) / (df(N − 1))],

in which χ2 is the chi-square value, df is the degrees of freedom, and N is the sample size.
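A sketch of the RMSEA formula above, using hypothetical fit statistics:

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))); the max() guard
    keeps the index at 0 when the model fits better than its df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

value = rmsea(187.4, 84, 400)   # hypothetical inputs -> ~0.056
```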
The comparative fit index (
CFI) and non-normed fit index (
NNFI) are relative fit indices representing the degree of improvement of the assumed model relative to the worst-fitting baseline model. Values above 0.9 and 0.95 are considered acceptable and ideal, respectively [
43,
44].
CFI and
NNFI are calculated as follows:

CFI = 1 − max(χ2M − dfM, 0) / max(χ2N − dfN, 0),

NNFI = (χ2N/dfN − χ2M/dfM) / (χ2N/dfN − 1),

where χ2N is the χ2 of the baseline model, dfN is the df of the baseline model, χ2M is the χ2 of the model to be tested, and dfM is the df of the test model.
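Both relative fit indices can be computed from the same four quantities; the fit statistics below are hypothetical.

```python
def cfi(chi2_m, df_m, chi2_n, df_n):
    """Comparative fit index relative to the baseline (null) model."""
    return 1 - max(chi2_m - df_m, 0.0) / max(chi2_n - df_n, 0.0)

def nnfi(chi2_m, df_m, chi2_n, df_n):
    """Non-normed fit index (also known as the Tucker-Lewis index)."""
    return (chi2_n / df_n - chi2_m / df_m) / (chi2_n / df_n - 1)

# Hypothetical statistics: tested model (187.4, 84) vs. baseline (2000, 105)
c = cfi(187.4, 84, 2000.0, 105)    # ~0.945
t = nnfi(187.4, 84, 2000.0, 105)   # ~0.932
```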
The parsimonious goodness of fit index (
PGFI) and parsimonious normed fit index (
PNFI) are composite indexes of model parsimony and fitting effect, with values greater than 0.5 considered acceptable [
45].
PGFI is calculated as follows:

PGFI = (dfM/dfN) × {1 − tr[(E−1S − I)2] / tr[(E−1S)2]} = (dfM/dfN) × GFI,

in which E is the model-implied covariance matrix, S is the sample covariance matrix, I is the identity matrix, tr denotes the matrix trace, dfN is the df of the baseline model, and dfM is the df of the test model.
Meanwhile,
PNFI is calculated as follows:

PNFI = (dfM/dfN) × (1 − χ2M/χ2N),

where χ2N is the χ2 of the baseline model, dfN is the df of the baseline model, χ2M is the χ2 of the model to be tested, and dfM is the df of the test model.
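Both parsimony indices, as defined above, can be sketched as follows; all input values are hypothetical.

```python
def pgfi(gfi: float, df_m: int, df_n: int) -> float:
    """Parsimonious GFI: GFI scaled by the ratio of the tested model's
    df to the baseline model's df."""
    return (df_m / df_n) * gfi

def pnfi(chi2_m: float, chi2_n: float, df_m: int, df_n: int) -> float:
    """Parsimonious NFI: NFI = 1 - chi2_m/chi2_n, scaled by df_m/df_n."""
    return (df_m / df_n) * (1 - chi2_m / chi2_n)

# Hypothetical values: df_m = 84, df_n = 105 (ratio 0.8), GFI = 0.95
a = pgfi(0.95, 84, 105)            # 0.8 * 0.95  = 0.76
b = pnfi(187.4, 2000.0, 84, 105)   # 0.8 * 0.9063 ~ 0.725
```

Both values clear the 0.5 acceptability threshold cited above.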
When evaluating a model, we should not rely on a single fit index, because each index assesses model performance from a different perspective: some focus on model complexity, while others focus on explanatory and predictive power. Therefore, to obtain a comprehensive picture of model performance, multiple fit indices must be considered together, and the merits and demerits of the model evaluated in light of the specific research background and purpose [
36].
In structural model testing, in addition to comprehensive analysis of model fit, we should pay attention to the explanatory ability of the model in respect to the dependent variables. The coefficient of determination (
R2) is a measure of the predictive power of the model and is used to evaluate the degree to which the structural equation model explains the variation in the endogenous variables, that is, the degree to which all paths in the model (direct and indirect) jointly explain that variation. R2 is calculated as follows:

R2 = 1 − Var(e) / Var(Y),

where Var(Y) is the variance estimate of the endogenous latent variable and Var(e) is the variance estimate of its residual. The value of
R2 ranges from 0 to 1; the higher the value, the better the model explains the dependent variable. According to Zong et al., in empirical studies based on structural equation models, the
R2 of the dependent variable is considered acceptable if it exceeds 0.19. However,
R2 values exceeding 0.33 and 0.67 indicate that the model has moderate and substantial predictive and explanatory power, respectively [
46].
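A minimal sketch of the R2 computation, using hypothetical variance estimates:

```python
def r_squared(var_endogenous: float, var_residual: float) -> float:
    """R^2 = 1 - residual variance / total variance of the endogenous
    latent variable."""
    return 1 - var_residual / var_endogenous

# Hypothetical estimates: Var(Y) = 1.0, Var(e) = 0.45
r2 = r_squared(1.0, 0.45)   # 0.55 -> moderate explanatory power (> 0.33)
```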