In the following, I classify the articles into five categories. Each category is treated in a separate subsection.
2.1. Multilevel Modeling and Structural Equation Modeling
The article by Rosseel [1] discusses maximum likelihood estimation for two-level structural equation models from the perspective of computationally efficient implementations of the observed log-likelihood function. Several implementations are compared by means of R snippets, motivating the final implementation in the lavaan package.
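For readers who wish to experiment, a minimal two-level model in lavaan syntax might look as follows; the Demo.twolevel example dataset ships with the package.

```r
library(lavaan)

# Two-level CFA using lavaan's multilevel syntax; Demo.twolevel is a
# simulated dataset shipped with lavaan (cluster variable: "cluster").
model <- '
  level: 1
    fw =~ y1 + y2 + y3   # within-cluster factor
  level: 2
    fb =~ y1 + y2 + y3   # between-cluster factor
'
fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
summary(fit)
```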
Jak et al. [2] discuss the estimation of different two-level factor models for cluster-level constructs in the software packages lavaan and Mplus. They compare the so-called configural model and the simultaneous shared-and-configural model, replicating the simulation study of Stapleton and Johnson (2019, J. Educ. Behav. Stat.). As an outcome of their study, Jak et al. [2] raise concerns about default settings in the Mplus software for the chi-square test of model fit and provide suggestions for circumventing these issues.
As a comment on Jak et al. [2], the Mplus authors Asparouhov and Muthén [3] suggest a modification of the robust chi-square test of fit. The improved statistic yields more accurate type I error rates when the estimated model parameters are at the boundary of the admissible parameter space, which was the focus of Jak et al. [2].
Hecht et al. [4] investigate different Markov chain Monte Carlo implementations of the two-level random intercept model in the popular general-purpose Bayesian software packages JAGS and Stan. The authors compare a parameterization based on sufficient statistics (i.e., means and covariances; the covariance- and mean-based parameterization) with a classic parameterization that also samples the random effects. Computational efficiency was assessed as the effective sample size per second. It turned out that Stan outperformed JAGS under the covariance- and mean-based parameterization, whereas JAGS outperformed Stan under the classic parameterization.
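As a sketch of what the classic parameterization looks like in Stan (called from R via rstan), consider the following; the data object names are assumptions, and Hecht et al. [4] supply the actual implementations.

```r
library(rstan)

# Classic parameterization: the random intercepts u[j] are sampled
# explicitly. (In the covariance- and mean-based parameterization, the
# likelihood would instead be expressed via cluster means and covariances.)
stan_code <- "
data {
  int<lower=1> N;              // observations
  int<lower=1> J;              // clusters
  int<lower=1, upper=J> g[N];  // cluster membership
  vector[N] y;                 // outcome
}
parameters {
  real mu;                     // grand mean
  vector[J] u;                 // random intercepts
  real<lower=0> sigma_u;       // between-cluster SD
  real<lower=0> sigma_e;       // within-cluster SD
}
model {
  u ~ normal(0, sigma_u);
  y ~ normal(mu + u[g], sigma_e);
}
"
fit <- stan(model_code = stan_code, data = stan_data)  # stan_data: assumed list
# The authors' efficiency criterion is effective sample size per second,
# e.g., summary(fit)$summary[, "n_eff"] relative to get_elapsed_time(fit).
```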
Zitzmann et al. [5] discuss the assessment of convergence of Markov chain Monte Carlo estimation in the Mplus software. They argue that the effective sample size should be preferred over the frequently used potential scale reduction factor. Zitzmann and Hecht (2019, Struct. Equ. Modeling) proposed a method for checking whether a minimum effective sample size has been reached in Mplus. This method is evaluated in a simulation study in their contribution to this Special Issue.
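Outside of Mplus, the effective sample size can be computed from posterior draws, for example with the coda package; in this sketch, draws is an assumed iterations-by-parameters matrix, and the threshold of 400 is an arbitrary illustrative value.

```r
library(coda)

# draws: iterations x parameters matrix of posterior draws (assumed)
ess <- effectiveSize(mcmc(draws))
all(ess >= 400)  # check against a prespecified minimum effective sample size
```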
Schoemann and Jorgensen [6] review methods for estimating and testing latent variable interactions in structural equation modeling, with a focus on the product indicator method. They demonstrate how product indicator methods can provide an accurate means of estimating and testing latent interactions and show how this approach can be implemented in any structural equation modeling software package. Schoemann and Jorgensen [6] illustrate the implementation of the product indicator method in the semTools package, which relies on the R package lavaan for fitting the structural equation model.
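A minimal sketch of this approach with semTools and lavaan might look as follows; the variable names (x1-x3, z1-z3, y) and the default naming of the product indicators are assumptions.

```r
library(semTools)   # provides indProd(); loads lavaan

# Create double-mean-centered product indicators for a latent interaction
dat2 <- indProd(dat, var1 = c("x1", "x2", "x3"), var2 = c("z1", "z2", "z3"),
                match = TRUE, doubleMC = TRUE)
model <- '
  fx  =~ x1 + x2 + x3
  fz  =~ z1 + z2 + z3
  fxz =~ x1.z1 + x2.z2 + x3.z3   # product indicators created by indProd()
  y ~ fx + fz + fxz
'
fit <- lavaan::sem(model, data = dat2)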
Jorgensen [7] shows how to use structural equation modeling to estimate error components in generalizability theory for continuous and ordinal items. The author uses real and simulated datasets to demonstrate how a structural equation model can be specified to estimate absolute error by placing constraints on the mean structure (for continuous items) as well as on the thresholds (for ordinal items). Different estimators for continuous and ordinal items are compared using the R packages lavaan and gtheory.
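A univariate G-study in gtheory uses lme4-style formula syntax; in this sketch, the long-format data and variable names are assumptions, and the argument names should be checked against the package documentation.

```r
library(gtheory)

# Long-format data with one score per person-item combination (assumed)
g <- gstudy(data = dat_long, formula = score ~ (1 | person) + (1 | item))
g$components  # estimated variance components
```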
The article by Arnold et al. [8] investigates parameter heterogeneity with respect to covariates in structural equation models. The authors demonstrate how the individual parameter contribution (IPC) regression framework can be used to predict differences in any parameter of a structural equation model. Arnold et al. [8] implement the IPC regression framework in the R package ipcr. Furthermore, they compare the performance of IPC regression with alternative methods for dealing with parameter heterogeneity (e.g., regularization methods, structural equation models with interaction effects) in a simulation study.
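A sketch of how this might be called is shown below; since the exact interface is not described in this editorial, the argument names are assumptions based on the package description.

```r
# remotes::install_github("manuelarnold/ipcr")  # if not available on CRAN
library(ipcr)
library(lavaan)

fit <- cfa("f =~ x1 + x2 + x3 + x4", data = dat)   # dat: assumed dataset
# Regress individual parameter contributions on covariates; the argument
# name 'covariates' is an assumption based on the package description
out <- ipcr(fit, covariates = dat[, c("age", "gender")])
summary(out)
```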
Li et al. [9] provide a tutorial on the sparse estimation of structural equation models (i.e., regularized structural equation modeling). Regularization techniques penalize model complexity and can perform parameter selection in an automatic, completely data-driven way. Li et al. [9] illustrate regularized structural equation modeling with detailed example code for the R package regsem.
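A typical regsem workflow starts from a fitted lavaan model; in the following sketch, the dataset and the choice of penalized parameters are assumptions.

```r
library(lavaan)
library(regsem)

# Fit the SEM with lavaan first; cv_regsem() then lasso-penalizes the
# selected parameters over a grid of penalty values
fit <- cfa("f =~ x1 + x2 + x3 + x4 + x5 + x6", data = dat)
cv_out <- cv_regsem(fit, type = "lasso", pars_pen = "loadings", n.lambda = 30)
plot(cv_out)       # trace of parameter estimates across penalty values
cv_out$final_pars  # parameter estimates at the best-fitting penalty
```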
Christensen and Golino [10] investigate the assessment of sampling variability in exploratory graph analysis with a bootstrap approach. They conduct a simulation study to assess the suitability of several sampling statistics (i.e., descriptive statistics, structural consistency estimates, and item stability statistics). Moreover, Christensen and Golino [10] illustrate their method using the R package EGAnet.
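A minimal sketch of this bootstrap workflow in EGAnet might look as follows, with dat an assumed data frame of item responses.

```r
library(EGAnet)

# Parametric bootstrap exploratory graph analysis
boot <- bootEGA(data = dat, iter = 500, type = "parametric")
# Item stability: the proportion of bootstrap samples in which each item
# is placed in its empirically derived dimension
itemStability(boot)
```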
2.2. Item Response Modeling and Categorical Data Modeling
Beisemann et al. [11] compare several acceleration methods for the expectation-maximization (EM) algorithm, which is often prone to slow convergence. The acceleration techniques are applied to marginal maximum likelihood estimation of item response models and mixture models. Beisemann et al. [11] show that all three studied acceleration methods reduced the total number of log-likelihood evaluations. Hence, using them might be an important ingredient of efficient software implementations.
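To give a flavor of how EM acceleration works in R, the following sketch accelerates a hand-coded EM step for a two-component normal mixture with the SQUAREM package, a widely used acceleration scheme that is not necessarily among the three methods studied by the authors.

```r
library(SQUAREM)

set.seed(1)
y <- c(rnorm(150, 0), rnorm(50, 3))  # simulated two-component mixture data

# One EM fixed-point update for (mixing weight, mean 1, mean 2),
# with both component SDs fixed at 1
em_step <- function(p, y) {
  w <- p[1] * dnorm(y, p[2]) /
       (p[1] * dnorm(y, p[2]) + (1 - p[1]) * dnorm(y, p[3]))
  c(mean(w), sum(w * y) / sum(w), sum((1 - w) * y) / sum(1 - w))
}

res <- squarem(par = c(0.5, -1, 1), fixptfn = em_step, y = y)
res$par      # accelerated EM solution
res$fpevals  # number of fixed-point (EM-step) evaluations used
```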
Garnier-Villarreal et al. [12] compare different estimation methods for multidimensional item response models in a large simulation study. They compare limited-information methods as implemented in lavaan, marginal maximum likelihood estimation in mirt, and Markov chain Monte Carlo estimation in the Stan software. The study of Garnier-Villarreal et al. [12] provides recommendations for applied researchers on which estimation methods should be preferred in particular data-generating constellations.
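For instance, marginal maximum likelihood estimation of a two-dimensional model in mirt requires only a single call; dat is an assumed matrix of dichotomous responses.

```r
library(mirt)

# Exploratory two-dimensional 2PL model estimated by marginal maximum
# likelihood with the EM algorithm
fit <- mirt(dat, model = 2, itemtype = "2PL")
summary(fit)               # rotated factor loadings
coef(fit, simplify = TRUE) # item parameter estimates
```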
Ulitzsch and Nestler [13] also focus on estimating multidimensional item response models. The authors compare Markov chain Monte Carlo estimation in Stan and marginal maximum likelihood estimation in the TAM package with variational Bayes estimation implemented in Stan. Ulitzsch and Nestler [13] conclude that variational Bayes estimation was computationally much more efficient than Markov chain Monte Carlo estimation but did not outperform marginal maximum likelihood estimation. Moreover, because variational Bayes provides biased estimates of item discriminations, the authors argue that it is not a viable alternative for estimating multidimensional item response models.
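In rstan, the two estimation approaches differ only in the fitting call, as the following sketch shows; stan_code and stan_data are assumed to hold the item response model and data.

```r
library(rstan)

mod <- stan_model(model_code = stan_code)    # stan_code: assumed IRT model
fit_vb   <- vb(mod, data = stan_data)        # variational Bayes
fit_mcmc <- sampling(mod, data = stan_data)  # full MCMC for comparison
```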
In the article by Kolbe et al. [14], the association of two ordinal variables by means of polychoric correlations is studied. The authors show that the estimated polychoric correlation is biased if the underlying continuous latent variables do not follow a bivariate normal distribution. Kolbe et al. [14] illustrate how various bivariate distributions can be fitted to ordinal data and examine how estimates of the polychoric correlation vary under different distributional assumptions. The authors conclude that the bivariate normal and the bivariate skew-normal distribution might only rarely hold in empirical datasets.
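Standard software estimates the polychoric correlation under exactly the bivariate normality assumption that Kolbe et al. [14] scrutinize, for example, with the psych package; dat is an assumed data frame of ordinal items.

```r
library(psych)

# Polychoric correlations under the bivariate normal assumption for the
# underlying latent variables
pc <- polychoric(dat)
pc$rho  # estimated polychoric correlation matrix
```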
The article by Bulut et al. [15] is a tutorial on the eirm package, which implements explanatory item response models. The functionality of the eirm package includes traditional item response models (e.g., the Rasch model, the partial credit model, and the rating scale model), item-explanatory models (i.e., linear logistic test models), and person-explanatory models (i.e., latent regression models) for both dichotomous and polytomous responses. Bulut et al. [15] illustrate the general functionality of the eirm package with annotated R code, using the Rosenberg self-esteem scale as a running empirical example.
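A minimal eirm call for an item-explanatory model might look as follows; the example reuses the VerbAgg data from lme4, which also appear in the package documentation, and the formula follows lme4 conventions.

```r
library(eirm)

# LLTM-type item-explanatory model: item properties (behavior type and
# situation) explain item difficulty
mod <- eirm(formula = "r2 ~ -1 + btype + situ + (1|id)", data = lme4::VerbAgg)
print(mod)
```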
The article by Finnemann et al. [16] is an introduction to the Ising model. The authors provide a conceptual introduction together with a survey of Ising-related software packages in R, and they use simulation studies to assess how the Ising model captures local-alignment dynamics. In addition, Finnemann et al. [16] offer recommendations on when to use frequentist or Bayesian estimation for the Ising model.
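As an example of one of the surveyed packages, an Ising network can be estimated from binary data with IsingFit; dat is an assumed persons-by-items 0/1 matrix.

```r
library(IsingFit)

# Estimate an Ising network via nodewise logistic regressions
res <- IsingFit(dat, family = "binomial")
res$weiadj  # estimated pairwise interaction weights of the network
```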
The article by Feuerstahler [17] is a tutorial on the flexmet package, which estimates the filtered monotonic polynomial (FMP) item response model for dichotomous and polytomous items. This model is a semiparametric item response model that allows for flexible item response function shapes and includes traditional item response models as special cases. The tutorial of Feuerstahler [17] provides both an introduction to the unique features of the FMP model and a guide to its implementation in the R package flexmet.
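A minimal fitting call might look as follows; the function and argument names reflect my reading of the package documentation and should be treated as assumptions.

```r
library(flexmet)

# Fit an FMP model to dichotomous responses in 'dat' (assumed matrix);
# the item response function is a polynomial of order 2k + 1, so k = 0
# corresponds to the 2PL model as a special case
fit <- fmp(dat = dat, k = 1)
```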
Debelak and Debeer [18] conduct a simulation study on detecting differential item functioning (DIF) with respect to continuous covariates in multistage tests. The authors implement a linear logistic regression test and two score-based DIF tests in the R package mstDIF. It turned out that the score-based tests had greater power against DIF effects than the linear logistic regression test.
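A sketch of a DIF test call in mstDIF is given below; the argument names and method label are assumptions based on the package documentation.

```r
library(mstDIF)

# Score-based DIF test with respect to a continuous covariate; resp is an
# assumed response matrix, theta_hat and see_hat are assumed ability
# estimates and their standard errors
res <- mstDIF(resp, DIF_covariate = dat$age, method = "analytical",
              theta = theta_hat, see = see_hat)
summary(res)
```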
Shi et al. [19] show how to perform an analysis with the G-DINA model in the R packages GDINA, CDM, and cdmTools. The G-DINA model framework is central to the literature on cognitive diagnostic modeling. The article provides an overview of the typical steps of a G-DINA analysis: Q-matrix evaluation, estimation of the G-DINA model, model fit evaluation, investigation of item diagnosticity, estimation of classification reliability, and the presentation and visualization of results.
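Several of these steps map onto short function calls in the GDINA package; dat and Q are assumed response data and Q-matrix objects.

```r
library(GDINA)

fit <- GDINA(dat = dat, Q = Q, model = "GDINA")
Qval(fit)        # empirical Q-matrix validation
modelfit(fit)    # absolute model fit evaluation
itemfit(fit)     # item-level fit statistics
personparm(fit)  # attribute-profile classification of persons
```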
Sorrel et al. [20] provide an overview of recent developments in cognitive diagnosis computerized adaptive testing as implemented in the R package cdcatR. The package includes functionality for data generation, model selection based on relative fit information, several item selection rules including item exposure control, and the evaluation of performance in terms of classification accuracy, item exposure, and test length.
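A CD-CAT simulation in cdcatR might be set up as follows; fit is an assumed calibrated GDINA model, and the function and argument names are assumptions based on the package documentation.

```r
library(cdcatR)

# Run a CD-CAT with the GDI item selection rule and a maximum test
# length of 20 items (argument names are assumptions)
res <- cdcat(fit = fit, itemSelect = "GDI", MAXJ = 20)
```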
Heine and Stemmler [21] present the application of configural frequency analysis using the R package confreq. Configural frequency analysis is a person-centered approach that analyzes the residuals of non-fitting models. The authors present different kinds of configural frequency analyses: first-order configural frequency analysis based on the null hypothesis of independence, configural frequency analysis with covariates, and two-sample configural frequency analysis. Heine and Stemmler [21] illustrate the estimation with R code using the confreq package.
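A minimal first-order configural frequency analysis in confreq might look as follows, using the classic Lienert LSD dataset that ships with the package.

```r
library(confreq)

data(LienertLSD)        # classic 2x2x2 example dataset shipped with confreq
res <- CFA(LienertLSD)  # first-order CFA under the independence null model
summary(res)            # flags configurations as "types" and "antitypes"
```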
2.3. Missing Data and Synthetic Data
Keller [22] provides a brief overview of the factored regression framework (i.e., sequential modeling) for the multiple imputation of missing data. The author describes the functional notation used to conceptualize the models and generates multiple imputations within this framework using the Blimp software. A mediation model with accompanying code serves as an illustration.
Dai [23] reviews commonly used methods for dealing with missing item responses in psychometrics and examines their performance in a simulation study. Furthermore, the R package TestDataImputation is used in an illustration with an example dataset.
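A sketch of an imputation call might look as follows; the function and argument names are assumptions based on the package documentation.

```r
library(TestDataImputation)

# EM-based imputation of missing responses in a matrix of dichotomous
# items ('dat' is an assumed dataset with NA for missing responses)
imputed <- ImputeTestData(test.data = dat, Mvalue = NA, max.score = 1,
                          method = "EM")
```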
Volker and Vink [24] outline a workflow for generating synthetic data with the multiple imputation software mice. They demonstrate in a simulation study that analyses of the synthetic data yield unbiased and valid statistical inference. Volker and Vink [24] argue that the ease of synthesizing data with mice, along with the validity of the resulting inferences, opens up rich possibilities for data dissemination.
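In mice, fully synthetic data can be generated by overimputing every cell; a minimal sketch, with dat an assumed complete dataset, is given below.

```r
library(mice)

# Overimpute every cell to create m = 10 fully synthetic versions of the
# data; make.where(dat, "all") marks all cells for imputation
syn <- mice(dat, m = 10, where = make.where(dat, "all"),
            method = "cart", print = FALSE)
syn_list <- complete(syn, action = "all")  # list of 10 synthetic datasets
```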
2.4. Large-Scale Assessment Methodology
Mirazchiyski [25] introduces the R package RALSA (R Analyzer for Large-Scale Assessments) for the analysis of international educational large-scale assessment data. The article focuses on the technical aspects of RALSA. The use of the data.table package for memory efficiency and computational speed is illustrated with examples. Mirazchiyski [25] also describes the use of code reuse practices to achieve consistency, efficiency, and safety in the computations performed by the analysis functions of the RALSA package.
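A typical RALSA session converts the downloaded data once and then calls analysis functions on the converted files; the folder paths and variable names in this sketch are assumptions.

```r
library(RALSA)

# Convert SPSS files from a large-scale assessment into RALSA's native
# format, then compute weighted means split by country
lsa.convert.data(inp.folder = "PIRLS_2016_SPSS", out.folder = "PIRLS_2016_R")
lsa.pcts.means(data.file = "PIRLS_2016_R/ASGUSAR5.RData",
               split.vars = "IDCNTRY", bckg.avg.vars = "ASBGSSB")
```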
Becker et al. [26] introduce the R package eatATA, which makes several mixed-integer programming solvers available for automated test assembly. The general functionality and the typical workflow of eatATA are presented using a minimal example and four more complex use cases.
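A sketch of such a workflow is given below; the constraint functions follow the package vignette as I recall it, and the columns items$ID and items$IIF are assumptions.

```r
library(eatATA)

# Assemble two parallel forms of 10 items each while maximizing item
# information (function and argument names should be checked against
# the package documentation)
noOverlap  <- itemUsageConstraint(nForms = 2, operator = "<=",
                                  targetValue = 1, itemIDs = items$ID)
formLength <- itemsPerFormConstraint(nForms = 2, operator = "=",
                                     targetValue = 10, itemIDs = items$ID)
objective  <- maxObjective(nForms = 2, itemValues = items$IIF,
                           itemIDs = items$ID)
solution <- useSolver(list(noOverlap, formLength, objective), solver = "GLPK")
```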
Gary et al. [27] explain how to model norm scores with the R package cNORM. The cNORM package is designed to determine norm scores when the latent ability to be measured varies with age or other explanatory variables. Gary et al. [27] briefly introduce the statistical modeling behind the implementation and apply the proposed method to a real dataset from a reading comprehension test.
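A minimal cNORM session using the package's built-in elfe reading comprehension data might look as follows.

```r
library(cNORM)

# Model raw scores as a function of age group and inspect the fitted
# percentile curves; the elfe dataset ships with cNORM
model <- cnorm(raw = elfe$raw, group = elfe$group)
plotPercentiles(model)
```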
Andersen and Zehner [28] introduce shinyReCoR, a Shiny app that uses a cluster-based method for automatically coding open-ended text responses. The app guides users through the complete workflow, including text corpus compilation, semantic space building, preprocessing of the text data, and clustering.
Ludwig et al. [29] apply a transformer-based approach to automated essay scoring in Python and compare it with the bag-of-words approach. The authors argue that the transformer-based approach has significant advantages, whereas a bag-of-words approach suffers from ignoring word order and from reducing words to their stems. Furthermore, they demonstrate how such models can improve the accuracy of human ratings.
29] apply a transformer-based approach to automated essay scoring in the Python software and compared it with the bag of words approach. The authors argue that the transformer-based approach has significant advantages, while a bag of words approach suffers from not taking word order into account and reducing the words to their stem. Furthermore, it is demonstrated how such models could improve the accuracy of human ratings.