Bias and Linking Error in Fixed Item Parameter Calibration
Abstract
1. Introduction
2. Analytical Derivation of Bias and Linking Error
3. Simulation Study
3.1. Methods
3.2. Results
4. Discussion
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
The following abbreviations are used in this manuscript:
2PL | two-parameter logistic |
FIPC | fixed item parameter calibration |
Inf | infinite sample size |
IRF | item response function |
IRT | item response theory |
RMSE | root mean square error |
Par | Bias (N = 250) | Bias (N = 500) | Bias (N = 1000) | Bias (N = 2000) | Bias (N = Inf) | RMSE (N = 250) | RMSE (N = 500) | RMSE (N = 1000) | RMSE (N = 2000) | RMSE (N = Inf) |
---|---|---|---|---|---|---|---|---|---|---|
0 | −0.001 | −0.001 | 0.000 | 0.000 | 0.000 | 0.086 | 0.061 | 0.043 | 0.030 | 0.000 |
0.1 | 0.000 | 0.000 | 0.000 | 0.000 | −0.001 | 0.091 | 0.067 | 0.051 | 0.041 | 0.027 |
0.2 | −0.003 | −0.001 | −0.002 | −0.003 | −0.002 | 0.102 | 0.081 | 0.069 | 0.062 | 0.054 |
0.3 | −0.005 | −0.004 | −0.006 | −0.005 | −0.006 | 0.118 | 0.102 | 0.091 | 0.086 | 0.081 |
0.4 | −0.010 | −0.009 | −0.008 | −0.010 | −0.010 | 0.139 | 0.124 | 0.116 | 0.111 | 0.108 |
0.5 | −0.014 | −0.014 | −0.015 | −0.015 | −0.016 | 0.160 | 0.148 | 0.142 | 0.137 | 0.134 |
0.6 | −0.020 | −0.020 | −0.021 | −0.021 | −0.022 | 0.182 | 0.171 | 0.166 | 0.163 | 0.161 |
0 | −0.004 | −0.002 | −0.001 | 0.000 | 0.000 | 0.075 | 0.053 | 0.038 | 0.027 | 0.000 |
0.1 | −0.005 | −0.003 | −0.002 | −0.002 | −0.001 | 0.077 | 0.055 | 0.040 | 0.031 | 0.016 |
0.2 | −0.009 | −0.007 | −0.005 | −0.005 | −0.005 | 0.081 | 0.062 | 0.049 | 0.041 | 0.032 |
0.3 | −0.013 | −0.013 | −0.011 | −0.011 | −0.010 | 0.089 | 0.071 | 0.061 | 0.054 | 0.048 |
0.4 | −0.022 | −0.021 | −0.019 | −0.019 | −0.018 | 0.098 | 0.083 | 0.074 | 0.069 | 0.064 |
0.5 | −0.033 | −0.031 | −0.030 | −0.028 | −0.029 | 0.109 | 0.096 | 0.088 | 0.083 | 0.079 |
0.6 | −0.044 | −0.043 | −0.042 | −0.041 | −0.041 | 0.121 | 0.109 | 0.102 | 0.099 | 0.096 |
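
The table above reports Bias and RMSE for each condition. This extract does not state how the two summaries were computed, so the following is only a minimal sketch under the standard simulation-study definitions: bias as the mean estimation error across replications, and RMSE as the square root of the mean squared error. The function name `bias_and_rmse`, the true value, and the simulated estimates are hypothetical and are not taken from the article.

```python
import numpy as np


def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of replicated estimates of a single parameter.

    Assumed definitions (standard in simulation studies, not confirmed
    by the source): bias = mean(estimate - truth),
    RMSE = sqrt(mean((estimate - truth)^2)).
    """
    errors = np.asarray(estimates, dtype=float) - true_value
    bias = errors.mean()
    rmse = np.sqrt(np.mean(errors ** 2))
    return bias, rmse


# Hypothetical illustration: 5000 replicated estimates of a parameter
# whose true value is 0, generated with a small negative offset.
rng = np.random.default_rng(seed=1)
true_value = 0.0
estimates = true_value - 0.01 + rng.normal(scale=0.05, size=5000)

bias, rmse = bias_and_rmse(estimates, true_value)

# The squared RMSE splits into squared bias plus variance. The variance
# uses ddof=0 (NumPy's default) so that the identity holds exactly.
variance = estimates.var()
assert np.isclose(rmse ** 2, bias ** 2 + variance)

print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

Under these definitions the RMSE can never be smaller than the absolute bias, so a condition in which the two are close is dominated by systematic error rather than sampling variability.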
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Robitzsch, A. Bias and Linking Error in Fixed Item Parameter Calibration. AppliedMath 2024, 4, 1181-1191. https://doi.org/10.3390/appliedmath4030063