Rapid Guessing in Low-Stakes Assessments: Finding the Optimal Response Time Threshold with Random Search and Genetic Algorithm
Abstract
1. Introduction
2. Background
2.1. Rapid Guessing in Low-Stakes Testing
2.2. Using Response Times to Identify Rapid Guesses
2.3. Limitations of the NT and CUMP Methods
3. Methods
3.1. Data Source
3.2. Data Analysis
Setting RT Thresholds with Random Search
3.3. Setting RT Thresholds with Genetic Algorithm
3.4. Considering Other Optimization Criteria
3.5. Evaluating the RT Thresholds
4. Results
4.1. Research Question 1: Does the Proposed Data-Driven Approach Produce Viable RT Thresholds for Detecting Rapid Guessing in Low-Stakes Assessments?
4.2. Research Question 2: How Consistent Are the Responses Identified as Rapid Guessing across the Different Threshold-Setting Methods?
5. Discussion
Limitations and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AIC | Akaike’s Information Criterion |
BIC | Bayesian Information Criterion |
CUMP | Cumulative proportion method |
HQ | Hannan–Quinn criterion |
ICC | Item characteristic curve |
IRT | Item response theory |
MLN | Mixture log-normal method |
NT | Normative threshold method |
RT | Response time |
SABIC | Sample-size-adjusted BIC |
2PL | Two-parameter logistic model |
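The NT entry above, and the NT10 through NT25 rows of the model-fit table, refer to the normative threshold method. As it is usually described, each item's threshold is a fixed percentage of that item's mean RT (10% for NT10, 15% for NT15, and so on), with an upper cap. The sketch below assumes a 10-second cap and treats a response as a rapid guess when its RT is strictly below the threshold. Both choices are assumptions for illustration rather than details stated in this section, and the function names are hypothetical.

```python
from typing import List, Sequence


def normative_thresholds(
    mean_item_rt: Sequence[float],
    percent: float = 10.0,
    cap_seconds: float = 10.0,
) -> List[float]:
    """RT thresholds under a normative-threshold (NT) rule.

    Each item's threshold is `percent`% of its mean RT, but never more than
    `cap_seconds` (the 10-second default is an assumption, see text).
    """
    if not 0.0 < percent <= 100.0:
        raise ValueError("percent must be in (0, 100]")
    return [min(rt * percent / 100.0, cap_seconds) for rt in mean_item_rt]


def flag_rapid_guesses(rts: Sequence[float], thresholds: Sequence[float]) -> List[bool]:
    """Flag each response whose RT falls strictly below its item's threshold."""
    if len(rts) != len(thresholds):
        raise ValueError("one RT per item is required")
    return [rt < t for rt, t in zip(rts, thresholds)]


if __name__ == "__main__":
    # Baseline mean RTs (s) for items 1-3, taken from the item-level table.
    means = [3.602, 4.689, 4.284]
    for pct in (10, 15, 20, 25):  # NT10 through NT25
        print(f"NT{pct}:", [round(t, 3) for t in normative_thresholds(means, pct)])
```

Because the percentage is applied item by item, slower items receive larger thresholds; under NT10 the cap only matters for items whose mean RT exceeds 100 seconds.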
Item | Average Response Time (s): Baseline | Average Response Time (s): Random Search | Average Response Time (s): Genetic Algorithm | Proportion-Correct (%): Baseline | Proportion-Correct (%): Random Search | Proportion-Correct (%): Genetic Algorithm |
---|---|---|---|---|---|---|
1 | 3.602 | 3.790 | 3.790 | 61.576 | 64.155 | 64.155 |
2 | 4.689 | 5.099 | 5.099 | 50.688 | 55.859 | 55.859 |
3 | 4.284 | 4.595 | 4.593 | 53.479 | 57.955 | 57.918 |
4 | 4.814 | 5.275 | 5.280 | 48.840 | 54.305 | 54.376 |
5 | 5.557 | 6.234 | 6.234 | 45.971 | 52.483 | 52.483 |
6 | 3.420 | 3.596 | 3.596 | 55.366 | 58.165 | 58.165 |
7 | 4.623 | 5.008 | 5.008 | 51.789 | 56.733 | 56.733 |
8 | 4.833 | 5.290 | 5.292 | 49.175 | 54.581 | 54.617 |
9 | 5.180 | 5.745 | 5.747 | 48.192 | 54.197 | 54.199 |
10 | 4.904 | 5.339 | 5.343 | 47.072 | 51.885 | 51.941 |
11 | 4.685 | 5.042 | 5.048 | 50.924 | 55.363 | 55.446 |
12 | 4.677 | 5.076 | 5.084 | 50.727 | 55.678 | 55.786 |
13 | 4.246 | 4.543 | 4.545 | 51.258 | 55.372 | 55.407 |
14 | 4.969 | 5.467 | 5.463 | 46.403 | 51.845 | 51.788 |
15 | 5.717 | 6.389 | 6.397 | 46.010 | 52.291 | 52.372 |
16 | 4.509 | 4.864 | 4.864 | 48.094 | 52.443 | 52.443 |
17 | 4.436 | 4.801 | 4.801 | 52.909 | 57.930 | 57.930 |
18 | 4.127 | 4.420 | 4.420 | 46.796 | 50.574 | 50.574 |
19 | 5.109 | 5.665 | 5.665 | 47.170 | 53.101 | 53.101 |
20 | 6.286 | 7.127 | 7.151 | 45.185 | 52.078 | 52.304 |
21 | 3.643 | 3.857 | 3.858 | 59.198 | 62.461 | 62.474 |
22 | 4.585 | 4.991 | 4.993 | 50.924 | 56.090 | 56.114 |
23 | 4.109 | 4.413 | 4.413 | 54.167 | 58.763 | 58.776 |
24 | 4.862 | 5.314 | 5.314 | 49.921 | 55.314 | 55.314 |
25 | 5.844 | 6.593 | 6.593 | 45.480 | 52.259 | 52.259 |
26 | 4.402 | 4.711 | 4.712 | 55.464 | 59.864 | 59.89 |
27 | 4.974 | 5.437 | 5.439 | 49.076 | 54.377 | 54.401 |
28 | 4.235 | 4.548 | 4.549 | 54.186 | 58.785 | 58.797 |
29 | 4.632 | 5.034 | 5.039 | 50.236 | 55.226 | 55.298 |
30 | 4.884 | 5.357 | 5.359 | 46.443 | 51.695 | 51.729 |
31 | 4.466 | 4.799 | 4.799 | 49.981 | 54.245 | 54.256 |
32 | 3.997 | 4.255 | 4.255 | 55.425 | 59.531 | 59.531 |
33 | 4.518 | 4.870 | 4.870 | 49.332 | 53.724 | 53.713 |
34 | 4.839 | 5.272 | 5.280 | 47.229 | 52.115 | 52.216 |
35 | 4.839 | 5.280 | 5.280 | 47.543 | 51.118 | 51.118 |
36 | 4.856 | 5.259 | 5.258 | 45.951 | 50.269 | 50.258 |
37 | 4.090 | 4.373 | 4.374 | 54.481 | 58.779 | 58.791 |
38 | 4.442 | 4.784 | 4.787 | 59.355 | 64.627 | 64.682 |
39 | 5.180 | 5.707 | 5.691 | 47.956 | 53.528 | 53.341 |
40 | 5.327 | 5.869 | 5.876 | 46.914 | 52.473 | 52.554 |
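The item-level table above contrasts baseline statistics with those obtained under the random-search and genetic-algorithm thresholds. Assuming those columns are computed after excluding the responses each method flags as rapid guesses (consistent with the higher average RTs and proportion-correct values under both methods, but not stated in this section), summaries of this kind could be produced roughly as follows. The function name `item_summaries` and the strict "below threshold" rule are illustrative assumptions.

```python
import numpy as np


def item_summaries(rt, correct, thresholds):
    """Per-item mean RT (s) and proportion-correct (%) after excluding rapid guesses.

    rt, correct : arrays of shape (n_examinees, n_items); `correct` is scored 0/1.
    thresholds  : array of shape (n_items,); a response is treated as a rapid
                  guess when its RT is strictly below its item's threshold.
    Items whose responses are all flagged come back as NaN.
    """
    rt = np.asarray(rt, dtype=float)
    correct = np.asarray(correct, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    if rt.ndim != 2 or rt.shape != correct.shape or rt.shape[1] != thresholds.shape[0]:
        raise ValueError("shape mismatch between rt, correct, and thresholds")
    flagged = rt < thresholds  # broadcasts the threshold row across examinees
    # Replace flagged responses with NaN so nanmean ignores them.
    mean_rt = np.nanmean(np.where(flagged, np.nan, rt), axis=0)
    pct_correct = 100.0 * np.nanmean(np.where(flagged, np.nan, correct), axis=0)
    return mean_rt, pct_correct
```

Passing thresholds of zero leaves every response in place, which yields the baseline columns through the same code path.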
Item | Average Response Time (s): Random Search | Average Response Time (s): Genetic Algorithm | Proportion-Correct: Random Search | Proportion-Correct: Genetic Algorithm |
---|---|---|---|---|
1 | 0.224 | 0.419 | 4.098 | 17.958 |
2 | 0.660 | 0.667 | 0 | 0 |
3 | 0.554 | 0.566 | 0 | 0 |
4 | 0.697 | 0.703 | 0 | 0 |
5 | 0.834 | 0.833 | 0.469 | 0.470 |
6 | 0.405 | 0.405 | 7.473 | 7.473 |
7 | 0.628 | 0.628 | 0.447 | 0.447 |
8 | 0.668 | 0.680 | 0 | 0 |
9 | 0.727 | 0.727 | 0.874 | 0.874 |
10 | 0.659 | 0.663 | 0 | 0 |
11 | 0.595 | 0.595 | 0 | 0 |
12 | 0.628 | 0.632 | 0.436 | 0.433 |
13 | 0.550 | 0.551 | 0 | 0 |
14 | 0.711 | 0.714 | 0 | 0 |
15 | 0.823 | 0.825 | 0.161 | 0.160 |
16 | 0.573 | 0.576 | 0 | 0 |
17 | 0.591 | 0.591 | 0 | 0 |
18 | 0.529 | 0.529 | 0.262 | 0.262 |
19 | 0.735 | 0.735 | 0.523 | 0.523 |
20 | 0.898 | 0.904 | 0.853 | 0.846 |
21 | 0.413 | 0.413 | 10.095 | 10.159 |
22 | 0.628 | 0.619 | 0.634 | 0.644 |
23 | 0.526 | 0.525 | 0 | 0 |
24 | 0.677 | 0.675 | 0 | 0 |
25 | 0.832 | 0.839 | 0.152 | 0.151 |
26 | 0.517 | 0.519 | 0 | 0 |
27 | 0.692 | 0.691 | 0 | 0 |
28 | 0.549 | 0.550 | 0 | 0 |
29 | 0.658 | 0.663 | 0.853 | 0.846 |
30 | 0.695 | 0.706 | 0 | 0 |
31 | 0.573 | 0.571 | 0 | 0 |
32 | 0.508 | 0.508 | 0 | 0 |
33 | 0.564 | 0.565 | 0 | 0 |
34 | 0.653 | 0.663 | 0 | 0 |
35 | 0.645 | 0.642 | 13.497 | 13.552 |
36 | 0.583 | 0.588 | 0.230 | 0.228 |
37 | 0.506 | 0.506 | 0 | 0 |
38 | 0.586 | 0.586 | 0 | 0 |
39 | 0.736 | 0.734 | 0.935 | 0.936 |
40 | 0.755 | 0.761 | 0 | 0 |
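Both data-driven methods compared in these tables treat the vector of per-item RT thresholds as the quantity to optimize. A minimal random search over that vector can be sketched as follows. The objective is a hypothetical placeholder: this section does not state the criterion the authors optimized, so `objective` stands for any function that scores a candidate threshold vector (lower is better), and the per-item bounds are likewise assumptions supplied by the caller.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search_thresholds(
    objective: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_iter: int = 1000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over per-item RT thresholds by plain random search.

    Every candidate draws each item's threshold independently and uniformly
    from [lower[i], upper[i]]; the lowest-scoring candidate seen is returned.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    if n_iter < 1:
        raise ValueError("n_iter must be positive")
    rng = random.Random(seed)  # fixed seed makes the search reproducible
    best_x: List[float] = []
    best_f = float("inf")
    for _ in range(n_iter):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


if __name__ == "__main__":
    # Hypothetical toy objective: squared distance to made-up "ideal" thresholds.
    target = [0.5, 0.6]

    def toy_objective(x: List[float]) -> float:
        return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

    x_best, f_best = random_search_thresholds(toy_objective, [0.0, 0.0], [1.0, 1.0])
    print([round(v, 3) for v in x_best], round(f_best, 5))
```

Random search keeps the best of `n_iter` independent draws and never uses earlier results to guide later ones. A genetic algorithm instead evolves a population of candidate vectors through selection, crossover, and mutation, which tends to concentrate sampling in promising regions of the threshold space.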
Condition | AIC | BIC | SABIC | HQ | Log-likelihood | Reliability |
---|---|---|---|---|---|---|
Baseline | 95,735.81 | 96,258.58 | 96,004.37 | 95,918.87 | −47,787.91 | 0.859 |
Random Deletion | 72,970.21 | 73,492.98 | 73,238.77 | 73,153.27 | −36,405.10 | 0.849 |
NT10 | 94,429.20 | 94,951.97 | 94,697.75 | 94,612.26 | −47,134.60 | 0.861 |
NT15 | 93,727.81 | 94,250.55 | 93,996.33 | 93,910.86 | −46,783.90 | 0.862 |
NT20 | 93,031.28 | 93,553.99 | 93,299.77 | 93,214.32 | −46,435.64 | 0.863 |
NT25 | 92,271.10 | 92,793.77 | 92,539.56 | 92,454.14 | −46,055.55 | 0.864 |
Random Search | 91,603.80 | 92,126.50 | 91,872.28 | 91,786.85 | −45,721.90 | 0.864 |
Genetic Algorithm | 91,318.34 | 91,840.99 | 91,586.78 | 91,501.37 | −45,579.17 | 0.864 |
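The fit indices in the table above follow the standard log-likelihood-based definitions, where lower values indicate better fit. With 40 items calibrated under the 2PL model (k = 80 item parameters, assuming no other free parameters), every row satisfies AIC = -2·logLik + 160 up to rounding; for the baseline model, -2 × (-47,787.91) + 160 = 95,735.82, against 95,735.81 reported. The sketch below computes all four criteria. The sample size in the example is a placeholder, because it is not reported in this section.

```python
import math
from typing import Dict


def information_criteria(loglik: float, n_params: int, n_obs: int) -> Dict[str, float]:
    """AIC, BIC, sample-size-adjusted BIC (SABIC), and Hannan-Quinn (HQ)."""
    if n_obs <= 1:
        raise ValueError("n_obs must exceed 1 so that log(log(n_obs)) is defined")
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
        "HQ": deviance + 2 * n_params * math.log(math.log(n_obs)),
    }


if __name__ == "__main__":
    n_items = 40
    k = 2 * n_items  # 2PL: one discrimination and one difficulty parameter per item
    # Baseline log-likelihood from the table; n_obs=5000 is a placeholder.
    ic = information_criteria(-47_787.91, k, n_obs=5000)
    print(round(ic["AIC"], 2))  # 95735.82 (table: 95,735.81, from a rounded log-likelihood)
```

Only AIC is independent of the sample size; BIC, SABIC, and HQ shift with `n_obs`, so their absolute values in the example do not reproduce the table.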
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bulut, O.; Gorgun, G.; Wongvorachan, T.; Tan, B. Rapid Guessing in Low-Stakes Assessments: Finding the Optimal Response Time Threshold with Random Search and Genetic Algorithm. Algorithms 2023, 16, 89. https://doi.org/10.3390/a16020089
Bulut O, Gorgun G, Wongvorachan T, Tan B. Rapid Guessing in Low-Stakes Assessments: Finding the Optimal Response Time Threshold with Random Search and Genetic Algorithm. Algorithms. 2023; 16(2):89. https://doi.org/10.3390/a16020089
Chicago/Turabian Style: Bulut, Okan, Guher Gorgun, Tarid Wongvorachan, and Bin Tan. 2023. "Rapid Guessing in Low-Stakes Assessments: Finding the Optimal Response Time Threshold with Random Search and Genetic Algorithm" Algorithms 16, no. 2: 89. https://doi.org/10.3390/a16020089
APA Style: Bulut, O., Gorgun, G., Wongvorachan, T., & Tan, B. (2023). Rapid Guessing in Low-Stakes Assessments: Finding the Optimal Response Time Threshold with Random Search and Genetic Algorithm. Algorithms, 16(2), 89. https://doi.org/10.3390/a16020089