The feasibility of computerized adaptive testing of the national benchmark test: A simulation study
Musa Adekunle Ayanwale 1*, Mdutshekelwa Ndlovu 1
1 Faculty of Education, University of Johannesburg, South Africa
* Corresponding Author

Abstract

The COVID-19 pandemic has had a significant impact on high-stakes testing, including the National Benchmark Tests (NBTs) in South Africa. The current linear testing format has been criticized for its limitations, prompting a shift towards computerized adaptive testing (CAT), which yields more precise assessments in less testing time. Evaluating a CAT program requires simulation studies. To assess the feasibility of implementing CAT for the NBTs, the simulation tool SimulCAT was used. The simulation generated 10,000 examinees from a normal ability distribution with a mean of 0 and a standard deviation of 1 and drew on a pool of 500 test items, with specific settings defined for the item selection algorithm, CAT administration rules, item exposure control, and the termination criterion. The termination criterion required a standard error below 0.35 to ensure accurate ability estimation. The findings showed that fixed-length tests provided higher measurement precision without systematic error, as indicated by conditional measurement statistics such as conditional bias (CBIAS), conditional mean absolute error (CMAE), and conditional root mean square error (CRMSE). However, fixed-length tests exhibited a higher item exposure rate, which could be mitigated by item selection methods that rely less heavily on the discrimination (a) parameters, whereas variable-length tests showed greater item redundancy. Based on these results, CAT is recommended as an alternative approach for administering the NBTs because it measures individual ability accurately while reducing testing time. For high-stakes assessments such as the NBTs, fixed-length tests are preferred, as they offer superior testing precision while minimizing item exposure rates.
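To make the simulation procedure described above concrete, the following is a minimal sketch in Python of a post-hoc CAT run of this kind, assuming a 3PL item pool, maximum-information item selection, EAP scoring, and the stated N(0, 1) examinee distribution and SE < 0.35 stopping rule. The study itself used SimulCAT, not this code; the function names (e.g., simulate_cat) and the item-parameter distributions are illustrative assumptions, not the authors' settings.

```python
# Illustrative post-hoc CAT simulation (3PL, maximum-information selection, EAP scoring).
# A sketch of the procedure described in the abstract, not the SimulCAT implementation.
import numpy as np

rng = np.random.default_rng(2023)

# Item pool: 500 items with assumed 3PL parameter distributions.
N_ITEMS = 500
a = rng.lognormal(mean=0.0, sigma=0.3, size=N_ITEMS)   # discrimination
b = rng.normal(0.0, 1.0, size=N_ITEMS)                  # difficulty
c = rng.uniform(0.10, 0.25, size=N_ITEMS)               # pseudo-guessing

# Examinees: abilities drawn from N(0, 1), as in the study (10,000 in the original).
theta_true = rng.normal(0.0, 1.0, size=10_000)

QUAD = np.linspace(-4, 4, 81)            # quadrature nodes for EAP estimation
PRIOR = np.exp(-0.5 * QUAD ** 2)
PRIOR /= PRIOR.sum()

def p_correct(theta, i):
    """3PL probability of a correct response to item i at ability theta."""
    return c[i] + (1 - c[i]) / (1 + np.exp(-a[i] * (theta - b[i])))

def item_information(theta, i):
    """Fisher information of item i at ability theta (3PL form)."""
    p = p_correct(theta, i)
    return a[i] ** 2 * ((1 - p) / p) * ((p - c[i]) / (1 - c[i])) ** 2

def eap_estimate(items, responses):
    """EAP ability estimate and posterior SD given the responses so far."""
    like = np.ones_like(QUAD)
    for i, u in zip(items, responses):
        p = p_correct(QUAD, i)
        like *= p if u else (1 - p)
    post = like * PRIOR
    post /= post.sum()
    est = np.sum(QUAD * post)
    se = np.sqrt(np.sum((QUAD - est) ** 2 * post))
    return est, se

def simulate_cat(theta, se_target=0.35, max_items=30):
    """Administer one adaptive test: maximum-information selection, SE-based stop."""
    administered, responses = [], []
    est, se = 0.0, np.inf
    while se > se_target and len(administered) < max_items:
        info = np.array([item_information(est, i) for i in range(N_ITEMS)])
        info[administered] = -np.inf                     # do not reuse items
        nxt = int(np.argmax(info))
        administered.append(nxt)
        responses.append(rng.random() < p_correct(theta, nxt))
        est, se = eap_estimate(administered, responses)
    return est, len(administered)

# Subsample of examinees for speed; overall (not conditional) recovery statistics.
sub = theta_true[:500]
estimates = np.array([simulate_cat(t)[0] for t in sub])
err = estimates - sub
print("BIAS=%.3f  MAE=%.3f  RMSE=%.3f"
      % (err.mean(), np.abs(err).mean(), np.sqrt((err ** 2).mean())))
```

The SE-based stopping rule above corresponds to the variable-length condition; a fixed-length condition would simply stop after a preset number of items regardless of the standard error. The study's CBIAS, CMAE, and CRMSE are the same error statistics computed conditionally on ability levels rather than pooled across all examinees as in this sketch.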


License

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.