Abstract
Computerized adaptive testing (CAT) is a popular mode of delivering educational assessments that progressively tailors the test to each individual examinee. Despite its great potential and benefits, CAT has seen limited application in small-scale assessments. One main reason is that CAT is built on item response theory models, which require large sample sizes to calibrate item parameters. This requirement is difficult to meet given typically small classroom sizes. Furthermore, because of practical concerns such as the fatigue effects of answering too many items, complex missing data inevitably appear, which makes accurate item parameter estimation more difficult. The overall goal of this dissertation is to develop item bank calibration strategies that use small pretesting sample sizes to address these challenges.
Specifically, this dissertation comprises three interdependent studies that proposed, refined, and evaluated item bank calibration strategies based on incomplete calibration designs and matrix completion algorithms. Incomplete calibration designs purposely administer only a subset of items to each examinee to avoid fatigue effects. Matrix completion is a family of recently developed algorithms for recovering a matrix with missing values, and it has shown potential even when the percentage of missing entries is high.
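To make the idea of an incomplete calibration design concrete, the following is a minimal illustrative sketch, not the dissertation's actual designs or code. It assumes a two-parameter logistic (2PL) response model, a hypothetical set of overlapping test forms built from contiguous item blocks, and arbitrary sample and bank sizes; the function name `simulate_incomplete_design` and all parameter values are assumptions made for illustration. The result is a binary response matrix whose unadministered entries are missing by design, which is the kind of input a matrix completion algorithm would then try to recover.

```python
import numpy as np


def simulate_incomplete_design(n_examinees=30, n_items=20, n_forms=4,
                               items_per_form=10, seed=0):
    """Simulate binary responses under an assumed 2PL model, then mask
    the items each examinee was not administered (hypothetical design)."""
    rng = np.random.default_rng(seed)

    # Assumed generating values: abilities, discriminations, difficulties.
    theta = rng.normal(0.0, 1.0, n_examinees)
    a = rng.uniform(0.8, 2.0, n_items)
    b = rng.normal(0.0, 1.0, n_items)

    # 2PL probability of a correct response for every examinee-item pair.
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    full = (rng.random((n_examinees, n_items)) < p).astype(float)

    # Overlapping forms: each form is a contiguous block of items that
    # wraps around the bank, so neighbouring forms share anchor items.
    starts = np.arange(n_forms) * (n_items // n_forms)
    forms = [(s + np.arange(items_per_form)) % n_items for s in starts]

    # Assign examinees to forms in rotation; unadministered items are NaN.
    observed = np.full((n_examinees, n_items), np.nan)
    for i in range(n_examinees):
        cols = forms[i % n_forms]
        observed[i, cols] = full[i, cols]
    return observed, full


if __name__ == "__main__":
    observed, _ = simulate_incomplete_design()
    print("Proportion of missing entries:", np.isnan(observed).mean())
```

With these assumed defaults, each examinee answers half of the item bank, so half of the response matrix is missing by design rather than at random.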
In Study 1, 1-bit matrix completion and structured matrix completion algorithms were adapted to address the complex missing data arising from four types of incomplete calibration designs, which differ in their missing mechanisms and in how test forms are assembled. The algorithms were also adapted to the binary responses typical of educational assessments. In a simulation study, a few adaptations yielded satisfactory item difficulty parameter estimates but overestimated the item discrimination parameters. In Study 2, new item bank calibration strategies were developed based on the findings of Study 1 to refine the adaptations and mitigate the overestimation of item discrimination parameters. In Study 3, the proposed strategies were evaluated in a CAT simulation and applied to an empirical data set. The ability estimates from the CAT whose item bank was calibrated with the proposed strategies were close to those obtained with the true item parameters. In addition, the results of the empirical data application were comparable to those of the preceding simulation studies.