Abstract
This study applies item response theory (IRT) methods to tests combining multiple-choice (MC) and constructed-response (CR) item types. Issues discussed include the following: 1) selecting the best-fitting model from the three most widely used combinations of item response models; 2) estimating ability and item parameters; and 3) the potential loss of information from simultaneous versus separate calibration runs. Empirical results are presented from a mathematics achievement test that includes both item types. Both the two-parameter logistic (2PL) and three-parameter logistic (3PL) models fit the MC items better than the one-parameter logistic (1PL) model. Both the graded response (GR) and generalized partial credit (GPC) models fit the CR items better than the partial credit (PC) model. The 2PL&GR and 3PL&GPC model combinations provided better fit than the 1PL&PC combination. Item and ability parameter estimates from separate and simultaneous calibration runs were highly consistent across models. Calibrating the MC and CR items together rather than separately caused no loss of information. Use of the CR items in the test increased reliability. Simultaneous calibration of the MC and CR items provided consistent estimates and an implicitly weighted ability measure.
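For reference, the models compared can be sketched in their standard parameterizations; the notation below (theta, a_i, b_i, c_i, b_{iv}) is conventional IRT notation and is not taken from the paper itself. For a dichotomous MC item i, the 3PL model is

P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp[-a_i(\theta - b_i)]}

where setting c_i = 0 yields the 2PL, and additionally constraining a_i to be equal across items yields the 1PL. For a polytomous CR item i scored k = 0, 1, \dots, m_i, the GPC model is

P_{ik}(\theta) = \frac{\exp \sum_{v=0}^{k} a_i(\theta - b_{iv})}{\sum_{c=0}^{m_i} \exp \sum_{v=0}^{c} a_i(\theta - b_{iv})}

where constraining a_i to be equal across items yields the PC model. Because the 1PL and PC models are nested within these forms, the fit comparisons reported above amount to testing whether item-specific discrimination (and, for MC items, guessing) parameters are warranted.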