Abstract
In item response theory (IRT), the origin and unit of the ability scale are arbitrary. This arbitrariness is referred to as scale indeterminacy or the identification problem. Standard IRT models may not fit the data when unexplained heterogeneity is present. In such cases, a mixture IRT (MixIRT) model, which accounts for this heterogeneity by fitting an IRT model within latent classes in the data, may be useful. The purpose of this study was to explore the effect of three kinds of constraints for identifying the metric in the MixIRT model: (1) equating, in which an anchor item is used to link the metrics between latent classes; (2) person centering, in which the mean of the ability parameters is set to zero after each calibration; and (3) item centering, in which the mean of the item difficulty parameters is set to zero. Results of a simulation study are presented, followed by an illustrative example using real data from the TIMSS 2011 8th-grade science test. In the simulation study, the impact of the three identification methods on the classification of latent class membership and on item and ability parameter estimates was examined for three dichotomous MixIRT models. Results based on the analysis of the empirical data indicated that the number of latent classes detected differed depending on the particular combination of MixIRT model and constraint. The mean ability, the proportion of group memberships, and the item parameters also differed across the three constraints. There was no effect of identification constraint on the MixRM and Mix2PLM. Only the item anchoring constraint worked well with the Mix3PLM, although recovery for this model was relatively poor compared to the MixRM and Mix2PLM. When the types of constraint were compared, person centering produced the worst recovery results. Test length and sample size did not appear to affect the recovery of item parameters.
The longer test length improved group membership identification. The percentage of correct model selections using the AIC was lower for the larger sample size. Recovery of group membership, item difficulty, and item discrimination decreased as the number of simulated latent classes increased. Recovery of the lower asymptote, however, was slightly better for the larger sample size and for more latent classes.
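The three identification constraints described above can each be viewed as a rigid shift of a class's metric. The following is a minimal sketch, not the authors' implementation: it assumes a Rasch-type parameterization in which subtracting a constant from both the ability parameters and the item difficulties leaves the model's predicted probabilities unchanged, so each constraint amounts to choosing a different shift constant per latent class.

```python
import numpy as np

def person_center(theta, b):
    """Person centering: shift so the mean ability is zero.

    theta : ability estimates for one latent class
    b     : item difficulty estimates for the same class
    """
    shift = theta.mean()
    return theta - shift, b - shift

def item_center(theta, b):
    """Item centering: shift so the mean item difficulty is zero."""
    shift = b.mean()
    return theta - shift, b - shift

def anchor_item(theta, b, anchor_idx=0, anchor_value=0.0):
    """Equating via an anchor item: shift so the anchor item's
    difficulty takes the same fixed value in every latent class.
    (anchor_idx and anchor_value are illustrative choices.)
    """
    shift = b[anchor_idx] - anchor_value
    return theta - shift, b - shift

# Illustrative use for one latent class
theta = np.array([-1.2, 0.4, 0.8, 1.6])
b = np.array([-0.5, 0.0, 0.9])

t_pc, b_pc = person_center(theta, b)   # mean(t_pc) == 0
t_ic, b_ic = item_center(theta, b)     # mean(b_ic) == 0
t_an, b_an = anchor_item(theta, b)     # b_an[0] == 0
```

Because the same constant is subtracted from abilities and difficulties, all three constraints fit the data equally well; they differ only in where the origin of each class's scale is placed, which is why class-specific means and parameter estimates can differ across constraints even when fit does not.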