Sample Size and Test Length for Item Parameter Estimate and Exam Parameter Estimate

Riswan Riswan


Item Response Theory (IRT) models contain one or more parameters whose values are unknown and must therefore be estimated. This paper examines (1) the effect of sample size (N) on the stability of item parameter estimates; (2) the effect of test length (n) on the stability of examinee parameter estimates; (3) the effect of the model on the stability of item and examinee parameter estimates; (4) the effect of sample size and test length on the stability of item and examinee parameter estimates; and (5) the effect of sample size, test length, and model on the stability of item and examinee parameter estimates. This is a simulation study in which the latent trait (θ) is drawn from a standard normal population, θ ~ N(0, 1), for specified sample sizes (N) and test lengths (n) under the 1PL, 2PL, and 3PL models, with data generated in WinGen. Item analysis was carried out using both classical test theory and modern test theory (Item Response Theory), and the data were analyzed in R with the ltm package. The results showed that the larger the sample size (N), the more stable the parameter estimates, and the greater the test length (n), the more stable the estimates of the examinee parameter (θ).
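
As a rough illustration of the simulation design described above, the following Python sketch draws abilities from N(0, 1), generates dichotomous responses under the 3PL model (with the 2PL and 1PL models as special cases), and compares how much a classical item difficulty (proportion correct) varies across replications for a small and a large sample. It is not the study's procedure, which used WinGen and the R package ltm; the item parameters, sample sizes, replication count, and function names are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL response probability; c=0 gives the 2PL, and a=1, c=0 the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(n_examinees, items, rng):
    """Return a 0/1 response matrix for abilities drawn from N(0, 1)."""
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    return [
        [1 if rng.random() < p_correct(t, *item) else 0 for item in items]
        for t in thetas
    ]


def p_value_spread(n_examinees, items, replications, rng):
    """SD across replications of the proportion correct on the first item."""
    p_values = []
    for _ in range(replications):
        data = simulate_responses(n_examinees, items, rng)
        p_values.append(sum(row[0] for row in data) / n_examinees)
    return statistics.stdev(p_values)


rng = random.Random(42)
# Hypothetical (a, b, c) item parameters, not taken from the paper.
items = [(1.2, -0.5, 0.2), (0.8, 0.0, 0.2), (1.5, 0.7, 0.2)]
spread_small = p_value_spread(50, items, 100, rng)    # N = 50
spread_large = p_value_spread(1000, items, 100, rng)  # N = 1000
print(spread_small, spread_large)
```

Under these assumptions the spread for N = 1000 should be noticeably smaller than for N = 50, which mirrors the reported pattern that larger samples give more stable estimates. A fuller replication would fit the IRT models themselves rather than rely on proportion correct.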


Item Response Theory; Item Stability; Sample Size; Test Length; WinGen.





