Determination of differential item functioning (DIF) according to SIBTEST, Lord's χ², Raju's area measurement and Breslow-Day methods
Fatma Gökçen Ayva-Yörü 1 * , Hakan Yavuz Atar 1
1 Gazi University, Faculty of Education, Ankara, Turkey
* Corresponding Author


The aim of this study is to examine whether the items in the mathematics subtest of the Centralized High School Entrance Placement Test [HSEPT], administered in 2012 by the Ministry of National Education in Turkey, show DIF according to gender and type of school. For this purpose, the SIBTEST, Breslow-Day, Lord's χ² and Raju's area measurement methods were applied to the 20 items of the 2012 HSEPT mathematics subtest, and it was determined whether each item showed DIF under each method. The research was conducted on the data obtained from the 2012 administration of HSEPT to eighth-grade students. After cases with missing data were removed from the data set, DIF analyses of the mathematics subtest were performed on a total of 1,063,570 students (n female = 523,939 and n male = 539,631; n state school = 1,025,979 and n private school = 37,591). Since the aim is to examine the current situation, the study follows a descriptive research design. The number of items flagged with DIF, and their DIF levels, differed across methods for both the gender and the school-type variables. In line with these findings, it is recommended that researchers use at least two methods when determining DIF.
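To illustrate one of the four methods named above, the sketch below implements the Breslow-Day test of odds-ratio homogeneity in pure Python. It is a minimal illustration, not the authors' exact analysis procedure: examinees are assumed to be stratified by total score, and each stratum contributes a 2×2 table of (focal correct, focal incorrect, reference correct, reference incorrect). A large chi-square statistic (df = number of strata − 1) signals that the focal/reference odds ratio varies across strata, i.e., potential non-uniform DIF. The function name and table layout are this sketch's own conventions.

```python
import math

def breslow_day(tables):
    """Breslow-Day test of odds-ratio homogeneity across score strata.

    Each table is (a, b, c, d):
        a = focal correct,     b = focal incorrect,
        c = reference correct, d = reference incorrect.
    Returns (chi_square_statistic, degrees_of_freedom).
    """
    # Mantel-Haenszel estimate of the common odds ratio.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    psi = num / den

    stat = 0.0
    for a, b, c, d in tables:
        n1, n2, m1, n = a + b, c + d, a + c, a + b + c + d
        if abs(psi - 1.0) < 1e-12:
            exp_a = n1 * m1 / n  # independence case
        else:
            # Expected value of a under the common odds ratio psi:
            # root of (psi-1)A^2 - [psi(n1+m1) + (n2-m1)]A + psi*n1*m1 = 0
            qa = psi - 1.0
            qb = -(psi * (n1 + m1) + (n2 - m1))
            qc = psi * n1 * m1
            disc = math.sqrt(qb * qb - 4 * qa * qc)
            roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
            lo, hi = max(0, m1 - n2), min(n1, m1)
            exp_a = next(r for r in roots if lo <= r <= hi)
        # Asymptotic variance of a under the common odds ratio.
        var = 1.0 / (1 / exp_a + 1 / (n1 - exp_a)
                     + 1 / (m1 - exp_a) + 1 / (n2 - m1 + exp_a))
        stat += (a - exp_a) ** 2 / var
    return stat, len(tables) - 1
```

For example, two strata with identical odds ratios yield a statistic near zero, while two strata with reversed odds ratios yield a large one; in practice the statistic would be compared against a chi-square critical value with the returned degrees of freedom.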



  • Aguerri, M. E., Galibert, M. S., Attorresi, H. F., & Marañón, P. P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality & Quantity, 43(1), 35-44.
  • Arıkan, Ç. A., Uğurlu, S., & Atar, B. (2016). MIMIC, SIBTEST, lojistik regresyon ve Mantel-Haenszel yöntemleriyle gerçekleştirilen DMF ve yanlılık çalışması [A DIF and bias study using MIMIC, SIBTEST, logistic regression and Mantel-Haenszel methods]. Hacettepe University Journal of Education, 31(1), 34-52.
  • Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
  • Breslow, N. E., & Day, N. E. (1980). Statistical methods in cancer research. Volume I - The analysis of case-control studies. IARC Sci Publ, (32), 5–338.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Newbury Park, CA: Sage.
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31–44.
  • Crocker, L., & Algina J. (1986). Introduction to classical and modern test theory. Orlando: Harcourt Brace Jovanovich Inc.
  • Çepni, Z. (2011). Değişen madde fonksiyonlarının SIBTEST, Mantel-Haenszel, lojistik regresyon ve madde tepki kuramı yöntemleriyle incelenmesi [Examination of differential item functioning using SIBTEST, Mantel-Haenszel, logistic regression and item response theory methods] (Doctoral dissertation, Hacettepe University).
  • Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.
  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance of female candidates on the scholastic aptitude test. Journal of Educational Measurement, 23(4), 355–368.
  • French, B. F., & Finch, W. H. (2013). Extensions of Mantel-Haenszel for multilevel DIF detection. Educational and Psychological Measurement, 73(4), 648–671.
  • French, B. F., & Finch, W. H. (2015). Transforming SIBTEST to account for multilevel data structures. Journal of Educational Measurement, 52(2), 159-180.
  • Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4(2), 190-222.
  • Gierl, M. J. (2005). Using dimensionality-based DIF analyses to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24(1), 3-14.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J., (1991). Fundamentals of item response theory. London: Sage.
  • Hidalgo, M. D., & Gomez-Benito, J. (2010). Education measurement: Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education, (Vol. 4, pp. 36-44). Oxford: Elsevier.
  • Holland, P. W., & Wainer, H. (1993). Differential item functioning. London: Lawrence Erlbaum.
  • Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
  • Jöreskog, K., & Sörbom, D. (1989). LISREL 7 User’s Reference Guide. Chicago: Scientific Software International.
  • Kan, A., Sünbül, Ö., & Ömür, S. (2013). 6.-8. sınıf seviye belirleme sınavları alt testlerinin çeşitli yöntemlere göre değişen madde fonksiyonlarının incelenmesi [Examination of differential item functioning in the subtests of the 6th-8th grade level determination exams according to various methods]. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 9(2), 207-222.
  • Karakaya, İ. (2012). An investigation of item bias in science and technology subtests and mathematics subtests in level determination exam. Theory and Practice in Educational Sciences, 12(1), 215–229.
  • Karakaya, İ., & Kutlu, Ö. (2012). An investigation of item bias in Turkish subtests in level determination exam. Journal of Education and Science, 37(165), 348–362.
  • Karami, H. (2012). An introduction to differential item functioning. The International Journal of Educational and Psychological Assessment, 11(2), 59-76.
  • Kelecioğlu, H., Karabay, B., & Karabay, E. (2014). Investigation of placement test in terms of item bias. Elementary Education Online, 13(3), 934–953.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370.
  • Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847-862.
  • Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297–334.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning. California: Sage.
  • Penfield, R. D. (2003). Applying the Breslow-Day test of trend in odds ratio heterogeneity to the analysis of nonuniform DIF. The Alberta Journal of Educational Research, 49(3), 231-243.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502.
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207.
  • Raju, N. S., & Arenson, E. (2002). Developing a common metric in item response theory: An area-minimization approach. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
  • Raju, N. S., Fortmann-Johnson, K. A., Kim, W., Morris, S. B., Nering, M. L., & Oshima, T. C. (2009). The item parameter replication method for detecting differential functioning in the polytomous DFIT framework. Applied Psychological Measurement, 33(2), 133–147.
  • Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33(2), 215-230.
  • Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159–194.
  • Steinberg, L., & Thissen, D. (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11(4), 402–415.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
  • Terzi, R., & Yakar, L. (2018). Differential item and differential distractor functioning analyses on Turkish high school entrance exam. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 136-149.
  • Toprak, E., & Yakar, L. (2017). Analysis of SBS 2011 Turkish subtest items in terms of differential item functioning by different methods. International Journal of Eurasia Social Sciences, 8(26), 220-231.
  • Wiberg, M. (2007). Measuring and detecting differential item functioning in criterion-referenced licensing test: A theoretic comparison of methods (EM No. 60). Umeå University, Department of Educational Measurement, Umeå.
  • Wright, K. D., & Oshima, T. C. (2015). An effect size measure for Raju’s differential functioning for items and tests. Educational and Psychological Measurement, 75(2), 338-358.
  • Yıldırım, H., & Büyüköztürk, Ş. (2018). Using the delphi technique and focus-group interviews to determine item bias on the mathematics section of the Level Determination Exam for 2012. Educational Sciences: Theory & Practice, 18(2), 447-470.
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Ottawa: National Defense Headquarters.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.