Background Accurate diagnosis of cognitive impairment requires measures of cognitive function that are free from bias. Assessing bias at the item level involves statistical techniques to detect differential item functioning (DIF). DIF occurs when subjects from different demographic groups have different probabilities of answering an item correctly, after controlling for overall ability. We developed a new technique to adjust scores for DIF.
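The idea of testing for DIF at a fixed ability level can be illustrated with a small simulation. This sketch is not from the study: it uses simulated binary responses and a plain (not ordinal) logistic regression, comparing a model with ability alone against a model with ability plus group membership via a likelihood-ratio test; a large statistic flags uniform DIF. All names and the fitting routine are illustrative.

```python
import numpy as np

def fit_logistic(X, y, iters=2000, lr=0.5):
    """Fit logistic regression by gradient ascent; return coefficients and log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w += lr * X.T @ (y - p) / len(y)
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return w, ll

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)
group = rng.integers(0, 2, size=n)          # 0 = reference group, 1 = focal group
# Simulated item WITH uniform DIF: one logit harder for the focal group
# at every ability level.
logit = 0.5 + 1.2 * ability - 1.0 * group
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

ones = np.ones(n)
X0 = np.column_stack([ones, ability])            # ability only
X1 = np.column_stack([ones, ability, group])     # ability + group
_, ll0 = fit_logistic(X0, y)
_, ll1 = fit_logistic(X1, y)
lr_stat = 2 * (ll1 - ll0)   # ~ chi-square(1) under the no-DIF null
print(f"LR statistic = {lr_stat:.1f}")           # a large value flags DIF
```

Because the group term is tested after conditioning on ability, the statistic reflects a difference in item difficulty between groups at equal ability, which is the defining feature of DIF rather than a simple difference in group means.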
Study Design and Methods
Data Source. We analyzed baseline data from the Canadian Study of Health and Aging (CSHA), a prospective cohort study of elderly Canadians (n = 8,121). Participants completed the Modified Mini-Mental State Examination (3MS) in either English (n = 6,579) or French (n = 1,542). CSHA investigators reached consensus diagnoses of cognitive impairment and dementia status for subjects with 3MS scores ≤78 (n = 1,209) and a 10% random sample of those with scores ≥77 (n = 691).
Scoring Techniques. Standard 3MS scoring assigns pre-specified weights to each of the test's 46 items. IRT scoring empirically estimates weights for each item. We used an ordinal logistic regression approach to detect items with DIF. We then estimated item parameters separately in each of four education groups for items found to have education DIF, constraining parameters to be equal across groups for items without education DIF. We used these revised item parameters to determine cognitive ability scores and again looked for education DIF using the updated scores. We continued this process until the same set of items displayed education DIF in successive cycles. We then used education-DIF-adjusted IRT scores to look for DIF due to language of test administration and repeated the iterative procedure to adjust 3MS IRT scores for language DIF.
Verification of Scoring. We compared DIF-adjusted IRT scores with standard 3MS scores to determine the impact of adjusting for DIF on individual scores. We also compared Receiver Operating Characteristic (ROC) curves for standard 3MS scores and DIF-adjusted IRT scores.
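The iterative detect-and-rescore procedure can be sketched in simplified form. This is not the study's implementation: the study used ordinal logistic regression on polytomous 3MS items with IRT ability scores, whereas this sketch uses simulated binary items, a crude anchor-item mean score as the ability proxy, and a per-item likelihood-ratio test for uniform DIF; it keeps only the structure of the loop, which rescores on DIF-free anchor items and repeats until the flagged set stabilizes. All names and thresholds are illustrative.

```python
import numpy as np

def loglik_logistic(X, y, iters=600, lr=0.5):
    """Fit a logistic regression by gradient ascent and return its log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
        w += lr * X.T @ (y - p) / len(y)
    p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
    return np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def flag_dif_items(resp, ability, group, crit=3.84):
    """Likelihood-ratio test for uniform DIF on each item (chi-square, 1 df)."""
    z = (ability - ability.mean()) / ability.std()
    flagged = set()
    for j in range(resp.shape[1]):
        X0 = np.column_stack([np.ones(len(z)), z])       # ability only
        X1 = np.column_stack([X0, group])                # ability + group
        stat = 2 * (loglik_logistic(X1, resp[:, j]) - loglik_logistic(X0, resp[:, j]))
        if stat > crit:
            flagged.add(j)
    return flagged

# Simulated data: 12 binary items; items 0-2 are harder for the focal group.
rng = np.random.default_rng(1)
n, k = 1000, 12
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
dif = np.zeros(k)
dif[:3] = 1.2
logits = theta[:, None] - dif[None, :] * group[:, None]
resp = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(float)

# Iterative purification: rescore on DIF-free anchor items and re-test
# until the flagged set is unchanged between successive cycles.
flagged = set()
for _ in range(5):                                   # cap the number of cycles
    anchors = [j for j in range(k) if j not in flagged]
    ability = resp[:, anchors].mean(axis=1)          # crude score from anchors only
    new = flag_dif_items(resp, ability, group)
    if new == flagged:
        break
    flagged = new
print(sorted(flagged))
```

Rescoring only on anchor items matters because a score contaminated by DIF items biases the conditioning variable, which can make clean items look biased; iterating until the flagged set stabilizes is what removes that contamination.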
Results Forty of the 46 3MS items had education DIF and 27 had language DIF. Most 3MS scores were associated with a wide range of DIF-adjusted IRT scores. DIF-adjusted IRT scoring was slightly less sensitive than standard 3MS scoring.
Conclusions Our analyses suggest that any gains in validity associated with adjustment for DIF may come at the cost of slightly lower sensitivity for detecting dementia, though we found no reduction in the ability to detect cognitive impairment. Further study of this approach is warranted.