Ψlogical
Testing

Chapter 19
Test Bias

House Keeping

Lectures:

  • Alpacas present Tuesday!! 🦙
  • expect more lab/lecture overspill
  • Final on 5/23 (merely Exam #3)

🔭🌠ASTEROID YR4 PANIC METER!!!🌌💥

Test Bias

Context

The most difficult problem is that some ethnic groups obtain lower average scores on some psychological tests. The most controversial case concerns intelligence tests (Kaplan & Saccuzzo, 2018, p. 514).

“Controversial” Question…

…do these observed–score (“X”) differences reflect true–score (“T”) differences, or do they indicate systematic error within the test scores?
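A minimal sketch of the question in classical test theory terms, assuming observed scores follow X = T + systematic error + random error. The function name `simulate_observed`, the score range, and the 5–point systematic error are made up for illustration; nothing here comes from real test data. Both simulated groups share identical true scores, so any gap in observed means is produced by the error term alone.

```python
import random

def simulate_observed(true_scores, systematic_error=0.0, noise_sd=1.0, seed=0):
    """Return observed scores X = T + systematic error + random error."""
    rng = random.Random(seed)
    return [t + systematic_error + rng.gauss(0.0, noise_sd) for t in true_scores]

def mean(values):
    return sum(values) / len(values)

# Hypothetical true scores (T), identical for both groups: 90 through 110
true_scores = [90 + (i % 21) for i in range(2000)]

# Group A: random error only. Group B: the same T plus a constant -5 systematic error.
group_a = simulate_observed(true_scores, systematic_error=0.0, seed=1)
group_b = simulate_observed(true_scores, systematic_error=-5.0, seed=2)

observed_gap = mean(group_a) - mean(group_b)
print("Gap in observed means:", round(observed_gap, 2))
print("Gap in true-score means:", mean(true_scores) - mean(true_scores))
```

The observed means differ even though the true–score means do not, which is the "systematic error" side of the question. If the groups' true scores differed instead, the same observed gap could appear with no bias in the test at all; observed scores alone can't distinguish the two cases.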

Note

It’s important to note that, even though group differences are observed, individuals from different groups do attain scores all along the possible score continua for these “g”–loaded tests.

Traditional Defense of Testing (I)

content–related evidence for validity:

  • are items that populate these tests “problematic” for some individuals with a shared group membership?
  • possible that children haven’t had opportunity to learn content
  • Black Intelligence Test of Cultural Homogeneity (Williams, 1972) was a tongue–in–cheek attempt to demonstrate culturally–embedded (i.e., problematic) content

Traditional Defense of Testing (II)

Differential Item Functioning:

  • statistical procedure that compares item characteristic curves from separate groups
  • equal abilities (X–axis) should have equal probabilities of correct item response (Y–axis)
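A rough illustration of the item characteristic curve comparison, assuming a two–parameter logistic (2PL) model. The slides don't name a specific model, so the 2PL form, the helper names `icc_2pl` and `max_icc_gap`, and all parameter values are assumptions chosen for illustration.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response at ability theta (2PL model).

    a = item discrimination, b = item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def max_icc_gap(params_ref, params_focal, thetas):
    """Largest vertical distance between two groups' curves over an ability grid."""
    return max(
        abs(icc_2pl(t, *params_ref) - icc_2pl(t, *params_focal))
        for t in thetas
    )

# Ability grid (X-axis) from -3.0 to +3.0
thetas = [x / 10 for x in range(-30, 31)]

# Hypothetical item 1: same (a, b) in both groups, so the curves coincide
fair_item_gap = max_icc_gap((1.2, 0.0), (1.2, 0.0), thetas)

# Hypothetical item 2: harder for the focal group (b = 0.8) at equal ability
dif_item_gap = max_icc_gap((1.2, 0.0), (1.2, 0.8), thetas)

print("Item 1 max gap:", fair_item_gap)
print("Item 2 max gap:", round(dif_item_gap, 3))
```

A gap of zero means equal abilities yield equal probabilities of a correct response. A nonzero gap flags the item for closer review. Operational DIF procedures also estimate the curves from response data and test whether the gap exceeds sampling error, which this sketch skips.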

“Criterion–related” sources of bias (I)

Search for bias here is focused on the relationship between test scores (aggregate scale score) and some criterion of interest (e.g., DV)

  • uses a statistical procedure (regression) to explore different associations between IV (test score) and DV (outcome)

“Criterion–related” sources of bias (II)

  • focus traditionally placed on 2 regression coefficients (intercept and slope)
  • test bias here is synonymous with differential validity
    • different regression line slopes for each group
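One way to picture differential validity is to fit a separate regression line per group and compare the slopes. The sketch below uses made–up, noise–free data and a hypothetical helper, `fit_line`; real analyses would use actual scores and test whether the slope difference exceeds sampling error.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical test scores (IV), same range for both groups
test_scores = list(range(80, 121))

# Hypothetical criterion values (DV): the test predicts more steeply in group A
criterion_a = [0.5 * x + 10 for x in test_scores]
criterion_b = [0.2 * x + 40 for x in test_scores]

slope_a, intercept_a = fit_line(test_scores, criterion_a)
slope_b, intercept_b = fit_line(test_scores, criterion_b)
slope_difference = slope_a - slope_b

print("Group A slope, intercept:", round(slope_a, 3), round(intercept_a, 3))
print("Group B slope, intercept:", round(slope_b, 3), round(intercept_b, 3))
print("Slope difference:", round(slope_difference, 3))
```

Here the same one–point gain in test score goes with a larger gain on the criterion in group A than in group B, so the test is not equally valid as a predictor across the two groups.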

Other approaches to testing

Some Ψs unsatisfied with lack of DIF and differential validity as defense for using tests that exhibit group differences…

  • one explanation for group differences is lack of exposure to content (e.g., “ignorance”)
    • if true, differences should be manipulable (flip \(\pm\) or obviate differences)

Ethical models of bias

…three mutually incompatible ethical positions in regard to the fair and unbiased use of psychological tests… (Hunter & Schmidt, 1976, p. 1053)

  • unqualified individualism – use any information that helps find the best candidates for a job or school
  • quotas – placement outcomes should be proportional to population constituency
  • qualified individualism – an ethical imperative exists to ignore group identification (not treat differently)

Statistical models of test bias…

…usually utilize regression procedures – all of which align with the fairness philosophies of Hunter & Schmidt (1976)


General consensus is that “no bias exists if the regression equations relating the test and the criterion are indistinguishable for the groups in question” (American Educational Research Association et al., 2014, p. 79).
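A small sketch of why "indistinguishable" matters, under the assumption that one common regression line is used to predict the criterion for everyone. The data are made up (equal slopes, different intercepts) and the helper names are hypothetical. If the groups' equations really were the same, each group's mean residual from the common line would be about zero.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_residual(x, y, slope, intercept):
    """Average of (actual - predicted); positive means the line underpredicts."""
    return sum(yi - (slope * xi + intercept) for xi, yi in zip(x, y)) / len(x)

# Hypothetical data: same slope in both groups, but different intercepts
scores = list(range(80, 121))
criterion_a = [0.5 * x + 10 for x in scores]
criterion_b = [0.5 * x + 4 for x in scores]

# One common line fitted to everyone, ignoring group membership
common_slope, common_intercept = fit_line(scores + scores, criterion_a + criterion_b)

mean_resid_a = mean_residual(scores, criterion_a, common_slope, common_intercept)
mean_resid_b = mean_residual(scores, criterion_b, common_slope, common_intercept)

print("Common line:", round(common_slope, 3), round(common_intercept, 3))
print("Mean residual, group A:", round(mean_resid_a, 3))
print("Mean residual, group B:", round(mean_resid_b, 3))
```

With these numbers the common line systematically underpredicts group A and overpredicts group B, so the two regression equations are distinguishable and the quoted "no bias" condition is not met.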




Warning

The textbook’s presentation of Test Fairness Models (pp. 533–535) is out–of–date and in violation of the Civil Rights Act of 1991

Tests can harm?

Tests ideally provide greater insight into individuals, helping them build better and more productive lives (identify strengths & weaknesses and inform action plans)… however, the ideal isn’t always met:

  • potentially reinforce stereotypes or negative self–beliefs (e.g., “I’m not good at math”)
  • speculation that this could contribute to a fixed mindset (e.g., Dweck & Yeager, 2019)
  • growth mindset generally considered more beneficial for academic performance

According to the textbook, the most controversial test differences occur with ___________

  • intelligence
  • personality
  • interest
  • psychopathology

The statistical procedure that compares item characteristic curves is known as ___________

  • differential item functioning
  • prenatal toe grabbing
  • differing discrimination indices
  • moderated hierarchical regression

The statistical procedure most often leveraged to probe for test bias is ___________

  • regression
  • correlation
  • ANOVA
  • t–test

When seeking criterion–related validity, the criterion is the methodological _______

  • dependent variable
  • independent variable
  • moderator
  • mediator

Which of the following IS NOT one of the 3 discussed ethical models of bias?

  • unified collectivism
  • quotas
  • unqualified individualism
  • qualified individualism

Assessment groups:





Dingos🦊     Camels🐫     Alpacas🦙    Belugas🐳    Elephants🐘
Sarah M      Sarah J      Vanessa A    Mae F        Hannah T
Raelyn R     Thomas J     Sabina B     Alaina G     William T
Ellen R      Grace K      Nathan B     Payton H     Jennifer T
Rachel S     Grace L      Maritza B    Elly J       Lila W
4/29         5/1          5/6          5/8          5/13

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing (5th ed.). American Educational Research Association.
Dove, A. (1968, July 15). Taking the chitlin test. Newsweek, 51–52.
Dweck, C. S., & Yeager, D. S. (2019). Mindsets: A view from two eras. Perspectives on Psychological Science, 14(3), 481–496.
Hunter, J. E., & Schmidt, F. L. (1976). Critical analysis of the statistical and ethical implications of various definitions of test bias. Psychological Bulletin, 83(6), 1053–1071.
Kaplan, R. M., & Saccuzzo, D. P. (2018). Psychological testing: Principles, application, and issues (9th ed.). Cengage.
Mercer, J. R., & Lewis, J. F. (1977). System of multicultural pluralistic assessment. Psychological Corporation.
Williams, R. L. (1972). The BITCH-100: A culture-specific test.