Validity - Outline: 1. Definition 2. Validity: Two Different Views 3. Types of Validity

Validity Outline
1. Definition
2. Validity: Two Different Views
3. Types of Validity
   A. Face
   B. Content
   C. Criterion
      i. Predictive vs. Concurrent
      ii. Validity Coefficients
   D. Construct
      i. Convergent
      ii. Discriminant
Validity: Definition
Validity measures agreement between a test score and the characteristic it is believed to measure.
The basic question is: are you measuring what you think you're measuring?
Validity: two very different views

Traditional:
Validity is a property of tests.
Does the test measure what you think it measures?

Recent (e.g., Messick, 1989; Committee on Standards for Educational and Psychological Testing (CSEPT)):
Validity is a property of test score interpretations.
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.
Note the difference:

Traditional: Does the test measure what you think it measures?
Recent: Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.
A problem with the CSEPT view

Who is to say whether the social consequences of test use are good or bad?
According to CSEPT, validity is a subjective judgment.
In my view, this makes the concept useless: if you like the result the test gives you, you will consider it valid. If you don't, you won't.
That's not how scientists think.
Borsboom et al. (2004)

Borsboom et al. reject CSEPT's view.
Validity "is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure" (p. 1061).

A test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure.
In other words, variations in what you are measuring cause variations in your measurements.
E.g., variations across people in intelligence cause variations in their IQ scores.
This is not a correlational model of validity.

You don't create a test and then do the analysis necessary to establish its validity.
Rather, you begin by doing the theoretical work necessary to create a valid test in the first place.
On this view, validity is not a big issue.
Borsboom et al. vs. CSEPT

Who is right?
Each scientist has to make up his or her own mind on that question.
I find Borsboom et al.'s arguments compelling.
Other psychologists may disagree.
The CSEPT view

CSEPT recognizes 3 types of evidence for test validity:
Content-related
Criterion-related
Construct-related
The boundaries between them are not clearly defined.
Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.
Parenthetical Point: Face Validity

Face validity refers to the appearance that a test measures what it is intended to measure.
Face validity has P.R. value: test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.
CSEPT: Content validity

Content-related evidence considers coverage of the conceptual domain tested.
Important in educational settings.
Like face validity, it is determined by logic rather than statistics.
Typically assessed by expert judges.

Construct-irrelevant variance: is each item relevant to the domain?
Construct underrepresentation: is the domain adequately covered, or are parts of it left out?
But if you are going to ask these questions, why not do it when creating the test?
Borsboom et al.: Content validity

Borsboom et al. would say that content validity is not something to be established after the test has been created.
Rather, you build it into your test by having a good theory of what you are testing.
E.g., for a test in this course to have content validity, it should test your understanding of content validity!
CSEPT: Criterion validity

Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.
A criterion is a standard against which a test is compared.
The test score should tell us something about the criterion score.
E.g., we could compare GPAs to SAT scores to produce evidence for the validity of conclusions drawn on the basis of SAT scores.
Two basic types:
Predictive
Concurrent

Predictive validity
Test scores are used to predict future performance: how good is the prediction?
E.g., the SAT is used to predict final undergraduate GPA.
SAT & GPA are moderately correlated.

Concurrent validity
Correlation between test scores and criterion when the two are measured at the same time.
The test illuminates current performance rather than predicting future performance (e.g., why does the patient have a temperature?).
Borsboom et al.: Criterion validity

Criterion validity involves a correlation of test scores with some criterion, such as GPA.
That does not establish the test's validity, only its utility.
E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.

The SAT is valid because it was developed on the sensible theory that past academic achievement is a good guide to future academic achievement.
Validity is built into the test, not established after the test has been created.

Validation research aims at showing how variation in the attribute causes variation in the test score.
This requires a theory of the task: how does the test-taker do the mental operations needed to respond to test items?

Note: there is no point in developing a test if you already have a criterion, unless impracticality or expense makes use of the criterion difficult.
Is the criterion measure only available in the future?
Is the criterion too expensive to use?

Validity Coefficient
Compute the correlation (r) between test score and criterion.
r = .30 or .40 would be considered normal.
r > .60 is rare.
Note: r varies between -1.0 and +1.0.
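The validity coefficient described above is an ordinary Pearson correlation. Below is a minimal sketch in Python, assuming made-up test and criterion scores. The data and the names `pearson_r`, `test_scores`, and `criterion_scores` are illustrative and do not come from the lecture.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: admission test scores and later GPA for eight students.
test_scores = [480, 520, 550, 590, 610, 640, 680, 720]
criterion_scores = [2.9, 2.4, 3.1, 2.7, 3.4, 2.8, 3.2, 3.6]

validity_coefficient = pearson_r(test_scores, criterion_scores)
print(round(validity_coefficient, 2))
```

With real data the two lists would hold each person's test score and criterion score in the same order.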

r² gives the proportion of variance in the criterion explained by the test score.
E.g., if r_xy = .30, then r² = .09, so 9% of the variability in Y can be explained by variation in X.
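One way to see why r² is read as "variance explained": for a straight-line (least-squares) prediction of Y from X, the squared correlation equals one minus the share of Y's variation left over after prediction. The sketch below checks this on made-up scores. The data and the function name `r_squared_two_ways` are illustrative assumptions.

```python
def r_squared_two_ways(x, y):
    """Return (squared correlation, proportion of Y variation explained by the fitted line)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    # Route 1: square the Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)

    # Route 2: fit the least-squares line y = intercept + slope * x, then
    # compare the leftover (residual) variation with the total variation in y.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    explained = 1 - ss_residual / syy
    return r_squared, explained

# Hypothetical predictor (X) and criterion (Y) scores.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r2, explained = r_squared_two_ways(x, y)
print(round(r2, 3), round(explained, 3))
```

The two printed values agree, which is the sense in which r = .30 corresponds to 9% of the variability in Y.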

Interpreting Validity Coefficients: watch out for
1. Changes in causal relationships
2. What does the criterion mean? Is it valid and reliable?
3. Is the subject population for the validity study appropriate?
4. Sample size
5. Criterion/predictor confusion
6. Range restrictions
7. Do the validity study results generalize?
8. Differential predictions
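Range restriction (point 6 above) can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: the population correlation of .60, the sample of 5000, and the rule that only the top 20% of scorers have criterion data are all invented for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_rho = 0.6          # assumed population correlation (illustrative)

# Simulate standardized test scores and a criterion correlated with them.
test = [rng.gauss(0, 1) for _ in range(5000)]
criterion = [true_rho * t + sqrt(1 - true_rho ** 2) * rng.gauss(0, 1) for t in test]

# Range restriction: only the top 20% of scorers are admitted, so only
# they have criterion data in the validity study.
cutoff = sorted(test)[int(0.8 * len(test))]
admitted = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]

r_full = pearson_r(test, criterion)
r_restricted = pearson_r([t for t, _ in admitted], [c for _, c in admitted])
print(round(r_full, 2), round(r_restricted, 2))
```

The coefficient computed on the admitted group alone comes out noticeably smaller than the one for the full range, even though the underlying relationship is the same.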
CSEPT: Construct validity

Problem: for many psychological characteristics of interest there is no agreed-upon universe of content and no clear criterion.
We cannot assess content or criterion validity for such characteristics.
These characteristics involve constructs: a construct is something built by mental synthesis.

Examples of constructs:
Intelligence
Love
Curiosity
Mental health
CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.
This is bootstrapping.
Bootstrapping construct validity

Assemble evidence about what a test means, in other words, about the characteristic it is testing.
CSEPT: this process is never finished.
Borsboom: this is part of the process of creating a test in the first place, not something done after the fact.

Show relationships between a test and other tests.
None of the other tests is a criterion.
Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income, but a measure of age is not a measure of annual income).

Each new relationship adds meaning to the test.
The test's meaning is gradually clarified over time.
Borsboom would say, why all the mystery? The meaning of many tests (e.g., WAIS, academic exams, Piaget's tests) is clear right from the start.

Example from text: Rubin's work on Love.
Rubin collected a set of items for a Love scale.
He read poetry and novels and asked people for definitions.
He created a scale of Love and one of Liking.

Rubin gave the scales to many subjects & factor-analyzed the results.
Love integrates Attachment, Caring, & Intimacy.
Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence.
The two are independent: you can love someone you don't like (as songwriters know).
Campbell & Fiske (1959)

Two types of Construct-related Evidence:
Convergent evidence: when a test correlates well with other tests believed to measure the same construct.
Discriminant evidence: when a test does not correlate with other tests believed to measure some other construct.
Convergent validity
Example: Health Index.
Scores correlated with age, number of symptoms, chronic medical conditions, and physiological measures.
Treatments designed to improve health should increase Health Index scores. They do.
Discriminant validity
Low correlations between the new test and tests believed to tap unrelated constructs.
This is evidence that the new test measures something unique.
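The two kinds of evidence can be sketched as a pair of correlations. The three score lists below are invented for illustration (they are not Health Index data), and the names `new_test`, `same_construct_test`, and `unrelated_test` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same eight people on three tests.
new_test = [10, 12, 15, 18, 20, 23, 25, 28]
same_construct_test = [11, 14, 14, 19, 22, 21, 27, 29]  # believed to tap the same construct
unrelated_test = [30, 22, 27, 21, 29, 24, 31, 23]       # believed to tap a different construct

r_convergent = pearson_r(new_test, same_construct_test)
r_discriminant = pearson_r(new_test, unrelated_test)
print(round(r_convergent, 2), round(r_discriminant, 2))
```

Here the first correlation is close to 1 (convergent evidence) and the second is close to 0 (discriminant evidence).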
CSEPT: Validity & Reliability

CSEPT: There is no point in trying to establish the validity of an unreliable test.
It's possible to have a reliable test that has no meaning (is not valid).
It is logically impossible to produce evidence of validity for an unreliable test.
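One standard way to make the last point concrete comes from classical test theory, which the slides do not spell out: the observed test-criterion correlation cannot exceed the square root of the product of the two reliabilities. The sketch below computes that ceiling. The function name `max_validity` and the example reliabilities are illustrative assumptions.

```python
from math import sqrt

def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the observed test-criterion correlation under classical test theory."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sqrt(test_reliability * criterion_reliability)

# The ceiling shrinks as the test becomes less reliable
# (criterion assumed perfectly reliable here).
for reliability in (0.9, 0.5, 0.1, 0.0):
    print(reliability, round(max_validity(reliability), 2))
```

A test with zero reliability has a ceiling of zero, so no validity study could show a relationship with any criterion.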
Borsboom: Validity & Reliability

Borsboom et al.: what does it mean to say that a test is reliable but not valid?
What is it a test of?
It isn't a test at all, just a collection of items.

Borsboom et al.: validity is a necessary condition for reliability.
Reliability of a test of X estimates the precision of measurement of X.
But how could you estimate the precision of measurement of X for a test that does not measure X?
Thus, validity is presumed when you assess reliability.
Blanton & Jaccard: arbitrary metrics

We observe a behavior in order to learn about the underlying psychological characteristic.
A person's test score represents their standing on that underlying dimension.
Such scores form an arbitrary metric.
That is, we do not know how the observed scores are related to the true scores on the underlying dimension.
[Figure: an underlying dimension with Person A, Person B, and a Neutral point marked, and two score scales, Test 1 and Test 2, each running from 0 to 6. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]
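A rough sketch of the idea, not a reproduction of the figure: two hypothetical tests score the same underlying standings on a 0 to 6 scale, but each uses a different (and in practice unknown) mapping. The latent values and both scoring functions are invented for illustration.

```python
def score_test_1(latent):
    """Hypothetical test whose 0-6 scale is spread evenly over latent values -3..3."""
    return max(0, min(6, round(latent + 3)))

def score_test_2(latent):
    """Hypothetical test whose 0-6 scale covers only latent values 0..3."""
    return max(0, min(6, round(2 * latent)))

# Assumed latent standings (unknowable in practice); 0.0 is the neutral point.
people = {"Neutral": 0.0, "Person A": 1.0, "Person B": 2.0}

scores = {name: (score_test_1(value), score_test_2(value))
          for name, value in people.items()}
for name, (test_1, test_2) in scores.items():
    print(name, "Test 1:", test_1, "Test 2:", test_2)
```

Both tests rank the three standings in the same order, yet the neutral point lands on 3 for Test 1 and on 0 for Test 2, and the gap between Person A and Person B is 1 point on one test and 2 on the other. Without knowing the mapping, a score alone does not say where someone stands.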
Arbitrary metrics: the IAT

The Implicit Association Test (IAT) is claimed to diagnose implicit attitudinal preferences or racist attitudes.
IAT authors say you may have prejudices you don't know you have.
Are these claims true?

Task: categorize stimuli using two pairs of categories.
There are two buttons to press and two assignments of categories to buttons, used in sequence.

Assignment pattern A
Button 1: press if the stimulus refers to the category White or the category Pleasant.
Button 2: press if the stimulus refers to the category Black or the category Unpleasant.
Assignment pattern B
Button 1: press if the stimulus refers to the category White or the category Unpleasant.
Button 2: press if the stimulus refers to the category Black or the category Pleasant.

IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a preference for Whites over Blacks, in other words, a racist attitude.
IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B.
This is inappropriate.

Blanton & Jaccard: The IAT does not tell us about racist attitudes.
IAT authors take a dimension which is non-arbitrary when used by physicists (time) and use it in an arbitrary way in psychology.

The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown.
Zero on the (Pattern A - Pattern B) difference may not be zero on the underlying attitude preference dimension.
There are alternative models of how that (Pattern A - Pattern B) difference could arise.
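To make the observed quantity concrete, here is a minimal sketch of a raw (Pattern A - Pattern B) latency difference. It is a simplification for illustration, not the scoring procedure the IAT authors actually use. The latencies and the names `pattern_a_ms`, `pattern_b_ms`, and `latency_difference` are invented.

```python
from statistics import mean

# Hypothetical response latencies in milliseconds for one test-taker.
pattern_a_ms = [612, 650, 587, 701, 634]  # White/Pleasant share a button
pattern_b_ms = [745, 802, 768, 690, 755]  # White/Unpleasant share a button

def latency_difference(pattern_a, pattern_b):
    """Mean Pattern A latency minus mean Pattern B latency, in milliseconds.

    A negative value means responses were faster under Pattern A.
    """
    return mean(pattern_a) - mean(pattern_b)

difference_ms = latency_difference(pattern_a_ms, pattern_b_ms)
print(round(difference_ms, 1))
```

The result is a number of milliseconds. Nothing in the computation says which value of the difference corresponds to a neutral attitude, which is the point of the arbitrary-metric critique.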
Review

CSEPT:
1. Validity is a characteristic of evidence, not of tests.
2. Valid evidence supports conclusions drawn using test results.
3. Validity is determined by the social consequences of test use.
Borsboom et al.:
1. Validity is not a methodological issue, but a substantive (theoretical) issue.
2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores.

CSEPT:
4. Validity can be established in three ways, though the boundaries between them are fuzzy:
A. Content-related evidence
B. Criterion-related evidence
C. Construct-related evidence
Borsboom et al.:
3. It's all the same validity: a test is valid if it measures what you think it measures.
4. Validity is not mysterious.

CSEPT:
5. Content-related evidence: do test items represent the whole domain of interest?
6. Criterion-related evidence: do test scores relate to a criterion, either now (concurrent) or in the future (predictive)?
Borsboom et al.:
5. These questions are properly part of the process of creating a test.

CSEPT:
7. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.
8. A test can be reliable but not valid. A test cannot be valid if not reliable.
Borsboom et al.:
6. A test must be valid for a reliability estimate to have any meaning.

Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric.
For an arbitrary metric, we have no idea how the test scores are actually related to the underlying dimension.
