
Determining Validity

There are several ways to measure validity. The most commonly addressed include:
- Face Validity
- Construct & Content Validity
- Convergent & Divergent Validity
- Predictive Validity
- Discriminant Validity

Validity
The extent to which a test measures what we intend it to measure.

If math and vocabulary truly represent intelligence, then a math and vocabulary test might be said to have high validity when used as a measure of intelligence.

Face Validity
The extent to which it is self-evident that a scale is measuring what it is supposed to measure. For Example - you might look at a measure of math ability, read through the questions, and decide that, yes, it seems like a good measure of math ability.

Face validity is clearly weak evidence because it is essentially a subjective judgment call. Just because it is weak evidence doesn't mean it is wrong; we need to rely on our subjective judgment throughout the research process. For Example - suppose you were taking an instrument that reportedly measures your attractiveness, but the questions asked you to identify the correctly spelled word in each list. There is not much of a link between what the instrument claims to do and what it actually does.

Face Validity (cont.)


The question is not whether it is measuring what it is supposed to measure, but whether it appears to measure what it is supposed to be measuring. Face validity is important for some tests. If examinees are not interested or do not see the relevance of a particular test, they may not take it seriously or participate to the best of their abilities.

Face Validity (cont.)


In all cases, face validity is not based on empirical research evidence.

When used alone, face validity provides very weak support for the overall validity of a scale.

Face Validity (cont.)


Possible Advantage of face validity...
If the respondent trusts the test and believes it is measuring what it should be measuring, they may provide more useful, thoughtful, and accurate answers. E.g., if a nursing test seems to measure constructs not associated with nursing, the respondent may become frustrated and answer poorly or without much thought.

Possible Disadvantage of face validity...


If the respondent knows what information we are looking for, they might try to bend and shape their answers to what they think we want, i.e., "fake good" or "fake bad."

Content Validity
Does the test contain items from the desired content domain? Content validity is based on assessment by experts in that content domain, and it is especially important when a test is designed to have low face validity. It is generally simpler to demonstrate content validity for tests of ability than for measures of psychological constructs.

For Example - it is easier for math experts to agree on an item for an algebra test than it is for psychology experts to agree on whether an item belongs in a personality measure.

Content Validity (cont.)


Basic Procedure for Assessing Content Validity:
1. Describe the content domain.
2. Determine the areas of the content domain that are measured by each test item.

For Example - In developing a nursing licensure exam, experts on the field of nursing would identify the information and issues required to be an effective nurse and then choose (or rate) items that represent those areas of information and skills.

Content Validity (cont.)


Lawshe (1975) proposed that, to assess content validity, each rater should answer the following question for every item:

Is the skill or knowledge measured by this item
1. Essential
2. Useful but not essential
3. Not necessary

We need a high proportion of "essential" responses from the raters, as quantified in the sketch below.
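Lawshe quantified "a high proportion of essential responses" as the content validity ratio (CVR) for each item. A minimal sketch of that computation in Python; the panel size and ratings here are hypothetical:

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's (1975) CVR for one item: (n_e - N/2) / (N/2).

    Ranges from -1 (no rater says "essential") to +1 (every rater does);
    0 means exactly half the panel rated the item essential.
    """
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel: 9 of 10 nursing experts rate an item "essential".
ratings = ["essential"] * 9 + ["useful but not essential"]
n_e = sum(r == "essential" for r in ratings)
print(content_validity_ratio(n_e, len(ratings)))  # 0.8
```

Items whose CVR falls below a chosen cutoff would be revised or dropped from the content domain.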

Construct Validity
Construct validity refers to the general validity of the measurement tool.

Does the instrument measure the construct it is intended to measure? There is no statistical test that provides an absolute measure of construct validity; therefore, construct validity is never proven, only supported.

A) Assess how well the test predicts the relevant construct. Assume that we have developed a hypothetical Johnson Aggression Scale (JAS). A person who scores high on the JAS should be more likely to engage in aggressive behavior, in any situation, than a person who scores low on the JAS.

B) Compare the new measure to an existing, valid measure (i.e., the JAS should correlate highly with an existing valid measure of aggression), as sketched below. Often a valid measure already exists, but the new scale is being created because it has some advantage over the older measure.

Advantages of a new measure: A) more consistent with current theory; B) shorter and more accurate.

Sometimes, however, no existing valid measure is available. That is often why the new scale is being created in the first place.
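When an established measure does exist, the comparison in (B) reduces to correlating the two instruments across the same respondents. A minimal sketch with simulated data (the JAS is the hypothetical scale from above, and all numbers are invented):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)

# Hypothetical data: 100 respondents complete both the new JAS and an
# established, already-validated aggression measure.
established = rng.normal(50, 10, size=100)
jas = 0.8 * established + rng.normal(0, 6, size=100)  # simulated shared variance

r, p = pearsonr(jas, established)
print(f"convergent r = {r:.2f} (p = {p:.3g})")
# A strong positive r supports, but never proves, construct validity.
```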

Tyson (1997) contends that there is a clear racial divide between Blacks and Whites in their perceptions of the prevalence of discrimination in America. Crocker (1999) found that, compared to White Americans, Black Americans (across all socioeconomic levels) were much more likely to believe in conspiracy theories suggesting that the government engaged in organized efforts to harm Blacks.

Monteith & Spicer (2000) showed that Blacks' negative attitudes towards Whites generally reflected reactions to perceived racism (e.g., "They always have negative thoughts about Blacks").

A) Media and experiences

83% of Blacks have experienced racism in their day-to-day lives, and 68% have experienced racism during healthcare encounters.
(Peters, 2006)

70% of Blacks reported being treated unfairly by strangers, 34% reported unfair treatment by members of the helping professions, and 31% were called derogatory names connected to race.
(Pieterse & Carter, 2007).

67% of Black participants felt that their lives would have been different over the past year if they had not experienced these racist events.

However, despite popular opinion, Blacks are not a monolithic group! There is variation among Blacks in their racism expectations involving Whites.

In his book A Country of Strangers, Shipler suggests that Blacks in America range from those who tend to see racism in every encounter with a White to those who try mightily not to see it at all (p. 448).

Yet there was no direct measure of this variation in Black beliefs about White racism.

To address this limitation in the racism studies literature, Johnson & Lecci (2003) developed the Johnson-Lecci Scale (JLS) of Black anti-White bias. The scale was based on the responses and life experiences of approximately 450 Black college students, and it had 20 items and four subscales. A major subscale was in-group stigmatization and discrimination expectation (i.e., the expectation that the typical White person will discriminate against Blacks).

All questions were answered on a four-point scale (1 = strongly disagree, 4 = strongly agree). Sample items:
- I believe that most Whites really do support the ideas and thoughts of racist political groups.
- I believe that most Whites really believe that Blacks are genetically inferior.
- I believe that most Whites would discriminate against Blacks if they could get away with it.
- I believe that most of the negative actions of Whites towards Blacks are due to racist feelings.
- I believe that most Whites would harm Blacks if they could get away with it.
- I believe that most Whites think that they are superior to Blacks.

However, scales are useless if they don't predict!

The JLS has been shown to predict:
A) the probability of perceiving racism in an ambiguously racist scenario (Johnson & Lecci, 2003), e.g., the restaurant example and airline choice
B) the number of a Black person's White friends (Johnson & Lecci, 2003)
C) the probability that a Black person will confront a racist person (Johnson et al., 2006)
D) racial preferences regarding a mental health counselor (Ferguson et al., 2008)
E) the probability of prosocial responses towards a White person in need (Johnson et al., 2008)
F) bias against lighter-skinned Blacks who were in need (Johnson et al., under review)

What would be a comparable measure in Pacific Island/Fiji culture? What might be some of the items on the scale?

Discriminant Validity
The statistical assessment of construct validity: does the instrument show the right pattern of interrelationships with other instruments?

Discriminant Validity has two parts:
- Convergent Validity
- Divergent Validity

Convergent & Divergent Validity


Convergent Validity: the extent to which the scale correlates with measures of the same or related concepts. e.g., A new scale to measure Assertiveness should correlate with existing measures of Assertiveness, and with existing measures of related concepts like Independence.

Divergent Validity: the extent to which the scale does not correlate with measures of unrelated or distinct concepts. e.g., An assertiveness scale should not correlate with measures of aggressiveness. A sketch of both checks follows.
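A minimal sketch of both checks, using simulated scores for a hypothetical new assertiveness scale; in practice the three score columns would come from the same real respondents:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120

# Hypothetical scores for the same respondents on three scales.
latent = rng.normal(0, 1, n)                 # latent assertiveness
new_scale = latent + rng.normal(0, 0.5, n)   # new assertiveness scale
old_scale = latent + rng.normal(0, 0.5, n)   # established assertiveness scale
aggression = rng.normal(0, 1, n)             # unrelated construct

r = np.corrcoef([new_scale, old_scale, aggression])
print(f"convergent r (new vs. old):       {r[0, 1]:.2f}")  # should be high
print(f"divergent r (new vs. aggression): {r[0, 2]:.2f}")  # should be near zero
```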

Criterion-Related Validity


This type of validity measures the relationship between the predictor and the criterion, and the accuracy with which the predictor is able to predict performance on the criterion. For example: what is the actual association between a test designed to predict job performance and actual job performance?

If the test has high criterion validity, the correlation between test scores and job performance should be strong.

Criterion-Related Validity (cont.)


HOW THIS TYPE IS ESTABLISHED: Criterion-related validity can be either concurrent or predictive. The distinguishing factor is the time when criterion and predictor data are collected.

Concurrent - criterion data are collected before, or at the same time as, the predictor is administered.
Predictive - criterion data are collected after the predictor is administered.

Concurrent Validity
This type of validity indicates the correlation between the predictor and criterion when data on both are collected at around the same time. It is used to determine a person's current status. For Example - If a person seems depressed, they should score fairly high on a depression inventory given the same day.

Predictive Validity
This type of validity also indicates the correlation between the predictor (X) and the criterion (Y). However, criterion data are collected after predictor data are obtained. In other words, this method determines the degree to which X can accurately predict Y.

For Example - giving high school juniors the ACT for admission to a university. The test is the predictor, and first-semester grades in college are the criterion. If the correlation is large, the ACT is useful for predicting future grades.

Predictive Validity (cont.)


The extent to which scores on the scale are related to, and predictive of, some future outcome of practical utility. e.g., If higher scores on the SAT are positively correlated with higher G.P.A.s, and vice versa, then the SAT is said to have predictive validity. The predictive validity of the SAT is mildly supported by the relation of that scale to performance in college.

Predictive Validity
A predictive validity study consists of two basic steps: 1. Obtain test scores from a group of respondents, but do not use the test in making a decision.

2. At some later time, obtain a performance measure for those respondents, and correlate these measures with test scores to obtain predictive validity.
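A minimal sketch of this two-step procedure, using simulated ACT-like scores and first-semester GPAs (all data hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Step 1: collect predictor scores (hypothetical ACT-like test),
# but do not use them in the admission decision.
test_scores = rng.normal(21, 5, size=200)

# Step 2 (a semester later): obtain the criterion, first-semester GPA,
# for the same students, and correlate it with the stored test scores.
gpa = np.clip(2.0 + 0.04 * test_scores + rng.normal(0, 0.4, size=200), 0.0, 4.0)

r, p = pearsonr(test_scores, gpa)
print(f"predictive validity r = {r:.2f}")  # larger r => more useful predictor
```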

Predictive Validity (cont.)


When evaluating test-to-real-life predictions, even very modest correlations of r = .02 or .03 can be of considerable importance. For example, the impact of chemotherapy on breast cancer survival is r = .03.

In selection, hiring, and counseling contexts, current interpretations suggest that correlations as low as r = .02 or .03 are meaningful, with many psychological (and medical test) assessments and real life criteria falling in the r = .10 to .30 level, and a few rising beyond that level.

Resilience: an ability to recover from or adjust easily to misfortune or change.

Should intelligence be predictive of the probability of joining a cult? More importantly, should a measure of resilience be predictive of (i.e., correlate with) joining a cult?

FACTORS THAT INFLUENCE VALIDITY


- Inadequate sample
- Items that do not function as intended
- Improper arrangement/unclear directions
- Too few items for interpretation
- Improper test administration
- Scoring that is subjective

RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY

Reliability means nothing when the problem is validity. Reliability caps validity: a test can be no more valid than it is reliable.

Relationship Between Reliability & Validity


Reliability and validity are two different standards used to gauge the usefulness of a test. Though different, they work together. It would not be beneficial to design a test with good reliability that did not measure what it was intended to measure. The inverse, accurately measuring what you desire to measure with a test so flawed that results are not reproducible, is impossible. Reliability is a necessary requirement for validity: you have to have good reliability in order to have validity. Reliability actually puts a cap or limit on validity, and if a test is not reliable, it cannot be valid. (from the MSCEIT Manual)
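Classical test theory makes this cap concrete: if the criterion is measured without error, the observed validity correlation shrinks to roughly r_true x sqrt(reliability), so sqrt(reliability) is the ceiling on validity. A simulation sketch (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent true scores and a criterion correlated r_true with them.
true = rng.normal(0, 1, n)
r_true = 0.60
criterion = r_true * true + np.sqrt(1 - r_true**2) * rng.normal(0, 1, n)

for rxx in (1.0, 0.8, 0.5, 0.2):
    # Add measurement error so the observed score has reliability rxx,
    # where rxx = var(true) / (var(true) + var(error)).
    err_var = (1 - rxx) / rxx
    observed = true + rng.normal(0, np.sqrt(err_var), n)
    r_obs = np.corrcoef(observed, criterion)[0, 1]
    # Classical test theory predicts r_obs ~= r_true * sqrt(rxx).
    print(f"reliability {rxx:.1f}: observed r = {r_obs:.2f}, "
          f"predicted = {r_true * np.sqrt(rxx):.2f}")
```

As reliability drops, the observed validity correlation shrinks toward zero even though the underlying relationship never changed; an unreliable test cannot be valid.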
