
Lecture 4 What is Validity and How Do We Determine It?



Slide 2 Introduction
- The midterm exam will be held on March 11, 2009 in the Irvine Hall from 12 noon to 2 pm
- The exam will cover material from the first lecture up to today's lecture on validity
- Please bring a calculator to the exam
- The Psychology Unit will be having its annual conference from March 11 to 13, 2009
- The conference will highlight the work of both students and faculty
- In today's lecture I will describe the concept of validity
- Then I will talk about the different types of validity
- Finally, I will describe the multitrait-multimethod matrix

Slide 3 What is validity?
- What do we mean when we say something is valid?
- Validity is defined as whether a test measures what it purports to measure (Allen & Yen, 2002)
- What a measure usually measures is a hypothetical construct, which is a theory or an idea
- It is not reality, according to Plato
- We can't see it, feel it or taste it
- For example, an intelligence test is valid if it measures intelligence
- Unlike reliability, validity has no commonly accepted standards for stating that a test is valid
- The validity of a test or measure is established through a series of logical relationships of the test with related and unrelated tests or measures
- For example, if we are assessing the validity of an intelligence test, we must first determine what intelligence is and identify its features or characteristics
- Then we must determine how it is similar to related hypothetical constructs, and how it is different from other constructs
- From this, a number of hypotheses are made regarding how scores on the intelligence test should be correlated with scores on tests measuring the related constructs
- For example, after describing constructs related to intelligence, we might hypothesize that intelligence should be related to academic achievement but should be unrelated to liking the course instructor
- Data are then gathered from a sample of participants on the various measures (in this case intelligence, academic achievement, and liking the course instructor) and the scores are correlated


- The pattern of relationships obtained from the correlations will either support or not support the validity of the measure
- So, the validation of a measure depends upon finding the hypothesized pattern of relationships among a series of measures
- This pattern demonstrates the logical connections between what the measure being validated should be related to and what it should not be related to
- It is the satisfaction of this logical pattern of relationships that determines if a measure is valid
- The figure obtained from the correlation of scores to establish the validity of a test is called the validity coefficient

Slide 4 Relationship of validity to reliability and measurement
- Validity is related to both measurement and reliability
- There is a fine line between describing measurement and describing validity, as both deal with how accurately a measure captures a psychological attribute
- In the case of measurement, accuracy is based on how well the way a measure collects and records data matches the phenomena
- For validity, accuracy is described in terms of the pattern of logical relationships that are satisfied
- Validity is also tied to reliability in that a measure must be reliable before it can be valid
- If there is no stable variability then it is impossible for validity to be established
- If scores on a measure are completely random, and hence completely unstable, then a pattern of logical relationships cannot be established

Slide 5 Types of validity
- Validity can be divided into seven distinct types:
1) Face validity
2) Content validity
3) Concurrent validity
4) Predictive validity
5) Construct validity
6) Convergent validity
7) Discriminant validity
- Each of these types of validity is used for a different purpose and assesses validity in a different way
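As a rough illustration of the validity coefficient and the pattern of relationships described on Slide 3, the sketch below correlates scores on an intelligence test with academic achievement (hypothesized to be related) and with liking the course instructor (hypothesized to be unrelated). All scores are made up for illustration, the helper `pearson_r` is written here rather than taken from any library, and the cut-offs of 0.50 and 0.30 are assumptions only, since there are no commonly accepted standards for validity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up scores for eight participants (illustrative only).
intelligence = [95, 100, 105, 110, 115, 120, 125, 130]
achievement = [58, 64, 61, 70, 72, 69, 80, 83]
liking_instructor = [4, 2, 5, 3, 3, 5, 2, 5]

# Validity coefficients: one for a related construct, one for an unrelated one.
r_related = pearson_r(intelligence, achievement)
r_unrelated = pearson_r(intelligence, liking_instructor)

# Assumed cut-offs, chosen only to make the hypothesized pattern explicit.
pattern_supported = r_related > 0.50 and abs(r_unrelated) < 0.30

print(f"intelligence vs achievement:       r = {r_related:.2f}")
print(f"intelligence vs liking instructor: r = {r_unrelated:.2f}")
print("hypothesized pattern supported:", pattern_supported)
```

The point of the sketch is that validity is judged from the whole pattern (a high correlation where one is expected and a low one where none is expected), not from a single number.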


Slide 6 Face validity
- Face validity is the simplest form of validity
- It refers to the appearance of the content of the test or measure
- It is simply whether the items which make up the measure look like they assess what the measure claims to measure
- For example, for a measure of shyness to have face validity it should consist of items which relate to the characteristics of shyness, such as fear and awkwardness in social situations
- Face validity does not relate to whether the measure truly measures what it purports to measure
- It means only that the measure looks like it could measure the trait
- A test can have other types of validity but have little or no face validity, that is, it doesn't look like it measures what it says it is measuring
- For example, reaction time is a valid measure of intelligence. It correlates strongly (about 0.80) with the Verbal score on the WAIS. However, reaction time has little face validity as a test of intelligence
- Another example is the MMPI; this test is said to have no face validity but has other types of validity

Slide 7
- Face validity, though simple, is not completely useless or meaningless
- It is important for gaining respondents' acceptance of and cooperation with what you are doing
- It is assessed mainly by an inspection of the items of the test, that is, by just looking at the items to see if they are really asking about what is being measured

Slide 8 Content validity
- Content validity is simply whether a test samples a specified body of material in the correct proportions
- Imagine that I were to construct the mid-term exam completely from material taught in an introductory physics course
- You would probably say: "This is not what you taught, Miss! The exam is not fair! You gave us a test of physics!"
- Well, you would be right
- No matter how good this exam is as a test of physics, it would not adequately reflect the content of the course, so it would lack content validity
- There are 2 important things here:
- Content validity is most often used in academic achievement tests, where there is a clearly defined and detailed body of material called a curriculum. A test built from the curriculum must sample the material accurately and in the right proportions
- Indeed, the most important aspect of the validity of an academic achievement test is its content validity
- Content validity requires a detailed and comprehensive content domain

- Evaluation of content validity starts by carefully and comprehensively defining the construct to be measured
- Then material from this body of content is sampled and included in the test
- The proportion of material that appears on the test should be similar to the proportion of that same material in the content of the curriculum
- Content validity is a somewhat subjective process, though, as it involves expert judgments
- Of all types of validity, content validity tends to be the most subject to error, as it relies on human judgment and views on how the content should be sampled will always vary

Slide 9 Concurrent validity
- In concurrent validity two measures of the same construct are administered within the same testing period
- One of the two measures serves as a benchmark or gold standard against which scores on the other test (usually a new measure) are compared
- The benchmark or gold standard is an already existing measure of the construct
- Scores on the two measures are correlated, and if the correlation is high then we could suggest that the non-benchmark measure (the new measure) has concurrent validity
- This correlation between the two measures is called a concurrent validity coefficient
- For example, imagine that I have developed a new measure of loneliness specially designed to capture the unique social context of Jamaica
- To establish the concurrent validity of this Jamaican loneliness scale I administer it to a group of students at the University of the West Indies along with the UCLA loneliness scale, the gold standard of loneliness measures. The UCLA loneliness scale is an already existing loneliness scale
- After processing and editing the data I correlate scores on my Jamaican loneliness scale with those on the UCLA loneliness scale
- I find that scores on the Jamaican loneliness scale correlate 0.60 with scores on the UCLA loneliness scale
- Because of the magnitude of the correlation I would conclude that my measure of Jamaican loneliness has concurrent validity

Slide 10
- There are two major problems with concurrent validity
- First, there must be a gold standard or benchmark measure, that is, an already existing measure of the construct
- However, in psychology there are few such measures against which other measures can be evaluated
- Sometimes there is no gold standard
- Second, if a benchmark test already exists why would one need to develop another measure?

- The answer to this is that a new measure is usually created to improve on an already existing measure in some meaningful way, for example by being longer or shorter, or by tapping some additional dimension(s) that researchers think should be included

Slide 11 Predictive validity
- In predictive validity scores on a measure are used to predict either a future behaviour or performance on a future criterion
- If the measure can predict future behaviour or performance reasonably well then it is said to have predictive validity
- It is estimated by giving a group of people a test and, after a length of time, collecting information on some relevant criterion (behaviour or performance)

Slide 12
- Scores on the test are then correlated with performance on the criterion
- If the correlation between the test and the criterion is sufficiently high then the test is said to have predictive validity
- This correlation is the predictive validity coefficient
- For example, imagine I have created a measure of manual dexterity
- I am interested in using my test of manual dexterity to predict how well someone will do as an electronics assembler
- I administer my measure of manual dexterity to all applicants for a job in an electronics assembly plant
- All applicants are then hired for jobs in the assembly plant
- I then follow these people for 12 months and measure how many computer boards they produce each hour
- Scores on the test of manual dexterity given 12 months earlier are then correlated with the number of computer boards the workers produce after 1 year working as electronics assemblers
- I find that scores on the test of manual dexterity are correlated 0.70 with the number of computer boards produced by these workers
- In general, the higher the correlation between manual dexterity score and later job performance, the greater the predictive validity

Slide 13
- There are two major problems with predictive validity
- First, the longer the interval between administering the test and noting performance on the criterion task, the lower the correlation or predictive validity coefficient
- The interval between testing and later evaluation can be so long that many other variables may influence the criterion
- The second problem is that it may be very difficult to find an appropriate criterion against which to evaluate future performance
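The manual dexterity example can be sketched in code. The block below computes a predictive validity coefficient and, as an addition not in the lecture, fits a simple least-squares line to show what "using test scores to predict the criterion" might look like in practice. The scores, the function name `fit_criterion`, and the applicant score of 16 are all hypothetical.

```python
from math import sqrt


def fit_criterion(test_scores, criterion):
    """Return (r, slope, intercept) for predicting the criterion from the test.

    r is the predictive validity coefficient (Pearson correlation);
    slope and intercept describe the least-squares prediction line.
    """
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data: dexterity scores at hiring, boards per hour 12 months later.
dexterity_at_hiring = [12, 15, 9, 20, 17, 11, 14, 18]
boards_per_hour = [5, 6, 4, 9, 7, 5, 7, 8]

r, slope, intercept = fit_criterion(dexterity_at_hiring, boards_per_hour)

# Predicted output for a new applicant with a (hypothetical) score of 16.
predicted = intercept + slope * 16

print(f"predictive validity coefficient: r = {r:.2f}")
print(f"predicted boards per hour for a score of 16: {predicted:.1f}")
```

With real data the coefficient would typically be lower than in this tidy made-up sample, and it would be expected to shrink as the interval between testing and the criterion grows.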


Slide 14
- Together, concurrent and predictive validity form criterion-related validity, because they involve correlation of a new measure with some criterion
- The criterion can be another test administered at the same time as the new measure (concurrent validity)
- The criterion can be future performance on a measure or task, or a future behaviour (predictive validity)
- So far we have discussed validating a measure by just examining its items and by using another measure
- The next type of validity we will discuss uses many measures to evaluate the validity of a new measure

Slide 15 Construct Validity
- Construct validity is the degree to which a test measures the theoretical construct it was designed to assess
- A construct is simply the psychological concept which the test developer hopes his/her test will measure
- Demonstration of construct validity is an on-going process, usually conducted through a series of small studies
- The test developer begins the process by carefully defining and specifying the features of the construct the test is supposed to measure
- A further step in defining the construct is to include a description of how it is similar to and distinct from related constructs
- This framework or network of relationships is called a nomological network

Slide 16
- Using the network of relationships, the test developer creates a series of hypotheses about how the new test will correlate with tests of the related constructs
- A series of small studies is then conducted to test the various predictions
- If the pattern of hypothesized relationships is found then the new test is said to have good construct validity
- For example, a test developer who wants to create a measure of loneliness would first precisely define loneliness and how it relates to a series of other psychological constructs
- She defines loneliness as a perceived discrepancy between a person's actual social relationships and their desires for social relationships
- These desires may be for a difference in the quality of relationships or for a difference in the quantity of social relationships
- She suggests that loneliness should be related to how often a person engages in solitary activities, to feelings of sadness, and to thoughts of not belonging, of being worthless and of being undesirable as a social partner


- Loneliness should not be strongly related to depression, as not all depressed people will be lonely. It should also not be correlated with pleasurable solitary activities such as listening to music or watching television
- Based on this network of relationships, the test developer hypothesizes that her new measure of loneliness will have a strong positive relationship to measures of the quality of social relationships and the quantity of social relationships a person has. She also hypothesizes that it should be moderately but not strongly related to tests of depression
- The test developer then conducts a series of studies which test each of these hypotheses
- The pattern of relationships from the correlations of the scores will determine whether the new test has construct validity

Slide 17
- If the pattern of relationships is not found then one of three different conclusions can be made:
1) The research studies are flawed
2) The theory of how the construct should be related to others is flawed or incorrect and should be revised
3) The new test does not measure the construct

Slide 18
- Construct validity depends heavily on the pattern of relationships between a new measure and measures of similar and different constructs
- There are two problems with establishing this type of validity
- First, the pattern of hypothesized relationships can be quite difficult to find
- Not many new tests have been found to have construct validity
- Second, the process of evaluating evidence for construct validity, especially when a hypothesized relationship is not found, can be quite subjective
- Campbell and Fiske developed the Multitrait-Multimethod Matrix (MTMM) to determine construct validity
- This involves the use of two or more methods to measure two or more constructs (Allen & Yen, 2002)
- The Multitrait-Multimethod Matrix (MTMM) provides a powerful approach to establishing the pattern of relationships needed in construct validity

Slide 19 Multitrait-Multimethod Matrix
- It is the gold standard for establishing the construct validity of a measure
- Construct validity in the Multitrait-Multimethod Matrix consists of convergent validity and discriminant validity
- Convergent validity is demonstrated when there is a strong correlation between the scores on different measures of the same construct


- Discriminant validity is demonstrated when there is a low correlation between tests measuring different constructs
- In the Multitrait-Multimethod Matrix, the use of two or more methods to measure two or more constructs is based on the view that the construct can be separated from variability due to the methods of collecting data

Slide 20
- There are three key features of the MTMM approach
- First, similar or identical constructs are measured using different methods and are expected to be strongly correlated
- Second, conceptually similar but distinct constructs measured using the same method or different methods are expected to be less correlated
- Third, different methods of collecting information are used

Slide 21
- An example will make the approach clear
- A test developer wants to validate a measure of hyperactivity for children
- He asks the children's school teachers to complete the measure of hyperactivity for these children's behaviour at school, and also asks these children's mothers to complete the identical measure for these children's behaviour at home
- This represents the same construct assessed using different methods
- He also asks the children's school teachers to complete a measure of these children's restless behaviours at school, and asks the mothers to complete the same measure of restlessness for their children's behaviour at home
- This represents a conceptually similar variable measured in different ways

Slide 22
- All measures are correlated with each other
- The correlations are placed into a correlation matrix
- A correlation matrix is like an Excel spreadsheet
- Each measure has a spot along the columns
- Each measure also has a spot along the rows
- Where a column intersects a row, the cell holds the correlation between those two variables

Slide 23
- Where a cell in the matrix represents the correlation of one variable with itself there is a correlation of 1.00 (a perfect correlation)
- In a Multitrait-Multimethod Matrix the 1.00s along the diagonal are replaced with reliability coefficients
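To make the layout of the matrix concrete, here is a minimal sketch of an MTMM matrix for the hyperactivity example, with reliability coefficients on the diagonal. The values 0.84, 0.03 and 0.01 are the ones reported in the lecture's Slide 24 example; every other correlation and all four reliabilities (0.90) are assumed purely for illustration, as are the names `build_mtmm` and `cell_kind`.

```python
# Each measure is a (trait, method) pair from the hyperactivity example.
MEASURES = [
    ("hyperactivity", "teacher"),
    ("hyperactivity", "mother"),
    ("restlessness", "teacher"),
    ("restlessness", "mother"),
]

# Assumed reliability coefficients (not given in the lecture).
RELIABILITY = {measure: 0.90 for measure in MEASURES}

# Correlations between pairs of measures, keyed by the unordered pair.
PAIR_R = {
    frozenset([MEASURES[0], MEASURES[1]]): 0.84,  # lecture, Slide 24
    frozenset([MEASURES[0], MEASURES[2]]): 0.01,  # lecture, Slide 24
    frozenset([MEASURES[0], MEASURES[3]]): 0.03,  # lecture, Slide 24
    frozenset([MEASURES[1], MEASURES[2]]): 0.05,  # assumed
    frozenset([MEASURES[1], MEASURES[3]]): 0.04,  # assumed
    frozenset([MEASURES[2], MEASURES[3]]): 0.80,  # assumed
}


def build_mtmm(measures, reliability, pair_r):
    """Build a square matrix: reliabilities on the diagonal, correlations elsewhere."""
    matrix = []
    for row in measures:
        cells = []
        for col in measures:
            if row == col:
                cells.append(reliability[row])
            else:
                cells.append(pair_r[frozenset([row, col])])
        matrix.append(cells)
    return matrix


def cell_kind(a, b):
    """Classify a cell by comparing the traits and methods of two measures."""
    if a == b:
        return "reliability"
    if a[0] == b[0]:
        return "convergent"    # same construct, different method
    return "discriminant"      # different constructs


matrix = build_mtmm(MEASURES, RELIABILITY, PAIR_R)

# Collect the off-diagonal cells (upper triangle only, since the matrix is symmetric).
convergent, discriminant = [], []
for i in range(len(MEASURES)):
    for j in range(i + 1, len(MEASURES)):
        kind = cell_kind(MEASURES[i], MEASURES[j])
        (convergent if kind == "convergent" else discriminant).append(matrix[i][j])

for measure, row in zip(MEASURES, matrix):
    print(f"{measure[0]:>13} ({measure[1]:>7}):", "  ".join(f"{v:.2f}" for v in row))
print("convergent correlations:  ", convergent)
print("discriminant correlations:", discriminant)
```

Keying each measure by trait and method keeps the classification of a cell (reliability, convergent, discriminant) a matter of comparing labels, which is the same comparison a reader makes by eye when inspecting the matrix.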


Slide 24
- In the MTMM matrix you can see that the teachers' ratings of hyperactivity and the mothers' ratings of hyperactivity are strongly correlated
- This suggests that the measure of hyperactivity has convergent validity, as the correlation of the teachers' ratings of hyperactivity with the mothers' ratings of hyperactivity yielded a strong correlation coefficient of 0.84. This is the same construct being measured using different methods
- You can also see that teachers' ratings of hyperactivity and mothers' ratings of restlessness have a low correlation coefficient of 0.03. These are two distinct constructs assessed using different methods
- Similarly, teachers' ratings of hyperactivity and teachers' ratings of restlessness have a low correlation coefficient of 0.01. Here the two distinct constructs are assessed using the same method
- These low correlations suggest that the hyperactivity measure has discriminant validity, as different constructs measured using the same and different methods had low correlation coefficients when their relationships were examined

Slide 25
- To interpret an MTMM matrix we need to follow several rules:
1) Correlations between the same construct measured using different methods should be large (convergent validity)
2) Correlations between different constructs should be low (discriminant validity)
3) Correlations between different constructs should be smaller than correlations between the same constructs
4) Correlations between different constructs measured using different methods should be lower than correlations measuring the same construct using different methods (no method bias)

Slide 26
- Here is another example of the MTMM
- A researcher has developed a new student-based measure of depression
- The researcher correlates this new measure with a scale where teachers rate how depressed the student is
- The researcher correlates the new measure with a scale where teachers rate how lonely the student is
- The researcher also correlates the new measure of depression with students' ratings of loneliness
- Does the new measure have construct validity?

References
1. Allen, M. J. & Yen, W. M. (2002). Introduction to Measurement Theory. Prospect Heights, Illinois: Waveland Press.
2. Anastasi, A. & Urbina, S. (1997). Psychological Testing (Seventh Edition). Upper Saddle River, New Jersey, USA: Prentice Hall.

