
DELPOSO, JOANNA MARIE B.

BS IOP II-2

PSYCHOMETRICS 2 NOVEMBER 28, 2011

Steps in Test Construction


1. Choose the purpose of the instrument
   - Is it a screener? What will people use the test for? What decisions are you trying to make?
2. Identify the construct(s)
   - Review the literature; ask experts.
3. Construct initial item pool
   - Try to develop three times as many items as you wish to have on the test.
   - Develop items based on the purpose of the test and on who will take it.
   - Take time now to develop "good" items: try to cover the entire domain of the construct, but do not write items that are likely to "load" on other constructs.
   - Make items as simple to read as possible.
   - Reverse-code some items.
4. Review, revise, and try out
   - Conduct an expert review and a bias review.
   - Administer the items to a few individuals who are representative of potential test takers and obtain their feedback (if possible).
5. Alpha study
   - Administer the items to a sample of subjects who are representative of potential test users.
   - Do item analysis and choose final items (criteria include coefficient alpha, domain coverage, difficulty level, and endorsement rates).
6. Beta study
   - Reliability studies (test-retest, alternate forms, internal consistency).
   - Validity studies (known groups, convergent and discriminant, predictive, factor analysis).
   - Collect normative data from a representative sample.

Steps in Constructing the Employee Assistance Program Inventory

1. Choose the purpose of the instrument.
   - Intake/screening tool for professionals in EAPs who provide counseling to working adults.
   - Purpose: to rapidly identify common psychological problems of employed adults in order to guide referrals or short-term interventions.
2. Identify the construct(s).
   - Based on the authors' experience and the literature, 50 possible assessment areas were identified.
   - A survey listing the 50 areas was sent to 200 professionals randomly selected from the EAPA database.
   - Respondents were asked to choose the 10 areas that would best meet needs for initial screening.
3. Construct initial item pool.
   - A literature review was conducted to identify the behavioral expression of each content area.
   - Initial item pool: 344 items.
4. Review, revise, and try out.
   - A 5-member bias panel (gender, ethnic background, religious belief) identified 10 items as having potential bias; 4 were deleted and 6 were rewritten.
   - A 5-member expert panel of PhD psychologists working in EAPs reviewed the items for scoring. Criterion: 4/5 agreement. The panel failed to agree on 34 of the 344 items; 33 were deleted and one was rewritten (6 of these items were also considered problematic by the bias committee).
5. Alpha study.
   - The remaining 307 items were administered to 215 employed adults.
   - Items were eliminated so that:
     a. all scales would contain an equal number of items;
     b. each scale would be as internally consistent as possible while still providing comprehensive sampling of the content area;
     c. when items were similar in terms of item characteristics, items providing the broadest domain coverage would be retained, and items with significant relations with gender or ethnic group would be eliminated.
   - Final items: 120. Alpha ranged from .73 to .92 (M = .86).
6. Beta study: detailed results.

General Steps in Test Construction

1. Before or after each class period, write ideas for test items on index cards based on the lesson's learning objectives.
2. Produce a test blueprint, plotting the content to be tested against some hierarchy representing levels of cognitive difficulty or depth of processing (e.g., a commonly used three-category hierarchy based on the Bloom taxonomy is knowledge/understanding, application, and higher-order thinking).
3. Sort the test item cards from step 1 to match the blueprint. Create new test ideas as indicated by the blueprint.
4. Write the first draft of test items on index cards. Include one item per card and indicate the topic and level of difficulty on the card.
5. Put all the cards on the same topic together in order to cross-check questions so that no question gives away the answer to another.
6. Put the cards aside for one or two days.
7. Reread the items from the point of view of the student, checking for construction errors.
8. Order the selected questions logically:
   a. place some simpler items at the beginning to ease students into the exam;
   b. group item types together under common instructions to save reading time;
   c. if desirable, order the questions logically from a content standpoint (e.g., chronologically, in conceptual groups, etc.).
9. Have someone else review the questions for clarity.
10. Time yourself actually taking the test, then multiply that time by four to six depending on the level of the students. Remember, there is a certain absolute minimum time required simply to physically record an answer, aside from the thinking time.
11. Once the test is given and graded, analyze the items and student responses for clues about well-written and poorly written items, as well as problems in understanding of instruction. Record any problems indicated by student responses on the item card so the information can be used in future exams.
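The item analysis in step 11 is commonly done with two classical statistics: the difficulty index (proportion of students answering correctly) and an upper/lower-group discrimination index. The 27% grouping convention and the sample data below are illustrative assumptions, not part of Svinicki's text.

```python
# Sketch of a post-exam item analysis: classical difficulty (proportion
# correct) and a discrimination index comparing the top and bottom
# scoring groups. The 27% group size and data are assumptions.

def analyze_item(students, item, group_frac=0.27):
    """students: list of (total_score, answers), where answers maps
    an item name to 1 (correct) or 0 (incorrect)."""
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    n = max(1, int(len(ranked) * group_frac))        # size of each group
    upper, lower = ranked[:n], ranked[-n:]
    difficulty = sum(s[1][item] for s in ranked) / len(ranked)
    discrimination = (sum(s[1][item] for s in upper)
                      - sum(s[1][item] for s in lower)) / n
    return difficulty, discrimination

# Hypothetical class of 10: only the top half answered "q1" correctly.
students = [(t, {"q1": 1 if t >= 6 else 0}) for t in range(10, 0, -1)]
p, d = analyze_item(students, "q1")   # p = 0.5, d = 1.0
```

A near-zero or negative discrimination index flags an item the strong students missed as often as the weak ones, which usually signals a poorly written item or a gap in instruction.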
Revised from "Test Construction: Some Practical Ideas," Marilla D. Svinicki, The University of Texas at Austin

10 Steps of Thesis Writing

Writing a thesis is a complicated task that requires much time spent thinking, analyzing, researching, and drawing the right conclusions. That is why these 10 steps of thesis writing can be very useful.

1st Step: Choose the topic for your thesis. Make sure you will be able to find enough sources to investigate the topic you have chosen; you do not want to have to change topics and rush to prepare your thesis on time.
2nd Step: Collect the sources. The most important factor in successful thesis writing is relevant information. Use the public library, your university library, internet resources, the latest research papers, etc.
3rd Step: Make a plan of your thesis. It is necessary that you consult your supervisor on the plan.
4th Step: Compose the introduction of your thesis. Say what you will be investigating and how you plan to do it.
5th Step: Present data taken from the sources you have collected. Be very careful when copying data from a source.
6th Step: Interpret your data. Express your personal point of view on the basis of what you have read.
7th Step: Draw a strong conclusion. A good conclusion answers the questions: What did you investigate? Did you manage to achieve your purposes? Did you face any difficulties? What helped you overcome them? Give your recommendations.

8th Step: Make appendices. They are essential if you want a high grade.
9th Step: Edit the final version of your thesis.
10th Step: Enjoy your finished work!

Reliability and Validity

Measurement experts (and many educators) believe that every measurement device should possess certain qualities. Perhaps the two most common technical concepts in measurement are reliability and validity. Any kind of assessment, whether traditional or "authentic," must be developed in a way that gives the assessor accurate information about the performance of the individual. At one extreme, we wouldn't have an individual paint a picture if we wanted to assess writing skills.

A. Reliability

1. Definition
- The degree of consistency between two measures of the same thing (Mehrens and Lehman, 1987).
- The measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time (Worthen et al., 1993).

For example, if we wish to measure a person's weight, we would hope that the scale would register the same measure each time the person stepped on it. Another approach to studying consistency would be to have a whole group of people weigh themselves twice (changing scales and/or times and/or the reader and recorder of the measure) and determine whether the relative standing of the persons remains about the same. This would give us an estimate of the reliability of the measure. Or, if we wanted to measure the length of a piece of wood, the tape measure had better yield the same result each time. Even if someone else remeasured the wood, the result should be consistent.

Assume that you gave a student a history test yesterday and then gave the test again today, and found that the student scored very high the first day and very low the second. It could be that the student had an off day, or that the test is simply unreliable.
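The "relative standing" idea in the weighing example is usually quantified as a correlation between the two administrations, giving a test-retest reliability coefficient. A minimal sketch, with invented scores:

```python
# Minimal sketch of a test-retest reliability coefficient: the Pearson
# correlation between two administrations of the same measure.
# The weight data below are invented for illustration.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Five people weighed twice; their relative standing is nearly
# preserved, so the reliability coefficient is close to 1.
day1 = [150, 162, 170, 181, 195]
day2 = [151, 160, 171, 183, 194]
r = pearson_r(day1, day2)
```

A coefficient near 1 indicates that people keep roughly the same rank order across administrations; a coefficient near 0 would indicate the kind of instability described in the history-test example.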
A student's test score may vary for many reasons: the amount of the characteristic we are measuring may change across time; the particular questions we ask in order to infer a person's knowledge could affect the score; any change in directions, timing, or amount of rapport with the test administrator could cause score variability; inaccuracies in scoring a test paper will affect the scores; and finally, such things as health, motivation, degree of fatigue, and good or bad luck in guessing could cause score variability.

B. Validity

1. Definition
- Truthfulness: does the test measure what it purports to measure? The extent to which certain inferences can be made from test scores or other measurement (Mehrens and Lehman, 1987).
- The degree to which tests accomplish the purpose for which they are being used (Worthen et al., 1993).

For a test to be valid, or truthful, it must first be reliable. If we cannot even get a bathroom scale to give us a consistent weight measure, we certainly cannot expect it to be accurate. Note, however, that a measure might be consistent (reliable) but not accurate (valid): a scale may record weights as two pounds too heavy each time. In other words, reliability is a necessary but insufficient condition for validity. (Neither validity nor reliability is an either/or dichotomy; there are degrees of each.) Since a single test may be used for many different purposes, there is no single validity index for a test. A test that has some validity for one purpose may be invalid for another.

2. Kinds of Validity Evidence

Content Validity Evidence - Refers to the extent to which the content of a test's items represents the entire body of content to be measured. The basic issue in content validation is representativeness: how adequately does the content of the test represent the entire body of content to which the test user intends to generalize?
Since the responses to a test are only a sample of a student's behavior, the validity of any inferences about that student depends upon the representativeness of that sample. Two questions to be asked:
1. To what degree does the test include a representative sample of all important parts of the behavioral domain?
2. To what extent is the test free from the influence of irrelevant variables that would threaten the validity of inferences based on the observed scores?

Criterion-Related Validity - Refers to the extent to which one can infer from an individual's score on a test how well she will perform some other external task or activity that is supposedly measured by the test in question. That is, is the test score useful in predicting some future performance (predictive validity), or can the test score be substituted for some less efficient way of gathering data (concurrent validity)? Examples of criteria are success in school, success in class, or success on the job as an employee.

Construct Validity - Words like assertiveness, giftedness, and hyperactivity refer to abstract ideas that humans construct in their minds to help them explain observed patterns or differences in the behavior of themselves or other people. Intelligence, self-esteem, aggressiveness, achievement motivation, creativity, critical thinking ability, reading comprehension, mathematical reasoning ability, shyness, curiosity, hypocrisy, and procrastination are also examples of constructs. A construct is an unobservable, postulated attribute of individuals created in our minds to help explain or theorize about human behavior. Since constructs do not exist outside the human mind, they are not directly measurable. Construct validity, in other words, is the degree to which one can infer certain constructs in a psychological theory from the test scores.

For example, people who are interested in studying a construct such as creativity have probably hypothesized that creative people will perform differently from those who are not creative. It is possible to build a theory specifying how creative people (people who possess the construct creativity) behave differently from others. Once this is done, creative people can be identified by observing the behavior of individuals and classifying them according to the theory. Suppose one wishes to build a paper-and-pencil test to measure creativity. Once developed, the creativity test would be considered to have construct validity to the degree that the test scores are related to the judgments made from observing behavior identified by the psychological theory as creative. If the anticipated relationships are not found, then the construct validity of the inference that the test is measuring creativity is not supported.

Face Validity - Refers to whether the test looks valid "on the face of it." That is, would untrained people who look at or take the test be likely to think the test is measuring what its author claims? Face validity is often a desirable feature of a test in the sense that it is useful from a public-acceptance standpoint. If a test appears irrelevant, examinees may not take the test seriously, or potential users may not consider the results useful.
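The known-groups logic in the creativity example can be sketched as a comparison of test-score means between theory-identified groups; a common effect size for this is Cohen's d. The scores and group labels below are invented for illustration.

```python
# Sketch of a known-groups construct-validity check: if the test
# measures creativity, the theory-identified "creative" group should
# clearly outscore the comparison group. All data here are invented.
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    n1, n2 = len(group1), len(group2)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

creative_scores = [78, 85, 82, 90, 88]      # classified creative by the theory
comparison_scores = [60, 66, 71, 64, 69]    # classified not creative
d = cohens_d(creative_scores, comparison_scores)
```

A large positive d is the "anticipated relationship" the passage describes; a d near zero would fail to support the inference that the test measures creativity.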
