
PSYCHOLOGICAL ASSESSMENT

BASIC CONCEPT IN ASSESSMENT

History of Testing

Key Events and People
• The Chinese were the first to use tests for their public officials
• Galton is considered to be the founder of mental tests and measurement
• Cattell was the first to use the term "mental test," in an article he published in 1890
• 1905 - The Binet-Simon Scale was released; it started the era of intelligence testing
• 1917 - Army Alpha and Army Beta were released; the first group intelligence tests
• 1918 - The Woodworth Personal Data Sheet was released; it ushered in the era of personality testing
• 1920s - The Scholastic Aptitude Test (SAT) and the Rorschach Inkblot Test were developed
• 1930s - David Wechsler released the Wechsler-Bellevue Intelligence Scale
• 1940s - The Minnesota Multiphasic Personality Inventory was released

Definition of Terms
• Psychological Test - a device or procedure in which a sample of an individual's behavior is obtained, evaluated, and scored using standardized procedures (AERA et al., 1999)
• Psychological Measurement - a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors
  *Rules: administration guidelines, scoring criteria
• Psychological Assessment - any systematic procedure for collecting information that can be used to make inferences about the characteristics of people or objects (AERA et al., 1999)
  - Tests are one systematic method of collecting information
  - Aside from tests, other tools include interviews, behavioral observations, historical records, etc.
  - ASSESSMENT involves the integration of information obtained from multiple sources using multiple methods
  - A broader and more comprehensive process than testing
• Testing, assessment, and measurement are sometimes used interchangeably by many professionals
• Assessment is the contemporary, popular, and accepted term
• Evaluation is an activity that involves judging or appraising the value or worth of something, e.g., assigning grades to students
• Psychometrics is the science of psychological measurement
• Psychometrician is a psychological or educational professional who has specialized in the area of testing/assessment

Types of Tests
• Maximum performance tests – designed to assess the upper limits of the examinee's knowledge and abilities
  - referred to as ability tests (including achievement tests)
  - test items may be scored as either "correct" or "incorrect"
  - examinees are encouraged to give their best
  - Subcategories: Achievement vs. Aptitude; Objective vs. Subjective; Speed vs. Power

ACHIEVEMENT vs. APTITUDE
• Achievement: assess knowledge or skills of individuals in content where learning took place; linked to a specific program; measure what has been learned or achieved
• Aptitude: measure cognitive skills, abilities, and knowledge as a result of overall life experiences; cumulative impact of life experiences; used to predict future performance or reflect potential

OBJECTIVE vs. SUBJECTIVE
• Objective: impartiality or absence of personal bias in scoring (e.g., multiple choice, true-false, matching)
• Subjective: requires personal judgment in scoring (e.g., essays)

SPEED vs. POWER
• Speed: performance reflects differences in the speed of performance; easy items with a strict time limit
• Power: performance reflects the difficulty of the items the examinee is able to answer correctly; ample time is given to complete the problems
• The distinction between the two is not absolute; most tests are a combination of both

• Typical response tests – designed to measure the typical behavior and characteristics of examinees
  - often referred to as personality tests (non-cognitive characteristics such as attitudes, behaviors, emotions, and interests)
  - scales or inventories
  - Objective Personality Tests (aka Structured Personality Tests) vs. Projective Personality Tests (aka Projective Techniques)

Type of Scores
• Norm-referenced score interpretations compare an examinee's performance to the performance of other people
  - The standardization sample serves as the reference group
  - Interpretation is relative to the performance of other people
• Criterion-referenced score interpretations compare an examinee's performance to a specified level of performance
  - Emphasis is on what the examinee knows or can do
  - Interpretation is absolute

Uses of Tests
• Diagnosis
• Treatment Planning and Evaluation
• Selection, Placement, and Classification
• Self-understanding
• Evaluation
• Licensing
• Program Evaluation
• Research

Participants in the Assessment Process
• People Who Develop Tests
  - For commercial purposes
  - For research purposes
  - For teaching and evaluation purposes
• People Who Use Tests
  - Those who administer and interpret tests
  - Those who make decisions based on test results
• People Who Take Tests
• People Who Market and Sell Tests

BASIC STATISTICS OF MEASUREMENT

Scales of Measurement
• What is Measurement?
  - A set of rules for assigning numbers to represent objects, traits, attributes, or behaviors

Level of Measurement
• Nominal Scales classify people into categories, classes, or sets
  e.g., Gender, Religion, Ethnicity, College Degree
• Ordinal Scales rank people or objects according to the amount of a characteristic they display or possess
  e.g., Percentile ranks, age equivalents, grade equivalents
• Interval Scales rank people or objects like an ordinal scale, but on a scale with equal units
  e.g., IQ and other standard scores
• Ratio Scales have the properties of interval scales plus a true zero point
  e.g., Height, Weight, Acceleration

DESCRIPTION OF TEST SCORES

Measures of Central Tendency
• Mean is the arithmetic average of a distribution
• Median is the score or potential score that divides a distribution in half
• Mode is the most frequently occurring score in a distribution

Measures of Variability
• Range is the distance between the smallest and largest score in a distribution
• Standard deviation is a measure of the average distance that scores vary from the mean of the distribution
• Variance is a measure of variability that has special meaning as a theoretical concept in measurement theory and statistics
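As a quick illustration of the measures above, here is a minimal Python sketch using only the standard library's statistics module; the score values are hypothetical and chosen only for the example.

```python
import statistics

scores = [8, 10, 12, 12, 13, 15, 18]  # hypothetical raw test scores

mean = statistics.mean(scores)            # arithmetic average
median = statistics.median(scores)        # score that divides the distribution in half
mode = statistics.mode(scores)            # most frequently occurring score
score_range = max(scores) - min(scores)   # largest score minus smallest score
sd = statistics.pstdev(scores)            # standard deviation (average distance from the mean)
variance = statistics.pvariance(scores)   # variance (standard deviation squared)

print(mean, median, mode, score_range, round(sd, 2), round(variance, 2))
```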
Normal Distribution
• A symmetrical, unimodal distribution in which the mean, median, and mode are all equal

Meaning of Test Scores

Raw Scores
• The number of items scored or coded in a specific manner, such as correct/incorrect, true/false, and so on
• Does not give much information about a person or object

Norm-Referenced Interpretations

Norms and Reference Groups
• Standardization samples should be representative of the type of individuals who will take the test
• Normative data need to be current, and the samples should be large enough to produce stable statistical information

Derived Scores
• Standard scores are transformations of raw scores to a desired scale with a predetermined mean and standard deviation
• Linear transformations of raw scores
• z-scores are the simplest of the standard scores and indicate how far above or below the mean of the distribution the raw score is, in standard deviation units

Common Standard Scores
• z-scores: mean 0, SD 1; useful in research but difficult to use and interpret
• T-scores: mean 50, SD 10; personality tests (MMPI, NEO-PI-R)
• IQs: mean 100, SD 15; achievement and ability tests (Wechsler Scales, K-ABC)
• CEEB scores: mean 500, SD 100; entrance exams (GRE, SAT)
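The conversions in the table are simple linear transformations of z-scores. A minimal Python sketch is below; the raw-score distribution is hypothetical, while the target means and SDs are the conventional values listed above.

```python
import statistics

raw_scores = [35, 42, 47, 50, 53, 58, 65]   # hypothetical raw scores from one test
mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)

def to_standard(raw, new_mean, new_sd):
    """Rescale a raw score to a standard-score scale with the desired mean and SD."""
    z = (raw - mean) / sd
    return new_mean + z * new_sd

for raw in raw_scores:
    z = to_standard(raw, 0, 1)          # z-score: mean 0, SD 1
    t = to_standard(raw, 50, 10)        # T-score: mean 50, SD 10
    iq = to_standard(raw, 100, 15)      # deviation IQ: mean 100, SD 15
    ceeb = to_standard(raw, 500, 100)   # CEEB score: mean 500, SD 100
    print(f"raw={raw:3d}  z={z:+.2f}  T={t:5.1f}  IQ={iq:6.1f}  CEEB={ceeb:6.1f}")
```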
Normalized Standardized Scores
• Standard scores based on underlying distributions that were not originally normal but were transformed into normal distributions
• Stanine scores: mean 5, SD 2
• Wechsler scaled scores: mean 10, SD 3
• Normal curve equivalents (NCE): mean 50, SD 21.06

Other Commonly Used Measures
• Percentile Rank
• Grade Equivalents

RELIABILITY AND VALIDITY

What is Reliability?
• General: dependability, trustworthiness, and confidence in something
• In psychological measurement: also refers to stability and consistency

Characteristics
• Common notion: reliability of a test?
• Standards for Educational and Psychological Testing (AERA et al., 1999): Reliability is considered to be a characteristic of scores or assessment results, and NOT of tests themselves

Important Points
• In the context of measurement, reliability refers to the consistency or stability of assessment results
• Some degree of measurement error is inherent in all measurement

Classical Test Theory
• Classical Test Theory (CTT) – the most influential theory for understanding measurement issues
• Also known as True Score Theory
• Charles Spearman laid the groundwork for CTT in the early 1900s
• Every score on a mental test is composed of two components: a true score and an error score
  Xi = T + E
  (Xi is the obtained or observed score)

Measurement Error
• The difference between an individual's obtained and true score
• CTT focuses on random measurement error - the result of chance factors that can either increase or decrease an individual's observed score
• Random events vary from person to person, from test to test, and from administration to administration
• Limits the extent to which test results can be generalized and reduces the confidence we have in test results

Sources of Measurement Error
• Content Sampling Error
  - Content sampling is typically considered the largest source of error in test scores
• Time Sampling Error
  - Measurement error due to time sampling reflects random fluctuations in performance from one situation to another and limits our ability to generalize test scores across different situations

Reliability Coefficients
• Reliability can be defined as the proportion of test score variance due to true score differences
• Var(X) = Var(T) + Var(E); reliability = Var(T) / Var(X)
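A small simulation can make the variance decomposition above concrete. This is only a sketch under the CTT assumptions (independent random error added to a true score); all of the numbers are invented.

```python
import random
import statistics

random.seed(1)

# Hypothetical true scores for 500 examinees, plus independent random error (X = T + E)
true_scores = [random.gauss(100, 15) for _ in range(500)]
errors = [random.gauss(0, 5) for _ in range(500)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# Var(X) is approximately Var(T) + Var(E); reliability is the true-score share of Var(X)
reliability = var_t / var_x
print(f"Var(T)={var_t:.1f}  Var(E)={var_e:.1f}  Var(X)={var_x:.1f}  reliability={reliability:.2f}")
```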
Major Types of Reliability
• Test-Retest
• Alternate Forms
  - Simultaneous administration
  - Delayed administration
• Split-half
• Coefficient Alpha or KR-20
• Inter-rater
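For the internal-consistency estimates in this list, coefficient alpha can be computed directly from a person-by-item score matrix. The sketch below uses an invented matrix purely for illustration.

```python
import statistics

# Hypothetical item scores: rows = examinees, columns = items
data = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
]

k = len(data[0])                                      # number of items
item_vars = [statistics.pvariance(col) for col in zip(*data)]
total_scores = [sum(row) for row in data]
total_var = statistics.pvariance(total_scores)

# Coefficient alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"coefficient alpha = {alpha:.2f}")
```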
Standard Error of Measurement (SEM)
• Reliability coefficients are useful when comparing the reliability produced by different tests
• The standard error of measurement (SEM) is a more practical statistic when interpreting the test scores of individuals
• The greater the reliability of a test score, the smaller the SEM and the more confidence we have in the precision of test scores

Confidence Intervals
• A confidence interval reflects a range of scores that will contain the individual's true score with a prescribed probability
• A major advantage of the SEM and the use of confidence intervals is that they remind us that measurement error is present in all scores and that we should interpret scores cautiously
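A brief sketch of how the SEM and a confidence interval might be computed. The score scale (SD 15), the reliability coefficient, and the observed score below are hypothetical; 1.96 is the usual multiplier for a 95% interval.

```python
import math

sd = 15             # standard deviation of the score scale (e.g., a deviation IQ scale)
reliability = 0.90  # hypothetical reliability coefficient of the scores
observed = 108      # hypothetical observed score for one examinee

# SEM = SD * sqrt(1 - reliability): higher reliability means a smaller SEM
sem = sd * math.sqrt(1 - reliability)

# 95% confidence interval around the observed score
lower = observed - 1.96 * sem
upper = observed + 1.96 * sem
print(f"SEM = {sem:.2f}; 95% CI = {lower:.1f} to {upper:.1f}")
```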
Validity
• The degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of the tests (AERA et al., 1999)
• Refers to the appropriateness or accuracy of the interpretation of test scores (Reynolds, 1998)

Some Important Points
• "There is a consensus in the profession of psychometrics that older concepts of validity as referring to a test are abandoned in favor of an approach that emphasizes that validity refers to the appropriateness or accuracy of interpretation of test scores. In other words, it is not technically correct to refer to the validity of a test" (p. 155, Reynolds and Livingston, 2012)
• When test scores are interpreted in multiple ways, each interpretation needs to be evaluated

Threats to Validity
• Construct underrepresentation
  - The test is not an adequate representation of the construct
• Construct-irrelevant variance
  - The test measures characteristics, content, or skills that are unrelated to the test construct
• Validity is threatened when a test measures either less or more than the construct it is designed to measure
• Other factors (external to the test) that can influence validity
  - Examinee characteristics (test anxiety, low motivation, faking good, faking bad)
  - Test administration and scoring procedures (deviations from standard procedures, unreliable or biased scoring)
  - Instruction and coaching

Remember…
• Validity is not an all-or-none concept but exists on a continuum
• Reliability is a necessary but insufficient condition for validity
• Validity is a unitary concept
• It is not types of validity but rather types of validity evidence

Current Definition
• Validity is a unitary concept. It is the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of the tests (AERA et al., 1999)

Older Concepts
• Content Validity (content-related validity)
• Criterion Validity (criterion-related validity)
  - Concurrent
  - Predictive
• Construct Validity (construct-related validity)
  - Convergent
  - Divergent

Types of Validity
• Content Validity
  - Involves how adequately the test samples the content area of the identified construct
  - Typically based on professional judgments about the appropriateness of the test content
• Criterion-related Validity
  - Involves examining the relationships between the test and external variables that are thought to be direct measures of the construct
  - Studies examine the relationship between test scores and criterion scores using correlation or regression analyses
• Construct Validity
  - Involves an integration of evidence that relates to the meaning or interpretation of test scores
  - Evidence can be collected using a wide variety of research strategies and designs

Sources of Validity Evidence
• Evidence based on TEST CONTENT
• Evidence based on RELATIONS TO OTHER VARIABLES
• Evidence based on INTERNAL STRUCTURE
• Evidence based on RESPONSE PROCESSES
• Evidence based on CONSEQUENCES OF TESTING

Important Point
• Sources of validity evidence differ in their importance according to factors such as the construct being measured, the intended use of the test scores, and the population being assessed

Evidence Based on Test Content
• Examines the relationship between the content of the test and the construct it is designed to measure
• Item relevance and content coverage are two important factors to be considered
• The preferred approach for establishing the validity of achievement tests and tests used in the selection and classification of employees

Face Validity
• Not an approach to finding evidence
• Refers to a test "appearing" to measure what it is designed to measure

Evidence Based on Relations to Other Variables
• Test-Criterion Evidence
  - Predictive studies: the test is administered, a time interval passes, and then the criterion is measured
  - Concurrent studies: the test is administered and the criterion is measured at the same time
• Convergent and Discriminant Evidence (placed under construct validity in the past)
  - Convergent: when you correlate a test with existing tests that measure the same or similar constructs
  - Discriminant: when you correlate a test with existing tests that measure dissimilar constructs
• Contrasted Group Studies
  - Examining different groups which are expected, based on theory, to differ on the construct the test is designed to measure
Evidence Based on Internal Structure
• By examining the internal structure of a test, we can determine whether its actual structure is consistent with the hypothesized structure of the construct it measures
• Factor Analysis

ITEM DEVELOPMENT AND ITEM ANALYSIS

Item Development
• Item Formats
  - Objective or subjective (how the items are scored)
  - Selected-response items (multiple choice, true or false, matching) or constructed-response items (identification, fill in the blank, short answer, essay)

Selected-Response Items
• Strengths:
  - Typically include a relatively large number of selected-response items
  - Adequate sampling
  - Can be scored in an efficient, objective, and reliable manner
  - Flexible and can be used to assess a wide range of abilities
  - Can reduce the influence of certain construct-irrelevant factors
• Weaknesses:
  - Relatively difficult to write
  - Not able to assess all abilities
  - Subject to random guessing

Constructed-Response Items
• Strengths:
  - Often easier to write
  - Well suited for assessing higher-order cognitive abilities and complex task performance
  - Eliminate random guessing
• Weaknesses:
  - Cannot include as many items
  - More difficult to score in a reliable manner
  - Vulnerable to feigning
  - Vulnerable to the influence of construct-irrelevant factors

General Item Writing Guidelines
• Provide clear directions
• Present the item in as clear and straightforward a manner as possible
• Develop items and tasks that can be scored in a decisive manner
• Avoid inadvertent cues to the answers
• Ensure that items are contained on one page
• Tailor the items to the target population
• Avoid using biased or offensive language
• Determine how many items to include

Typical Response Item Guidelines
• Focus on thoughts, feelings, and behaviors, NOT facts
• Limit statements to a single thought, feeling, or behavior
• Avoid statements that everyone will endorse in a specific manner
• Include statements that are worded in both positive/favorable and negative/unfavorable directions
• Use an appropriate number of options (4 or 5)
• Weigh the benefits of using an odd or even number of options
  - Likert: generally recommended to use an odd number, with the middle choice being neutral or undecided
• For rating scales and Likert items, clearly label the options
• Minimize the use of specific determiners
• With young children, you may want to structure the scale as an interview

Item Analysis
• The reliability and validity of test scores are dependent on the quality of the items
• If you can improve the overall quality of the individual items, you will improve the overall quality of the test

Item Difficulty Index (Item Difficulty Level)
• Defined as the percentage or proportion of test takers who correctly answer the item
• For maximizing variability among test takers, the optimal item difficulty is 0.50
• Different difficulty levels are desirable in many testing applications
• For typical response items, percent endorsement indicates the percentage of examinees who responded to an item in a given manner

Item Discrimination
• Refers to how well an item can accurately discriminate between test takers who differ on the construct being measured
• One way to calculate it: the difference in performance between those who score well on the overall test and those who score poorly
• As a general rule, items with D values over 0.30 are acceptable, and those below 0.30 should be reviewed and possibly revised

Item Discrimination Index
• Item-total correlation is another way of computing an item discrimination index
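To tie these indices together, here is a small Python sketch computing the difficulty index (p), a discrimination index (D) based on upper and lower scoring groups, and an item-total correlation for one item; the 0/1 response matrix is invented for illustration.

```python
import statistics

# Hypothetical scored responses: rows = examinees, columns = items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
totals = [sum(row) for row in responses]
n = len(responses)
item = 2  # analyze the third item

# Item difficulty: proportion of test takers who answered the item correctly
p = sum(row[item] for row in responses) / n

# Discrimination index D: item difficulty in the upper group minus the lower group
order = sorted(range(n), key=lambda i: totals[i], reverse=True)
upper, lower = order[: n // 2], order[n // 2:]
d = (sum(responses[i][item] for i in upper) / len(upper)
     - sum(responses[i][item] for i in lower) / len(lower))

# Item-total correlation (Pearson r between item scores and total test scores)
def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

item_scores = [row[item] for row in responses]
print(f"p = {p:.2f}, D = {d:.2f}, item-total r = {pearson_r(item_scores, totals):.2f}")
```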
DEVELOPING A PSYCHOLOGICAL TEST

Phases:
I. Test Conceptualization
II. Specification of Test Structure and Format
III. Planning Standardization and Psychometric Studies
IV. Plan Implementation

Phase I. Test Conceptualization
• Conduct a review of the literature and develop a statement of need for the test
• Describe the proposed uses and interpretations of results from the test
• Decide who will use the test and why
• Develop conceptual and operational definitions of the constructs you intend to measure
• Determine whether measures of dissimulation are needed, and if so what kind

Phase II. Specification of Test Structure and Format
Describe the test in terms of:
1. The age range appropriate for the measure
2. Testing format (e.g., individual or group, print or computerized); who will complete the test (examiner, examinee, or other informant)
3. The structure of the test (subscales and composites) and how they will be organized
4. A written table of specifications
5. Item formats
   a. Number of items for each subtest or subscale
   b. The type of medium required (verbal cue, visual cue, manipulables, etc.)
6. WRITE ITEMS
7. A summary of instructions for administration and scoring
8. Methods for item development: how items will be determined (content experts), tried out, and finally selected

Phase III. Planning Standardization and Psychometric Studies
• Describe the reference or norm group and the sampling plan for standardization
• Describe your choice of scaling methods
• Outline the reliability studies to be performed
• Outline the validity studies
• List the components of the test (manual, record form, test booklet, or stimulus materials)

Phase IV. Plan Implementation
• Re-evaluate the test content and structure
• Prepare the test manual
• Submit a test proposal

How Difficult Should the Items Be?
• Each test should include:
  - Some easy tasks that most everyone should perform satisfactorily;
  - Some difficult tasks that only a few will perform satisfactorily; and
  - Some tasks that students will perform with varying degrees of success.

What kind of tasks will be given in the tests to the students?
• The choice of the kind of tasks to use in a test should be determined taking into consideration the following:
  - the nature of the subject matter
  - the specific objectives set for the students
  - the type of information needed for evaluative purposes
  - the degree and developmental level of the students
  - the size of the group to be tested

Constructed-Response Items
• Constructed-response items present tasks that require the examinee to create responses within the structure provided by each item.
• The demand on the student to supply information runs from least to most, moving from completion, to short answer, to essay:
  - Completion
  - Short-answer essay
  - Extended-answer essay
Analyzing the Test
• Quantitative item analysis
  - Difficulty Index --- the proportion of students who answered the item correctly.
  - Discrimination Index --- a measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not do well on the overall test.

Establishing Validity in Quantitative Research
• Empirical validation
• Theoretical validation

Empirical Validation
• Criterion or pragmatic validity
  - Concurrent validity
  - Predictive validity

Theoretical Validation
• Face validity
• Content validity
• Construct validity

Multiple Choice
• Multiple-choice format is a measurement format that consists of a question or statement, called a stem, and a series of answer choices. The individual responding to the items chooses the correct or best answer.
• Distracters: the incorrect alternatives, so-called because they are designed to distract students who don't understand the content being measured in the item.
• Stem: the question part of a multiple-choice item.

Stem and Distracters
• Stem: should pose one question or problem.
• Distracters: should reflect common misconceptions. Should be grammatically consistent with the stem—otherwise the grammar will give away which distracter is correct.
• You can assess higher-level thinking with multiple-choice questions.

Matching
• Matching format: a measurement format that requires learners to classify a series of examples using the same alternatives.
• Content should be homogeneous (all material of the same type).
• Use more statements than alternatives so students can't use elimination to get items right.
• Let students know that alternatives can be used more than once or not at all (keeps students from guessing instead of knowing).
• Keep the material all on one page—if you have material for more than one page, break it up into two different groups.
• Don't overload students' working memories with excessively long matching items—if you have more than ten possibilities, break the item into two.

True – False
• True-false format is a measurement format that includes statements of varying complexity that learners have to judge as being correct or incorrect.
• Don't put both a true fact and a false fact in the same item.
• "Most" is a hint that the item is true.
• "Never" and "always" are hints that the item is false.
• Negative wording can be confusing.

Completion
• Completion format is a measurement format that includes a question or an incomplete statement that requires the learner to supply appropriate words, numbers, or symbols.
• It is very difficult to create completion items where only one answer is correct.
• These items usually measure low-level forms of thinking.

Essay
• Essay format is a measurement format that requires students to make extended written responses to questions or problems.
• Essays assess creative and critical thinking.
• They measure the progress students make in creating and defending thesis statements.
• They change how students study and learn.
• Scoring them is a challenge.
• They can be ambiguous.

Rubrics
• A rubric is a scoring scale that describes the criteria for grading.
• Rubrics help students to plan the material that will be assessed.
• Establish criteria based on essential elements that must be present in students' work.
• Decide on the number of levels of achievement for each criterion.
• Develop clear descriptors for each level of achievement.
• Determine a rating scale for the entire rubric.

Performance Assessment
• A form of assessment in which students demonstrate their knowledge and skill by carrying out an activity or producing a product.
• Specify the type of performance.
• Select the focus.
• Structure the evaluation setting—making it realistic but practical.
• Design evaluation procedures.
Evaluating Performance Assessment Products/Processes
• Use systematic observation, the process of specifying criteria for acceptable performance on an activity and taking notes during observation based on the criteria.
• Checklists are written descriptions of the dimensions that must be present in an acceptable performance of an activity. These help you to keep track of student performance.
• Rating scales are written descriptions of the evaluative dimensions of an acceptable performance of an activity and scales of values on which each dimension is rated.

Exhibition
• A performance test or demonstration of learning that is public and usually takes an extended time to prepare.
• Examples: a music recital, an art exhibit, a project that is presented to a class.

Common Error No. 1
• Double-barreled questions
  Single questions that ask for opinions about two different things

Common Error No. 2
• Vague, ambiguous words
  When the usual response to a question is "What do you mean?", the item still has words whose meaning is not clear to the respondents.

Common Error No. 3
• Overlapping alternatives
  One cardinal rule for question writing is to make the alternatives mutually exclusive, i.e., a respondent can give one and only one answer to a question

Common Error No. 4
• Double negatives
  Any question that asks respondents whether they approve or disapprove of prohibiting something, not buying something, or not having something is a potential source of confusion

Common Error No. 5
• Intentions to act
  Questions that ask people to predict their own behavior

Common Error No. 6
• Leading questions
  Questions that suggest the right answer

Common Error No. 7
• Loaded questions
  Questions that are emotionally charged

Common Error No. 8
• The false premise
  A false premise, or a premise that not everyone will agree with, is embedded in the question

THE ASSESSMENT PROCESS

Process of Assessment
• Good assessment begins with translating requests for consultation into questions that can be meaningfully answered

I. Referral Question
II. Evaluate the Referral Question / Identify the Problems
III. Plan Data Collection
IV. Collect Assessment Data
V. Formulate Hypotheses and Integrate Data (analyze & integrate data; report opinions & recommendations)
VI. Communicate Findings (report & oral feedback)
Step 1. Referral Question
• Clinical Setting
  - Diagnosis and interventions
    - psychiatric cases (mood disorders, psychotic symptoms)
    - drug rehabilitation cases (substance use disorders)
    - career development
    - personal growth
  - Sources of Referral: psychiatrists, clinical psychologists, pastoral/marital counselors, social workers, self-referral
• Psycho-educational
  - Diagnosis, placement, and interventions
    - cognitive delay
    - learning disability
    - attention deficit
    - behavioral and emotional problems
  - Sources of Referral: teachers, guidance counselors, school principal/directress, parents
• Forensic
  - Assessing the competency of a witness
  - Being an expert witness in marital nullification and child custody cases
• Industrial
  - Hiring, promotion, and retrenchment
  - Training purposes
  - Sources of Referral: supervisors, managers, HR consultants

Step 2. Evaluate the Referral Question / Identify the Problems
• The initial request for an evaluation is not usually adequately stated
• Gather more data during the interview
• Clarify information
• Go beyond what was obviously stated
• Need to uncover hidden agendas and unspoken expectations
• Contact the referral source to carefully evaluate the referral question.

Step 3. Plan Data Collection
• Assumption – you take on and do the assessment
• Consider the nature of the problem in relation to the adequacy of the tests:
  - the nature of the problem
  - the adequacy and psychometric properties of the tests to be used, and
  - the specific applicability of the tests to an individual's unique situation
• The choice of assessment tools depends on
  - the nature of the presenting problems,
  - the skills and perspectives of the psychologist,
  - the objectives and willingness of the client (consider resistance or willingness), and
  - practical matters such as cost and time (value to your work/profession)

Step 4. Collect Data
• 4 Pillars of Assessment (Sattler): information may come from varied sources:
  1 - Test scores (most frequent)
  2 - Personal history
  3 - Behavioral assessment
  4 - Interview data
• Test scores are usually insufficient to answer the referral question
• Use multiple sources and, using these sources, check the consistency of the observations they yield.

Step 4. Collect Data (In Sum)
1 - Results (scores) on psychological tests
  Tests themselves are merely one tool for obtaining data.
  • Ability tests: intelligence, aptitude, achievement
  • Personality tests: structured personality, projective
  • Checklists and inventories: interest, self-reports
2 - Personal history
  Of equal importance because it provides a context for understanding the client's current problems and, through this understanding, renders the test scores meaningful.
  • History is even more significant than test scores in making predictions and in assessing the seriousness of the condition
  • Additional data: discussion with other professionals familiar with the person or situation in question (physician, teacher, therapist, lawyer, social worker, etc.)
  • Other sources: journals, school records, previous psychological observations, medical records, police reports
3 - Behavioral observations: interview and test behaviors, classroom observations
4 - Interview data: the client or significant others (family member, caregiver, friend, or colleague)

Step 5. Interpret Data and Formulate Hypotheses
• The crucial part of assessment is often referred to as data processing, i.e., translating raw data into inferences
• The description of the client should not be mere labeling or classification (describe behaviors, make sense of test scores in relation to the referral question)
• A deeper and more accurate understanding of the person

Step 6. Communicate Assessment Data
• Written psychological report - an organized presentation of assessment results
• Results must be presented in a way that is clear, relevant to the goal of assessment, and useful to the intended consumer
• Clarity of the report is a prerequisite to evaluating its relevance and usefulness
• The writer may feel he understands the client, but does the reader understand the writer? (Nietzel, p. 117)
• Watch out for:
  - excessive length,
  - excessive brevity,
  - technical jargon (statistics or esoteric test scores),
  - lack of coherent organization
• Usefulness
  - Does the information it contains add anything important to what we already know about the person/client?
ETHICAL PRINCIPLES AND ASSESSMENT

Principle I. Respect for the Dignity of Persons and Peoples
Principle II. Competent Caring for the Well-Being of Persons and Peoples
Principle III. Integrity
Principle IV. Professional and Scientific Responsibilities to Society

General Ethical Standards and Procedures
I. How we resolve ethical issues in our professional lives and communities
II. How we adhere to the highest standards of professional competence
III. How we respect the rights and dignity of our clients, our peers, our students, and our other stakeholders in the profession and scientific discipline
IV. How we maintain confidentiality in the important aspects of our professional and scholarly functions
V. How we ensure truthfulness and accuracy in all our public statements
VI. How we observe professionalism in our records and fees

Ethical Issues in Assessment

Bases for Assessment
• Expert opinions are based on substantial information and appropriate assessment techniques

Informed Consent in Assessment
• Educating clients about the nature of services, financial arrangements, potential risks, and the limits of confidentiality

Assessment Tools
• Select and administer tests which are pertinent to the reasons for referral and the purpose of the assessment
• Use methods and procedures that are consistent with current scientific and professional developments
• Use tests that are standardized, valid, and reliable and that have normative data
• Use tools that are appropriate to the language, competence, and other relevant characteristics of the client

Choosing the Appropriate Instrument
Test worthiness
• Reliability
• Validity
• Normative base
• Cross-cultural fairness
• Practicality of tests

Test Administration
• Tests should be administered APPROPRIATELY as defined by the way they were established and standardized
• Alterations should be noted, and
• Interpretations of test data ADJUSTED if testing conditions were not ideal

Obsolete and Outdated Test Results
• We do not base interpretations, conclusions, and recommendations on outdated test results.
• We do not provide interpretations, conclusions, and recommendations based on obsolete tests.

Interpreting Assessment Results
• Under no circumstances should we report test results without taking into consideration the validity, reliability, and appropriateness of the test.
• We interpret assessment results while considering the purpose of the assessment and other factors such as the client's test-taking abilities and characteristics, and situational, personal, and cultural differences.

Release of Test Data
• We do not release test data in the form of raw and scaled scores, the client's responses to test questions or stimuli, or notes regarding the client's statements and behaviors during the examination, unless required by the court.
Explaining Assessment Results
• Only to the sources of referral, and with written permission from the client if it is a self-referral
• Use non-technical language

Test Security
• The administration and handling of all test materials (manuals, keys, answer sheets, reusable booklets, etc.) shall be carried out only by qualified users or personnel

Assessment by Unqualified Persons
• We do not promote the use of assessment tools and methods by unqualified persons, except for training purposes with adequate supervision

Test User Competence
• Adequate knowledge about testing
• Familiarity with tests
• 3-level qualifications: Level A, Level B, Level C

Test Construction
• We develop tests and other assessment tools using scientific findings and knowledge, appropriate psychometric properties, and validation and standardization procedures

Confidentiality
• Protect the client's information
• Confidentiality is an ETHICAL GUIDELINE and NOT A LEGAL RIGHT
• PRIVILEGED INFORMATION - a legal term that ensures the right of professionals not to reveal information about their clients

When can one reveal confidential information?
1. If the client is in danger of harming self or someone else
2. If a child is a minor and the law states that the parents have a right to information about their child
3. If a client asks you to break confidentiality (testimony is needed in court)
4. If you are bound by the law to break confidentiality
5. To reveal information to a supervisor in order to benefit the client
6. When you have a written agreement from your client to reveal information to specified sources
OTHER ISSUES

Anastasi (1982), p. 564
• Projective techniques present a curious discrepancy between research and practice. When evaluated as psychometric instruments, the large majority make a poor showing. Yet their popularity in clinical use continues unabated.

Criticisms by Lilienfeld, Wood, and Garb (2000)
• Norms
• Validity
  - Construct
  - Predictive
• Reliability
• Incremental Validity
  - The extent to which an instrument contributes information above and beyond other information
• Treatment Utility / Validity

• Most of these techniques do not include:
  - standardized stimuli and testing instructions,
  - systematic algorithms for scoring the responses to these stimuli, and
  - well-calibrated norms for scoring the responses,
  making interpretations difficult.

• Projective techniques…
1. Are highly controversial
2. Are susceptible to faking
3. Are routinely used for purposes for which they are invalid or poorly supported by research
4. Scoring can be unreliable or poor
5. Norms are often non-existent, poor, or misleading
6. May be biased against minority groups or those who live outside North America

Projective Assessment and School Psychology: Contemporary Validity Issues and Implications for Practice (Miller & Nickerson, 2006)
• No incremental validity
• The treatment utility of projective techniques with children and adolescents is lacking
• Many are relying on their own clinical experience and professional judgment when interpreting projective techniques
• Negative correlations between professionals' overconfidence and diagnostic accuracy
• Use of projective techniques would appear inconsistent with current best practices in school psychology assessment
• The techniques are generally not useful for identifying:
  - significant variables that are causing, supporting, or maintaining the problem
  - interventions that will effectively and efficiently resolve the problem