
A FUNDAMENTAL CONUNDRUM IN PSYCHOLOGY'S STANDARD MODEL OF MEASUREMENT AND ITS CONSEQUENCES FOR PISA GLOBAL RANKINGS

Dr. Hugh Morrison
Formerly The Queen's University of Belfast (drhmorrison@gmail.com)

INTRODUCTION

This paper is concerned with current approaches to measurement in psychology and their use by organisations like the Organisation for Economic Co-operation and Development (OECD) to hold the education systems of nation states to global standards. The OECD's league table, the Programme for International Student Assessment (PISA), has the potential to throw a country's education system into crisis. For example, Ertl (2006) documents the effects of so-called "PISA-shock" in Germany, and Takayama (2008) describes a similar reaction in Japan. Given that a country's PISA ranking can play a role in decisions concerning foreign direct investment, it is important to confirm that the measurement model which produces the ranks is sound.

Moreover, the OECD has already spread its remit beyond the PISA league table to include teacher evaluation through its Teaching and Learning International Survey (TALIS). The OECD is currently developing PISA-like tests to facilitate global comparisons of the education on offer in universities through its Assessment of Higher Education Learning Outcomes (AHELO) programme: "Governments and individuals have never invested more in higher education. No reliable international data exists on the outcomes of learning: the few studies that exist are nationally focused" (Rinne & Ozga, 2013, p. 99).

Given the sheer global reach of the OECD project, it is important to investigate the coherence of the measurement model which underpins its data. At the heart of 21st century approaches to measurement in psychology is the Generalised Linear Item Response Theory (GLIRT) approach (Borsboom, Mellenbergh & van Heerden, 2003, p. 204), and the OECD uses Item Response Theory (IRT) to generate its PISA ranks. A particular attraction of IRT for the OECD is its claim that estimates of examinee ability are item-independent.
This is vital to PISA's notion of "plausible values" because each examinee only takes a subset of items from the whole item battery. Without the Rasch model's claim to item-independent ability measures, PISA's assertion that student performance can be reported on common scales, even when these students have taken different subsets of items, would be invalid. This paper will focus on the particular IRT model used by the OECD, the so-called Rasch model, but the arguments generalise to all IRT models.

Proponents of the model portray Rasch as closing the gap between psychological measurement and measurement in the physical sciences. Elliot, Murray and Pearson (1978, pp. 25-26) claim that "Rasch ability scores have many similar characteristics to physical measurement" and Wright (1997, p. 44) argues that the arrival of the Rasch model means that "there is no methodical reason why social science cannot become as stable, as reproducible, and hence as useful as physics." This paper highlights the incoherence of the model.

THE RASCH MODEL AND ITS PARADOX

The Rasch model is defined as follows:

$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{e^{(\theta_s - \beta_i)}}{1 + e^{(\theta_s - \beta_i)}}$$

where $X_{is}$ is the response ($X$) made by subject $s$ to item $i$; $\theta_s$ is the trait level of subject $s$; $\beta_i$ is the difficulty of item $i$; and $X_{is} = 1$ indicates a correct response to the item.

On the face of it, the model uses a mathematical function to allow the psychometrician to compute the probability that a randomly selected individual of ability $\theta$ will provide the correct response to an item of difficulty $\beta$. A particular ability and difficulty value will be chosen for illustration, but the analysis which follows has universal application. When the values $\theta = 1$ and $\beta = 2$, for example, are substituted in the Rasch model, a scientific calculator will quickly confirm that the probability that an individual of ability $\theta = 1$ will respond correctly to an item of difficulty $\beta = 2$ is $e^{-1}/(1 + e^{-1}) \approx 0.27$. It follows that if a large sample of individuals, all with this same ability, respond to this item, approximately 27% will give the correct response.

In the Rasch model "the abilities specified in the model are the only factors influencing examinees' responses to test items" (Hambleton, Swaminathan & Rogers, 1991, p. 10). This results in a paradox. If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong? If the item measures ability and the individuals are all of equal ability, then surely the model must indicate that they all get it right, or they all get it wrong?

DOES THE RASCH MODEL REALLY REPRESENT AN ADVANCE ON CLASSICAL TEST THEORY?

The Rasch model is portrayed as a radical advance on what went before: classical test theory (CTT). In classical test theory, "[p]erhaps the most important shortcoming is that examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other. The examinee characteristic we are interested in is the ability measured by the test" (Hambleton, Swaminathan & Rogers, 1991, p. 2). An examinee's ability is defined only in terms of a particular test.
When the test is hard, the examinee will appear to have low ability; when the test is easy, the examinee will appear to have higher ability. What do we mean by hard and easy tests? The difficulty of a test item is defined as the proportion of examinees in a group of interest who answer the item correctly. Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the items are hard or easy! (Hambleton, Swaminathan & Rogers, 1991, pp. 2-3)

Measures of ability in the Rasch model, on the other hand, are claimed to be completely independent of the items used to measure such abilities. This is vital to the computation of plausible values because no student answers more than a fraction of the totality of PISA items.

A puzzle emerges immediately: if the Rasch model treats as separable what classical test theory treats as profoundly entangled, with Rasch regarded as a significant advance on classical test theory, why does the empirical data not reflect two radically different measurement frameworks? Based on large scale comparisons of item and person statistics, Fan (1998) notes: "These very high correlations indicate that CTT- and IRT-based person ability estimates are very comparable with each other. In other words, regardless of which measurement framework we rely on, the same or very similar conclusions will be drawn regarding the ability levels of individual examinees" (p. 8), and concludes: "the results here would suggest that the Rasch model might not offer any empirical advantage over the much simpler CTT framework" (p. 9). Fan (1998) confirms Thorndike's (1982, p. 12) pessimism concerning the likely impact of IRT: "For the large bulk of testing, both with locally developed and standardized tests, I doubt that there will be a great deal of change. The items that we select for a test will not be much different, and the resulting tests will have much the same properties."

In what follows, the case is made that in the Rasch model, just as in classical test theory, ability cannot be separated from the item used to measure it. Rasch's model is shown to be incoherent and this has clear consequences for the entire OECD project. Moreover, the arguments presented here undermine psychology's "standard measurement model" (Borsboom, Mellenbergh & van Heerden, 2003) with implications for all IRT models and Structural Equation Modelling.

THE RASCH MODEL: EARLY INDICATIONS OF INCOHERENCE

The first hints of Rasch's confusion appear in the early pages of his 1960 treatise which sets out the Rasch model, Probabilistic Models for Some Intelligence and Attainment Tests. Rasch's lifelong obsession with measurement models capable of application to the social and natural sciences, captured in his closely associated notions of "models of measurement" and "specific objectivity", can be recognized in his portrayal of the Rasch model. In constructing his model Rasch (1960, p. 10) rejects deterministic Newtonian measurement for the indeterminism of quantum mechanics: For the construction of the models referred to I shall take recourse to some points of view of a more general character. Into the system of classical physics enter a number of fundamental laws, e.g. the Newtonian laws. A characteristic property of these laws is that they are deterministic. None the less it should not be overlooked that the laws do not give an accurate picture of nature. In modern physics the deterministic view has been abandoned. No deterministic description for e.g.
radioactive emission seems within reach, but for the description of such irregularities the theory of probability has proved an extremely valuable tool.

Rasch (1960, p. 11) likens the unmeasured individual to a radioactive nuclide about to decay. Quantum mechanics teaches that, unlike in Newtonian mechanics, even if one had complete information about the nuclide, one still couldn't predict the moment of decay with accuracy. Indeterminism is a constitutive feature of quantum mechanics: one cannot know, even if one had complete knowledge of the universe, what will happen next to a quantum system. Irreducible uncertainty applies. For Rasch (1960, p. 11): "Where it is a question of human beings and their actions, it appears quite hopeless to construct models which will be useful for purposes of prediction in separate cases. On the contrary, what a human being actually does seems quite haphazard, none less than radioactive emission." Rasch (1960, p. 11) makes clear his rejection of deterministic Newtonian models: "This way of speaking points to the possibility of mapping upon models of a kind different from those used in classical physics, more like the models in modern physics, models that are indeterministic."

Quantum indeterminism has implications for Rasch's "models of measurement". In quantum mechanics, measurement doesn't simply produce information about some pre-existing state. Rather, measurement transforms the indeterminate to the determinate. In the classical model which Rasch rejects, measurement is simply a process of checking up on what pre-existed the act of measurement, while quantum measurement causes the previously indeterminate to take on a definite value. However, latent variable theorists in general, and Rasch in particular, treat ability as an intrinsic attribute of the person, and they view measurement as an act of checking up on that attribute.
The early pages of Rasch's (1960) text raise doubts about his understanding of the central mathematical conceit of his model: probability. One gets the clear impression that Rasch associates probability with indeterminism. But completely determinate situations can involve probability. The outcome of the toss of a coin is completely determined from the moment the coin leaves the thrower's hand. If one had knowledge of the initial speed of projection, the angle of inclination of the initial motion to the horizontal, the initial angular momentum, the local acceleration of gravity, and so on, one could use Newtonian mechanics to predict the outcome. Probability is invoked because of the coin-thrower's ignorance of these parameters. Such probabilities are referred to as subjective probabilities. In modern physics, uncertainty is constitutive and not a consequence of the limitations of human beings or their measuring instruments. Quantum physicists deal in objective probability.

Finally, the notion of separability, or "specific objectivity" as Rasch labelled it, is absolutely central to his thinking: "Rasch's demand for specific objective measurement means that the measure of a person's ability must be independent of which items were used" (Rost, 2001, p. 28). However, quantum mechanics is founded on non-separability; one cannot break the conceptual link between what is measured and the measuring instrument.

The mathematics of the early pages of Rasch (1960) do not augur well for the mathematical coherence of his model, but it is important to set out the case against the model with greater rigour.

BOHR AND WITTGENSTEIN: INDETERMINISM IN PSYCHOLOGICAL MEASUREMENT

A possible source of Rasch's efforts to find "models of measurement" which would apply equally to both psychometric measurement and measurement in physics was the writings of Rasch's famous countryman, Niels Bohr. (Indeed, Rasch attended lecture courses in mathematics given by the great physicist's brother.) Bohr argued for all of his professional life that there existed a structural similarity between psychological predicates and the attributes of interest to quantum physicists.
Although he never published the details, he believed he had identified an "epistemological argument common to both fields" (Bohr, 1958, p. 27). For Bohr, no psychologist has direct access to mind just as no physicist has direct access to the atom. Both disciplines use descriptive language, developed to make sense of the world of direct experience, to describe what cannot be available to direct experience. Bohr summarized this common challenge in the question: how does one use concepts acquired through direct experience of the world to describe features of reality beyond direct experience?

Given the central preoccupation of this paper, Bohr's words are particularly striking: "I want to emphasize that what we have learned in physics arose from a situation where we could not neglect the interaction between the measuring instrument and the object. In psychology, we meet the quite similar situation" (Favrholdt, 1999, p. 203). Prominent psychologists also echo Bohr's thinking: "The study of the human mind is so difficult, so caught in the dilemma of being both the object and the agent of its own study, that it cannot limit its inquiries to ways of thinking that grew out of yesterday's physics" (Bruner, 1990, p. xiii). Given that Bohr never developed his ideas for "the epistemological argument common to both fields", what follows also addresses en passant a lacuna in Bohr scholarship.

If all this sounds fanciful (after all, what possible parallels can be drawn between Rasch's radionuclide on the point of decaying and an individual on the point of answering a question?) it is instructive to return to Rasch's (1960, p. 11) claim that what a human being does "seems quite haphazard, none less than radioactive emission." In fact there are striking parallels between the experimenter's futile attempts to predict the moment of decay and the psychometrician's attempts to predict the child's response to a (hitherto unseen) addition problem such as 68 + 57 = ?

If one restricts oneself to all of the facts about the nuclide, the outcome is completely indeterminate. Similarly, Wittgenstein's celebrated rule-following argument (central to his philosophies of mind, mathematics and language), set out in his Philosophical Investigations, makes clear that if one restricts oneself to the totality of facts (inner and outer) about the child, these facts are in accord with the right answer (68 + 57 = 125) and an infinity of wrong answers. Mathematics will be used for illustration but the reasoning applies to all rule-following. The reader interested in an accessible exposition of this claim is directed to the second chapter of Kripke's (1982) Wittgenstein on Rules and Private Language. (The reader should come to appreciate the power of the rule-following reasoning without being troubled by Kripke's questionable take on the so-called "skeptical argument.")

The author will now attempt the barest outline of Wittgenstein's writing on rule-following. By their nature, human beings are destined to complete only a finite number of arithmetical problems over a lifetime. The child who is about to answer the question 68 + 57 = ? for the first time has, of necessity, a finite computational history in respect of addition. Through mathematical reasoning which dates back to Leibniz, this finite number of completed addition problems can be brought under an infinite number of different rules, only one of which is the rule for addition. In short, any answer the child gives to the problem can be demonstrated to be in accord with a rule which generates that answer and all of the answers the child gave to all of the problems he or she has tackled to date. If one had access to the totality of facts about the child's achievements in arithmetic, one couldn't use these facts to predict the answer the child will give to the novel problem 68 + 57 = ? because one can always derive a rule which generates the child's entire past problem-solving history and any particular answer to 68 + 57 = ?

Now what of facts concerned with the contents of the child's mind? Surely an all-seeing God could peer into the child's mind and determine which rule was guiding the child's problem-solving? By substituting the numbers 68 and 57 into the rule, God could predict with certainty the child's response. Alas, having access to inner facts (about the mind or brain) won't help because having a rule in mind is neither sufficient nor necessary for responding correctly to mathematical problems. Is having a rule in mind sufficient? Clearly not, since all pupils taking GCSE mathematics, for example, have access to the quadratic formula and yet only a fraction of these pupils will provide the correct answer to the examination question requiring the application of that formula. Is having the rule in mind necessary? Once again, clearly not, because one can be entirely ignorant of the quadratic formula and yet produce the correct answers to algebraic problems involving quadratics using alternative procedures like completing the square, graphical methods, the Newton-Raphson procedure, and so on.

It is important to be clear what is being said here. If one could identify an addition problem beyond the set of problems Einstein had completed during his lifetime, is the claim that one couldn't predict with certainty Einstein's response to that problem? Obviously not. But the correct answer and an infinity of incorrect answers are in keeping with all the facts (inner and outer) about Einstein. When one is restricted to these facts, Einstein's ability to respond correctly is indeterminate.

In summary, before the child answers the question 68 + 57 = ? his or her ability with respect to this question is indeterminate. The moment he or she answers, the child's ability is determinate with respect to the question (125 is pronounced correct, and all other answers are deemed incorrect). One might portray this as follows: before responding the child is right and wrong and, at the moment of response, he or she is right or wrong.
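The under-determination claim above (a finite history of sums fits many rules that diverge on a new problem) can be illustrated with a small sketch. It follows the "quus" construction from Kripke (1982), where a deviant rule agrees with addition for arguments below 57; the history of problems, the candidate answers and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical finite history of completed addition problems (all arguments below 57).
HISTORY: List[Tuple[int, int]] = [(2, 3), (10, 15), (41, 12), (56, 56)]
NOVEL: Tuple[int, int] = (68, 57)  # the problem the child has never met


def make_rule(answer_to_novel: int) -> Callable[[int, int], int]:
    """Build a rule that adds small numbers but returns a chosen value otherwise.

    With answer_to_novel=5 this is Kripke's "quus"; other values give other
    deviant rules. Only answer_to_novel=125 coincides with addition on NOVEL.
    """
    def rule(x: int, y: int) -> int:
        return x + y if x < 57 and y < 57 else answer_to_novel
    return rule


def fits_history(rule: Callable[[int, int], int]) -> bool:
    """True if the rule reproduces every (correct) answer in the finite history."""
    return all(rule(x, y) == x + y for x, y in HISTORY)


candidate_answers = [125, 5, 0, 1000]
rules = {k: make_rule(k) for k in candidate_answers}

# Every rule agrees with the past record ...
all_fit = all(fits_history(r) for r in rules.values())
# ... yet each yields a different answer to the novel problem.
novel_answers = {k: r(*NOVEL) for k, r in rules.items()}

print(all_fit)        # True
print(novel_answers)  # {125: 125, 5: 5, 0: 0, 1000: 1000}
```

Since `candidate_answers` could be extended with any integer, the sketch shows one way in which the same finite record is compatible with an unbounded family of rules; it does not model the "inner facts" part of the argument.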

THE PROBLEM WITH THE RASCH MODEL

Ability only becomes determinate in the context of a measurement; it is indeterminate before the act of measurement. The conclusion is inescapable: ability is a relational property rather than something intrinsic to the individual, as psychology's "standard measurement model" would have it. A definite ability cannot be ascribed to an individual prior to measurement. Ability is a joint property of the individual and the measurement instrument; take away the instrument and ability becomes indeterminate. It is difficult to escape the conclusion that ability (and intelligence, and self-concept, and so on) is a property of the interaction between individual and measuring instrument rather than an intrinsic property of the individual. If psychological constructs were viewed as joint properties of individuals and measuring instruments, then intractable questions such as "what is intelligence?" and "what is memory?" need no longer trouble the discipline.

What can be concluded in respect of Rasch? It is clear that the Rasch model is no more capable of separating ability from the item used to measure it than was its predecessor, classical test theory. Pick up any textbook on IRT and one finds the same assumption stated again and again in model development: individuals carry a determinate ability with them from moment to moment and measurement involves checking up on that ability. The ideas of Bohr and Wittgenstein can be used to reject this; for them, measurement effects a jump from the indeterminate to the determinate, transforming a potentiality to an actuality.

In simple terms it can be argued that ability has two facets; it is indeterminate before measurement and determinate immediately afterwards. The single description of the standard measurement model is replaced by two mutually exclusive descriptions. Ability is indeterminate before measurement and only determinate with respect to a measurement context.
Neither of these descriptions can be dispensed with. The indeterminate and the determinate are mutually exclusive facets of one and the same ability.

Returning to the child who has been taught to add but hasn't yet encountered the question 68 + 57 = ?, what can be said of his or her ability with respect to this question? When one ponders ability as a thing-in-itself, it's tempting to think of it as something inner, something that resides in the child prior to being expressed when the child answers. If ability is to be found anywhere, surely it's to the unmeasured mind one should look? Isn't it tempting to think of it as something the child carries in his or her mind? When the focus is on ability as a thing-in-itself, it seems the child's eventual answer to the question is somehow inferior; it's the mere application of the child's ability rather than the ability itself.

The concept of causality in classical physics is replaced by the notion of complementarity in quantum mechanics. Complementarity treats pre-measurement indeterminism and the determinate outcome of measurement as non-separable. Whitaker (1996, p. 184) portrays complementarity as "mutual exclusion but joint completion." One cannot meaningfully separate the pre-measurement facet of ability from its measurement-determined counterpart. The analogue of Bohr's complementarity is what Wittgensteinians refer to as first-person/third-person asymmetry. The first-person facet of ability (characterised by indeterminism) and the third-person measurement perspective cannot be meaningfully separated. Suter (1989, pp. 152-153) distinguished the first-person/third-person symmetry of Newtonian attributes from the first-person/third-person asymmetry of psychological predicates: "This asymmetry in the use of psychological and mental predicates between the first-person present-tense and second- and third-person present-tense we may take as one of the special features of the mental." Nagel (1986, p. 22) notes: "the conditions of first-person and third-person ascription of an experience are inextricably bound together in a single public concept."

This non-separability of first-person and third-person perspectives obviates the need to conclude, with Rasch, that the individual's response need be haphazard. The first-person indeterminism detailed earlier seems to indicate that individuals offer responses entirely at random. After all, the totality of facts is in keeping with an infinity of answers, only one of which is correct. But one need only infer "random variation located within the person" (Borsboom, 2005, p. 55) if one mistakenly treats the first-person facet as separable from the third-person. (The author's earlier practice of stressing the restriction to the totality of facts about the individual was intended to highlight this taken-for-granted separability.) Lord's (1980) admonition that item response theorists eschew the stochastic subject interpretation for the repeated sampling interpretation led IRT practitioners astray by purging entirely the first-person facet from an indivisible whole. One only arrives at conclusions that are absurd in practice (p. 227) if one follows Lord (1980) and divorces ability from the item which measures it. Like Rasch, Lord failed to grasp that the within-subject and the between-subject aspects of psychological measurement are profoundly entangled.

HOLLAND, LORD AND THE ENSEMBLE INTERPRETATION AS THE ROUTE OUT OF PARADOX

Holland (1990) repeats Lord's error by eschewing the stochastic subject interpretation for the random sampling interpretation, despite acknowledging that "most users think intuitively about IRT models in terms of stochastic subjects" (p. 584). The stochastic subject rationale traces the probabilities of the Rasch model to randomness in the individual subject: "Even if we know a person to be very capable, we cannot be sure that he will solve a certain difficult problem, not even a much easier one. There is always a possibility that he fails: he may be tired or his attention is led astray, or some other excuse may be given. And a person of slight ability may hit upon the correct solution to a difficult problem. Furthermore, if the problem is neither too easy nor too difficult for a certain person, the outcome is quite unpredictable." (Rasch, 1960, p. 73)

Rasch is proposing what quantum physicists call a local hidden variables measurement model. While Wittgenstein argues that ability is indefinite before the act of measurement (an act which effects a jump from indefinite to definite), psychometricians in general, and Rasch in particular, treat ability as definite before measurement. The local hidden variables of the Rasch model are variables such as examinee fatigue, degree of distraction, and any other influence militating against his or her capacity to provide a correct answer. Rasch is suggesting that if one had complete information concerning the examinee's ability, his or her level of fatigue, propensity for distraction, and so on, one could predict, in principle, the examinee's response with a high degree of confidence. It is the absence from the Rasch algorithm of variables capable of capturing fatigue, attention, and so on, that makes its probabilistic nature inevitable. In this local hidden variable model, probability is being invoked because of the measurer's ignorance of the effects of fatigue, attention loss, and so on.

But Bell (1964) proved beyond doubt that local hidden variables models are impossible in quantum measurement. One can avoid the difficulties thrown up by Bell's celebrated inequalities by treating unmeasured predicates as indefinite (Fuchs, 2011). This would have profound implications for how one conceives of latent variables in the Rasch model. If local hidden variables are ruled out, latent variables could not be assigned investigation-independent values. Ability only takes on a definite value in a measurement context. IRT can no more separate these two entities (ability and the item used to measure it) than could classical test theory.
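The appeal to Bell's result above can be made concrete with a short calculation. The sketch below is an illustration only: it uses the CHSH form of the inequality (a later and widely used variant, not the 1964 original) together with the standard singlet-state correlation, and the measurement angles and function names are illustrative choices that do not come from this paper. It shows that pre-assigned outcomes of +1 or -1 cap the CHSH combination at 2, while the quantum prediction reaches roughly 2.83; whether the analogy carries over to psychometrics is the argument of the paper, not something the code establishes.

```python
import itertools
import math


def chsh(e_ab: float, e_ab2: float, e_a2b: float, e_a2b2: float) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return e_ab - e_ab2 + e_a2b + e_a2b2


# Local hidden variables: every outcome is fixed in advance as +1 or -1.
# Enumerate all 16 deterministic assignments and record the largest |S|.
# Any probabilistic mixture of these assignments cannot exceed this maximum.
lhv_max = max(
    abs(chsh(A * B, A * B2, A2 * B, A2 * B2))
    for A, A2, B, B2 in itertools.product((-1, 1), repeat=4)
)


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Quantum prediction for spin correlations in the singlet state."""
    return -math.cos(angle_a - angle_b)


# Illustrative measurement angles (radians) that maximise the quantum value.
a, a2 = 0.0, math.pi / 2
b, b2 = math.pi / 4, 3 * math.pi / 4

quantum = abs(
    chsh(
        singlet_correlation(a, b),
        singlet_correlation(a, b2),
        singlet_correlation(a2, b),
        singlet_correlation(a2, b2),
    )
)

print(lhv_max)            # 2
print(round(quantum, 3))  # 2.828
```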
The random sampling approach that Holland (1990) recommends is a so-called ensemble interpretation. The definitive text on ensembles, Home and Whitaker (1992), finds ensembles illegitimate because they mistakenly replace superpositions by mixtures (Whitaker, 2012, p. 279).

One gets the distinct impression from the IRT literature that the random sampling method is being urged on the field because of embarrassments that lurk in the stochastic subject model. For example, Lord (1980, p. 228) refers to the latter as unsuitable: "The trouble comes from an unsuitable interpretation of the practical meaning of the item response function ... If we try to interpret $P_i(A)$ as the probability that a particular examinee A will answer a particular item i correctly, we are likely to reach absurd conclusions." Lord (1980) and Holland (1990) both attempt to avoid embarrassment by taking the simple step of ignoring the stochastic subject for the comfort of an ensemble interpretation. Home and Whitaker (1992) close their text with the words: "[W]e see the ensemble interpretation as the 'comfortable' option, creating the illusion that all difficulties may be removed by taking one simple step" (p. 311).

WHAT OF THE PARADOX IDENTIFIED EARLIER?

It is now possible to address the paradox presented earlier. Here is a restatement: If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong?

Suppose a large number of individuals answer a question (labelled Q1), and, of those who give the correct answer, 100 individuals, say, are posed a second question (labelled Q2). When these 100 individuals respond to Q2, 27% give the correct answer and 73% respond with the wrong answer. What can be said about the ability of each individual immediately after answering Q1 but before answering Q2? Given the natural tendency to think of ability as an attribute of mind, it seems reasonable to focus on the individual's ability "between questions" as it were. Poised between questions, each individual's ability with respect to Q1 is determinate; they have answered Q1 correctly moments before.
What of their ability with respect to Q2, the question they have yet to encounter? According to the reasoning presented above, all the facts are in keeping with both a correct and an incorrect answer. The individual's ability relative to Q2 is indeterminate. Quantum mechanics portrays such states as superpositions: the individuals all have the same indefinite ability, characterised as correct with probability 27% and incorrect with probability 73%. It is easy to see why 100 individuals, each with an ability characterised in this way, could be portrayed as subsequently producing 27 correct responses and 73 incorrect responses to Q2.

In this approach the paradox dissolves. All 100 individuals have definite abilities (as measured by Q1), but only 27% go on to answer Q2 correctly. But note the crucial step in the logic required to dissolve the paradox: each individual's ability is simultaneously determinate with respect to Q1 and indeterminate with respect to Q2. A change in question (from Q1 to Q2) effects a radical change from determinate to indeterminate. It is therefore only meaningful to talk about a definite ability in relation to a measurement context. Ability is a joint property of the individual and the item; pace Rasch, they cannot be construed as separable!

It follows therefore that the examiner (the person who selects the item) participates in the ability manifest in a response to that item. Pace Rasch, measurement in education and psychology is a more dynamic affair than measurement in classical physics. The former is dynamic while the latter is merely a matter of checking up on what's already there. Because that which is measured is inseparable from the question posed, the measurer participates in what he or she sees. Newtonian detachment is as unattainable in psychology and education as it is in quantum theory.
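The 27%/73% split used throughout this discussion comes from evaluating the Rasch item response function at ability 1 and difficulty 2. The following is a minimal sketch of that arithmetic, plus a simulated group of 100 examinees who each answer correctly with that probability. The parameter names `theta` and `beta`, the function names and the fixed random seed are assumptions made for illustration; treating each response as an independent random draw is simply the easiest way to reproduce the numbers and takes no side on the interpretive question raised above.

```python
import math
import random


def rasch_probability(theta: float, beta: float) -> float:
    """Rasch item response function: P(correct | ability theta, difficulty beta)."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))


def simulate_group(theta: float, beta: float, n: int, seed: int = 0) -> int:
    """Count correct responses among n examinees who share the same theta.

    Each response is modelled as an independent draw with the Rasch probability.
    """
    rng = random.Random(seed)
    p = rasch_probability(theta, beta)
    return sum(rng.random() < p for _ in range(n))


p = rasch_probability(theta=1.0, beta=2.0)
print(round(p, 2))  # 0.27

# The simulated count fluctuates around 27 and depends on the seed.
print(simulate_group(theta=1.0, beta=2.0, n=100))
```

With larger `n` the proportion of correct responses settles closer to 0.27, which is the sense in which "27% will give the correct response" holds for a large sample.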

CONCLUSION

Returning to the real-life consequences of this refutation of latent variable modelling in general and Rasch modelling in particular, one cannot escape the conclusion that the OECD's claims in respect of its PISA project have scant validity, given the central dependence of these claims on the clear separability of ability from the items designed to measure that ability.

REFERENCES
Bell, J.S. (1964). On the Einstein-Podolsky-Rosen paradox. Physics, 1, 195-200.
Bohr, N. (1929/1987). The philosophical writings of Niels Bohr: Volume 1 Atomic theory and the description of nature. Woodbridge: Ox Bow Press.
Bohr, N. (1958/1987). The philosophical writings of Niels Bohr: Volume 2 Essays 1933-1957 on atomic physics and human knowledge. Woodbridge: Ox Bow Press.
Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge: Cambridge University Press.
Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203-219.
Bruner, J.S. (1990). Acts of meaning. Cambridge, MA: Harvard University Press.
Davies, E.B. (2003). Science in the looking glass. Oxford: Oxford University Press.
Davies, E.B. (2010). Why beliefs matter. Oxford: Oxford University Press.
Elliott, C.D., Murray, D., & Pearson, L.S. (1978). The British ability scales. Windsor: National Foundation for Educational Research.
Ertl, H. (2006). Educational standards and the changing discourse on education: The reception and consequences of the PISA study in Germany. Oxford Review of Education, 32(5), 619-634.
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381.
Favrholdt, D. (Ed.). (1999). Niels Bohr collected works (Volume 10). Amsterdam: Elsevier Science B.V.
Fuchs, C.A. (2011). Coming of age with quantum information: Notes on a Paulian idea. Cambridge: Cambridge University Press.
Hacker, P.M.S. (1993). Wittgenstein, mind and meaning Part 1 Essays. Oxford: Blackwell.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
ter Hark, M.R.M. (1990). Beyond the inner and the outer. Dordrecht: Kluwer Academic Publishers.
Holland, P.W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55(4), 577-601.
Home, D., & Whitaker, M.A.B. (1992). Ensemble interpretation of quantum mechanics. A modern perspective. Physics Reports (Review section of Physics Letters), 210(4), 223-317.
Jöreskog, K.G., & Sörbom, D. (1993). LISREL 8 user's reference guide. Chicago: Scientific Software International.
Kalckar, J. (Ed.). (1985). Niels Bohr collected works (Volume 6). Amsterdam: Elsevier Science B.V.
Kripke, S.A. (1982). Wittgenstein on rules and private language. Oxford: Blackwell.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
Nagel, T. (1986). The view from nowhere. New York: Oxford University Press.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Paedagogiske Institut.
Rinne, R., & Ozga, J. (2013). The OECD and the global re-regulation of teachers' work: Knowledge-based regulation tools and teachers in Finland. In T. Seddon & J.S. Levin (Eds.), World yearbook of education (pp. 97-116). London: Routledge.
Rost, J. (2001). The growing family of Rasch models. In A. Boomsma, M.A.J. van Duijn, & T.A.B. Snijders (Eds.), Essays on item response theory (pp. 25-42). New York: Springer.
Sobel, M.E. (1994). Causal inference in latent variable models. In A. von Eye & C.C. Clogg (Eds.), Latent variable analysis (pp. 3-35). Thousand Oaks: Sage.
Suter, R. (1989). Interpreting Wittgenstein: A cloud of philosophy, a drop of grammar. Philadelphia: Temple University Press.
Takayama, K. (2008). The politics of international league tables: PISA in Japan's achievement crisis debate. Comparative Education, 44(4), 387-407.
Thorndike, R.L. (1982). Educational measurement: Theory and practice. In D. Spearritt (Ed.), The improvement of measurement in education and psychology: Contributions of latent trait theory (pp. 3-13). Melbourne: Australian Council for Educational Research.
Whitaker, A. (1996). Einstein, Bohr and the quantum dilemma. Cambridge: Cambridge University Press.
Whitaker, A. (2012). The new quantum age. Oxford: Oxford University Press.
Wittgenstein, L. (1953). Philosophical Investigations. G.E.M. Anscombe, & R. Rhees (Eds.), G.E.M. Anscombe (Tr.). Oxford: Blackwell.
Wittgenstein, L. (1980a). Remarks on the philosophy of psychology Volume 1 (Edited by G.E.M. Anscombe & G.H. von Wright; translated by G.E.M. Anscombe). Oxford: Basil Blackwell.
Wittgenstein, L. (1980b). Remarks on the philosophy of psychology Volume 2 (Edited by G.H. von Wright & H. Nyman; translated by C.G. Luckhardt & M.A.E. Aue). Oxford: Basil Blackwell.
Wright, B.D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-52.
Wright, C. (2001). Rails to infinity. Cambridge, MA: Harvard University Press.
