
Assumptions and True Score. In every analysis, there are assumptions involved. In your own words, what are statistical assumptions? How do these assumptions influence the quality of your analysis? Contextualize your points by comparing and contrasting the parallel model, the tau-equivalent model, the essentially tau-equivalent model, and the congeneric model in the estimation of a true score.

Statistical assumptions are prescribed characteristics of the data, based on substantive knowledge, that serve as the ground rules on which results and conclusions can be founded. Violating these assumptions invalidates results and leads to misinterpretation of the data. Assumptions control and account for the probabilistic representation of variation in the data, which is what allows more accurate statistical inferences to be drawn. They also acknowledge that the true score has no directly computable value; instead, they let us estimate the range in which the underlying true score is likely to lie and how far observed values may differ from it, giving us reliable estimates of scores. Finally, assumptions give structure to interpretation: by presupposing the conditions under which items are interpreted, and by accounting for variance in the data across situations, they make results replicable.

These statistical assumptions change according to the model used to estimate the true score. There are four such models: the parallel model, the tau-equivalent model, the essentially tau-equivalent model, and the congeneric model. The parallel model is the most restrictive measurement model. It assumes unidimensionality, with all test items measuring a single latent construct, and it further assumes that the items are fully equivalent: each item measures the same true score on the same scale, with the same precision and the same error variance across observations. An example would be a set of college students instructed to take the same test on multiple occasions; the parallel model assumes that no change or testing effect alters their true score across these periods of administration.

The tau-equivalent model is similar to the parallel model but differs in how error is accounted for. This model recognizes that error variances may differ from item to item. Individual items are still assumed to measure the same latent variable on the same scale, but possibly with different amounts of error (Raykov, 1997a, 1997b); the variance unique to a specific item is treated as the result of error. In short, the model assumes equal item true scores but allows each item its own error variance.

The essentially tau-equivalent model is fundamentally similar to the tau-equivalent model in assuming that each item measures a single latent construct on the same scale, and it likewise allows items different degrees of precision, and hence different error variances. What it adds is the recognition that even when items measure the same true score, the means from which their variances are derived may differ by a constant. The precision accounted for in this model lets one see whether the measured values for different items are widely spread out or closely grouped together.

In comparison with the previous models, the congeneric model is the least restrictive and the one most often used in estimating reliability. Like the others, it assumes that the items measure a particular latent construct, but, unlike the parallel model mentioned earlier, it allows different amounts of error across items. It differs from the tau-equivalent model in accounting for differences in precision, and it is distinguished from the essentially tau-equivalent model by the further assumption that items may be measured on different scales when accounting for one's true score.
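
In compact notation (the symbols here are assumed for illustration, not taken from the assignment text), the contrast among the four models can be summarized by writing each observed item score X_i as a function of a common true score T and an item-specific error e_i:

Parallel:                    X_i = T + e_i,            Var(e_i) equal for all items
Tau-equivalent:              X_i = T + e_i,            Var(e_i) free to differ
Essentially tau-equivalent:  X_i = a_i + T + e_i,      item means differ by the constant a_i
Congeneric:                  X_i = a_i + b_i T + e_i,  loadings b_i and error variances free

Each model relaxes one constraint of the model above it, so the parallel model is the most restrictive and the congeneric model the least.
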
Internal Consistency and Classical Theory. What is internal consistency and why is it important to estimate it in quantitative research? How is it related to the concepts of reliability and validity? How is the estimation of internal consistency conceived in classical test theory? What are the strengths and the loopholes of estimating it this way?

Internal consistency is one of the measures of reliability. It concerns the degree to which a set of test items correlate with one another in measuring a proposed construct, which makes it essential in test development: estimating internal consistency allows one to weed out cloned items as well as items that do not represent the construct well, and it shows how consistent the items are at representing that construct. This raises the question of which construct is actually being represented, that is, the validity of the items. High internal consistency does not necessarily signify the consistent measurement of a particular latent variable, and thus it does not indicate validity.

In classical test theory, a single test is assumed to represent a single construct. Items within that test are assumed to represent that latent construct equally, with each item increasing the probability of observing the construct, so the more items the test contains, the better its internal consistency. The theory also assumes that, because the items are equal, they will elicit similar responses for the construct being measured. To see how similar these responses are to one another, internal consistency can be quantified by looking at the variance shared among the items in proportion to the total variance (Streiner, 2003).
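
In symbols (a standard formula given here for concreteness, not quoted from the text), coefficient alpha expresses this proportion as

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{i}}{\sigma^2_{X}}\right)

where k is the number of items, \sigma^2_i the variance of item i, and \sigma^2_X the variance of the total score. The more variance the items share, the smaller the summed item variances are relative to the total variance, and the higher alpha becomes.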

Construing internal consistency in this way simplifies the task of separating the true score and the error from the observed score. It also makes internal consistency easy to calculate under most testing conditions, since it relies only on the variance of the scores.
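
As a small illustration of how directly this can be computed from score variances, here is a minimal Python sketch; the function name and the example data are hypothetical, not part of the original text.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    # scores: rows are persons, columns are items
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# hypothetical responses of five persons to four items
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
], dtype=float)
print(round(cronbach_alpha(scores), 3))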

Under the assumptions of CTT, longer tests have higher internal consistency, but long tests make for expensive testing. A high level of internal consistency might also point to the presence of clone items, and there may be issues with the construct itself: is a single construct truly being elicited, or are other constructs being tapped as well? Moreover, because CTT assumes that responses carry similar amounts of error or variation, it discounts personal differences among test takers. This means there is a heavy reliance on having a normative sample, which leads to poorer measures of internal consistency in samples that do not match the norm.
Use and Misuse of Alpha. Enumerate at least five ways in which we conventionally misuse or abuse the alpha coefficient. Explain the unfavorable impact of each of these misuses and abuses. Given these misuses and abuses, how should the alpha coefficient be used correctly? You may develop a table of guidelines or a set of rules of thumb, but briefly explain why each point should be observed.

Alpha equated to homogeneity. While the coefficient alpha is a function of the interrelatedness of the items in a test, it is often mistakenly used as an indicator of the unidimensionality of the items or of the homogeneity of the interitem correlations. A set of items that taps similar but distinct constructs (such as confidence and self-esteem) may still yield a high alpha when tested. A high alpha may indicate that the test adequately captures the construct, but it may also arise because item responses are a function of more than one construct. Since more than one latent construct may be at work, alpha is not a sufficient indicator of unidimensionality.
Take home: A high alpha coefficient does not indicate unidimensionality. Alpha presumes that a single latent variable is being measured; whether other constructs are also present is a question of validity and must be tested separately. Run a confirmatory factor analysis prior to the reliability analysis and remove items that do not load on the first factor.
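
A minimal sketch of such a screening step, using loadings on the first principal component of the item correlation matrix as a rough stand-in for a full confirmatory factor analysis (the simulated data and the 0.40 cutoff are assumptions for illustration only):

import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))                        # one common factor
scores = latent + rng.normal(scale=0.8, size=(200, 6))    # six items loading on it

corr = np.corrcoef(scores, rowvar=False)                  # item correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(corr)          # eigenvalues in ascending order
loadings = np.abs(eigenvectors[:, -1] * np.sqrt(eigenvalues[-1]))  # first-component loadings

keep = loadings >= 0.40    # a conventional, but arbitrary, cutoff
print(loadings.round(2))
print(keep)

Items whose loadings fall below the cutoff would be candidates for removal before alpha is computed; a proper confirmatory model would test the single-factor structure more formally.
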
Focusing on a high alpha coefficient. Since the presence of one or more similar constructs may account for a high alpha coefficient, Cronbach noted that in the face of multidimensionality alpha is an underestimate of the measure's reliability. A high alpha may also simply reflect an increased test length, following the Spearman-Brown formula (Lord and Novick, 1968), or a high level of item redundancy. Test developers may be unaware that one question is being asked in several slightly different ways, and overlooking this can yield a spurious increase in the alpha coefficient.
Take home: One must not focus solely on the value of the alpha coefficient. A high value may be indicative of high internal consistency, but it can also be due to cloned items, test length, or other factors.
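
For reference, the Spearman-Brown relation alluded to above (standard form, not quoted from Lord and Novick) shows how reliability rises mechanically when a test is lengthened by a factor of n:

\rho_{\text{new}} = \frac{n\,\rho_{\text{old}}}{1 + (n - 1)\,\rho_{\text{old}}}

For example, doubling a test whose reliability is .60 is predicted to raise it to about .75 even if nothing about the quality of the items improves.
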
Alpha is All We Need (Schmitt, 1996). A high alpha coefficient does not provide enough information to tell whether multiple measures show any degree of discriminant validity. Two measures may correlate so highly that differentiating between them serves no purpose, so that the test results are rendered meaningless.
Take home: Alpha coefficients, intercorrelations, and intercorrelations corrected for attenuation must be reported (Schmitt, 1996), as well as confidence intervals for alpha (Iacobucci and Duhachek, 2003). When interpreting the relationships among multiple measures, the coefficient alpha must not be the only characteristic reported. Intercorrelations help distinguish between good and bad levels of alpha: if two measures correlate very highly, they may simply be asking the same questions. Confidence intervals are also important because they convey how precise the alpha estimate is.
Alpha is a Property of the Test (Streiner, 2003). Alpha is sometimes used as the sole determinant of how good a scale X is, under the notion that reliability is an inherent property of that scale. But as shown by the differing reliability estimates obtained when numerous and varied populations answer scale X, alpha is a property of the test scores rather than of the scale itself. This can be seen in the equation below, the ratio of the true score variance to the total score variance: variance differs with the test scores obtained, which in turn vary with the population taking the test.
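
In standard CTT notation, that ratio is

\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}

where \sigma^2_T is the true score variance, \sigma^2_E the error variance, and \sigma^2_X the observed score variance, all of which depend on the sample whose scores are analyzed.
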
Take home: A high alpha value should not be the sole basis for judging the quality of a test.
Having a threshold for alpha in tests. As in the previous point, we cannot rely on fixed alpha levels to distinguish good tests from bad ones, because alpha varies with the scores of the population that took the test. This misuse rests on the idea that alpha is a property of the test (Streiner, 2003).
Take home: There should be no universal threshold for alpha, because alpha is a property of the sample. A more valid indication of a good test depends on a representative, normally distributed sample of scores.
Confidence Intervals. Discuss the advantages and disadvantages of reporting a confidence interval for the alpha coefficient. Which method of estimation for this confidence interval are you most likely to recommend, and why? What are the advantages and disadvantages of using the method you chose? In what way is this method superior (and inferior) when compared with other methods of estimation?

Advantages of Using Confidence Intervals. By providing confidence interval limits, we avoid under- or overinterpreting the alpha coefficient, which is only a point estimate and, on its own, a poor indicator of the underlying true value. A confidence interval identifies a range of plausible values for that true value with a stated degree of certainty; hence it conveys both the location and the precision of a measure. Since statistical measurement always contains error that prevents us from knowing the absolute true value, a confidence interval provides a range to account for it. There is also a trade-off in how the confidence level is set: lowering the confidence level yields a more precise (narrower) interval, but it reduces the certainty with which the interval actually captures the true value. If trading confidence for narrowness is not acceptable, increasing the sample size instead improves the precision of the confidence interval without compromising certainty.
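
In generic form (a normal-theory interval used purely as an illustration, not a formula specific to alpha), an interval looks like

\hat{\theta} \pm z^{*} \cdot SE(\hat{\theta}), \qquad SE(\hat{\theta}) \propto \frac{1}{\sqrt{n}}

so lowering the confidence level shrinks z^{*} and narrows the interval at the cost of coverage, while increasing the sample size n shrinks the standard error and narrows the interval without that cost.
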
Disadvantages of Using Confidence Intervals. The use of confidence intervals as a gauge of the location and precision of a true value, or of any parameter, is contested by Bayesian statisticians. In that camp's line of thinking, confidence intervals lend themselves to confusion: the conditions for their calculation are simplistic in that they account mainly for sample size, not for the conditional probabilities that bear on where the true value lies within a given range. Morey, Hoekstra, Rouder, Lee, and Wagenmakers (2016) list multiple fallacies in using and interpreting confidence intervals. One is the belief that a 95% confidence interval means there is a 95% probability that the true value lies within the computed range. Another is the belief that a narrower confidence interval means a more precise estimate of the location of the true value. These researchers stipulate guidelines for the use of confidence intervals, such as using interval procedures that are backed by theory and rationale.
Attia (2005) also cites disadvantages of confidence intervals, noting that they are concerned with probability only and do not lend themselves to the interpretation of clinical significance.
Endorsing a Method for Confidence Intervals. We compared different methods of calculating confidence intervals from a frequentist perspective. Cortina's formula does not account for sample size, even though larger samples stabilize the intercorrelations (Iacobucci and Duhachek, 2003) and sample size is essential in calculating confidence intervals. Miller (1995) likewise notes the inverse relationship between sample size and the variation present in the data, further underscoring the importance of sample size.
In comparison with Cortina's formula for determining confidence intervals, Iacobucci and Duhachek's formula includes sample size in its calculations, which we believe makes it superior to Cortina's method. Accounting for sample size lets us incorporate this information to increase the confidence we have in our sample estimates: a larger sample narrows the interval around each estimate, indicating a more precise estimate of the population value. Taking sample size into consideration also lets us better represent the variability of the population and the certainty of our estimations. Their syntax likewise allows a small sample to be used in determining standard errors for alpha (Maydeu-Olivares, Coffman, García-Forero, and Gallardo-Pujol, 2010). According to these same authors, the syntax by Duhachek and Iacobucci might still leave room to doubt the accuracy of the reported standard errors because of variations in testing conditions. Perhaps this is why Morey et al. recommend a Bayesian approach to interval estimation, as it would account for such differences in testing conditions.
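
As a concrete but assumption-laden illustration, a confidence interval for alpha can also be obtained by resampling respondents. The percentile bootstrap sketched below in Python is a generic alternative, not the Iacobucci and Duhachek analytic formula nor the SEM approach of Maydeu-Olivares et al., and the simulated data are hypothetical.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    # scores: rows are persons, columns are items
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def bootstrap_alpha_ci(scores, n_boot=2000, level=0.95, seed=0):
    # percentile bootstrap: resample persons (rows) with replacement
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    boots = [cronbach_alpha(scores[rng.integers(0, n, n)]) for _ in range(n_boot)]
    lower, upper = np.percentile(boots, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return lower, upper

# hypothetical data: 150 respondents, five items driven by one latent variable
rng = np.random.default_rng(42)
latent = rng.normal(size=(150, 1))
scores = latent + rng.normal(scale=1.0, size=(150, 5))
print(round(cronbach_alpha(scores), 3), bootstrap_alpha_ci(scores))
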
References

Attia, A. (2005). Why should researchers report the confidence interval in modern research? Middle East Fertility Society Journal, 10(1), 78-81.

Iacobucci, D., & Duhachek, A. (2003). Advancing alpha: Measuring reliability with confidence. Journal of Consumer Psychology, 13(4), 478-487. doi:10.1207/s15327663jcp1304_14

Maydeu-Olivares, A., Coffman, D. L., García-Forero, C., & Gallardo-Pujol, D. (2010). Hypothesis testing for coefficient alpha: An SEM approach. Behavior Research Methods, 42(2), 618-625. doi:10.3758/brm.42.2.618

Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 2(3), 255-273. doi:10.1080/10705519509540013

Morey, R., Hoekstra, R., Rouder, J., Lee, M. D., & Wagenmakers, E. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23, 103-123. doi:10.1037/e528942014-099

Raykov, T. (1997a). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173-184.

Raykov, T. (1997b). Scale reliability, Cronbach's coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329-353.

Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8(4), 350-353. doi:10.1037/1040-3590.8.4.350

Sijtsma, K. (2008). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74(1), 107-120. doi:10.1007/s11336-008-9101-0

Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80(1), 99-103. doi:10.1207/s15327752jpa8001_18

Streiner, D. L. (2003). Being inconsistent about consistency: When coefficient alpha does and doesn't matter. Journal of Personality Assessment, 80(3), 217-222. doi:10.1207/s15327752jpa8003_01
