Interpret 3.1 Summary Statistic

     SUMMARY OF 34 MEASURED (NON-EXTREME) KID
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE   ERROR      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN       9.9      18.0        -.19    1.01       .99    -.2    .68    -.1 |
| S.D.       2.1        .0        1.97     .10       .94    1.2   1.29     .7 |
| MAX.      14.0      18.0        3.73    1.11      4.12    2.5   6.07    2.2 |
| MIN.       5.0      18.0       -4.32     .82       .18   -1.5    .08    -.7 |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.18 TRUE SD    1.58  SEPARATION  1.34  KID    RELIABILITY  .64 |
|MODEL RMSE   1.01 TRUE SD    1.69  SEPARATION  1.67  KID    RELIABILITY  .74 |
| S.E. OF KID MEAN = .34                                                      |
-------------------------------------------------------------------------------
 MINIMUM EXTREME SCORE: 1 KID 2.9%

     SUMMARY OF 35 MEASURED (EXTREME AND NON-EXTREME) KID
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE   ERROR      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN       9.7      18.0        -.37    1.03                                |
| S.D.       2.4        .0        2.22     .17                                |
| MAX.      14.0      18.0        3.73    1.85                                |
| MIN.       3.0      18.0       -6.62     .82                                |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.21 TRUE SD    1.86  SEPARATION  1.55  KID    RELIABILITY  .70 |
|MODEL RMSE   1.05 TRUE SD    1.96  SEPARATION  1.87  KID    RELIABILITY  .78 |
| S.E. OF KID MEAN = .38                                                      |
-------------------------------------------------------------------------------
 KID RAW SCORE-TO-MEASURE CORRELATION = 1.00
 CRONBACH ALPHA (KR-20) KID RAW SCORE "TEST" RELIABILITY = .75

     SUMMARY OF 14 MEASURED (NON-EXTREME) TAP
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE   ERROR      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      16.9      35.0         .00     .71       .96     .0    .68    -.1 |
| S.D.      12.9        .0        3.48     .21       .28     .7    .58     .5 |
| MAX.      32.0      35.0        4.80    1.07      1.56    1.2   2.21    1.1 |
| MIN.       1.0      35.0       -4.40     .45       .59   -1.3    .11    -.6 |
|-----------------------------------------------------------------------------|
| REAL RMSE    .77 TRUE SD    3.39  SEPARATION  4.41  TAP    RELIABILITY  .95 |
|MODEL RMSE    .74 TRUE SD    3.40  SEPARATION  4.59  TAP    RELIABILITY  .95 |
| S.E. OF TAP MEAN = .97                                                      |
-------------------------------------------------------------------------------
 MAXIMUM EXTREME SCORE: 3 TAP 21.4%
 MINIMUM EXTREME SCORE: 1 TAP 7.1%

     SUMMARY OF 18 MEASURED (EXTREME AND NON-EXTREME) TAP
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE   ERROR      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      18.9      35.0        -.76     .96                                |
| S.D.      14.0        .0        4.26     .51                                |
| MAX.      35.0      35.0        6.13    1.85                                |
| MIN.        .0      35.0       -6.59     .45                                |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.10 TRUE SD    4.12  SEPARATION  3.73  TAP    RELIABILITY  .93 |
|MODEL RMSE   1.09 TRUE SD    4.12  SEPARATION  3.79  TAP    RELIABILITY  .93 |
| S.E. OF TAP MEAN = 1.03                                                     |
-------------------------------------------------------------------------------
 TAP RAW SCORE-TO-MEASURE CORRELATION = -.99
 476 DATA POINTS. LOG-LIKELIHOOD CHI-SQUARE: 221.61 with 429 d.f. p=1.0000
 Global Root-Mean-Square Residual (excluding extreme scores): .2667
 Capped Binomial Deviance = .0785 for 626.0 dichotomous observations
 UMEAN=.0000 USCALE=1.0000

EXTREME AND NON-EXTREME SCORES: All items with estimated measures.
NON-EXTREME SCORES ONLY: Items with non-extreme scores (omits items or persons with 0% and 100% success rates).
ITEM or PERSON COUNT: count of items or persons. "ITEM" is the name assigned with ITEM= ; "PERSON" is the name assigned with PERSON=
MEAN MEASURE etc.: average measure of items or persons.
REAL/MODEL ERROR: standard errors of the measures (REAL = inflated for misfit).
REAL/MODEL RMSE: statistical "root-mean-square" average of the standard errors.
TRUE S.D. (previously ADJ.SD): observed S.D. adjusted for measurement error (RMSE). This is an estimate of the measurement-error-free S.D.
REAL/MODEL SEPARATION: the separation coefficient: G = TRUE S.D. / RMSE
   Strata = (4*G + 1)/3
REAL/MODEL RELIABILITY: the measure reproducibility
   = ("True" item measure variance / Observed variance)
   = Separation^2 / (1 + Separation^2)
S.E. MEAN: standard error of the mean measure of items or persons.

For valid observations used in the estimation,
NON-EXTREME persons or items - summarizes persons (or items) with non-extreme scores (omits zero and perfect scores).
EXTREME AND NON-EXTREME persons or items - summarizes persons (or items) with all estimable scores (includes zero and perfect scores).

Extreme scores (zero, minimum possible and perfect, maximum possible scores) have no exact measure under Rasch model conditions.
Using a Bayesian technique, however, reasonable measures are reported for each extreme score, see EXTRSC=. Totals including extreme scores are reported, but are necessarily less inferentially secure than those totals only for non-extreme scores.

RAW SCORE is the raw score (number of correct responses excluding extreme scores, TOTALSCORE=N).
TOTAL SCORE is the raw score (number of correct responses including extreme scores, TOTALSCORE=Y).
COUNT is the number of responses made.
MEASURE is the estimated measure (for persons) or calibration (for items).
ERROR is the standard error of the estimate.

INFIT is an information-weighted fit statistic, which is more sensitive to unexpected behavior affecting responses to items near the person's measure level.
   MNSQ is the mean-square infit statistic with expectation 1. Values substantially below 1 indicate dependency in your data; values substantially above 1 indicate noise.
   ZSTD is the infit mean-square fit statistic t standardized to approximate a theoretical mean 0 and variance 1 distribution. ZSTD (standardized as a z-score) is used for a t-test result when either the t-test value has effectively infinite degrees of freedom (i.e., approximates a unit normal value) or the Student's t-statistic distribution value has been adjusted to a unit normal value. When LOCAL=Y, then EMP is shown, indicating a local {0,1} standardization. When LOCAL=L, then LOG is shown, and the natural logarithms of the mean-squares are reported.

OUTFIT is an outlier-sensitive fit statistic, more sensitive to unexpected behavior by persons on items far from the person's measure level.
   MNSQ is the mean-square outfit statistic with expectation 1. Values substantially less than 1 indicate dependency in your data; values substantially greater than 1 indicate the presence of unexpected outliers.
   ZSTD is the outfit mean-square fit statistic t standardized to approximate a theoretical mean 0 and variance 1 distribution. ZSTD (standardized as a z-score) is used for a t-test result when either the t-test value has effectively infinite degrees of freedom (i.e., approximates a unit normal value) or the Student's t-statistic distribution value has been adjusted to a unit normal value. When LOCAL=Y, then EMP is shown, indicating a local {0,1} standardization. When LOCAL=L, then LOG is shown, and the natural logarithms of the mean-squares are reported.

MEAN is the average value of the statistic.
S.D. is its standard deviation.
MAX. is its maximum value.
MIN. is its minimum value.
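The difference between the information-weighted INFIT and the outlier-sensitive OUTFIT mean-squares can be illustrated with a small sketch. It assumes dichotomous (0/1) data and uses the standard textbook Rasch formulas (outfit = mean of squared standardized residuals, infit = sum of squared residuals divided by the sum of model variances). These formulas are not quoted in this section, the function names are hypothetical, and this is not Winsteps code.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch-model probability of success on a dichotomous item (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def fit_mean_squares(responses, ability, difficulties):
    """Return (infit MNSQ, outfit MNSQ) for one person's 0/1 responses.

    Assumption: textbook dichotomous Rasch formulas, for illustration only.
    """
    squared_residuals = []
    variances = []
    for x, d in zip(responses, difficulties):
        p = rasch_probability(ability, d)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))  # model variance = information
    # Outfit: unweighted mean of squared standardized residuals,
    # so one surprise on a far-off item can dominate.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    # Infit: squared residuals weighted by information,
    # so items near the person's measure count most.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

# A person at 0 logits facing items from very easy (-3) to very hard (+3).
difficulties = [-3.0, -1.0, 0.0, 1.0, 3.0]
expected_pattern = fit_mean_squares([1, 1, 1, 0, 0], 0.0, difficulties)
careless_pattern = fit_mean_squares([0, 1, 1, 0, 0], 0.0, difficulties)
```

In the second pattern the only change is a miss on the very easy item, far from the person's measure, which inflates the outfit mean-square much more than the infit mean-square.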
MODEL RMSE is computed on the basis that the data fit the model, and that all misfit in the data is merely a reflection of the stochastic nature of the model. This is a "best case" reliability, which reports an upper limit to the reliability of measures based on this set of items for this sample.

REAL RMSE is computed on the basis that misfit in the data is due to departures in the data from model specifications. This is a "worst case" reliability, which reports a lower limit to the reliability of measures based on this set of items for this sample.

RMSE is the square-root of the average error variance. It is the Root Mean Square standard Error computed over the persons or over the items. Here is how RMSE is calculated in Winsteps:
George ability measure = 2.34 logits. Standard error of the ability measure = 0.40 logits.
Mary ability measure = 3.62 logits. Standard error of the ability measure = 0.30 logits.
Error = 0.40 and 0.30 logits.
Square error = 0.40*0.40 = 0.16 and 0.30*0.30 = 0.09
Mean (average) square error = (0.16+0.09) / 2 = 0.25 / 2 = 0.125
RMSE = Root mean square error = sqrt (0.125) = 0.354 logits

TRUE S.D. is the standard deviation of the estimates after subtracting the error variance (attributable to their standard errors of measurement) from their observed variance.
(TRUE S.D.)^2 = (S.D. of MEASURE)^2 - (RMSE)^2
The TRUE S.D. is an estimate of the unobservable exact standard deviation, obtained by removing the bias caused by measurement error.

SEPARATION coefficient is the ratio of the PERSON (or ITEM) TRUE S.D., the "true" standard deviation, to RMSE, the error standard deviation. It provides a ratio measure of separation in RMSE units, which is easier to interpret than the reliability correlation. (SEPARATION coefficient)^2 is the signal-to-noise ratio, the ratio of "true" variance to error variance.

RELIABILITY is a separation reliability (separation index). The PERSON (or ITEM) reliability is equivalent to KR-20, Cronbach Alpha, and the Generalizability Coefficient. See much more at Reliability.

S.E. OF MEAN is the standard error of the mean of the person (or item) measures for this sample.

MEDIAN is the median measure of the sample (in Tables 27, 28).

Message: Meaning for Persons or Items
MAXIMUM EXTREME SCORE: All non-missing responses are scored correct (perfect score) or in the top categories. Measures are estimated.
MINIMUM EXTREME SCORE: All non-missing responses are scored incorrect (zero score) or in the bottom categories. Measures are estimated.
LACKING RESPONSES: All responses are missing. No measures are estimated.
DELETED: Persons deleted with PDFILE= or PDELETE=. Items deleted with IDFILE= or IDELETE=
IGNORED BEYOND CAPACITY: Deleted and not reported with entry numbers higher than highest active entry number.
VALID RESPONSES: Percentage of non-missing observations. Not shown if 100%.
CUTLO= CUTHI=: CUTLO= and CUTHI= values if these are active. They reduce the number of valid responses.

PERSON RAW SCORE-TO-MEASURE CORRELATION is the Pearson correlation between raw scores and measures, including extreme scores. When data are complete, this correlation is expected to be near 1.0 for persons.
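The RMSE, TRUE S.D., SEPARATION, Strata and RELIABILITY definitions given earlier in this section can be checked numerically against the first KID table. The sketch below works from the printed S.D. of MEASURE (1.97) and REAL RMSE (1.18). The function names are hypothetical, and clamping a negative "true" variance to zero is an assumption made here, not documented Winsteps behavior.

```python
import math

def rmse(standard_errors):
    """Root mean square of the standard errors (logits)."""
    mean_square = sum(se * se for se in standard_errors) / len(standard_errors)
    return math.sqrt(mean_square)

def separation_summary(observed_sd, rmse_value):
    """Derive TRUE S.D., separation, strata and reliability.

    observed_sd: S.D. of the measures as printed in the table
    rmse_value:  REAL or MODEL RMSE as printed in the table
    """
    observed_var = observed_sd ** 2
    error_var = rmse_value ** 2
    # Assumption: a negative difference is treated as zero "true" variance.
    true_var = max(observed_var - error_var, 0.0)
    true_sd = math.sqrt(true_var)
    separation = true_sd / rmse_value          # G = TRUE S.D. / RMSE
    strata = (4 * separation + 1) / 3          # Strata = (4*G + 1)/3
    reliability = true_var / observed_var      # "true" variance / observed variance
    return {
        "true_sd": true_sd,
        "separation": separation,
        "strata": strata,
        "reliability": reliability,
    }

# George and Mary example from the text: standard errors 0.40 and 0.30 logits.
example_rmse = rmse([0.40, 0.30])

# Non-extreme KID table: S.D. of MEASURE = 1.97, REAL RMSE = 1.18.
kid_real = separation_summary(1.97, 1.18)
```

Rounded to two decimals, `kid_real` reproduces the printed TRUE SD 1.58, SEPARATION 1.34 and KID RELIABILITY .64. The reliability also equals separation squared divided by (1 + separation squared).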
CRONBACH ALPHA (KR-20) KID RAW SCORE "TEST" RELIABILITY is the conventional "test" reliability index. It reports an approximate test reliability based on the raw scores of this sample. It is only reported for complete data. See more at Reliability. Cronbach Alpha is an estimate of the person-sample reliability (= person-score-order reproducibility). Classical Test Theory does not usually compute an estimate of the item reliability (= item-pvalue-order reproducibility), but it could. Winsteps reports both person-sample reliability (= person-measure-order reproducibility) and item reliability (= item-measure-order reproducibility).

ITEM RAW SCORE-TO-MEASURE CORRELATION is the Pearson correlation between raw scores and measures, including extreme scores. When data are complete, this correlation is expected to be near -1.0 for items. This is because higher measure implies lower probability of success and so lower item scores.

476 DATA POINTS is the number of observations that are used for standard estimation, and so are not missing and not in extreme scores.

LOG-LIKELIHOOD CHI-SQUARE: 221.61 is the approximate value of the global fit statistic. The chi-square value is approximate = -2 * log-likelihood of the data. It is based on the current reported estimates, which may depart noticeably from the "true" maximum likelihood estimates for these data. The degrees of freedom are approximately the number of datapoints used in the free estimation (i.e., excluding missing data, data in extreme scores, etc.) less the number of free parameters. For an unanchored analysis, free parameters = non-extreme items + non-extreme persons - 1 + (categories in estimated rating-scale structures - 2 * rating-scale structures). To obtain the exact d.f. for your dataset, use the Winsteps "simulate data" option. Generate 100 datasets, analyze them and obtain their chi-squares. The average of the chi-squares will be the d.f. for your dataset.

It is typical in Rasch analysis that the probability of the chi-square is 0.0, because empirical data rarely fit a theoretical ideal.

A log-likelihood ratio test for a pair of models (e.g., rating-scale and partial-credit), where one nests within the other, would be the difference between the chi-square values from the two analyses, with d.f. given by the difference between the d.f.

Global Root-Mean-Square Residual (excluding extreme scores) is
sqrt ( sum((X-E)^2) / N )
where the sum is across X, each of the N observations, and E, the expectation of each observation according to the Rasch model.
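The degrees-of-freedom rule and the global root-mean-square residual described above can be sketched as follows. The d.f. count is checked against the example output (476 data points, 14 non-extreme items, 34 non-extreme persons, one dichotomous structure with 2 categories). The function names are hypothetical, and the residual function assumes the usual root-mean-square definition, sqrt of the mean squared residual.

```python
import math

def free_parameters(non_extreme_items, non_extreme_persons,
                    categories=2, structures=1):
    """Free parameters for an unanchored analysis, per the rule in the text."""
    return (non_extreme_items + non_extreme_persons - 1
            + (categories - 2 * structures))

def approximate_df(data_points, non_extreme_items, non_extreme_persons,
                   categories=2, structures=1):
    """Approximate chi-square d.f. = data points less free parameters."""
    return data_points - free_parameters(
        non_extreme_items, non_extreme_persons, categories, structures)

def global_rms_residual(observations, expectations):
    """sqrt of the mean squared residual (X - E) over the observations."""
    squared = [(x - e) ** 2 for x, e in zip(observations, expectations)]
    return math.sqrt(sum(squared) / len(squared))

# Example output: 476 data points, 14 non-extreme TAP, 34 non-extreme KID.
df = approximate_df(476, 14, 34)
```

For the example output this gives 476 - 47 = 429, the d.f. printed beside the chi-square.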
Capped Binomial Deviance for dichotomous observations is the average of
-[X*LOG10(E) + (1-X)*LOG10(1-E)]
for all dichotomous observations, where X=0,1 is the observation and E is its Rasch-model expectation. E is limited to the range 0.01 to 0.99.

UMEAN=.000 USCALE=1.000 are the current settings of UMEAN= and USCALE=.

Example: Rating Scale Model (RSM) and Partial Credit Model (PCM) of the same dataset.

When the models are nested (as they are with RSM and PCM), then we have:
RSM LL chi-square and RSM d.f.
PCM LL chi-square (which should be smaller) and PCM d.f. (which will be smaller)

Then the model choice is based on: (RSM LL chi-square - PCM LL chi-square) with (RSM-PCM) d.f.

(RSM-PCM) d.f. is the number of extra free categories (i.e., extra categories more than dichotomies) in the PCM model over the RSM model. The number of categories is reported in the heading of most Winsteps tables, e.g., with 10 items and a 5-category rating-scale, a PCM analysis has 50 categories giving 50 - 2*10 = 30 free categories, and an RSM analysis has 5 categories giving 5 - 2 = 3 free categories. So the (RSM - PCM) d.f. = 30-3 = 27.

If global fit statistics are the decisive evidence for choice of analytical model, then Winsteps is not suitable. In the statistical philosophy underlying Winsteps, the decisive evidence for choice of model is "which set of measures is more useful" (a practical decision), not "which set of measures fit the model better" (a statistical decision). The global fit statistics obtained by analyzing your data with log-linear models (e.g., in SPSS) will be more exact than those produced by Winsteps.
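As a rough illustration of two calculations described above, the sketch below computes a capped binomial deviance from observations and expectations, and reproduces the free-category arithmetic of the RSM versus PCM example. The function names are hypothetical, and the code follows the formulas as stated in the text rather than any Winsteps internals.

```python
import math

def capped_binomial_deviance(observations, expectations, low=0.01, high=0.99):
    """Average of -[X*log10(E) + (1-X)*log10(1-E)], with E capped to [low, high]."""
    total = 0.0
    for x, e in zip(observations, expectations):
        e = min(max(e, low), high)  # cap the expectation as the text specifies
        total += -(x * math.log10(e) + (1 - x) * math.log10(1 - e))
    return total / len(observations)

def free_categories(total_categories, structures):
    """Free categories = categories beyond the 2 that a dichotomy needs per structure."""
    return total_categories - 2 * structures

# Example from the text: 10 items with a 5-category rating scale.
pcm_free = free_categories(total_categories=50, structures=10)  # 50 - 2*10
rsm_free = free_categories(total_categories=5, structures=1)    # 5 - 2
df_difference = pcm_free - rsm_free
```

Here `df_difference` is 27, matching the worked example. The cap keeps a single badly predicted observation from contributing an unbounded term to the average.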
