Psychometrics
Psychometrics is the field of study concerned with the measurement of psychological variables.
LESSON 1
Statistics
Tests produce measures, and these are expressed in numbers. Statistics offers a way to interpret and make sense of numbers. Statistics deals with the ways (described by mathematical formulas) in which a phenomenal reality can be synthesized and then understood.
Statistics can be:
Descriptive Statistics
Aim: to synthesize the data through graphical tools and indexes which describe the salient features of the observed data (e.g., bar charts, pie charts, boxplots, histograms).
Inferential Statistics
Aim: to make statements about the nature of an observed phenomenon with a controlled probability of error. Knowledge of this nature then allows predictions to be made. It includes hypothesis testing.
A hypothesis is a statement about the statistical probability distribution of one or more random variables. A statistical test verifies, in terms of probabilities, the validity of a statistical hypothesis, called the null hypothesis (H0).
Procedure
1) Null hypothesis H0 and experimental hypothesis
2) Sample selection
3) Significance level p
4) Statistical test
We should establish a priori the probability of error we consider acceptable for the hypothesis verification.
The "mythical" p < 0.05
The p-value quantifies the role of chance in the evaluation of our statistical tests. The value p < 0.05 is an arbitrary threshold; it corresponds to accepting an error rate of 5%.
"There are no certainties, only a reasonable probability." (E. W. Howe)
Decision Errors
Rejecting H0 when it is true: Type I error.
Accepting H0 when it is false: Type II error.
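The two error types can be illustrated with a small simulation (a hypothetical sketch using scipy; the sample size, effect size, and number of simulations are arbitrary choices). Under a true H0, the Type I error rate should land near the chosen alpha:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_sims, n = 2000, 20

# Type I error: H0 is true (population mean really is 0),
# so every rejection is a false positive.
false_pos = 0
for _ in range(n_sims):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    false_pos += stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha
type1_rate = false_pos / n_sims  # should be close to alpha

# Type II error: H0 is false (true mean is 0.5),
# so every non-rejection is a false negative.
false_neg = 0
for _ in range(n_sims):
    sample = rng.normal(loc=0.5, scale=1.0, size=n)
    false_neg += stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha
type2_rate = false_neg / n_sims

print(f"Type I error rate  ~ {type1_rate:.3f} (alpha = {alpha})")
print(f"Type II error rate ~ {type2_rate:.3f}")
```

The Type II rate depends on the true effect size and the sample size, which is why it cannot be fixed a priori the way alpha is.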
(e.g., Student's t). We compare its value with the critical value tabulated in the appropriate tables. If the test statistic exceeds the critical value for the chosen significance level, the null hypothesis is rejected and the results are reported as "statistically significant"; otherwise, the null hypothesis is accepted.
Student's t
Values of the t distribution as a function of n (number of subjects) and alpha (a). For n = infinite the distribution is normal.
n     0.10   0.05   0.025   0.01    0.005   0.0025    0.001     0.0005
1     3.078  6.314  12.706  31.821  63.657  127.321   318.309   636.619
2     1.886  2.920  4.303   6.965   9.925   14.089    22.327    31.599
3     1.638  2.353  3.182   4.541   5.841   7.453     10.215    12.924
4     1.533  2.132  2.776   3.747   4.604   5.598     7.173     8.610
5     1.476  2.015  2.571   3.365   4.032   4.773     5.893     6.869
6     1.440  1.943  2.447   3.143   3.707   4.317     5.208     5.959
7     1.415  1.895  2.365   2.998   3.499   4.029     4.785     5.408
8     1.397  1.860  2.306   2.896   3.355   3.833     4.501     5.041
9     1.383  1.833  2.262   2.821   3.250   3.690     4.297     4.781
10    1.372  1.812  2.228   2.764   3.169   3.581     4.144     4.587
11    1.363  1.796  2.201   2.718   3.106   3.497     4.025     4.437
12    1.356  1.782  2.179   2.681   3.055   3.428     3.930     4.318
13    1.350  1.771  2.160   2.650   3.012   3.372     3.852     4.221
14    1.345  1.761  2.145   2.624   2.977   3.326     3.787     4.140
15    1.341  1.753  2.131   2.602   2.947   3.286     3.733     4.073
16    1.337  1.746  2.120   2.583   2.921   3.252     3.686     4.015
17    1.333  1.740  2.110   2.567   2.898   3.222     3.646     3.965
18    1.330  1.734  2.101   2.552   2.878   3.197     3.610     3.922
19    1.328  1.729  2.093   2.539   2.861   3.174     3.579     3.883
20    1.325  1.725  2.086   2.528   2.845   3.153     3.552     3.850
21    1.323  1.721  2.080   2.518   2.831   3.135     3.527     3.819
22    1.321  1.717  2.074   2.508   2.819   3.119     3.505     3.792
23    1.319  1.714  2.069   2.500   2.807   3.104     3.485     3.768
24    1.318  1.711  2.064   2.492   2.797   3.091     3.467     3.745
25    1.316  1.708  2.060   2.485   2.787   3.078     3.450     3.725
26    1.315  1.706  2.056   2.479   2.779   3.067     3.435     3.707
27    1.314  1.703  2.052   2.473   2.771   3.057     3.421     3.690
28    1.313  1.701  2.048   2.467   2.763   3.047     3.408     3.674
29    1.311  1.699  2.045   2.462   2.756   3.038     3.396     3.659
30    1.310  1.697  2.042   2.457   2.750   3.030     3.385     3.646
40    1.303  1.684  2.021   2.423   2.704   2.971     3.307     3.551
50    1.299  1.676  2.009   2.403   2.678   2.937     3.261     3.496
60    1.296  1.671  2.000   2.390   2.660   2.915     3.232     3.460
100   1.290  1.660  1.984   2.364   2.626   2.871     3.174     3.390
∞     1.282  1.645  1.960   2.326   2.576   2.807     3.090     3.291
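The tabulated critical values can be reproduced with scipy (a sketch: `stats.t.ppf` returns the quantile of the t distribution, so the one-tailed critical value for upper-tail probability alpha at df degrees of freedom is the (1 − alpha) quantile):

```python
from scipy import stats

def t_critical(alpha, df):
    """One-tailed critical value of Student's t for upper-tail probability alpha."""
    return stats.t.ppf(1 - alpha, df)

# Reproduce a few entries of the table above.
print(round(t_critical(0.05, 10), 3))   # table: 1.812
print(round(t_critical(0.025, 20), 3))  # table: 2.086
print(round(t_critical(0.005, 30), 3))  # table: 2.750

# Decision rule: reject H0 if |t| exceeds the critical value.
t_observed = 2.30  # a hypothetical test statistic
reject = abs(t_observed) > t_critical(0.025, 20)
print("reject H0:", reject)  # prints True (2.30 > 2.086)
```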
Assumptions for parametric statistical analysis
Parametric tests
They can be applied in the presence of a normal distribution of data and errors. Data and errors must be independent. If the data are divided into groups, their variances should be homogeneous.
Assessing normality
Normality can be assessed by dividing skewness and kurtosis by their standard errors, and with statistical tests (Kolmogorov-Smirnov, Shapiro-Wilk), where a significant result indicates non-normality.
[Histogram of HADS_Depressione scores: Mean = 8,82, Std. Dev. = 3,94, N = 50]
Statistics for SAL FISICA: N (valid) = 76, missing = 0; Mean = 7,26; Median = 7,00; Mode = 7 (multiple modes exist; the smallest is shown); Minimum = 1; Maximum = 10.
Skewness (asimmetria)
The degree of departure from symmetry in the distribution of scores. In the normal distribution, skewness = 0. A positive value indicates many low scores; a negative value indicates many high scores. Limit values: 1-2 in absolute value.
Kurtosis (curtosi)
The degree of concentration of a distribution of scores. In the normal distribution, kurtosis = 0. A positive value indicates many scores around the mean; a negative value indicates many scores in the tails. Limit values: 1-2 in absolute value (some sources accept up to 2-3).
Statistics for SAL FISICA: N (valid) = 50, missing = 0; Mean = 6,88 (Std. Error = ,287); Median = 7,00; Mode = 7; Skewness = -,703 (Std. Error = ,337); Kurtosis = ,767 (Std. Error = ,662).
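Skewness and kurtosis, and the ratio of each to its standard error, can be computed with scipy (a hypothetical sketch: the data are simulated, and the SE formulas below are the simple large-sample approximations, not the exact formulas SPSS uses):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=7, scale=2, size=50)  # hypothetical test scores

skew = stats.skew(scores)
kurt = stats.kurtosis(scores)  # excess kurtosis: 0 for a normal distribution

# Rough standard errors (large-sample approximations).
n = len(scores)
se_skew = np.sqrt(6 / n)
se_kurt = np.sqrt(24 / n)

# A ratio |index / SE| greater than about 2 suggests a departure from normality.
print(f"skewness = {skew:.3f} (SE ~ {se_skew:.3f}, ratio {skew / se_skew:.2f})")
print(f"kurtosis = {kurt:.3f} (SE ~ {se_kurt:.3f}, ratio {kurt / se_kurt:.2f})")
```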
Normality test
Kolmogorov-Smirnov (Gruppo = Carcerati): Residual for HADS_ansia, statistic = ,139, df = 50; Residual for HADS_Depressione, statistic = ,162, df = 50.
Kolmogorov-Smirnov (Gruppo = Controlli): Residual for HADS_ansia, statistic = ,188, df = 26; Residual for HADS_Depressione, statistic = ,216, df = 26.
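These tests are available in scipy (a sketch with simulated residuals; note that plain Kolmogorov-Smirnov with parameters estimated from the sample is too lenient, which is why SPSS applies the Lilliefors correction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=50)  # hypothetical residuals for a HADS scale

# Shapiro-Wilk: a significant p (< 0.05) indicates non-normality.
sw_stat, sw_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD.
# (SPSS corrects this estimated-parameters case with Lilliefors.)
ks_stat, ks_p = stats.kstest(residuals, "norm",
                             args=(residuals.mean(), residuals.std(ddof=1)))
print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")
```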
Data transformation
The logarithmic transformation (base 10 or natural base): applies when the distribution has positive asymmetry, to obtain a normal distribution.
The square-root and cube-root transformations: useful to normalize distributions with positive asymmetry and to homogenize the variances.
The reciprocal transformation (Y = 1/X).
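The effect of these transformations on a positively skewed variable can be checked directly (a sketch on simulated lognormal data; note that the reciprocal of a lognormal variable is again lognormal, so the log is the natural choice for this example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=1.0, sigma=0.6, size=200)  # positively skewed data

print(f"raw        skew = {stats.skew(x):+.2f}")
print(f"log10      skew = {stats.skew(np.log10(x)):+.2f}")   # near 0
print(f"sqrt       skew = {stats.skew(np.sqrt(x)):+.2f}")    # reduced
print(f"reciprocal skew = {stats.skew(1.0 / x):+.2f}")
```

Re-running the normality checks on the transformed variable shows whether the transformation was sufficient.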
[Histogram of transf_HADS_A frequencies, Gruppo: Carcerati]
Tests of Normality (Gruppo = Carcerati; Lilliefors significance correction; * indicates a lower bound of the true significance).
Psychometrics
Structural construct validity
Factor analysis
Validity
"The validity of a test is the degree to which it actually measures what it intends to measure" (Garrett, 1937).
Structural validity: the degree to which the test measures a theoretical dimension or construct.
FACTOR ANALYSIS
It has its roots in the early twentieth-century attempts by Karl Pearson and Charles Spearman to define and measure intelligence. It aims to describe many observed variables in terms of a few underlying non-observable (latent) variables, hence the name "factors". It is applied to analyze the correlations between variables in order to identify a latent structure. The factors are latent variables: the parts common to multiple observed variables (indicators or items).
FACTOR ANALYSIS
Two types:
EXPLORATORY (EFA): descriptive techniques; exploratory objectives; no prior theory, the latent structure is inferred from the data.
CONFIRMATORY (CFA): relational techniques; confirmatory objectives; a theory to test, through confirmatory application of structural equation modeling.
Exploratory factor analysis does not proceed from prior assumptions: the researcher has no hypothesis to evaluate about the relationship between the variables and the factors.
FACTOR ANALYSIS
EFA: exploratory analysis of multidimensional data; symmetrical method, considering the variables on the same plane with two-way relationships; metrical analysis (quantitative data); linear function binding the variables; normally distributed data.
CFA: confirmatory analysis of multidimensional data; asymmetrical method, showing dependencies between variables (IV vs. DV); metrical analysis (quantitative data); linear function binding the variables; normally distributed data.
EFA
Graphically, the factors represent the amount of covariance shared by multiple indicators.
[Path diagram: indicators X1, X2 and X3 loading on factor F1]
On a formal level, the factors are defined by analyzing the matrix of correlations between indicators. Indicators that show high correlations with each other and lower correlations with the other indicators generate a factor.
Factor analysis shows whether the indicators measure the same psychological variable or whether they measure different aspects, identified as subgroups of measures, together with the percentage of variance explained.
TEST DIMENSIONALITY
NUMBER OF FACTORS
ROTATION PROCEDURE: the angle of rotation that makes each item as related as possible to one factor and as little as possible to the others.
Is EFA applicable?
The factor analysis model is not always applicable. Assumption: normal distribution of the variables. Through some indexes we can assess whether the application of the model is correct or not:
Determinant
Measure of sampling adequacy for the single variable (MSA)
Communality
Kaiser-Meyer-Olkin index (KMO)
Bartlett's test of sphericity
Determinant
A variable is retained in the analysis when its correlations with the other variables are high (p < 0.05); otherwise it may be deleted. (Example SPSS output, Sig. 1-tailed: Determinant = ,396.)
MSA
Measure of sampling adequacy for the single variable. The values are found on the main diagonal of the anti-image correlation matrix.
Anti-image Matrices
MSA values on the main diagonal of the anti-image correlation matrix (example):
M1 = ,612  M2 = ,637  M3 = ,543  M4 = ,654  M5 = ,679  M6 = ,596  M7 = ,459  M8 = ,671
Communalities
After the extraction, the communalities indicate the share of variance of each variable accounted for by the estimated factor model: they represent the portion of the variance of the observed variables explained by the common factors. Preferable value: > 0.50 for all variables. Values < 0.50 indicate variables that are not well represented by the factor solution and can be removed.
Communalities (example):
      Initial  Extraction
M1    1,000    ,633
M2    1,000    ,587
M3    1,000    ,656
M4    1,000    ,494
M5    1,000    ,285
M6    1,000    ,691
M7    1,000    ,729
M8    1,000    ,556
The KMO index compares the correlations with the partial correlations in the matrix between variables. If the value is > 0.50, the factor model is adequate to explain the correlation between the variables. The KMO statistic indicates the proportion of variance in the variables that is common, i.e., that may be caused by latent factors.
Bartlett's test of sphericity tests the null hypothesis that the correlation matrix between the variables is an identity matrix, e.g. for X1, X2, X3:
X1  X2  X3
 1   0   0
 0   1   0
 0   0   1
A significant result (p < 0,0001 in the example) indicates that the variables are correlated and the factor model is applicable.
SUMMARY
Indices for assessing the applicability of the factor model:
MSA > 0,50; Communality > 0,50; KMO > 0,50; Bartlett's Chi2 with p < 0,05
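The determinant and Bartlett's test can be computed directly with numpy/scipy (a sketch on hypothetical item scores; Bartlett's chi-square uses the standard formula chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with df = p(p - 1)/2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 200, 4
# Hypothetical item scores sharing a common latent factor.
latent = rng.normal(size=(n, 1))
X = latent + 0.8 * rng.normal(size=(n, p))

R = np.corrcoef(X, rowvar=False)   # correlation matrix between the p items
det_R = np.linalg.det(R)

# Bartlett's test of sphericity: H0 = R is an identity matrix.
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(det_R)
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

print(f"determinant = {det_R:.3f}")
print(f"Bartlett chi2 = {chi2:.1f}, df = {df:.0f}, p = {p_value:.2e}")
```

With correlated items the determinant falls well below 1 and Bartlett's test is significant, so the factor model is applicable.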
PCA does not differentiate common and specific variances (probably inflated results). PAF and ML create the factors based only on the common variance (specific and error variance excluded): a more rigorous criterion.
The algorithm
The aim is to reduce the complexity while preserving most of the information. ML (maximum likelihood) finds the structure that best fits the observed data: if the data are relatively normally distributed, maximum likelihood is the best choice; if the assumption of multivariate normality is severely violated, one of the principal factor methods is recommended (in SPSS this procedure is called "principal axis factoring") (Fabrigar et al., 1999).
PCA
PCA is not a factor analysis. It finds as many components (factors) as there are variables, so that all the variance of the variables is explained by the principal components. It transforms a set of variables into a smaller number of uncorrelated variables; it is a method of linear transformation of the data. The retained factors are fewer than the variables: the first component explains the maximum possible variance, the second explains the maximum residual variance, and so on.
Factor Retention
Eigenvalue criterion: only eigenvalues higher than 1 are considered.
Scree test: the eigenvalue graphic (the curve is analyzed to determine the cut-point).
A priori criteria: the researcher knows how many factors he/she would like to obtain, to confirm pre-existing models (previous research) or a theoretical perspective.
Eigenvalues
Kaiser criterion: extraction of the factors with eigenvalue > 1, i.e., explaining a portion of variability at least equal to that of a single initial variable.
Explained variance
With all the variables we can explain 100% of the variance, but we want to reduce the complexity. The factors, however, account for an amount of variance less than 100%: the factors reflect the common variance among the variables. (Example of cumulative percentages of variance explained: 57,890; 68,774; 78,634; 87,609; 94,909; 100,000.)
The scree plot suggests the best number of factors to extract. For a good EFA, the chart should look like two intersecting lines: the factors to be extracted are those above the bend (the change of direction of the curve); factors (components) in the flat part of the curve explain little variance and can be excluded.
[Scree plot: eigenvalue vs. component number]
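The Kaiser criterion and the data behind a scree plot come straight from the eigenvalues of the correlation matrix (a sketch on hypothetical data built with two latent factors, three items each):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
f1, f2 = rng.normal(size=(2, n))
# Six hypothetical items: three load on each latent factor.
X = np.column_stack([f1 + 0.7 * rng.normal(size=n) for _ in range(3)] +
                    [f2 + 0.7 * rng.normal(size=n) for _ in range(3)])

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order

n_factors = int(np.sum(eigenvalues > 1))        # Kaiser criterion
explained = eigenvalues / eigenvalues.sum()     # proportion of variance
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained (eigenvalue > 1):", n_factors)
print("cumulative variance explained:", np.round(np.cumsum(explained), 3))
```

Plotting `eigenvalues` against the component number gives the scree plot; here two eigenvalues stand above the bend and four sit in the flat part.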
Axes rotation
In PCA the factors are extracted in hierarchical form (the first explains the most variance, and so on). Consequently, the first factor shows very high coefficients for all indicators, making it difficult to choose which items to consider part of a factor, and the researcher must interpret the results. To work around this problem, we apply the rotation of the axes; consequently, the results of the factor analysis to interpret are those of the rotated solution.
Factorial Rotation
A process of data manipulation or adjustment of the factorial axes, which aims to obtain a simpler and more significant factorial solution. There is no widely preferred method of oblique rotation; all tend to produce similar results, and it is fine to use the default delta (0) or kappa (4) values in the software packages (Fabrigar et al., 1999).
Factorial Rotation
In the social sciences we generally expect some correlation among factors. Using orthogonal rotation results in a loss of valuable information if the factors are correlated; oblique rotation generally renders a more accurate, and perhaps more reproducible, solution. If the factors are truly uncorrelated, orthogonal and oblique rotation produce nearly identical results (Costello & Osborne, 2005).
Loading
Factors are interpreted through analysis of the factor loadings (saturation indexes). The higher the loading (in absolute value), the better the indicator is described by that factor. A loading represents the correlation between a single item and the factor. Indicators related to each other show high loadings on the same factor.
Loading
[Component Matrix (items ordered by loading: M6, M1, M4, M5, M3, M2, M7, M8) and Rotated Component Matrix (items ordered: M6, M1, M2, M5, M8, M4, M7, M3)]
Extraction method: Principal Component Analysis. Rotation method: Varimax with Kaiser normalization. Rotation converged in 5 iterations.
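Varimax rotation can be sketched in pure numpy using the standard SVD-based algorithm (the loading matrix below is hypothetical). Because the rotation is orthogonal, the communalities (row sums of squared loadings) are unchanged:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal (varimax) rotation of a factor loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var_sum = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(
            np.diag(rotated.T @ rotated))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < var_sum * (1 + tol):  # converged
            break
        var_sum = s.sum()
    return loadings @ rotation

# Hypothetical unrotated loadings for 4 items on 2 components.
L = np.array([[0.7, 0.5], [0.6, 0.4], [0.6, -0.5], [0.7, -0.6]])
L_rot = varimax(L)
print(np.round(L_rot, 2))

# Communalities are preserved by an orthogonal rotation.
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))  # True
```

After rotation each item loads mainly on one component, which is exactly what makes the rotated matrix easier to interpret.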
Reverse score
[Rotated Component Matrix before and after reverse-scoring: items M6, M1, M2, M5, M8, M4, M7, M3; after reversal, M2 and M7 become M2R and M7R and load positively on their component]
Extraction method: Principal Component Analysis. Rotation method: Varimax with Kaiser normalization. Rotation converged in 5 iterations.
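Reverse-scoring an item flips its direction so that all items of a scale point the same way (a minimal sketch; the scale range and scores are hypothetical):

```python
import numpy as np

def reverse_score(item, scale_min, scale_max):
    """Reverse-score an item: the minimum maps to the maximum and vice versa."""
    return (scale_min + scale_max) - np.asarray(item)

m2 = np.array([0, 1, 3, 2, 0])   # hypothetical item scored on a 0-3 scale
m2r = reverse_score(m2, 0, 3)
print(m2r)  # [3 2 0 1 3]

# Reversing flips the sign of the item's correlation with the other items.
other = np.array([3, 2, 0, 1, 2])
print(np.isclose(np.corrcoef(m2, other)[0, 1],
                 -np.corrcoef(m2r, other)[0, 1]))  # True
```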
Interpretation of factors
[Diagram: items assigned to factors F1, F2, F3]
Interpreting and naming each factor (component) remains more an art than a science. Good advice: search for the stability of the results, including under variation of the analyzed data, and use different independent samples.