Spearman's rank correlation coefficient:

r_s = 1 − 6·Σdᵢ² / (n(n² − 1))

where dᵢ is the difference in the ranks of the ith individual or unit and n is the number of individuals or units.
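As a quick illustration of the formula, here is a minimal Python sketch on made-up judge-score data (the values are hypothetical): it converts scores to ranks, applies the formula above, and cross-checks against scipy.

```python
import numpy as np
from scipy.stats import spearmanr, rankdata

# Hypothetical example: scores given by two judges to 8 contestants
x = np.array([3.2, 5.1, 4.8, 6.0, 2.9, 4.1, 5.5, 3.7])
y = np.array([3.0, 4.9, 5.2, 5.8, 3.1, 3.9, 5.6, 4.0])

rx, ry = rankdata(x), rankdata(y)   # convert raw scores to ranks
d = rx - ry                         # rank differences d_i
n = len(x)
r_s = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(f"Spearman r_s (formula): {r_s:.4f}")
print(f"Spearman r_s (scipy):   {spearmanr(x, y)[0]:.4f}")
```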
In real life these types of situations are very rare, and one has to be contented with the next best, i.e. a statistical relationship.

β0 is the intercept of the systematic component of the regression relationship; β1 is the slope of the systematic component.
Least-Squares Estimation: The Estimated Regression Relationship and the 2-Variable Normal Equations

The estimated regression line:

Ŷ = b0 + b1X

where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X; b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors, the residuals from fitting the estimated regression line b0 + b1X to a set of n points.

The least squares regression line is the one that minimizes the SSE with respect to the estimates b0 and b1. Setting the partial derivatives ∂SSE/∂b0 and ∂SSE/∂b1 to zero yields the normal equations (summations over i = 1, ..., n):

Σyᵢ = n·b0 + b1·Σxᵢ
Σxᵢyᵢ = b0·Σxᵢ + b1·Σxᵢ²

At the solution of these two equations, SSE is minimized with respect to b0 and b1.
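To make the normal equations concrete, the following sketch (with made-up data) solves the two equations for b0 and b1 directly and checks the result against numpy's built-in least-squares fit.

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n = len(x)

# Normal equations in matrix form:
#   [ n       sum(x)   ] [b0]   [ sum(y)  ]
#   [ sum(x)  sum(x^2) ] [b1] = [ sum(xy) ]
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

# Cross-check with numpy's least-squares polynomial fit
check_b1, check_b0 = np.polyfit(x, y, deg=1)   # returns slope, then intercept
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"polyfit: b0 = {check_b0:.4f}, b1 = {check_b1:.4f}")
```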
Sums of Squares, Cross Products, and Least Squares Estimators

Sums of Squares and Cross Products:

SSx = Σ(x − x̄)² = Σx² − (Σx)²/n
SSy = Σ(y − ȳ)² = Σy² − (Σy)²/n
SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

Least squares estimators:

b1 = SSxy / SSx
b0 = ȳ − b1·x̄

Errors in Regression

[Figure: for an observed data point, the total deviation (y − ȳ) from the mean splits into the unexplained deviation (y − ŷ) from the fitted line Ŷ and the explained deviation (ŷ − ȳ).]

Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)²
SST = SSE + SSR

r² = SSR/SST = 1 − SSE/SST, the percentage of total variation explained by the regression.

The multiple regression model:

y = β0 + β1x1 + β2x2 + ... + βkxk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.
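Returning to the decomposition above, this short sketch (same made-up data idea as before) fits the line, computes SST, SSE, and SSR, and verifies that SST = SSE + SSR and r² = SSR/SST.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

b1, b0 = np.polyfit(x, y, deg=1)     # slope, intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean())**2)      # total deviation
sse = np.sum((y - y_hat)**2)         # unexplained deviation
ssr = np.sum((y_hat - y.mean())**2)  # explained deviation

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")   # the two are equal
print(f"r^2 = SSR/SST = {ssr/sst:.4f} = 1 - SSE/SST = {1 - sse/sst:.4f}")
```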
[Figure: Any two points (A and B), or an intercept and slope (β0 and β1), define a line in a two-dimensional space: ŷ = b0 + b1x. Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in a three-dimensional space: ŷ = b0 + b1x1 + b2x2.]

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.
The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                 F Ratio
Regression            SSR              k                    MSR = SSR / k               F = MSR / MSE
Error                 SSE              n − (k + 1)          MSE = SSE / (n − (k + 1))
Total                 SST              n − 1                MST = SST / (n − 1)

where n is the sample size (number of observations) and k is the number of independent variables.

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR / SST = 1 − SSE / SST

The adjusted multiple coefficient of determination, R̄², is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

R̄² = 1 − (SSE / (n − (k + 1))) / (SST / (n − 1)) = 1 − MSE / MST

When an independent variable is added, i.e. the value of k is increased, the value of R² increases. But when the addition of another variable does not contribute towards explaining the variability in the dependent variable, the value of R̄² decreases.
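The following sketch assembles these ANOVA quantities for a small, made-up two-predictor model and computes F, R², and R̄²; scipy supplies the p-value for the F ratio.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
n, k = 30, 2                                   # observations, predictors
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Least-squares fit with an intercept column
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ b

sst = np.sum((y - y.mean())**2)
sse = np.sum((y - y_hat)**2)
ssr = sst - sse

msr = ssr / k
mse = sse / (n - (k + 1))
F = msr / mse
p = f_dist.sf(F, k, n - (k + 1))               # P(F > observed) under H0

r2 = ssr / sst
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
print(f"F = {F:.2f}, p = {p:.2g}, R^2 = {r2:.3f}, adj R^2 = {r2_adj:.3f}")
```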
Multicollinearity: Effects of Multicollinearity

Variances of regression coefficients are inflated.
ANOVA
Indicates whether the model is significant. If the model is not
significant, it implies that no relationship exists between the
set of variables.
Coefficients
The table provides the regression coefficients and their
significance.
Charts
Used to test the validity of the assumption that the residuals are normally distributed.
Discriminant Analysis

The Discriminant Function

In a discriminant analysis, observations are classified into two or more groups, depending on the value of a multivariate discriminant function. The discriminant function gives the direction that maximizes the separation between the groups. The intersection of the normal marginal distributions of the two groups (Group 1 and Group 2) gives the cutting score, C. The model may be evaluated in terms of the percentages of observations assigned correctly and incorrectly.
Discriminant Analysis: Objectives of Discriminant Analysis

Discriminant analysis is used to predict group membership. This technique is used to classify individuals/objects into one of the alternative groups on the basis of a set of predictor variables. The dependent variable in discriminant analysis is categorical, whereas the independent or predictor variables are interval or ratio scaled in nature. When there are two groups (categories) of the dependent variable, we have two-group discriminant analysis; when there are more than two groups, it is a case of multiple discriminant analysis.

The objectives of discriminant analysis are the following:

To find a linear combination of variables that discriminates between categories of the dependent variable in the best possible manner.

To find out which independent variables are relatively better at discriminating between groups.

To determine the statistical significance of the discriminant function and whether any statistical difference exists among groups in terms of the predictor variables.

To develop the procedure for assigning new objects, firms, or individuals, whose profile but not group identity is known, to one of the two groups.

To evaluate the accuracy of classification, i.e. the percentage of customers that it is able to classify correctly.
Discriminant analysis model

The method of estimating the bi is based on the principle that the ratio of the between-group sum of squares to the within-group sum of squares be maximized. This makes the groups differ as much as possible on the values of the discriminant function. After the model has been estimated, the bi coefficients (also called discriminant coefficients) are used to calculate Y, the discriminant score, by substituting the values of the Xi into the estimated discriminant model. The discriminant function with a constant term is called unstandardized, whereas the one without the constant term is known as the standardized discriminant function.

Assumptions

The dependent variable should be non-metric and the independent variables should be metric or dummy variables.

Variances are normal, linear, and homogeneous. The assumption of linearity applies to the relationships between pairs of independent variables. Multicollinearity in DA is identified by examining tolerance values; it can be resolved by removing or combining the variables with the help of PCA.

Homogeneity of variance is important in the classification stage of DA. If one of the groups defined by the dependent variable has greater variance than the others, more cases will tend to be classified into that group. Homogeneity is tested with Box's M test, with the null hypothesis that the group variance-covariance matrices are equal. If the test fails to reject, and it is concluded that the variances are equal, one may use a pooled variance-covariance matrix in classification.
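As a minimal sketch of two-group discriminant estimation (on hypothetical data, not the author's worked example), scikit-learn's LinearDiscriminantAnalysis can stand in: its coef_ plays the role of the bi above, its decision function gives the discriminant scores, and its accuracy is the percentage classified correctly.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Hypothetical two-group data: two metric predictors per case
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))   # group 0
X1 = rng.normal(loc=[2.0, 1.5], scale=1.0, size=(40, 2))   # group 1
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)                          # categorical DV

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print("discriminant coefficients b_i:", lda.coef_[0])
print("constant term b_0:", lda.intercept_[0])
scores = lda.decision_function(X)          # discriminant scores Y
print("fraction classified correctly:", lda.score(X, y))
```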
Key Terms

Eigenvalue: The basic principle in the estimation of a discriminant function is that the variance between the groups relative to the variance within the groups should be maximized. The ratio of between-group variance to within-group variance is called the eigenvalue. The higher the eigenvalue, the better the differentiation, and hence the model. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for all of the explained variance.

Relative Percentage: A function's eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model; the percentage of the model's discriminating power associated with a given discriminant function. The relative percentage is used to tell how many functions are important.

Canonical Correlation (R*): The simple correlation coefficient between the discriminant score and the group membership (coded 0, 1 or 1, 2, etc.).

Key Terms (Continued)

Wilks' Lambda: Can be used to test which independent variables contribute significantly to the discriminant function. It is given by the ratio of the within-group sum of squares to the total sum of squares. Wilks' lambda takes a value between 0 and 1, and the lower its value, the higher the significance of the discriminant function. A statistically significant function enhances the confidence that the differentiation between the groups exists. A significant lambda means we reject the null hypothesis that the two groups have the same mean discriminant function scores and conclude that the model discriminates between the groups.

Discriminant score: The value resulting from applying the discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data.

Centroid: The mean value of the discriminant scores for a particular group. The number of centroids equals the number of groups; the means of a group on all the functions are the group centroids.

Cut-off Score for Classification: If the discriminant score of a case is less than or equal to the cut-off, the case is classified as 0; if above the cut-off, it is classified as 1. The cut-off score is the average of the two group centroids when the sample sizes of the two groups are equal; for unequal groups, it is the weighted mean.
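A small numeric sketch of the cut-off rule, using made-up discriminant scores: the cut-off is the simple mean of the two group centroids for equal group sizes, and a size-weighted mean otherwise (one common textbook convention, assumed here, weights each centroid by the other group's size).

```python
import numpy as np

# Hypothetical discriminant scores for two groups of unequal size
scores_g0 = np.array([-1.9, -1.2, -0.8, -1.5, -0.6])
scores_g1 = np.array([0.9, 1.4, 1.8, 0.7, 1.1, 1.6, 2.0])

c0, c1 = scores_g0.mean(), scores_g1.mean()      # group centroids
n0, n1 = len(scores_g0), len(scores_g1)

cut_equal = (c0 + c1) / 2                        # equal group sizes
# Weighted mean for unequal groups (each centroid weighted by the
# other group's size, as in Hair et al.; an assumed convention here)
cut_weighted = (n1 * c0 + n0 * c1) / (n0 + n1)

print(f"centroids: {c0:.3f}, {c1:.3f}; weighted cut-off: {cut_weighted:.3f}")

# Classify a new case: above the cut-off -> group 1, else group 0
new_score = 0.4
print("assigned to group", int(new_score > cut_weighted))
```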
Factor Analysis
Dr. Chinmoy Jana
IISWBM, Management House, Kolkata

Factor analysis reduces data with the maximum variance explained. The basic principle behind the application of factor analysis is that the initial set of variables should be highly correlated. If the correlation coefficients between all the variables are small, factor analysis may not be an appropriate technique.
Principal Component Analysis (PCA) & Common Factor Analysis (FA)

Principal component analysis involves extracting linear composites of observed variables. Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.

Run principal component analysis if you want to simply reduce your correlated observed variables to a smaller set of important independent composite variables. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.

The bottom line is that these are two different models, conceptually. In PCA, the components are actual orthogonal linear combinations that maximize the total variance. In FA, the factors are linear combinations that maximize the shared portion of the variance, the underlying "latent constructs". That is why FA is often called "common factor analysis". FA uses a variety of optimization routines and the result, unlike PCA, depends on the optimization routine used and the starting points for those routines; there is simply no single unique solution.

FA models are to be preferred since they explicitly account for measurement errors, while PCA does not. Briefly stated, using PCA you are expressing each component (factor) as a linear combination of the variables, whereas in FA it is the variables that are expressed as linear combinations of the factors (including communality and uniqueness components).

Key Terms

Factor Scores: The composite scores estimated for each respondent on the extracted factors. They are called component scores in PCA.

Factor Loading: The correlation coefficient between a variable included in the study and a factor score is called the factor loading. It is called a component loading in PCA.

Factor Matrix (Component Matrix): Contains the factor loadings of all the variables on all the extracted factors.

Eigenvalue: The percentage of variance explained by each factor can be computed using the eigenvalue. The eigenvalue of any factor is obtained by taking the sum of squares of the factor loadings on that factor.

Communality: The amount of variance in a variable accounted for by the underlying factors taken together. In other words, it is a measure of the percentage of a variable's variation that is explained by the factors. A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
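To ground these terms, a minimal PCA sketch on random illustrative data (two latent factors driving six variables, an assumption of the example) computes eigenvalues, component loadings, and communalities with scikit-learn; a common factor analysis would instead use a dedicated FA routine.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Illustrative data: 6 correlated variables driven by 2 latent factors
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))
Z = StandardScaler().fit_transform(X)            # standardize the variables

pca = PCA(n_components=2).fit(Z)
eigenvalues = pca.explained_variance_            # one eigenvalue per component
# Component loadings: eigenvector scaled by sqrt(eigenvalue)
loadings = pca.components_.T * np.sqrt(eigenvalues)
communalities = (loadings**2).sum(axis=1)        # variance explained per variable

print("eigenvalues:", np.round(eigenvalues, 3))
print("loadings:\n", np.round(loadings, 3))
print("communalities:", np.round(communalities, 3))
```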
Key Terms (Continued)

Factor plot or Rotated Factor Space: The factors are on different axes and the variables are drawn on these axes. This plot can be interpreted only if the number of factors is three or less.

Goodness of a factor: How well can a factor account for the correlations among the indicators? Examine the correlations among the indicators after the effect of the factor is removed. For a good factor solution, the resulting partial correlations should be near zero, because once the effect of the common factor is removed, there is nothing left to link the indicators.

Scree plot: A plot of the eigenvalues against the factors in the order of their extraction.

Bartlett's Test of Sphericity: Tests the null hypothesis that there is no correlation between the variables.

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: An index used to test the appropriateness of factor analysis. The KMO statistic compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients. A small value of KMO shows that the correlations between variables cannot be explained by the other variables. High values (> 0.5) indicate that factor analysis is appropriate.

Trace: The sum of the values on the diagonal of the correlation matrix used in the factor analysis. It represents the total amount of variance on which the factor solution is based.

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

Correlation Matrix: PCA can be carried out if the correlation matrix for the variables contains at least two correlations of 0.30 or greater.

KMO and Bartlett's Test: The minimum required KMO is 0.5, and the Chi-square statistic should be significant.

Communalities: Estimates of the variance in each variable accounted for by the components. High communalities indicate that the variables are well represented by the extracted components. If any communalities are very low in a principal components extraction, you may need to extract another component.

Total Variance Explained: This table provides the total variance contributed by each component, with its percentage and cumulative percentage.

Scree Plot: Plots the components against their eigenvalues and helps to determine the optimal number of components. A component on the steep part of the slope indicates that a good percentage of the total variance is explained by that component, hence the component is justified; a shallow slope indicates that the contribution to total variance is small and the component is not justified.

Component (Factor) Matrix: This table provides each variable's component loadings, but it is not easily interpreted, so refer to the Rotated Component (Factor) Matrix.
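The sketch below computes Bartlett's test of sphericity from the determinant of the correlation matrix and a KMO value from the partial correlations (via the inverse correlation matrix); both are the standard textbook formulas, applied to illustrative data.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))
n, p = X.shape

R = np.corrcoef(X, rowvar=False)               # correlation matrix

# Bartlett's test of sphericity: H0 is that R is an identity matrix
stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
print(f"Bartlett chi2 = {stat:.1f}, p-value = {chi2.sf(stat, df):.3g}")

# KMO: compare observed correlations with partial correlations
inv = np.linalg.inv(R)
partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
off = ~np.eye(p, dtype=bool)                   # off-diagonal mask
kmo = (R[off]**2).sum() / ((R[off]**2).sum() + (partial[off]**2).sum())
print(f"KMO = {kmo:.3f}  (values > 0.5 suggest FA is appropriate)")
```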
Output Analysis (Continued)

CA does not assume any dependent variable. It uses different methods of classification to classify the data into groups without any prior information. Cases with similar data fall into the same group, and cases with distinct data are classified into different groups.

ANOVA table: The univariate or one-way ANOVA statistics for each clustering variable. The higher the ANOVA value, the greater the difference between the clusters on that variable.

Cluster variate: The variables or parameters representing the objects to be clustered and used to calculate the similarity between objects.

Cluster membership: Indicates the address, i.e. the cluster, to which a particular person/object belongs.

Dendrogram: A tree-like diagram that is used to graphically present the cluster results. The vertical axis represents the objects and the horizontal axis the inter-respondent distance. The figure is to be read from left to right.

Distances between final cluster centres: The distances between the individual pairs of clusters. A robust solution that is able to demarcate the groups distinctly is one where the inter-cluster distance is large; the larger the distance, the more distinct the clusters.

Final cluster centres: The mean value of the cluster on each of the variables that is part of the cluster variate.

Hierarchical methods: A step-wise process that starts with the most similar pair and formulates a tree-like structure composed of separate clusters.

Non-hierarchical methods: Cluster seeds or centres are the starting points, and individual clusters are built around them based on some pre-specified distance from the seeds.

Vertical icicle diagram: Quite similar to the dendrogram; it is read from bottom to top. The objects are individually displayed at the top. At any given stage the columns correspond to the objects being clustered, and the rows correspond to the stages of the clustering.

[Flowchart: Stage 4, Number of Clusters (for hierarchical methods, examine the dendrogram); Stage 5, Interpreting the Clusters (examine the cluster variables, name the clusters); Stage 6, Validating and Profiling the Clusters (validation).]
Output Analysis

Case Processing Summary: Provides the case processing summary and its percentage. Cases which have missing values are ignored.

Single Linkage Agglomeration Schedule: Details of the clusters formed at each stage. The Coefficients column indicates the distance coefficient. A sudden increase in the coefficient indicates that relatively dissimilar clusters are being combined at that stage, so the previous solution is the more appropriate one. This is one of the indicators for deciding the number of clusters: examine the difference in the coefficients between the current solution and the previous solution.

Icicle table: Provides a summary of the cluster formation and is read from bottom to top. The topmost row is the single-cluster solution and the bottommost has all cases separate. Cases appear in the columns, and the first column indicates the number of clusters at that stage. Each case is separated by an empty column; a cross in the empty column means the two cases are combined, while a gap means the two cases are in separate clusters.

Dendrogram: The most used tool for understanding the number of clusters and the cluster memberships. The cases are listed in the first column and are connected by lines at each stage of clustering. The leftmost position is the solution with every case separate and the rightmost is the one-cluster solution. The graph also carries a distance scale from 0 to 25; the greater the width of the horizontal line for a cluster, the more appropriate the cluster.

If the solution is not decisive, i.e. the differences are very close, one can try a different method, such as furthest neighbour.
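A minimal scipy sketch of the hierarchical output described above, on illustrative data: the third column of the linkage matrix is the agglomeration distance coefficient, a jump in it is the usual stopping signal, and dendrogram draws the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(4)
# Illustrative data: three loose groups in two dimensions
X = np.vstack([rng.normal(c, 0.4, size=(10, 2))
               for c in ([0, 0], [3, 3], [0, 4])])

Z = linkage(X, method="single")    # single-linkage agglomeration schedule
coeffs = Z[:, 2]                   # distance coefficient at each stage
jumps = np.diff(coeffs)
print("largest jump occurs after stage:", int(np.argmax(jumps)) + 1)

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print("cluster membership:", labels)
# dendrogram(Z)  # with matplotlib, draws the tree described above
```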
Conjoint Analysis: The Basic Model

U(X) = Σ (i = 1..m) Σ (j = 1..ki) αij·xij

where
U(X) = overall utility of an alternative,
αij = part-worth contribution associated with the jth level of the ith attribute,
xij = 1 if the jth level of the ith attribute is present, and 0 otherwise,
ki = number of levels of attribute i,
m = number of attributes.

The importance of attribute i, Ii = max(αij) − min(αij), is defined for each attribute. Normalized, the importance weights are Wi = Ii / Σ Ii (i = 1, 2, ..., m), so that Σ Wi = 1.

The simplest estimation procedure, and one which is gaining in popularity, is dummy variable regression. If an attribute has ki levels, it is coded in terms of ki − 1 dummy variables.
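The following sketch, built on a hypothetical two-attribute example (price with 3 levels, brand with 2), estimates part-worths by dummy variable regression with ki − 1 dummies per attribute, then derives the attribute importances Ii and the weights Wi.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings of 6 profiles: price (3 levels) x brand (2 levels)
profiles = pd.DataFrame({
    "price": ["low", "low", "mid", "mid", "high", "high"],
    "brand": ["A", "B", "A", "B", "A", "B"],
    "rating": [9.0, 7.5, 7.0, 5.5, 4.0, 2.5],
})

# Code each attribute with k_i - 1 dummies (drop_first drops one level)
X = pd.get_dummies(profiles[["price", "brand"]], drop_first=True).astype(float)
Xd = np.column_stack([np.ones(len(X)), X.to_numpy()])
b, *_ = np.linalg.lstsq(Xd, profiles["rating"].to_numpy(), rcond=None)
coefs = dict(zip(["intercept"] + list(X.columns), np.round(b, 3)))
print(coefs)

# Part-worths per attribute (the dropped level has part-worth 0)
pw_price = [0.0] + [coefs[c] for c in X.columns if c.startswith("price_")]
pw_brand = [0.0] + [coefs[c] for c in X.columns if c.startswith("brand_")]

# Importance I_i = range of part-worths; weights W_i sum to 1
I = np.array([max(pw_price) - min(pw_price),
              max(pw_brand) - min(pw_brand)])
W = I / I.sum()
print("importance weights (price, brand):", np.round(W, 3))
```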
Assumptions and Limitations of Conjoint Analysis

Conjoint analysis assumes that the important attributes of a product can be identified. It assumes that consumers evaluate the choice alternatives in terms of these attributes and make tradeoffs. The tradeoff model, however, may not be a good representation of the choice process. Another limitation is that data collection may be complex, particularly if a large number of attributes are involved and the model must be estimated at the individual level. The part-worth functions are also not unique.

Assessing Reliability and Validity

The goodness of fit of the estimated model should be evaluated. For example, if dummy variable regression is used, the value of R² will indicate the extent to which the model fits the data.

Test-retest reliability can be assessed by obtaining a few replicated judgments later in data collection.

The evaluations for the holdout or validation stimuli can be predicted by the estimated part-worth functions. The predicted evaluations can then be correlated with those obtained from the respondents to determine internal validity.

If an aggregate-level analysis has been conducted, the estimation sample can be split in several ways and conjoint analysis conducted on each subsample. The results can be compared across subsamples to assess the stability of the conjoint analysis solutions.