
Cross Tabulation

Introduction
• Cross tabulation is a statistical tool that is used to analyze categorical
data. Categorical data is data or variables that are separated into different
categories that are mutually exclusive from one another. An example of
categorical data is eye color. Eye color can be divided into categories
(e.g., blue, brown, green), and it is impossible for a person's eye color to
belong to more than one of these categories.
• Cross tabulation helps you understand how two different variables are related
to each other. For example, suppose you wanted to see if there is a
relationship between the gender of survey respondents and whether they
consider physical education in high school to be important.
Introduction
• For reference, a cross-tabulation is a two- (or more) dimensional
table that records the number (frequency) of respondents that have
the specific characteristics described in the cells of the table.
• Cross-tabulation tables provide a wealth of information about the
relationship between the variables.
• Cross-tabulation analysis goes by several names in the research
world including crosstab, contingency table, chi-square and data
tabulation.
• For example:
• How many brand-loyal users are males?
• Is familiarity with a new product related to age and income levels?
• Is product ownership related to income (high, medium and low)?
Introduction
• A frequency distribution describes one variable at a time, but a
cross-tabulation describes two or more variables simultaneously.
• Cross-tabulation results in tables that reflect the joint distribution of
two or more variables with a limited number of categories or distinct
values.
• The categories of one variable are cross-classified with the categories
of one or more other variables.
• Thus, the frequency distribution of one variable is subdivided
according to the values or categories of the other variables.
• Using the GlobalCash Project as an example, suppose that interest was
expressed in determining whether the number of European countries that a
company operates in was associated with the plans to change the number of
banks they do business with.
• The cross-tabulation is shown in Table 18.2. A cross-tabulation includes a cell
for every combination of the categories of the two variables. The number in
each cell shows how many respondents gave that combination of responses.
• In Table 18.2, 105 companies operated in only one European country and did
not plan to change the number of banks they do business with.
• Table 18.2 Number of countries in Europe that a company operates in and
plans to change the number of banks that a company does business with
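Under the hood, a cross-tabulation is just a joint frequency count. A minimal sketch with pandas, using invented data in place of the actual Table 18.2 figures:

```python
# Hypothetical data mimicking the GlobalCash example: number of European
# countries a company operates in vs. plans to change its number of banks.
# The category labels and counts are illustrative, not the real Table 18.2.
import pandas as pd

df = pd.DataFrame({
    "countries": ["1", "1", "2-5", "2-5", "6+", "1", "6+", "2-5"],
    "change_banks": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
})

# Each cell counts respondents with that combination of categories.
table = pd.crosstab(df["countries"], df["change_banks"])
print(table)
```

Reading the output works exactly as described above: the cell at row "1", column "No" is the number of companies operating in one country that do not plan to change banks.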
Scope
Cross-tabulation is widely used in research for several reasons:
(1) cross-tabulation analysis and results can be easily interpreted
and understood by managers who are not statistically oriented;
(2) the clarity of interpretation provides a stronger link between
research results and managerial action;
(3) a series of cross-tabulations may provide greater insights into a
complex phenomenon than a single multivariate analysis;
(4) cross-tabulation may alleviate the problem of sparse cells,
which could be serious in discrete multivariate analysis; and
(5) cross-tabulation analysis is simple to conduct and appealing to
less-sophisticated researchers.
Steps
• Open the table builder (Analyze menu, Tables, Custom Tables).
• Click Reset to delete any previous selections in the table builder.
• In the table builder, drag and drop Age category from the variable list to the
Rows area on the canvas pane.
• Drag and drop Gender from the variable list to the Columns area on the canvas
pane. (You may have to scroll down through the variable list to find this
variable.)
• Click OK to create the table.
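The table that the Custom Tables steps above build can be sketched in pandas as well; the variable names and data here are hypothetical stand-ins for the SPSS dataset, and margins=True adds the row/column totals the SPSS table shows:

```python
# Pandas equivalent of the Custom Tables steps: Age category in rows,
# Gender in columns. Column names and values are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "age_category": ["18-24", "25-34", "18-24", "35-44", "25-34", "18-24"],
    "gender": ["Female", "Male", "Male", "Female", "Female", "Female"],
})

table = pd.crosstab(df["age_category"], df["gender"], margins=True)
print(table)
```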
DATA PURITY
• Data Normality
• Data Validity
• Data Reliability
DATA NORMALITY
• Normality is a critical assumption for data analysis with continuous variables. Under
this assumption, the data sets are expected to follow a normal distribution (i.e. bell
shaped). Skewness and kurtosis are widely used to evaluate the normality of data
sets.
• The normal distribution (also called the Gaussian distribution, named after
Carl Friedrich Gauss, the German mathematician who justified the least
squares method in 1809) is the most widely used family of statistical
distributions, on which many statistical tests are based.
• Many measurements of physical and psychological phenomena can be approximated
by the normal distribution and, hence, the widespread utility of the distribution.
• In many areas of research, a sample is identified on which measurements of
particular phenomena are made. These measurements are then statistically tested,
via hypothesis testing, to determine whether the observations are different because
of chance. Assuming the test is valid, an inference can be made about the population
from which the sample is drawn.
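The skewness and kurtosis evaluation mentioned above can be sketched with scipy; the simulated data and the ±2 cutoffs are illustrative assumptions (one common convention), not fixed standards:

```python
# Rough normality check via skewness and excess kurtosis, using scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=100, scale=15, size=1000)  # simulated test scores

skew = stats.skew(sample)
kurt = stats.kurtosis(sample)   # excess kurtosis: 0 for a true normal
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")

# Values near 0 (conventionally within about +/-2) suggest approximate normality.
approx_normal = abs(skew) < 2 and abs(kurt) < 2
```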
What Is Normal Distribution?
• A normal distribution is a bell-shaped frequency distribution curve. Most of
the data values in a normal distribution tend to cluster around the mean. The
further a data point is from the mean, the less likely it is to occur. There are
many things, such as intelligence, height, and blood pressure, that naturally
follow a normal distribution. For example, if you took the height of one
hundred 22-year-old women and created a histogram by plotting height on the
x-axis, and the frequency at which each of the heights occurred on the y-axis,
you would get a normal distribution.
Characteristics of Normal Distribution
• Normal distributions are symmetric, unimodal, and asymptotic, and
the mean, median, and mode are all equal.
• A normal distribution is perfectly symmetrical around its center.
• That is, the right side of the center is a mirror image of the left side. There is
also only one mode, or peak, in a normal distribution.
• Normal distributions are continuous and have tails that are asymptotic, which
means that they approach but never touch the x-axis.
• The center of a normal distribution is located at its peak, and 50% of the data
lies above the mean, while 50% lies below.
• It follows that the mean, median, and mode are all equal in a normal
distribution.
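These symmetry properties are easy to verify numerically; a quick numpy sketch on simulated data:

```python
# For a large simulated normal sample, the mean and median nearly coincide,
# and roughly 50% of the values fall below the mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=100_000)

print(np.mean(x), np.median(x))       # nearly equal
share_below = np.mean(x < x.mean())   # close to 0.5
print(share_below)
```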
To Check for Normal Distribution of Data
• From the menus, choose: Graphs > Chart Builder...
• Click the Gallery tab.
• Select Histogram in the Choose from: list.
• Drag and drop the Simple Histogram icon into the canvas area of the Chart
Builder.
• Drag and drop a scale variable onto the X-Axis.
• Click the Groups/Point ID tab and select Rows panel variable or Columns
panel variable.
• Select a categorical grouping variable to define the panels.
• A separate histogram is created for each subgroup defined by the grouping
variable.
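In essence, the Chart Builder steps above compute one histogram per subgroup of the panel variable. A rough numpy/pandas sketch with hypothetical column names and simulated heights:

```python
# One frequency distribution per subgroup of a categorical panel variable.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "height_cm": np.concatenate([rng.normal(163, 6, 200),   # simulated women
                                 rng.normal(177, 7, 200)]), # simulated men
    "gender": ["Female"] * 200 + ["Male"] * 200,
})

bins = np.linspace(140, 210, 15)   # 14 bins of width 5 cm
panels = {}
for label, group in df.groupby("gender"):
    counts, _ = np.histogram(group["height_cm"], bins=bins)
    panels[label] = counts
    print(label, counts)   # one histogram (panel) per subgroup
```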
1. Validity
 Validity is the ability of an instrument to measure what it is
designed to measure.
 It sounds simple that a measure should measure what it is
supposed to measure, but achieving this in practice is
difficult.

Measurement and Scaling 15


1(a) Content Validity
 The content validation includes, but is not limited to,
careful specification of constructs, review of scaling
procedures by content validity judges, and consultation
with experts and the members of the population (Vogt et
al., 2004).
 Sometimes, content validity is also referred to as face
validity.
 In fact, content validity is a subjective evaluation of
the scale's ability to measure what it is supposed to
measure.
1(b) Criterion Validity
 The criterion validity is the ability of the variable to
predict the key variables or criteria (Lehmann et al.,
1998).
 It involves the determination of whether the scale is
able to perform up to the expectation with respect to
the other variables or criteria.
 Criterion variables may include demographic and
psychographic characteristics, attitudinal and
behavioural measures, or scales obtained from other
scales (Malhotra, 2004).
1(c) Construct Validity
 The construct validity is the initial concept, notion, question, or hypothesis
that determines which data are to be generated and how they are to be
gathered (Golafshani, 2003).
 To achieve the construct validity, the researcher must focus on convergent
validity and discriminant validity.
 The convergent validity is established when the new measure correlates or
converges with other similar measures.
 The literal meaning of correlation or convergence specifically indicates the
degree to which the score on one measuring instrument (scale) is
correlated with the score on another measuring instrument (scale)
developed to measure the same construct.
Discriminant validity
 Discriminant validity is established when a new measuring
instrument has low correlation or nonconvergence with the
measures of dissimilar concept.
 The literal meaning of no correlation or non-convergence
specifically indicates the degree to which the score on one
measuring instrument (scale) is not correlated with the score on
another measuring instrument (scale) developed to measure a
different construct.
 To establish the construct validity, a researcher has to establish
the convergent validity and discriminant validity.
2. Reliability
 Reliability is the tendency of a respondent to respond in the same or in a
similar manner to an identical or a near identical question (Burns & Bush,
1999).
 A measure is said to be reliable when it elicits the same response from the
same person when the measuring instrument is administered to that
person successively in similar or almost similar circumstances.
 Reliable measuring instruments provide confidence to a researcher that
the transient and situational factors are not intervening in the process, and
hence, the measuring instrument is robust.
 A researcher can adopt three ways to handle the issue of reliability: test–
retest reliability, equivalent forms reliability, and internal consistency
reliability.
2(a) Test–Retest Reliability
 To execute the test–retest reliability, the same questionnaire is
administered to the same respondents to elicit responses at two
different points in time.
 As a next step, the degree of similarity between the two sets of
responses is determined.
 To assess the degree of similarity between the two sets of
responses, a correlation coefficient is computed. A higher
correlation coefficient indicates a more reliable measuring
instrument, and a lower correlation coefficient indicates an
unreliable one.
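The computation described above amounts to a correlation between the two administrations; a minimal sketch with invented responses, using scipy's Pearson correlation:

```python
# Test-retest sketch: correlate the same respondents' answers from two
# administrations of the same questionnaire. Responses are invented.
import numpy as np
from scipy import stats

time1 = np.array([4, 5, 3, 4, 2, 5, 4, 3, 5, 2])  # first administration
time2 = np.array([4, 5, 3, 5, 2, 4, 4, 3, 5, 2])  # second administration

r, p = stats.pearsonr(time1, time2)
print(f"test-retest r = {r:.3f}")   # closer to 1 = more reliable
```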
2(b) Equivalent Forms Reliability
 In test–retest reliability, a researcher considers personal
and situational fluctuations in responses at two different
points in time, whereas in equivalent forms reliability, two
equivalent forms are administered to the subjects at two
different times.
 To measure the desired characteristic of interest, the two
equivalent forms are constructed with different samples of
items. Both forms contain the same types of questions
and the same structure, with some specific differences.
2(c) Internal Consistency Reliability
 The internal consistency reliability is used to assess the reliability of a summated
scale by which several items are summed to form a total score (Malhotra, 2004).
 The basic approach to measure the internal consistency reliability is split-half
technique.
 In this technique, the items are divided into two equivalent groups. This division
is done on the basis of some predefined aspect, such as odd- versus even-numbered
questions in the questionnaire, or by splitting the items randomly.
 After division, responses on items are correlated. High correlation coefficient indicates
high internal consistency, and low correlation coefficient indicates low internal
consistency.
 Subjectivity in the process of splitting the items into two parts poses a
common problem for researchers.
 A very common approach to deal with this problem is the coefficient alpha,
or Cronbach’s alpha.
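The split-half technique described above can be sketched as follows; the item scores are invented, and the odd/even split is one of the predefined aspects mentioned:

```python
# Split-half sketch: correlate odd-numbered vs. even-numbered item totals.
# A real scale would have many more respondents than this toy example.
import numpy as np
from scipy import stats

# rows = respondents, columns = 6 items on a 1-5 scale (invented data)
items = np.array([
    [4, 4, 5, 4, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 5],
    [3, 2, 3, 3, 2, 2],
    [4, 5, 4, 4, 4, 5],
    [1, 2, 1, 2, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6

r, _ = stats.pearsonr(odd_half, even_half)
print(f"split-half r = {r:.3f}")   # high r = high internal consistency
```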
The Coefficient Alpha or
Cronbach’s Alpha
 The coefficient alpha or Cronbach’s alpha is actually
a mean reliability coefficient for all the different
ways of splitting the items included in the
measuring instruments.
 Unlike the correlation coefficient, coefficient
alpha varies from 0 to 1, and a coefficient value of
0.6 or less is considered unsatisfactory.
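Coefficient alpha follows directly from the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); a minimal numpy sketch on invented item scores:

```python
# Direct implementation of Cronbach's alpha from its standard formula.
import numpy as np

# rows = respondents, columns = items on a 1-5 scale (invented data)
items = np.array([
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()     # sum of item variances
total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")   # > 0.6 is the usual floor cited above
```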
To Obtain a Reliability Analysis
• This feature requires the Statistics Base option.
• From the menus choose: Analyze > Scale > Reliability Analysis...
• Select two or more variables as potential components of an additive
scale.
• Choose a model from the Model drop-down list.
• You can select various statistics that describe your scale and items.
Statistics that are reported by default include the number of cases,
the number of items, and reliability estimates as follows:



• Descriptives for. Produces descriptive statistics for scales or items
across cases.
• Item. Produces descriptive statistics for items across cases.
• Scale. Produces descriptive statistics for scales.
• Scale if item deleted. Displays summary statistics comparing each
item to the scale that is composed of the other items. Statistics
include scale mean and variance if the item were to be deleted
from the scale, correlation between the item and the scale that is
composed of other items, and Cronbach's alpha if the item were to
be deleted from the scale.



• Summaries. Provides descriptive statistics of item distributions
across all items in the scale.
• Means. Summary statistics for item means. The smallest, largest, and
average item means, the range and variance of item means, and the
ratio of the largest to the smallest item means are displayed.
• Variances. Summary statistics for item variances.
• Covariances. Summary statistics for inter-item covariances.
• Correlations. Summary statistics for inter-item correlations.
• Inter-Item. Produces matrices of correlations or covariances
between items.



• ANOVA Table. Produces tests of equal means.
• F test. Displays a repeated measures analysis-of-variance table.
• Friedman chi-square. Displays Friedman's chi-square and Kendall's
coefficient of concordance. This option is appropriate for data that
are in the form of ranks. The chi-square test replaces the usual F test
in the ANOVA table.
• Cochran chi-square. Displays Cochran's Q. This option is appropriate
for data that are dichotomous. The Q statistic replaces the usual F
statistic in the ANOVA table.



• Hotelling's T-square. Produces a multivariate test of the null
hypothesis that all items on the scale have the same mean.
• Tukey's test of additivity. Produces a test of the assumption that
there is no multiplicative interaction among the items.
• Intraclass correlation coefficient. Produces measures of consistency
or agreement of values within cases.
• Model. Select the model for calculating the intraclass correlation
coefficient. Available models are Two-Way Mixed, Two-Way Random,
and One-Way Random. Select Two-Way Mixed when people effects
are random and the item effects are fixed, select Two-Way Random
when people effects and the item effects are random, or select
One-Way Random when people effects are random.
• Type. Select the type of index. Available types are Consistency and
Absolute Agreement.
• Confidence interval. Specify the level for the confidence interval.
The default is 95%.
• Test value. Specify the hypothesized value of the coefficient for the
hypothesis test. This value is the value to which the observed value
is compared. The default value is 0.
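Of the tests listed above, the Friedman chi-square is available directly in scipy; a minimal sketch with invented rank data from the same respondents:

```python
# Friedman test: compares k related samples, e.g. the same six respondents
# ranking three items from 1 (best) to 3 (worst). Ranks are invented.
from scipy import stats

item_a = [1, 2, 1, 1, 2, 1]
item_b = [2, 1, 3, 2, 1, 2]
item_c = [3, 3, 2, 3, 3, 3]

stat, p = stats.friedmanchisquare(item_a, item_b, item_c)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

As the text notes, this chi-square replaces the usual F test when the data are ranks rather than interval-scaled scores.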

