
Unit 6

DATA PREPARATION, ANALYSES,


AND INTERPRETATION

• In most research studies, the researcher begins by generating a research question, framing it into a testable (i.e., falsifiable) hypothesis, selecting an appropriate research design, choosing a suitable sample of research participants, and selecting valid and reliable methods of measurement.

• In most types of research studies, the process
of data analysis involves the following three
steps: (1) preparing the data for analysis, (2)
analyzing the data, and (3) interpreting the
data (i.e., testing the research hypotheses and
drawing valid inferences)

DATA PREPARATION
• Virtually all studies, from surveys to
randomized experimental trials, require some
form of data collection and entry
• Data represent the fruit of researchers’ labor
because they provide the information that
will ultimately allow them to describe
phenomena, predict events, identify and
quantify differences between conditions, and
establish the effectiveness of interventions

Contd.
• Because of their critical nature, data should
be treated with the utmost respect and care
• In addition to ensuring the confidentiality and
security of personal data, the researcher
should carefully plan how the data will be
logged, entered, transformed (as necessary),
and organized into a database that will
facilitate accurate and efficient statistical
analysis

Logging and Tracking Data

• Any study that involves data collection will require some procedure to log the information as it comes in and track it until it is ready to be analyzed.
• Research data can come from any number of
sources (e.g., personal records, participant
interviews, observations, laboratory reports,
and pretest and posttest measures)

Contd.
• Without a well-established procedure, data can
easily become disorganized, uninterpretable,
and ultimately unusable
• Although there is no one definitive method for
logging and tracking data collection and entry, in
this age of computers it might be considered
inefficient and impractical not to take advantage
of one of the many available computer
applications to facilitate the process

Computer software packages for data
analysis
• SPSS
• STATA
• One of the key elements of the data tracking system is
the recruitment log.
• The recruitment log is a comprehensive record of all
individuals approached about participation in a study
• The log can also serve to record the dates and times
that potential participants were approached, whether
they met eligibility criteria, and whether they agreed
and provided informed consent to participate in the
study
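• A recruitment log can be kept as a small table alongside the study files. Below is a minimal sketch in Python/pandas; the column names and entries are hypothetical, not taken from any particular study.

```python
import pandas as pd

# Hypothetical recruitment log: one row per individual approached.
recruitment_log = pd.DataFrame([
    {"id": 1, "date_approached": "2019-03-01", "eligible": True,  "consented": True},
    {"id": 2, "date_approached": "2019-03-01", "eligible": True,  "consented": False},
    {"id": 3, "date_approached": "2019-03-02", "eligible": False, "consented": False},
])

# Summarize recruitment: how many were approached, eligible, and enrolled.
print("Approached:", len(recruitment_log))
print("Eligible:  ", recruitment_log["eligible"].sum())
print("Enrolled:  ", (recruitment_log["eligible"] & recruitment_log["consented"]).sum())
```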
Contd.
• In some study settings, where records are maintained on all potential participants (e.g., treatment programs, schools, organizations), it may be possible for the researcher to obtain aggregate information on eligible individuals who were not recruited into the study, either because they chose not to participate or because they were not approached by the researcher.

Record-Keeping Responsibilities
The lead researcher (referred to as principal investigator
in grant-funded research) is ultimately responsible for
maintaining the validity and quality of all research
data, including the proper training of all research staff
and developing and enforcing policies for recording,
maintaining, and storing data. The researcher should
ensure that research data are collected and recorded
according to policy; research data are stored in a way
that will ensure security and confidentiality; and
research data are audited on a regular basis to
maintain quality control and identify potential
problems as they occur

Data Screening
• Immediately following data collection, but prior
to data entry, the researcher should carefully
screen all data for accuracy
• The promptness of these procedures is very
important because research staff may still be able
to re-contact study participants to address any
omissions, errors, or inaccuracies
• In some cases, the research staff may
inadvertently have failed to record certain
information (e.g., assessment date, study site)
or perhaps recorded a response illegibly

Contd.
• Because data collection and data entry are
often done by different research staff, it may
be more difficult and time consuming to make
such clarifications once the information is
passed on to data entry staff
• One way to simplify the data screening
process and make it more time efficient is to
collect data using computerized assessment
instruments

Contd.
• Computerized assessments can be programmed
to accept only responses within certain ranges,
to check for blank fields or skipped items, and
even to conduct cross-checks between certain
items to identify potential inconsistencies
between responses
• Another major benefit of these programs is that
the entered data can usually be electronically
transferred into a permanent database, thereby
automating the data entry procedure

Contd.
• Whether it is done manually or electronically,
data screening is an essential process in ensuring
that data are accurate and complete
• Generally, the researcher should plan to screen the data to make certain that (1) responses are legible and understandable, (2) responses are within an acceptable range, (3) responses are complete, and (4) all of the necessary information has been included.
Constructing a Database
• Once data are screened and all corrections are
made, the data should be entered into a well-
structured database
• When planning a study, the researcher should
carefully consider the structure of the
database and how it will be used
• In many cases, it may be helpful to think
backward and to begin by anticipating how
the data will be analyzed

Contd.
• This will help the researcher to figure out exactly
which variables need to be entered, how they
should be ordered, and how they should be
formatted
• Moreover, the statistical analysis may also dictate
what type of program you choose for your
database
• While designing the general structure of the
database, the researcher must carefully consider
all of the variables that will need to be entered
Retaining Data Records

• Researchers should retain study data for a minimum period of 5 years after publication of their data in the event that questions or concerns arise regarding the findings.
• The advancement of science relies on the scientific community’s overall confidence in disseminated findings, and the existence of the primary data serves to instill such confidence.
The Data Codebook
• In addition to developing a well-structured database, researchers should take the time to develop a data codebook
• A data codebook is a written or computerized list that
provides a clear and comprehensive description of
the variables that will be included in the database
• Moreover, it serves as a permanent database guide, so
that the researcher, when attempting to reanalyze
certain data, will not be stuck trying to remember
what certain variable names mean or what data were
used for a certain analysis

Contd.
At a bare minimum, a data codebook should
contain the following elements for each variable:
Variable name
• Variable description
• Variable format (number, date, text)
• Instrument or method of collection
• Date collected
• Respondent or group
• Variable location (in database)
• Notes
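• For illustration, a codebook can itself be stored as a small table and kept permanently with the data. The sketch below is hypothetical (the variable names, descriptions and file name are not from any particular study) but records the minimum elements listed above.

```python
import pandas as pd

# Hypothetical codebook: one row per variable in the study database.
codebook = pd.DataFrame([
    {"name": "age",    "description": "Age in years at intake", "format": "number",
     "instrument": "Intake interview", "date_collected": "2019-01-15",
     "respondent": "Participant", "location": "column 3", "notes": ""},
    {"name": "gender", "description": "Self-reported gender",   "format": "text",
     "instrument": "Intake interview", "date_collected": "2019-01-15",
     "respondent": "Participant", "location": "column 4", "notes": "f/m"},
])

codebook.to_csv("codebook.csv", index=False)  # keep a permanent copy with the data
```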
Data Entry
• After the data have been screened for
completeness and accuracy, and the researcher
has developed a well-structured database and a
detailed codebook, data entry should be fairly straightforward
• It is critical that all data-entry staff are properly trained and maintain the highest level of accuracy when inputting data
• One way of ensuring the accuracy of data entry is
through double entry
Contd.
• In the double-entry procedure, data are
entered into the database twice and then
compared to determine whether there are any
discrepancies
• The researcher or data entry staff can then
examine the discrepancies and determine
whether they can be resolved and corrected
or if they should simply be treated as missing
data
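• A minimal sketch of the double-entry check in Python/pandas, assuming the two entry passes are stored in files named entry1.csv and entry2.csv with a common id column and the same participants in each (all names are hypothetical):

```python
import pandas as pd

# Load the two independent entry passes (hypothetical file and column names).
first = pd.read_csv("entry1.csv").set_index("id").sort_index()
second = pd.read_csv("entry2.csv").set_index("id").sort_index()

# compare() returns only the cells where the two passes disagree.
discrepancies = first.compare(second)
print(discrepancies)  # review these, correct them, or treat them as missing
```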

Contd.
• Although the double entry process is a very
effective way to identify entry errors, it may be
difficult to manage and may not be time or cost
effective.
• As an alternative to double entry, the researcher
may design a standard procedure for checking
the data for inaccuracies
• Such procedures typically entail a careful review
of the inputted data for out-of-range values,
missing data, and incorrect formatting
Contd.
• Much of this work can be accomplished by
running descriptive analyses and frequencies on
each variable.
• In addition, many database programs (e.g.,
Microsoft Excel, Microsoft Access, SPSS) allow the
researcher to define the ranges, formats, and
types of data that will be accepted into certain
data fields
• These databases will make it impossible to enter
information that does not meet the preset
criteria
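• As a sketch of such a checking procedure, the fragment below flags out-of-range values and missing data using simple descriptive summaries; the file name, variable names and valid range are hypothetical.

```python
import pandas as pd

data = pd.read_csv("study_data.csv")  # hypothetical database export

# Frequencies reveal impossible codes and stray categories at a glance.
print(data["gender"].value_counts(dropna=False))

# Flag out-of-range values for a variable that should lie between 1 and 5.
out_of_range = data[(data["item1"] < 1) | (data["item1"] > 5)]
print("Out-of-range item1 values:", len(out_of_range))

# Count missing values per variable.
print(data.isna().sum())
```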

Transforming Data
• After the data have been entered and checked for
inaccuracies, the researcher or data entry staff
will undoubtedly be required to make certain
transformations before the data can be analyzed
• These transformations typically involve the following:
• Identifying and coding missing values
• Computing totals and new variables
• Reversing scale items
• Recoding and categorization

Identifying and Coding Missing Values

• Inevitably, all databases and most variables will have some number of missing values. This is a result of study participants’ failing to respond to certain questions, missed observations, or inaccurate data that were rejected from the database
• Most statistical packages (e.g., SPSS, STATA) will provide the option of ignoring cases in which certain variables are considered missing, or they will automatically treat blank values as missing

Missing Value Imputation
• Virtually all databases have some number of
missing values. Unfortunately, statistical
analysis of data sets with missing values can
result in biased results and incorrect
inferences
• A few of the more widely used imputation
techniques include the following

Contd.
• Hot deck imputation: In this technique, the researcher matches participants on certain variables to identify potential donors. Missing values are then replaced with values taken from matching respondents (i.e., respondents who are matched on a set of relevant factors).
• Predicted mean imputation: Imputed values are predicted using certain statistical procedures (i.e., linear regression for continuous data and discriminant function analysis for dichotomous or categorical data).
• Last value carried forward: Imputed values are based on previously observed values. This method can be used only for longitudinal variables, for which participants have values from previous data collection points.
• Group means: Imputed values are determined by calculating the variable’s group mean (or mode, in the case of categorical data).

Computing Totals and New Variables

• In certain instances, the researcher may want to create new variables based on values from other variables
• Reversing Scale Items
• Many instruments and measures use items with reversed scales to decrease the likelihood of participants’ falling into what is referred to as a “response set.”

Contd.
• A response set occurs when a participant
begins to respond in a patterned manner to
questions or statements on a test or
assessment measure, regardless of the
content of each query or statement
• For example, an individual may answer false to all test items, or may provide a 1 for all items requesting a response from 1 to 5
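• As an illustration, on a 1-to-5 scale a reversed item can be re-scored as 6 minus the response before a total score is computed. A minimal pandas sketch, with hypothetical file and item names:

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical survey responses on a 1-5 scale

# Reverse the negatively worded items so that higher always means "more".
for item in ["item2", "item5"]:
    df[item + "_rev"] = 6 - df[item]

# Compute a total scale score from the original and reversed items.
df["total"] = df[["item1", "item2_rev", "item3", "item4", "item5_rev"]].sum(axis=1)
```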

Recoding Variables
• Some variables may be more easily analyzed if
they are recoded into categories
• For example, a researcher may wish to collapse income estimates or ages into specific ranges
• Once the data have been screened, entered,
cleaned, and transformed, they should be
ready to be analyzed
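• A sketch of recoding a continuous variable into ranges with pandas; the file name, column name and cut points are hypothetical.

```python
import pandas as pd

df = pd.read_csv("study_data.csv")  # hypothetical data with an 'age' column

# Collapse exact ages into broad age bands.
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 24, 44, 64, 120],
                         labels=["<25", "25-44", "45-64", "65+"])
print(df["age_group"].value_counts())
```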

DATA ANALYSIS
I. Qualitative Data Analysis
II. Descriptive Quantitative Analysis
III. Correlation and Regression Analysis
IV. Advanced statistical Analysis

Qualitative Data Analysis
• Qualitative research poses a challenge to the researcher in terms of how to reduce what may feel like an overwhelming amount of data gathered from in-depth interviews, observations and written documentation.
• The aims of qualitative analysis are sevenfold:
• 1. Detect patterns in the data
• 2. Identify deviants and oddities
• 3. Compare to theory—detection of conformance (if the scientific method is used)
• 4. Identify groups—classification
• 5. Compare and contrast groups
• 6. Construct a model
• 7. Test the model—validation
Contd.
• Thus, there is an exploration phase (aims 1 and 2), a classification phase (aims 3 and 4), a conclusion-drawing phase (aims 3 and 5), a representation phase (aim 6) and a testing phase (aim 7)
• PREPARATION
• To start the analysis, qualitative data have to be prepared—in the scientific method, this is inevitably reductionist. All the observations, inflections in tone and the ways words were combined cannot be captured.
Reducing Interview and
Observational Data
• Transcribe interviews, recording the main points and key or interesting verbatim quotes.
• Type points of information in black; attitudes, feelings, views and similar observations in green; and verbatim quotes in blue (or any colours you choose)

Observations
• (a) Description of the physical environment
• (b) Description of individual’s physical characteristics
• (c) Description of individual’s behaviour
• (d) Description of interactions

Contd.
• Once this is done, a useful approach is to write
verbatim quotes or summarised information in
the form of statements onto ‘post-its’. This
should be at the lowest level of analysis and each
‘post-it’ should contain only one point—do not
use ‘and’ or ‘or’. ‘Post-its’ of different colours,
sizes and shapes can be used to represent
different times, grades of employment, divisions,
companies and so on
Contd.
• These can be then placed on a blank wall and
moved around into groups representing themes,
and placed into clusters by statements they have
in common
• Classifications and order will emerge from the
initial chaos
• Information can be summarized by theme and
frequencies produced
• Typical cross-tabulations (e.g., by gender, experience and so on) can then be compared
Framework Approach
• The Framework Approach comprises five interrelated stages:
• Familiarization with data
• Creating a thematic framework
• Coding and indexing of data
• Charting
• Mapping and interpretation

Familiarisation with Data
• This involves immersion in the data in order to
gain an overview of the depth and diversity of
the material and identification of recurring
themes and issues
• Writing up a contact summary sheet as soon as transcripts have been typed up provides a useful and reflective overview of what went on during the interview or discussion, much of which would have been lost if summarized later

• These sheets also help in planning for subsequent interviews and in developing topic guides and generating categories, which become more sophisticated as new territory unfolds. They provide an invaluable means of keeping the researcher on track at a time when he or she could become buried in a large volume of detail.

Creating a Thematic Framework
• Having identified a number of key themes at the
familiarization stage, these are examined in
detail with a view to setting up a thematic
framework or index. This requires a thorough, careful reading of transcripts in full to ensure
that any index is grounded in the original
accounts and observations
• Although some index categories may be identical
to the original topic areas and questions covered
at interview, others will be newly defined from
the emergent themes
Coding and Indexing of Data
• The next stage involves applying codes to the data. Codes can be numerical or alphabetical. All the material is read through and coded in the margin of the text. Single paragraphs often contain a number of different themes and are coded accordingly
• The data should be well molded to the codes that
represent them. This is not a routine exercise and
is time consuming, but nevertheless can provide a
useful mechanism for reviewing judgments made
about the meaning and significance of the data
Charting

• Distilled data summaries can now be charted according to the coded text. This allows the analyst to build up a picture of the data as a whole

Mapping and Interpretation
• The final stage in this process of analyzing
qualitative data is about interpreting the data
and making assertions
• The analyst reviews the charts and research
notes, compares and contrasts perceptions,
accounts or experiences, and searches for
patterns and connections that will help explain
the phenomenon under study.
• This process will be shaped by the key objectives
and research questions guiding the study, and
can be clustered under the following headings:
• Defining Concepts
• Mapping Range and Nature of Phenomenon
• Creating Typologies
• Finding Associations
• Providing Explanations
• Developing Strategies

METHODS OF ANALYSIS
• I. Content Analysis
• This is a popular approach to the analysis of
qualitative information
• Key phrases or words are counted and the
frequencies analyzed.
• The selection of these depends on, for example,
the particular hypothesis to be tested

Contd.
• This method may be useful in allowing the
researcher to present a picture of what the
concepts are but it does not help in
understanding why ideas or interpretations
arose in the first place.
• In qualitative research, it is likely that you will
engage in some form of content analysis as it is
frequently used to analyze text, pictorial
information, interviews and web pages
• As the name suggests, the purpose of content
analysis is to describe the content of your
respondents’
Contd.
• comments systematically and classify the various
meanings expressed in the material you have
recorded.
• It is not the only way in which you would analyze the data you have obtained, and you might find yourself presenting information in the form of a connected narrative.
• Another way is to ‘tell a story’, that is, to explain the key issues, the key points and the context within which the issues and problems are being investigated.
Contd.
• But all reporting of semi-structured interviews
assumes that you present findings which are
representative of what was said and content
analysis is a powerful means for familiarizing
yourself with what is there. It enables you as the
researcher to more deeply understand what the
respondents have said and to find ‘typical’
answers in their responses. There are six main
steps in content analysis

1. Identify the Unit of Analysis
• Often the unit of analysis is either an individual
or a company, but it may be more generic like
geographic region or country
• 2. Choose a Set of Categories
• The categories must be relevant to the issue
being explored, mutually exclusive (so that a
unit can only be placed under one category),
exhaustive (covering all possibilities) and
reliable (someone else repeating the analysis
would categorize the unit in the same way). The
categories are chosen based on either a theory
or rationale or on the usefulness of the review
• It is also important that the categories that you
construct are reliable
• If you are using categories drawn from a pre-existing
theory or rationale, then it is important that before
you use them to classify your data you make a brief,
explicit list of the defining characteristics of each
category
• 3. Coding
• Read through the material and, within each context
unit, assign each assertion to one of the categories.
There may be more than one assertion within a
context unit
• 4. Tabulate the Material
• Count the number of assertions under each
category and present the material as a table
• 5. Illustrate the Material
• Present the categories and list all the assertions under them, or a representative set.
• Illustration is very important; the construction of schematic diagrams to indicate the relationships between elements and the direction of influence aids the researcher’s understanding and facilitates interpretation.
• 6. Draw Conclusions from the Tabulations and
Diagram
• Produce inferences as to the nature of effects between elements in the data. Form views on how one element may influence another and understand the complexity of the problem.

2. Force Field Analysis
• Force Field Analysis is another way of
depicting qualitative data and helps to
categorize positive and negative influences.
• For example, influences on the future revenue of a company, gleaned from interviews with members of the board of directors, can be illustrated as:
• Inhibitors: competition, technological change
• Promoters: skilled employees, loyal customers, employee commitment
3. Schematic Diagrams
• Schematic diagrams are also useful for
illustrating findings from interviews
• From interviews of staff in company X as to
what contributes to performance
• Schematic Diagram of Contributors to
Performance for Company X


• [Schematic diagram: Training, Incentives and Intensity of Experience shown as positive and negative influences on Performance for Company X]
4. Cause and Effect Diagrams

• ‘Cause and Effect’ diagrams are frequently used in quality management to illustrate how factors may contribute to a target effect, and help to order these into broad categories. Consider a situation in which staff members are interviewed on the causes of absenteeism in their organization

• Causes:
• Colleagues: bullying, harassment, stupidity
• Environment: dangerous, harsh, temperature, smell/dirt
• Individual: commitment, other interests, family matters, health
• Reward system: pay, recognition, promotion
• Effect: high absenteeism
II. Descriptive Quantitative
Analysis
• In recent years, there has been a move towards evidence-based decision-making
• To do this effectively, business researchers must
collect numerical data, store it and interrogate
it.
• It begins with a review of basic statistical summaries: frequencies, bar charts, numerical measures and tables are reviewed, along with an example of a scatter plot
Contd.
• It covers the use of t-tests to compare two
groups as well as analysis of variance (ANOVA)
to compare more than two groups. These are
tests for continuous data.
• For primary sources, responses need to be sufficiently edited before they are entered into a computer data file. By definition, editing is the process of checking and adjusting the data for omissions, legibility and consistency

Contd.
• Each of the columns shown in the worksheet is called a variable. There are two types of
variables.
• The first is continuous variables—entities that
can be measured on a continuous scale, like
age and salary.
• The second is categorical variables, which can
be further divided into nominal and ordinal
variables
DESCRIPTIVE STATISTICS
• Descriptive statistics help us understand and summarize the data. The summary statistics can be presented either in tabular form or graphically
• As their name implies, descriptive statistics are used to describe the data collected in research studies and to accurately characterize the variables under observation within a specific sample. Descriptive analyses are frequently used to summarize a study sample prior to analyzing a study’s primary hypotheses
• This provides information about the overall
representativeness of the sample, as well as
the information necessary for other
researchers to replicate the study, if they so
desire
• The principal objective of descriptive statistics
is to accurately describe distributions of
certain variables within a specific data set

Frequency and Frequency
Distribution
• There is a variety of methods for examining the
distribution of a variable.
• Perhaps the most basic method, and the starting
point and foundation of virtually all statistical
analyses, is the frequency distribution
• A frequency distribution is simply a complete list of all possible values or scores for a particular variable, along with the number of times (frequency) that each value or score appears in the data set
• Frequency is the count of each category of a variable. This count is often expressed in percentage or cumulative percentage form.
• Frequency of males and females in the data set:

  Gender        f        m        Grand Total
  Count       216      258        474

• Percentages of males and females in the data set:

  Gender        f        m        Grand Total
  Per cent    45.57    54.43      100.00
e.g. 2: Test scores of students
• Frequency Distribution of Test Scores

  Value    Frequency    Cumulative Frequency
   71          1                 1
   76          1                 2
   78          2                 4
   81          2                 6
   82          1                 7
   83          1                 8
   84          2                10
   85          2                12
   86          2                14
   87          1                15
   89          1                16
   90          2                18
   94          3                21
   98          1                22
Grouped Frequency Distribution of Test Scores

  Value      Frequency    Cumulative Frequency
  71–75          1                 1
  76–80          3                 4
  81–85          8                12
  86–90          6                18
  91–95          3                21
  96–100         2                23
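• The same kind of frequency table can be produced in a couple of lines of pandas; the scores below are read off the ungrouped table above.

```python
import pandas as pd

scores = pd.Series([71, 76, 78, 78, 81, 81, 82, 83, 84, 84, 85,
                    85, 86, 86, 87, 89, 90, 90, 94, 94, 94, 98])

# Frequency of each score plus a running (cumulative) total.
freq = scores.value_counts().sort_index()
print(pd.DataFrame({"frequency": freq, "cumulative": freq.cumsum()}))
```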

Contd.
• Still another way that this distribution may be
depicted is in what is known as a histogram. A
histogram is nothing more than a graphic
display of the same information contained in
the frequency tables
• Although frequency tables and histograms
provide researchers with a general overview of
the distribution, there are more precise ways of
describing the shape of the distribution of
values for a specific variable.
• These include measures of central tendency
Central Tendency
• The central tendency of a distribution is a number that represents the typical or most representative value in the distribution. Measures of central tendency provide researchers with a way of characterizing a data set with a single value. The most widely used measures of central tendency are the mean, median, and mode.

Dispersion
• Measures of central tendency, like the mean, describe the most likely value, but they do not tell us anything about how the values vary. For example, two sets of data can have the same mean, but they may vary greatly in the way that their values are spread out

• The spread, more technically referred to as the dispersion, of a distribution provides us with information about how tightly grouped the values are around the center of the distribution (e.g., around the mean, median, and/or mode).
• The most widely used measures of dispersion are range, variance, and standard deviation
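• A short sketch computing these summaries for the test scores listed earlier:

```python
import pandas as pd

scores = pd.Series([71, 76, 78, 78, 81, 81, 82, 83, 84, 84, 85,
                    85, 86, 86, 87, 89, 90, 90, 94, 94, 94, 98])

# Central tendency
print("mean:  ", scores.mean())
print("median:", scores.median())
print("mode:  ", scores.mode().tolist())

# Dispersion
print("range:   ", scores.max() - scores.min())
print("variance:", scores.var())   # sample variance
print("std dev: ", scores.std())   # sample standard deviation
```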

Measures of Association
• In addition to describing the shape of variable
distributions, another important task of
descriptive statistics is to examine and describe
the relationships or associations between
variables.
• Correlations are perhaps the most basic and most useful measure of association between two or more variables. Expressed in a single number called a correlation coefficient (r), correlations provide information about the direction of the relationship (either positive or negative) and the strength of the relationship (ranging from –1.0 to +1.0).
• Although correlations are typically regarded as
descriptive in nature, they can—unlike measures
of central tendency and dispersion—be tested
for statistical significance. Tests of significance
allow us to estimate the likelihood that a
relationship between variables in a sample
actually exists in the population and is not simply
the result of chance.
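• A minimal sketch of computing a correlation coefficient and its significance test with SciPy; the two variables and their values are hypothetical.

```python
from scipy import stats

# Hypothetical paired measurements, e.g. years of education and log salary.
education = [10, 12, 12, 14, 16, 16, 18, 20]
log_salary = [9.8, 10.0, 10.1, 10.3, 10.4, 10.6, 10.7, 11.0]

r, p_value = stats.pearsonr(education, log_salary)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # direction/strength and significance
```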

Guidelines for the Production of Tables
• Guidelines for the production of tables can be listed as:
1. The table must have a clear purpose.
2. The table must have an explanatory title.
3. The table must clearly indicate the units of
measurement of the data.
4. The table should clearly indicate the source of the
data.
5. Lines should be drawn where appropriate to draw
attention.
6. Row and column totals should be shown where
meaningful.
7. Percentages and ratios should be computed if appropriate.
Contd.
• 8. Don’t present too many significant digits.
• 9. It is easier to look down a column of
numbers than across a row.
• 10. Give attention to the spacing and layout of
the table

ARE THERE SIGNIFICANT
DIFFERENCES?
• Here, one asks: is there a significant difference between a group of sample measurements and a target value?
• For example, in a statistical quality improvement situation, suppose that in the manufacture of circuit boards the critical dimension is 80 microns. If a sample of five circuit boards is measured, the measures might be 81, 79, 76, 70 and 77 microns. The average of this sample is 76.6 microns. Is this a big enough difference from the target of 80 to say that the process is not meeting the target? To confirm this, another sample could be taken, giving the results 78, 82, 88, 85 and 85 microns—an average of 83.6 microns. Now what is to be concluded?
Contd.
• Consider 100 samples of size five, which give the distribution of averages from samples of size five. The mean of all these samples is 75.1 microns and 95 per cent of the data lies between 74.7 and 75.5 microns. This is often referred to as the 95 per cent confidence interval of the mean: one is 95 per cent sure that the mean lies in this interval. The target of 80 does not lie in this interval, so it is concluded that the process is not conforming to target and is significantly less than 80 microns. It appears that the second sample is significantly different from the rest of the data.
• Theoretically, for any sample the equation that gives the 95 per cent confidence interval is
  X̄ ± 1.96 × s/√n
• where s is the standard deviation, a measure of variability, and n is the size of the sample. Thus, as the variability increases and n decreases, the confidence interval widens
• When a target is not included in the confidence interval, then there is said to be a significant difference
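• As an illustration of the formula, the sketch below computes a 95 per cent confidence interval for the first sample of circuit boards from the example above, using the normal multiplier 1.96 as on the slide.

```python
import math
import statistics

sample = [81, 79, 76, 70, 77]        # first sample of circuit boards (microns)
n = len(sample)
mean = statistics.mean(sample)       # 76.6 microns
s = statistics.stdev(sample)         # sample standard deviation

half_width = 1.96 * s / math.sqrt(n)
print(f"95% CI: {mean - half_width:.1f} to {mean + half_width:.1f} microns")
```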
• Likewise if the two confidence intervals were not
to overlap, then they would be significantly
different.
• Often these ideas are expressed as P-values. The P-value is the likelihood of a difference occurring by chance; if it is less than 5 per cent (0.05), the difference is taken to be significant. (In SPSS, the P-value is called the sig level.)

• Formally, one tests a null hypothesis (Ho) that there is no difference against an alternative hypothesis (Ha) that there is a difference.
• i.e., Ho: There is no significant difference.
• vs.
• Ha: There is a difference (note that there are three forms of this test).

Comparing against a Target
• Here, a one-sample t-test is used
• For example using the data in the file employ.xls
one can ask the question: is the average years of
education more than 10?
• Ho: The mean is not different from 10.
• vs. Ha: The mean is greater than 10.
• To choose which hypothesis to accept, a one-sample t-test is used. Essentially, a confidence interval is constructed, and if the test value falls outside that interval then Ho is rejected
t-Test on Mean of Years of
Education
                   Variable 1    Variable 2
  Mean                 13.49          0
  Variance              8.32          0
  Observations           474        474

• If P is less than 0.05, one accepts Ha: the mean years of education is greater than 10.
• Accept the alternative hypothesis, as the P-value is less than 0.05—so one is 95 per cent sure that the mean is greater than 10.
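• A sketch of the same one-sample t-test with SciPy, assuming the years-of-education variable sits in a column of a CSV export of the data; the file name and column name are hypothetical.

```python
import pandas as pd
from scipy import stats

data = pd.read_csv("employ.csv")  # hypothetical export of the employ data set

# One-sided, one-sample t-test of whether mean years of education exceeds 10.
t_stat, p_value = stats.ttest_1samp(data["educ"], popmean=10, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Accept Ha: mean years of education is greater than 10")
```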
COMPARING TWO GROUPS
• Here, a two-sample t-test is used—there are two types: one when the variance (standard deviation squared) is broadly similar between the groups, and the other when the variances are not similar
• For these t-tests, one requires data that are roughly symmetrical
• Formally, test
• Ho: salary is not affected by gender.
• vs.
• Ha: males are paid more than females.
• To use the t-test, first sort the data set by gender, compute the standard deviations of log salary for each group and square them to obtain the variances
• Paired Comparisons
• This is used to compare two groups which only
differ by one intervention

• For example, the performance of 12 individuals before and after a training intervention is displayed below
• t-Test: Two-Sample Assuming Unequal Variances

                                   Variable 1    Variable 2
  Mean                                10.13         10.54
  Variance                            0.065         0.159
  Observations                          216           258
  Hypothesised Mean Difference            0
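• A sketch of both comparisons with SciPy: an independent two-sample (Welch) test of log salary by gender, and a paired test for before/after training scores. The file name, column names and the before/after scores are hypothetical placeholders.

```python
import pandas as pd
from scipy import stats

data = pd.read_csv("employ.csv")  # hypothetical data with 'gender' and 'log_salary'

# Two-sample t-test, not assuming equal variances (Welch's test), one-sided.
male = data.loc[data["gender"] == "m", "log_salary"]
female = data.loc[data["gender"] == "f", "log_salary"]
t_stat, p_value = stats.ttest_ind(male, female, equal_var=False, alternative="greater")
print(f"Two-sample: t = {t_stat:.2f}, p = {p_value:.4f}")

# Paired t-test: the same 12 individuals measured before and after training.
before = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 12, 15]   # hypothetical scores
after  = [14, 16, 13, 15, 15, 17, 13, 16, 15, 15, 13, 16]
t_stat, p_value = stats.ttest_rel(after, before)
print(f"Paired: t = {t_stat:.2f}, p = {p_value:.4f}")
```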
