
INTRODUCTION TO STATISTICS

LECTURER: KANNI HUANG


MARCH 1ST, 2019
AGENDA

• What is quantitative research?
• What is the scientific method?
• What is statistics?
• Statistical tests
  • t test
  • ANOVA
  • Correlation Coefficients
  • Regression
• Introduction to SPSS
WHAT IS A SCIENTIFIC THEORY?

• A scientific theory contains abstract concepts and statements that are considered part of scientific knowledge.

• Theories are in the form of “a set of descriptions of causal processes.”
AN EXAMPLE OF A THEORY
(Ajzen, 1991)
PROPOSE A VALID HYPOTHESIS FOR ANALYSIS

Vague relationship
  Don’t do that: What role did social media play in mobilizing people?
  Do this: Time spent on social media is related to participation in political campaigns.

Multiple relationships
  Don’t do that: Political knowledge and time spent on social media are related to political participation.
  Do this: H1: Political knowledge is related to political participation. H2: Time spent on social media is related to political participation.

No relationship
  Don’t do that: Time spent on social media is NOT related to political participation.
  Do this: Time spent on social media is related to political participation.

Model design
  Don’t do that: H1: Time spent on social media leads to political participation. H2: Time spent on social media leads to increasing political knowledge.
  Do this: H1: Political knowledge is related to political participation. H2: Time spent on social media is related to political participation.
IMPORTANT CONCEPTS

Population: A complete set of observations (data) for the entire group of individuals under consideration.
A population can be finite or infinite.
Example: The students in this class, the population of China, etc.

Sample: A set of data drawn from the population, containing a part that can reasonably serve as a basis for valid generalization about the population.
A sample is a portion of a population selected for further analysis.
EXAMPLE

• The Environmental Protection Agency (EPA) uses a few new automobiles of each brand every year to collect data on pollution emission and gasoline mileage performance. For the Toyota Prius brand, identify the
  • Population
  • Sample
POPULATIONS AND SAMPLES

THE POPULATION is the set of all the individuals of interest in a particular study.
• The sample is selected from the population.
• The results from the sample are generalized to the population.
THE SAMPLE is a set of individuals selected from a population, usually intended to represent the population in a research study.
PARAMETER AND STATISTIC

• A parameter is a value, usually a numerical value, that describes a population.

• A statistic is a value, usually a numerical value, that describes a sample.
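
The distinction can be illustrated with a small simulation. This is only a sketch: the population of exam scores is invented for the example, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

# Hypothetical population: exam scores for all 5,000 students at a school.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [random.gauss(70, 10) for _ in range(5000)]

# Parameter: a value that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: a value that describes a sample drawn from that population.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.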
IMPORTANT CONCEPTS

Sample size: The number of items under investigation in a sample.

Census data: Data obtained from the entire population, giving total coverage of the population.
Sample data: Data obtained from a portion of the population, giving only partial coverage of the population.
SIMPLE RULE OF THUMB

• A good maximum sample size is usually around 10% of the population, as long
as this does not exceed 1000. For example, in a population of 5000, 10% would
be 500. In a population of 200,000, 10% would be 20,000. This exceeds 1000, so
in this case the maximum would be 1000.

• Even in a population of 200,000, sampling 1000 people will normally give a fairly
accurate result. Sampling more than 1000 people won’t add much to the
accuracy given the extra time and money it would cost.
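
The rule of thumb above can be written as a one-line calculation. The function name and its default arguments are illustrative choices, not part of the lecture.

```python
def max_sample_size(population_size: int, fraction: float = 0.10, cap: int = 1000) -> int:
    """Rule-of-thumb maximum sample size: 10% of the population, capped at 1000."""
    return min(round(population_size * fraction), cap)

print(max_sample_size(5000))     # 500
print(max_sample_size(200_000))  # 1000
```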
THE READYMADE TABLE
How large a sample of patients should be followed up if an investigator wishes to estimate the incidence rate of a disease to within 10% of its true value with 95% confidence?
SAMPLE SIZE ESTIMATION

There are different procedures for calculating sample size:

1. Estimation (confidence interval approach)
2. Hypothesis testing (test of significance approach)

A researcher needs to select the appropriate procedure for computing the sample size and then use the same approach when drawing statistical inferences.

Online power analysis and sample size calculator:
http://powerandsamplesize.com/Calculators/
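
As a rough illustration of the estimation (confidence interval) approach, the sketch below uses the standard formula for estimating a proportion, n = z² · p(1 − p) / e². This is a simpler case than the incidence-rate question above, and the function name and inputs are assumptions made for the example.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n that estimates a proportion p to within +/- margin."""
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# p = 0.5 is the most conservative guess when the true proportion is unknown.
print(sample_size_for_proportion(p=0.5, margin=0.05))  # 385
```

Halving the margin of error roughly quadruples the required sample, which is why very precise estimates are expensive.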
WHAT IS STATISTICS?

• A set of mathematical procedures for organizing, summarizing, and interpreting information (Gravetter, 2004).

• Any numerical summary measure based on data from a sample (Fortune, 1999).
DESCRIPTIVE STATISTICS

• The purpose of descriptive statistics is to organize and summarize observations so that they are easier to comprehend.

• Descriptive statistics describe the observed data (usually a convenience sample) without drawing conclusions or making generalizations.
              Categorical data                Quantitative data
Graphs        Bar chart                       Histogram
              Pie chart                       Distribution (skewness, kurtosis)
              Boxplot (ordinal)               Boxplot
Tables        Frequency table                 Frequency table
Centrality    Mode                            Mode
              Median (ordinal)                Median
              n/a                             Mean
Variability   Range (ordinal)                 Range
              n/a                             Standard deviation
Position      n/a                             z-score
              Percentile (ordinal)            Percentile
              n/a                             Outlier
Summary       Interquartile range (ordinal)   Interquartile range
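
Several of the quantitative measures in the table can be computed with Python's standard library. The data below are hypothetical, and the 1.5 × IQR outlier rule is a common convention assumed here rather than something the lecture specifies.

```python
import statistics

# Hypothetical quantitative data: hours of video games played per week.
hours = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]

# Centrality
mean = statistics.mean(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)

# Variability
data_range = max(hours) - min(hours)
sd = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)

# Quartiles and interquartile range (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

# Position: a z-score is the distance from the mean in standard deviations.
z_scores = [(x - mean) / sd for x in hours]

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
outliers = [x for x in hours if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(mean, median, mode, data_range, iqr, outliers)
```

Note how the single extreme value pulls the mean above the median, which is one reason to report both.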


INFERENTIAL STATISTICS
• The purpose of inferential statistics is to draw inferences about conditions that exist in the population (the complete set of observations) from the study of a sample (a subset) drawn from that population.

• The goal of inferential statistics is to give reasonable estimates of unknown population parameters.
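
One common form of such an estimate is a confidence interval for the population mean. The sketch below uses simulated data and the large-sample normal approximation; for small samples a t-based interval would be the usual choice, and that is not shown here.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible
sample = [random.gauss(50, 8) for _ in range(200)]  # hypothetical sample

n = len(sample)
sample_mean = mean(sample)
standard_error = stdev(sample) / math.sqrt(n)

# Critical value for 95% confidence (about 1.96).
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Estimated population mean: {sample_mean:.2f}")
print(f"95% confidence interval:   [{ci_low:.2f}, {ci_high:.2f}]")
```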
STATISTICAL ANALYSIS SOFTWARE
WHAT IS SPSS?

• “Statistical Package for the Social Sciences”
• It is software used for data analysis in social science research. It can be used for:
  • Reporting in tables and graphs
  • Analyzing: means, chi-square, regression, and much more
INTRODUCTION TO SPSS

• Data editor: a spreadsheet for preparing data for analysis and reporting
• Output
• Help
PREPARATION

• Assign each questionnaire an identification number (ID#). The ID number provides a critical link between the questionnaire and the data file.

• Assign a numeric value for each response.

CHOOSING STATISTICAL TESTS
INDEPENDENT-SAMPLES T TEST

H: Do males and females significantly differ on their time spent on video games?
IV: Gender (2 groups: males and females)
DV: Hours playing video games

H: Do older people exercise significantly less frequently than younger people?
IV: Age (2 groups: older people and younger people)
DV: Frequency of getting exercise
SPSS: THE INDEPENDENT-SAMPLES T TEST

• Analyze → Independent-Samples T Test → transfer the dependent variable into the Test Variable(s) box → transfer the independent variable into the Grouping Variable box → click Define Groups and enter the values of the two levels of the independent variable → Continue → OK
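
To show what this procedure calculates, the sketch below computes the pooled-variance t statistic (the "equal variances assumed" case) by hand. The data are hypothetical, and the function is an illustration rather than a reproduction of SPSS's implementation.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical hours of video games per week.
males = [10, 12, 9, 14, 11]
females = [7, 8, 6, 9, 10]
t, df = independent_t(males, females)
print(f"t({df}) = {t:.3f}")
```

Python's standard library has no t distribution, so the sketch stops at t and df. With 8 degrees of freedom, the two-tailed critical value at the .05 level from a standard t table is 2.306, so a |t| above that would be reported as significant.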
INTERPRETING OUTPUT TABLES

[Figure: SPSS independent-samples t test output, annotated with: sample size, mean APGAR score, t-value, degrees of freedom, p-value, CI, and the observed difference between the groups.]

Levene’s test checks the assumption of equal variances. If p < .05, the variances are not equal, and the adjusted test (the “Equal variances not assumed” row) should be used instead.

Here, we have met the assumption, so use the first row.
DRAWING CONCLUSIONS
• An independent-samples t test comparing the mean scores of sharing information on social media found a significant difference between two groups: people with lower and higher scores on the extraversion scale (t(62) = -2.034, p < .05). The mean of the group with higher extraversion scores was significantly higher (M = 1.191, SD = 1.347) than the mean of the group with lower scores (M = .491, SD = 1.379).
ONE-WAY ANALYSIS OF VARIANCE
EXAMPLES OF HYPOTHESES
• We wish to compare the mean hourly wage for nonunion farm laborers
from three different ethnic groups (African American, Anglo-American,
and Hispanic).
• We wish to test whether there is a statistically significant difference in mean blood pressures among the four BMI groups (underweight, normal weight, overweight, obese).
• Four age groups of 25 patients each (0-5 years, 6-11 years, 12-17
years, 18-31 years) were determined for evaluating whether the laser
treatment was more effective (improvement in color) for younger
patients.
SPSS: ONE-WAY ANOVA
• The one-way ANOVA compares the means of two or more groups of subjects that vary on a single independent variable.
• The one-way ANOVA requires a single dependent variable and a single independent variable. Groups should be independent of each other.
• ANOVA also assumes that the dependent variable is at the interval or ratio level and normally distributed.
• Open file “example 4.”
• Analyze → Compare Means → One-Way ANOVA → place the independent variable in the Factor box → choose the dependent variable → Options → click Descriptive → click Homogeneity of variance test → Continue → Post Hoc → Tukey → Continue → OK
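
The F ratio that this procedure reports can be computed by hand as the between-groups mean square divided by the within-groups mean square. The sketch below uses tiny hypothetical groups and does not include the post-hoc Tukey comparisons.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return F and its degrees of freedom (between groups, within groups)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups.
low = [1, 2, 3]
medium = [2, 3, 4]
high = [5, 6, 7]
f, df1, df2 = one_way_anova(low, medium, high)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(2, 6) = 13.00
```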
TEST OF EQUAL VARIANCE
ONE-WAY ANOVA OUTPUT

• The primary answer is F.

• The post-hoc (multiple comparisons) table presents every possible combination of levels of the independent variable.
DRAWING CONCLUSIONS

• Drawing conclusions for ANOVA requires that we indicate the value of F, the degrees of freedom, and the significance level. A significant ANOVA should be followed by the results of a post-hoc analysis and a verbal statement of the results.
DRAWING CONCLUSIONS

• Phrasing results that are significant
  • A one-way ANOVA was computed comparing Facebook users’ scores of self-expression intention among three groups: high, medium, and low extraversion personalities. A significant difference was found among the groups (F(2, 2140) = 63.15, p < .001). Tukey’s HSD was used to determine the nature of the differences between the groups. This analysis revealed that Facebook users who scored low in extraversion had lower scores in self-expression intention (M = -.258, SD = 1.444) than those who scored medium (M = .162, SD = 1.384, p < .001) and high (M = .426, SD = 1.520, p < .001).
• Phrasing results that are not significant
  • Facebook users’ scores of self-expression intention among three groups (high, medium, and low extraversion personalities) were compared using a one-way ANOVA. No significant difference was found (F( , ) = , p > .05). The intention of self-expression did not differ significantly among the three groups.
CORRELATION COEFFICIENTS
• Nominal by nominal:
Phi (Φ) / Cramer’s V, Chi-square

• Ordinal by ordinal:
Spearman’s rank

• Interval/ratio by interval/ratio:
Product-moment or Pearson’s r
PEARSON r CORRELATION

• In SPSS:
Analyze → Correlate → Bivariate → click “Pearson”
Variable                                     1        2        3        4        5        6        7        8        9
1. Shareworthiness (1 = 12 or more reposts)  1
2. Verification (1 = verified)               .384**   1
3. Visualization (1 = with pictures)         .372**   .079*    1
4. Exclusiveness (1 = media)                 .354**   .270**   .174**   1
5. Non-domestic (1 = foreign)                .172**   .079*    .059     .148**   1
6. Human interests (1 = consequences)        .168**   .014     .049     .128**   .192**   1
7. Exclusiveness (1 = official)              -.095*   .404**   -.153**  -.362**  -.096**  -.142**  1
8. Exclusiveness (1 = non-profit groups)     -.005    -.016    -.036    -.045    -.031    -.039    -.069*   1
9. Conflicts (1 = conflict)                  .109*    -.075*   -.002    .091**   .064*    -.050    -.146**  -.020    1

* p < .05   ** p < .01   *** p < .001
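
Each cell in a correlation matrix like this one is a Pearson r between two variables. The sketch below shows the calculation on a small hypothetical data set; it is not the data behind the table.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical interval-level data.
hours_online = [1, 2, 3, 4, 5]
posts_shared = [2, 4, 5, 4, 5]
r = pearson_r(hours_online, posts_shared)
print(f"r = {r:.3f}")  # r = 0.775
```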
MEASURING CORRELATION: PEARSON R

Size of effect / magnitude    r      r² (% of variance)
small                         0.1    0.01 (1%)
medium                        0.3    0.09 (9%)
large                         0.5    0.25 (25%)
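
The benchmarks can be applied to any reported r. In the sketch below, treating 0.1, 0.3, and 0.5 as lower bounds and labeling anything smaller "negligible" are assumptions made for the example; the table itself gives only the three reference points.

```python
def describe_effect(r: float) -> str:
    """Label a Pearson r using the small / medium / large benchmarks (0.1, 0.3, 0.5)."""
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.5:
        label = "large"
    elif size >= 0.3:
        label = "medium"
    elif size >= 0.1:
        label = "small"
    else:
        label = "negligible"  # assumed label for values below the "small" benchmark
    return f"r = {r:.3f}, r^2 = {r ** 2:.3f} ({r ** 2:.1%} of variance), {label}"

print(describe_effect(0.384))
# r = 0.384, r^2 = 0.147 (14.7% of variance), medium
```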


SIMPLE LINEAR REGRESSION MODEL
PURPOSES OF USING REGRESSION MODEL

• Predicting future values of a variable, such as when financial officers predict future cash flows or a college committee predicts a graduate student’s academic potential.
• Explaining past variation, such as finding the variables that explain addiction to video games.
• The basic idea of regression analysis is to use data on a quantitative independent variable to predict or explain variation in a quantitative dependent variable.
Types of Relationships
[Figure: scatterplots of Y against X illustrating linear relationships, curvilinear relationships, and no relationship.]
SPSS: SIMPLE LINEAR REGRESSION MODEL
Ha: The higher a person’s intention of self-expression, the more likely that person is to share information on Facebook.

In SPSS:
Analyze → Regression → Linear → Statistics → click “Confidence intervals” → click “Descriptives” → Continue → select “Save” → under “Prediction Intervals” click “Mean” → set the confidence level → Continue → OK
SPSS OUTPUT
DRAWING CONCLUSIONS

A simple linear regression was calculated predicting a person’s motivation to share information on FB based on the intention of self-expression. A significant regression equation was found (F(1, 3316) = 1633.884, p < .001), with an R² of .33. Subjects’ average score of sharing information increased by 0.626 for each one-point increase in the intention of self-expression.

The conclusion should state the direction (increase or decrease), explained variance (R²), slope (β1), model fit (F), and significance level (p value) of the regression. A statement of the equation is also included:
Ŷ = 4.824 + 0.626x
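
The intercept, slope, and R² in such an equation come from a least-squares fit. The sketch below fits a line to a small hypothetical data set (not the lecture's data) and then uses the equation reported above to make a prediction; `fit_line` and `predict_reported` are illustrative names.

```python
def fit_line(x, y):
    """Least-squares intercept (b0), slope (b1), and R^2 for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    b1 = ss_xy / ss_xx            # slope
    b0 = mean_y - b1 * mean_x     # intercept
    r_squared = ss_xy ** 2 / (ss_xx * ss_yy)  # share of variance explained
    return b0, b1, r_squared

# Hypothetical scores, not the lecture's data set.
self_expression = [1, 2, 3, 4, 5]
sharing = [2, 4, 5, 4, 5]
b0, b1, r2 = fit_line(self_expression, sharing)
print(f"Y-hat = {b0:.3f} + {b1:.3f}x, R^2 = {r2:.2f}")

def predict_reported(x):
    """Prediction from the equation reported on the slide: Y-hat = 4.824 + 0.626x."""
    return 4.824 + 0.626 * x

print(f"{predict_reported(1):.3f}")  # 5.450
```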
DRAWING CONCLUSIONS

A simple linear regression was calculated predicting y based on x. The regression equation was not significant (F = , p > .05), with an R² of . x cannot be used to predict y.
RESOURCES
• HyperStat Online (simpler, goes through ANOVA and chi-square; available free at http://davidmlane.com/hyperstat/)
• Penn State University online course STAT 414/415 (available free at https://onlinecourses.science.psu.edu/stat414/)
• SPSS beginners tutorials (available free at https://www.spss-tutorials.com/basics/)
• Report the results:
Angeli, E., Wagner, J., Lawrick, E., Moore, K., Anderson, M., Soderlund, L., & Brizee, A. (2010, May 5). General format. Retrieved from APA formatting and style guide
