STATISTICS
KEY STATISTICAL CONCEPTS
Population
• the group of all items of interest to a statistics practitioner.
• frequently very large; sometimes infinite.
Sample
• a set of data drawn from the population.
• Potentially large, but less than the population.
KEY STATISTICAL CONCEPTS
Statistic
Parameter
110.54 110.23
• Organize Data
– Tables
– Graphs
• Summarize Data
– Central Tendency
– Variation
DESCRIPTIVE STATISTICS
• Organize Data
– Tables
• Frequency Distribution
• Relative Frequency Distribution
– Graphs
• Bar Chart
• Histogram
• Stem and Leaf Plot
• Frequency Polygon
• Pie Chart
• Scatter Plot
SPSS OUTPUT FOR
FREQUENCY DISTRIBUTION
GROUPED RELATIVE FREQUENCY
DISTRIBUTION
IQ Score        Frequency   Percent   Cumulative Percent
80 – 89             3         11.5          11.5
90 – 99             5         19.2          30.7
100 – 109           7         26.9          57.6
110 – 119           4         15.4          73.0
120 – 129           3         11.5          84.5
130 – 139           2          7.7          92.2
140 – 149           1          3.8          96.0
150 and over        1          3.8         100.0
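As a rough sketch of how such a grouped table can be built by hand, the Python below tallies the 26 IQ scores of Class A and Class B listed in this document into 10-point classes. The helper name `class_label` is hypothetical, and this is not SPSS's own procedure. Cumulative percentages are computed here from cumulative counts, so they can differ slightly from a column built by adding rounded percentages.

```python
from collections import Counter

# The 26 IQ scores of Class A and Class B listed in this document.
iq_scores = [
    102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110,  # Class A
    127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109,  # Class B
]


def class_label(score):
    """Return the 10-point class a score falls in (hypothetical helper)."""
    if score >= 150:
        return "150 and over"
    lower = (score // 10) * 10
    return f"{lower} - {lower + 9}"


counts = Counter(class_label(s) for s in iq_scores)
labels = [f"{lo} - {lo + 9}" for lo in range(80, 150, 10)] + ["150 and over"]

total = len(iq_scores)
rows = []          # (class, frequency, percent, cumulative percent)
cumulative = 0
for label in labels:
    freq = counts[label]
    cumulative += freq
    rows.append((label, freq, 100 * freq / total, 100 * cumulative / total))

for label, freq, pct, cum_pct in rows:
    print(f"{label:<14}{freq:>4}{pct:>8.1f}{cum_pct:>8.1f}")
```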
Stem Leaf
8 079
9 33678
10 2356999
11 0159
12 078
13 11
14 0
15
16 2
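The stem-and-leaf display above can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes the stem is the tens part of each score and the leaf is the units digit, and it uses the 26 IQ scores of Class A and Class B listed in this document.

```python
from collections import defaultdict

# The 26 IQ scores of Class A and Class B listed in this document.
iq_scores = [
    102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110,  # Class A
    127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109,  # Class B
]

# Stem = score without its last digit, leaf = last digit.
stems = defaultdict(list)
for score in sorted(iq_scores):
    stems[score // 10].append(score % 10)

# Include empty stems (such as 15) so gaps in the data stay visible.
plot = {}
for stem in range(min(stems), max(stems) + 1):
    plot[stem] = "".join(str(leaf) for leaf in stems.get(stem, []))

for stem, leaves in plot.items():
    print(f"{stem:>2} {leaves}")
```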
SPSS OUTPUT OF A
FREQUENCY POLYGON
PIE CHART
SCATTER PLOT
DESCRIPTIVE STATISTICS
Summarizing Data:
• The mean, median and mode are all valid measures of central
tendency, but under different conditions, some measures of central
tendency become more appropriate to use than others (Laerd, 2018).
MEAN
Most commonly called the “average.”
Add up the values for each case and divide by the total
number of cases.
Y-bar = (Σ Yi) / n
MEAN
What’s up with all those symbols, man?
[Figure: a balance scale with 1 lb weights at 93 cm, 106 cm, 110 cm and 131 cm, balanced at the mean of 110 cm. The weights sit 17 and 4 units below the mean, 0 units from it, and 21 units above it.]
The scale is balanced because…
17 + 4 on the left = 21 on the right
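The balancing property can be checked numerically. The short sketch below uses the four positions from the scale example; the variable names are illustrative only.

```python
# Positions of the four 1 lb weights from the balance example.
positions_cm = [93, 106, 110, 131]

mean_cm = sum(positions_cm) / len(positions_cm)

# Signed distance of each weight from the mean.
deviations = [p - mean_cm for p in positions_cm]

# Total distance below the mean versus total distance above it.
below = sum(-d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print(mean_cm)          # the balance point
print(deviations)       # signed distances from the mean
print(below, above)     # the two sides carry equal totals
print(sum(deviations))  # deviations from the mean sum to zero
```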
MEAN
Income in Malaysia.
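[Figure: income distribution with "All of Us" grouped together, Syed Al-Bukhary marked as the outlier, and the position of the mean indicated.]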
MEDIAN
When data are listed in order, the median is the point at which
50% of the cases are above and 50% below it.
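[Figure: two distributions. Symmetric: the mean and median coincide. Skewed: the median and the mean separate, in that order along the axis.]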
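A minimal sketch of the median rule follows. The income figures are hypothetical, invented only to show how a single outlier moves the mean far more than the median.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    return sum(values) / len(values)


# Hypothetical monthly incomes, invented for illustration only.
incomes = [2000, 2500, 3000, 3500, 4000]
incomes_with_outlier = incomes + [1_000_000]

print(mean(incomes), median(incomes))
print(mean(incomes_with_outlier), median(incomes_with_outlier))
```

With the outlier added, the median shifts only to the midpoint of the two middle incomes, while the mean is pulled far above every ordinary value.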
MEDIAN
[Figure: two distributions. Symmetric: the mean, median and mode coincide. Skewed: the mode, median and mean separate, in that order along the axis.]
Choosing a Measure of Central Tendency
– If you want to know which score occurred most often, then the mode
is the choice.
– The median is a better choice than the mode to serve as the representative score
because it takes into account all the data in the distribution. However,
it treats all scores alike; differences in magnitude are not taken into
account.
– When the mean is calculated, the value of each number is taken into
account.
• When the scores in your distribution tend to cluster in one of the
tails (i.e., a cluster of high or low scores) the distribution is skewed
(i.e., a nonsymmetrical distribution). In these instances, the median
may be more appropriate.
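To compare the three measures on real numbers, the sketch below applies Python's standard `statistics` module to the Class B IQ scores listed in this document. Class B contains one unusually high score (162), which pulls the mean above the median.

```python
import statistics

# Class B IQ scores as listed in this document.
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_b = statistics.mean(class_b)      # uses the value of every score
median_b = statistics.median(class_b)  # middle score of the ordered data
mode_b = statistics.mode(class_b)      # most frequent score

print(round(mean_b, 2), median_b, mode_b)
```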
SUMMARY OF WHEN TO USE
THE MEAN, MEDIAN AND MODE
Type of Variable              Type of Data   Best measure of central tendency
Nominal                       Discrete       Mode
Ordinal                       Discrete       Median
Interval/Ratio (not skewed)   Continuous     Mean
Interval/Ratio (skewed)       Continuous     Median
DESCRIPTIVE STATISTICS
Summarizing Data:
To get the range for a variable, you subtract its lowest value
from its highest value.
Class A--IQs of 13 Students    Class B--IQs of 13 Students
102    115                     127    162
128    109                     131    103
131     89                      96    111
 98    106                      80    109
140    119                      93     87
 93     97                     120    105
110                            109
Class A Range = 140 - 89 = 51 Class B Range = 162 - 80 = 82
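The same subtraction can be written as a one-line function. The sketch below uses the Class A and Class B scores from the table above; the function name `value_range` is illustrative.

```python
# IQ scores from the Class A and Class B table.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]


def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


print(value_range(class_a))
print(value_range(class_b))
```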
INTERQUARTILE RANGE
A quartile is the value that marks one of the divisions that breaks a series of values into
four equal parts.
The 25th percentile is the quartile that divides the first ¼ of cases from the latter ¾.
The 75th percentile is the quartile that divides the first ¾ of cases from the latter ¼.
The interquartile range is the distance or range between the 25th percentile and the 75th
percentile. Below, what is the interquartile range?
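As one way to compute quartiles, the sketch below uses `statistics.quantiles` from the Python standard library on the Class A IQ scores listed in this document. The `method="inclusive"` setting is an assumption made for this example; several quartile conventions exist, and SPSS or a textbook may report slightly different cut points for the same data.

```python
import statistics

# Class A IQ scores as listed in this document.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(class_a, n=4, method="inclusive")

iqr = q3 - q1  # distance between the 25th and 75th percentiles

print(q1, q2, q3)
print(iqr)
```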
The larger the variance, the further the individual cases are from the mean.
The smaller the variance, the closer the individual scores are to the mean.
VARIANCE
Variance is a number that at first seems complex to calculate.
The first step is the deviation of a case from the mean: Yi – Y-bar
If the average person’s car costs $20,000 and mine costs $6,000,
my deviation from the mean is -$14,000!
6K - 20K = -14K
VARIANCE
The deviation of 102 from 110.54 is? Deviation of 115?
√235.45 = 15.34
Review:
1. Deviation
2. Deviation squared
3. Sum of squares
4. Variance
5. Standard deviation
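The five review steps can be traced in code. The sketch below uses the Class A IQ scores listed in this document and assumes the sample formula with divisor n - 1. The slides do not state which divisor they use, so the printed results need not match any figure quoted on them.

```python
import math

# Class A IQ scores as listed in this document.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

n = len(class_a)
mean_a = sum(class_a) / n

# 1. Deviation: each score minus the mean.
deviations = [y - mean_a for y in class_a]

# 2. Deviation squared.
squared_deviations = [d ** 2 for d in deviations]

# 3. Sum of squares.
sum_of_squares = sum(squared_deviations)

# 4. Variance (assumption: sample variance, divisor n - 1).
variance = sum_of_squares / (n - 1)

# 5. Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean_a, 2))
print(round(deviations[0], 2), round(deviations[1], 2))  # deviations of 102 and 115
print(round(sum_of_squares, 2), round(variance, 2), round(std_dev, 2))
```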
VARIANCE VS STANDARD DEVIATION
[Figure: two distributions, each with Y-bar = 25. Left: s.d. = 3, axis marked at 19, 25 and 31. Right: s.d. = 6, axis marked at 13, 25 and 37.]
2. s.d. = 0 only when all values are the same (only when you have a constant and
not a “variable”)
3. If you were to “rescale” a variable, the s.d. would change by the same magnitude:
if we changed units above so the mean equaled 250, the s.d. on the left would
be 30, and on the right, 60
4. Like the mean, the s.d. will be inflated by an outlier case value.
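These properties can be checked with a short sketch. It uses the Class A IQ scores listed in this document; the constant data set and the outlier value of 300 are hypothetical, chosen only for illustration, and `statistics.pstdev` (divisor n) is used as an assumption since the properties hold for either divisor.

```python
import statistics

# Class A IQ scores as listed in this document.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

sd = statistics.pstdev(class_a)

# A constant has no spread, so its s.d. is 0.
sd_constant = statistics.pstdev([25, 25, 25])

# Rescaling every value by 10 multiplies the s.d. by 10.
sd_rescaled = statistics.pstdev([10 * y for y in class_a])

# A hypothetical outlier (300) inflates the s.d.
sd_with_outlier = statistics.pstdev(class_a + [300])

print(sd_constant)
print(round(sd, 2), round(sd_rescaled, 2))
print(round(sd_with_outlier, 2))
```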
STANDARD DEVIATION
• Note about computational formulas:
– Your book provides a useful short-cut formula for computing the variance
and standard deviation.
– This is intended to make hand calculations as quick as possible.
– However, computational formulas obscure the conceptual understanding of our statistics.
– SPSS and the computer are “computational formulas” now.
discrete
continuous
SYMBOLS IN STATISTICS
DESCRIPTIVE STATISTICS
Summarizing Data:
[Figure: plot with labeled values 162, 123.5, M = 110.5, 106.5, 96.5 and 82.]
SPSS OUTPUT OF CLASS A & B
SHAPE OF DISTRIBUTIONS
• Shape of distribution is measured by
– Skewness & Kurtosis
• When the scores in your distribution tend to cluster in one of the tails
(i.e., a cluster of high scores or a cluster of low scores) the distribution
is skewed.
– Positively Skewed Distributions – occur when there is a cluster of lower
scores; the thinner, more spread-out tail is on the right (i.e., fewer
high scores).
– Negatively Skewed Distributions – occur when there is a cluster of higher
scores; the thinner, more spread-out tail is on the left (i.e., fewer
low scores).
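The direction of skew can be checked with a simple moment-based coefficient. The sketch below is an assumption-laden illustration: it uses the population form (third central moment divided by the 1.5 power of the second), which is not necessarily the adjusted formula SPSS reports, and it uses the Class B IQ scores listed in this document.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population form, no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Class B IQ scores as listed in this document (one high score, 162).
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

# Mirroring the data flips the tail to the other side.
mirrored = [-v for v in class_b]

print(skewness(class_b))          # positive: longer tail on the right
print(skewness(mirrored))         # negative: longer tail on the left
print(skewness([1, 2, 3, 4, 5]))  # symmetric data: zero
```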
• Statisticians use several specific terms
to describe the different shapes these
distributions can assume.
– Unimodal Distributions have one
prominent category or high point.