CHAPTER 3
DESCRIPTIVE STATISTICS (UNGROUPED DATA)
Introduction
Measures in this chapter include:
i. Measures of central tendency
ii. Measures of dispersion
iii. Measures of position
iv. Measures of skewness
A measure of central tendency is usually called the average. It is a single value situated at the center of the data and can be taken as a summary value for that data set. The three common measures are the mean, the mode and the median.
The mean is the sum of the values divided by the total number of values. For a population,
$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
Example 1
The following data give the hours spent studying per week by nine students. Find the mean and interpret it.
12 16 7 22 11 18 7 25 16
Answer: $\bar{x}$ = 14.89 hours
Interpretation: On average, 14.89 hours per week were spent studying.
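The arithmetic behind Example 1 can be checked with a minimal sketch. The variable names `hours` and `mean_hours` are illustrative and do not come from the slides.

```python
# Hours spent studying per week by the nine students in Example 1.
hours = [12, 16, 7, 22, 11, 18, 7, 25, 16]

# Mean = sum of the values divided by the number of values.
mean_hours = sum(hours) / len(hours)

print(sum(hours), len(hours))   # 134 9
print(round(mean_hours, 2))     # 14.89
```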
QMT554 DATA ANALYSIS
Example 2
The data represent the annual chocolate sales (in billions of dollars) for a sample of seven countries in the world. Find the mean and interpret it.
2.0 4.9 6.5 2.1 5.1 3.2 16.6
Pros
Summarizes data in a way that is easy to understand.
Uses all the data.
Used in many statistical applications.
Cons
Affected by extreme values.
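To illustrate the "affected by extreme values" point, the sketch below uses the chocolate sales data from Example 2 and compares the mean with and without the largest value (16.6). Dropping a value is done here only to show the effect; it is an assumption of this illustration, not a step the slides recommend.

```python
from statistics import mean

# Annual chocolate sales (billions of dollars) from Example 2.
sales = [2.0, 4.9, 6.5, 2.1, 5.1, 3.2, 16.6]

mean_all = mean(sales)

# Illustration only: remove the single largest value and recompute.
without_largest = sorted(sales)[:-1]
mean_without_largest = mean(without_largest)

print(round(mean_all, 2))              # 5.77
print(round(mean_without_largest, 2))  # 3.97
```

One extreme observation moves the mean from about 3.97 to about 5.77, which is why the median is often preferred for data with outliers.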
Mean Example #1
Sum = 18 + 19 + 20 + 20 + 20 + 20 + 20 + 22 + 24 + 28 = 211
Mean = 211 / 10 = 21.1
21.1 is reasonable as a typical or middle number of new customers recruited, so the mean is a reasonable measure of central tendency here.
Mean Example #2
Mean = 303 / 10 = 30.3
This does NOT seem to be a typical or middle number: no one in the department is close to 30.3. The largest value in the data, 120, pulls the mean upward.
The median is the midpoint of the data array. It lies in the middle of the data.
The symbol for the median is MD.
It is NOT influenced by extremely high or low numbers in a set of data.
Example: cost of houses, income, age, etc.
Steps in computing the median of raw data:
Step 1 Arrange the data in order.
Step 2 Select the middle point. If the middle point falls halfway between two values, find the median by adding the two values and dividing by two.
Example 3
Find and interpret the median for ages of seven preschool children. The ages are 1 3 4 2 3 5 1
Answer:
Ordered data: 1 1 2 3 3 4 5
Median = 3 (the 4th of the 7 ordered values)
Interpretation: Half of the preschool children were aged at most 3 years, and half were aged at least 3 years, during the study period.
Example 4
The number of cloudy days for the top 6 cloudiest cities is shown. Find the median. 209 223 211 227 213 240
Ordered data: 209 211 213 223 227 240
Median = (213 + 223) / 2 = 218
Interpretation: Half (50%) of the cities had at most 218 cloudy days, and the other half had at least 218 cloudy days, during the study period.
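The two-step procedure for the median (order the data, then pick the middle point) can be sketched as follows. The function name `median_raw` is hypothetical; the data are the ages from Example 3 and the cloudy-day counts from Example 4.

```python
def median_raw(values):
    """Median of raw (ungrouped) data using the two steps from the slides."""
    ordered = sorted(values)              # Step 1: arrange the data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                        # odd count: one middle value
        return ordered[mid]
    # even count: add the two middle values and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [1, 3, 4, 2, 3, 5, 1]                   # Example 3
cloudy_days = [209, 223, 211, 227, 213, 240]   # Example 4

print(median_raw(ages))         # 3
print(median_raw(cloudy_days))  # 218.0
```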
MODE
The mode is the value that occurs most often in a data set.
Data set with only one mode: unimodal
Data set with two modes: bimodal
Data set with more than two modes: multimodal
Example 5
The following data represent the duration (in days) of U.S. Space Shuttle voyages for the years 1992-1994. Find the mode and interpret
8 9 9 14 8 8 10 7 6 9 7 8 10 14 11 8 14 11
Answer: The mode for the data set is 8.
Interpretation: The most common duration of the U.S. Space Shuttle voyages was 8 days.
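A minimal sketch of finding the mode by counting frequencies, applied to the voyage durations in Example 5. The helper names `modes` and `modality` are illustrative. Note one assumption: when every value occurs exactly once, this sketch reports every value as a mode, whereas many textbooks would say such data have no mode.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def modality(values):
    """Label the data set by its number of modes."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


# Duration (in days) of U.S. Space Shuttle voyages, Example 5.
days = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]

print(modes(days))     # [8]
print(modality(days))  # unimodal
```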
Example 6
As discussed in the previous topic, a histogram or a frequency distribution curve can assume either a skewed or a symmetrical shape.
Knowing the values of the mean, median and mode can give us some idea about the shape of the frequency curve.
Mean, median, and mode for a symmetric histogram and frequency distribution curve
Mean, median, and mode for a histogram and frequency distribution curve skewed to the right
Mean, median, and mode for a histogram and frequency distribution curve skewed to the left
Skewness measures the lack of symmetry in a data distribution. To determine the skewness of data (symmetric, left skewed, right skewed), use the relationship between the mean, median and mode:
Mean > median : positively skewed
Mean = median : symmetrical
Mean < median : negatively skewed
Mean > mode : positively skewed
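The mean-versus-median rule can be sketched as a small helper. The function name `skew_direction` and the tolerance `tol` (used to treat nearly equal floating-point values as equal) are assumptions of this sketch. The data are the chocolate sales from Example 2.

```python
from statistics import mean, median


def skew_direction(values, tol=1e-9):
    """Classify the shape by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        return "symmetrical"
    return "positively skewed" if m > md else "negatively skewed"


sales = [2.0, 4.9, 6.5, 2.1, 5.1, 3.2, 16.6]   # Example 2

print(round(mean(sales), 2), median(sales))  # 5.77 4.9
print(skew_direction(sales))                 # positively skewed
```

This rule only gives the direction of skewness; it says nothing about how strong the skewness is.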
MEASURES OF POSITION
QUARTILES (1st Quartile & 3rd Quartile)
BOX AND WHISKER PLOT
QUARTILE
Quartiles describe positional values of a data set. They divide the distribution into four groups, separated by Q1, Q2 and Q3.
First Quartile (Q1) - positional value where 25% of the observations are smaller and 75% are larger than this value.
Second Quartile (Q2) - positional value where 50% of the observations are greater than or equal to this value and the other half are smaller than or equal to this value. Second Quartile = median.
Third Quartile (Q3) - positional value where 75% of the observations are smaller and 25% are larger than this value.
Step 1 Arrange the data in order from lowest to highest.
Step 2 Find the median of the data values. That is the value for Q2.
Step 3 Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4 Find the median of the data values that fall above Q2. This is the value for Q3.
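The four steps above can be sketched as follows. The function name `quartiles` is hypothetical, and the sketch assumes that when the number of observations is odd, the median itself is left out of both halves (this matches the worked stockbroker example in this chapter). Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ordered = sorted(values)                 # Step 1
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                   # values below Q2
    # For odd n, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data_8a = [5, 6, 12, 13, 15, 18, 22, 50]                 # Example 8A
clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]   # Example 9

print(quartiles(data_8a))  # (9.0, 14.0, 20.0)
print(quartiles(clients))  # (29, 33, 42)
```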
Location formula
Q1 = value at position (n + 1)/4 of the ordered observations
Q2 = value at position (n + 1)/2 of the ordered observations
Q3 = value at position 3(n + 1)/4 of the ordered observations
Example 8A
Solutions
Ordered data: 5 6 12 13 15 18 22 50
Q2 = median = (13 + 15) / 2 = 14
First Quartile, Q1 (lower half: 5 6 12 13)
Q1 position = (8 + 1)/4 = 2.25, between the 2nd and 3rd ordered values
Q1 = (6 + 12) / 2 = 9
Third Quartile, Q3 (upper half: 15 18 22 50)
Q3 = (18 + 22) / 2 = 20
Example 8B
Interpretation??
Purpose: to find out what information can be discovered about the data, such as the center and spread.
A box and whisker plot involves five specific values:
1) The lowest value (minimum)
2) Q1
3) The median
4) Q3
5) The highest value (maximum)
If the median is near the center of the box, the distribution is approximately symmetric.
If the median falls to the left of the center of the box, the distribution is positively skewed.
If the median falls to the right of the center of the box, the distribution is negatively skewed.
Example 9
A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown. Construct a box and whisker plot for the data. 33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31
Solutions
Step 1 : Arrange the data in order
23, 27, 29, 30, 31, 33, 38, 40, 42, 43, 51
Step 2 : Find the median
Median = 33 (the 6th of the 11 ordered values)
Step 3 : Find Q1
Q1 = 29 (the median of 23, 27, 29, 30, 31)
Step 4 : Find Q3
Q3 = 42 (the median of 38, 40, 42, 43, 51)
Step 5 : Draw a scale for the data on the x axis
Step 6 : Locate the lowest value, Q1, the median, Q3 and the highest value on the scale
Step 7 : Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the upper and lower values to the box
SPSS OUTPUT
Example 10
A dietitian is interested in comparing the sodium content of real cheese and cheese substitute. The data for two random samples are shown. Compare the distribution using box plots.
Real cheese:        310 420 45 40 220 240 180 90
Cheese substitute:  270 180 250 290 130 260 340 310
The interquartile range (IQR) is defined as the difference between Q3 and Q1, and is the range of the middle 50% of the data:
$IQR = Q_3 - Q_1$
The interquartile range is used to identify outliers, and it is also used as a measure of variability. A data value smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$ is an outlier.
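The outlier rule can be sketched as below. The function name `iqr_outliers` is hypothetical, and Q1 and Q3 are passed in rather than computed so that the sketch does not commit to one quartile convention. The data and quartiles (Q1 = 9, Q3 = 20) are those of Example 8A.

```python
def iqr_outliers(values, q1, q3):
    """Return (IQR, lower fence, upper fence, outliers) for given quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # A value is an outlier if it lies outside the two fences.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return iqr, lower_fence, upper_fence, outliers


data_8a = [5, 6, 12, 13, 15, 18, 22, 50]   # Example 8A
result = iqr_outliers(data_8a, q1=9, q3=20)

print(result)  # (11, -7.5, 36.5, [50])
```

Here the value 50 lies above the upper fence of 36.5, so it is flagged as an outlier.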
Example 11
Refer to Example 9. Find the IQR and identify whether there are any outliers.
MEASURES OF SKEWNESS
Pearson Coefficient of Skewness
Interpretation of measures
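If Sk is positive (+ve): the distribution is positively skewed.
If Sk is negative (-ve): the distribution is negatively skewed.
If Sk = 0: the distribution is symmetrical.
If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999), the data are approximately symmetric.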
Example 12A
The durations of stay of cancer patients warded in Hospital Sultanah Bahiyah were recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days. Find the skewness coefficient. What is the type of distribution?
Solution:
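$Sk = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$
OR
$Sk = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$
Both coefficients are positive, so the distribution is positively skewed.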
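The two Pearson coefficients used in this solution can be reproduced with a short sketch. The function names `sk_mode` and `sk_median` are illustrative labels for the mode-based and median-based forms.

```python
def sk_mode(mean, mode, s):
    """Pearson coefficient of skewness based on the mode: (mean - mode) / s."""
    return (mean - mode) / s


def sk_median(mean, median, s):
    """Pearson coefficient of skewness based on the median: 3(mean - median) / s."""
    return 3 * (mean - median) / s


# Example 12A: mean 28 days, median 25 days, mode 23 days, s = 4.2 days.
print(round(sk_mode(28, 23, 4.2), 4))    # 1.1905
print(round(sk_median(28, 25, 4.2), 4))  # 2.1429
```

The two forms generally give different values, so it helps to state which one is being reported.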
Example 12B
MEASURES OF DISPERSION
RANGE
STANDARD DEVIATION
VARIANCE
RANGE
The range is the highest value minus the lowest value.
Disadvantages: it is influenced by outliers, and it is based on two values only. All other values in the data set are ignored.
Refer to Example 4:
Highest value = 240
Lowest value = 209
Range = 240 - 209 = 31
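Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$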
Example 13
Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars. 11.2 11.9 12.0 12.8 13.4 14.3
Solution
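$\sum x = 11.2 + 11.9 + \dots + 14.3 = 75.6$
$\sum x^2 = 11.2^2 + 11.9^2 + \dots + 14.3^2 = 958.94$
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{958.94 - \frac{75.6^2}{6}}{5} = 1.276$
$s = \sqrt{s^2} = \sqrt{1.276} = 1.13$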
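The calculation in Example 13 can be checked with a short sketch that uses the computational formula and then compares it with Python's `statistics.variance`, which also divides by n - 1. Variable names are illustrative.

```python
from math import sqrt
from statistics import variance

# European auto sales (millions of dollars) for a sample of 6 years.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]

n = len(sales)
sum_x = sum(sales)                   # sum of x
sum_x2 = sum(x * x for x in sales)   # sum of x squared

# Computational formula for the sample variance.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = sqrt(s2)

print(round(sum_x, 1), round(sum_x2, 2))  # 75.6 958.94
print(round(s2, 3), round(s, 2))          # 1.276 1.13

# Cross-check against the standard library's sample variance.
assert abs(s2 - variance(sales)) < 1e-9
```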
Example 14
Refer to Example 9. From the SPSS output, interpret the value of range, variance and standard deviation.
Example 15
The following data give the number of patients admitted to a hospital on seven days during the month of January 2003.
19 14 25 21 13 16
CALCULATOR MANUAL
1) CLEAR DATA
2) MODE (SD): PRESS 1
3) KEY IN DATA (e.g. if the data are 56, 57 and 58): press 56 and M+, followed by 57 and M+, then 58 and M+
4) Press SHIFT 1 for Σx², Σx, n
5) Press SHIFT 2 for x̄ (sample mean), xσn (population std. deviation), xσn-1 (sample std. deviation)
END OF CHAPTER 3
THANK YOU