Sets of Data

Business Statistics

[Figure: chart titled "Our market share far exceeds all competitors!"; vertical axis ticks at 30%, 32%, 34%, 36%]
[Figure labels on the horizontal axis: Us, Y, X]

Data Presentation
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram

Presenting Qualitative Data

Summary Table
1. Lists categories & number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), % or both

Each row is a category; the count is the tally for that row.

Major         Count
Accounting      130
Economics        20
Management       50
Total           200

Bar Graph
[Figure: bar graph of Frequency (axis 0 to 150) by Major: Acct., Econ., Mgmt.]
Vertical bars for qualitative variables
Bar height shows frequency or %
Vertical axis starts at the zero point
Percent may be used instead of frequency
Equal bar widths

Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)

[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10%]
Example for Econ.: (360°)(10%) = 36°

Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending order from left to right.
[Figure: Pareto diagram of Frequency (axis 0 to 150) by Major, in the order Acct., Mgmt., Econ.]
Vertical bars for qualitative variables
Bar height shows frequency or %
Vertical axis starts at the zero point
Percent may be used instead of frequency
Equal bar widths
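The summary table, pie-chart angles, and Pareto ordering above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the output layout are assumptions, not part of the slides; the counts are those of the Summary Table.

```python
# Counts taken from the Summary Table slide.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}


def summarize(counts):
    """Return (category, count, percent, pie_angle_degrees) rows in Pareto order."""
    total = sum(counts.values())
    rows = []
    for category, count in counts.items():
        share = count / total
        # Pie-chart angle = (360 degrees)(percent)
        rows.append((category, count, 100 * share, 360 * share))
    # Pareto order: categories sorted by count, tallest bar first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


rows = summarize(counts)
for category, count, percent, angle in rows:
    print(f"{category:<12}{count:>5}{percent:>8.1f}%{angle:>8.1f} deg")
```

Sorting by count is what turns the bar-graph order (Acct., Econ., Mgmt.) into the Pareto order (Acct., Mgmt., Econ.).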
Thinking Challenge
You're an analyst for IRI. You want to show the market shares held by Web browsers in 2006. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser              Mkt. Share (%)
Firefox                    14
Internet Explorer          81
Safari                      4
Others                      1

Bar Graph Solution
[Figure: bar graph of Market Share (%) (axis 0% to 100%) by Browser, in the order Firefox, Internet Explorer, Safari, Others]

Pie Chart Solution
[Figure: pie chart of Market Share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]

Pareto Diagram Solution
[Figure: Pareto diagram of Market Share (%) (axis 0% to 100%) by Browser, in the order Internet Explorer, Firefox, Safari, Others]

Presenting Quantitative Data

Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
   Stem value defines class
   Leaf value defines frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

For example, the observation 26 has stem 2 and leaf 6.

Stem | Leaves
  2  | 144677
  3  | 028
  4  | 1

Frequency Distribution Table Steps
1. Determine range
2. Select number of classes
   Usually between 5 & 15 inclusive
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes
Range: $R = \text{highest value} - \text{lowest value}$
Number of classes: $C = 1 + \frac{10}{3} \log_{10} N$ ($N$ = number of observations)
Class interval: $CI = R / C$ (rounded)
Class limits/boundaries: lowest limit $\le$ lowest value; highest limit $\ge$ highest value
Class midpoint: $CM = (\text{lower limit} + \text{upper limit}) / 2$
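A minimal Python sketch of these steps, applied to the ten observations from the stem-and-leaf slide. Several choices here are assumptions rather than anything the slides prescribe: the class width is rounded up, the first class starts at the lowest value, and classes are half-open intervals [lower, upper). The function name `frequency_table` is hypothetical.

```python
import math


def frequency_table(data):
    """Build a frequency distribution following the steps listed above."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                         # R = highest - lowest
    n_classes = 1 + (10 / 3) * math.log10(len(data))      # C = 1 + 10/3 log N
    # CI = R / C, rounded up here (assumption); at least 1 to avoid zero width.
    width = max(1, math.ceil(data_range / n_classes))
    rows = []
    lower = lowest
    while lower <= highest:
        upper = lower + width
        midpoint = (lower + upper) / 2                    # CM = (lower + upper) / 2
        count = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, midpoint, count))
        lower = upper
    return data_range, n_classes, width, rows


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
data_range, n_classes, width, rows = frequency_table(data)
print(f"R = {data_range}, C = {n_classes:.2f}, CI = {width}")
for lower, upper, midpoint, count in rows:
    print(f"{lower:>3} to <{upper:<3} midpoint {midpoint:>5.1f}  freq {count}")
```

With only ten observations the formula suggests fewer classes than the usual 5 to 15; the sketch simply follows the arithmetic.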
Histogram
[Figure: histogram whose bars touch; the horizontal axis shows class boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5, with each bar starting at its lower boundary; the vertical axis shows the count (0 to 5) and may instead show frequency, relative frequency, or percent]

Class            Freq.
15.5 to 25.5       3
25.5 to 35.5       5
35.5 to 45.5       2

Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38, 20, 18, 42, 25, 57, 26, 35, 29, 34, 40, 33, 21, 56, 45, 51, 23, 36, 54, 20, 19
Make a frequency distribution table!

Relative Frequency Distribution

Class       Frequency     %
18 to 23        7        23
24 to 29        8        27
30 to 35        5        17
36 to 41        4        13
42 to 47        2         7
48 to 53        1         3
54 to 59        3        10

Numerical Data Properties

Standard Notation

Measure               Sample        Population
Mean                  $\bar{X}$     $\mu$
Standard Deviation    $S$           $\sigma$
Variance              $S^2$         $\sigma^2$
Size                  $n$           $N$

Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape

Numerical Data Properties
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation
Relative Standing: Percentiles, Z-scores

Central Tendency

Mean
1. Measure of central tendency
2. Most common measure
3. Acts as balance point
4. Affected by extreme values (outliers)
5. Formula (sample mean):

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$

Median
1. Measure of central tendency
2. Middle value in ordered sequence
   If n is odd, middle value of sequence
   If n is even, average of 2 middle values
3. Position of median in sequence: Positioning Point $= \frac{n + 1}{2}$
4. Not affected by extreme values

Median Example (Odd-Sized Sample)
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered:  21.5 22.6 22.6 23.7 24.1
Position:    1    2    3    4    5

Positioning Point $= \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$, so Median $= 22.6$

Median Example (Even-Sized Sample)
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position:   1   2   3   4    5    6

Positioning Point $= \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$, so Median $= \frac{7.7 + 8.9}{2} = 8.30$

Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative data

Mode Example
No Mode           Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode          Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode  Raw Data: 21 28 28 41 43 43

Thinking Challenge
You're a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.

Mean
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$

Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Position:  1  2  3  4  5  6  7  8

Positioning Point $= \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$, so Median $= \frac{16 + 16}{2} = 16$

Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
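The three measures from this Thinking Challenge can be checked with Python's standard `statistics` module. This is a sketch, not part of the original solution; `statistics.multimode` (Python 3.8+) is used because it returns every most-frequent value, which matches the "may be no mode or several modes" point.

```python
import statistics

# Closing stock prices from the Thinking Challenge.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)        # sum of the values divided by n
median = statistics.median(prices)    # average of the two middle values when n is even
modes = statistics.multimode(prices)  # list of all values tied for most frequent

print("Mean:  ", mean)
print("Median:", median)
print("Modes: ", modes)
```

Note that when every value occurs exactly once, `multimode` returns all of the values, which corresponds to the "No Mode" case in the slides.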
Summary of Central Tendency Measures

Measure    Formula                   Description
Mean       $\sum X_i / n$            Balance Point
Median     $(n + 1)/2$ Position      Middle Value When Ordered
Mode       none                      Most Frequent

Variation

Range
1. Measure of dispersion
2. Difference between largest & smallest observations
   Range $= X_{largest} - X_{smallest}$
3. Ignores how data are distributed

[Figure: two dot plots on an axis from 7 to 10 with different distributions; in both, Range = 10 - 7 = 3]

Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{X}$ or $\mu$)

[Figure: dot plot on an axis marked 4, 6, 8, 10, 12, with $\bar{X} = 8.3$]

Sample Variance Formula

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$

n - 1 in denominator! (Use N if population variance)

Standard Deviation Formula

$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$

Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$

$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
Thinking Challenge
You're a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the variance and standard deviation of the stock prices?

Variation Solution
Raw Data: 17 16 21 18 13 16 12 11

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$

$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$

Sample Standard Deviation

$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$
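The variance and standard deviation calculations can be reproduced with a short Python sketch that follows the formula term by term. The function names are hypothetical; the data are the two raw data sets used in the slides.

```python
import math


def sample_variance(data):
    """S^2 = sum of (X_i - mean)^2 divided by (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


def sample_std_dev(data):
    """S = square root of the sample variance."""
    return math.sqrt(sample_variance(data))


example = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]   # Variance Example
prices = [17, 16, 21, 18, 13, 16, 12, 11]    # Thinking Challenge

print(f"Example variance:      {sample_variance(example):.3f}")
print(f"Stock price variance:  {sample_variance(prices):.2f}")
print(f"Stock price std. dev.: {sample_std_dev(prices):.2f}")
```

For a population variance, the same sum of squared deviations would be divided by N instead of n - 1.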
Summary of Variation Measures

Measure                            Formula                                              Description
Range                              $X_{largest} - X_{smallest}$                         Total Spread
Standard Deviation (Sample)        $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$        Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$                Dispersion about Population Mean
Variance (Sample)                  $\frac{\sum (X_i - \bar{X})^2}{n - 1}$               Squared Dispersion about Sample Mean

Interpreting Standard Deviation

Interpreting Standard Deviation: Chebyshev's Theorem
Applies to a data set of any shape
No useful information about the fraction of data in the interval $\bar{x} - s$ to $\bar{x} + s$
At least 3/4 of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$
In general, for $k > 1$, at least $1 - 1/k^2$ of the data lies in the interval $\bar{x} - ks$ to $\bar{x} + ks$

[Figure: number line marked $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$, with brackets labeled "No useful information", "At least 3/4 of the data", "At least 8/9 of the data"]

Chebyshev's Theorem Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.

At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.
$\bar{x} = 15.5$, $s = 3.34$
$(\bar{x} - 2s, \bar{x} + 2s) = (15.5 - 2 \cdot 3.34, 15.5 + 2 \cdot 3.34) = (8.82, 22.18)$

Interpreting Standard Deviation: Empirical Rule
Applies to data sets that are mound shaped and symmetric
Approximately 68% of the measurements lie in the interval $\mu - \sigma$ to $\mu + \sigma$
Approximately 95% of the measurements lie in the interval $\mu - 2\sigma$ to $\mu + 2\sigma$
Approximately 99.7% of the measurements lie in the interval $\mu - 3\sigma$ to $\mu + 3\sigma$

[Figure: mound-shaped curve with the horizontal axis marked $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$, with brackets labeled "Approximately 68% of the measurements", "Approximately 95% of the measurements", "Approximately 99.7% of the measurements"]

Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.

According to the Empirical Rule, approximately 68% of the data will lie in the interval $(\bar{x} - s, \bar{x} + s)$: $(15.5 - 3.34, 15.5 + 3.34) = (12.16, 18.84)$
Approximately 95% of the data will lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$: $(15.5 - 2 \cdot 3.34, 15.5 + 2 \cdot 3.34) = (8.82, 22.18)$
Approximately 99.7% of the data will lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$: $(15.5 - 3 \cdot 3.34, 15.5 + 3 \cdot 3.34) = (5.48, 25.52)$
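As a rough illustration, both rules can be put side by side in Python for the stock price example. The helper names and the `EMPIRICAL_RULE` lookup table are assumptions introduced for this sketch; the percentages are the ones stated in the slides.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (any shape)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem gives no useful information for k <= 1")
    return 1 - 1 / k**2


def interval(mean, std_dev, k):
    """Interval (mean - k*s, mean + k*s)."""
    return (mean - k * std_dev, mean + k * std_dev)


# Approximate fractions from the Empirical Rule (mound-shaped, symmetric data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

mean, std_dev = 15.5, 3.34  # stock price example
for k in (1, 2, 3):
    low, high = interval(mean, std_dev, k)
    chebyshev = f"at least {chebyshev_lower_bound(k):.1%}" if k > 1 else "no useful information"
    print(f"k={k}: ({low:.2f}, {high:.2f})  "
          f"Chebyshev: {chebyshev}; Empirical Rule: about {EMPIRICAL_RULE[k]:.1%}")
```

The comparison shows why the Empirical Rule is the sharper statement when its shape assumption holds: Chebyshev's bound must cover every possible distribution.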
Numerical Measures of Relative Standing

Numerical Measures of Relative Standing: Percentiles
Describes the relative location of a measurement compared to the rest of the data
The p-th percentile is a number such that p% of the data falls below it and (100 - p)% falls above it
Median = 50th percentile

Percentile Example
You scored 560 on the GMAT exam. This score puts you in the 58th percentile.
What percentage of test takers scored lower than you did? 58% of test takers scored lower than 560.
What percentage of test takers scored higher than you did? (100 - 58)% = 42% of test takers scored higher than 560.

Numerical Measures of Relative Standing: Z-Scores
Describes the relative location of a measurement compared to the rest of the data
Measures the number of standard deviations away from the mean a data value is located

Sample z-score: $z = \frac{x - \bar{x}}{s}$
Population z-score: $z = \frac{x - \mu}{\sigma}$

Z-Score Example
The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes. Find the z-score for an item that took 20 minutes to assemble. Find the z-score for an item that took 27.5 minutes to assemble.

$x = 20$, $\mu = 22.5$, $\sigma = 2.5$: $z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$
$x = 27.5$, $\mu = 22.5$, $\sigma = 2.5$: $z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$

Quartiles & Box Plots

Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters (25% each, separated by $Q_1$, $Q_2$, $Q_3$)
3. Position of i-th quartile: Positioning Point of $Q_i = \frac{i(n + 1)}{4}$

Quartile ($Q_1$) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position:   1   2   3   4    5    6

$Q_1$ Position $= \frac{1(n + 1)}{4} = \frac{1(6 + 1)}{4} = 1.75 \approx 2$, so $Q_1 = 6.3$

Quartile ($Q_2$) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position:   1   2   3   4    5    6

$Q_2$ Position $= \frac{2(n + 1)}{4} = \frac{2(6 + 1)}{4} = 3.5$, so $Q_2 = \frac{7.7 + 8.9}{2} = 8.3$

Quartile ($Q_3$) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position:   1   2   3   4    5    6

$Q_3$ Position $= \frac{3(n + 1)}{4} = \frac{3(6 + 1)}{4} = 5.25 \approx 5$, so $Q_3 = 10.3$

Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles: Interquartile Range $= Q_3 - Q_1$
4. Spread in middle 50%
5. Not affected by extreme values

Thinking Challenge
You're a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the quartiles, $Q_1$ and $Q_3$, and the interquartile range?

Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Position:  1  2  3  4  5  6  7  8

$Q_3$ Position $= \frac{3(n + 1)}{4} = \frac{3(8 + 1)}{4} = 6.75 \approx 7$, so $Q_3 = 18$

Interquartile Range Solution*
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Position:  1  2  3  4  5  6  7  8

Interquartile Range $= Q_3 - Q_1 = 18.0 - 13.0 = 5$

Box Plot
1. Graphical display of data using 5-number summary: $X_{smallest}$, $Q_1$, Median, $Q_3$, $X_{largest}$

[Figure: box plot on an axis from 4 to 12 marking $X_{smallest}$, $Q_1$, Median, $Q_3$, $X_{largest}$]

Shape & Box Plot
[Figure: three box plots, each marking $Q_1$, Median, $Q_3$, labeled Right-Skewed, Left-Skewed, Symmetric]

Graphing Bivariate Relationships
Describes a relationship between two quantitative variables
Plot the data in a scattergram

[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]

Scattergram Example
You're a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ (x)    Sales (Units) (y)
   1               1
   2               1
   3               2
   4               2
   5               4

Draw a scattergram of the data.

[Figure: scattergram of Sales (axis 0 to 4) against Advertising (axis 0 to 5)]

Time Series Plot
Used to graphically display data produced over time
Shows trends and changes in the data over time
Time recorded on the horizontal axis
Measurements recorded on the vertical axis
Points connected by straight lines

Time Series Plot Example
The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006. Draw a time series plot for these data.
Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298

[Figure: time series plot of Price (axis 2.05 to 2.35) against Date (10/16 to 12/4)]

Distorting the Truth with Descriptive Techniques

Errors in Presenting Data
1. Using chart junk
2. No relative basis in comparing data batches
3. Compressing the vertical axis
4. No zero point on the vertical axis

Chart Junk
Bad Presentation: [Figure: Minimum Wage shown with decorative graphics: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
Good Presentation: [Figure: Minimum Wage in $ (axis 0 to 4) by year: 1960, 1970, 1980, 1990]

No Relative Basis
Bad Presentation: [Figure: A's by Class as frequencies (axis 0 to 300) for FR, SO, JR, SR]
Good Presentation: [Figure: A's by Class as percentages (axis 0% to 30%) for FR, SO, JR, SR]

Compressing Vertical Axis
Bad Presentation: [Figure: Quarterly Sales in $ for Q1 to Q4, vertical axis 0 to 200]
Good Presentation: [Figure: Quarterly Sales in $ for Q1 to Q4, vertical axis 0 to 50]

No Zero Point on Vertical Axis
Bad Presentation: [Figure: Monthly Sales in $ for J, M, M, J, S, N, vertical axis 36 to 45]
Good Presentation: [Figure: Monthly Sales in $ for J, M, M, J, S, N, vertical axis 0 to 60]
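To see why a missing zero point distorts a comparison, the sketch below computes how much taller one bar looks than another when the vertical axis starts at a given baseline. The function name `apparent_ratio` and the two sales values (42 and 45) are hypothetical, chosen only to fall within the 36 to 45 axis of the bad presentation; they are not data from the slides.

```python
def apparent_ratio(smaller, larger, baseline=0.0):
    """Ratio of drawn bar heights when the vertical axis starts at `baseline`."""
    if smaller <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical monthly sales values (not from the slides).
low_month, high_month = 42.0, 45.0

true_ratio = apparent_ratio(low_month, high_month)                       # axis starts at 0
distorted_ratio = apparent_ratio(low_month, high_month, baseline=36.0)   # axis starts at 36

print(f"Axis from 0:  taller bar looks {true_ratio:.2f}x as high")
print(f"Axis from 36: taller bar looks {distorted_ratio:.2f}x as high")
```

With these assumed values, a difference of about 7% is drawn as a 50% difference once the axis starts at 36, which is the effect the "No Zero Point on Vertical Axis" slide warns about.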