Sie sind auf Seite 1von 43

Summarizing and Visualizing a Data Set

Arun Kumar, Ravindra Gokhale, and Nagarajan


Krishnamurthy

Quantitative Techniques-I, Term I, 2012


Indian Institute of Management Indore
Cafe Data
This data was collected at a student run Cafe in the College of
Business at Indiana State University.
Type of Data

Quantitative Data: Data for which arithmetic operation makes


sense. Ex: Age, Salary, Length.

Categorical Data: Data obtained by putting individuals in


different categories. Ex: Gender, States of a country
Visualization

Quantitative Data: Histogram, Stem-Leaf plot, Box plot

Categorical Data: Pie Chart, Bar chart

*Discuss Cafe data


Interpreting a Histogram

Shape: symmetric, skewed, unimodal, bimodal


Center: mean, median
Spread: range, standard deviation, inter-quartile range
Measure of the central tendency of a data set

Mean: If we have a data set x1 , . . . , xn then mean of the data


set is x1 ++x
n
n
.

Notation: x
Mean: Example

The mean of 0,5,1,1,3 is 2.


Measure of the Central Tendency of a Data Set

Median: Middle number in a sorted data set. When the


number of observations (sample size) is an even number then
there are two middle numbers. In that case, we take average
of the two middle numbers to obtain the median.

Notations: x
Median: Example 1

For example the median of 0,5,1,1,3 is 1 because 1 is the


middle number of the sorted data i.e. 0,1,1,3,5.
Median: Example 2

The median of 3,2,5,6,4,5,3,5 is 4.5 because 4.5 is the average


of the two middle numbers of the sorted data i.e.
2,3,3,4,5,5,5,6.
Measure of the Central Tendency of a Data Set

Mode: Observation in the data set with the largest frequency.


Note that we can have more than one mode for a data set.
Mode: Example

For example the mode of 0,5,1,1,3 is 1.


Effect of an Outlier

Calculate mean, median, and mode of 0,5,1,1,3,100.


Effect of an Outlier

Calculate mean, median, and mode of 0,5,1,1,3,100.

mean=18.33, median=2, mode=1.


Effect of an Outlier

Outlier pulls mean towards it but may not affect median and
mode.
Unit of Mean, Median, and Mode

Mean, Median, and Mode has the same unit as the data.
Identifying Relation Between Mean and Median
from Histogram

Symmetric: mean median

Left skewed: mean < median (in general)


Identifying Relation Between Mean and Median
from Histogram

Right skewed: mean > median (in general)


Measure of the Spread of a Data Set

Range: max-min

Ex: 0,5,1,1,3; what is the range?


Measure of the Spread of a Data Set

Range: max-min

Ex: 0,5,1,1,3; what is the range?


Range = 5 0 = 5.
Measure of the Spread of a Data Set

Pn 2
i=1 (xi x)
Variance: n1

q Pn
2
i=1 (xi x)
Standard deviation: n1
Variance and Standard Deviation: Example

What is the variance and the standard deviation of 3,3,3,3,3?


Variance and Standard Deviation: Example

What is the variance and the standard deviation of 3,3,3,3,3?


Ans. variance=0 standard deviation=0

What is the variance and the standard deviation of 1,2,3,4,5?


Variance and Standard Deviation: Example

What is the variance and the standard deviation of 3,3,3,3,3?


Ans. variance=0 standard deviation=0

What is the variance and the standard deviation of 1,2,3,4,5?


Ans. variance=2.49 standard deviation=1.58
Unit of Variance and Standard Deviation

Standard deviation has the same unit as the data but the unit
of variance is the square of the unit of the data.
Standard Deviation

Standard deviation is always greater than or equal to zero.


Does Standard Deviation Gets Affected by
Outliers?

What is the standard deviation for the data 3,3,3,3,100?


Does Standard Deviation Gets Affected by
Outliers?

What is the standard deviation for the data 3,3,3,3,100?

Ans. 43.38
Is Standard Deviation Always a Good Measure of
the Spread of a Data Set?
Is Standard Deviation Always a Good Measure of
the Spread of a Data Set?

Not a good measure when data is skewed or have outliers.


Quartiles

First quartile: 25th percentile

Notation: Q1
Quartiles

Third quartile: 75th percentile

Notation: Q3
Exercise

Find the first and third quartile of 8,7,1,4,6,6,4,5,7,6,3,0.


Exercise

Find the first and third quartile of 8,7,1,4,6,6,4,5,7,6,3,0.

Ans. The sorted data is 0,1,3,4,4,5,6,6,6,7,7,8. The median of


the red half of the data is 3.5 (Q1) and the median of the blue
half of the data is 6.5 (Q3).
Quartiles

Median is the second quartile (Q2 ).


Measure of the Spread of a Data Set

Inter Quartile Range (IQR): Q3 Q1

*IQR is a robust measure of spread. IQR does not get affected


much by skewness or outliers.
Exercise

Find IQR of 8,7,1,4,6,6,4,5,7,6,3,0.


Exercise

Find IQR of 8,7,1,4,6,6,4,5,7,6,3,0.

Q3-Q1=6.5-3.5=3.
Five Number Summary

Minimum
First quartile
Median
Third quartile
Maximum
Boxplot

*We will create a box plot for the Cafe data set.
Interpreting a Box Plot

Shape:

Outliers: Any observation not in the range


[Q1 1.5 IQR, Q3 + 1.5 IQR] is considered an outlier
(Informal Rule).
Why Do We Need Box Plot?

To compare two or more data sets.


Visualization of summary statistics.
Categorical Data Visualization

*Bar Chart

*Pie Chart

Show billionaires data.

Das könnte Ihnen auch gefallen