
BASIC BOX-PLOT

Applied Statistics and Computing Lab Indian School of Business


Learning goals
What are the components of a basic box-plot?
How is a basic box-plot constructed?
How is it interpreted?
What are its salient features?
What are its limitations?
How is it related to the histogram?
What is the effect of translation on a box-plot?

Where does a box-plot fit in?


An exploratory data-analytic tool for continuous data.
A visual display of certain important summary statistics.


Why a box-plot?
Useful in studying location, spread and the shape of the distribution (symmetry, tail behaviour, skewness).
Useful in comparing different batches of data, or a batch of data across the levels of a factor.
Useful for studying the observations in the tails.
Easy to compute and draw, yet informative.
User-friendly.

Constructing a Basic Box-plot


Data set 1. Suppose we have the following data on a batch (variable):
90, 41, 22, 135, 15, 72, 50, 26, 105
Step 1: Arrange the data in increasing order:
15, 22, 26, 41, 50, 72, 90, 105, 135
Step 2: Obtain the five-point summary, consisting of (i) the minimum, (ii) the first quartile ($Q_1$), (iii) the median, (iv) the third quartile ($Q_3$) and (v) the maximum. For the above data, the five-point summary is:
Minimum = 15
First quartile ($Q_1$) = 26
Median = 50
Third quartile ($Q_3$) = 90
Maximum = 135
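The slides do not say which quartile rule gives $Q_1 = 26$ and $Q_3 = 90$. The sketch below assumes Tukey's hinges, which are the medians of the lower and upper halves, each half including the median when n is odd. It uses the same depth rule as R's fivenum() and reproduces the five-point summary of Data set 1. On other data, other quartile conventions can give slightly different values.

```python
import math


def five_number_summary(data):
    """Return [min, lower hinge (Q1), median, upper hinge (Q3), max].

    Assumes Tukey's hinges, using the same depth rule as R's fivenum().
    """
    x = sorted(data)  # Step 1: arrange the data in increasing order
    n = len(x)
    if n == 0:
        raise ValueError("need at least one observation")
    n4 = math.floor((n + 3) / 2) / 2  # depth of the hinges (1-based)
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # A fractional depth such as 2.5 averages the 2nd and 3rd ordered values.
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


data_set_1 = [90, 41, 22, 135, 15, 72, 50, 26, 105]
labels = ["Minimum", "First quartile", "Median", "Third quartile", "Maximum"]
for label, value in zip(labels, five_number_summary(data_set_1)):
    print(f"{label} = {value:g}")
```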

Constructing a Basic Box-plot (contd.)


Step 3: Draw a box whose length equals the interquartile range, $Q_3 - Q_1$. For now, the width can be chosen as convenient. The lower and upper hinges of the box mark the first and third quartiles. (In a horizontal box-plot, the width is the vertical extent of the box, and the hinges are its left and right edges.)
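For Data set 1, the box runs from the lower hinge at 26 to the upper hinge at 90, so its length is:

$$Q_3 - Q_1 = 90 - 26 = 64$$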


Constructing a Basic Box-plot (contd.)


Step 4: From the middle of the lower hinge, draw a line (parallel to the length of the box) out to the minimum. Similarly, from the middle of the upper hinge, draw a line (parallel to the length of the box) out to the maximum. These lines are called the whiskers.
Step 5: Draw a line at the median, parallel to the hinges, dividing the box into two parts.
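Steps 3 to 5 can be mimicked in plain text. The sketch below uses a hypothetical helper, text_boxplot, which is not part of any library. It maps a five-point summary onto a single line of characters:
- whiskers are drawn with '-';
- the hinges are drawn with '[' and ']';
- the median is drawn with '|';
- the ends of the whiskers (minimum and maximum) are marked with '|'.

Positions are rounded to the nearest character, so the result is only a rough picture.

```python
def text_boxplot(summary, width=41):
    """Draw a horizontal basic box-plot as one line of text (hypothetical helper).

    summary: (minimum, Q1, median, Q3, maximum), e.g. from Step 2.
    """
    lo, q1, med, q3, hi = summary
    if hi == lo:
        raise ValueError("minimum and maximum must differ")

    def col(v):
        # Map a data value onto a character column between 0 and width - 1.
        return round((v - lo) * (width - 1) / (hi - lo))

    line = [" "] * width
    # Step 4: whiskers from the minimum to Q1 and from Q3 to the maximum.
    for c in range(col(lo), col(q1)):
        line[c] = "-"
    for c in range(col(q3) + 1, col(hi) + 1):
        line[c] = "-"
    # Step 3: hinges of the box at Q1 and Q3.
    line[col(q1)] = "["
    line[col(q3)] = "]"
    # Step 5: median line inside the box.
    line[col(med)] = "|"
    # Ends of the whiskers at the minimum and maximum.
    line[col(lo)] = "|"
    line[col(hi)] = "|"
    return "".join(line)


print(text_boxplot((15, 26, 50, 90, 135)))  # five-point summary of Data set 1
```

For Data set 1, the right whisker and the right part of the box come out visibly longer than their left counterparts.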


Horizontal and vertical Box-plots


The box-plot we saw is a horizontal box-plot (the scale is on the horizontal axis). One can also draw a vertical box-plot (the scale is on the vertical axis). In general, neither has a specific advantage over the other.


What can we get from a basic box-plot?


We can read off the location (median), the spread, and where the middle 50% of the observations lie, all in one visual display. After all, a picture is worth a thousand words. What else can we get?

What can we get from a basic box-plot? (contd.)

[Figure: visuals from Aczel A., Sounderpandian J., Complete Business Statistics]

Interpretation of the basic box-plot

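Thus the distribution of Data set 1 is right-skewed. The part of the box above the median (90 - 50 = 40) is longer than the part below it (50 - 26 = 24). The upper whisker (135 - 90 = 45) is also longer than the lower whisker (26 - 15 = 11).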
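The comparison behind this reading can be written as a small rule of thumb. The sketch below uses a hypothetical helper, describe_shape. It treats two lengths as equal when they are within 10% of each other, which is an arbitrary assumption. It is a visual heuristic, not a formal test of skewness.

```python
def describe_shape(summary, tol=0.1):
    """Rough skewness reading from a five-point summary (hypothetical helper).

    Compares the two parts of the box around the median and the two whiskers.
    Lengths within a relative tolerance `tol` are treated as equal.
    """
    lo, q1, med, q3, hi = summary
    lower_box, upper_box = med - q1, q3 - med
    lower_whisker, upper_whisker = q1 - lo, hi - q3

    def close(u, v):
        return abs(u - v) <= tol * max(u, v, 1e-12)

    if close(lower_box, upper_box) and close(lower_whisker, upper_whisker):
        return "roughly symmetric"
    if upper_box >= lower_box and upper_whisker >= lower_whisker:
        return "right-skewed"
    if upper_box <= lower_box and upper_whisker <= lower_whisker:
        return "left-skewed"
    return "mixed: box and whiskers point different ways"


print(describe_shape((15, 26, 50, 90, 135)))  # Data set 1
```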


Basic Box-plot: Features and limitations


Features
The basic box-plot displays the location (median) and the intervals spanned by the first, second, third and fourth quarters of the data.
It visually shows where the middle 50% of the data lies.
It tells us whether the data are symmetric, left-skewed or right-skewed.

Limitations
We cannot get information about the mode(s).
We cannot identify unusual observations, since the whiskers always run to the minimum and maximum.
It is hard to identify the tail behaviour.
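To see why modal information is lost, the sketch below uses two small made-up data sets (hypothetical, not from the slides). They have exactly the same five-point summary, and therefore the same basic box-plot, but different modes. The five-point summary again assumes Tukey's hinges. statistics.multimode is from the Python standard library (3.8+).

```python
import math
import statistics


def five_number_summary(data):
    """[min, Q1, median, Q3, max] using Tukey's hinges (as in R's fivenum())."""
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


# Hypothetical data: A piles up at 5, B piles up at 4 and at 6.
a = [0, 4, 4, 5, 5, 5, 6, 6, 10]
b = [0, 4, 4, 4, 5, 6, 6, 6, 10]

print(five_number_summary(a), five_number_summary(b))  # identical box-plots
print(statistics.multimode(a), statistics.multimode(b))  # different modes
```

A histogram of the same two data sets would show the difference at once.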


Histogram and box-plot


In a histogram, the width of each interval is fixed and the height of each bar is proportional to the (relative) frequency in that interval. In a box-plot, the relative frequency is fixed at (roughly) 25% and the intervals vary: they correspond to the first, second, third and fourth 25% of the data. More precisely:
The left and right whiskers correspond to the first and fourth 25%.
The part of the box from the first quartile to the median corresponds to the second 25%.
The part of the box from the median to the third quartile corresponds to the third 25%.
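Each of the four intervals holds about a quarter of the data. A short interval therefore means the data are packed densely there, and a long one means they are spread thin. Dividing 0.25 by the width gives the average relative frequency per unit length in that interval. The heights of an equal-width histogram are proportional to this quantity. The sketch below computes it for Data set 1; the helper quarter_widths is hypothetical.

```python
def quarter_widths(summary):
    """Widths of the four intervals that each hold about 25% of the data."""
    lo, q1, med, q3, hi = summary
    return [q1 - lo, med - q1, q3 - med, hi - q3]


summary = (15, 26, 50, 90, 135)  # five-point summary of Data set 1
for k, w in enumerate(quarter_widths(summary), start=1):
    # About 25% of the data per interval, so 0.25 / w is the average relative
    # frequency per unit length (infinite if the interval has zero width).
    density = 0.25 / w if w else float("inf")
    print(f"quarter {k}: width {w}, average density {density:.4f}")
```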


Scores dataset
Comprises the scores of 50 students in the second-semester exam of the second course in Quantitative Methods. We also have their first-semester GPA and their scores in the 3 minors of the subject.
Maximum marks: 60 for the exam and 20 for each minor $i = 1, 2, 3$; the GPA is on a 10-point scale.

Histogram and box-plot (contd.)


[Figures: histogram of scores; box-plot of scores]

Histogram and box-plot (contd.)


A box-plot readily gives the 5-point summary and information about skewness, but it gives no information about the mode(s). A histogram readily gives information about the mode(s), but it takes some effort to extract the 5-point summary from it. Thus the two plots complement each other!


Effect of translation
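Consider the box-plot of a variable $X$.

Case $a > 0$: if we transform $X$ to $Y = aX + b$, the features of the box-plot do not change. These features are its shape: the position of the median within the box and the relative lengths of the parts of the box and of the whiskers. Only the location and scale change, since each value $v$ of the five-point summary becomes $av + b$.

Case $a < 0$: the order is reversed. The minimum and maximum swap places, as do $Q_1$ and $Q_3$. The features of the box-plot are therefore the same as those of the box-plot of $-X$, which is a mirror image of the box-plot of $X$.

We shall demonstrate this with the box-plots of $X$, $3X + 5$ and $-3X + 5$ for the Scores data set, where $X$ is the score in the 2nd minor.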
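The Scores data are not reproduced in the slides, so the sketch below checks the rule on Data set 1 instead. It computes the five-point summary of $3X + 5$ and $-3X + 5$ directly. It then compares the result with $av + b$ applied to the summary of $X$, reversed when $a < 0$. Tukey's hinges are assumed for the quartiles, and transformed_summary is a hypothetical helper.

```python
import math


def five_number_summary(data):
    """[min, Q1, median, Q3, max] using Tukey's hinges (as in R's fivenum())."""
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


def transformed_summary(summary, a, b):
    """Predict the five-point summary of Y = aX + b from that of X."""
    mapped = [a * v + b for v in summary]
    # With a < 0 the order flips: the old maximum becomes the new minimum and
    # the old Q3 becomes the new Q1, giving the mirror image seen for -X.
    return mapped if a >= 0 else mapped[::-1]


x = [90, 41, 22, 135, 15, 72, 50, 26, 105]  # Data set 1
for a, b in [(3, 5), (-3, 5)]:
    direct = five_number_summary([a * v + b for v in x])
    predicted = transformed_summary(five_number_summary(x), a, b)
    print(f"a={a}, b={b}: direct {direct}, predicted {predicted}")
```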


Effect of translation (contd.)

[Figures: box-plots of $X$, $3X + 5$ and $-3X + 5$, where $X$ is the score in the 2nd minor]

R-codes
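In R, a box-plot is drawn with boxplot(x), where x is the numeric variable (vector) to be plotted. By default, R extends the whiskers only to the most extreme observations within 1.5 times the box length from the box. boxplot(x, range = 0) extends them to the minimum and maximum, giving the basic box-plot described here. Adding horizontal = TRUE draws the plot horizontally. fivenum(x) returns the five-point summary (using Tukey's hinges).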


Thank you
