Ft Mba Section 1 Descriptives Pt2 Sva


Descriptive Statistics:

Measures of Dispersion & Skewness

Paul Bottomley

Bottomleypa@cf.ac.uk

Silver, pp.24-26, 50-68

Measures of Dispersion

Measures of central tendency say nothing about the extent to

which data values are similar to one another.

Basis of segmentation and vendor selection.

Example: Number of days lost due to hangovers by a group

of French wine tasters.

October: 4, 5, 5, 5, 5, 5, 6

November: 0, 1, 3, 5, 5, 9, 12

Index of diversity (nominal)

Quartiles (ordinal)

Range, IQR, variance and standard deviation (metric)

The Range

Range = difference between maximum and minimum.

Example: French wine tasters, Oct. = 2, Nov. = 12.

Easy to calculate and interpret.

Lacks power: based on only two data values;

Strongly influenced by extreme values (potential outliers).

Always compare data-sets of the same size: the larger the sample,

the greater the chance of selecting extreme values.

Sets A and B below have the same range (100), but are they really equally spread out?

A = {0, 48, 49, 51, 52, 100}, B = {0, 1, 1, 99, 99, 100}
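
As a rough illustration of why the range alone can mislead, the sketch below computes the range for the wine-taster data and for sets A and B, then uses the population standard deviation (covered later in these notes) as a second opinion. The helper name `value_range` is an assumption made for this example only.

```python
import statistics

october = [4, 5, 5, 5, 5, 5, 6]
november = [0, 1, 3, 5, 5, 9, 12]
set_a = [0, 48, 49, 51, 52, 100]
set_b = [0, 1, 1, 99, 99, 100]


def value_range(values):
    """Range = maximum minus minimum (uses only two data values)."""
    return max(values) - min(values)


range_oct = value_range(october)
range_nov = value_range(november)
range_a = value_range(set_a)
range_b = value_range(set_b)

# The population SD (divide by n) uses every value, not just the extremes.
sd_a = statistics.pstdev(set_a)
sd_b = statistics.pstdev(set_b)

print(f"Range: Oct = {range_oct}, Nov = {range_nov}")
print(f"Range: A = {range_a}, B = {range_b}")
print(f"SD:    A = {sd_a:.1f}, B = {sd_b:.1f}")
```

A and B share a range of 100, yet B's values sit at the two extremes while A's cluster around the middle, so B's standard deviation comes out noticeably larger.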

Upper and Lower Quartiles

Quartiles split the data into quarters when placed in ascending order, instead of at the mid point (Md).

Lower (Q1): 25% of the data lies below Q1, 75% above it.

Upper (Q3): 75% of the data lies below Q3, 25% above it.

Think of Q1 as the median of the lower half of the sorted data,

and Q3 as the median of the upper half.

[Figure: data in ascending order, marking the lower quartile (25% of data below), the upper quartile (75% of data below) and the IQR between them.]

Inter-Quartile Range (IQR)

IQR is the difference between the upper and lower quartiles.

Range of the middle 50% of data values.

More reliable: less influenced by outliers. But what about

the other 50% of data?

Metric variables only: it must be legitimate to add/subtract!

Lower quartile (Q1): value in the position (n + 1)/4.

Upper quartile (Q3): value in the position 3(n + 1)/4.

If the position is not a whole number, interpolate: assume values are evenly spread between Xi and Xi+1.

Calculating Quartiles: B&O Prices

Prices in ascending order (n = 8): 580, 757, 800, 891, 1192, 1285, 1295, 1451

Range: 1451 - 580 = 871

First find the positions of the lower and upper quartiles.

Q1: position = (n + 1)/4 = (8 + 1)/4 = 2.25

Q3: position = 3(n + 1)/4 = 3(8 + 1)/4 = 6.75

Q1: value = 757 + 0.25 x (800 - 757) = 767.75

Q3: value = 1285 + 0.75 x (1295 - 1285) = 1292.50

IQR: 1292.50 - 767.75 = 524.75. Recall: units = £s
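
The position-and-interpolation steps above can be sketched in a few lines. This is a minimal illustration of the (n + 1)/4 convention used in these notes, not a general-purpose routine; the function name `quartile` and its `fraction` parameter are assumptions introduced for the example.

```python
def quartile(sorted_values, fraction):
    """Value at 1-based position fraction * (n + 1), interpolating linearly."""
    n = len(sorted_values)
    position = fraction * (n + 1)
    if position <= 1:
        return sorted_values[0]
    if position >= n:
        return sorted_values[-1]
    whole = int(position)        # 1-based index of X_i
    part = position - whole      # share of the gap between X_i and X_(i+1)
    x_i = sorted_values[whole - 1]
    x_next = sorted_values[whole]
    return x_i + part * (x_next - x_i)


prices = sorted([800, 891, 1295, 1451, 580, 1192, 1285, 757])

q1 = quartile(prices, 0.25)   # position 2.25
q3 = quartile(prices, 0.75)   # position 6.75
iqr = q3 - q1

print(q1, q3, iqr)
```

Other software may use a different quartile convention and return slightly different values. Python's `statistics.quantiles(prices, n=4)` with its default `"exclusive"` method is, as far as I can tell, based on the same (n + 1) positions and should agree here.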

Variance and Standard Deviation

Variance and SD measure the spread of the data around the

mean. They use all data values. Follow the steps below:

Calculate the mean ($\bar{X}$).

Subtract the mean from each data value (deviations).

But: sum of deviations = 0; mean = center of gravity.

Square each deviation, then add them all up.

Divide by the number of data points (n).

$s^2 = \frac{\sum_i (X_i - \bar{X})^2}{n}$

$SD = \sqrt{\frac{\sum_i (X_i - \bar{X})^2}{n}}$

SD is the square root of the average squared deviation from

the mean.

Standard Deviation: B&O Prices

(The Harder Way)

    $X_i$    $X_i - \bar{X}$    $(X_i - \bar{X})^2$
     800        -231.38             53536.70
     891        -140.38             19706.54
    1295         263.62             69495.50
    1451         419.62            176080.94
     580        -451.38            203743.90
    1192         160.62             25798.78
    1285         253.62             64323.10
     757        -274.38             75284.38
    8251           0               687969.84   (totals)

Mean = 8251/8 = 1031.38

The variance is 687969.84 / 8 = 85996.23.

Units difficult to interpret. Now it is measured in £².

Solution: use the standard deviation, in units of £s.

$SD = \sqrt{687969.84 / 8} = 293.25$
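
As a check on the table, the "harder way" can be followed step by step in code. This is only a sketch of the procedure described above, treating the eight B&O prices as a population (dividing by n); the variable names are illustrative.

```python
import math

prices = [800, 891, 1295, 1451, 580, 1192, 1285, 757]
n = len(prices)

# Step 1: the mean.
mean = sum(prices) / n

# Step 2: deviations from the mean (these always sum to zero).
deviations = [x - mean for x in prices]

# Step 3: square each deviation and add them up.
sum_of_squares = sum(d * d for d in deviations)

# Step 4: divide by n for the variance, then take the square root.
variance = sum_of_squares / n
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"sum of deviations = {sum(deviations):.2f}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```

The sum of squares computed at full precision differs from the table's 687969.84 in the second decimal place, because the table works with the mean rounded to 1031.38.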

Standard Deviation: B&O Prices

(The Easier Way)

SD = square root of (Mean of the Squares minus Square of the Mean)

$SD = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$

$\sum X^2 = (X_1)^2 + (X_2)^2 + \dots + (X_n)^2$

$(\sum X)^2 = (X_1 + X_2 + \dots + X_n)^2$

Formula looks more complex, but needs fewer calculations.

    Price    $(X_i)^2$
     800       640000
     891       793881
    1295      1677025
    1451      2105401
     580       336400
    1192      1420864
    1285      1651225
     757       573049
    8251      9197845   (totals)

$SD = \sqrt{\frac{9197845}{8} - \left(\frac{8251}{8}\right)^2} = 293.25$ (trust me!)
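
For readers who would rather not take the result on trust, here is a small sketch that evaluates the "easier way" formula and compares it with the deviation-based calculation. Variable names are illustrative.

```python
import math

prices = [800, 891, 1295, 1451, 580, 1192, 1285, 757]
n = len(prices)

sum_x = sum(prices)                          # sum of the prices
sum_x_squared = sum(x * x for x in prices)   # sum of the squared prices

# Easier way: mean of the squares minus square of the mean.
mean_of_squares = sum_x_squared / n
square_of_mean = (sum_x / n) ** 2
sd_easy = math.sqrt(mean_of_squares - square_of_mean)

# Harder way, for comparison: average squared deviation from the mean.
mean = sum_x / n
sd_hard = math.sqrt(sum((x - mean) ** 2 for x in prices) / n)

print(f"easier way: {sd_easy:.2f}, harder way: {sd_hard:.2f}")
```

The two routes are algebraically identical. One caveat worth knowing: with very large values and a small spread, subtracting two nearly equal numbers in the easier formula can lose floating-point precision, so the deviation-based version is usually the safer choice in software.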

Interpreting the Standard Deviation

Q: B&O's prices have a SD of about £300. High or low?

Difficult to say with only one data series. Easier to think in

comparative terms, but only if we have two (+) variables.

We can still make claims about the proportion of data values

we would expect to find within a certain number of standard

deviations from the mean.

Try to imagine / picture the histogram. Is it bell-shaped?

If YES, use the Empirical Rule; if NO, use Chebyshev's Rule.

Interpreting the Standard Deviation

YES, use Empirical Rule:

Mean ± 2 SD contains about 95% of all data values.

Mean ± 3 SD contains about 99% of all data values.

[Figure: bell-shaped histogram with 95% of data points between $\bar{X}$ - 2SD and $\bar{X}$ + 2SD.]

NO, use Chebyshev's Rule:

Mean ± 2 SD contains at least 75% of the data.

Mean ± 3 SD contains at least 89% of the data.

[Figure: bell-shaped histogram with 99% of data points between $\bar{X}$ - 3SD and $\bar{X}$ + 3SD.]

Is it Reasonable to Assume the

Distribution is Bell-shaped?

YES: Empirical Rule

Within what price range would we expect to find 95% of all

data values?

Mean ± 2 SD = 1031.38 ± 2(293.25), i.e. about £445 to £1618.

NO: Chebyshev's Rule

Expect at least 75% of all data values within this range.

This rule can be applied to any data series regardless of

the shape of the distribution (see later).

It is a conservative estimate: the minimum proportion.
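
The 75% and 89% figures quoted for Chebyshev's Rule follow from the standard bound 1 - 1/k² for k standard deviations. The sketch below evaluates that bound and counts how many B&O prices actually fall within Mean ± 2 SD. The function name `chebyshev_min_share` is an assumption for this example; the mean and SD are the rounded values from the B&O calculations.

```python
def chebyshev_min_share(k):
    """Minimum share of values within k SDs of the mean, for any distribution."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


prices = [800, 891, 1295, 1451, 580, 1192, 1285, 757]
mean, sd = 1031.38, 293.25   # rounded values from the B&O calculations

# Price range covered by Mean +/- 2 SD.
low, high = mean - 2 * sd, mean + 2 * sd
inside = [p for p in prices if low <= p <= high]
observed_share = len(inside) / len(prices)

print(f"Mean +/- 2 SD: {low:.2f} to {high:.2f}")
print(f"Chebyshev minimum (k = 2): {chebyshev_min_share(2):.0%}")
print(f"Chebyshev minimum (k = 3): {chebyshev_min_share(3):.0%}")
print(f"Observed share of prices inside: {observed_share:.0%}")
```

With only eight prices, every value falls inside the range, which is consistent with both rules: Chebyshev's figure is a floor, not a prediction.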

SD: Population or Sample?

Samples are used when it is impossible or too expensive to

include every item / person of interest.

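Because samples are less likely to include values at the

extremes of the distribution, we divide by n - 1 rather than n

(harder way) or weight SD (easier way).

$s_{n-1} = \sqrt{\frac{\sum_i (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} \times \sqrt{\frac{n}{n-1}} = SD \times \sqrt{\frac{n}{n-1}}$

$s_{n-1} = 293.25 \times \sqrt{8/7} = 293.25 \times 1.069 = 313.48$

Be careful: Excel commands STDEV (sample, divides by n - 1) or STDEVP (population, divides by n).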

To avoid confusion: we will treat data as a population!
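
The same population/sample distinction appears in most software. As a hedged illustration, Python's standard `statistics` module offers `pstdev` (divide by n) and `stdev` (divide by n - 1); the sketch below checks that the sample figure equals the population figure weighted by the square root of n/(n - 1).

```python
import math
import statistics

prices = [800, 891, 1295, 1451, 580, 1192, 1285, 757]
n = len(prices)

population_sd = statistics.pstdev(prices)   # divides by n
sample_sd = statistics.stdev(prices)        # divides by n - 1

# "Easier way": weight the population SD by sqrt(n / (n - 1)).
weighted_sd = population_sd * math.sqrt(n / (n - 1))

print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")
print(f"weighted SD   = {weighted_sd:.2f}")
```

At full precision the sample SD comes out at about 313.50; the 313.48 in the worked example reflects rounding the weight to 1.069.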

Comparing Measures of Dispersion

                          Quartiles         Range     IQR       SD
Scale of measurement      Ordinal/Metric    Metric    Metric    Metric
Uses all data?            No                No        No        Yes
Unique?                   Yes               Yes       Yes       Yes
Resistant to outliers?    Yes               No        Yes       No

Relative Dispersion: Coefficient of Variation

With larger means we often find larger standard deviations

(height of men vs. women). Difficult to compare.

Coefficient of Variation (CV) = Std.Dev. / Mean.

Independent of units of measurement:

changing units from £s to pence or to $ has no effect.

Useful for comparing: (i) different variables, (ii) same variable

over time, (iii) international comparisons.

Brand Mean Std.Dev. CV

B&O 1031.38 293.25 0.28

Sony 680.23 278.32 0.41

JVC 445.78 150.44 0.34
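
The CV column can be reproduced directly from the means and standard deviations in the table. The sketch below also rescales the figures to pence to illustrate that the ratio does not depend on the unit; the function name `coefficient_of_variation` is illustrative.

```python
# (mean, standard deviation) in pounds, taken from the table above
brands = {
    "B&O": (1031.38, 293.25),
    "Sony": (680.23, 278.32),
    "JVC": (445.78, 150.44),
}


def coefficient_of_variation(mean, sd):
    """Relative dispersion: standard deviation as a fraction of the mean."""
    return sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in brands.items()}

# Same data expressed in pence: both mean and SD scale by 100.
cv_pence = {
    name: coefficient_of_variation(m * 100, s * 100)
    for name, (m, s) in brands.items()
}

for name in brands:
    print(f"{name}: CV = {cv[name]:.2f} (in pence: {cv_pence[name]:.2f})")
```

On this measure Sony's prices are relatively the most dispersed, even though B&O has the largest standard deviation in absolute terms.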

Visualising Skewness (Shape)

Skewness measures the degree of symmetry of a distribution.

Histogram: useful graphical way to plot frequency of values against a

numerical scale.

Q: What is the relative position of the mean and the median (Md)?

[Figure: three histograms of frequency against values 1 to 13: one symmetric (Md = Mean) and two skewed, each with Md marked.]

Measuring Skewness

If the distribution is truly symmetrical, mean = median. Each

half is a mirror image of the other.

When data are positively skewed (income), mean is greater

than median; when data are negatively skewed (easy exam),

mean is less than median.

Coefficient of Skewness = 3(Mean - Median) / Standard Deviation

Values outside the range -1 to +1 are highly skewed.

Treat data with care: use robust statistics.

Measures of Skewness (2)

Pearson's Skewness: 3(Mean - Md) / Std.Dev

B&O Prices: 3(1031.38 - 1041.50) / 293.25 = -0.104

Very mild case of negative skew (bottom right figure).

SD meaningful; each half of the data has same properties.

Bowley's Skewness: [(Q3 - Md) - (Md - Q1)] / (Q3 - Q1)

Focuses on middle 50% of data values; more robust.

If the upper (lower) quartile is further from the median than the

lower (upper) quartile, then positive (negative) skew.

Range: -1 to +1. Zero is symmetrical.

Values outside the range -0.5 to +0.5 are highly skewed. Treat

data with care.

B&O Prices: [(1292.50 - 1041.50) - (1041.50 - 767.75)] / (1292.50 - 767.75) = [251.00 - 273.75] / 524.75 = -0.043
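
Both skewness measures are simple enough to verify in code. The sketch below recomputes them for the B&O prices, taking the quartiles (767.75 and 1292.50) from the earlier quartile calculation rather than recomputing them; variable names are illustrative.

```python
import statistics

prices = [800, 891, 1295, 1451, 580, 1192, 1285, 757]

mean = statistics.mean(prices)
median = statistics.median(prices)   # average of the two middle values
sd = statistics.pstdev(prices)       # population SD, as used in these notes
q1, q3 = 767.75, 1292.50             # quartiles from the earlier calculation

# Pearson's skewness: compares the mean with the median, scaled by the SD.
pearson = 3 * (mean - median) / sd

# Bowley's skewness: compares the two quartile-to-median distances.
bowley = ((q3 - median) - (median - q1)) / (q3 - q1)

print(f"median = {median}")
print(f"Pearson's skewness = {pearson:.3f}")
print(f"Bowley's skewness  = {bowley:.3f}")
```

Both values are slightly negative and well inside their "highly skewed" thresholds, which supports reading the B&O prices as close to symmetrical.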

Summarizing Metric Data

               Central Tendency    Measures of Dispersion    Rule
Not skewed:    Mean                Standard Deviation        Empirical Rule: 95% within ± 2SD
Skewed:        Median              Inter-Quartile Range      Chebyshev's Rule: 75% within ± 2SD

Summary Measures Summarized

Measure             Nominal                 Ordinal                  Metric

Examples            Do you own a car?       How often do you buy:    What is your date of birth?
                    What is your natural    daily, weekly?           What is your annual income?
                    hair colour?            What level are you
                                            in the firm?

Central Tendency    Mode                    Mode                     Mode
                                            Median                   Median
                                                                     Mean

Spread /            Index of Diversity      Quartiles                Range
Dispersion                                                           IQR
                                                                     Std. Dev.

Skewness                                                             Bowley's
                                                                     Pearson's

Increasing Power (from Nominal to Metric)

purpose of the measure, different measures (complementary views).

Tukey's Box-and-Whiskers Plot

Graphical device for integrating measures of central tendency,

dispersion and skewness. (Variant of 5-number summary).

First draw a thin rectangle from lower to upper quartile and

mark the median as a line that crosses this box.

The box contains the middle 50% of data points (IQR).

Whiskers are the max. and min. values within the upper fence

= Q3 + (1.5*IQR) and lower fence = Q1 - (1.5*IQR).

Values beyond the fences are potential outliers.

[Figure: box-and-whiskers plot labelling the box (Q1, Median, Q3), the whiskers, the fences at Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR), and an outlier (*).]

[Figure: box plot in which the two whiskers and the two halves of the box each cover 25% of the data.]

Example: TV Prices Dataset#1

301.81 756.66 460.00 150.19 635.30 239.99 904.05 206.60

417.82 882.05 176.97 466.69 259.47 478.90 173.66 333.79

673.69 1216.95 579.74 429.06 195.98 352.33 222.56 334.46

444.27 237.47 386.64 590.54 158.33 423.23 456.85

Note: prices rounded to £s in interest of simplification. In ascending order:

150 158 174 177 196 207 223 237

240 259 302 334 334 352 387 418

423 429 444 457 460 467 479 580

591 635 674 757 882 904 1217

Building the Box Plot

Find the position and value of the median (Q2), lower (Q1)

and upper (Q3) quartiles.

Easy with 31 data points. Position of median is (n + 1)/2 =

(31 + 1)/2 = 16th data point, namely 418.

Positions of the lower and upper quartiles are the 8th and

24th values, namely 237 and 580.

Next, find the inter-quartile range (IQR) = Q3 - Q1 =

580 - 237 = 343 (range, middle 50% of data).

[Figure: box drawn on a 0 to 1400 price scale, labelled Lower Quartile, Median and Upper Quartile.]

Building the Box Plot Cont.

Whiskers are the max. and min. data values between the

upper and lower fences (not always shown but should be!)

Upper: Q3 + (1.5 x IQR) = 580 + (1.5 x 343) = 1094.5

Lower: Q1 - (1.5 x IQR) = 237 - (1.5 x 343) = -277.5, so use 0.

Most expensive television (1217) is greater than the upper

fence, so it is a possible outlier (*).

2nd most expensive TV (904) is not an outlier: it lies inside the upper fence.

[Figure: box plot on a 0 to 1400 price scale. Whiskers run from the cheapest TV to the most expensive TV inside the upper fence; the outlier (*) lies beyond it.]
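
The box-plot construction for Dataset#1 can be retraced in code. This is a minimal sketch under the same conventions as the slides: prices rounded to whole pounds, quartile positions at (n + 1)/4 and 3(n + 1)/4, and fences at 1.5 x IQR. The helper `value_at` is an assumption made for this example and only handles whole-number positions, which is all that n = 31 requires.

```python
raw_prices = [
    301.81, 756.66, 460.00, 150.19, 635.30, 239.99, 904.05, 206.60,
    417.82, 882.05, 176.97, 466.69, 259.47, 478.90, 173.66, 333.79,
    673.69, 1216.95, 579.74, 429.06, 195.98, 352.33, 222.56, 334.46,
    444.27, 237.47, 386.64, 590.54, 158.33, 423.23, 456.85,
]

# Round to whole pounds and sort, as on the slides.
prices = sorted(round(p) for p in raw_prices)
n = len(prices)


def value_at(position):
    """Value at a 1-based position (assumes the position is a whole number)."""
    return prices[int(position) - 1]


q1 = value_at((n + 1) / 4)        # 8th value
median = value_at((n + 1) / 2)    # 16th value
q3 = value_at(3 * (n + 1) / 4)    # 24th value
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme values still inside the fences.
inside = [p for p in prices if lower_fence <= p <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Plotting libraries often compute quartiles with a different interpolation rule, so a box plot drawn by software may place the box edges slightly differently from this hand calculation.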

SPSS Summary Statistics:

Price of Selected Televisions

Statistic

Mean 436.97

Median 417.82

Mode 150.19a

Variance 62785.59

Std. Deviation(pop) 246.50

Minimum 150.19

Maximum 1216.95

Range 1066.76

Interquartile Range 342.27

Percentiles 25 237.47

Percentiles 50 417.82

Percentiles 75 579.74

Television Monthly Sales (Dataset#1)

[Figure: box plot of monthly television sales on a 0 to 2000 scale, with the upper fence marked.]

Upper fence = 836 + 1.5*(836 - 88) = 836 + 1122 = 1958

Childhood Consumerism:

Comparing Younger and Older Children

[Figure: side-by-side box plots for junior (N = 261) and senior (N = 296) children on a scale from 1.0 to 4.5, with outlying cases marked above and below the whiskers.]
