
A Statistical Journey: Taming of the Skew

A Tutorial of Chapters 1-4
c. 2009 by Dr. Donald F. DeMoulin and Dr. William Allen Kritsonis
These Slides May Not Be Altered or Modified

Topics Of This Lesson


1) Introduction
2) Review of the Basics
3) Research Rules
4) Statistical Symbols
5) Statistical Terms
6) Data Strength
7) Measures of Central Tendency
   1) Mean
   2) Median
   3) Mode
8) Measures of Variability
   1) Range
   2) Variance
   3) Standard Deviation
9) Distribution Types
10) Putting it all together

Introduction
Many wonder why statistical concepts are so hard to grasp.
Have you ever tried to understand a native of Italy, France, Russia, China, or Japan? It is hard because they speak a foreign language, one that is unfamiliar to your vocabulary. In this realm, statisticians, most of the time, speak in a foreign language, a language we will call Statonese (stat-n-eaze).

Introduction
For example, have you ever heard:
* Four out of five dentists recommend Brand X toothpaste to help fight cavities
* Nine out of 10 doctors stranded on a desert island recommend Brand Y aspirin for headache pain
* Five out of six farmers reported significant increases in yield from using Brand Z fertilizer
These are just a few of the many thousands of examples of statistical applications.

Introduction
But have you ever thought:
1) What makes up the four out of five dentists? Are the four employed by Brand X toothpaste?
2) Who are the nine out of ten doctors, and why are they stranded on a desert island?
3) What constitutes a significant increase in yield?

Introduction
Deciphering what the numbers are, and gaining an understanding of statistical procedures and concepts in order to make a somewhat accurate, independent judgment of reports, statements, and claims, is what statistics is all about.
Our discussions will minimize the guesswork about statistics and maximize the WHEN, the WHY, and the HOW: the basis for statistical applications.

Introduction
The goal of these PowerPoint slides is to bring statistical concepts, applications, and explanations to you in understandable language, showing how statistical procedures are developed, analyzed, and interpreted.

So, Let Us Begin!

Review of the Basics

A variable is usually represented by the symbol X.

If you have two variables, they are usually associated with the symbols X and Y, although this is not set in stone; using A, B, C, or D is also acceptable. A subscript after each variable denotes the numbered value of that variable (X1, X2, X3, X4, X5, ...). For example, let us have two variables, X and Y, that represent two different turtle races.

Review of the Basics


A hypothetical data set may look something like this:
* Measure: time for a turtle to complete a one-inch race
* Number = five turtles for each race
* Variable X = participants in race one
* Variable Y = participants in race two

Participant in    Time            Participant in    Time
Race one          (in seconds)    Race two          (in seconds)
X1                45              Y1                25
X2                41              Y2                36
X3                30              Y3                56
X4                28              Y4                51
X5                59              Y5                43

Review of the Basics


Participant in    Time            Participant in    Time
Race one          (in seconds)    Race two          (in seconds)
X1                45              Y1                25
X2                41              Y2                36
X3                30              Y3                56
X4                28              Y4                51
X5                59              Y5                43

What is the time logged for participating turtle X4?
28 seconds

Which participating turtle in race two logged a whopping 51 seconds for completing the race?
Turtle Y4

Review of the Basics


The Greek symbol Σ (sigma) means "sum": add up the set of numbers following it.

So, ΣX would simply mean to sum all values of the variable X:

X1 = 45
X2 = 41
X3 = 30
X4 = 28
X5 = 59
ΣX = 203

Review of the Basics


The notation ΣX² simply sums all the squared values of X.
For example, the hypothetical data set for race one would be:

X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871

ΣX = 45 + 41 + 30 + 28 + 59 = 203
ΣX² is equivalent to (45)² + (41)² + (30)² + (28)² + (59)²
ΣX² = 2,025 + 1,681 + 900 + 784 + 3,481 = 8,871

Review of the Basics


How about (ΣX)²?
This identifier means to sum all the values of X and then square the answer.

X: 45, 41, 30, 28, 59
ΣX = 45 + 41 + 30 + 28 + 59 = 203
Now, square the value 203 and you get 41,209:
(ΣX)² = (203)² = 41,209

Review of the Basics


Please remember that ΣX² does not equal (ΣX)²:
ΣX² = 8,871
(ΣX)² = 41,209

X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871     (ΣX)² = (203)² = 41,209
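The difference between the three sums can be checked with a few lines of Python. This is only a minimal sketch using the race-one times from the slides; the variable names are illustrative and not part of the original material.

```python
# Race-one turtle times from the slides, in seconds
x = [45, 41, 30, 28, 59]

sum_x = sum(x)                          # ΣX: add up the raw values
sum_x_squared = sum(v * v for v in x)   # ΣX²: square each value first, then add
square_of_sum = sum_x ** 2              # (ΣX)²: add first, then square the total

print(sum_x, sum_x_squared, square_of_sum)
```

The order of operations is the whole point: squaring before summing and summing before squaring give very different totals.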

Review of the Basics


Finally, we have ΣXY.
This notation signifies summing the products of corresponding values of X and Y (cross products).

X      Y      XY
45     25     1,125
41     36     1,476
30     56     1,680
28     51     1,428
59     43     2,537
ΣX = 203     ΣY = 211     ΣXY = 8,246

All we do is multiply each X by its corresponding Y, put the products in the column labeled XY, and add that column. Algebraically, it would be (45)(25) + (41)(36) + (30)(56) + (28)(51) + (59)(43) = 1,125 + 1,476 + 1,680 + 1,428 + 2,537 = 8,246
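As a rough illustration, the cross-product column can be built by pairing each X with its Y. The sketch below assumes the two turtle-race lists from the slides; the names `cross_products` and `sum_xy` are made up for this example.

```python
# Turtle times for race one (X) and race two (Y), in seconds
x = [45, 41, 30, 28, 59]
y = [25, 36, 56, 51, 43]

# Multiply each X by the Y in the same row, then add the products
cross_products = [xi * yi for xi, yi in zip(x, y)]
sum_xy = sum(cross_products)

print(cross_products, sum_xy)
```

Note that ΣXY is not the same as ΣX times ΣY (203 x 211); the pairing of rows matters.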

Summary of the Basics


* A variable is usually represented by the symbol X
* The Greek symbol Σ means to sum the set of numbers following it
* ΣX means to sum all values of the variable X
* ΣX² sums all the squared values of X
* (ΣX)² means to sum all the values of X and square the answer
* ΣXY signifies summing the products of corresponding values of X and Y (cross products)

Now we continue with the Rules of Research

Research Rules
There are two rules in research.

Rule #1 is that credibility and believability are vital components in research. In essence, the researcher must be credible by conducting his/her research with integrity, honesty, and proper research etiquette. This leads to the next component of Rule #1: the results (data input, data analysis, and statistical interpretation) must be believable, which involves proper coding and the use of the appropriate statistical procedure for analysis. Credibility and believability are the two critical aspects of any research, for without them the entire research process becomes an insignificant exercise.

Research Rules

Now, Rule #2 is simply:

First learn Rule #1

Cowboy Proverb
These are critical rules in research because if you do not have credibility as a researcher, the results that are produced lack believability

And, as the ol' Cowboy Proverb goes:

Don't dig for water under the outhouse

Research Rules
In other words, don't expect believability and credibility from data that are polluted, tainted, or contaminated.
By following the rules of research, you maximize your credibility as a researcher and the believability of your results.

Statistical Symbols

Σ (sigma): mathematical notation meaning "to add up"
X: variable that can represent any score in the distribution
μ (mu): symbol for the mean of a population
X-bar: symbol for the mean of a sample
σ² or S²: symbol for the variance of a population
σ or S: symbol for the standard deviation of a population
σ̂² or s²: symbol for the variance of a sample
σ̂ or s: symbol for the standard deviation of a sample
The caret on top (^) denotes a sample
Three dots (...): symbolic representation which literally means "and so on"

Statistical Terms

Descriptive Statistics is taking raw data and describing it in a meaningful way (to make sense out of data), generating a profile of that data set using graphs, charts, and other pictorial techniques to help display and interpret the data.

Inferential Statistics is taking the results from descriptive procedures of the raw data and subjecting them to a higher-order statistical procedure to reasonably infer results to a corresponding population by following certain rules and assumptions.

Statistical Terms

Parametric statistics are concerned with a parameter of a given population; hence inferences can be made from the resulting analysis to the population of concern.

Non-parametric statistics, on the other hand, do not conform to any stringent assumptions and therefore have more latitude in procedures. Because stringent assumptions are not strictly adhered to, we cannot confidently generalize the results to a population.

Statistical Terms

A variable is defined as a property of an event or item that can be changed or can take on different values.

A dependent variable is called the measured, outcome, or criterion variable.

An independent variable is the variable that is changed, altered, or manipulated by the experimenter during research.

Statistical Terms

A qualitative variable refers to nonnumerical qualities, attributes, or items such as gender, eye color, etc.

A quantitative variable is concerned with numerical qualities, such as the number of items falling into various categories, or measurable data.

Data Strength

Data are considered nominal strength if the assignment of numbers to objects does no more than identify the objects.

An example of this would be the number on a football jersey, which identifies a player on the field.

Data considered ordinal strength contain the elements of the nominal scale of measurement plus an ordering of objects, thereby implying magnitude: the objects are labeled, but they are also ranked according to importance.

Military rank would be an example of ordinal data, as would lining up people according to height, with 1 being the shortest and 10 being the tallest.

Data Strength

Data considered interval strength contain all the elements of nominal and ordinal scales (labeling and ordering) plus equal intervals between each item.

A thermometer would be an example of interval strength data, since the distance between 20 and 30 degrees is the same as the distance between 50 and 60 degrees. However, 60 degrees is not twice as warm as 30 degrees, because the zero point of the scale is arbitrary (temperatures can be below zero).

Data considered ratio strength contain all elements of nominal, ordinal, and interval strength (labeling, ordering, equal distance between items) plus the inclusion of an absolute zero.

Height and weight are examples of ratio strength data, since there is no negative weight or height.
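One way to see why interval data do not support "twice as much" statements is to change units and watch what happens to the ratio. The sketch below is only an illustration: the slides do not say which temperature scale is meant, so Celsius is assumed here, and the helper functions and the kilogram example are hypothetical.

```python
def celsius_to_fahrenheit(c):
    # Interval scale: the conversion shifts the zero point (+32)
    return c * 9 / 5 + 32

def kilograms_to_pounds(kg):
    # Ratio scale: the conversion only rescales; zero stays zero
    return kg * 2.20462

# Equal intervals survive a change of units ...
gap_low = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)
gap_high = celsius_to_fahrenheit(60) - celsius_to_fahrenheit(50)

# ... but ratios of temperatures do not
temp_ratio_c = 60 / 30
temp_ratio_f = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)

# Ratios of weights are the same in any unit
weight_ratio_kg = 60 / 30
weight_ratio_lb = kilograms_to_pounds(60) / kilograms_to_pounds(30)

print(gap_low, gap_high)
print(temp_ratio_c, round(temp_ratio_f, 2))
print(weight_ratio_kg, round(weight_ratio_lb, 2))
```

Because the temperature ratio depends on the unit chosen, "twice as warm" has no fixed meaning, whereas "twice as heavy" does.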

Measures of Central Tendency

How Data Gathers Around the Center of a Data Set


Measures of Central Tendency


Mean, Median and Mode are located at the exact same place on a normal distribution.

Mean: the arithmetic average (ΣX / N). The sum of the deviations from the mean must equal zero: Σ(X - X-bar) = 0. Because the signed deviations always cancel, the mean absolute deviation uses their absolute values instead.

Median: the exact middle of a data set. Data must be ranked from high to low or low to high.

Mode: the most frequently occurring data point.

Measures of Central Tendency


Group 1 (X)
Score    (X - X-bar)
72       72 - 75 = -3
73       73 - 75 = -2
76       76 - 75 = 1
76       76 - 75 = 1
78       78 - 75 = 3
ΣX = 375     Σ(X - X-bar) = 0     N = 5
X-bar = 75 (375/5)     Median = 76     Mode = 76 (two scores of 76)

Group 2 (Y)
Score    (Y - Y-bar)
67       67 - 75 = -8
72       72 - 75 = -3
76       76 - 75 = 1
76       76 - 75 = 1
84       84 - 75 = 9
ΣY = 375     Σ(Y - Y-bar) = 0     N = 5
Y-bar = 75 (375/5)     Median = 76     Mode = 76 (two scores of 76)
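The three measures, and the rule that deviations from the mean sum to zero, can be checked with Python's standard `statistics` module. This is a small sketch using the two groups of scores above; the helper name `describe_center` is invented for the example.

```python
import statistics

group_1 = [72, 73, 76, 76, 78]
group_2 = [67, 72, 76, 76, 84]

def describe_center(scores):
    """Return the mean, median, mode, and the sum of deviations from the mean."""
    mean = statistics.mean(scores)
    deviations = [score - mean for score in scores]
    return {
        "mean": mean,
        "median": statistics.median(scores),  # middle value after ranking
        "mode": statistics.mode(scores),      # most frequently occurring value
        "deviation_sum": sum(deviations),     # should come out to zero
    }

print(describe_center(group_1))
print(describe_center(group_2))
```

Both groups share the same mean, median, and mode even though Group 2 is far more spread out, which is why measures of variability are needed as well.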

Measures of Variability

How Data is Dispersed Throughout a Data Set


Measures of Variability

Range

Difference between the highest and lowest numbers in a data set

Variance

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}$$

Variance is the square of the standard deviation

Standard Deviation

$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$

Standard deviation is the square root of the variance

Measures of Variability
Group 1                 Group 2
X      X²               Y      Y²
72     5,184            67     4,489
73     5,329            72     5,184
76     5,776            76     5,776
76     5,776            76     5,776
78     6,084            84     7,056

ΣX = 375     ΣX² = 28,149     N = 5
ΣY = 375     ΣY² = 28,281     N = 5

Measures of Variability
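Group 1

$$
\begin{aligned}
\sigma^2 &= \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}
          = \frac{28{,}149 - \frac{(375)^2}{5}}{5}
          = \frac{28{,}149 - \frac{140{,}625}{5}}{5} \\
         &= \frac{28{,}149 - 28{,}125}{5}
          = \frac{24}{5}
          = 4.8 \\
\sigma   &= \sqrt{4.8} = 2.19
\end{aligned}
$$

Group 2

$$
\begin{aligned}
\sigma^2 &= \frac{\sum Y^2 - \frac{(\sum Y)^2}{n}}{n}
          = \frac{28{,}281 - \frac{(375)^2}{5}}{5}
          = \frac{28{,}281 - \frac{140{,}625}{5}}{5} \\
         &= \frac{28{,}281 - 28{,}125}{5}
          = \frac{156}{5}
          = 31.2 \\
\sigma   &= \sqrt{31.2} = 5.59
\end{aligned}
$$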
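The same arithmetic can be traced in code. The sketch below follows the computational formula used on the slides (sum of squares minus the squared sum over n, all divided by n); the function name `population_variance` is an illustrative choice, and the division by n rather than n - 1 mirrors the slides, not any particular software default.

```python
import math

group_1 = [72, 73, 76, 76, 78]
group_2 = [67, 72, 76, 76, 84]

def population_variance(scores):
    """Variance via the computational formula: (ΣX² - (ΣX)²/n) / n."""
    n = len(scores)
    sum_x = sum(scores)                          # ΣX
    sum_x_squared = sum(s * s for s in scores)   # ΣX²
    return (sum_x_squared - sum_x ** 2 / n) / n

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    variance = population_variance(scores)
    std_dev = math.sqrt(variance)                # standard deviation = √variance
    print(name, round(variance, 2), round(std_dev, 2))
```

Both groups have a mean of 75, yet Group 2's standard deviation is more than twice Group 1's, reflecting its wider spread of scores.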

Distribution Types

Normal
Skewed

Distribution Types
Normal Distribution
Also known as symmetrical, standard normal, and z-normal distributions
Right half is the mirror image of the left half

Kurtosis is how peaked a distribution is:
* Leptokurtic: high-peaked
* Mesokurtic: middle-peaked
* Platykurtic: low-peaked

If a distribution is not symmetrical, then it is asymmetrical, or skewed: the right half is not the mirror image of the left half
* Negatively skewed: tail points to the negative end of the number line
* Positively skewed: tail points to the positive end of the number line

Distribution Types
Roughly 68% of all scores fall within one standard deviation above or below the mean
Roughly 95% of all scores fall within two standard deviations above or below the mean
Roughly 99.7% of all scores fall within three standard deviations above or below the mean

The remaining 0.3 percent are considered outliers that do not conform to the standard normal population distribution; they lie more than 3 standard deviations above or below the mean
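These percentages can be reproduced from the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `share_within` is a made-up name for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# Proportion lying beyond three standard deviations on either side
print(f"beyond 3 SD: {1 - share_within(3):.4f}")
```

The exact figures are slightly different from the rounded 68, 95, and 99.7 quoted on the slide, which is why the slide says "roughly".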

Relation of Mean and Standard Deviation


99.7% of scores fall within 3 standard deviations above or below the mean: between 20 and 80, and between 44 and 56, in our examples

Standard normal distribution: μ = 0, σ = 1

Standard deviations from the mean:    -3    -2    -1     0    +1    +2    +3
μ = 50, σ = 10:                       20    30    40    50    60    70    80
μ = 50, σ = 2:                        44    46    48    50    52    54    56

σ = 10: Range = 60 (80 - 20 = 60), moving more towards a platykurtic shape
σ = 2: Range = 12 (56 - 44 = 12), moving more towards a leptokurtic shape

The mean and standard deviation help determine the height (kurtosis) of a distribution through the variability of scores dispersed throughout the data set
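The two intervals follow directly from the mean plus or minus three standard deviations. A minimal sketch, assuming the slide's two example distributions share a mean of 50 with standard deviations of 10 and 2; the function name is illustrative only.

```python
def three_sigma_interval(mean, std_dev):
    """Return the (low, high) bounds that hold roughly 99.7% of scores."""
    return mean - 3 * std_dev, mean + 3 * std_dev

for std_dev in (10, 2):
    low, high = three_sigma_interval(50, std_dev)
    # A larger standard deviation stretches the same share of scores
    # over a wider range, giving a flatter (more platykurtic) curve
    print(f"SD = {std_dev}: {low} to {high}, range = {high - low}")
```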

Putting it all Together


Putting It All Together


Data Type: Parametric
* Data Strength: Ratio, Interval
* Measure of central tendency: Mean
* Assumptions: normally distributed variable; homogeneity of variance; null hypothesis is true; at least interval strength data
* Data Tests: One Sample z-test; One Sample t-test; Independent t-test; Dependent t-test; ANOVA; Pearson Correlation; Repeated Measures ANOVA
__________________________________________________________
Data Type: Non-Parametric
* Data Strength: Ordinal (Median), Nominal (Mode)
* Data Tests: Mann-Whitney U; Wilcoxon T; Spearman Rho; Kruskal-Wallis H; Friedman ANOVA (ranks); Chi-Square Goodness-of-Fit; Chi-Square Test of Independence

Descriptive Statistics
Computer-Generated Analysis

(X): 75, 76, 76, 77, 78

(Y): 62, 75, 76, 76, 85

Computer-Generated Results

Descriptive Statistics

                 Column 1      Column 2
Mean             76.400        74.800
Std. Dev.        1.140         8.228
Std. Error       .510          3.680
Count            5             5
Minimum          75.000        62.000
Maximum          78.000        85.000
# Missing        0             0
Variance         1.300         67.700
Coef. Var.       .015          .110
Range            3.000         23.000
Sum              382.000       374.000
Sum Squares      29190.000     28246.000
Geom. Mean       76.393        74.422
Harm. Mean       76.386        74.027
Skewness         .272          -.518
Kurtosis         -1.044        -.431
Median           76.000        76.000
IQR              1.500         6.500
Mode             76.000        76.000
10% Tr. Mean     76.400        74.800
MAD              1.000         1.000

Skewness describes an asymmetrical distribution: if skewness is positive (negative), the data are skewed to the right (left), and the larger the number, the greater the skew. Notice that the Mean, Median and Mode are almost identical, giving an almost perfect normal distribution.

Kurtosis refers to how peaked the distribution is: kurtosis = 3 is a normal-height distribution (Mesokurtic), kurtosis > 3 is a high-peaked distribution (Leptokurtic), and kurtosis < 3 is a low-peaked distribution (Platykurtic). The Kurtosis row in this output is reported as kurtosis minus 3 (excess kurtosis), so on that scale 0 is mesokurtic and the negative values shown indicate low-peaked distributions.
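Several rows of this output can be reproduced with Python's standard library. The sketch below is not the software that generated the slide; it assumes the output uses the sample (n - 1) variance and simple moment-based skewness and excess kurtosis, conventions chosen here because they match the printed values. The function name `describe` is hypothetical.

```python
import math
import statistics

column_1 = [75, 76, 76, 77, 78]
column_2 = [62, 75, 76, 76, 85]

def describe(data):
    """Recompute a subset of the descriptive statistics for one column."""
    n = len(data)
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)   # sample SD: divides by n - 1
    # Central moments (dividing by n) for skewness and kurtosis
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return {
        "mean": mean,
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),
        "variance": statistics.variance(data),   # sample variance
        "range": max(data) - min(data),
        "sum": sum(data),
        "sum_squares": sum(v * v for v in data),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,     # 0 for a normal curve
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }

for name, data in (("Column 1", column_1), ("Column 2", column_2)):
    print(name, {key: round(value, 3) for key, value in describe(data).items()})
```

Note that dividing by n - 1 here differs from the divide-by-n variance formula used in the hand calculations earlier in the lesson, so the two approaches give different variances for the same data.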

