
Statistics Tutorial Instructions

Welcome to this statistics tutorial. This tutorial teaches you about a number of very useful statistical techniques based on a working example using real data supplied by your tutor. You could use these pages as reference, to look up a single topic if you wanted, but they are designed to be worked through in order for a full understanding of the topics. Each topic is divided into three levels and each level has three sections:

Level 1 is designed to help you understand the statistics that you see reported in journal papers or that your statistics software package gives you. At this level, you will not calculate any statistics yourself, and there are no formulae. The emphasis is on understanding what the statistics measure, how to interpret them, and when to use them. Level 2 is about calculating the statistics. Here you will see the formulae for doing the calculations. Level 3 covers the theory behind the techniques. Understanding the theory will give you a good intuitive feel for what the statistical measures that we cover are doing.

You can stay at one level, or switch up or down a level at any time. You are currently at level 2. Each level of each topic is split into three sections, all shown on the same page. The sections are organised as they are below on this page. Move between topics and levels using the navigation box below. It appears at the top of every page in the tutorial. There are also next page and previous page links at the bottom of each page, which you can use to follow the topics in their suggested order.

Tutorial Navigation
Getting Started: General Instructions | Introduction to Your Study | Experimental Design | Stating a Hypothesis
Descriptive Statistics: Histograms | Central Tendency | Standard Deviation | Confidence Intervals
Comparing Two Samples: Samples and Populations | Choosing a T-Test | Paired T-Test | P Values and T-Tables
Important Concepts: The Normal Distribution | Z Scores | Probability Distributions
Levels: Level 1 | Level 2 | Level 3

Explanation
The Explanation section describes the concepts behind a technique or statistical measure. This section is designed to give you the facts much like a textbook would. At the second level, it explains in words how the statistic is calculated.

Extra topics: click the question mark for details on extra topics.

Exploration
The Exploration section allows you to explore the concepts explained above for yourself. At the second level, you are shown the formula or procedure for calculating the statistic. Hover over any part of the formula to see an explanation of what it means. For example, here is the formula for calculating the mean of a set of numbers:

$\bar{x} = \frac{\sum x}{n}$

Try it now with the formula above.

Application
This tutorial is based on a set of data and an experiment taken from your own field of study. The Application or Interpretation section uses this data, and the study that generated it, to allow you to try out each statistical technique on real data. At the second level, you must calculate the statistics for that data yourself. You will be asked to answer questions and be given feedback as you go. The questions look like this: What is 1+1? The flat line next to the answer box shows that you have not yet answered the question. When you get the answer right, it turns into a tick; if you get the answer wrong, it becomes a cross. When you have typed an answer, click anywhere on the screen or press the TAB key (don't press the Enter key) to have your answer checked. So, the question above is 'What is 1+1?'. Try typing a wrong answer and then a right answer into the box. The question mark in a circle is the help button. Hover over it to get a clue about answering the question.

The Study

Explanation
Introduction to Your Study
Throughout this tutorial, we will use real data from a real experiment to illustrate the topics that you will learn about. We will use the vocabulary of statistics, which can be confusing if you haven't seen it before, so here is an introduction to the study you will be working through and the words that are used to describe it.

The Data
Statistics are designed to help us understand things we observe in the world around us. To use statistics, we have to measure things in the real world and so produce data. Data can be expressed as words or numbers, and are plural, so you say "Here are my data." So that we know which aspects of the data we are talking about, we use the following words:

- To generate data, we take measurements or make observations of specific qualities of things.
- The things we are measuring are called the experimental units of the study. They might be referred to as 'people' or 'soil samples', whatever is being measured; in this study, they are referred to as People.
- The qualities that we measure, or observe, are called variables. So if you measured a piece of string, the string would be the experimental unit and 'length' would be the variable.
- Every variable can take a range of values: the variable 'length' might take the values 3 or 10.5, for example. Generally, one measurement of a variable from a single experimental unit will produce a single value. If we say "Length = 5", then 'length' is the variable and '5' is the value.

Your study measured one variable: , in two samples.

The Study
Experiments often compare one set of measurements with another. The two sets of measurements could differ because they were each taken from different groups of experimental units, or because they were taken from the same experimental units under two different conditions. Each set of measurements is called a sample. Your study divides its measurements of People into two samples:

- The Pre-Test sample
- The Post-Test sample

Studies like this have two variables, each with its own role to play. They are:

- The independent variable, which discriminates between the two samples. Your independent variable is and it can take one of two values: Pre-Test or Post-Test.
- The dependent variable, which is whatever you are measuring in each sample. In the case of this study, the dependent variable is , so you expect to differ depending on (whether it is Pre-Test or Post-Test).

Such studies have an idea they wish to test. The idea is called the hypothesis. Each study actually has two versions of the hypothesis - one that says there is a difference between the two samples (the experimental hypothesis) and one that says that there is no difference (this is called the null hypothesis).

The experimental hypothesis in this study is: post-training test scores will be significantly higher than pre-training test scores. The null hypothesis in this study is: there is no difference in pre/post test scores.

There is a page later in this tutorial that explains hypotheses in full.

Symbols
The list below shows you the symbols used in this tutorial to represent certain statistical measures. If the words are unfamiliar to you just now, don't worry; this tutorial will make everything clear. You can refer back to this page at any point if you need to look up a word or a symbol.

- $\bar{x}$ = the sample mean. Note the bar over the x. You can say 'the mean of x' or just 'x bar' when reading this.
- $\mu$ = the population mean (pronounced 'mew').
- $s^2$ = the sample variance (say 's squared').
- $\sigma^2$ = the population variance (say 'sigma squared').
- $s$ = the sample standard deviation.
- $\sigma$ = the population standard deviation (pronounced 'sigma').

You'll notice that population statistics are represented by Greek letters and sample statistics by letters from the Roman alphabet.

Exploration
Here is the data from your study. Hover over the highlighted parts of the table to find out how they relate to the description above. The table below shows for both the pre-test and the post-test samples.

Person     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Pre-Test  11 22 19 13 18 20 20 23  9 14 16  7 14 18 22 17 13 13 11 20 23  5  8  9 14  7  6  4  5 12
Post-Test 20 16 18  9 16 13 29 30 14 27 28 24 20  6 25  5 20 16 13 10 26 20  9  5  6  5 21 17 22 27

Application
Let's look at your data now and check that you have understood the concepts described above. We are measuring in your data. Which of the words in the box to the right best describes the role of ? can be either pre-test or post-test. What do these two words refer to? Which of these is your null hypothesis? Which variable is the independent variable? Which variable is the dependent variable? Imagine that you measured from a Person in the Pre-Test sample and got a value of 22. Which of these words best describes ? Which of these words best describes Pre-Test? Which of these words best describes 22?

Experimental Design

Explanation
Choosing Your Experimental Design
Before you start an experiment, it is a good idea to put some thought into which of the two types of experimental design, paired or independent, you will use. Both have advantages and disadvantages, and sometimes you are forced to choose one over the other. Here are the usual scenarios in which the choice of experimental design is easy:

- When the independent variable cannot be manipulated, you probably need an independent design. A good example would be an experiment that compares males with females. You cannot test the same people under each condition (male and female) as you cannot change their gender, so unless you are pairing them in some other way (twins, for example) you must compare a group of males with a separate group of females.
- When you cannot repeat a measurement on the same experimental unit, you probably need an independent design. When using a paired design, you often test (or measure) the same experimental units twice. If the first test is likely to affect the second, or if the unit can be tested only once, then the second, repeated test is not possible. Crash testing cars falls into this category: you cannot crash test the very same car under two different conditions, because the first crash rules out the second test!
- If you need to test the same experimental unit under different conditions, you need a paired design.

There are some other considerations when choosing an experimental design. The main ones are:

- It is easier to control for confounding variables with a paired design. Anything other than the independent variable that might affect your measurements is referred to as a confounding variable. There is always a risk that an experiment might produce a difference that is due to something other than the independent variable. For example, if you wanted to compare the heights of males and females but you chose one group from a basketball team and the other from a group of jockeys, you might find a difference, but would it be due to gender? By measuring the same subjects twice, you reduce the risk of introducing confounding variables.
- If subjects are hard to find, then testing the same ones twice doubles the quantity of data you can collect. The disadvantage is that each subject must return to produce the second sample, which increases the risk that you will have to throw away data from the first sample that have no paired value in the second.
- There may be ethical issues which either prevent you from manipulating the independent variable or dictate that subjects must be tested under both conditions.
- If you are testing people, you may find that giving them the same test twice under different conditions allows them to practise the task in the first condition, which naturally produces an improvement in the second condition. Similarly, subjects might grow bored of the task and show fatigue effects when asked to perform it under the second condition.

Here are the advantages and disadvantages of each type of experimental design:

Within Subjects (paired design)
Advantages:
- Fewer subjects are needed, as each subject is tested twice;
- You have more control over confounding variables.
Disadvantages:
- Subjects may drop out, not completing the second condition and so rendering the data from their first condition unusable;
- Subjects can suffer from practice or fatigue effects when tested twice.

Between Subjects (independent design)
Advantages:
- There is less risk of practice or fatigue effects;
- There is less risk of data loss due to drop-out, as subjects are only measured once.
Disadvantages:
- Twice as many subjects are required;
- You have less control over confounding variables.

Exploration

Here are some scenarios for you to make a decision on. Which experimental design would you choose?
- You give a memory test to a group of people, then give them coffee and test them again. What type of design do you have?
- Which of these is a disadvantage of the design for the experiment above?
- If you wanted to test for differences between males and females, which design would you use?
- You sow one set of seeds in clay soil and another set of seeds in sandy soil to compare their growth rates. What type of design do you have?

Application
You have a paired experimental design.
- Are these data from the same subjects measured twice or from two sets of different subjects?
- Is the risk of confounding variables affecting the results greater or less for your experimental design?
- You have 30 measurements for each condition. How many different People were measured?
- How many times was each Person measured?

Choosing a Hypothesis

Explanation
Choosing Your Hypothesis
In this section, you will learn how to choose and phrase a hypothesis for a study or experiment. A hypothesis is generally a single sentence that describes what your experiment sets out to test. Here are the key points to remember.

- The hypothesis should identify the following:
   - What the experimental units are (are they people, soil samples, bacterial cells?);
   - What is being compared, i.e. the dependent variable (is it height, acidity, lifespan?);
   - What separates the things being compared, i.e. what the independent variable is and which two values it takes (is it male/female gender, clay/sandy soil, or anti-bacterial gel/placebo treatment?);
   - Whether a direction is expected (do we expect height to be greater in males, or do we just want to know if there is a difference in either direction?). For reasons that will be explained later, hypotheses that predict a direction are called one tailed, and hypotheses that allow for a difference in either direction are called two tailed.
- Include these things in a sentence, not a list;
- Be as precise and specific as you can.

Using the example of height from the list above, you could phrase the research hypothesis as follows: People of the male gender have a larger average height than people of the female gender. Point at any of the highlighted words in the sentence above to see which of the points above the word covers. You could re-write the sentence to make it less clumsy, for example, 'Males are taller, on average, than females', but then you are assuming that whoever reads your hypothesis can work out that gender is the independent variable and that you are measuring people, not some other animal. The null hypothesis is usually just the opposite of the research hypothesis, for example 'People of the male gender DO NOT have a larger average height than people of the female gender'. Notice how we keep a reference to the direction in the null hypothesis too.

Exploration
You can construct a clumsy version of the research hypothesis using a sentence like the one below:

[Units] in the [first] sample of [independent variable] have a [different] average [dependent variable] than [units] in the [second] sample of [independent variable].

Once you have the sentence like this, you can tidy it up to make it read better.

Application
Your tutor provided the following research hypothesis for the experiment that produced your data: Post-training test scores will be significantly higher than Pre-training test scores. Try to reproduce a similar research hypothesis using the method described in the section above. Click [Generate Hypothesis] to see how your hypothesis reads.

____ in the ____ sample of ____ have a ____ average ____ than ____ under the ____ sample of ____

Click the button to turn your answers into a hypothesis. Is your experimental hypothesis one tailed or two tailed? Can you tidy up the experimental hypothesis and generate a null hypothesis from it?

Plotting a Data Frequency Distribution Histogram



Explanation
Calculating the Frequencies for a Histogram
It is easy to understand how you would build a frequency histogram for variables that have category values, such as the colour of an object, or discrete values such as 1, 2, 3 and 4. In such cases, each bar in a histogram would represent a single value and the height of the bar would reflect the frequency with which that value appears in the data. The procedure is even easier if you have sorted your data into order first, as this puts all the equal values next to each other.
1. Simply work through the data counting how many times each value occurs.
2. Plot these numbers in a bar chart, labelling the y axis as 'Frequency' and the x axis with your variable's name. Each bar should be labelled with the name of the value it represents.
For numeric variables, there can be a few complications:

- If your data contain whole (discrete) numbers, they might not contain every number in the range. For example, you might survey families and find that they have 0, 1, 2, 3, 4 or 6 children (note that none has 5 children). Would you leave a space for 5 with no bar in it? It would make the missing value more obvious, which is good, but in some cases this might lead to a very sparse graph. For example, if the data contained only the values 1, 10, 100 and 1000, you wouldn't want to use 1000 bars, all but 4 of which were empty. You need to think about the missing values and decide for yourself whether to include an empty bar for them.
- For continuous values, you may find that no value appears more than once (for example, 1.1, 1.2, 1.5, 2.1, 2.2, etc.). In such cases, you must group values together into 'bins'. In our example above, we might choose bins covering 1.0 to 2.0, 2.1 to 3.0, and so on. There is a help topic below that explains how to calculate the bin ranges for continuous numbers.
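The two decisions above can be sketched in Python. This is a minimal illustration that assumes whole-number data for the first case and equal-width bins for the second. `count_with_gaps` and `bin_counts` are hypothetical helper names. The children counts are made-up example data, not taken from your study.

```python
from collections import Counter
import math


def count_with_gaps(values, include_empty=True):
    """Count each whole-number value.

    With include_empty=True, values missing from the range get a count
    of 0, so an empty bar can be drawn for them.
    """
    counts = Counter(values)
    if not include_empty:
        return dict(sorted(counts.items()))
    return {v: counts.get(v, 0) for v in range(min(values), max(values) + 1)}


def bin_counts(values, start, width):
    """Group continuous values into equal-width bins and count each bin.

    Bins are half-open: [start, start + width), [start + width, ...).
    Assumes every value is >= start.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    n_bins = max(counts) + 1
    return {(start + i * width, start + (i + 1) * width): counts.get(i, 0)
            for i in range(n_bins)}


# Made-up survey: number of children per family (nobody has 5)
children = [0, 1, 1, 2, 2, 2, 3, 4, 6]
print(count_with_gaps(children))                       # 5 is listed with a count of 0
print(count_with_gaps(children, include_empty=False))  # 5 is left out

# Continuous values from the example in the text
readings = [1.1, 1.2, 1.5, 2.1, 2.2]
print(bin_counts(readings, start=1, width=1))          # {(1, 2): 3, (2, 3): 2}
```

The bins here are half-open, so a value of exactly 2.0 falls in the second bin. This avoids a gap that bins written as "1.0 to 2.0, 2.1 to 3.0" would leave for a value such as 2.05.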

Calculating bin ranges for continuous variables

Exploration

In this section, you must think about how you would plot different data sets on a histogram. In each question, you will be shown a small set of data and asked to decide how it should be treated to produce the right kind of histogram. Note that the numbers show the raw data, not the frequency counts. For each data set, what would you plot?
- 1, 1, 2, 2, 3, 3, 3, 5
- 1, 1, 70, 150, 150, 350
- 1.54, 1.61, 1.7, 4.53, 4.62, 7.84, 8.14

Application
Counting the Frequencies for Your Data
Below on the left are the values of when is pre-test. You are going to count the number of occurrences of each value in the data. This task is much easier if you sort the data first, which we have done for you. Your data has 17 different values, so there are 17 boxes to enter the counts into. Count the frequency of each different value and enter it into each box.

When is Pre-Test (sorted): 4 5 5 6 7 7 8 9 9 11 11 12 13 13 13 14 14 14 16 17 18 18 19 20 20 20 22 22 23 23
Enter frequencies for: 4 5 6 7 8 9 11 12 13 14 16 17 18 19 20 22 23 [Plot]

When you have filled in all the frequencies correctly, click 'Plot' to see the histogram plotted.

Now we can do the same for when is post-test, which has 19 different values. Count the frequency of each different value and enter it into each box.

When is Post-Test (sorted): 5 5 5 6 6 9 9 10 13 13 14 16 16 16 17 18 20 20 20 20 21 22 24 25 26 27 27 28 29 30
Enter frequencies for: 5 6 9 10 13 14 16 17 18 20 21 22 24 25 26 27 28 29 30 [Plot]

When you have filled in all the frequencies correctly, click 'Plot' to see the histogram plotted.
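You can check your counts with a short Python sketch. It counts each distinct value in the sorted lists above with `collections.Counter`. It then prints a rough text histogram in place of the tutorial's Plot button. `frequency_table` and `text_histogram` are hypothetical helper names chosen for this example.

```python
from collections import Counter

# Sorted pre-test and post-test values from the lists above
pre_sorted = [4, 5, 5, 6, 7, 7, 8, 9, 9, 11, 11, 12, 13, 13, 13, 14, 14, 14,
              16, 17, 18, 18, 19, 20, 20, 20, 22, 22, 23, 23]
post_sorted = [5, 5, 5, 6, 6, 9, 9, 10, 13, 13, 14, 16, 16, 16, 17, 18, 20, 20,
               20, 20, 21, 22, 24, 25, 26, 27, 27, 28, 29, 30]


def frequency_table(values):
    """Return {value: count}, ordered from the smallest value upwards."""
    return dict(sorted(Counter(values).items()))


def text_histogram(freqs):
    """Print one row per value, with one '#' per occurrence."""
    for value, count in freqs.items():
        print(f"{value:>3} | {'#' * count}")


pre_freqs = frequency_table(pre_sorted)
post_freqs = frequency_table(post_sorted)
print(len(pre_freqs), "different pre-test values")    # 17
print(len(post_freqs), "different post-test values")  # 19
text_histogram(pre_freqs)
```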

Measures of Central Tendency



Explanation
Calculating Measures of Central Tendency
Level 1 showed how the three different measures of central tendency are each suited to different types of data. We will now see how the different measures can give different results depending on the pattern of values in your data. We will also show you how to calculate the values of the three different measures.

- The mode is calculated by counting how often each value occurs. There is no formula: you just count the values and see which one occurs most often. That value is the mode.
- The median is found by first sorting your data into order and then finding the middle value. If you have an even number of data points, there won't be a value right in the middle, so you take the two middle values, add them together and divide the answer by 2.
- The mean is calculated using a formula which adds all the values together and then divides by the number of values there are.

Now that we know how to calculate the values, we can see how each different measure performs with different patterns of data.

- The mean is badly affected by extreme values. Imagine you had the data 1, 2, 3, 4, 400. The mean is 410/5 = 82, which doesn't really reflect where any of the data lie.
- The mode can be misleading if several values all appear equally often. It is useless if all the values in your data are different, as each appears only once!
- The median can smooth out extreme values but can produce the least frequently occurring value, as in this example: 1, 1, 1, 2, 3, 3, 3 (the median, 2, appears only once).
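The three examples above can be reproduced with Python's standard `statistics` module. Note that `multimode` needs Python 3.8 or later. This is only a check on the arithmetic in the text, not one of the tutorial's own tools.

```python
import statistics

skewed = [1, 2, 3, 4, 400]
print(statistics.mean(skewed))    # 82: pulled up by the single extreme value
print(statistics.median(skewed))  # 3: unaffected by how large 400 is

lumpy = [1, 1, 1, 2, 3, 3, 3]
print(statistics.median(lumpy))     # 2, which occurs only once
print(statistics.multimode(lumpy))  # [1, 3]: two values tie for most frequent

all_different = [1.1, 1.2, 1.5, 2.1, 2.2]
print(statistics.multimode(all_different))  # every value comes back: no useful mode
```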

Exploration
Here is the formula for calculating the mean of a set of numbers. Hover over any part of the formula for an explanation of what it signifies.

$\bar{x} = \frac{\sum x}{n}$

The whole formula means 'The mean of x (where x is any numeric variable, such as ) is the sum of its values divided by the number of values there are.'

Application
Here is the data from your study:

Person     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Pre-Test  11 22 19 13 18 20 20 23  9 14 16  7 14 18 22 17 13 13 11 20 23  5  8  9 14  7  6  4  5 12
Post-Test 20 16 18  9 16 13 29 30 14 27 28 24 20  6 25  5 20 16 13 10 26 20  9  5  6  5 21 17 22 27

The Mean
We will calculate the mean of the pre-test column first.
1. Use the calculator to add up all the values in the pre-test column. Enter that value here:
2. Count the number of values in the data (n). Enter that value here:
3. Now divide the sum from step 1 by n. Enter that value here using 2 decimal places:
4. Now repeat the process for the post-test column and enter the mean here:
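The steps for the mean can be written out in Python as a check on your calculator work. The lists hold the values from the data table in their original order. `sample_mean` is a hypothetical helper name.

```python
# Pre-test and post-test values for the 30 People, in table order
pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]
post_test = [20, 16, 18, 9, 16, 13, 29, 30, 14, 27, 28, 24, 20, 6, 25,
             5, 20, 16, 13, 10, 26, 20, 9, 5, 6, 5, 21, 17, 22, 27]


def sample_mean(values):
    total = sum(values)  # step 1: add up all the values
    n = len(values)      # step 2: count the values
    return total / n     # step 3: divide the sum by n


print(sum(pre_test), len(pre_test))      # 413 30
print(round(sample_mean(pre_test), 2))   # 13.77
print(round(sample_mean(post_test), 2))  # 17.23 (step 4)
```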

To calculate the median and the mode, it is much easier to sort your data into order first. Here is your data sorted, with the position of each value noted along the top row.

Position   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Pre-Test   4  5  5  6  7  7  8  9  9 11 11 12 13 13 13 14 14 14 16 17 18 18 19 20 20 20 22 22 23 23
Post-Test  5  5  5  6  6  9  9 10 13 13 14 16 16 16 17 18 20 20 20 20 21 22 24 25 26 27 27 28 29 30

The Mode
Once again, we will look at the pre-test data first.

Find the mode by counting how many times each value appears and noting which appears most often. Also check whether more than one value appears equally most often.
1. Let's see if it makes sense to calculate a mode for this data. The mode is the value that appears more than any other value. Is there a single mode for your data?
2. Now enter the mode of when is post-test:
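Here is a short Python sketch of the counting procedure that reports every value tied for the highest count. `modes` is a hypothetical helper name. Python's own `statistics.multimode` (3.8 and later) does the same job but lists ties in order of first appearance instead of sorting them.

```python
from collections import Counter

pre_sorted = [4, 5, 5, 6, 7, 7, 8, 9, 9, 11, 11, 12, 13, 13, 13, 14, 14, 14,
              16, 17, 18, 18, 19, 20, 20, 20, 22, 22, 23, 23]
post_sorted = [5, 5, 5, 6, 6, 9, 9, 10, 13, 13, 14, 16, 16, 16, 17, 18, 20, 20,
               20, 20, 21, 22, 24, 25, 26, 27, 27, 28, 29, 30]


def modes(values):
    """Return every value that occurs most often, smallest first.

    More than one value in the result means there is no single mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes(pre_sorted))   # [13, 14, 20]: three values each appear 3 times
print(modes(post_sorted))  # [20]: appears 4 times, a single mode
```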

The Median
The median is in the middle of the ordered list of values.
1. Looking at pre-test, how many values are there?
2. Is there an exact midway point, or does the middle span two values?
3. At what position is the midpoint? Enter its position, or the first position if it spans two.
4. Find the value in the pre-test row at that position. If the midpoint spans two places, add the numbers at those two positions together and divide what you get by two. The number you get is the median of when is pre-test. Enter it to 2 decimal places:
5. Now enter the median of when is post-test. Use 2 decimal places:
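The position-based steps for the median translate directly into code. `median_by_position` is a hypothetical helper that assumes the list is already sorted. Python's `statistics.median` gives the same answer and sorts the data for you.

```python
import statistics

pre_sorted = [4, 5, 5, 6, 7, 7, 8, 9, 9, 11, 11, 12, 13, 13, 13, 14, 14, 14,
              16, 17, 18, 18, 19, 20, 20, 20, 22, 22, 23, 23]
post_sorted = [5, 5, 5, 6, 6, 9, 9, 10, 13, 13, 14, 16, 16, 16, 17, 18, 20, 20,
               20, 20, 21, 22, 24, 25, 26, 27, 27, 28, 29, 30]


def median_by_position(sorted_values):
    """Median of already-sorted data; positions count from 1 as in the table."""
    n = len(sorted_values)                  # step 1: how many values?
    if n % 2 == 1:                          # step 2: there is an exact midway point
        middle = (n + 1) // 2               # step 3: its position
        return float(sorted_values[middle - 1])
    first = n // 2                          # step 3: first of the two middle positions
    # step 4: average the values at positions `first` and `first + 1`
    return (sorted_values[first - 1] + sorted_values[first]) / 2


print(f"{median_by_position(pre_sorted):.2f}")   # 13.50
print(f"{median_by_position(post_sorted):.2f}")  # 17.50
print(statistics.median(post_sorted))            # same result from the standard library
```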

Reporting Your Findings
Once you have calculated your measures of central tendency, you usually want to report them. The best way to do this is in a table. As you are comparing two groups, you should report the findings for both in a way that makes them easy to compare:

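            Mean    Mode    Median
Pre-Test    13.77   13*     13.5
Post-Test   17.23   20      17.5

* The pre-test data have no single mode: 13, 14 and 20 each appear three times, so the value shown is only one of three equally common values.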

Standard Deviation

Explanation
We have seen that standard deviation is a measure of variation. Now we will be more specific. Standard deviation measures how far the data are spread from the average (the mean). It can be thought of as a kind of average distance between the data points and the mean; strictly, it is calculated from the squared distances, as the Exploration section explains. One thing that you should keep in mind when using or reading about standard deviation and variance:

- The standard deviation of a sample of data ($s$) involves dividing by $n - 1$.
- The standard deviation of a population of data ($\sigma$) involves dividing by $n$.

When using software to calculate these values, check which version is used. Make sure you use the version for samples unless your data represent the entire population of whatever you are measuring (which is rare). For example, Excel has the function STDEV for calculating the sample standard deviation and the function STDEVP for calculating the population standard deviation. Dividing by $n - 1$ gives an unbiased estimate of the population variance, which is why the sample method is sometimes called the unbiased method. In practice, the difference between the two versions is small unless n is very small, and in such cases you probably have insufficient data anyway. The difference is worth knowing about, however.
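Python's standard `statistics` module offers the same pair of functions as the Excel ones described above. `stdev` divides by n - 1 (sample) and `pstdev` divides by n (population). This sketch applies both to your pre-test data and to a made-up two-value sample to show when the difference matters.

```python
import statistics

pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]

s = statistics.stdev(pre_test)       # sample SD: divides by n - 1
sigma = statistics.pstdev(pre_test)  # population SD: divides by n
print(round(s, 2), round(sigma, 2))  # 5.88 5.78: close, because n = 30

tiny = [2, 4]                        # made-up sample with n = 2
print(statistics.stdev(tiny), statistics.pstdev(tiny))  # about 1.41 vs 1.0
```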

Help Topic - Squaring and Square Roots

Exploration
Calculating the Standard Deviation
When you look at the formula below, you will see that it is made up of the mean subtracted from each value, then squared. These values are added together, the final sum is divided by $n - 1$, and the square root of the result is taken. You might notice that this is similar to the formula for calculating the mean, and you'd be right: the standard deviation is a kind of average distance between each point and the mean. There are two common formulae for calculating standard deviation. We will show you both of them here. The first one highlights the fact that the standard deviation is based on the distance between each point in your data and the mean. Here is the formula for calculating the standard deviation of a sample of data. Click on any part of the formula to see a description of its role. There is help on squaring and square roots above if you need it.

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

It says that you subtract the mean, $\bar{x}$, from each value in turn and square the result. You add all of these squared values together and divide the total by one less than the number of values in your data ($n - 1$). Finally, you find the square root of the result of the division by $n - 1$, and that is your standard deviation. Note that any part of the formula in brackets is calculated first. For example, the brackets in $(x - \bar{x})^2$ indicate that you do each subtraction first, then square it, and only then add up the results. In case you are wondering, we square the differences and then take square roots for two reasons:

- Some distances from the mean are positive and some are negative, and they always cancel out exactly, so adding them up would produce a value of zero! Squaring makes the numbers positive and removes that problem.
- We could leave the value squared, but taking the square root brings it back into units that match the units of the original measurements. If we measured people's heights in cm, then we can report the standard deviation in cm too.
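Both reasons can be checked numerically. This Python sketch uses your pre-test values. It shows that the raw differences from the mean cancel out, while the squared differences do not.

```python
import math

pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]
n = len(pre_test)
mean = sum(pre_test) / n

deviations = [x - mean for x in pre_test]
print(sum(deviations))          # zero, apart from tiny floating-point error

squared = [d ** 2 for d in deviations]
print(round(sum(squared), 2))   # 1001.37: every term is positive, so nothing cancels

s = math.sqrt(sum(squared) / (n - 1))
print(round(s, 2))              # 5.88, in the same units as the measurements
```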

You might see the formula for standard deviation in text books written as below. This is the same formula, but written in a way that makes it easier to calculate.

It doesn't matter which formula you use as they both give the same result.
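The textbook formula image is not reproduced in this text. This sketch therefore assumes it is the common "computational" form, $s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$, which needs only the sum of the values and the sum of their squares. The sketch compares that form with the definition above on your pre-test data. Both function names are hypothetical.

```python
import math

pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]


def sd_definitional(values):
    """Square root of the summed squared distances from the mean, over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sd_computational(values):
    """Assumed textbook form: uses only sum(x) and sum(x squared)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


print(round(sd_definitional(pre_test), 4), round(sd_computational(pre_test), 4))
```

The computational form saves subtracting the mean from every value, which makes it convenient by hand. With very large values it can lose precision in floating-point arithmetic, so the first form is generally the safer one in code.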

Application
Let's calculate the standard deviation for in the case where is pre-test, which has a mean of 13.77.
1. First we calculate the differences, $x - \bar{x}$, and the squared differences, $(x - \bar{x})^2$.

The values of in the case where is pre-test are shown below. The differences and squared differences are filled in for all except the first five values. Subtract the mean from each value and enter the result in the $x - \bar{x}$ column (keep the minus sign if there is one). For example, the first value is 11 - 13.77 = -2.77. Then square that number and enter the result in the $(x - \bar{x})^2$ column. To square a number, multiply it by itself (or use the [x^2] button on the calculator). Enter answers to 2 decimal places. You can use the [RD] button on the calculator to round down.

Value   $x - \bar{x}$   $(x - \bar{x})^2$
11      ____            ____
22      ____            ____
19      ____            ____
13      ____            ____
18      ____            ____
20       6.23           38.81
20       6.23           38.81
23       9.23           85.19
9       -4.77           22.75
14       0.23            0.05
16       2.23            4.97
7       -6.77           45.83
14       0.23            0.05
18       4.23           17.89
22       8.23           67.73
17       3.23           10.43
13      -0.77            0.59
13      -0.77            0.59
11      -2.77            7.67
20       6.23           38.81
23       9.23           85.19
5       -8.77           76.91
8       -5.77           33.29
9       -4.77           22.75
14       0.23            0.05
7       -6.77           45.83
6       -7.77           60.37
4       -9.77           95.45
5       -8.77           76.91
12      -1.77            3.13

2. The formula for standard deviation requires you to add up the squared differences. That is the Σ(x - x̄)² part. Add them up now and enter the total here: ____
3. You have 30 values in your data. What is n - 1? ____
4. What is 1001.37 divided by n - 1? ____
5. Now find the square root of that number; that is the √ part of the formula. s = ____
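The worksheet steps above can be mirrored in a short Python sketch using the 30 pre-test values listed in the table. It works with the unrounded mean (13.7666...). Its intermediate values can therefore differ slightly in the second decimal place from those worked out by hand with the rounded mean 13.77. The variable names are illustrative.

```python
import math

# The 30 pre-test values from the worksheet table above
pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]

n = len(pre_test)
mean = sum(pre_test) / n                       # 13.77 to 2 decimal places

# Step 1: difference from the mean for each value, then its square
squared_diffs = [(x - mean) ** 2 for x in pre_test]

# Step 2: add up the squared differences
total = sum(squared_diffs)                     # 1001.37 to 2 decimal places

# Steps 3 to 5: divide by n - 1 and take the square root
s = math.sqrt(total / (n - 1))
print(n, n - 1, round(total, 2), round(s, 2))
```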

One Sample Confidence Intervals



Explanation
Calculating a Confidence Interval
Calculating the confidence interval is fairly straightforward. The only complication is that it includes a constant, z, which is related to the required confidence level. z does not take the value of the confidence level itself (95, for example). Instead, it is the number of standard deviations either side of the mean that cover the chosen percentage (95%, for example) of a standard normal distribution. You can look these values up in a table of z-scores, or see the most useful values at level three of this topic. For now, this is all you need to know: the z value you need for a 95% confidence interval is 1.96. Understanding where confidence intervals come from requires you to follow level three of this tutorial, so at this level we will just show you how to calculate them.
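If you have Python 3.8 or later, you can recover the value 1.96 without a z-table using the standard library's `statistics.NormalDist`. A central 95% interval leaves 2.5% in each tail, so z is the point with 97.5% of the standard normal distribution below it. The helper name `z_for_confidence` is our own, for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z for a confidence level given as a fraction, e.g. 0.95."""
    tail = (1 - level) / 2               # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

print(round(z_for_confidence(0.95), 2))  # 1.96
```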

Exploration
The formula for confidence intervals is:

$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$

It reads: 'The sample mean plus or minus z times the sample standard deviation over the square root of the sample size, where z is the z value for the chosen confidence level.'

x̄ is the mean of the sample, and is read 'x bar'.
The ± (plus or minus) part means that you subtract this value from your mean to get the lower confidence limit, and add it to your mean to get the upper confidence limit. Between those limits is the confidence interval. That means that you use the formula twice: once with subtraction and once with addition.
z is the constant described above. For a 95% confidence level, z = 1.96.
s is the standard deviation of the sample.
n is the sample size.
The √ symbol means square root. There is an introduction to square roots on the page describing how to calculate standard deviations.
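Here is a minimal Python sketch of the confidence interval formula, using the pre-test figures given in the Application (mean 13.77, standard deviation 5.88, n = 30). The function name is illustrative, and `z` defaults to 1.96 on the assumption that a 95% interval is wanted.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper): mean minus and plus z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)      # the part after the plus-or-minus sign
    return mean - margin, mean + margin

# Pre-test figures: mean 13.77, sd 5.88, n = 30
lower, upper = confidence_interval(13.77, 5.88, 30)
print(f"{lower:.2f} to {upper:.2f}")    # 11.67 to 15.87
```

Because the function keeps full precision until the final formatting step, it matches the advice in the Application to store unrounded numbers in memory and round only the answers you enter.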

Application
Let us look at your dependent variable in the pre-test sample. Here are the figures you will need. The sample mean in the pre-test sample is 13.77. There are 30 entries in your data, so N = 30. The sample standard deviation is 5.88.

We will work to the 95% confidence level, so z = 1.96. You will be entering answers to 2 decimal places, but working to more decimal places with the calculations. If you can round numbers down in your head, do so. If not, use the [RD] button on the calculator, but remember to put the full number into memory [M=] first so that you can get it back for further calculations. You can clear the calculator screen with the [CA] button.
1. What is the value of N for your data? ____
2. Now use the calculator to find the square root of N with the [Sqrt] button and put it into memory using the [M=] button.
3. Round that number to 2 decimal places and enter it here: ____
4. Now divide the sample standard deviation by the value you stored in memory. Use [MR] to retrieve that value from memory. Put this new number into memory with [M=].
5. Now round this new number to 2 decimal places and enter it here: ____
6. Now recall the unrounded number from step 4 from memory [MR] and multiply it by 1.96. You will need this number twice, so put it into memory with [M=]. When you have safely stored the number, round it down to 2 decimal places and enter the rounded value here: ____
7. This number must now be used to produce the lower and upper confidence limits. You will make two calculations, recalling the number from step 6 [MR] for each one. Use [CA] to clear the screen between steps 8 and 9.
8. Subtract this number from the mean to produce the lower confidence limit: ____
9. Add this number to the mean to produce the upper confidence limit: ____

Finally, we report the confidence interval like this: the 95% confidence interval for the population mean of your dependent variable in the pre-test sample is 11.67 to 15.87.

Now try it on your own with data from the post-test group. Here are the figures you will need. The sample mean in the post-test sample is 17.23. There are 30 entries in your data, so N = 30. The sample standard deviation is 7.88. Repeat the steps above with these new values (but work on paper or in a spreadsheet; don't enter anything other than your final answer here).
What is the lower confidence limit for the post-test sample? ____
What is the upper confidence limit for the post-test sample? ____

Samples and Populations



Explanation
Sample Size
At level one we saw that a sample represents a small number of units taken from a larger population, and that the larger the sample, the better it reflects the population. The term 'sample size' refers to the number of measurements in the sample. Now we will practise making some simple calculations based on sample size. In statistics, sample size is conventionally written using the letter n. If you already know this and are happy using n in calculations, you can skip this page and move on.

Exploration
Throughout this tutorial, the exploration stage of level two shows you formulae and asks you to make calculations guided by those formulae. Here are a few very simple formulae involving n to get you used to the idea.

Application
Again, as preparation for later levels and to practise using the calculator, perform the following calculations involving n. You have 30 data points in your data.
For your data, what value is n? ____
For your data, what value is n - 1? ____
Divide the number 20 by n and round the result to 2 decimal places, using the [RD] button to round down: ____

Choosing a T-Test

Explanation
Can You Always Use a t-Test?
The t-test described here is, to give it its full name, Student's t-test. This test is a parametric test, which means that it makes some assumptions about the data. If these assumptions are not true, the test can give misleading results. We will look at those assumptions here. They concern the shape of the distribution of the data, which can be seen from a frequency histogram:

The samples being compared should have a reasonably symmetrical distribution.
The samples being compared should have a mean which is close to the centre of the distribution.
The distribution should have only one mode (highest point in the frequency histogram).

You will probably notice that these conditions are very similar to those of a normal distribution. There are pages on normal distributions and frequency histograms if you want to recap those topics. A t-test works best on normally distributed data. Suppose the distribution is not normal but still satisfies the conditions above (if it is flat, for example). A t-test will then still work as long as you have enough values in your sample (25 to 30 is usually okay). If your data are heavily skewed, then you may need a very large sample before a t-test will work. In such cases, an alternative non-parametric test should be used.

Exploration
Below are six frequency histograms, taken from samples that are small enough to require normal distribution-like properties. Decide whether or not you could perform a t-test on the data that produced each one.

[Figure: six frequency histograms, labelled Flat (or Even), Nearly Flat, Normal, Bimodal, Skewed Left and Skewed Right]

Application
Now we will look at your data and satisfy ourselves that it is suitable for a t-test. Here is the histogram for your dependent variable in the pre-test sample. The plot has six bars tied for the greatest height. They cover the ranges 4 to < 6.38, 8.75 to < 11.13, 11.13 to < 13.5, 15.88 to < 18.25, 18.25 to < 20.63, and 20.63 to 23. There is little symmetry in the distribution of the data.
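The histogram description above can be checked by counting the 30 pre-test values into bins. This is a minimal sketch under two assumptions: the histogram uses 8 equal-width bins from 4 to 23 (width 2.375, which matches the quoted ranges), and the last bin includes its upper edge. The constant names are our own.

```python
# The 30 pre-test values from the standard deviation worksheet
pre_test = [11, 22, 19, 13, 18, 20, 20, 23, 9, 14, 16, 7, 14, 18, 22,
            17, 13, 13, 11, 20, 23, 5, 8, 9, 14, 7, 6, 4, 5, 12]

LOW, HIGH, BINS = 4, 23, 8         # assumed: 8 equal-width bins spanning 4 to 23
width = (HIGH - LOW) / BINS        # 2.375, matching ranges such as 4 to < 6.38

counts = [0] * BINS
for x in pre_test:
    index = int((x - LOW) // width)    # which bin the value falls into
    index = min(index, BINS - 1)       # the last bin also includes its upper edge (23)
    counts[index] += 1

# Simple text histogram: one '#' per value in each bin
for i, c in enumerate(counts):
    start = LOW + i * width
    print(f"{start:6.3f} to {start + width:6.3f}: {'#' * c}")
```

The text plot shows the "little symmetry" described above. Six bins are tied at the top, so there is no single clear mode at the centre.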

Look at the shape of your histogram and answer the questions below to decide whether or not the distribution of your data is close enough to normal.
Is the histogram symmetrical?
Is the mode (the highest bar) at the centre of the histogram?
Can we perform a t-test on this data?

Paired t-test

Explanation
The paired t-test is calculated to take into account the fact that pairs of subjects (one from each condition) go together. It is based on the difference between the two values in each pair, that is, one subtracted from the other. In the formula for a paired t-test, this difference is written as d. The formula for the paired t-test below uses just d and n (the number of pairs in the data), and nothing else. These two values affect the value of t as follows:

As the average of the differences gets bigger, t gets bigger.
As the variation in the differences gets bigger, t gets smaller.
As the number of pairs gets bigger, t gets bigger.

There is another way of writing the paired t-test formula that you might see in a book. It makes the above points clearer, but it is not as easy to use for calculating a t-value from data. It is shown in the help topic below if you are interested in seeing it.

An Alternative Formula

Exploration
Here is the formula for a paired t-test:

$$t = \frac{\sum d}{\sqrt{\dfrac{n\sum d^{2} - \left(\sum d\right)^{2}}{n - 1}}}$$

The top of the formula is the sum of the differences (that is, the sum of d). The bottom of the formula reads as follows: the square root of n times the sum of the squared differences, minus the sum of the differences squared, all over n - 1.

The sum of the squared differences, Σd², means take each difference in turn, square it, and add up all those squared numbers. The sum of the differences squared, (Σd)², means add up all the differences and square the result. Brackets around something in a formula mean 'do this first', so (Σd)² means add up all the differences first, then square the result.
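The sketch below implements the sum-based formula described above as a Python function. It also compares the result with a common alternative form, t = d̄ / (s_d / √n): the mean difference divided by its standard error. We assume the help topic's alternative is of this kind; the two forms are algebraically identical. The five differences are made up purely for illustration, and the function names are hypothetical.

```python
import math
import statistics

def paired_t_from_sums(sum_d, sum_d2, n):
    """t = sum(d) / sqrt((n * sum(d^2) - (sum(d))^2) / (n - 1))."""
    top = n * sum_d2 - sum_d ** 2      # n times sum of squared differences, minus squared sum
    return sum_d / math.sqrt(top / (n - 1))

def paired_t_mean_over_se(diffs):
    """Alternative form: mean difference divided by its standard error s_d / sqrt(n)."""
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

# Hypothetical differences (second value minus first) for five made-up pairs
d = [2, -1, 3, 4, 0]
t_sums = paired_t_from_sums(sum(d), sum(x * x for x in d), len(d))
t_alt = paired_t_mean_over_se(d)
print(round(t_sums, 3), round(t_alt, 3))   # 1.725 1.725
```

The sum-based version needs only three numbers: the sum of the differences, the sum of the squared differences and n. That is why it suits the calculator steps in the Application.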

Application
Use the calculator to work through the formula above and work out the t-value for your data. Keep your subtotals in the calculator memory so that you do not lose accuracy through rounding errors. Only round your final answer. There are 30 pairs of data points, so n = 30.
1. Find the sum of the differences and the sum of the squared differences between the paired observations.
2. Multiply n by the sum of the squared differences.
3. Subtract the sum of the differences squared from the current answer.
4. Divide the current answer by n - 1.
5. Now find the square root of the current answer (Sqrt button) and put the answer into memory (M= button).
6. Now enter the sum of the differences and divide it by the value stored in memory (the MR button recalls from memory).
7. Finally, use the RD button to round your answer to 3 decimal places and enter the answer below.

Enter your value for t here: ____
Note that it is perfectly okay if you get a negative value for t. If you do, remember to put the minus sign (-) before the number you enter above.

P-Values and T-Tables



Explanation
Using T-Tables
We have learned the following things about a t-test:

The t-test produces a single value, t, which grows larger as the difference between the means of two samples grows larger.
t does not cover a fixed range, such as 0 to 1, like probabilities do.
You can convert a t-value into a probability, called a p-value.
The p-value is always between 0 and 1. It tells you the probability of getting a difference at least as large as the one in your data through sampling error alone (that is, if the null hypothesis were true).
The p-value should be lower than a chosen significance level (0.05, for example) before you can reject your null hypothesis.

Converting from t-values to p-values is usually done by software, but you can do it by hand by looking values up in a table. This page explains how.

T-tables are filled with t-values, known as critical values. Each column of the table corresponds to a given significance level. Once you have chosen your significance level, check whether your t-value is greater than or equal to the critical value in the column for that level. If it is, you can conclude that your p-value is less than the significance level you have chosen.

The t-table has more than one row of t-values. Each row corresponds to a given number of degrees of freedom (degrees of freedom are explained at level one of this topic). Use the row of the table that corresponds to the degrees of freedom in your data.

Finally, you must read t-tables differently depending on whether your test is one-tailed or two-tailed. The rule is simple enough: the p-value for a two-tailed test is twice what it would be for a one-tailed test. Some tables (such as the ones on this page) show p-value headings for both one-tailed and two-tailed tests to make this easier. If your test is one-tailed, find the p-value you need in the top row of headings. If your test is two-tailed, use the second row. Notice that the p-values for the two-tailed test are simply double those for the one-tailed test. Unfortunately, some tables show only one-tailed values and some show only two-tailed values. If you are using a table from a book or from the internet, make sure you know which it shows. You can easily convert between the two by remembering that p for a two-tailed test is twice what it is for a one-tailed test.

Reporting p-Values and t-Values
You report the results of a t-test in the following way: t(df) = [your t-value], p < [the p-value]. Here df is the degrees of freedom of your data, followed by the t-value you found and the p-value you found.

Using the Less Than Sign (<)

Exploration
Converting from a t-value to a p-value requires some tricky maths, so statisticians use precalculated tables, called t-tables, to make it easy. A t-table is shown below. Notice that it has a number of columns, each showing a different significance level (p). T-tables do not tell you the exact value of p. Instead, they list a few key values of p and tell you what value of t is required to produce a p-value less than each listed value. The table also has many rows, and each row is marked with a number showing the degrees of freedom (df). Once you have chosen your significance level, p, calculated a value for t, and worked out how many degrees of freedom you have, you can find the entry in the t-table that you need as follows:

1. Look down the column that corresponds to your chosen value for p.
2. Find the row that corresponds to your degrees of freedom. Where the column and the row meet, you will find the value you need.

This value is called the critical value. The final thing to do is compare this value with your value of t:

If your t-value is greater than or equal to the critical value, then t is significant and you have found a statistically significant difference.
If your t-value is less than the critical value, then t is not significant.

Here are some examples for you to work out using the table below.

p < 0.05, df = 4, t = 3.143, one-tailed. Critical value: ____ Significance: ____
p < 0.01, df = 12, t = 3.143, one-tailed. Critical value: ____ Significance: ____
p < 0.05, df = 8, t = 3.143, two-tailed. Critical value: ____ Significance: ____
p < 0.01, df = 7, t = 3.143, two-tailed. Critical value: ____ Significance: ____

One-tailed p   0.1     0.05    0.025    0.01     0.005
Two-tailed p   0.2     0.1     0.05     0.02     0.01
df
1              3.078   6.314   12.706   31.821   63.657
2              1.886   2.920   4.303    6.965    9.925
3              1.638   2.353   3.182    4.541    5.841
4              1.533   2.132   2.776    3.747    4.604
5              1.476   2.015   2.571    3.365    4.032
6              1.440   1.943   2.447    3.143    3.707
7              1.415   1.895   2.365    2.998    3.499
8              1.397   1.860   2.306    2.896    3.355
9              1.383   1.833   2.262    2.821    3.250
10             1.372   1.812   2.228    2.764    3.169
11             1.363   1.796   2.201    2.718    3.106
12             1.356   1.782   2.179    2.681    3.055
13             1.350   1.771   2.160    2.650    3.012
14             1.345   1.761   2.145    2.624    2.977
15             1.341   1.753   2.131    2.602    2.947
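The look-up procedure above can be written as a small Python sketch. The critical values are copied from the table for df 1 to 15, and the columns are indexed by one-tailed p. A two-tailed significance level is converted to the column for half that value, following the "twice the p-value" rule. Like the text, the sketch ignores the minus sign on t. The names `T_TABLE`, `critical_value` and `is_significant` are our own.

```python
# Critical t-values for df 1 to 15, copied from the table above.
# Columns are ordered by one-tailed p: 0.1, 0.05, 0.025, 0.01, 0.005
# (equivalently two-tailed p: 0.2, 0.1, 0.05, 0.02, 0.01).
ONE_TAILED_P = (0.1, 0.05, 0.025, 0.01, 0.005)
T_TABLE = {
    1: (3.078, 6.314, 12.706, 31.821, 63.657),
    2: (1.886, 2.920, 4.303, 6.965, 9.925),
    3: (1.638, 2.353, 3.182, 4.541, 5.841),
    4: (1.533, 2.132, 2.776, 3.747, 4.604),
    5: (1.476, 2.015, 2.571, 3.365, 4.032),
    6: (1.440, 1.943, 2.447, 3.143, 3.707),
    7: (1.415, 1.895, 2.365, 2.998, 3.499),
    8: (1.397, 1.860, 2.306, 2.896, 3.355),
    9: (1.383, 1.833, 2.262, 2.821, 3.250),
    10: (1.372, 1.812, 2.228, 2.764, 3.169),
    11: (1.363, 1.796, 2.201, 2.718, 3.106),
    12: (1.356, 1.782, 2.179, 2.681, 3.055),
    13: (1.350, 1.771, 2.160, 2.650, 3.012),
    14: (1.345, 1.761, 2.145, 2.624, 2.977),
    15: (1.341, 1.753, 2.131, 2.602, 2.947),
}

def critical_value(p, df, tails):
    """Critical t for significance level p, degrees of freedom df, and 1 or 2 tails."""
    # A two-tailed p uses the one-tailed column for p / 2
    column = ONE_TAILED_P.index(round(p / tails, 3))
    return T_TABLE[df][column]

def is_significant(t, p, df, tails):
    """True when |t| is greater than or equal to the critical value."""
    return abs(t) >= critical_value(p, df, tails)

# The four examples above, each with t = 3.143
for p, df, tails in [(0.05, 4, 1), (0.01, 12, 1), (0.05, 8, 2), (0.01, 7, 2)]:
    crit = critical_value(p, df, tails)
    print(f"p < {p}, df = {df}, {tails}-tailed: critical value {crit}, "
          f"significant: {is_significant(3.143, p, df, tails)}")
```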

Application
Now let's look at your own data. The first step is to decide on the number of degrees of freedom you have. This depends on whether your samples are paired or independent.

Your experiment compares your dependent variable in the Pre-Test condition with the Post-Test condition. It is measured from the same people under both conditions, Pre-Test and Post-Test. You measured 30 people in each condition. Your test is one-tailed.

What is your experimental design?
Should you be looking in the one-tailed or two-tailed column?
How many degrees of freedom does your data have?

Your t-value is -2.13. We will ignore the minus sign and just use 2.13, as the values in the t-tables are all positive. Using a significance level of 0.05, look in the section of the t-table shown below and find the critical value for your data.

One-tailed p   0.1     0.05    0.025    0.01     0.005
Two-tailed p   0.2     0.1     0.05     0.02     0.01
df
24             1.318   1.711   2.064    2.492    2.797
25             1.316   1.708   2.060    2.485    2.787
26             1.315   1.706   2.056    2.479    2.779
27             1.314   1.703   2.052    2.473    2.771
28             1.313   1.701   2.048    2.467    2.763
29             1.311   1.699   2.045    2.462    2.756
30             1.310   1.697   2.042    2.457    2.750
31             1.309   1.696   2.040    2.453    2.744
32             1.309   1.694   2.037    2.449    2.738
33             1.308   1.692   2.035    2.445    2.733
34             1.307   1.691   2.032    2.441    2.728

p < 0.05, df = 29, t = 2.13, one-tailed. Critical value: ____ Significance: ____
Which of these would be the correct way to report this result?

Now look up your t-value at the 0.01 level.
p < 0.01, df = 29, t = 2.13, one-tailed. Critical value: ____ Significance: ____
Which of these would be the correct way to report this result?

The Normal Distribution



Explanation

The Normal Shape
We have said that a normal distribution is bell-shaped, but we can be much more specific than that. There is a very precise shape that represents a perfect normal distribution. The shape is defined by a formula, and this page describes both the shape and the formula. Given a sample of data, it can be useful to calculate the shape that its histogram would take if the data were normally distributed. This makes it easier to compare your histogram with the ideal normal shape. You saw this in the game at level one of this topic. Capturing the normal histogram shape in a formula is also useful for other reasons that we will meet later. The key points to remember now are:

You can represent the shape of a normal distribution histogram using a formula.
The closest normal curve for a given sample of data is generated from two measurements of that data, its mean and its standard deviation. The formula for generating the curve uses these two values.
Comparing this shape with your data's own histogram tells you how close to normality your data are.
If your data are normally distributed, then the formula can replace the histogram! You can use the formula to tell you how many values you would expect to be above a certain value or in a certain range.

You can know all these things from just two numbers: the mean and the standard deviation! That is pretty useful. You will learn how to make these calculations on the page about z scores and probabilities.

The Formula for Calculating the Normal Distribution Curve
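The standard formula for the normal curve with mean μ and standard deviation σ is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). We assume this is the formula shown in the help topic. The Python sketch below evaluates it and traces a rough text version of the curve across -5 to +5, as the game does. It then cross-checks one value against the standard library's `statistics.NormalDist.pdf`. The function name `normal_curve` is illustrative.

```python
import math
from statistics import NormalDist

def normal_curve(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    scale = 1 / (sd * math.sqrt(2 * math.pi))      # makes the total area under the curve 1
    return scale * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

# Text plot across -5 to +5, here with mean 0 and standard deviation 1
for x in range(-5, 6):
    print(f"{x:+d} {'*' * round(100 * normal_curve(x, 0, 1))}")

# Cross-check against the standard library's normal density
assert math.isclose(normal_curve(1.5, 2, 3), NormalDist(2, 3).pdf(1.5))
```

Changing `mean` slides the peak left or right. Increasing `sd` makes the curve lower and wider, and decreasing it makes the curve taller and narrower.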

Exploration
The shape of the normal distribution changes depending on the mean and standard deviation of the data. You can generate the shape of the curve using a formula, which is shown in the help topic above. The game on this page uses that formula. It generates values from a mean and standard deviation that you choose and plots them across the range from -5 to +5. The higher the curve, the higher the frequency of values in that part of the range.

How does changing the mean affect the normal distribution's shape?
How does increasing the standard deviation affect the normal distribution's shape?
How does decreasing the standard deviation affect the normal distribution's shape?

Application
By generating the closest normal curve to your data, you can see how well the mean and standard deviation reflect the distribution of your data. We have done this for your data in the chart below.

The mean of the data plotted above is 13.77 and the standard deviation is 5.88. You will notice that we have plotted the data in continuous bins, rather than using a bar for each value as we did in the section on histograms. This is because the normal curve is defined over a continuous scale. What does the highest point along the red line correspond to?

Look at where the peak of the red line (its mean) is, and think about whether this reflects where the bulk of your data lie.
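One way to place the closest normal curve over a histogram is to work out how many values the curve predicts for each bin. The prediction is n times the area under the curve across that bin. The sketch below does this for the pre-test figures (mean 13.77, standard deviation 5.88, n = 30) using `statistics.NormalDist.cdf`. It assumes the same 8 bins from 4 to 23 described on the Choosing a T-Test page. We do not know exactly how the chart's red line is scaled, so treat this as an illustration rather than a reproduction of the chart.

```python
from statistics import NormalDist

mean, sd, n = 13.77, 5.88, 30          # pre-test figures from the text
curve = NormalDist(mean, sd)

# Assumed bins: 8 equal bins from 4 to 23 (width 2.375)
edges = [4 + i * 2.375 for i in range(9)]
observed = [4, 3, 4, 4, 3, 4, 4, 4]    # counted from the 30 pre-test values

# Expected count in a bin = n times the area under the curve over that bin
expected = [n * (curve.cdf(hi) - curve.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

for lo, hi, obs, exp_count in zip(edges, edges[1:], observed, expected):
    print(f"{lo:6.3f} to {hi:6.3f}: observed {obs}, expected {exp_count:.1f}")
```

The curve predicts the most values in the bin just above the mean, 13.5 to 15.875. That bin actually holds one of the two smallest counts in the data.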

Z Scores

Explanation
Calculating and Using z Scores
The method of calculating a z score is very simple: subtract the sample mean from the value and divide what you get by the sample standard deviation. The resulting number, z, has the following properties:

z is positive when the value is greater than the mean.
z is negative when the value is less than the mean.
z is zero when the value equals the mean.
z is the number of standard deviations between the value and the mean.
z has no theoretical upper or lower bound, apart from any limit imposed by the range of values the measurement can take.

In normally distributed data, more than 99.99% of the distribution falls below the point where z = 4 (and, by symmetry, above z = -4). So z will rarely be greater than 4 or less than -4.

Exploration
The formula for calculating a z score is:

$$z = \frac{x - \bar{x}}{s}$$

It reads: z equals x minus the sample mean, all over the sample standard deviation, where x is the value for which a z score is required.
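Here is a minimal Python version of the z score formula, using the pre-test mean (13.77) and standard deviation (5.88) quoted in the Application. It also checks the claim above about z = 4 using the standard library's `statistics.NormalDist`. The function name `z_score` is our own.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standard deviations between x and the mean (negative when x is below the mean)."""
    return (x - mean) / sd

# The mean itself always has a z score of zero
print(round(z_score(13.77, 13.77, 5.88), 2))   # 0.0

# Proportion of a normal distribution lying below z = 4
print(f"{NormalDist().cdf(4):.5f}")            # 0.99997
```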

Application
Now we turn to your own data. We will look at your dependent variable in the pre-test sample. The mean in the pre-test sample is 13.77 and the standard deviation is 5.88. Use the calculator to calculate z scores for the following values from your data. Give your answers to 2 decimal places.
To calculate the z score for the value 22, first calculate 22 - 13.77: ____
Now divide the number you entered above by the standard deviation (shown above). The value you get is the z score for 22: ____
Calculate the z score for 13: ____
Calculate the z score for 8: ____
Calculate the z score for 5: ____
Calculate the z score for 18: ____


Probability Distributions and z Scores



Explanation
Converting z Scores to Probabilities With Z Tables
Once you have calculated a z score, you can answer questions concerning the probability of new measurements taking certain values. Suppose, for example, that you planned to sell shoes in a shop and knew that shoe sizes were normally distributed. You could then work out how many customers for each shoe size you would be likely to see. There is a formula for converting from z scores to probabilities, but it is rarely used by hand. Most people use either computer software or z score tables to calculate probabilities from z scores. Here are the first few lines of a z score table.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141

The figures down the left-hand side represent the first decimal place of the z value you wish to look up. The values across the top represent the second decimal place. This allows you to look up values to two decimal places without needing a very long table. In the extract above, for example, the z score 0.17 is found in row 0.1, column 0.07, and the corresponding value in the table is 0.5675.

What do the values that you look up represent? In the table above, the values represent the proportion of a standard normal distribution that falls to the left of (that is, below) the given z score. So in our example, 0.5675 (about 57%) of a normal distribution lies below the point that is 0.17 standard deviations above the mean. Some tables show negative z scores too, but some (like ours above) do not. In such cases, you must look up the positive value (so if you have a z score of -2, look up 2) and subtract the value you get from 1. Returning to our example, to look up -0.17 we would find the value for 0.17 (0.5675) and subtract it from 1, giving 0.4325. That tells us that about 43% of the values in a normal distribution lie below the point 0.17 standard deviations below the mean.

To find the proportion above a given z score, look up the z score in the table and subtract the value you find from 1. This works because all the probabilities must add up to one, so the proportion below a point plus the proportion above it must sum to one. To find the proportion between two z scores, look up both z scores and subtract the table value for the lower one from that for the higher one. For example, looking up z = 1.96 in the table gives 0.975, and the value for z = -1.96 is 1 - 0.975 = 0.025. Subtracting 0.025 from 0.975 gives 0.95. So 95% of a standard normal distribution lies within 1.96 standard deviations of the mean (between z = -1.96 and z = +1.96).
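The three rules above (below, above and between) can be sketched in Python. The standard library's `statistics.NormalDist().cdf` stands in for a printed z-table. To mirror a table that lists only positive z scores, the helper looks up |z| and subtracts from 1 when z is negative. All function names are illustrative. The proportion between a z score and the mean is `proportion_between(0, z)` for positive z.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def table_value(z):
    """What a positive-only z-table gives: the proportion below |z|."""
    return STANDARD_NORMAL.cdf(abs(z))

def proportion_below(z):
    # For a negative z, look up the positive value and subtract it from 1
    return table_value(z) if z >= 0 else 1 - table_value(z)

def proportion_above(z):
    # Everything not below z is above it
    return 1 - proportion_below(z)

def proportion_between(z_low, z_high):
    # Table value for the higher z minus the table value for the lower z
    return proportion_below(z_high) - proportion_below(z_low)

print(round(proportion_below(0.17), 4))            # 0.5675
print(round(proportion_below(-0.17), 4))           # 0.4325
print(round(proportion_between(-1.96, 1.96), 2))   # 0.95
```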

Exploration
This section shows how some calculations are made using z-tables and z-scores. Choose a z-score between -2 and 2 and use the table below to work out three proportions: the data above your chosen z-score, the data below it, and the data between it and the mean.

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.500  0.504  0.508  0.512  0.516  0.520  0.524  0.528  0.532  0.536
0.1   0.540  0.544  0.548  0.552  0.556  0.560  0.564  0.567  0.571  0.575
0.2   0.579  0.583  0.587  0.591  0.595  0.599  0.603  0.606  0.610  0.614
0.3   0.618  0.622  0.626  0.629  0.633  0.637  0.641  0.644  0.648  0.652
0.4   0.655  0.659  0.663  0.666  0.670  0.674  0.677  0.681  0.684  0.688
0.5   0.691  0.695  0.698  0.702  0.705  0.709  0.712  0.716  0.719  0.722
0.6   0.726  0.729  0.732  0.736  0.739  0.742  0.745  0.749  0.752  0.755
0.7   0.758  0.761  0.764  0.767  0.770  0.773  0.776  0.779  0.782  0.785
0.8   0.788  0.791  0.794  0.797  0.800  0.802  0.805  0.808  0.811  0.813
0.9   0.816  0.819  0.821  0.824  0.826  0.829  0.831  0.834  0.836  0.839
1.0   0.841  0.844  0.846  0.848  0.851  0.853  0.855  0.858  0.860  0.862
1.1   0.864  0.867  0.869  0.871  0.873  0.875  0.877  0.879  0.881  0.883
1.2   0.885  0.887  0.889  0.891  0.893  0.894  0.896  0.898  0.900  0.901
1.3   0.903  0.905  0.907  0.908  0.910  0.911  0.913  0.915  0.916  0.918
1.4   0.919  0.921  0.922  0.924  0.925  0.926  0.928  0.929  0.931  0.932
1.5   0.933  0.934  0.936  0.937  0.938  0.939  0.941  0.942  0.943  0.944
1.6   0.945  0.946  0.947  0.948  0.949  0.951  0.952  0.953  0.954  0.954
1.7   0.955  0.956  0.957  0.958  0.959  0.960  0.961  0.962  0.962  0.963
1.8   0.964  0.965  0.966  0.966  0.967  0.968  0.969  0.969  0.970  0.971
1.9   0.971  0.972  0.973  0.973  0.974  0.974  0.975  0.976  0.976  0.977
2.0   0.977  0.978  0.978  0.979  0.979  0.980  0.980  0.981  0.981  0.982

Go back to the level one game to see how these values cover a normal distribution plotted on a histogram.

Application
Below we present some z-scores calculated from your data for the pre-test sample. Use the z-table below to answer the following questions, entering values to 3 decimal places. The table lists only positive z-scores. For a negative z-score, look up the value without its minus sign. The proportion below the z-score is one minus the value you find, and the proportion above it is the value itself.

What proportion of your data would you expect to find below 5, which has a z-score of -1.49?
What proportion of your data would you expect to find above 7, which has a z-score of -1.15?
What proportion of your data would you expect to find between 8, which has a z-score of -0.98, and 17, which has a z-score of 0.55?

The z-table is shown below. Each entry is the proportion of a normal distribution lying below z. Rows give z to one decimal place, and columns add the second decimal place.

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.500  0.504  0.508  0.512  0.516  0.520  0.524  0.528  0.532  0.536
0.1   0.540  0.544  0.548  0.552  0.556  0.560  0.564  0.567  0.571  0.575
0.2   0.579  0.583  0.587  0.591  0.595  0.599  0.603  0.606  0.610  0.614
0.3   0.618  0.622  0.626  0.629  0.633  0.637  0.641  0.644  0.648  0.652
0.4   0.655  0.659  0.663  0.666  0.670  0.674  0.677  0.681  0.684  0.688
0.5   0.691  0.695  0.698  0.702  0.705  0.709  0.712  0.716  0.719  0.722
0.6   0.726  0.729  0.732  0.736  0.739  0.742  0.745  0.749  0.752  0.755
0.7   0.758  0.761  0.764  0.767  0.770  0.773  0.776  0.779  0.782  0.785
0.8   0.788  0.791  0.794  0.797  0.800  0.802  0.805  0.808  0.811  0.813
0.9   0.816  0.819  0.821  0.824  0.826  0.829  0.831  0.834  0.836  0.839
1.0   0.841  0.844  0.846  0.848  0.851  0.853  0.855  0.858  0.860  0.862
1.1   0.864  0.867  0.869  0.871  0.873  0.875  0.877  0.879  0.881  0.883
1.2   0.885  0.887  0.889  0.891  0.893  0.894  0.896  0.898  0.900  0.901
1.3   0.903  0.905  0.907  0.908  0.910  0.911  0.913  0.915  0.916  0.918
1.4   0.919  0.921  0.922  0.924  0.925  0.926  0.928  0.929  0.931  0.932
1.5   0.933  0.934  0.936  0.937  0.938  0.939  0.941  0.942  0.943  0.944
1.6   0.945  0.946  0.947  0.948  0.949  0.951  0.952  0.953  0.954  0.954
1.7   0.955  0.956  0.957  0.958  0.959  0.960  0.961  0.962  0.962  0.963
1.8   0.964  0.965  0.966  0.966  0.967  0.968  0.969  0.969  0.970  0.971
1.9   0.971  0.972  0.973  0.973  0.974  0.974  0.975  0.976  0.976  0.977
2.0   0.977  0.978  0.978  0.979  0.979  0.980  0.980  0.981  0.981  0.982
2.1   0.982  0.983  0.983  0.983  0.984  0.984  0.985  0.985  0.985  0.986
2.2   0.986  0.986  0.987  0.987  0.987  0.988  0.988  0.988  0.989  0.989
2.3   0.989  0.990  0.990  0.990  0.990  0.991  0.991  0.991  0.991  0.992
2.4   0.992  0.992  0.992  0.992  0.993  0.993  0.993  0.993  0.993  0.994
2.5   0.994  0.994  0.994  0.994  0.994  0.995  0.995  0.995  0.995  0.995
2.6   0.995  0.995  0.996  0.996  0.996  0.996  0.996  0.996  0.996  0.996
2.7   0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997
2.8   0.997  0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.998
2.9   0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.999  0.999  0.999
3.0   0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999
3.1   0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999
3.2   0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999
3.3   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.4   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.5   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.6   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.7   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.8   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
3.9   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
4.0   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
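The lookup steps for questions like these can be sketched in code. This is an illustration under stated assumptions. The hypothetical helper `table_entry` stands in for a printed one-sided z-table by rounding the normal cumulative proportion to three decimals. Like such a table, it accepts only z ≥ 0. The other functions, also named by us, apply the symmetry rules for negative z-scores. The mean of 50 and standard deviation of 10 in the usage lines are made up and are not your data.

```python
import math


def table_entry(a: float) -> float:
    """Stand-in for a one-sided z-table: proportion below a (a >= 0), to 3 d.p."""
    if a < 0:
        raise ValueError("the table only lists non-negative z-scores")
    return round(0.5 * (1.0 + math.erf(a / math.sqrt(2.0))), 3)


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd


def below(z: float) -> float:
    """Proportion below z: read the table for z >= 0, use symmetry for z < 0."""
    if z >= 0:
        return table_entry(z)
    return round(1.0 - table_entry(-z), 3)


def above(z: float) -> float:
    """Proportion above z, the complement of the proportion below it."""
    return round(1.0 - below(z), 3)


def between(z_low: float, z_high: float) -> float:
    """Proportion between two z-scores, with z_low < z_high."""
    return round(below(z_high) - below(z_low), 3)


if __name__ == "__main__":
    # Hypothetical scale: mean 50, SD 10, so a value of 35 has z = -1.5.
    z = round(z_score(35, 50, 10), 2)
    print(z, below(z), above(z), between(z, 1.0))  # -1.5 0.067 0.933 0.774
```

For the negative z-score, the proportion above (0.933) is the table entry for 1.5 itself. The proportion below is one minus that entry.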

Probability Distributions and z Scores

You are currently on Probability Distributions at level 3.

Explanation
When the Distribution is Not Normal

Z-tables are used to convert z-scores to probabilities. They assume that the data from which the z-scores are calculated is normally distributed. You can see this assumption in the structure of the z-table:

A normal distribution is symmetrical, so z-tables usually present only one half of it and leave you to subtract from 1 to find the proportion in the other direction. This would not work for a distribution that is not symmetrical.
The entry in the z-table for z = 0 is 0.5. This reflects the fact that half of a normal distribution lies below its mean, which sits at its centre.
The pattern of values in the z-table follows the bell shape of the normal distribution. The entries climb fastest near z = 0, where most of the data lies, and level off towards 1 in the tails.

If your data is not normal, the z-table will give you misleading answers.
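To see how misleading a z-table can be, compare it with a clearly skewed distribution. The example below is our own choice and is not part of the tutorial. It uses the exponential distribution with rate 1. That distribution's mean and standard deviation are both 1, and the proportion of it below any x ≥ 0 is 1 - e^(-x). The sketch compares those exact proportions with what a z-table would predict for the same z-scores.

```python
import math


def normal_below(z: float) -> float:
    """What a z-table predicts: proportion of a normal distribution below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exponential_below(x: float) -> float:
    """Exact proportion of an exponential distribution (rate 1) below x."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x)


# For the exponential distribution with rate 1, the mean and SD are both 1.
MEAN, SD = 1.0, 1.0

if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0, 2.0):
        x = MEAN + z * SD  # the value that has this z-score
        print(f"z = {z:4.1f}: z-table predicts {normal_below(z):.3f} below, "
              f"exponential actually has {exponential_below(x):.3f} below")
```

At the mean (z = 0), the table says 0.500, but about 0.632 of the exponential distribution lies below its mean. At z = -1, the table says 0.159, while no values of this exponential distribution lie that low.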

Calculating Percentages

Exploration
The game below allows you to draw frequency histograms and compare them to a normal distribution. To make any bar higher or lower, click on the chart at the height you want. The total number of data points does not change, and the chart re-draws itself each time you make a change.

The game will draw a red line indicating the shape of the normal distribution closest to your distribution. When you use z-tables, you are assuming that this is the shape of the distribution of your data. The game reports the percentage of your data (as shown by the blue histogram) to the right of three values. These are the mean (which has a z-score of 0), one standard deviation above the mean (z = 1) and two standard deviations above the mean (z = 2). The numbers in brackets give the actual values of the mean and of the points one and two standard deviations above it. When you assume that your data is normally distributed, you assume that the percentages of data above these points are as follows:

50% of the data lies above the mean.
About 16% of the data lies more than one standard deviation above the mean.
About 2% of the data lies more than two standard deviations above the mean.
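These figures follow from the z-table, which gives the proportion of a normal distribution lying below each z-score. The proportion above a z-score is one minus that entry. Using the table entries for z = 0, 1 and 2:

```latex
\begin{aligned}
P(Z > 0) &= 1 - 0.500 = 0.500 \approx 50\% \\
P(Z > 1) &= 1 - 0.841 = 0.159 \approx 16\% \\
P(Z > 2) &= 1 - 0.977 = 0.023 \approx 2\%
\end{aligned}
```

The last two lines show that 16% and 2% are rounded from 15.9% and 2.3%.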

Try the following and see how different the normal curve is from the actual histogram:

Make the distribution flat.
Make the distribution skewed left or right.
Make the distribution multi-modal.
Try to make the percentage of data above the mean as high as you can.
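The comparison the game makes can be sketched in a few lines. This is a hypothetical reimplementation, not the game's code. It assumes the game uses the sample standard deviation, computed here with `statistics.stdev` from Python's standard library. The data sets are made up to have the shapes suggested above.

```python
import statistics

# Percentages a normal distribution puts above the mean, mean + 1 SD and mean + 2 SD.
NORMAL_EXPECTED = [50.0, 15.9, 2.3]


def percent_above(data, threshold):
    """Percentage of the data points that are strictly greater than threshold."""
    return 100.0 * sum(1 for x in data if x > threshold) / len(data)


def compare_with_normal(data):
    """Observed percentages above the mean, mean + 1 SD and mean + 2 SD."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (an assumption)
    return [percent_above(data, mean + k * sd) for k in (0, 1, 2)]


# Made-up data sets with the shapes the exercise suggests.
SHAPES = {
    "flat": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "right-skewed": [1, 1, 1, 1, 2, 2, 2, 3, 4, 10],
    "bimodal": [1, 1, 1, 1, 1, 10, 10, 10, 10, 10],
    "one low outlier": [0, 10, 10, 10, 10, 10, 10, 10, 10, 10],
}

if __name__ == "__main__":
    for name, data in SHAPES.items():
        print(f"{name:>15}: observed {compare_with_normal(data)}, "
              f"normal {NORMAL_EXPECTED}")
```

In the bimodal set, no values lie more than one standard deviation above the mean, where a normal distribution would put about 16%. In the last set, a single low outlier drags the mean down, and 90% of the values lie above it.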

Application
Now we can look at your data and see how well it matches the calculations made from a z-table. The frequency histogram for the pre-test sample is shown below.

The mean for the pre-test sample is 13.77 and the standard deviation is 5.88. There are 30 values in this data, and 10 of them are below the mean. There is a calculator below to help you answer these questions, and a help topic on calculating percentages above.

What percentage of your data lie below the mean? Give a whole number from 0 to 100.

Z-tables say that half (50%) of the data in a normal distribution lie below the mean. How many values would you expect to find below the mean in your data if it were normally distributed? Enter your answer as a whole number.

19 values in your data lie more than one standard deviation from the mean. What is that as a percentage, to the nearest whole number?

Z-tables say that about 32% of the data in a normal distribution lie more than one standard deviation from the mean. That is about 16% on each side, since 1 - 0.841 = 0.159. How many values would you expect to find in that range in your data if it were normally distributed? Enter your answer as a whole number.

Think for a moment about whether the distribution of your data is sufficiently normal for the probabilities from a z-table to be of use. If not, think about whether this is because the population from which your sample is drawn is not normally distributed, or because your sample is too small to capture the distribution of the population.
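The arithmetic behind these questions is short enough to sketch. The function names are our own, and the counts in the usage lines are hypothetical, not your data. The expected proportions come from the normal distribution. Exactly 0.5 lies below the mean, and 2 × (1 - Φ(1)) ≈ 0.317 lies more than one standard deviation from the mean in either direction. A z-table gives the second figure as 2 × (1 - 0.841) = 0.318.

```python
import math


def percentage(count: int, n: int) -> int:
    """Share of n values as a whole-number percentage.

    Note: Python's round() sends exact halves to the nearest even number.
    """
    return round(100 * count / n)


def expected_count(proportion: float, n: int) -> int:
    """How many of n values a normal distribution would put in a region."""
    return round(proportion * n)


def phi(z: float) -> float:
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


BELOW_MEAN = phi(0.0)                   # exactly 0.5
BEYOND_ONE_SD = 2.0 * (1.0 - phi(1.0))  # about 0.317, both tails together

if __name__ == "__main__":
    # Hypothetical sample: 40 values, 12 below the mean, 16 beyond one SD.
    n = 40
    print(percentage(12, n), expected_count(BELOW_MEAN, n))     # 30 20
    print(percentage(16, n), expected_count(BEYOND_ONE_SD, n))  # 40 13
```

A large gap between the observed and expected counts is a sign that z-table probabilities may not describe the data well.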