
OVERVIEW

STATISTICS Modules 5-8


Prepared by: Mrs. Cristina H. Price

Module 5 - The Normal Curve


The normal curve, or normal frequency distribution, is a hypothetical distribution of scores that is widely used in statistical analysis. Because many psychological and physical measurements are approximately normally distributed, the normal curve can be applied to many kinds of scores. Its characteristics make it useful in education and in the physical and social sciences.

Characteristics of the Normal Curve
Some of the important characteristics of the normal curve are:
1. The normal curve is a symmetrical distribution of scores, with an equal number of scores above and below the midpoint of the abscissa (the horizontal axis of the curve).
2. The tails of the curve are asymptotic to the horizontal axis: they approach it but never touch it.
3. Since the distribution of scores is symmetrical, the mean, median, and mode all fall at the same point on the abscissa. In other words, mean = median = mode.
4. If we divide the distribution into standard deviation units, a known proportion of scores lies within each portion of the curve.
5. The total area under the curve is equal to 1.

Tables exist that give the proportion of scores above and below any point on the curve, expressed in standard deviation units. Scores expressed in standard deviation units are referred to as z-scores.
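To illustrate characteristic 4, the proportion of scores within 1, 2, and 3 standard deviations of the mean can be computed from the standard normal cumulative distribution function. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (the z-score scale)
std_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k: float) -> float:
    """Area under the normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {proportion_within(k):.4f}")
```

The three values (about 0.6827, 0.9545, and 0.9973) are the familiar 68-95-99.7 proportions. Because the curve is symmetrical, `std_normal.cdf(0)` is exactly 0.5: half of the total area of 1 lies on each side of the mean.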

Standard score
It is the distance of an observed value (x) from the mean in terms of the standard deviation. It tells how many standard deviations the observed value lies above or below the mean of its distribution.

$$ z = \frac{x - \bar{x}}{s} $$

OR

$$ z = \frac{x - \mu}{\sigma} $$

Where:
x = observed value or raw score
x̄ = sample mean
s = sample standard deviation
μ = population mean
σ = population standard deviation

Using MS Excel's Statistical Functions

Conversion of a raw score to a standard score
=STANDARDIZE(x, x̄, s)

Finding the area (cumulative probability to the left of z) given the value of z
=NORMSDIST(z)

Finding the value of z given the cumulative probability
=NORMSINV(p)
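For readers working outside Excel, the three worksheet functions above have close equivalents in Python's standard library. This is a sketch: the helper names `standardize`, `normsdist`, and `normsinv` are hypothetical mirrors of the Excel names, not an existing Python API; the underlying calls are `statistics.NormalDist.cdf` and `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def standardize(x: float, mean: float, sd: float) -> float:
    """Mirror of Excel STANDARDIZE(x, mean, standard_dev): z = (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

def normsdist(z: float) -> float:
    """Mirror of Excel NORMSDIST(z): area under the standard normal curve to the left of z."""
    return _STD_NORMAL.cdf(z)

def normsinv(p: float) -> float:
    """Mirror of Excel NORMSINV(p): the z value with area p to its left (0 < p < 1)."""
    return _STD_NORMAL.inv_cdf(p)

z = standardize(380, 362, 15)  # z = (380 - 362) / 15
print(z, normsdist(z), normsinv(0.975))
```

In Excel 2010 and later, NORMSDIST and NORMSINV are kept for compatibility; the newer names are NORM.S.DIST(z, TRUE) and NORM.S.INV(p).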

Sample problem
The average daily income of 2000 workers is P362.00 with a standard deviation of P15.00. Assuming that the daily incomes are normally distributed:
a) What percent of the workers earn at least P380.00 per day?
b) What percent of the workers earn below P350.00 per day?
c) Determine the number of workers who earn from P350.00 to P375.00 per day.
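The sample problem can be checked numerically. A minimal sketch, assuming (as the problem states) that incomes follow a normal distribution with mean 362 and standard deviation 15; it uses `statistics.NormalDist` instead of a printed z table.

```python
from statistics import NormalDist

# Daily incomes modeled as normal with the mean and SD given in the problem
income = NormalDist(mu=362.0, sigma=15.0)
n_workers = 2000

# a) P(X >= 380): area to the right of z = (380 - 362) / 15 = 1.2
p_at_least_380 = 1 - income.cdf(380)

# b) P(X < 350): area to the left of z = (350 - 362) / 15 = -0.8
p_below_350 = income.cdf(350)

# c) P(350 <= X <= 375): area between z = -0.8 and z of about 0.87, times the number of workers
p_350_to_375 = income.cdf(375) - income.cdf(350)
workers_350_to_375 = round(p_350_to_375 * n_workers)

print(f"a) {p_at_least_380:.2%}")
print(f"b) {p_below_350:.2%}")
print(f"c) about {workers_350_to_375} workers")
```

This gives roughly 11.51% for (a), 21.19% for (b), and about 1,190 workers for (c). A table-based solution can differ slightly in the last digit because z is rounded to two decimals before the lookup.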

Exercises
1. In a departmental examination in statistics, the mean grade was 74 and the standard deviation was 10. If the grades are approximately normally distributed and 40 students got grades between 70 and 80, how many students took the examination?
2. The experience of a certain hospital showed that the length of stay of its patients is normally distributed with a mean of 11.5 days and a standard deviation of 2 days.
a) What percent of the patients stayed 9 days or less?
b) If a new method of nursing care is to be administered to the middle 95% of the group, how long should a patient stay to be included in the study?
3. A study finds that the time spent on advertisements per hour on a certain TV station is approximately normally distributed with a mean of 12.8 minutes and a standard deviation of 2.2 minutes. During a randomly selected hour, what is the probability that between 14 and 16 minutes were devoted to advertisements?
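Exercise 1 runs the usual procedure in reverse: the curve gives the proportion of grades between 70 and 80, and the count of 40 students is then used to recover the total. A minimal sketch, treating "between 70 and 80" as the continuous interval from 70 to 80 (no continuity correction); the variable names are illustrative.

```python
from statistics import NormalDist

grades = NormalDist(mu=74, sigma=10)

# Proportion of all examinees expected between 70 and 80 (z = -0.4 to z = 0.6)
p_between = grades.cdf(80) - grades.cdf(70)

# The 40 students correspond to that proportion, so total = 40 / proportion
students_in_band = 40
total_students = students_in_band / p_between
print(f"proportion = {p_between:.4f}, total is about {total_students:.1f}")
```

The proportion is about 0.3812, so roughly 105 students took the examination.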

Module 6 - Hypothesis Testing


Hypothesis: a statement that is formulated but cannot be accepted as true unless it is proven.
Assumption: a statement that is formulated and accepted as true without the need for proof. It serves as the springboard of the study.
Types of hypothesis: null and alternative.

Null hypothesis (Ho) vs Alternative hypothesis (Ha)


Ho: states a claim of no difference, no effect, no relationship, etc.
Ha: the hypothesis that claims against Ho.

Other key concepts


Types of test: one-tailed and two-tailed
Level of significance (α): commonly 0.01, 0.05, or 0.10
Observed value: the computed value of the test statistic based on the data gathered
Critical value: the value obtained from the table; it divides the distribution of the test statistic into the rejection region and the acceptance region

Critical Values of z
Level of significance   One-tailed   Two-tailed
0.05                    1.645        1.96
0.01                    2.33         2.575
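The critical values in the table come from the inverse of the standard normal cumulative distribution: for a one-tailed test all of α sits in one tail, and for a two-tailed test α/2 sits in each tail. A short sketch with `statistics.NormalDist.inv_cdf`; the helper `critical_z` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def critical_z(alpha: float, two_tailed: bool) -> float:
    """Positive critical z: area alpha in one tail, or alpha / 2 in each tail if two-tailed."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha, False), 3), round(critical_z(alpha, True), 3))
```

This prints 1.645 and 1.96 for α = 0.05, and 2.326 and 2.576 for α = 0.01; the table's 2.33 and 2.575 are the usual textbook roundings of the last two.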

Steps in hypothesis testing


1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test (one-tailed or two-tailed).
3. Write down all information from the problem.
4. Determine the appropriate test statistic. Find the critical value using the tables.
5. Compute the test statistic.
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and the test statistic may be useful.
7. Write the conclusion.

Bivariate Distribution
Involving two variables:
- Significant difference (t-test, z-test, ANOVA)
- Significant relationship (Pearson r, Spearman's rho, chi-square, and other correlational techniques)

Testing significant difference using parametric test (two groups)


t test
- distribution is normal
- homogeneous variance
- sample standard deviation is known
- n < 30

z test
- distribution is normal
- homogeneous variance
- population standard deviation is known
- n ≥ 30

t-test & z test (sample vs population)

$$ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \quad df = n - 1 $$

$$ z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} $$

Decision rules
If |observed value| < critical value, or p-value > level of significance (α): accept (fail to reject) the null hypothesis. There is not enough evidence to reject it.
If |observed value| ≥ critical value (in the direction stated by Ha for a one-tailed test), or p-value ≤ level of significance (α): reject the null hypothesis.

Sample Problems
1. A certain rice miller claims that the average weight of a cavan of rice is 50 kilograms with a standard deviation of 5 kilograms. A retailer sampled 20 cavans of this rice and got an average weight of 46.6 kilograms. Is the claim of the rice miller valid at the 5% level of significance?
2. A standardized test was administered to thousands of pupils, with a mean score of 85 and a standard deviation of 8. A random sample of 50 pupils was given the same test and obtained an average score of 83.20. Is there evidence that this group performs lower than the pupils in general, at the 0.05 level of significance?
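Both sample problems use the same one-sample statistic, (x̄ - μ)√n / sd. A minimal sketch under stated assumptions: Problem 1 is treated as a t test because n = 20 < 30 (per the conditions listed earlier), with the two-tailed critical value 2.093 for df = 19 taken from a standard t table; Problem 2 is a lower-tailed z test decided by its p-value. The helper `one_sample_statistic` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_statistic(sample_mean, mu, sd, n):
    """(sample mean - mu) * sqrt(n) / sd: read as t (df = n - 1) or z, per the module's rule."""
    return (sample_mean - mu) * math.sqrt(n) / sd

# Problem 1: rice miller, two-tailed, alpha = 0.05, n = 20 < 30, so treated as a t test
stat1 = one_sample_statistic(46.6, 50, 5, 20)
T_CRIT_TWO_TAILED_DF19 = 2.093  # from a t table (alpha = 0.05, two-tailed, df = 19)
reject1 = abs(stat1) >= T_CRIT_TWO_TAILED_DF19

# Problem 2: standardized test, lower-tailed z test, alpha = 0.05
stat2 = one_sample_statistic(83.20, 85, 8, 50)
p_value2 = NormalDist().cdf(stat2)  # lower tail: area to the left of z
reject2 = p_value2 <= 0.05

print(f"Problem 1: statistic = {stat1:.2f}, reject Ho: {reject1}")
print(f"Problem 2: z = {stat2:.2f}, p = {p_value2:.4f}, reject Ho: {reject2}")
```

The Problem 1 statistic is about -3.04, beyond both 1.96 and 2.093, so the miller's claim is rejected either way. In Problem 2, z is about -1.59 with p of about 0.056 > 0.05, so there is not enough evidence of lower performance.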

t-test & z test (two-sample groups)


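$$ t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}, \quad df = n_x + n_y - 2 $$

$$ z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} $$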

Sample Problems
1. A random sample of 20 newly-born baby boys showed an average weight of 7.4 pounds, while a sample of 25 newly-born baby girls showed a mean weight of 6.5 pounds. If the variance of the weights of all newly-born babies is 1.25 (pounds squared), can we say that newly-born baby boys are heavier than newly-born baby girls?
2. Two hamburger stores were compared in terms of the number of hamburger orders per day. The results of a ten-day observation were as follows:

Day   Nutri   Deli
1     148     150
2     126     127
3     103     125
4     169     152
5     135     129
6     152     146
7     144     153
8     124     118
9     132     126
10    128     119

Using the 0.05 level of significance, test if there is a significant difference in the number of orders of hamburger from the two stores.
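Both two-sample problems can be sketched with the formulas above, taking the hypothesized difference μx - μy as 0. In this sketch, Problem 1 uses the z formula with the known variance 1.25 for both groups, and Problem 2 uses the pooled-variance t formula with `statistics.variance` (the sample variance, divisor n - 1). The helper names are hypothetical.

```python
import math
from statistics import mean, variance

def two_sample_z(mean_x, mean_y, var_x, var_y, n_x, n_y):
    """z for two independent means with known population variances (mu_x - mu_y = 0)."""
    return (mean_x - mean_y) / math.sqrt(var_x / n_x + var_y / n_y)

def pooled_t(xs, ys):
    """Pooled-variance t for two independent samples (mu_x - mu_y = 0); returns (t, df)."""
    n_x, n_y = len(xs), len(ys)
    pooled_var = ((n_x - 1) * variance(xs) + (n_y - 1) * variance(ys)) / (n_x + n_y - 2)
    t = (mean(xs) - mean(ys)) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, n_x + n_y - 2

# Problem 1: boys vs girls, one-tailed at 0.05 (critical z = 1.645)
z_babies = two_sample_z(7.4, 6.5, 1.25, 1.25, 20, 25)

# Problem 2: Nutri vs Deli orders over ten days, two-tailed at 0.05
nutri = [148, 126, 103, 169, 135, 152, 144, 124, 132, 128]
deli = [150, 127, 125, 152, 129, 146, 153, 118, 126, 119]
t_burgers, df_burgers = pooled_t(nutri, deli)

print(f"z = {z_babies:.2f}; t = {t_burgers:.2f} with df = {df_burgers}")
```

Here z is about 2.68 > 1.645, so the boys' mean weight is significantly higher. t is about 0.22 with df = 18, far below the two-tailed critical value of about 2.101, so no significant difference in orders is detected. Because both stores were observed on the same ten days, a paired test would also be defensible; the sketch follows the independent-groups formula used in this section.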

t-test for paired observations (dependent groups)


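$$ t = \frac{\bar{d}}{\sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n(n - 1)}}}, \quad df = n - 1 $$

Where: d = difference between each pair of observations; d̄ = mean of the differences; n = number of pairs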

Sample Problem
A certain diet pill was developed by a pharmaceutical company. To test the efficacy of the pill, 10 individuals were randomly selected. The results of the study are presented in the following table:

Subject   Weight before   Weight after
1         148             150
2         142             139
3         131             130
4         128             128
5         121             123
6         118             115
7         120             119
8         152             151
9         112             110
10        110             105

Use hypothesis testing to determine whether the diet pill is effective or not.
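The paired t formula can be applied directly to the diet-pill data. A minimal sketch, assuming d = weight before - weight after (so weight loss is positive) and a one-tailed test, since an effective pill should reduce weight; the helper `paired_t` is hypothetical.

```python
import math

before = [148, 142, 131, 128, 121, 118, 120, 152, 112, 110]
after = [150, 139, 130, 128, 123, 115, 119, 151, 110, 105]

def paired_t(xs, ys):
    """t = mean(d) / sqrt((sum d^2 - (sum d)^2 / n) / (n(n - 1))), d = x - y, df = n - 1."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    standard_error = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
    return (sum_d / n) / standard_error, n - 1

t_diet, df_diet = paired_t(before, after)
print(f"t = {t_diet:.3f}, df = {df_diet}")
```

Here Σd = 12 and Σd² = 58, giving t of about 1.724 with df = 9. This is below the one-tailed critical value of about 1.833 at the 0.05 level, so the data do not show that the pill is effective.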

Module 7 - Correlation and Simple Linear Regression


1. Pearson's product-moment correlation coefficient (Pearson r)
2. Spearman's rank-order correlation coefficient (Spearman's ρ)

Pearson's product-moment correlation coefficient

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

Sample Problem
Determine whether there is a relationship between the number of years of service and the employees' monthly salary, based on the data gathered from a certain company.

No. of yrs. of service   Monthly salary (in thousands)
5                        25
7                        28
8                        29
10                       32
12                       34
2                        18
11                       32
15                       35
20                       40
25                       50
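The computational formula for Pearson r translates directly into code. A minimal sketch for the years-of-service and salary data; the helper `pearson_r` is hypothetical.

```python
import math

years = [5, 7, 8, 10, 12, 2, 11, 15, 20, 25]
salary = [25, 28, 29, 32, 34, 18, 32, 35, 40, 50]

def pearson_r(xs, ys):
    """[n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2][n*Syy - Sy^2]) using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(years, salary)
print(f"r = {r:.4f}")
```

The sums are Σx = 115, Σy = 323, Σx² = 1757, Σy² = 11103, and Σxy = 4244, giving r of about 0.98. The legend below classifies this as a very high positive correlation.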

Legend for Pearson's r and Spearman's rho

0.00 to 0.30   Little or no positive correlation
0.31 to 0.50   Low positive correlation
0.51 to 0.70   Moderately positive correlation
0.71 to 0.90   High positive correlation
0.91 to 1.00   Very high positive correlation


Testing the significance of the relationship

$$ t = r\sqrt{\frac{n - 2}{1 - r^2}}, \quad df = n - 2 $$

Where: r = the correlation coefficient; n = number of pairs; df = degrees of freedom
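The significance statistic for r needs only r and the number of pairs. A short sketch; the helper `t_for_r` is hypothetical, and it refuses |r| = 1, where the formula divides by zero.

```python
import math

def t_for_r(r: float, n: int):
    """t = r * sqrt((n - 2) / (1 - r^2)) with df = n - 2; requires |r| < 1 and n > 2."""
    if not -1 < r < 1 or n <= 2:
        raise ValueError("need |r| < 1 and n > 2")
    return r * math.sqrt((n - 2) / (1 - r * r)), n - 2

t_value, df = t_for_r(0.5, 10)
print(f"t = {t_value:.3f}, df = {df}")
```

The resulting t is compared with the critical t for df = n - 2. For example, r = 0.5 with 10 pairs gives t of about 1.633 with df = 8.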

Key concepts to remember


Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
The value of a correlation can be greatly affected by the range of scores represented in the data.



One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
A correlation measures the degree of relationship between two variables. The values of r range from -1.00 to +1.00.
The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from its relationship with the other variable.

Spearman's rank-order correlation coefficient (Spearman's ρ)

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Where: n = number of pairs; d = difference between the ranks of each pair

Statistic to test the significance of ρ

$$ z = \rho\sqrt{n - 1} $$

Sample Problem
Seven instructors are rated by freshmen and sophomore students on clarity of presentation and the results are tabulated. What is the Spearman rho for the following?

Instructor   1    2    3    4    5    6    7
Freshmen     44   39   36   35   33   29   22
Sophomore    58   42   18   22   31   38   38
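Spearman's ρ first ranks each set of ratings and then applies the formula to the rank differences. In this sketch the highest rating gets rank 1, and tied ratings (the two sophomore 38s) share the average of the ranks they occupy, so each gets 3.5. With ties, the 6Σd² formula is the usual textbook approximation rather than an exact Pearson correlation of the ranks. The helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d = difference between paired ranks."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

freshmen = [44, 39, 36, 35, 33, 29, 22]
sophomore = [58, 42, 18, 22, 31, 38, 38]
rho = spearman_rho(freshmen, sophomore)
z = rho * math.sqrt(len(freshmen) - 1)  # significance statistic z = rho * sqrt(n - 1)
print(f"rho = {rho:.4f}, z = {z:.3f}")
```

Here Σd² = 38.5, so ρ = 1 - 231/336 = 0.3125, a low positive correlation on the legend, and z is about 0.77.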


Regression Analysis
Regression analysis is a statistical technique used to describe relationships among variables. The relationship is expressed in the form of a mathematical equation. The simplest case is a single independent variable (X) explaining the dependent variable (Y) in a linear fashion:

$$ Y = a + bX $$

Where a is the intercept and b is the slope, or the change in Y when X increases by one unit.
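The section does not show how a and b are obtained. The standard choice, and the one Excel's regression tool reports, is ordinary least squares: b = Sxy / Sxx and a = ȳ - b·x̄. A minimal sketch; the helper `least_squares_line` and the toy data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for Y = a + bX by ordinary least squares: b = Sxy / Sxx, a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Toy points that lie exactly on Y = 1 + 2X
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the toy points lie exactly on a line, the fit recovers intercept 1 and slope 2; with real data the line minimizes the sum of squared vertical distances from the points.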

Key concepts to remember


Regression analysis is the most widely used technique of multivariate analysis, with applications across all types of problems and all disciplines. It is a statistical technique concerned with describing and evaluating the relationship between a metric variable, called the dependent variable, and one or more metric or non-metric variables, called independent variables or regressors. It attempts to predict the change in the dependent variable resulting from changes in the independent variables. In addition, analysis of the independent variables allows assessment of their respective explanatory impact on the dependent variable.

NSAT   Achievement Grade
78     82
79     83
80     82
92     91
93     94
86     88
86     85
87     86

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.95
R Square            0.90
Adjusted R Square   0.88666
Standard Error      1.45918
Observations        8

ANOVA
             df   SS         MS        F         Significance F
Regression   1    118.7248   118.725   55.7605   0.0003
Residual     6    12.77516   2.1292
Total        7    131.5

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    25.46          8.16             3.12     0.02      5.50        45.42
NSAT         0.71           0.10             7.47     0.00      0.48        0.95

Using the formula:

Predicted achievement grade = 25.46 + 0.71(70) ≈ 75.38 (the unrounded slope is about 0.713; with the rounded coefficients shown, 25.46 + 0.71 × 70 = 75.16).

The R-squared value, 0.9029, indicates that about 90.29% of the variation in achievement grades is explained by the NSAT scores.
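Most of the regression statistics in the Excel output follow from the ANOVA sums of squares. This sketch recomputes them from the SS values in the SUMMARY OUTPUT (nothing else is assumed) and evaluates the prediction with the rounded coefficients.

```python
# Values copied from the SUMMARY OUTPUT (ANOVA section)
ss_regression = 118.7248
ss_residual = 12.77516
df_regression, df_residual = 1, 6
n = 8  # observations

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total  # coefficient of determination
multiple_r = r_squared ** 0.5
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 1 - df_regression)
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
std_error = ms_residual ** 0.5  # standard error of the estimate

predicted = 25.46 + 0.71 * 70  # rounded coefficients from the output
print(r_squared, multiple_r, adj_r_squared, f_stat, std_error, predicted)
```

The results (R² of about 0.9029, Multiple R of about 0.950, adjusted R² of about 0.88666, F of about 55.76, standard error of about 1.45918) match the output, which shows that R² is simply SS regression divided by SS total.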

Module 8 - Selected Nonparametric Statistics


Chi-square test (χ²)
Mann-Whitney U test
Kruskal-Wallis H test

Chi-square Test
Significant relationship
- Test of goodness-of-fit
- Test of independence

$$ \chi^2 = \sum \frac{(OF - EF)^2}{EF}, \quad df = (r - 1)(c - 1) $$

Where: OF = observed frequency; EF = expected frequency; r = number of rows; c = number of columns

Sample problem:
Suppose we want to find out whether there is a relationship between students' color preference and personality. The data may be illustrated in the contingency table below.

Observed frequencies:
             Red   Yellow   Green   Blue   Total
Introvert    10    3        15      22     50
Extrovert    90    17       25      18     150
Total        100   20       40      40     200

Grand total (n) = 200; row totals (fr) = 50 and 150; column totals (fc) = 100, 20, 40, and 40.

To determine the expected frequency for each cell, we use the formula below:

$$ ef = \frac{(f_r)(f_c)}{n} $$

Where: ef = expected frequency; fr = total frequency of the corresponding row; fc = total frequency of the corresponding column; n = grand total
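The expected-frequency formula and the χ² sum can be combined in a few lines. A minimal sketch for the color-preference table; the helper `chi_square_independence` is hypothetical.

```python
observed = [
    [10, 3, 15, 22],   # Introvert: Red, Yellow, Green, Blue
    [90, 17, 25, 18],  # Extrovert
]

def chi_square_independence(table):
    """Return (chi-square, df) using ef = (fr)(fc) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, of in enumerate(row):
            ef = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (of - ef) ** 2 / ef
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The expected frequencies are 25, 5, 10, 10 (introvert) and 75, 15, 30, 30 (extrovert), giving χ² = 35.6 with df = 3. This far exceeds the 0.05 critical value of about 7.815, so color preference and personality appear to be related.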

Testing significant difference using nonparametric test (two groups)


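Mann-Whitney U test

$$ U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - R_1 $$

$$ U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - R_2 $$

Where: N1 and N2 = sizes of the two groups; R1 and R2 = sums of the ranks of each group in the combined ranking; U is the lower value between U1 and U2.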

Example:
Treatment   4    7    1    12   2    2    9
Control     20   17   3    15   7    12   18
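For the example, both groups are ranked together from smallest (rank 1) to largest, with tied values sharing the average rank, and then U1 and U2 are computed from the rank sums. A minimal sketch; the helper names are hypothetical.

```python
def rank_with_ties(values):
    """Ascending ranks (smallest = 1); tied values get the average of their positions."""
    positions = {}
    for idx, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(idx)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def mann_whitney_u(group1, group2):
    """Return (U, U1, U2), where U is the smaller of U1 and U2."""
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))
    r1 = sum(ranks[:n1])  # rank sum of group 1
    r2 = sum(ranks[n1:])  # rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2

treatment = [4, 7, 1, 12, 2, 2, 9]
control = [20, 17, 3, 15, 7, 12, 18]
u, u1, u2 = mann_whitney_u(treatment, control)
print(f"U1 = {u1}, U2 = {u2}, U = {u}")
```

The rank sums are R1 = 35 and R2 = 70, giving U1 = 42, U2 = 7, and U = 7 (as a check, U1 + U2 = N1·N2 = 49). For N1 = N2 = 7, standard tables give a two-tailed critical U of 8 at the 0.05 level; U = 7 ≤ 8 leads to rejecting the null hypothesis.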

Testing significant difference (3 or more groups)


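Parametric test (distribution is normal): ANOVA (analysis of variance)
Nonparametric test: Kruskal-Wallis H test

$$ H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1), \quad df = k - 1 $$

Where: k = number of groups; n_i = size of group i; R_i = sum of the ranks of group i; n = total number of observations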

To compare four bowling balls, a professional bowler bowls five games with each ball and gets the following scores:
Bowling ball A   Bowling ball B   Bowling ball C   Bowling ball D
221              202              210              229
232              225              205              192
207              252              189              247
198              218              196              220
212              226              216              208

Use the H test at the 0.05 level of significance to test the null hypothesis that, on average, the bowler performs equally well with the four bowling balls.
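The H statistic can be computed by ranking all 20 scores together and summing the ranks per ball. This sketch assumes the scores are read row by row, so that each row is one game with balls A to D in column order; a different reading of the original layout would change the result. It also omits the tie correction, which is not needed here because all 20 scores differ. The helper `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(groups):
    """H = 12 / (n(n + 1)) * sum(R_i^2 / n_i) - 3(n + 1), df = k - 1 (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    positions = {}
    for idx, v in enumerate(pooled, start=1):
        positions.setdefault(v, []).append(idx)
    rank = {v: sum(p) / len(p) for v, p in positions.items()}  # average rank for ties
    n = len(pooled)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

# Assumed reading: each printed row is one game, columns are balls A, B, C, D
ball_a = [221, 232, 207, 198, 212]
ball_b = [202, 225, 252, 218, 226]
ball_c = [210, 205, 189, 196, 216]
ball_d = [229, 192, 247, 220, 208]
h, df = kruskal_wallis_h([ball_a, ball_b, ball_c, ball_d])
print(f"H = {h:.3f}, df = {df}")
```

Under this reading, the rank sums are 53, 68, 30, and 59, so H is about 4.51 with df = 3. This is below the 0.05 chi-square critical value of about 7.815, so the null hypothesis that the bowler performs equally well with all four balls is not rejected.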
