Statistics For College Students-Part 2

OVERVIEW
STATISTICS Modules 5-8

Prepared by: Mrs. Cristina H. Price
Module 5 - The Normal Curve

The normal curve, or normal frequency distribution, is a hypothetical distribution of scores that is widely used in statistical analysis. Since many psychological and physical measurements are approximately normally distributed, the concept of the normal curve can be applied to many kinds of scores. The characteristics of the normal curve make it useful in education and in the physical and social sciences.

Characteristics of the Normal Curve
Some of the important characteristics of the normal curve are:
1. The normal curve is a symmetrical distribution of scores, with an equal number of scores above and below the midpoint of the abscissa (the horizontal axis of the curve).
2. The tails of the curve are asymptotic to the horizontal axis.
3. Since the distribution of scores is symmetrical, the mean, median, and mode all fall at the same point on the abscissa. In other words, mean = median = mode.
4. If we divide the distribution into standard deviation units, a known proportion of scores lies within each portion of the curve. The total area under the curve is equal to 1.
5. Tables exist so that we can find the proportion of scores above and below any part of the curve, expressed in standard deviation units. Scores expressed in standard deviation units are referred to as z-scores.
Standard score
It is the distance of an observed value (x) from the mean in terms of the standard deviation. It tells how many standard deviations the observed value lies above or below the mean of its distribution.
$z = \frac{x - \bar{x}}{s}$
OR
$z = \frac{x - \mu}{\sigma}$
Where:
x = observed value or raw score
$\bar{x}$ = sample mean
s = sample standard deviation
$\mu$ = population mean
$\sigma$ = population standard deviation
Using MS Excel's Statistical Functions

Conversion of a raw score to a standard score
=STANDARDIZE(x, mean, standard_dev)
Finding the area (cumulative probability to the left of z) given the value of z
=NORMSDIST(z)
Finding the value of z given the cumulative probability
=NORMSINV(probability)
Sample problem
The average daily income of 2000 workers is P362.00 with a standard deviation of P15.00. Assuming that the daily incomes are normally distributed, a) what percent of the workers earn at least P380.00 per day? b) what percent of the workers earn below P350.00 per day? c) determine the number of workers who earn from P350.00 to P375.00 per day.
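The sample problem can be checked numerically. The following is a minimal sketch in Python, assuming the standard-library `statistics.NormalDist` class as a stand-in for the Excel functions; the variable names are illustrative and not part of the original module.

```python
from statistics import NormalDist

# Values taken from the sample problem.
MEAN_INCOME = 362.0
SD_INCOME = 15.0
N_WORKERS = 2000

income = NormalDist(mu=MEAN_INCOME, sigma=SD_INCOME)


def z_score(x, mean, sd):
    """Standard score: the same quantity as STANDARDIZE(x, mean, sd)."""
    return (x - mean) / sd


# a) proportion of workers earning at least P380.00
p_at_least_380 = 1.0 - income.cdf(380.0)

# b) proportion of workers earning below P350.00
p_below_350 = income.cdf(350.0)

# c) expected number of workers earning from P350.00 to P375.00
p_between = income.cdf(375.0) - income.cdf(350.0)
workers_between = N_WORKERS * p_between

print(f"z for P380: {z_score(380.0, MEAN_INCOME, SD_INCOME):.2f}")
print(f"a) {p_at_least_380:.2%}")
print(f"b) {p_below_350:.2%}")
print(f"c) about {workers_between:.0f} workers")
```

Under these assumptions, part (a) comes to roughly 11.5%, part (b) to roughly 21.2%, and part (c) to roughly 1,190 workers.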
Exercises
1. In a departmental examination in statistics, the mean grade was 74 and the standard deviation was 10. If the grades are approximately normally distributed and 40 students got grades between 70 and 80, how many students took the examination?
2. The experience of a certain hospital showed that the distribution of the length of stay of its patients is normal with a mean of 11.5 days and a standard deviation of 2 days.
a) What percent of the patients stayed 9 days or less?
b) If a new method in nursing care is to be administered to the middle 95% of the group, how long should a patient stay to be included in the study?
3. A study finds that the time spent on advertisements per hour on a certain TV station is approximately normally distributed with a mean of 12.8 minutes and a standard deviation of 2.2 minutes. During a randomly selected hour, what is the probability that between 14 and 16 minutes were devoted to advertisements?
Module 6 - Hypothesis Testing

Hypothesis: a statement that is formulated but cannot be accepted as true unless proven.
Assumption: a statement that is formulated and accepted as true without the necessity of proof. It serves as the springboard of the study.
Types of hypothesis: null and alternative
Null hypothesis (Ho) vs Alternative hypothesis (Ha)

Ho: states a claim such as no difference, no effect, or no relationship.
Ha: the hypothesis that contradicts Ho.
Other key concepts

Types of test: one-tailed and two-tailed
Level of significance: alpha (α), commonly 0.01, 0.05, or 0.10
Observed value: the value computed from the data gathered
Critical value: the value obtained from the table; it divides the distribution of the test statistic into the rejection region and the acceptance region
Critical Values of z
Test Type     Level of significance
              0.05      0.01
One-tailed    1.645     2.33
Two-tailed    1.96      2.575
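The tabulated critical values can be reproduced from the inverse of the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper name `z_critical` is an assumption introduced for illustration.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1.0 - alpha / tails)


for tails, label in ((1, "One-tailed"), (2, "Two-tailed")):
    values = [round(z_critical(alpha, tails), 3) for alpha in (0.05, 0.01)]
    print(label, values)
```

The computed values agree with the table to about two decimal places; printed tables commonly round 2.326 to 2.33 and 2.576 to 2.575.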
Steps in hypothesis testing

1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test.
3. Write down all information from the problem.
4. Determine the appropriate test statistic. Find the critical value using the tables.
5. Compute the test statistic.
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and the test statistic may be useful.
7. Write the conclusion.
Bivariate Distribution
Involving two variables
- Significant difference (t-test, z-test, ANOVA)
- Significant relationship (Pearson r, Spearman's rho, chi-square, and other correlational techniques)
Testing significant difference using parametric test (two groups)

t test
- distribution is normal
- homogeneous variance
- sample std. deviation is known
- n < 30

z test
- distribution is normal
- homogeneous variance
- population std. deviation is known
- n ≥ 30
t-test & z test (sample vs population)
$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}$ ; df = n - 1
$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$
Decision rules
If |observed value| < critical value OR p-value > level of significance (α): accept the null hypothesis (there is not enough evidence to reject the null hypothesis).
If |observed value| ≥ critical value OR p-value < level of significance (α): reject the null hypothesis.
Sample Problems
1. A certain rice miller claims that the average weight of a cavan of rice is 50 kilograms with a standard deviation of 5 kilograms. A retailer sampled 20 cavans of this rice and got an average weight of 46.6 kilograms. Is the claim of the rice miller valid using the 5% level of significance?
2. A standardized test was administered to thousands of pupils with a mean score of 85 and a standard deviation of 8. A random sample of 50 pupils was given the same test and showed an average score of 83.20. Is there evidence to show that this group has a lower performance than the ones in general at the 0.05 level of significance?
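Both sample problems can be worked with the one-sample z formula. The sketch below assumes the stated standard deviations are population values, so the z test is used for both (even though n = 20 in Problem 1), that Problem 1 is two-tailed, and that Problem 2 is left-tailed. The function name `one_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (x_bar - mu) * sqrt(n) / sigma."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd


# Problem 1: Ho: mu = 50 vs Ha: mu != 50 (two-tailed, assumed).
z1 = one_sample_z(46.6, 50.0, 5.0, 20)
crit1 = STD_NORMAL.inv_cdf(1.0 - ALPHA / 2)
reject1 = abs(z1) >= crit1

# Problem 2: Ho: mu = 85 vs Ha: mu < 85 (left-tailed, assumed).
z2 = one_sample_z(83.20, 85.0, 8.0, 50)
crit2 = STD_NORMAL.inv_cdf(ALPHA)  # negative critical value
reject2 = z2 <= crit2
p_value2 = STD_NORMAL.cdf(z2)      # left-tail p-value

print(f"Problem 1: z = {z1:.3f}, critical = ±{crit1:.3f}, reject Ho: {reject1}")
print(f"Problem 2: z = {z2:.3f}, critical = {crit2:.3f}, "
      f"p = {p_value2:.4f}, reject Ho: {reject2}")
```

Under these assumptions, Problem 1 rejects Ho (z is about -3.04), while Problem 2 does not (z is about -1.59, which does not reach -1.645).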
t-test & z test (two-sample groups)

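$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}$ ; df = $n_x + n_y - 2$
$z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$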
Sample Problems
1. A random sample of 20 newly-born baby boys showed an average weight of 7.4 pounds, while a sample of 25 newly-born baby girls showed a mean weight of 6.5 pounds. If the variance of the weights of all newly-born babies is 1.25 square pounds, can we say that newly-born baby boys are heavier than newly-born baby girls?
2. Two hamburger stores were compared in terms of the number of orders of hamburger per day. The results of the ten-day observation were as follows:
Day Nutri Deli
1 148 150
2 126 127
3 103 125
4 169 152
5 135 129
6 152 146
7 144 153
8 124 118
9 132 126
10 128 119
Using the 0.05 level of significance, test if there is a significant difference in the number of orders of hamburger from the two stores.
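The two-sample formulas can be applied to both problems as follows. This is a sketch only: it treats the two stores as independent groups (as the section heading suggests), takes mu_x - mu_y = 0 under Ho, and uses a critical t value copied from a standard t table rather than computed. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled-variance t statistic for two independent samples, with its df."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def two_sample_z(mean_x, mean_y, var_x, var_y, nx, ny):
    """z statistic for two samples with known population variances."""
    return (mean_x - mean_y) / sqrt(var_x / nx + var_y / ny)


# Problem 2: daily hamburger orders over ten days.
nutri = [148, 126, 103, 169, 135, 152, 144, 124, 132, 128]
deli = [150, 127, 125, 152, 129, 146, 153, 118, 126, 119]
t_stat, df = pooled_t(nutri, deli)
T_CRIT_TWO_TAILED_05_DF18 = 2.101  # from a t table, two-tailed, alpha = 0.05
reject_stores = abs(t_stat) >= T_CRIT_TWO_TAILED_05_DF18

# Problem 1: baby weights, common variance 1.25 assumed for both groups.
z_babies = two_sample_z(7.4, 6.5, 1.25, 1.25, 20, 25)

print(f"Stores: t = {t_stat:.3f}, df = {df}, reject Ho: {reject_stores}")
print(f"Babies: z = {z_babies:.3f}")
```

With these inputs the store comparison gives a t of about 0.22 on 18 degrees of freedom, well inside the acceptance region, while the baby-weight comparison gives a z of about 2.68.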
t-test for paired observations (dependent groups)

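$t = \frac{\bar{d}}{\sqrt{\frac{\sum d^2 - n\bar{d}^2}{n(n - 1)}}}$ ; df = n - 1
Where:
d = difference between the paired observations
$\bar{d}$ = mean of the differences
n = number of pairs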
Sample Problem
A certain diet pill was developed by a pharmaceutical company. To test the efficacy of the pill, 10 individuals were randomly selected. The results of the study are presented in the following table:
SUBJECT  WEIGHT BEFORE  WEIGHT AFTER
1  148  150
2  142  139
3  131  130
4  128  128
5  121  123
6  118  115
7  120  119
8  152  151
9  112  110
10  110  105
Use hypothesis testing to determine whether the diet pill is effective or not.
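A possible worked solution for the diet pill problem is sketched below. It assumes the first ten values in the table are the weights before and the next ten are the weights after, a one-tailed test (Ha: the pill reduces weight), and a 0.05 level of significance, which the problem does not state. The critical value is copied from a standard t table.

```python
from math import sqrt

before = [148, 142, 131, 128, 121, 118, 120, 152, 112, 110]
after = [150, 139, 130, 128, 123, 115, 119, 151, 110, 105]


def paired_t(first, second):
    """Paired t statistic: d_bar / sqrt((sum d^2 - n d_bar^2) / (n (n - 1)))."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    sum_sq = sum(v * v for v in d)
    std_error = sqrt((sum_sq - n * d_bar ** 2) / (n * (n - 1)))
    return d_bar / std_error, n - 1


t_stat, df = paired_t(before, after)
T_CRIT_ONE_TAILED_05_DF9 = 1.833  # from a t table, one-tailed, alpha = 0.05
reject = t_stat >= T_CRIT_ONE_TAILED_05_DF9

print(f"t = {t_stat:.3f}, df = {df}, reject Ho: {reject}")
```

Under these assumptions t is about 1.72 with 9 degrees of freedom, just short of the tabulated 1.833, so Ho is not rejected at the 0.05 level.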
Module 7 - Correlation and Simple Linear Regression

1. Pearson's product-moment correlation coefficient (Pearson r)
2. Spearman's rank-order correlation coefficient (Spearman's ρ)
Pearson's product-moment correlation coefficient
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Sample Problem
Determine if there is a relationship between the number of years of service and the employees' monthly salary based on the data gathered from a certain company.
No. of yrs. of service  Monthly salary (in T)
5  25
7  28
8  29
10  32
12  34
2  18
11  32
15  35
20  40
25  50
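The computing formula for r can be applied to the sample data as sketched below. The pairing of years with salaries is an assumption: the source table is flattened, and this sketch reads it as two side-by-side blocks of five employees each (years 5 to 12 with salaries 25 to 34, then years 2 to 25 with salaries 18 to 50).

```python
from math import sqrt

# Assumed pairing of the flattened table (see the note above).
years = [5, 7, 8, 10, 12, 2, 11, 15, 20, 25]
salary = [25, 28, 29, 32, 34, 18, 32, 35, 40, 50]  # in the source's unit "T"


def pearson_r(x, y):
    """Pearson r from the raw-score (computing) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(years, salary)
print(f"r = {r:.4f}")
```

With this pairing r comes to about 0.98, which falls in the "very high positive correlation" band of the legend.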
Legend for Pearson's r and Spearman's rho

0.00 to 0.30  Little or no positive correlation
0.31 to 0.50  Low positive correlation
0.51 to 0.70  Moderately positive correlation
0.71 to 0.90  High positive correlation
0.91 to 1.00  Very high positive correlation
Testing the significance of the relationship
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$ ; df = n - 2
Where:
r = the correlation coefficient
n = no. of pairs
df = degrees of freedom
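As a quick illustration of the significance test for r, the sketch below evaluates the t statistic for hypothetical values r = 0.98 and n = 10. These inputs are chosen only for demonstration, and the function name is illustrative.

```python
from math import sqrt


def correlation_t(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("at least 3 pairs are needed")
    return r * sqrt((n - 2) / (1.0 - r * r)), n - 2


# Hypothetical inputs for illustration only.
t_stat, df = correlation_t(0.98, 10)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t is then compared with the critical t for df = n - 2 at the chosen level of significance.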
Key concepts to remember

- Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
- The value of a correlation can be affected greatly by the range of scores represented in the data.
- One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
- A correlation measures the degree of relationship between two variables. The values of r range from -1.00 to +1.00.
- The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
Spearman's rank-order correlation coefficient (Spearman's ρ)
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Where:
n = no. of pairs
d = difference between the ranks of each pair
Statistic to test the significance of ρ
$z = \rho\sqrt{n - 1}$
Sample Problem
Seven instructors are rated by freshmen and sophomore students on clarity of presentation and the results are tabulated. What is the Spearman rho for the following?
Instructor 1 2 3 4 5 6 7
Freshmen 44 39 36 35 33 29 22
Sophomore 58 42 18 22 31 38 38
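The instructor ratings can be run through the Spearman formula as sketched below. The ranking convention (highest rating gets rank 1, tied ratings share the average rank) is an assumption, and with the tie in the sophomore ratings the shortcut formula is only approximate.

```python
from math import sqrt

freshmen = [44, 39, 36, 35, 33, 29, 22]
sophomore = [58, 42, 18, 22, 31, 38, 38]


def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # rank of the first tied value
        count = ordered.count(v)       # number of tied observations
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), d = rank difference per pair."""
    n = len(x)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


rho = spearman_rho(freshmen, sophomore)
z = rho * sqrt(len(freshmen) - 1)  # statistic for testing the significance of rho
print(f"rho = {rho:.4f}, z = {z:.4f}")
```

Under this convention the sum of squared rank differences is 38.5, giving a rho of 0.3125 and a z of about 0.77.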
Regression Analysis
Regression analysis is a statistical technique used to describe relationships among variables. This relationship is expressed in the form of a mathematical equation. The simplest case of such a relationship is when there is a single independent variable (X) explaining the dependent variable (Y) in a linear fashion.
y = a + bx
Where a is the intercept and b is the slope or the incremental change in Y when X changes by one unit.

Regression analysis is the most widely used technique of multivariate analysis, with applications across all types of problems and all disciplines. It is a statistical technique that is concerned with describing and evaluating the relationship between a metric variable called the dependent variable and one or more metric or non-metric variables called independent variables or regressors. It attempts to predict the change in the dependent variable as a result of changes in the independent variables. In addition, the analysis of the independent variables allows assessment of their respective explanatory impact on the dependent variable.
NSAT  Achievement Grade
78  82
79  83
80  82
92  91
93  94
86  88
86  85
87  86

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.95
R Square           0.90
Adjusted R Square  0.88666
Standard Error     1.45918
Observations       8

ANOVA
            df  SS        MS       F        Significance F
Regression  1   118.7248  118.725  55.7605  0.0003
Residual    6   12.77516  2.1292
Total       7   131.5

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  25.46         8.16            3.12    0.02     5.50       45.42
NSAT       0.71          0.10            7.47    0.00     0.48       0.95
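Using the formula:

Predicted Achievement grade for an NSAT score of 70 = 25.46 + 0.71 * 70 = 75.38 (presumably computed with the unrounded coefficients; the rounded values shown give 75.16).
The value of r-squared indicates the proportion of variability in the achievement grade that can be determined from its relationship with the NSAT score. Here r-squared = 118.7248 / 131.5 = 0.9029, so about 90.29% of the variability in achievement grade is associated with the NSAT score.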
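The prediction and the r-squared value can be reproduced from the figures in the summary output. This sketch uses the coefficients as rounded in the output (25.46 and 0.71), so its prediction differs slightly from a result based on unrounded coefficients. The names are illustrative.

```python
# Rounded coefficients and sums of squares as they appear in the summary output.
INTERCEPT = 25.46
SLOPE = 0.71
SS_REGRESSION = 118.7248
SS_TOTAL = 131.5


def predict_achievement(nsat_score):
    """Fitted line y = a + b x with the rounded coefficients."""
    return INTERCEPT + SLOPE * nsat_score


# Coefficient of determination: share of total variability explained.
r_squared = SS_REGRESSION / SS_TOTAL

print(f"Predicted grade at NSAT 70: {predict_achievement(70):.2f}")
print(f"r-squared: {r_squared:.4f}")
```

With the rounded coefficients the prediction at an NSAT score of 70 is 75.16, and r-squared is about 0.9029.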
Module 8 - Selected Nonparametric Statistics

- Chi-square test (χ²)
- Mann-Whitney U test
- Kruskal-Wallis H test
Chi-square Test
- Significant relationship
- Test of goodness-of-fit
- Test of independence
$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$ ; df = (r - 1)(c - 1)
Where:
OF = observed frequency
EF = expected frequency
r = number of rows
c = number of columns
Sample problem:
Suppose we want to find out if there is a relationship between the students' color preference and personality. The data may be illustrated in the contingency table below:
Observed frequencies:
           Red  Yellow  Green  Blue  Total
Introvert  10   3       15     22    50
Extrovert  90   17      25     18    150
Total      100  20      40     40    200

The Total column holds the row totals (fr), the Total row holds the column totals (fc), and 200 is the grand total (n).
To determine the expected frequency for each cell, we use the formula below:
$ef = \frac{(f_r)(f_c)}{n}$
Where:
ef = expected frequency
fr = total frequency of the corresponding row
fc = total frequency of the corresponding column
n = grand total
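The expected frequencies and the chi-square statistic for the color preference table can be computed as sketched below. The dictionary layout and variable names are illustrative, and the critical value is copied from a standard chi-square table.

```python
# Observed frequencies; columns are Red, Yellow, Green, Blue.
observed = {
    "Introvert": [10, 3, 15, 22],
    "Extrovert": [90, 17, 25, 18],
}

row_totals = {name: sum(row) for name, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

expected = {}
chi_square = 0.0
for name, row in observed.items():
    # ef = (row total)(column total) / grand total
    expected[name] = [row_totals[name] * fc / n for fc in col_totals]
    chi_square += sum((of - ef) ** 2 / ef
                      for of, ef in zip(row, expected[name]))

df = (len(observed) - 1) * (len(col_totals) - 1)
CHI2_CRIT_05_DF3 = 7.815  # from a chi-square table, alpha = 0.05, df = 3
reject = chi_square >= CHI2_CRIT_05_DF3

print(expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, reject Ho: {reject}")
```

For this table the expected frequencies are 25, 5, 10, 10 for introverts and 75, 15, 30, 30 for extroverts, and the statistic is 35.6 on 3 degrees of freedom, so Ho (no relationship) is rejected at the 0.05 level.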
Testing significant difference using nonparametric test (two groups)

Mann-Whitney U test
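$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - R_1$
$U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - R_2$
Where: U is the lower value between U1 and U2, N1 and N2 are the sample sizes, and R1 and R2 are the sums of the ranks in each group.
Example:
Treatment  4  7  1  12  2  2  9
Control  20  17  3  15  7  12  18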
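The example data can be worked through the U formulas as follows. The sketch assumes the pooled observations are ranked from lowest to highest, with tied values sharing the average rank. The helper name `average_ranks` is illustrative.

```python
treatment = [4, 7, 1, 12, 2, 2, 9]
control = [20, 17, 3, 15, 7, 12, 18]


def average_ranks(values):
    """Map each value to its ascending rank; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # rank of the first tied value
        count = ordered.count(v)       # number of tied observations
        rank_of[v] = first + (count - 1) / 2
    return rank_of


rank_of = average_ranks(treatment + control)
n1, n2 = len(treatment), len(control)
r1 = sum(rank_of[v] for v in treatment)  # rank sum of the treatment group
r2 = sum(rank_of[v] for v in control)    # rank sum of the control group

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}")
```

Here R1 = 35 and R2 = 70, so U1 = 42, U2 = 7, and U = 7; as a check, U1 + U2 equals N1 N2 = 49. The value of U is then compared with the tabulated critical value for the two sample sizes.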
Testing significant difference (3 or more groups)

Parametric test (distribution is normal): ANOVA (Analysis of variance)
Nonparametric test: Kruskal-Wallis

$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1)$ ; df = k - 1
Where:
k = number of groups
$n_i$ = number of observations in group i
n = total number of observations
$R_i$ = sum of the ranks in group i
To compare four bowling balls, a professional bowler bowls five games with each ball and gets the following scores:
Bowling ball A  Bowling ball B  Bowling ball C  Bowling ball D
221  202  210  229
232  225  205  192
207  252  189  247
198  218  196  220
212  226  216  208
Use the H test at 0.05 level of significance to test the null hypothesis that on the average the bowler performs equally well with the four bowling balls.
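A possible computation of H for the bowling data is sketched below. It assumes the flattened table is read row by row (each row is one game, with columns A to D), that scores are ranked from lowest to highest, and that the rank sums are divided by the group sizes as in the standard Kruskal-Wallis statistic. The critical value is copied from a standard chi-square table.

```python
# Assumed row-wise reading of the score table: five games per ball.
balls = {
    "A": [221, 232, 207, 198, 212],
    "B": [202, 225, 252, 218, 226],
    "C": [210, 205, 189, 196, 216],
    "D": [229, 192, 247, 220, 208],
}


def average_ranks(values):
    """Map each value to its ascending rank; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2
    return rank_of


all_scores = [s for scores in balls.values() for s in scores]
rank_of = average_ranks(all_scores)
n = len(all_scores)

rank_sums = {name: sum(rank_of[s] for s in scores)
             for name, scores in balls.items()}

h = (12 / (n * (n + 1))
     * sum(rank_sums[name] ** 2 / len(scores)
           for name, scores in balls.items())
     - 3 * (n + 1))
df = len(balls) - 1
CHI2_CRIT_05_DF3 = 7.815  # from a chi-square table, alpha = 0.05, df = 3
reject = h >= CHI2_CRIT_05_DF3

print(rank_sums)
print(f"H = {h:.4f}, df = {df}, reject Ho: {reject}")
```

Under this reading the rank sums are 53, 68, 30, and 59, giving an H of about 4.51 on 3 degrees of freedom, which is below 7.815, so the null hypothesis of equal performance is not rejected.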
