
Statistics
• Collection of data - the process of obtaining numerical measurements
• Tabulation or presentation of data - the organization of data into tables, graphs or charts so that logical and statistical conclusions can be drawn
• Analysis of data - the process of extracting from the given data relevant information from which a numerical description can be formulated
• Interpretation of data - the task of drawing conclusions from the analyzed data and formulating forecasts or predictions
Statistics
• Descriptive - measures of central tendency, variability, skewness and kurtosis
• Inferential - a higher order of critical judgment and mathematical methods (testing of hypothesis, z-test and t-test, correlation, analysis of variance, chi-square test, regression analysis and time series analysis)
Uses of Statistics
• Education
• Sociology
• Business
• Sports
• Economics
• Medicine
• Psychology
• Government
Terminologies
• Data- a set of observations, values, elements
or objects under consideration
• Population (N)- complete set of all possible
observations or elements.
• Universe- the set of all entities under study
• Sample (n)- a representative subset of a population
• Variable- attribute of interest observable on
each entity in the universe
Types of data
1. Qualitative data (attributes) - categories
   • Students according to sex
   • Age level
   • Curriculum level
2. Quantitative data (variables) - the result of counting or measuring
   • Discrete (whole numbers) - e.g. the number of students in the classroom
   • Continuous - e.g. the dimensions of a room, 8.5 m x 10 m x 3.4 m
[Figure: a sample is drawn from the population. The sample statistic x̄ estimates the population parameter μ (mu), either as a point estimate or as an interval estimate.]
Sample size formula

$$n = \frac{N}{1 + N e^2}$$

Where:
n - sample size
N - population size
e - margin of error

Given: Population = 15,000; margin of error = 5% (0.05)
Solve for the sample size (n):

$$n = \frac{15{,}000}{1 + (15{,}000)(0.05)^2} = \frac{15{,}000}{1 + (15{,}000)(0.0025)} = \frac{15{,}000}{1 + 37.5} = \frac{15{,}000}{38.5} = 389.61$$

say 390 or 400
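The sample size computation above can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_size` is a hypothetical helper chosen for illustration, and the formula is the one shown above (commonly known as Slovin's formula).

```python
import math


def sample_size(population: int, margin_of_error: float) -> float:
    """Return n = N / (1 + N * e**2), unrounded."""
    return population / (1 + population * margin_of_error ** 2)


n = sample_size(15_000, 0.05)
print(round(n, 2))   # 389.61
print(math.ceil(n))  # 390, rounding up to a whole respondent
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.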
Scales of Measurement
1. Nominal Scale – numbers are substituted for the names of the various classes of the variable; they indicate kind and not degree (categorical data)
• Color (red = 1, yellow = 2, blue = 3, ...)
• Size (small = 1, medium = 2, ...)
• Sex (male = 0, female = 1)
• Race
• Civil status
Continuation of Scales of Measurement

2. Ordinal Scales (rank, degree of variable)


• Height ( tall = 1, taller=2, tallest=3)
• Educational Attainment
– None =0
– elementary =1
– High school =2
– Diploma =3
– BS=4
– Masters Degree =5
– Doctorate Degree= 6
Continuation of Scales of Measurement

• 3. Interval scales - degree of difference or distance between observations. Unchanged by linear transformation.

Grade   Rank
90      1
70      2
60      3

Difference
r1 & r2   20
r1 & r3   30
r2 & r3   10
Continuation of Scales of Measurement

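• 4. Ratio Scales - an interval scale that starts from a fixed origin (zero), so that the ratio of any two given values of the variable is meaningful

• Two lengths measured from 0: 90 cm and 40 cm
• 90/40 = 9/4
• Add 5 cm to both lengths: 95/45 ≠ 9/4, so the ratio is not preserved when the origin is shifted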
Subscript and summation notation
• ∑ (the summation symbol, the Greek capital letter sigma) denotes that subscripted variables are to be added.
• $x_i$ (x sub i), where i stands for the numbers 1, 2, 3, ..., n.
• $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$
• $\sum_{i=1}^{4} x_i^3 = x_1^3 + x_2^3 + x_3^3 + x_4^3$
Continuation of Subscript and summation notation

• Is $\sum x^2 = (\sum x)^2$?
• Is $\sum x \sum y = \sum (xy)$?
• Is $\sum x^2 \sum y^2 = \sum (x^2 y^2)$?
Continuation of Subscript and summation notation

x        y        xy        x²        y²        x²y²
5        2        10        25        4         100
2        6        12        4         36        144
3        5        15        9         25        225
4        3        12        16        9         144
∑x=14    ∑y=16    ∑xy=49    ∑x²=54    ∑y²=74    ∑x²y²=613

(∑x)² = (14)² = 196, which is not equal to ∑x² = 54
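The table can be reproduced in Python to see that none of the three questioned identities holds in general. This is only an illustrative sketch; the variable names are chosen here and are not part of the original slides.

```python
x = [5, 2, 3, 4]
y = [2, 6, 5, 3]

sum_x = sum(x)                                       # 14
sum_y = sum(y)                                       # 16
sum_xy = sum(a * b for a, b in zip(x, y))            # 49
sum_x2 = sum(a ** 2 for a in x)                      # 54
sum_y2 = sum(b ** 2 for b in y)                      # 74
sum_x2y2 = sum((a * b) ** 2 for a, b in zip(x, y))   # 613

# Left and right sides of each questioned identity
print(sum_x2, sum_x ** 2)         # 54 196
print(sum_x * sum_y, sum_xy)      # 224 49
print(sum_x2 * sum_y2, sum_x2y2)  # 3996 613
```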
Chapter 2 . Collection of Data
• Two types of data
1. Primary – original source
2. Secondary – published or unpublished
data previously gathered
Methods of Collecting Data

• Direct or interview method


• Indirect or questionnaire method
• Registration method
• Observation Method
• Experiment Method
Sampling Techniques
Simple Random Sampling (SRS) - the most basic method of probability sampling. It assigns an equal probability of selection to each possible sample, so every element has an equal chance of being selected.
• Lottery sampling
• Table of random numbers
• SRS without replacement - does not allow repeats in selection
• SRS with replacement - allows repeats in selection
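The difference between the two SRS variants can be sketched with Python's standard `random` module. The population of 20 numbered units, the sample size of 5 and the seed are assumptions made only for this illustration.

```python
import random

population = list(range(1, 21))  # hypothetical population of 20 numbered units
rng = random.Random(42)          # fixed seed (an assumption) so the draw is repeatable

# SRS without replacement: a unit can be selected at most once
without_replacement = rng.sample(population, k=5)

# SRS with replacement: the same unit may be selected more than once
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```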
Continuation of Sampling Techniques

• Systematic Sampling
[Figure: a population and the systematic sample drawn from it]


Continuation of Sampling Techniques
• Stratified Random Sampling
  – The universe is divided into L mutually exclusive sub-universes called strata.
  – Independent simple random samples are obtained from each stratum.
[Figure: a population divided into strata a, b, c and d, and the stratified random sample drawn from them]
Continuation of Sampling Techniques
Cluster Sampling
Continuation of Sampling Techniques

Non-Random Sampling
• Purposive Sampling
• Quota Sampling
• Convenience Sampling
Chapter 3. Presentation of Data
• Textual
• Tabular
  – Statistical Table. Its parts are:
    – Table number and heading
    – Box head, containing the stub head, the master caption and the column captions
    – Body
Simple Frequency Table

• Steps
  – Determine the range (highest score - lowest score).
  – Decide on the number of class intervals or categories (usually 5 to 15).
  – Compute the class size: i = range / number of classes.
  – Write the class intervals, starting from the lowest lower limit.
  – Determine the class frequency by tallying.
  – Compute the class mark: (LL + UL)/2.
  – Determine the class boundaries (0.5 below the LL and 0.5 above the UL).
10 12 16 18 20 26 29 30 33 36 39 40 44 45 49 50 52 54 54
11 13 15 19 21 24 26 27 33 32 34 39 45 46 49 50 52 51 53
14 17 19 22 25 29 32 35 36 45 44 47 48 50 51 53 52 54 53

Class Intervals   Tally   Frequency (f)   Class Mark (xi)
10-18                     9               14
19-27                     10              23
28-36                                     32
37-45                                     41
46-54                                     50

• Range = 54 - 10 = 44
• Desired number of classes = 5
• i = 44/5 = 8.8, rounded up to 9
• Lowest limit (LL) = 10
• Highest limit (HL) = 54
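The tallying steps can be sketched in Python as follows. This is a minimal illustration that uses the 57 scores listed above, five classes and a class size rounded up to 9; the variable names are chosen here for illustration.

```python
import math

scores = [
    10, 12, 16, 18, 20, 26, 29, 30, 33, 36, 39, 40, 44, 45, 49, 50, 52, 54, 54,
    11, 13, 15, 19, 21, 24, 26, 27, 33, 32, 34, 39, 45, 46, 49, 50, 52, 51, 53,
    14, 17, 19, 22, 25, 29, 32, 35, 36, 45, 44, 47, 48, 50, 51, 53, 52, 54, 53,
]

lowest, highest = min(scores), max(scores)
value_range = highest - lowest                # 54 - 10 = 44
num_classes = 5
width = math.ceil(value_range / num_classes)  # 8.8 rounded up to 9

table = []
for k in range(num_classes):
    lower = lowest + k * width
    upper = lower + width - 1
    frequency = sum(lower <= s <= upper for s in scores)  # tally
    class_mark = (lower + upper) / 2
    table.append((lower, upper, frequency, class_mark))

for lower, upper, frequency, class_mark in table:
    print(f"{lower}-{upper}  f={frequency}  xi={class_mark}")
```

The frequencies printed for the first two classes should agree with the 9 and 10 already entered in the table.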
Graphical
• Bar Graphs
  [Figure: vertical bar graph of Series 1 to 3 across Categories 1 to 4]
  [Figure: horizontal bar graph of Series 1 to 3 across Categories 1 to 4]
• Line Graph
  [Figure: line graph of Series 1 to 3 across Categories 1 to 4]
• Pie Chart
  [Figure: pie chart of sales by quarter, 1st Qtr to 4th Qtr]
10 12 16 18 20 26 29 30 33 36 39 40 44 45 49 50 52 54 54
11 13 15 19 21 24 26 27 33 32 34 39 45 46 49 50 52 51 53
14 17 19 22 25 29 32 35 36 45 44 47 48 50 51 53 52 54 53

Class Intervals   Frequency (f)   Class Mark (xi)   <cf   >cf   rf (%)
10-18                             14
19-27                             23
28-36                             32
37-45                             41
46-54                             50
Total                                                           100%
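The remaining columns of this table can be filled in programmatically. The sketch below assumes the same five class intervals and the 57 scores listed above; `less_than_cf`, `greater_than_cf` and `relative` are names chosen here for the <cf, >cf and rf columns.

```python
from itertools import accumulate

scores = [
    10, 12, 16, 18, 20, 26, 29, 30, 33, 36, 39, 40, 44, 45, 49, 50, 52, 54, 54,
    11, 13, 15, 19, 21, 24, 26, 27, 33, 32, 34, 39, 45, 46, 49, 50, 52, 51, 53,
    14, 17, 19, 22, 25, 29, 32, 35, 36, 45, 44, 47, 48, 50, 51, 53, 52, 54, 53,
]
intervals = [(10, 18), (19, 27), (28, 36), (37, 45), (46, 54)]

freq = [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]
n = sum(freq)

less_than_cf = list(accumulate(freq))                        # running total from the top
greater_than_cf = list(accumulate(reversed(freq)))[::-1]     # running total from the bottom
relative = [round(f / n * 100, 2) for f in freq]             # rf in percent

for (lo, hi), f, lcf, gcf, rf in zip(intervals, freq, less_than_cf, greater_than_cf, relative):
    print(f"{lo}-{hi}  f={f}  <cf={lcf}  >cf={gcf}  rf={rf}%")
```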
Measures of Central Tendency
(Measures of Position)
• A measure of position or central tendency is a
single figure which is representative of the
general level of magnitudes or values of the
items in a set of data.
• Mean is the arithmetic average of a set of
values, or distribution
• Median
• Mode
Mean
The mean is the sum of all items divided by the total number of items or terms.

Set A: 12, 12, 13, 14, 14            Mean = 65/5 = 13
Set B: 15, 21, 21, 23, 23, 24, 25    Mean = 152/7 = 21.71
Set C: 16, 16, 21, 21, 23, 23, 24    Mean = 144/7 = 20.57


Mean
• Characteristics
  – An interval statistic
  – Calculated average
  – Value is determined by every case in the distribution
  – Affected by extreme values
  – Sum of deviations about the mean is zero
  – Can be subjected to numerous mathematical computations
  – Most widely used
  – Represents average quantity
• When to Use
  – Interval interpretation is appropriate
  – The value of each score is desired
  – Further statistical computation is expected
Median
• The median of a finite list of numbers can
be found by arranging all the observations
from lowest value to highest value and
picking the middle one. If there is an even
number of observations, then there is no
single middle value; the median is then
usually defined to be the mean of the two
middle values
Median is the value of the middle item after
arranging the data in ascending or descending
order
Set A: 12, 12, 13, 14, 14            Median = 13
Set B: 15, 21, 21, 23, 23, 24, 25    Median = 23
Set C: 16, 17, 20, 21, 24, 25        Median = (20 + 21)/2 = 20.5
Median
• Characteristics
  – An ordinal statistic
  – Rank or position average
  – Value is determined by the scores near the middle of the distribution
  – Not affected by extreme scores
• When to Use
  – An ordinal interpretation is needed
  – The middle score is desired
  – To avoid the influence of extreme scores
Mode is defined as the value of the term that appears most frequently (unimodal, bimodal or polymodal).

Set A: 12, 12, 13, 14, 15            Mode = 12
Set B: 15, 21, 21, 23, 23, 24, 25    Mode = 21, 23
Set C: 16, 16, 21, 21, 23, 23, 24    Mode = 16, 21, 23


Mode
• Characteristics
  – A nominal statistic
  – An inspection average
  – The most frequently occurring value in the distribution
  – Usually occurs near the center of the distribution
  – Cannot be manipulated mathematically
  – Rarely used
  – Most popular score
• When to Use
  – A nominal interpretation is needed
  – A quick approximation of the central tendency is needed
  – The most frequently occurring score is needed
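The three measures can be checked with Python's standard `statistics` module. This is a short sketch using the example values from the mean, median and mode slides; `multimode` (Python 3.8 or later) returns every value tied for the highest frequency, which covers the bimodal and polymodal cases.

```python
from statistics import mean, median, multimode

seven_scores = [15, 21, 21, 23, 23, 24, 25]

print(mean([12, 12, 13, 14, 14]))              # 13
print(round(mean(seven_scores), 2))            # 21.71
print(median(seven_scores))                    # 23
print(median([16, 17, 20, 21, 24, 25]))        # 20.5, mean of the two middle values
print(multimode([12, 12, 13, 14, 15]))         # [12]
print(multimode(seven_scores))                 # [21, 23]
print(multimode([16, 16, 21, 21, 23, 23, 24])) # [16, 21, 23]
```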
In a symmetrical distribution the mean, the median and the mode are located at one point.
1. The median is the score point which bisects the total area. Half of the area falls to the left and half to the right.
2. The mode is the score point with the greatest frequency, the point on the x-axis which corresponds to the tallest point of the curve.
3. The mean is the score point on the x-axis which corresponds to the point of balance in the distribution.
[Figure: symmetrical curve with the mean, median and mode at the center]
Source of picture: en.wikipedia.org/wiki/Normal_distribution (7/8/2011)
• Skewness is the degree of departure from
symmetry of a distribution. A positively
skewed distribution has a "tail" which is pulled
in the positive direction. A negatively skewed
distribution has a "tail" which is pulled in the
negative direction.
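Positively Skewed and Negatively Skewed Distributions
• Positively skewed: Mean > Median > Mode
• Negatively skewed: Mode > Median > Mean
[Figures: positively and negatively skewed curves with the positions of the mode, median and mean marked]
Source: mvpprograms.com (7/7/2011)
Source: stewardess.inhatc.ac.kr (7/7/2011)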
Kurtosis is the degree of peakedness of a
distribution. A normal distribution is a mesokurtic
distribution. A pure leptokurtic distribution has a
higher peak than the normal distribution and has
heavier tails. A pure platykurtic distribution has a
lower peak than a normal distribution and lighter
tails
MEAN FOR GROUPED DATA
10 12 16 18 20 26 29 30 33 36 39 40 44 45 45 50 52 54 54
11 13 15 19 21 24 26 27 29 30 33 32 34 39 45 45 52 51 53
18 11 15 14 17 19 22 25 29 32 35 36 45 44 45 53 52 54 53

Class Interval   f      xi    f(xi)
10-18            12     14    168
19-27            10     23    230
28-36            13     32    416
37-45            11     41    451
46-54            11     50    550
i = 9            N = 57       ∑f(xi) = 1815

$$\bar{x} = \frac{\sum f x_i}{n} = \frac{1815}{57} = 31.84$$
MEDIAN FOR GROUPED DATA
ci      TLL-TUL     f     <cf
10-18   9.5-18.5    12    12
19-27   18.5-27.5   10    22
28-36   27.5-36.5   13    35   (median class)
37-45   36.5-45.5   11    46
46-54   45.5-54.5   11    57
i = 9               N = 57

• n/2 = 57/2 = 28.5

$$\text{Median} = TLL + \frac{n/2 - f_{m-1}}{f_m}\,(i)$$

Where:
$f_{m-1}$ is the sum of all frequencies before the median class
$f_m$ is the frequency of the median class
TLL is the true lower limit of the median class

$$\text{Median} = 27.5 + \frac{28.5 - 22}{13}\,(9) = 27.5 + \frac{6.5}{13}\,(9) = 27.5 + (0.5)(9) = 27.5 + 4.5 = 32$$
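The grouped mean and grouped median above can be reproduced with a short Python sketch. The class limits and frequencies are taken from the tables; the loop that locates the median class is one possible way to do it, and the variable names are chosen here for illustration.

```python
# (lower limit, upper limit, frequency) for each class
classes = [(10, 18, 12), (19, 27, 10), (28, 36, 13), (37, 45, 11), (46, 54, 11)]
width = 9  # class size i

n = sum(f for _, _, f in classes)

# Grouped mean: sum of f * class mark, divided by n
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Grouped median: find the first class whose cumulative frequency reaches n/2
cumulative_before = 0
for lo, hi, f in classes:
    if cumulative_before + f >= n / 2:
        true_lower_limit = lo - 0.5
        grouped_median = true_lower_limit + (n / 2 - cumulative_before) / f * width
        break
    cumulative_before += f

print(round(grouped_mean, 2))  # 31.84
print(grouped_median)          # 32.0
```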
MODE FOR GROUPED DATA
ci      TLL-TUL     f     <cf
10-18   9.5-18.5    12    12
19-27   18.5-27.5   10    22
28-36   27.5-36.5   13    35   (modal class)
37-45   36.5-45.5   11    46
46-54   45.5-54.5   11    57
i = 9               N = 57

$$\text{Mode} = TLL + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,(i)$$

Where:
TLL is the true lower limit of the modal class (the class with the highest frequency)
Δ1 is the difference between the frequency of the modal class and the frequency of the class above it in the table: 13 - 10 = 3
Δ2 is the difference between the frequency of the modal class and the frequency of the class below it in the table: 13 - 11 = 2

$$\text{Mode} = 27.5 + \frac{3}{3 + 2}\,(9) = 27.5 + (0.6)(9) = 27.5 + 5.4 = 32.9$$
Measures of Variation

• Range
• Variance
• Standard Deviation
Range is the simplest measure of variation. It is the difference between the largest and the smallest value in a given set of data. A larger range suggests greater variation or dispersion, but the range is influenced by extreme values called outliers. Only two values are considered and all other values are ignored.

• 2, 7, 9, 12, 15, 17 Range = 17-2 = 15

• 5, 7 ,9,12, 15, 16 Range = 16-5 = 11


Standard Deviation and Variance

• Standard Deviation (σ)- commonly used


measure of variation . It indicates how closely
the values of a given data set are clustered
around the mean. A large value of the
standard deviation means values of that data
set are spread over a larger range around the
mean. It is also the positive square root of the
variance (σ2).
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
The final exam scores of five students were 80, 88, 92, 90
and 85. Determine the variance and standard deviation.

x     x - μ   (x - μ)²
88     1        1
80    -7       49
92     5       25
90     3        9
85    -2        4
               ∑ = 88

μ = 435/5 = 87

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88}{5} = 17.6 \quad \text{(variance)}$$

$$\sigma = \sqrt{17.6} = 4.2 \quad \text{(standard deviation)}$$
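The same result can be obtained with Python's `statistics` module. The sketch below uses `pvariance` and `pstdev`, the population versions that divide by N as in the formula above (the sample versions `variance` and `stdev` divide by n - 1 instead).

```python
from statistics import mean, pstdev, pvariance

scores = [80, 88, 92, 90, 85]

mu = mean(scores)               # 87
variance = pvariance(scores)    # sum of squared deviations / N = 88 / 5
std_dev = pstdev(scores)        # square root of the variance

print(mu)                 # 87
print(round(variance, 1)) # 17.6
print(round(std_dev, 1))  # 4.2
```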
Coefficient of Variation – a statistic that allows comparison of data sets that have different units of measurement
• $CV = \frac{s}{\bar{x}} \times 100\%$ for samples
• $CV = \frac{\sigma}{\mu} \times 100\%$ for populations
• The data set with the larger CV is more variable.

The average score of the students in one English class is 110 with a standard deviation of 5; the average score of students in a History class is 106, with a standard deviation of 4. Which class is more variable?
• CV (English) = (5/110)(100%) = 4.55%
• CV (History) = (4/106)(100%) = 3.77%
• Since the CV for the English class is larger, its scores are more variable than those of the History class.
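The comparison can be expressed as a small Python helper. The function name `coefficient_of_variation` is a hypothetical name chosen for this sketch.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the coefficient of variation in percent."""
    return std_dev / mean * 100


cv_english = coefficient_of_variation(5, 110)
cv_history = coefficient_of_variation(4, 106)

print(round(cv_english, 2))  # 4.55
print(round(cv_history, 2))  # 3.77
print("English" if cv_english > cv_history else "History", "class is more variable")
```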
Coefficient of Skewness (Pearson Coefficient of Skewness)
• $SK = \frac{3(\bar{x} - Md)}{s}$
• When the distribution is symmetrical, the coefficient is zero.
• When the distribution is positively skewed, the coefficient is positive.
• When the distribution is negatively skewed, the coefficient is negative.

Find the coefficient of skewness of a distribution with mean 10, median 8 and standard deviation 3.
• $SK = \frac{3(10 - 8)}{3} = 2$ (the distribution is positively skewed)
NORMAL DISTRIBUTION

Properties of the Normal Curve

1. Bell shaped
2. Mean, median and mode are
equal and located at the center
of the distribution.
3. The normal curve is unimodal
4. The curve is symmetrical about
the mean
5. The curve is continuous
6. The curve never touches the x-
axis(asymptotic about the x-
axis)
7. The total area under the
normal distribution is 100%.
Areas under the normal curve

Converting a raw score to a standard score (z-score):

$$z = \frac{x - \mu}{\sigma}$$

Where:
x is the score
μ is the mean
σ is the standard deviation

[Figure: normal curve with mean = median = mode at the center]


Determine the area under the standard normal curve between z = 0 and z = 1.96
Area = 0.4750 x 100% = 47.50%

Area from z = -1.53 to 0
Area = 0.4370 or 43.70%

Area from z = -1.3 to z = 0.99
-1.3 to 0    = 0.4032
0 to 0.99    = 0.3389
Total area   = 0.7421 or 74.21%

Area to the right of z = 0.71
Area (half)  = 0.5000
0 to 0.71    = 0.2611 (-)
Area         = 0.2389 or 23.89%
The average weekly income of 2,000 workers is P151.00 with a standard deviation of P15.00. Assuming that the weekly incomes are normally distributed, find the number of workers who earn:
1. from P119.50 to P155.50 per week
2. less than or equal to P127.50 per week
3. greater than or equal to P185.50 per week

$$z = \frac{x - \mu}{\sigma}, \quad \text{mean} = P151.00, \quad sd = P15.00$$

1. From P119.50 to P155.50 per week
z for 119.50 = (119.50 - 151.00)/15 = -2.1 ; area = 0.4821
z for 155.50 = (155.50 - 151.00)/15 = 0.30 ; area = 0.1179
Total area = 0.6000 or 60%
Total number of workers whose salary falls from P119.50 to P155.50
= 2000 x 0.60
= 1,200 workers

2. Less than or equal to P127.50 per week
z for 127.50 = (127.50 - 151.00)/15 = -1.5667, taken as -1.56 ; area = 0.4406
Area = 0.5000 - 0.4406 = 0.0594 or 5.94%
Total number of workers whose salary is less than or equal to P127.50
= 2000 x 0.0594
= 118.8 or 119

3. Greater than or equal to P185.50 per week
z for 185.50 = (185.50 - 151.00)/15 = 2.3 ; area = 0.4893
Area = 0.5000 - 0.4893 = 0.0107 or 1.07%
Total number of workers whose salary is greater than or equal to P185.50
= 2000 x 0.0107
= 21.4 or 21
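The three answers can be cross-checked without a printed z table by using `statistics.NormalDist` from the Python standard library. This is a sketch, not the method used on the slides: it evaluates the normal CDF exactly, so its results may differ by a worker or two from table lookups based on z values cut to two decimals (most visibly in part 2).

```python
from statistics import NormalDist

income = NormalDist(mu=151.00, sigma=15.00)
workers = 2000

p_between = income.cdf(155.50) - income.cdf(119.50)  # part 1
p_at_most = income.cdf(127.50)                       # part 2
p_at_least = 1 - income.cdf(185.50)                  # part 3

print(round(workers * p_between))   # about 1,200 workers
print(round(workers * p_at_most))   # close to the table-based answer of 119
print(round(workers * p_at_least))  # about 21 workers
```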
Simple Tests of Hypothesis
• A hypothesis is a statement or tentative theory which aims to explain facts about the real world.
• Hypotheses are subjected to testing. If they are found to be statistically true they are accepted; if they are found to be false they are rejected.
• Ho: null hypothesis
• Ha: alternative hypothesis
• Rejection of Ho implies acceptance of Ha
• Acceptance of Ha implies rejection of Ho
Type 1 and type 2 errors
Type 1 error (α error) - we reject the null hypothesis (action) when in fact the null hypothesis Ho is true (actual condition) and therefore the alternative hypothesis Ha is false. (Producer's risk, because it means articles produced may be rejected or not sold.)
Type 2 error (β error) - we accept the null hypothesis (action) when in fact the null hypothesis Ho is false (actual condition) and therefore the alternative hypothesis Ha is true. (Consumer's risk, because the consumer may accept products whose quality is too poor to meet the standards.)

                Actual Condition
Decision        Ho is true          Ha is true
Reject Ho       Type 1 error        Correct decision
Accept Ho       Correct decision    Type 2 error
One-tailed and two-tailed tests
• One-tailed test (directional, < or >): one rejection region in a single tail, with the acceptance region covering the rest of the curve.
• Two-tailed test (non-equality, ≠): a rejection region in each tail, with the acceptance region between them.
• Ha: M > Mo - one-tailed (one-sided) test to the right. The rejection region (area α) lies to the right of the critical (tabular) value; the acceptance region has area 1 - α.
• Ha: M < Mo - one-tailed (one-sided) test to the left. The rejection region (area α) lies to the left of the critical (tabular) value.
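Z test (population standard deviation)

1. Sample mean compared with population mean

$$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$$

2. Two sample means

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

T-test (sample standard deviation)

1. Sample mean compared with population mean

$$t = \frac{(\bar{x} - \mu)\sqrt{n - 1}}{s}$$

2. Two sample means for independent samples

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\;\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$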
Steps in Hypothesis Testing
• Formulate the null hypothesis (Ho) that there is no significant difference between the items being compared. State the alternative hypothesis (Ha), which will be accepted if Ho is rejected.
• Set the level of significance (α).
• Determine the test to be used. Use z if the population standard deviation is given and t if the sample standard deviation is given.
• Determine the tabular value. Use the z tabular value for the z test.
• For the t tabular value, compute the degrees of freedom (df), then look up the tabular value in the table of the t-distribution.
– Single sample: df = n - 1
– Two samples: df = n1 + n2 - 2
• Compute the z or t statistic.
• Compare the computed value with the tabular value.
– Reject Ho if the absolute computed value is equal to or greater than (≥) the absolute tabular value.
– Accept Ho if the absolute computed value is less than (<) the absolute tabular value.
Critical Values of Z

Significance level          0.10      0.05      0.025     0.01
One-tailed test (<, >)      ±1.28     ±1.645    ±1.96     ±2.33
Two-tailed test (≠)         ±1.645    ±1.96     ±2.24     ±2.58
The average grade of the freshman students was found to be 115. A sample of 25 students has a mean grade of 126.7 and a standard deviation of 24.2. Is there reason to believe that the grades of the 25 students do not differ significantly from the others? Use α = 5% to test Ho: μ = 115 against Ha: μ ≠ 115.
Ho: There is no significant difference between the average grade of the freshman students and the grades of the 25 students (μ = 115)
Ha: There is a significant difference between the average grade of the freshman students and the grades of the 25 students (μ ≠ 115)
α = 5% (two-tailed, so α/2 = 0.025 in each tail); df = n - 1 = 25 - 1 = 24; t(0.025, 24) = 2.064

$$t = \frac{(\bar{x} - \mu)\sqrt{n - 1}}{s} = \frac{(126.7 - 115)\sqrt{25 - 1}}{24.2} = 2.3685$$

Since tc = 2.3685 > t(0.025, 24) = 2.064, reject Ho and accept Ha: There is a significant difference between the average grade of the freshman students and the grades of the 25 students (μ ≠ 115).
Data from the school census show that the mean weight of college students was 45 kg with a standard deviation of 3 kg. A sample of 100 students was found to have a mean weight of 47 kg. Are the 100 college students really heavier than the rest? Use the 0.05 level of significance.
Ho: The 100 students are not heavier than the rest (μ = 45 kg)
Ha: The 100 college students are really heavier than the rest (μ > 45 kg)
α = 0.05

$$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} = \frac{(47 - 45)\sqrt{100}}{3} = 6.67$$

z(0.05) = 1.645 (tabular value, one-tailed)

Since z computed = 6.67 > z(0.05) = 1.645, reject the null hypothesis and accept the alternative: The 100 college students are really heavier than the rest.
• One of the deans of a higher education institution claims that the average salary of the instructors in the college is P15,200.00 per month. The salaries of 10 instructors are P15,000, P15,100, P15,400, P15,300, P15,200, P14,700, P15,100, P14,800, P15,000 and P14,900. Is there enough evidence to reject the dean's claim at α = 0.01? Assume that the population is normally distributed.
• Ho: M = P15,200/mo. (dean's claim)
• Ha: M ≠ P15,200/mo.
• df = n - 1 = 10 - 1 = 9
• x̄ = 15,050
• std dev = 217.307
• tc = -2.183, so |tc| = 2.183
• α = 0.01, α/2 = 0.005
• t tabular = 3.250
• t computed = 2.183 < t tabular = 3.250
• Accept Ho and reject the alternative.
• At the α = 1% level there is not enough evidence to reject the claim that the mean salary of an instructor in the college is P15,200.00.
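The computed t for the dean's claim can be reproduced in Python. This sketch assumes what the reported values imply: the standard deviation 217.307 is the sample standard deviation (divisor n - 1), combined with √n in the test statistic. The tabular value 3.250 is simply copied from the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

salaries = [15_000, 15_100, 15_400, 15_300, 15_200,
            14_700, 15_100, 14_800, 15_000, 14_900]
claimed_mean = 15_200

n = len(salaries)
x_bar = mean(salaries)          # 15050
s = stdev(salaries)             # sample standard deviation, divisor n - 1
t_computed = (x_bar - claimed_mean) / (s / sqrt(n))

t_tabular = 3.250               # two-tailed, alpha = 0.01, df = 9 (from the example)
reject_h0 = abs(t_computed) >= t_tabular

print(x_bar, round(s, 3), round(t_computed, 3))  # 15050 217.307 -2.183
print("Reject Ho" if reject_h0 else "Accept Ho")
```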
Regression & Correlation
The Pearson product moment coefficient of correlation

• The Pearson product moment coefficient of correlation (r) is an index of relationship between two variables. The independent variable is represented by "x" and the dependent variable by "y". The value of r ranges from -1 through zero to +1.
• r is used to determine the index of relationship between variables.
• r = 1 is a perfect positive correlation; r = -1 is a perfect negative correlation.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Test if there is a significant relationship between alcohol level and number of accidents
Step 1. Use level of significance = 0.05
Step 2. Ho: There is no significant relationship between alcohol level and number of accidents
        Ha: There is a significant relationship between alcohol level and number of accidents
Step 3. df = n - 2 = 7 - 2 = 5
        r.05 = 0.7545

Table 1. Blood alcohol level and no. of accidents

Alcohol level (x)                    5    10   15   20   25   30   35
No. of accidents, thousands (y)      10   17   26   30   32   38   42

x        y        x²         y²          xy
5        10       25         100         50
10       17       100        289         170
15       26       225        676         390
20       30       400        900         600
25       32       625        1024        800
30       38       900        1444        1140
35       42       1225       1764        1470
∑x=140   ∑y=195   ∑x²=3500   ∑y²=6197    ∑xy=4620

Step 4. Solve for r

$$r = \frac{7(4620) - (140)(195)}{\sqrt{\left[7(3500) - (140)^2\right]\left[7(6197) - (195)^2\right]}} = 0.984 \quad \text{(very high relationship)}$$

Step 5. Decision rule: If the computed r is greater than the tabular r, reject Ho.
        rc = 0.984 > r.05 = 0.7545
Step 6. Conclusion:
Since the computed r is greater than the tabular r, Ho is rejected. It means that there is a significant relationship between alcohol level and the number of accidents. It implies that the higher the alcohol level, the higher the number of accidents.
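Step 4 can be verified with a direct Python translation of the formula for r. This is a minimal sketch; the tabular value 0.7545 is taken from Step 3 of the example, not computed.

```python
from math import sqrt

x = [5, 10, 15, 20, 25, 30, 35]    # alcohol level
y = [10, 17, 26, 30, 32, 38, 42]   # number of accidents (thousands)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a ** 2 for a in x)
sum_y2 = sum(b ** 2 for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

r_tabular = 0.7545                 # from Step 3 (alpha = 0.05, df = 5)
print(round(r, 3))                 # 0.984
print("Reject Ho" if r > r_tabular else "Accept Ho")
```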
• Scatter plot
Figure 1 [scatter plot of number of accidents against alcohol level]

• Correlation
Table 2. Correlations
                                             alcohol level   number of accidents (thousands)
alcohol level          Pearson Correlation   1               .984**
                       Sig. (2-tailed)                       .000
                       N                     7               7
number of accidents    Pearson Correlation   .984**          1
(thousands)            Sig. (2-tailed)       .000
                       N                     7               7
**. Correlation is significant at the 0.01 level (2-tailed).

Table 3. ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    740.571          1    740.571       152.471   .000(a)
Residual      24.286           5    4.857
Total         764.857          6
a. Predictors: (Constant), alcohol level
b. Dependent Variable: number of accidents (thousands)

• Significant at the 1% level

Interpretation:
• There is a significant relationship between alcohol level and number of accidents (r = 0.984, p < 0.001)
Simple Regression
Simple Linear Regression Analysis

• Simple linear regression analysis is used when there is a relationship between the independent variable x and the dependent variable y. It is used to predict the y value given the x value.

y = a' + b'x, the formula for the least squares regression line (LSRL)
y = the dependent variable
x = the independent variable
a' = the y-intercept
b' = the slope of the line

$$b' = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{7(4620) - (140)(195)}{7(3500) - (140)^2} = 1.029$$

$$a' = \frac{\sum y}{n} - b'\,\frac{\sum x}{n} = \frac{195}{7} - 1.029\left(\frac{140}{7}\right) = 27.86 - 20.58 = 7.28$$

y = 7.28 + 1.029(x)
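The slope and intercept can be recomputed in Python from the same sums. This is a sketch under the formulas above; carrying full precision gives an intercept of about 7.286 rather than the hand-rounded 7.28. The helper `predict` is a hypothetical name added for illustration.

```python
x = [5, 10, 15, 20, 25, 30, 35]    # alcohol level
y = [10, 17, 26, 30, 32, 38, 42]   # number of accidents (thousands)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a ** 2 for a in x)

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # b'
intercept = sum_y / n - slope * sum_x / n                         # a'


def predict(alcohol_level: float) -> float:
    """Predicted number of accidents (thousands) on the fitted line."""
    return intercept + slope * alcohol_level


print(round(slope, 3), round(intercept, 3))  # 1.029 7.286
```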
Table 4. Simple Regression: Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B        Std. Error           Beta       t        Sig.
(Constant)       7.286    1.863                           3.912    .011
alcohol level    1.029    .083                 .984       12.348   .000
a. Dependent Variable: number of accidents (thousands)

Interpretation:
• Number of accidents = 7.286 + 1.029 x alcohol level
• The model can be used to predict the number of accidents given an alcohol level.
• As the alcohol level increases, the number of accidents also increases.
Table 5. Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .984(a)   .968       .962                2.20389
a. Predictors: (Constant), alcohol level

R² = 0.968 means that 96.80% of the variation in the number of accidents is attributed to alcohol level and 3.20% is attributed to other variables not included in the analysis.
Multiple regression and multiple correlation
A researcher wants to find out whether there is a relationship between the final exam scores of 20 students and their entrance exam scores, age and intelligence quotient. Furthermore, he wants to find out which predictors have significant effects on the final exam scores of the students.
Multiple regression & Multiple Correlation
Table 6 case final exam entrance exam age iq
1 35 38 18 120
2 46 44 18.4 125
3 50 46 18.5 122
4 34 37 18.1 130
5 22 27 17.3 127
6 10 22 17 96
7 26 32 17.5 96
8 45 44 17.5 120
9 49 42 17.8 121
10 45 40 18.3 132
11 33 39 18.2 122
12 32 33 17.4 118
13 35 36 17.3 110
14 37 35 17.8 125
15 43 43 18.4 121
16 41 42 18.5 95
17 40 47 18 93
18 44 46 18.8 94
19 36 42 17.6 92
20 42 36 17.4 90
Multiple Regression & Correlation Scatter Plot
Figure 2 [scatter plots of the final exam against the predictors]

Analysis
Table 7. Correlations (N = 20 for every pair)
                                              final exam   entrance exam   age      intelligence quotient
final exam              Pearson Correlation   1            .892**          .651**   .196
                        Sig. (2-tailed)                    .000            .002     .407
entrance exam           Pearson Correlation   .892**       1               .742**   .009
                        Sig. (2-tailed)       .000                         .000     .969
age                     Pearson Correlation   .651**       .742**          1        .182
                        Sig. (2-tailed)       .002         .000                     .442
intelligence quotient   Pearson Correlation   .196         .009            .182     1
                        Sig. (2-tailed)       .407         .969            .442
**. Correlation is significant at the 0.01 level (2-tailed).
Anova table
Table 8. ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1495.088         3    498.363       27.061   .000(a)
Residual      294.662          16   18.416
Total         1789.750         19
a. Predictors: (Constant), intelligence quotient, entrance exam, age
b. Dependent Variable: final exam

Significant at the 1% level
Interpretation
• There is a significant relationship between final exam and entrance exam (r = 0.892, strong).
• There is a significant relationship between final exam and age (r = 0.651, moderate).
• There is no significant relationship between final exam and intelligence quotient (r = 0.196, very low; p = 0.407).
Table 9. Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .914(a)   .835       .804                4.29143
a. Predictors: (Constant), intelligence quotient, entrance exam, age

R² = 0.835 means that 83.50% of the variation in the final exam is attributed to intelligence quotient, entrance exam and age, while 16.50% is attributed to other variables not included in the analysis.
Table 10. Multiple Regression: Coefficients(a)
                        Unstandardized Coefficients   Standardized Coefficients
Model 1                 B         Std. Error          Beta       t        Sig.
(Constant)              2.329     46.444                         .050     .961
entrance exam           1.438     .229                .967       6.274    .000
age                     -1.993    3.026               -.103      -.659    .520
intelligence quotient   .135      .069                .206       1.960    .068
a. Dependent Variable: final exam

Interpretation:
• Final exam = 2.329 + 1.438 x entrance exam - 1.993 x age + 0.135 x intelligence quotient
• The only predictor that has a significant effect on the final exam is the entrance exam result.

Using stepwise regression analysis, all variables that have no significant effect on the final exam were removed and the variables that have a significant effect were retained.

Final exam = -13.895 + 1.327 x entrance exam

Table 11. Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B         Std. Error          Beta       t        Sig.
(Constant)       -13.895   6.193                          -2.244   .038
entrance exam    1.327     .159                .892       8.370    .000
a. Dependent Variable: final exam
F-test (ANOVA)

Method A   Method B   Method C   a²          b²          c²
84         70         90         7056        4900        8100
90         75         95         8100        5625        9025
92         90         100        8464        8100        10000
96         80         98         9216        6400        9604
84         75         88         7056        5625        7744
88         75         90         7744        5625        8100
∑a=534     ∑b=465     ∑c=561     ∑a²=47636   ∑b²=36275   ∑c²=52573

Correction factor (CF)

$$CF = \frac{(\sum a + \sum b + \sum c)^2}{n_1 + n_2 + n_3} = \frac{(534 + 465 + 561)^2}{6 + 6 + 6} = \frac{(1{,}560)^2}{18} = 135{,}200$$

∑x² = 47636 + 36275 + 52573 = 136,484
Total Sum of Squares (TSS)
TSS = ∑x² - CF = 136,484 - 135,200 = 1,284
Between-Column Sum of Squares (SSb)
SSb = (1/6)(534² + 465² + 561²) - 135,200 = 817
Within-Column Sum of Squares (SSw)
SSw = TSS - SSb = 1,284 - 817 = 467
F-test (ANOVA)
ANOVA Table
Source of        Sum of                            F-Value
Variation        Squares   df   MSS = SS/df   Computed   Tabular 5%   Tabular 1%
Between Column   817       2    408.5         13.12      3.68         6.36
Within Column    467       15   31.13
Total            1284      17

Total df = total no. of items - 1 = n - 1 = 18 - 1 = 17
Between-column dfb = no. of columns - 1 = 3 - 1 = 2
Within-column dfw = total df - between-column df = 17 - 2 = 15
MSSb = SSb/dfb = 817/2 = 408.5
MSSw = SSw/dfw = 467/15 = 31.13
F-value = MSSb/MSSw = 408.5/31.13 = 13.12
Fc < Ft: accept the null hypothesis
Fc > Ft: reject the null hypothesis
Here Fc = 13.12 is greater than both tabular values (3.68 and 6.36), so the null hypothesis is rejected.
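The whole ANOVA computation can be reproduced with a short Python sketch that follows the same correction-factor method. The tabular F values are copied from the table above; the dictionary keys and variable names are chosen here for illustration.

```python
groups = {
    "Method A": [84, 90, 92, 96, 84, 88],
    "Method B": [70, 75, 90, 80, 75, 75],
    "Method C": [90, 95, 100, 98, 88, 90],
}

all_values = [v for values in groups.values() for v in values]
n_total = len(all_values)

cf = sum(all_values) ** 2 / n_total                    # correction factor
tss = sum(v ** 2 for v in all_values) - cf             # total sum of squares
ssb = sum(sum(v) ** 2 / len(v) for v in groups.values()) - cf  # between columns
ssw = tss - ssb                                        # within columns

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_computed = (ssb / df_between) / (ssw / df_within)

f_tabular_5, f_tabular_1 = 3.68, 6.36                  # from the ANOVA table, df = (2, 15)
print(cf, tss, ssb, ssw)                               # 135200.0 1284.0 817.0 467.0
print(round(f_computed, 2))                            # 13.12
print("Reject Ho" if f_computed > f_tabular_1 else "Accept Ho at the 1% level")
```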
