
Lecture 01

Date: 05.07.2019
Introduction to Statistical Package for the Social Sciences (SPSS)
SPSS Data Entry

Lecture 02
Date: 12.07.2019
Automation in Management Reporting
Google Forms = an online questionnaire sent to respondents (e.g., via Gmail), with the answers
received back online.
Google Scholar = a specialized Google search engine for finding scholarly articles on the intended
topic from around the world.

Lecture 03
Date: 19.07.2019
Introduction to statistics
1. What is statistics?
Answer: Any tabulated and structured information, in the form of facts and figures, that is used for
decision making is called statistics.
2. What is Population?
Answer: The collection of all individuals of interest for the research objectives. The size, scope
and nature of the population are determined by the intention of the researcher.
3. What is sample?
Answer: The representative part of the population is called a sample. Calculations are performed on
the sample, but conclusions are drawn about the entire population.
4. What is parameter?
Answer: Aggregate characteristics of the population are called parameters (the corresponding
quantities calculated from a sample are called statistics), for example:
i. Mean
ii. Median
iii. Mode
5. What are the types of Statistics?
Answer: There are two types of statistics.
i. Descriptive statistics: summarizing and describing the sample
ii. Inferential statistics: drawing conclusions about the population
6. What is variable?
Answer: A variable is any characteristic, number, or quantity that can be measured or counted.
A variable may also be called a data item. Age, gender, business income and expenses, country
of birth, capital expenditure, class grades, eye color and vehicle type are examples of
variables.
7. What are the types of Variables?
Answer: Common types of variables are qualitative and quantitative variables.
A variable whose values are numbers is called a quantitative variable. On the contrary, a variable
that has no numerical value but expresses a quality is called a qualitative variable.

***First the scale of measurement is identified, then the data are entered.****
8. What are scales of measurement?
Answer: Scales of measurement refer to ways in which variables/numbers are defined and
categorized.
9. What are the types of Scales of measurement?
Answer: There are four types of Scales of measurement which are
i. Nominal: Nominal scales are often called qualitative scales, and measurements made on
qualitative scales are called qualitative data.
If numbers are assigned as labels in nominal measurement, they have no specific
numerical value or meaning.
No form of arithmetic computation (+, −, ×, etc.) may be performed on nominal
measures.
Only equality/inequality comparisons and the mode can be used with this scale of measurement.
ii. Ordinal: The ordinal type allows for rank order (1st, 2nd, 3rd, etc.) by which data can
be sorted. The median, i.e. middle-ranked, item is allowed as the measure of central
tendency; however, the mean (or average) as the measure of central tendency is not
allowed. The mode is allowed.
iii. Interval: The interval type allows for the degree of difference between items (but not the ratio between them).
The mode, median, and arithmetic mean are allowed to measure central tendency of
interval variables, while measures of statistical dispersion include range and standard
deviation.
iv. Ratio: All statistical measures are allowed because all necessary mathematical
operations are defined for the ratio scale.

10. What are the steps of data processing in analysis according to the scale of the variable?
Answer: There are four levels of data processing in analysis:
i. Counting: Nominal
ii. Ranking: Ordinal
iii. Addition and subtraction: Interval
iv. Multiplication and division: Ratio

11. What are the analytical tools of descriptive statistics?
Answer: Describing the sample is called descriptive statistics. The tools depend on the number of variables:
1. Uni-variate:
i. Frequency distribution
ii. Measures of central tendency: Mean, Median, Mode
iii. Dispersion: Standard deviation, Variance, CV
iv. Shape characteristics: Skewness, Kurtosis
2. Bi-variate:
i. Qualitative vs. qualitative variables: Cross tab, chi-square
ii. Quantitative vs. quantitative variables: Correlation
iii. Relation between quantitative and qualitative variables: Regression analysis
3. Multi-variate:
i. Causal effects: ANOVA
ii. Regression analysis

12. What is the process of analysis of inferential statistics?


Answer: There are three steps in the analysis of inferential statistics, which are as follows:
i. Point estimation: the mean of the sample is regarded as the mean of the population
(the sample mean x̄ estimates the population mean µ)
ii. Interval estimation: Estimate the confidence interval. Generally 95% confidence level
is standard in research.
iii. Test of hypothesis: H0 : Null hypothesis
H1: Alternative hypothesis
The researcher’s own statement is called the alternative hypothesis.
The null hypothesis is the opposite of the alternative hypothesis.
Rule of thumb in case of interval estimation of inferential statistics
1. µ - s to µ + s = 68% data
2. µ - 2s to µ + 2s = 95% data
3. µ - 3s to µ + 3s = 99.7% data
Here, µ = Mean
s = Sample Standard deviation
Bell Shape Curve = Normal distribution curve.
[µ − ks to µ + ks contains at least (1 − 1/k²)·100% of the data, where k = 2]
A Likert scale is used to collect qualitative data
Transferring data from Excel to SPSS:
Open SPSS > File > Open > Data > set Files of type to Excel (.xls/.xlsx) > select the file name > Open > Ok.
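As an illustration of the same workflow outside SPSS, the following is a minimal Python/pandas sketch (the file name "survey_data.xlsx" and the column name "score" are hypothetical) that loads an Excel sheet and checks the rule of thumb that roughly 95% of values lie within mean ± 2 standard deviations:

import pandas as pd

# Hypothetical file and column names; reading .xlsx files requires the openpyxl package.
df = pd.read_excel("survey_data.xlsx")
x = df["score"].dropna()

mean, sd = x.mean(), x.std()                                   # pandas uses the sample (n - 1) formula
within_2sd = x.between(mean - 2 * sd, mean + 2 * sd).mean()
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"share of data within mean ± 2 sd: {within_2sd:.1%}")   # close to 95% if roughly normal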
Lecture 04
Date: 26.07.2019
1. What is mean?
Answer: Mean: the centre of the data
Sample mean = x̄
Population mean = µ
Point estimation considers the sample mean as the estimate of the population mean.
Mean = (sum of all data values) / (number of data values)
2. What is standard deviation? Write down the formula of SD.
Answer: The standard deviation measures the spread of the data about the mean value. For a sample,
s = √[ Σ(xi − x̄)² / (n − 1) ].
It is useful in comparing sets of data which may have the same mean but a different spread. For
example, the means of the following two data sets are the same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30.
3. What is Coefficient of variation?
Answer: SD/Mean
The coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard
deviation to the mean (average). For example, the expression “The standard deviation is 15% of
the mean” is a CV.

***Simplest measures of spread:

1. Highest and lowest values
2. Range = maximum − minimum
***A better measure of spread should include all observations/values.

Some arithmetical tools for analyzing data

1. Mean: Average of data


2. Median: Middle point of the data distribution.
3. Mode: The value that occurs most often in the distribution.
4. Standard deviation: The typical difference of each data value from the mean. The standard
deviation is a statistic that measures the dispersion of a dataset relative to its mean
and is calculated as the square root of the variance.
n − 1 is used in the denominator because one degree of freedom is lost when the mean is estimated from the sample: once the mean is fixed, only n − 1 values are free to vary.
If the mean is an externally given (arbitrary) number, e.g. 50, rather than estimated from the data, then n is used.

5. Co-efficient of Variation: The coefficient of variation represents the ratio of the
standard deviation to the mean, and it is a useful statistic for comparing the degree of
variation from one data series to another, even if the means are drastically different
from one another.
CV = SD / Mean
CV expresses the variation relative to the mean.
It is used for comparison.
6. Variance: Variance (σ2) in statistics is a measurement of the spread between numbers
in a data set. That is, it measures how far each number in the set is from the mean and
therefore from every other number in the set.
7. Skewness: The asymmetry in a statistical distribution is skewness, in which the curve
appears distorted or skewed either to the left or to the right. Skewness can be quantified
to define the extent to which a distribution differs from a normal distribution. If skewness
is positive, the data are positively skewed or skewed right, meaning that the right tail of
the distribution is longer than the left. If skewness is negative, the data are negatively
skewed or skewed left, meaning that the left tail is longer.
8. Kurtosis: The sharpness of the peak of a frequency-distribution curve.
Whereas skewness differentiates extreme values in one versus the other tail, kurtosis
measures extreme values in either tail. Distributions with large kurtosis exhibit tail data
exceeding the tails of the normal distribution (e.g., five or more standard deviations from
the mean).
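A short sketch of how these shape measures could be computed in Python with scipy; the data below are simulated purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)   # a right-skewed sample, just for illustration

print("variance:", np.var(data, ddof=1))       # ddof=1 gives the sample variance (n - 1)
print("skewness:", stats.skew(data))           # > 0 means the right tail is longer
print("kurtosis:", stats.kurtosis(data))       # excess kurtosis; about 0 for a normal distribution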

Lecture 05 and 06
Date 02.08.2019 and Date 30.08.2019
1. Uni-variate variable – Qualitative – Frequency
- Quantitative – Descriptives
2. Bi-variate variable - Qualitative vs. Qualitative: Cross tab [Association]
- Qualitative vs. Quantitative: Cross tab [Association]
- Quantitative vs. Quantitative: Correlation
[The correlation coefficient ranges from −1 (lowest) to +1 (highest); if 0, there is no relation]
Problem Solution
SPSS > IBM > Samples > English > Employee data
1. Qualitative vs. Qualitative = Job category vs. Gender
SPSS > Analyze > Descriptive > Cross tab > Row: Job Category > Column: Gender >
Ok [use the Cells button to show percentages].

Output tables
Case Processing Summary
Cases
Valid Missing Total
N Percent N Percent N Percent
Employment Category * Gender 474 100.0% 0 0.0% 474 100.0%

Employment Category * Gender Crosstabulation


Count
Gender Total
Female Male
Clerical 206 157 363
Employment Category Custodial 0 27 27
Manager 10 74 84
Total 216 258 474

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 79.277a 2 .000
Likelihood Ratio 95.463 2 .000
N of Valid Cases 474
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count
is 12.30.
Relation is significant.
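The same chi-square test can be reproduced in Python with scipy, using the observed counts from the crosstabulation above:

from scipy.stats import chi2_contingency

# Rows: Clerical, Custodial, Manager; columns: Female, Male (counts from the crosstab above)
observed = [[206, 157],
            [0, 27],
            [10, 74]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")   # about 79.28, 2, p < .001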

2. Quantitative Vs. Quantitative = Beginning salary Vs. Current salary [employee sample]
SPSS > Analyze > correlate > Bi-variate > Current Salary > Beginning salary> Ok
Output Table
Correlations
Pearson correlation between Beginning Salary and Current Salary: r = .880**, Sig. (2-tailed) = .000, N = 474
**. Correlation is significant at the 0.01 level (2-tailed).
The relationship is significant. The correlation between current salary and beginning salary is .880, a strong positive relation.
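The same Pearson correlation could be obtained in Python with scipy. The variable names salbegin and salary follow the SPSS variable names shown later in these notes; the file path is hypothetical, and reading .sav files requires the pyreadstat package:

import pandas as pd
from scipy.stats import pearsonr

df = pd.read_spss("Employee data.sav")          # hypothetical path; needs pyreadstat
r, p = pearsonr(df["salbegin"], df["salary"])   # beginning vs. current salary
print(f"r = {r:.3f}, p = {p:.4f}")              # the notes report r = .880, p < .001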
3. Qualitative Vs Quantitative: [Cross tab]
SPSS > Analyze > Descriptive > Cross tab > Row: Current salary > Column: Job Category > Ok
Chi-Square Tests
Value df Asymp. Sig. (2-sided)
a
Pearson Chi-Square 758.755 440 .000
Likelihood Ratio 568.197 440 .000
Linear-by-Linear
287.858 1 .000
Association
N of Valid Cases 474
a. 656 cells (98.9%) have expected count less than 5. The minimum expected count is .06.
With current salary left as a raw quantitative variable, almost every cell has an expected count
below 5, so this chi-square test is not valid and no relation between current salary and job
category can be established from this table.

Again, current salary will be recoded into a qualitative variable: current salary can be recoded
into a different variable with value ranges, named "New current salary".
SPSS > Transform > Recode into Different Variables > Current salary > Name: New_Current_salary
> Change > Old and New Values > Range from 15,000 – 29,999 = 1 Add
30,000 – 44999 = 2 Add
45000- 59999 = 3 Add
60000 – 74999 = 4 Add
75000 – 89999 = 5 Add
90000 – 104999= 6 Add
105000- 119999 = 7 Add
120000- 135000=8 Add. Ok.
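A sketch of the same recoding in Python with pandas.cut; the salary variable name and file path are as in the previous sketch, and the interval edges follow the ranges listed above:

import pandas as pd

df = pd.read_spss("Employee data.sav")          # hypothetical path; needs pyreadstat
bins = [15000, 30000, 45000, 60000, 75000, 90000, 105000, 120000, 135001]
labels = [1, 2, 3, 4, 5, 6, 7, 8]               # same codes as the SPSS recode above
df["new_current_salary"] = pd.cut(df["salary"], bins=bins, labels=labels, right=False)
print(df["new_current_salary"].value_counts().sort_index())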
SPSS > Analyze > Descriptive > Cross tab > Row: New_Current_salary > Column: Job Category > Ok
Output Tables
New Current salary * Employment Category Crosstabulation
Count
                 Clerical  Custodial  Manager  Total
15000-29999         255        3         0     258
30000-44999          98       24        10     132
45000-59999           8        0        29      37
60000-74999           1        0        27      28
75000-89999           1        0         9      10
90000-104999          0        0         7       7
105000-119999         0        0         1       1
120000-135000         0        0         1       1
Total               363       27        84     474

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 405.708a 14 .000
Likelihood Ratio 360.873 14 .000
Linear-by-Linear Association 287.677 1 .000
N of Valid Cases 474
a. 13 cells (54.2%) have expected count less than 5. The minimum expected count is .06.

Because more than 20% of the cells still have an expected count below 5, the chi-square assumptions
are violated, so no reliable relation between current salary and job category is established.

Quantitative Vs Quantitative: [Salary Vs Experience]

Correlations
Pearson correlation between Current Salary and Previous Experience (months): r = -.097*, Sig. (2-tailed) = .034, N = 474
*. Correlation is significant at the 0.05 level (2-tailed).
The Pearson correlation is -.097 with a significance level of 3.4%. Current salary is weakly
negatively related to previous experience; that is, more experienced employees tend to have lower
salaries. This is plausible because most of the employees are clerks, as the frequency table of
employment category below shows.

Employment Category
Frequency Percent Valid Percent Cumulative
Percent
Clerical 363 76.6% 76.6 76.6
Custodial 27 5.7% 5.7 82.3
Valid
Manager 84 17.7% 17.7 100.0
Total 474 100.0 100.0

Relationship between job category and Job experience


Job experience should be categorized by ranges:
Transform > Recode into Different Variables > Name: New_Exp > 0 to 60 months = 1
61 to 120 months = 2
121 to 180 months =3
181 to 240 months = 4
241 to 300 months = 5
301 to 360 months = 6
361 to 420 months = 7
421 to 480 months = 8

SPSS > Analyze > Descriptive > Cross tab > column > Experience range > Row > Employment
Category > statistics > chi-square> ok

Output Tables
Experience range * Employment Category Crosstabulation
Count
Employment Category Total
Clerical Custodial Manager
0-60 months 206 0 46 252
61 - 120 months 69 0 16 85
121 - 180 months 34 5 13 52
181 - 240 months 24 4 4 32
Experience range
241 - 300 months 11 3 5 19
301 - 360 months 9 7 0 16
361 - 420 months 8 3 0 11
421 - 480 months 2 5 0 7
Total 363 27 84 474

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 144.154a 14 .000
Likelihood Ratio 102.323 14 .000
Linear-by-Linear Association 2.076 1 .150
N of Valid Cases 474
a. 11 cells (45.8%) have expected count less than 5. The minimum expected count is .40.

Linear-by-Linear Association is insignificant.


***Salary increase per month
SPSS > Transform > Compute > Target variable > Salary increase per month =
[Current salary – Beginning salary]/Months since hire > Ok.
***New salary with 10% Bonus
SPSS > Transform > Compute > Target variable > New_salary =
[Current salary+ 10 % of Current salary] > Ok.
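The same computed variables in Python/pandas; the column names (salary, salbegin, jobtime) follow the SPSS variable names used later in these notes, and the file path is hypothetical:

import pandas as pd

df = pd.read_spss("Employee data.sav")          # hypothetical path; needs pyreadstat

# Salary increase per month = (current salary - beginning salary) / months since hire
df["salary_increase_per_month"] = (df["salary"] - df["salbegin"]) / df["jobtime"]

# New salary = current salary plus a 10% bonus
df["new_salary"] = df["salary"] * 1.10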
When a sample mean is used to draw conclusions about the whole population, the sample should be
chosen randomly; otherwise there is a chance of bias.
***Random sampling is the best sampling method****

Lecture 07
Date: 06.09.2019

Sampling distribution
Distribution of sample mean is called Sampling distribution.
Descriptive statistics alone are not sufficient to meet the research objectives, so inferential
statistics are needed.
Process of Inferential Statistics:
1. Point estimation – Sample mean
2. Interval estimation - µ- 2S to µ+ 2S = 95% data is the most effective inference.
3. Testing the hypothesis H0: Null Hypothesis
H1: Alternative Hypothesis

Normal distribution
Interval estimation: µ − 2s to µ + 2s covers about 95% of the data; the multiplier 2 is an approximation, the exact value is 1.96.

T – Test
A. One sample T-Test
SPSS >Analyze > Compare Means > One Sample T-Test > Educational Level > Ok

One-Sample Statistics
N Mean Std. Deviation Std. Error Mean
Educational Level (years) 474 13.49 2.885 .133

One-Sample Test
Test Value = 0
t df Sig. (2-tailed) Mean Difference 95% CI of the Difference (Lower, Upper)
Educational Level (years) 101.819 473 .000 13.492 13.23 13.75

Significance is 0, T test is significant.

Problem 1
Do you think that Average salary is less than 35000?

Solution
1. Null Hypothesis: Average salary is greater than or equal to 35000, H0: µ >= 35000
2. Alternative Hypothesis: Average salary is less than 35000, H1: µ < 35000
[If the t value falls in the left (negative) tail, the null hypothesis is rejected]
T - Value at Centre = 0
T - Value at right = (+)
T - Value at left = (-)
Way
SPSS > Analyze > Compare Means > One Sample T-Test > Current Salary > Test Value = 35000 > Ok
Output table

One-Sample Statistics
N Mean Std. Deviation Std. Error
Mean
Current Salary 474 $34,419.57 $17,075.661 $784.311

One-Sample Test
Test Value = 35000
t df Sig. (2-tailed) Mean 95% Confidence Interval of the
Difference Difference
Lower Upper
Current Salary -.740 473 .460 -$580.432 -$2,121.60 $960.73

There is not enough evidence that the average salary is less than 35000: the one-tailed significance
is .460/2 = .23 (23%), which is a large error probability.
Lower value: 35000 − 2,121.60 = 32,878.40
Upper value: 35000 + 960.73 = 35,960.73
µ = 35000 lies inside this confidence interval, so the null hypothesis is retained.
If the lower value were 31,200 and the upper value 34,000 (i.e., 35000 outside the interval), the null would be rejected.
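A one-sample t-test of this kind can be sketched in Python with scipy. scipy reports a two-tailed p-value, which is halved for the one-sided question above (column name and file path as in the earlier sketches):

import pandas as pd
from scipy.stats import ttest_1samp

df = pd.read_spss("Employee data.sav")                     # hypothetical path; needs pyreadstat
t, p_two_tailed = ttest_1samp(df["salary"], popmean=35000)

# H1: mean < 35000 (left-tailed), so halve the p-value only when t is negative
p_one_tailed = p_two_tailed / 2 if t < 0 else 1 - p_two_tailed / 2
print(f"t = {t:.3f}, one-tailed p = {p_one_tailed:.3f}")   # notes report t = -.740, p ≈ .23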

Lecture: 08
Date: 13/09/2019

Problem 2.
Do you think that Average current salary is higher than 34000?
Solution
1. Null Hypothesis: Average salary is less than or equal to 34000, H0: µ <= 34000
2. Alternative Hypothesis: Average salary is higher than 34000, H1: µ > 34000
Way
SPSS > Analyze > Compare Means > One Sample T-Test > Current Salary > Test Value = 34000 > Ok
Output Table:
One-Sample Statistics
N Mean Std. Deviation Std. Error Mean
Current Salary 474 $34,419.57 $17,075.661 $784.311

One-Sample Test
Test Value = 34000
t df Sig. (2-tailed) Mean 95% Confidence Interval of the
Difference Difference
Lower Upper
Current Salary .535 473 .593 $419.568 -$1,121.60 $1,960.73

The one-tailed significance is .593/2 = .2965 (29.65%), a large error probability, so we fail to
reject the null hypothesis. Finding: the average salary is less than or equal to 34000.

Problem 3
Do you think that Average job time is less than 96 months?
Solution
1. Null Hypothesis: Average job time is greater than or equal to 96 months, H0: µ >= 96 months
2. Alternative Hypothesis: Average job time is less than 96 months, H1: µ < 96 months
[If the t value falls in the left tail, the null hypothesis is rejected]
[The t value should be negative for the null to be rejected]
Way
SPSS > Analyze > Compare Means > One Sample T-Test > Months since Hire > Test Value = 96 > Ok
Output Tables:
One-Sample Statistics
N Mean Std. Deviation Std. Error Mean
Months since Hire 474 81.11 10.061 .462

One-Sample Test
Test Value = 96
t df Sig. (2-tailed) Mean Difference 95% CI of the Difference (Lower, Upper)
Months since Hire -32.222 473 .000 -14.890 -15.80 -13.98
t = −32.222 is negative and the significance is .000; therefore the null hypothesis is rejected.

To draw inference/conclusion on T Test the steps are


Step 1: Set up Alternative Hypothesis
Step 2: Set up Null Hypothesis
Step 3: Observe T value: Whether T value is (-) left or (+) right.
[If Alt hypo is > then T should have + value then null will be rejected]
[If Alt hypo is < then T should have - value then null will be rejected]
Step 4: Observe Sig. (2-tailed) value, whether [Sig. (2-tailed) value]/2 lower than 0.05 or not.
[If [Sig. (2-tailed) value]/2 is lower than .05 then null is rejected]
[In some cases for Social science [Sig. (2-tailed) value]/2 may be 0.10 tolerable]

Then comment and interprets the findings.

***A 'one-sample t-test' is essential when a sample mean is to be compared against a known benchmark mean.

***Otherwise researchers do not like the 'one-sample t-test', because it needs a benchmark.

When there is no benchmark, the next option is:


(B). ‘Independent Sample T –test’
Problem 1
In the comparison of air pollution of Dhaka City and Chittagong city, it is observed that pollution
of Dhaka city is greater than that of Chittagong city. Do you agree?
Let,
Air pollution mean of Dhaka city is µm

Air pollution mean of Chittagong city is µc

Null Hypo: The mean air pollution of Dhaka city is less than or equal to that of Chittagong, H0: µm <= µc
Alt. Hypo: Air pollution mean of Dhaka city is greater than that of Chittagong H1: µm > µc

Problem 2
In the comparison of salary of male and female, it is assumed that the average salary of male is
not equal to that of female.
Let,
Average salary of male is µm

Average salary of female is µf

Null Hypo: Average salary of male is equal to that of female H0: µm = µf


Alt. Hypo: Average salary of male is not equal to that of female H1: µm ≠ µf
SPSS >Analyze > Compare Means > Independent Sample T-Test > Test Variable: Current
Salary > Grouping Variable: Gender > Ok

Output Tables
Group Statistics
Gender N Mean Std. Deviation Std. Error
Mean
Male 258 $41,441.78 $19,499.214 $1,213.968
Current Salary
Female 216 $26,031.92 $7,558.021 $514.258

Independent Samples Test
Levene's Test for Equality of Variances: F = 119.669, Sig. = .000
t-test for Equality of Means (Current Salary):
Equal variances assumed: t = 10.945, df = 472, Sig. (2-tailed) = .000, Mean Difference = $15,409.862, Std. Error Difference = $1,407.906, 95% CI of the Difference: $12,643.322 to $18,176.401
Equal variances not assumed: t = 11.688, df = 344.262, Sig. (2-tailed) = .000, Mean Difference = $15,409.862, Std. Error Difference = $1,318.400, 95% CI of the Difference: $12,816.728 to $18,002.996

The significance is below .05 whether equal variances are assumed or not.

Since Levene's test is significant (variances are unequal), we use the "equal variances not assumed"
row: t = 11.688 and the significance is .000, so the null hypothesis is rejected.
Therefore, the average salary of males is not equal to that of females.
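An equivalent independent-samples t-test in Python; Welch's version (equal_var=False) corresponds to SPSS's "equal variances not assumed" row. The gender labels below are an assumption about how the file is coded:

import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_spss("Employee data.sav")                  # hypothetical path; needs pyreadstat
groups = df.groupby("gender")["salary"]
male = groups.get_group("Male")                         # adjust if the file codes gender as m/f
female = groups.get_group("Female")

t, p = ttest_ind(male, female, equal_var=False)         # Welch's t-test
print(f"t = {t:.3f}, two-tailed p = {p:.4f}")           # notes report t = 11.688, p < .001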

Problem 3
In the comparison of salary of male and female, it is assumed that the average salary of male is
greater than that of female.
Let,
Average salary of male is µm

Average salary of female is µf

Null Hypo: Average salary of male is less or equal to that of female H0: µm <= µf
Alt. Hypo: Average salary of male is greater than that of female H1: µm > µf
SPSS >Analyze > Compare Means > Independent Sample T-Test > Test Variable: Current
Salary > Grouping Variable: Gender > Ok
Output Tables
Group Statistics
Gender N Mean Std. Deviation Std. Error Mean
Male 258 $41,441.78 $19,499.214 $1,213.968
Current Salary
Female 216 $26,031.92 $7,558.021 $514.258

Independent Samples Test
Levene's Test for Equality of Variances: F = 119.669, Sig. = .000
t-test for Equality of Means (Current Salary):
Equal variances assumed: t = 10.945, df = 472, Sig. (2-tailed) = .000, Mean Difference = $15,409.862, Std. Error Difference = $1,407.906, 95% CI of the Difference: $12,643.322 to $18,176.401
Equal variances not assumed: t = 11.688, df = 344.262, Sig. (2-tailed) = .000, Mean Difference = $15,409.862, Std. Error Difference = $1,318.400, 95% CI of the Difference: $12,816.728 to $18,002.996

The significance is below .05 whether equal variances are assumed or not.

Using the "equal variances not assumed" row, t = 11.688 is positive and the significance is .000;
this is a right-tailed test, so the null hypothesis is rejected.
Therefore, the average salary of males is greater than that of females.

Problem 4
In the comparison of previous job experience of male and female, it is assumed that the average
previous job experience of male is greater than that of female.
Let,
Average previous job experience of male is µm

Average previous job experience of female is µf

Null Hypo: Average previous job experience of male is less/equal to that of female H0: µm <= µf
Alt. Hypo: Average previous job experience of male is greater than that of female H1: µ m > µf
SPSS >Analyze > Compare Means > Independent Sample T-Test > Test Variable: previous job
experience > Grouping Variable: Gender > Ok
Output Tables

Group Statistics
Gender N Mean Std. Deviation Std. Error
Mean
Previous Experience Male 258 111.62 109.692 6.829
(months) Female 216 77.04 95.012 6.465

Independent Samples Test
Levene's Test for Equality of Variances: F = 2.582, Sig. = .109
t-test for Equality of Means (Previous Experience (months)):
Equal variances assumed: t = 3.631, df = 472, Sig. (2-tailed) = .000, Mean Difference = 34.583, Std. Error Difference = 9.524, 95% CI of the Difference: 15.869 to 53.297
Equal variances not assumed: t = 3.678, df = 471.444, Sig. (2-tailed) = .000, Mean Difference = 34.583, Std. Error Difference = 9.404, 95% CI of the Difference: 16.105 to 53.062

From the above tables:

Levene's test is not significant (Sig. = .109 > .05), so equal variances can be assumed; the
corresponding t-test gives t = 3.631 (positive) with significance .000, so the null hypothesis is
rejected. The "equal variances not assumed" result (t = 3.678, significance .000) leads to the same
conclusion. Comment: the average previous job experience of males is greater than that of females.

C. Paired Sample T -Test


The same variable measured at two different points (for the same cases) gives a paired sample. A test
on such paired variables is called a paired-samples t-test.
Example: current salary and beginning salary.
Problem 1
Do you think average current salary is greater than average beginning salary?
Solution
Let,
Average current salary is µc

Average beginning salary is µb

Null Hypo: Average current salary is less or equal to average beginning salary H0: µc <= µb
Alt. Hypo: Average current salary is greater than average beginning salary H1: µc > µb
SPSS > Analyze > Compare Means > Paired-Samples T Test > Pair: Current salary & Beginning salary > Ok
Output Tables
Paired Samples Statistics
Mean N Std. Deviation Std. Error Mean
Current Salary $34,419.57 474 $17,075.661 $784.311
Pair 1
Beginning Salary $17,016.09 474 $7,870.638 $361.510

Paired Samples Correlations
N Correlation Sig.
Pair 1: Current Salary & Beginning Salary 474 .880 .000

Paired Samples Test
Pair 1: Current Salary – Beginning Salary
Paired differences: Mean = $17,403.481, Std. Deviation = $10,814.620, Std. Error Mean = $496.732,
95% CI of the Difference: $16,427.407 to $18,379.555
t = 35.036, df = 473, Sig. (2-tailed) = .000

From the above tables, it is observed that the t value is positive (35.036) and the significance is
.000; the null hypothesis is therefore rejected. Comment: the average current salary is greater than
the average beginning salary.
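A paired-samples t-test sketch in Python (same column-name and file-path assumptions as in the earlier sketches):

import pandas as pd
from scipy.stats import ttest_rel

df = pd.read_spss("Employee data.sav")                  # hypothetical path; needs pyreadstat
t, p = ttest_rel(df["salary"], df["salbegin"])          # current vs. beginning salary, paired
print(f"t = {t:.3f}, two-tailed p = {p:.4f}")           # notes report t = 35.036, p < .001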
D. ANOVA
ANOVA is (one-way) analysis of variance. It is used when an independent variable entered as the
factor has several categories and the means of a quantitative variable are compared across them.
Problem 1
Job Category has three kinds that are manager, custodian and clerk. The researcher wants to
know whether three categories of Job holder have same current salary or not. Prove that.
Let, Average salary of Manager is µm
Average salary of Custodian is µc
Average salary of Clerk is µcl
Null Hypo: Average current salaries of Manager, Custodian and Clerk are equal, H0: µm = µc = µcl
Alt. Hypo: Average current salaries of Manager, Custodian and Clerk are not all equal, H1: not all of µm, µc, µcl are equal

SPSS > Analyze > Compare Means > One-Way ANOVA > Dependent List: Current salary > Factor: Job Category > Ok

Output Tables
ANOVA
Current Salary
Sum of Squares df Mean Square F Sig.
Between Groups 89438483925.943 2 44719241962.971 434.481 .000
Within Groups 48478011510.397 471 102925714.459
Total 137916495436.340 473
The significance is .000, so the null hypothesis is rejected. Comment: the average current salaries
of Manager, Custodian and Clerk are not all equal.
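A one-way ANOVA sketch in Python with scipy; the job-category variable name (jobcat) is an assumption based on the SPSS output shown later in these notes:

import pandas as pd
from scipy.stats import f_oneway

df = pd.read_spss("Employee data.sav")                  # hypothetical path; needs pyreadstat
groups = [g["salary"].values for _, g in df.groupby("jobcat")]
f, p = f_oneway(*groups)
print(f"F = {f:.3f}, p = {p:.4f}")                      # notes report F = 434.481, p < .001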

Lecture 09
Date 20.09.2019
Descriptive Statistics:
Qualitative Variable Vs Qualitative Variable: Association > Cross tab
Null Hypo: H0: There is no association between Gender and Job Category
Alt. Hypo: H1: There is an association between Gender and Job Category
Solution: Test can be done by Cross tab and chi-Square.
SPSS > Analyze > Descriptive > Cross tab> Column: Gender and Row: Job Category > Statistics
> Chi - Square > Continue > Ok
Output Tables:

Case Processing Summary


Cases
Valid Missing Total
N Percent N Percent N Percent
Employment Category * Gender 474 100.0% 0 0.0% 474 100.0%

Employment Category * Gender Crosstabulation


Count
Gender Total
Female Male
Clerical 206 157 363
Employment Category Custodial 0 27 27
Manager 10 74 84
Total 216 258 474

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 79.277a 2 .000
Likelihood Ratio 95.463 2 .000
N of Valid Cases 474
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 12.30.
The Pearson Chi-Square is 79.277 with .000 significance. That means there is a strong association
between Gender and Job Category; the relationship is meaningful.
Problem
Null Hypo: H0: There is no association between Minority and Job Category
Alt. Hypo: H1: There is an association between Minority and Job Category

Solution: Test can be done by Cross tab and chi-Square.
SPSS > Analyze > Descriptive > Cross tab> Column: Minority and Row: Job Category >
Statistics > Chi - Square > Continue > Ok

Case Processing Summary


Cases
Valid Missing Total
N Percent N Percent N Percent
Employment Category * Minority Classification 474 100.0% 0 0.0% 474 100.0%

Employment Category * Minority Classification Crosstabulation


Count
Minority Total
Classification
No Yes
Clerical 276 87 363
Employment Category Custodial 14 13 27
Manager 80 4 84
Total 370 104 474

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 26.172a 2 .000
Likelihood Ratio 29.436 2 .000
Linear-by-Linear Association 9.778 1 .002
N of Valid Cases 474
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.92.
The Pearson Chi-Square is 26.172 with .000 significance. That means there is a strong association
between Minority and Job Category; the relationship is meaningful.
Chi-square: A chi-square (χ²) statistic is a test that measures how expectations compare to actual
observed data. The formula for chi-square is
χ²c = Σ (Oi − Ei)² / Ei
where c = degrees of freedom, O = observed value(s), E = expected value(s)
r = sample correlation
ρ (rho) = population correlation
***If only one variable exists, a t-test can be done.
***If two qualitative variables exist, use cross tab and chi-square.
If two numeric/quantitative variables exist, correlation is done.

Lecture: 10
Date: 27/09/2019
Linear regression

Lecture 11
Date 04/10/2019
Binary Logistic Regression
Analyze > Regression > Binary Logistic > Dependent: Manager or Not (dummy variable) > Covariates:
Gender, Minority, Educational Level, Current Salary, Beginning Salary, Job Time,
Previous Experience > Categorical (Gender) > Continue > Ok.
Output Table
Model Summary
Step -2 Log Cox & Snell R Nagelkerke R Square
likelihood Square
1 70.965a .544 .895
a. Estimation terminated at iteration number 9 because parameter estimates changed by less than .001.

Classification Tablea
Observed Predicted
Manager Or Not Percentage
.00 1.00 Correct
.00 384 6 98.5
Manager Or Not
Step 1 1.00 9 75 89.3
Overall Percentage 96.8
a. The cut value is .500

Variables in the Equation
B S.E. Wald df Sig. Exp(B)
minority -3.629 2.462 2.172 1 .141 .027
prevexp -.003 .007 .259 1 .611 .997
jobtime -.022 .034 .443 1 .506 .978
salbegin .001 .000 14.588 1 .000 1.001
Step 1a
salary .000 .000 10.025 1 .002 1.000
educ .417 .314 1.769 1 .184 1.517
gender(1) 1.466 .739 3.929 1 .047 4.331
Constant -20.839 5.829 12.781 1 .000 .000
a. Variable(s) entered on step 1: minority, prevexp, jobtime, salbegin, salary, educ, gender.

Here,
the Nagelkerke R Square is 89.5%, which is good [above 25%].
"Not manager" is predicted correctly 384 times and incorrectly 6 times; "manager" is predicted
correctly 75 times and incorrectly 9 times; the overall classification accuracy is 96.8% [if above
80% then the model is fit].
The significance (error) is too high for job time (.506) and for previous experience (.611).
Eliminating those two variables, the outcome is the following:
Analyze > Regression > Binary Logistic > Dependent: Manager or Not > Covariates: Gender, Minority,
Educational Level, Current Salary, Beginning Salary > Categorical (Gender, Minority) > Continue > Ok.
Outcome Table
Model Summary
Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square
1 71.813a .543 .894
a. Estimation terminated at iteration number 10 because parameter estimates changed by less than .001.

Classification Tablea
Observed Predicted
Manager or Not Percentage Correct
.00 1.00
.00 385 5 98.7
Manager or Not
Step 1 1.00 7 77 91.7
Overall Percentage 97.5
a. The cut value is .500

Variables in the Equation
B S.E. Wald df Sig. Exp(B)
gender(1) 1.482 .734 4.075 1 .044 4.401
educ .510 .282 3.271 1 .071 1.665
salary .000 .000 12.094 1 .001 1.000
Step 1a
salbegin .000 .000 17.425 1 .000 1.000
minority(1) 3.791 2.128 3.172 1 .075 44.281
Constant -27.528 5.721 23.153 1 .000 .000
a. Variable(s) entered on step 1: gender, educ, salary, salbegin, minority.

Here,
the Nagelkerke R Square is 89.4%, which is good.
The overall classification accuracy is 97.5% [if above 80% then the model is fit].
The B coefficients for current salary and beginning salary are approximately 0 (to three decimals).
Eliminating those two variables, the outcome is the following:
Analyze > Regression > Binary Logistic > Dependent: Manager or Not > Covariates: Gender, Minority,
Educational Level > Categorical (Gender, Minority) > Continue > Ok.

Model Summary
Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square
1 167.441a .441 .726
a. Estimation terminated at iteration number 9 because parameter estimates changed by less than .001.

Classification Tablea
Observed Predicted
Manager or Not Percentage
.00 1.00 Correct
.00 380 10 97.4
Manager or Not
Step 1 1.00 17 67 79.8
Overall Percentage 94.3
a. The cut value is .500

Variables in the Equation
B S.E. Wald df Sig. Exp(B)
gender(1) -.895 .447 4.013 1 .045 .409
educ 1.773 .275 41.625 1 .000 5.886
Step 1a
minority(1) 2.325 .794 8.573 1 .003 10.230
Constant -30.520 4.559 44.823 1 .000 .000
a. Variable(s) entered on step 1: gender, educ, minority.

Here,
the Nagelkerke R Square is 72.6%, which is good [above 25%].
Observed and correctly predicted as "not manager": 380 cases; observed and correctly predicted as
"manager": 67 cases.
The overall classification accuracy is 94.3% [if above 80% then the model is fit].
Log-odds of (Manager or Not) = Constant + B1·Gender + B2·Educational Level + B3·Minority
Log-odds of (Manager or Not) = -30.520 - .895·Gender + 1.773·Educational Level + 2.325·Minority
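A rough Python equivalent of this final logistic model using statsmodels; the dummy-variable construction and the category labels are assumptions, and only the model structure is meant to mirror the SPSS output above:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("Employee data.sav")                   # hypothetical path; needs pyreadstat
df["manager"] = (df["jobcat"] == "Manager").astype(int)  # dummy: 1 = manager, 0 = not manager

model = smf.logit("manager ~ C(gender) + educ + C(minority)", data=df).fit()
print(model.summary())   # coefficients comparable to the B column of the SPSS output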

Lecture 12
Date 11/10/2019
Non Linear Regression
Multiple Logistic Regressions

Lecture 13
Date 18/10/2019
Explore and Binary and Linear Regression
IBM > Telco
Do you think months with service is influenced by marital status, age in years, level of education,
gender, and number of people in the household? (5 independent variables.)
Linear Regression
Analyze > Regression > Linear > Dependent variable: Months with service > Independent
variables: the 5 variables above > Ok.

Variables Entered/Removeda
Model Variables Entered Variables Removed Method
Gender,
Level of education,
1 Marital status, . Enter
Age in years,
Number of people in householdb
a. Dependent Variable: Months with service
b. All requested variables entered.

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .518a .269 .265 18.314
a. Predictors: (Constant), Gender, Level of education, Marital status, Age in years, Number of
people in household

ANOVAa
Model Sum of Squares df Mean Square F Sig.
Regression 122409.950 5 24481.990 72.996 .000b
1 Residual 333375.374 994 335.388
Total 455785.324 999
a. Dependent Variable: Months with service
b. Predictors: (Constant), Gender, Level of education, Marital status, Age in years, Number of
people in household

Coefficientsa
Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
(Constant): B = -.503, Std. Error = 3.067, t = -.164, Sig. = .870
Age in years: B = .819, Std. Error = .049, Beta = .482, t = 16.645, Sig. = .000
Marital status: B = 7.498, Std. Error = 1.514, Beta = .176, t = 4.953, Sig. = .000
Level of education: B = -.614, Std. Error = .480, Beta = -.035, t = -1.280, Sig. = .201
Number of people in household: B = -.349, Std. Error = .546, Beta = -.023, t = -.639, Sig. = .523
Gender: B = 1.204, Std. Error = 1.159, Beta = .028, t = 1.039, Sig. = .299
a. Dependent Variable: Months with service
The adjusted R Square is 26.5%: the dependent variable is explained by the 5 independent variables
only to the extent of 26.5%. [Not good]
Three variables are not significant (high error values): i. Level of education
ii. Number of people in household
iii. Gender
These three variables should be eliminated.
Then, Analyze > Regression > Linear > Dependent variable: Months with service > Independent
variables: Age in years and Marital status > Ok.
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .516a .266 .265 18.314
a. Predictors: (Constant), Marital status, Age in years

ANOVAa
Model Sum of Squares df Mean Square F Sig.
Regression 121384.223 2 60692.111 180.950 .000b
1 Residual 334401.101 997 335.407
Total 455785.324 999
a. Dependent Variable: Months with service
b. Predictors: (Constant), Marital status, Age in years

Coefficientsa
Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
(Constant) -2.802 2.097 -1.337 .182
1 Age in years .838 .046 .493 18.155 .000
Marital status 6.887 1.158 .161 5.945 .000
a. Dependent Variable: Months with service
Months with service = -2.802 + .838 Age + 6.887 Marital status.
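The reduced linear model can be sketched in Python with statsmodels; the Telco column names (tenure, age, marital) are assumptions about the sample file:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("telco.sav")                            # hypothetical path; needs pyreadstat
# If marital is stored as a text category, the formula interface dummy-codes it automatically.
model = smf.ols("tenure ~ age + marital", data=df).fit()  # months with service ~ age + marital status
print(model.params)      # compare with: tenure = -2.802 + .838*age + 6.887*marital status
print(model.rsquared)    # the notes report R Square = .266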

Lecture 14
Dated 01.11.2019
Topic
1. Missing data
2. Explore
3. Dummy Variable
4. Logistic Regression
5. T- Test Statistics
6. Global Test
7. Co-Efficient of Categorical Variable
8. Logical Set Up

Discussion
1. Missing data: an absent value is called missing data; one type is system-missing.
Points: how to recognise missing data and how to handle missing data.
Example: cross tab between gender and job category.
2. Explore: a descriptive process. Frequencies describe single variables; Explore gives descriptive
statistics within categories, e.g., the mean and standard deviation of the quantitative variable
salary separately for males and females.
Analyze > Descriptive Statistics > Explore > Dependent: Current salary > Factor: Gender.
Explore output includes: a. stem-and-leaf diagram
b. box plot
When to use Explore:
1. Bulk descriptive output is needed
2. Each variable needs to be presented
3. Descriptive information is to be compared across groups
Not all Explore findings need to be reported in the dissertation.
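A small pandas sketch of the same "explore by group" idea (column names and file path as in the earlier sketches):

import pandas as pd

df = pd.read_spss("Employee data.sav")                  # hypothetical path; needs pyreadstat
print(df.groupby("gender")["salary"].describe())        # count, mean, sd, quartiles per gender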

Lecture 15
Dated 08/11/2019
Closing

