
DNSC 4900 – Introduction to Business Analytics – Fall 2017


Assignment 3 Due: Monday Nov 6, 2017 - 11:59 PM ET
-----------------------------------------------------------------------

• Assignment 3 is due Monday Nov 6, 2017 - 11:59 PM ET. Please note that the due date will not be
extended.

• There will be a penalty for late submission (25% reduction) if you submit it after the due date but
before Tuesday Nov 7, 2017 at 11:59 PM ET. No submission will be accepted after that.

• Please use Blackboard to submit your assignment. HARD COPIES OR EMAIL SUBMISSIONS WILL NOT BE
ACCEPTED!

• You need to submit a final written report. In your report make sure all the questions are answered and
addressed explicitly, and also add your critical evaluation of the situation and the analysis method used
to answer these questions.

• The final report should be uploaded in PDF format. (If you don't have a PDF printer installed on
your computer, you can search for a free Word-to-PDF converter online.)

• You should also submit all your supporting files (Excel, JMP, Tableau, etc.) with your report. In so doing,
make sure to name your files accordingly. For example, name each sheet in your Excel file as Q1, Q2,
Q3, etc.

• You are only allowed a SINGLE attempt to upload your files. So upload them only after you finalize
your solutions. Email the TA if you have any issues doing so.

Question 1: Demand for systems analysts in the consulting industry is greater than ever.
Graduates with a combination of business and computer knowledge—some even from liberal arts
programs—are getting great offers from consulting companies. Once these people are hired, they
frequently switch from one company to another as competing companies lure them away with
even better offers. One consulting company, D&Y, has collected data on a sample of systems
analysts with undergraduate degrees they hired several years ago. The data are in the file
Retention.xlsx. The variables are as follows:

Variable      Description
StartSal      Employee's starting salary at D&Y
OnRoadPct     Percentage of time employee has spent on the road with clients
CISDegree     Whether the employee majored in Computer Information Systems or a similar computer-related area
Stayed3Yrs    Whether the employee stayed at least three years with D&Y

D&Y is trying to learn everything it can about retention of these valuable employees. You can help
by solving the following problems and then, based on your analysis, presenting a report to D&Y.
(Note: You can benefit from using Pivot Tables in answering some of the questions.)

a. Although starting salaries are in a fairly narrow band, D&Y wonders whether they have
anything to do with retention. Find a 95% confidence interval for the mean starting salary
of all employees who stay at least three years with D&Y. Do the same for those who leave
before three years.

• The 95% confidence interval for the mean starting salary of all employees who
stay at least three years with D&Y runs from $37,512.64 (lower limit) to
$38,460.09 (upper limit). We are 95% confident that the true mean lies within
this interval.

• The 95% confidence interval for the mean starting salary of employees who leave
D&Y before three years runs from $37,897.43 (lower limit) to $39,214.40 (upper
limit). We are 95% confident that the true mean lies within this interval.

b. Among all employees whose starting salary is below the median ($37,750), what
proportion stay with D&Y for at least three years? Do the same for employees whose
starting salary is above the median ($37,750).

• Among all employees whose starting salary is below the median, the proportion
that stay with D&Y for at least three years is 60.61%. On the other hand, among
the employees whose starting salary is above the median, the proportion that
stay for at least three years is 36.36%.

c. D&Y wonders whether the percentage of time on the road might influence who stays and
who leaves. Find a 95% confidence interval for the percentage of time on road of all
employees who stay at least three years with D&Y. Do the same for those who leave
before three years.

• The 95% confidence interval for the mean percentage of time on the road of all
employees who stay at least three years with D&Y runs from 57.99% (lower limit)
to 70.19% (upper limit). We are 95% confident that the true mean lies within
this interval.

• The 95% confidence interval for the mean percentage of time on the road of all
employees who leave D&Y before three years runs from 62.13% (lower limit) to
72.69% (upper limit). We are 95% confident that the true mean lies within this
interval.

Based on the 95% confidence intervals for those who stay at least three years and those
who leave before three years with D&Y, the percentage of time on the road might
influence who stays and who leaves. The interval for those who leave before three years
lies higher (62.13% to 72.69% versus 57.99% to 70.19%). The sample mean and standard
deviation are also higher for this group than for those who stay at least three years,
which suggests that those who spend a higher percentage of time on the road tend to
leave before three years. Because the two intervals overlap, this evidence is
suggestive rather than conclusive.
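
As a quick arithmetic check on the two intervals reported in part c, the sketch below computes each interval's midpoint and width, and checks whether the intervals overlap. For a symmetric t-interval the midpoint equals the sample mean. The limits are the ones reported above. The helper names `describe` and `overlaps` are illustrative and not part of the assignment.

```python
# Interval limits are taken from the answer to part c; helper names are illustrative.
def describe(lower, upper):
    """Return (midpoint, width) of an interval; the midpoint of a symmetric
    t-interval is the sample mean."""
    return (lower + upper) / 2, upper - lower


def overlaps(a, b):
    """True when intervals a = (lower, upper) and b = (lower, upper) share values."""
    return a[0] <= b[1] and b[0] <= a[1]


stayed = (57.99, 70.19)  # employees who stayed at least three years
left = (62.13, 72.69)    # employees who left before three years

for label, interval in (("stayed", stayed), ("left", left)):
    mid, width = describe(*interval)
    print(f"{label}: midpoint {mid:.2f}%, width {width:.2f} points")
print("intervals overlap:", overlaps(stayed, left))
```

On these limits, the interval for leavers sits higher but the two intervals share most of their range, so the comparison is best read alongside the proportions in part d.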

d. Among all employees whose percentage of time on the road is below the median (54%), what
proportion stay with D&Y for at least three years? Do the same for all employees whose
percentage of time on the road is above the median (54%).

• Among all employees whose percentage of time on the road is below the
median, the proportion that stay with D&Y for at least three years is 63.64%. On
the other hand, among the employees whose percentage of time on the road is
above the median, the proportion who stay for at least three years is 33.33%.

e. What proportion of employees with a CIS degree leave before three years? Find the same
proportion for employees without a CIS degree.

• The proportion of employees with CIS degrees that leave before three years is
56.90%. The proportion of employees without CIS degrees that leave before
three years is 12.50%.
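
The proportions in parts b, d, and e are all conditional proportions: filter to a subgroup, then count how many in that subgroup meet the outcome. The sketch below shows the calculation a Pivot Table performs. The six rows are made up for illustration and are not values from Retention.xlsx. The "Yes"/"No" coding and the helper name `proportion` are assumptions.

```python
import statistics

# Hypothetical rows for illustration only; NOT the contents of Retention.xlsx.
# The "Yes"/"No" coding of CISDegree and Stayed3Yrs is an assumption.
records = [
    {"StartSal": 36000, "CISDegree": "Yes", "Stayed3Yrs": "No"},
    {"StartSal": 37000, "CISDegree": "No", "Stayed3Yrs": "Yes"},
    {"StartSal": 37400, "CISDegree": "No", "Stayed3Yrs": "Yes"},
    {"StartSal": 38200, "CISDegree": "Yes", "Stayed3Yrs": "No"},
    {"StartSal": 38500, "CISDegree": "Yes", "Stayed3Yrs": "Yes"},
    {"StartSal": 39000, "CISDegree": "No", "Stayed3Yrs": "No"},
]


def proportion(rows, condition, outcome):
    """Share of the rows satisfying `condition` that also satisfy `outcome`."""
    subset = [r for r in rows if condition(r)]
    if not subset:
        raise ValueError("no rows match the condition")
    return sum(1 for r in subset if outcome(r)) / len(subset)


median_salary = statistics.median(r["StartSal"] for r in records)

# Part b style: retention among employees who started below the median salary.
stay_below = proportion(
    records,
    lambda r: r["StartSal"] < median_salary,
    lambda r: r["Stayed3Yrs"] == "Yes",
)
# Part e style: share of CIS graduates who left before three years.
left_cis = proportion(
    records,
    lambda r: r["CISDegree"] == "Yes",
    lambda r: r["Stayed3Yrs"] == "No",
)
print(f"median salary: {median_salary}")
print(f"stayed, below median: {stay_below:.2%}")
print(f"left, CIS degree: {left_cis:.2%}")
```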

f. Write a short report about employee retention at D&Y based on your findings.

Based on the data D&Y has collected on a sample of systems analysts with undergraduate
degrees hired several years ago, there appear to be common patterns in why some
employees leave before three years and why others stay at least three years.

According to the data, employees who start with a salary below the median ($37,750) are
more likely to stay with the company for at least three years than those who start above
the median. As shown in part b, the retention rate is 60.61% for those with a starting
salary below the median and 36.36% for those with a starting salary above it. To improve
retention, D&Y could start its employees at a lower salary and encourage them to stay and
earn more over time, instead of giving them a higher salary when they join the company.

Another factor related to retention is the percentage of time spent on the road. As shown
in part d, the proportion of employees who stayed at least three years among those who
traveled less than the median (63.64%) is almost double the proportion among those who
traveled more than the median (33.33%). Retention is high for employees who travel less
and low for those who are on the road more frequently.

Lastly, retention also depends on whether or not the employee has a CIS degree. As shown
in part e, the proportion of employees with CIS degrees who leave before three years
(56.90%) is much higher than the proportion of employees without CIS degrees who leave
before three years (12.50%). Therefore, those without CIS degrees are more likely to stay
with the company for at least three years.

Based on my findings, overall retention at D&Y is related to the employee's starting
salary, percentage of time on the road, and whether he/she has a CIS degree. Employees
who start with a salary below the median, spend less time on the road than the median,
and do not have a CIS degree are much more likely to stay at least three years with the
company than those who start with a salary above the median, spend more time on the road
than the median, and have a CIS degree.

Question 2: A study is performed in San Antonio to determine whether the average weekly grocery
bill per five-person family in the town is significantly different from the national average. A random
sample of 50 five-person families in San Antonio showed a mean of $133.474 and a standard
deviation of $11.193.

a. Assume that the national average weekly grocery bill for a five-person family is $131. Is
the sample evidence statistically significant at the 5% significance level? What about the
10% significance level?

H0: µ = 131
Ha: µ ≠ 131
Test statistic: t = 1.563
p-value: 0.124
The sample mean is not significantly different from 131 at the 5% level because the p-value
is greater than 0.05. It is also not significantly different from 131 at the 10% level
because the p-value is greater than 0.10.
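
The test statistic and p-value above can be reproduced from the summary figures in the question (n = 50, sample mean 133.474, standard deviation 11.193). The sketch below uses only the Python standard library and approximates the two-sided p-value by integrating the Student's t density with Simpson's rule. This numerical approach is an assumption of the sketch, not necessarily how the original output was produced.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density from 0 to |t| with Simpson's rule.
    `steps` must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central)


n, xbar, s, mu0 = 50, 133.474, 11.193, 131
se = s / math.sqrt(n)   # standard error of the sample mean
t = (xbar - mu0) / se   # one-sample t statistic
p = two_sided_p(t, n - 1)
print(f"SE = {se:.3f}, t = {t:.3f}, p = {p:.4f}")
print("reject at 5%:", p < 0.05, "| reject at 10%:", p < 0.10)
```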

b. For which values of the sample mean (i.e., average weekly grocery bill) would you decide
to reject the null hypothesis at the 10% significance level?

For each significance level α (0.01 or 0.10), we find the critical t-value that would lead to
rejection of the null hypothesis, and then solve the equation t = (X̄ - 131) / 1.583 for X̄ on
either side of 131, where 1.583 = 11.193 / √50 is the standard error of the sample mean. This
leads to the following results:

α-value   t-value   Lower limit   Upper limit
0.01      2.680     126.758       135.242
0.10      1.677     128.346       133.654

For example, at the 10% level, if X̄ < 128.346 or X̄ > 133.654, we would reject the null
hypothesis.
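
The limits in the table follow from 131 ± t × SE. The sketch below reproduces them, taking the critical t-values (49 degrees of freedom) directly from the table rather than computing them. The function name `rejection_limits` is illustrative.

```python
import math


def rejection_limits(mu0, s, n, t_crit):
    """Sample means below the first value or above the second lead to rejecting H0."""
    margin = t_crit * s / math.sqrt(n)
    return mu0 - margin, mu0 + margin


# Critical t-values copied from the table above, not computed here.
for alpha, t_crit in ((0.01, 2.680), (0.10, 1.677)):
    lower, upper = rejection_limits(131, 11.193, 50, t_crit)
    print(f"alpha = {alpha:.2f}: reject if mean < {lower:.2f} or mean > {upper:.2f}")
```

The observed sample mean of 133.474 falls just inside the 10% limits, which agrees with the conclusion in part a that the null hypothesis is not rejected at that level.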

Question 3: Circuit Systems Case

1. Keeping in mind that reducing the average cost of absenteeism by hourly workers is the
goal, carefully define the variable(s) by which you will measure the effectiveness of the
new anti-absenteeism program. Explain your choice.

With the goal of reducing the average cost of absenteeism by hourly workers, I will measure
the effectiveness of the new program by comparing the amount of sick leave taken this year
with the amount taken last year. I have made a new column in the data set labeled Sick
Leave Difference, which takes the sick leave of last year and subtracts the sick leave of
this year. If the difference is positive, the program was effective for that employee. If
the difference is negative, the program was not effective for that employee. For
convenience, I have highlighted the data in green if the difference was positive and in
red if the difference was negative. Based on this new column, the new anti-absenteeism
program was highly effective, because only 12 employees out of a total of 233 took more
sick leave this year than last year. The overall decrease in sick leave taken by hourly
workers this year can be attributed to the implementation of the new program.

2. Use appropriate statistical techniques to evaluate the effectiveness of the anti-absenteeism
program. Discuss the statistical and the practical significance of the results. Attach all
relevant output.

After analyzing the distributions for sick leave before (SLB) and sick leave after (SLA), the
JMP output shows that mean sick leave decreased by roughly 4.5 days after a year of the
program (the paired estimate reported below is 4.47 days). The SLA data are skewed to the
left, which suggests the program has encouraged some people to take fewer sick leave days.
Before the program was put into place, many people were taking about 18 sick days, which
means they were using the maximum sick leave available to them.

To check formally whether the decrease in sick leave is significant, we must conduct a
t-Test and see if we can reject the null hypothesis. Since the two variables, sick days
before and sick days after, come from the same group of employees, they are dependent
samples. To account for this dependence, we must run a hypothesis test on the difference
between these two variables. After creating a new column that holds the difference between
sick leave before and sick leave after for each employee, we test the null hypothesis that
the mean of this difference is equal to zero against the alternative hypothesis that it is
not equal to zero. The test is on the mean of the difference, not the difference of means.
Below is the screenshot for the t-Test on the difference between SLB and SLA.

For the information collected on SLB and SLA, there is dependence between every pair of
observations since each pair belongs to the same employee. To account for this dependence,
we look at the difference in sick leave before and after for each individual as a new
column. Working with the within-employee difference removes individual differences between
employees that come from outside factors, so the change is more plausibly due to the
program. After creating the new column with the difference between the two variables, we
can analyze its distribution and test its mean. The hypothesized mean is zero because the
null hypothesis assumes the mean of the difference is zero. The estimate of the difference
is 4.47 days. The t-Test shows a statistically significant reduction in the number of sick
days taken, so we can reject the null hypothesis that the mean of the difference is zero.

Null hypothesis: the mean of the difference = 0 → mean of SLB – mean of SLA = 0
Alternative hypothesis: the mean of the difference ≠ 0 → mean of SLB – mean of SLA ≠ 0
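
The paired test described above reduces to a one-sample t-test on the per-employee differences. The sketch below shows that calculation with the standard library. The five before/after values are made up for illustration and are not the Circuit Systems data. The function name `paired_t` is an assumption of the sketch.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_d, mean_d / se, n - 1


# Made-up sick-leave days for five employees; NOT the Circuit Systems data.
slb = [12, 18, 9, 15, 10]  # sick leave before the program
sla = [8, 12, 7, 9, 11]    # sick leave after the program

mean_d, t, df = paired_t(slb, sla)
print(f"mean difference = {mean_d:.2f} days, t = {t:.3f}, df = {df}")
```

A positive mean difference means less sick leave after the program. The t statistic is then compared with the critical value for n - 1 degrees of freedom.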

3. Is there any evidence that the exercise program has been effective in reducing paid sick
leave taken by hourly production employees? Use appropriate statistical tools to support
your conclusions. Discuss the results and attach any relevant supporting output.

If the exercise program has been a factor, we would expect the average difference between
SLB and SLA to be larger in the exercise group than in the non-exercise group.

The null hypothesis is that the mean reductions for the exercise group and the non-exercise
group are equal (the mean of SLB – the mean of SLA for the exercise group = the mean of
SLB – the mean of SLA for the non-exercise group). The alternative hypothesis is that the
mean reductions for the two groups are not equal. Since we are comparing the exercise
group with the non-exercise group, these are independent samples (there are those who
participated in the exercise program and those who did not). After running the t-Test,
the difference between the two groups is 0.96 days, which means the reduction in the
number of sick days is about a day larger for the exercise group than for the
non-exercise group. Based on the p-value, the difference is also statistically significant.
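
For two independent groups, the comparison is a two-sample t-test on the per-employee reductions. The sketch below uses Welch's unequal-variance form, which is a choice made for this illustration. The source does not say which variant JMP reported. The group values are made up and are not the Circuit Systems data.

```python
import math
import statistics


def welch_t(x, y):
    """Return (difference in means, Welch t statistic, approximate degrees of freedom)."""
    vx = statistics.variance(x) / len(x)  # squared standard error for group x
    vy = statistics.variance(y) / len(y)  # squared standard error for group y
    diff = statistics.mean(x) - statistics.mean(y)
    t = diff / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, t, df


# Made-up reductions in sick days (SLB - SLA) per employee; NOT the real data.
exercise = [6, 5, 7, 4]
non_exercise = [4, 3, 5, 4, 4]

diff, t, df = welch_t(exercise, non_exercise)
print(f"difference = {diff:.2f} days, t = {t:.3f}, df = {df:.2f}")
```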

The exercise program appears to have been effective, because the exercise group reduced
its sick leave by 0.96 days more than the non-exercise group, which is almost a full day.
However, it is important to keep in mind that not everyone participates in the exercise
program. Only roughly 30% of the employees take part.

4. How much did the anti-absenteeism program save or lose this year? Construct a table
comparing this year's results under the new program to last year's results before the
program was implemented. Give a breakdown that shows the costs of the unused sick
leave conversion and the exercise program.

Above is the breakdown that shows the costs of unused sick leave conversion and the
exercise program. For the table comparing this year’s results under the new program to
last year’s results before the program was implemented, please refer to the attached JMP
file labeled “Circuit”.

After creating two new columns for the cost before the program and the cost after the
program was implemented, we can run a test on the mean of the difference between these two
measures. Since this is another paired sample, we test whether the mean of the difference
in cost is zero. Above is the screenshot for the paired samples test. The average cost per
employee is $1153.56 before the program and $1095.32 after, so the program has reduced the
cost by $58.24 per employee. The p-value suggests that this is a statistically significant
difference. The overall savings would be about $58 times the number of employees in the
sample.
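
The savings arithmetic can be written out as follows. The per-employee costs are the ones reported above. The headcount of 233 comes from the answer to part 1, and treating it as the sample size for this cost comparison is an assumption of the sketch.

```python
cost_before = 1153.56  # average cost per employee before the program
cost_after = 1095.32   # average cost per employee after the program
employees = 233        # assumption: the total headcount mentioned in part 1

saving_per_employee = cost_before - cost_after
total_saving = saving_per_employee * employees

print(f"saving per employee: ${saving_per_employee:.2f}")
print(f"total saving for {employees} employees: ${total_saving:,.2f}")
```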

5. Make your final recommendations about the anti-absenteeism program. Include any
modifications that you would make to the program and discuss the potential changes, if
any, caused by these modifications.

The anti-absenteeism program was effective in the first year the company implemented it.
Based on the analysis of sick leave before and after the program (question 2), the average
reduction of about 4.5 sick days after the program started was statistically significant,
which indicates that the anti-absenteeism program had an effect on the number of sick days
taken by employees. In addition, the employees who participated in the exercise program
reduced their sick leave by about one day more than those who did not. However, it must
be noted that only about 30% of employees participated in the exercise program.

A modification I would make to the program would be to add an extra incentive for
employees to take part in the exercise program and raise the participation rate above
30%. The extra incentive could be to lower insurance rates even further, but that would
also increase the company's cost. To analyze the effects of this modification, we would
have to conduct multiple tests to see how far insurance rates could decrease while the
program remains profitable for the company.

Looking at the actual costs, the anti-absenteeism program has decreased the cost for the
company by about $58 per employee. Depending on how the company views this figure, $58
per employee could be a big difference or it could be negligible. It might not seem like
much, but if this is just a sample and we extrapolate to a workforce of thousands of
employees, it can make a large difference.

We must also weigh statistical significance against practical significance. Statistical
significance depends heavily on sample size: with a large enough sample, even small
differences will show up as statistically significant. It is more important to look at
practical significance because it accounts for outside factors that affect whether the
program is worth implementing. For example, if the administrative costs of the program
are high, then savings of $58 per employee may not be worth the hassle. The Circuit case
gives us only the costs related to the program itself. We were able to show that the
program is effective and that the changes are statistically significant, but without
information on costs outside the program we cannot conclude whether it is practically
significant for the company to administer it.

Overall, I would recommend that the company adopt this anti-absenteeism program for the
long term, because it has decreased the number of sick days taken by employees and
decreased costs to the company as a whole.

Question 4: A marketing professor is interested in the relationship between hours spent studying
and total points earned in a course. Data collected on 156 students who took the course last
semester are provided in the file MktHrsPts.xlsx.

a. Develop an estimated regression equation showing how total points earned is related to
hours spent studying. What is the estimated regression model?

The estimated regression equation is Points = 8.67 + 0.80 × Hours (see Excel file
MktHrsPts, sheet A).

b. Test whether each of the regression parameters 𝛽0 and 𝛽1 is equal to zero at a 0.01 level
of significance. What are the correct interpretations of the estimated regression
parameters? Are these interpretations reasonable?

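Since the p-value is less than the 0.01 level of significance, we reject the null
hypothesis that the parameter equals zero, so hours spent studying has a significant
effect in the regression. The slope estimate of 0.80 means that each additional hour of
studying is associated with about 0.80 more total points, and the intercept of 8.67 is
the predicted total for a student who studies zero hours. These interpretations are
reasonable because the number of hours spent studying is a significant enough variable
to affect total points earned.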

c. How much of the variation in the sample values of total points earned does the model you
estimated in part a explain?

The model estimated in part a explains about 83% of the variation in the sample values of
total points earned, based on the adjusted R-squared value of 0.8266.

d. Mark Sweeney spent 95 hours studying. Use the regression model you estimated in part
a to predict the total points Mark earned.

Regression model: Points = 8.67 + 0.80 × Hours
Mark Sweeney, studying 95 hours: Points = 8.67 + 0.80 × 95
= 8.67 + 76
Total points Mark earned = 84.67
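
The prediction can be checked with a small function that applies the estimated coefficients from part a (intercept 8.67, slope 0.80). The function name `predict_points` is illustrative.

```python
def predict_points(hours, intercept=8.67, slope=0.80):
    """Predicted total points from the estimated regression line in part a."""
    return intercept + slope * hours


mark = predict_points(95)
print(f"Predicted points for 95 hours: {mark:.2f}")
```

Such a prediction is most trustworthy when 95 hours lies within the range of study hours observed in the sample.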