Project 1: Learning about Stat 201 Students, Spring 2012
Submitted by Derek Harding

1) The last 2 digits of my student ID number are 04, so my sample size is n = 300 + 04 = 304.

Interpretation: The data are right-skewed and not normally distributed. Half of the sample spent 10 minutes or less per week playing games on their phone, and 10% spent 120 minutes or more.

2) For this question, I chose the categorical variable #13 Law against texting/Email while driving?

Interpretation: 73.3% either strongly support or somewhat support laws against texting/emailing while driving. Only 18.1% of the sample either somewhat oppose or strongly oppose the law. 8.6% neither support nor oppose the law against texting/emailing while driving.

3) #10 Driven Under Influence in Last Year?

Interpretation: 3.52% of males and 0% of females responded that they regularly drove under the influence last year. 14.79% of males and 3.09% of females responded fairly often. 11.97% of males and 16.03% of females responded just once. 28.17% of males and 17.28% of females say they rarely drove while under the influence last year. 41.55% of males and 63.58% of females say they never drove while under the influence last year.
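The percentages above are column percentages: each response count is divided by the total for that gender. A minimal sketch of that arithmetic follows. The counts are an assumption (the report gives only percentages); they were chosen so that a hypothetical group of 142 male respondents reproduces the reported male column.

```python
def column_percentages(counts):
    """Convert the counts in one column to percentages of the column total."""
    total = sum(counts)
    return [round(100 * count / total, 2) for count in counts]

# ASSUMED counts, not taken from the report, in the order:
# regularly, fairly often, just once, rarely, never.
male_counts = [5, 21, 17, 40, 59]
print(column_percentages(male_counts))
```

Comparing the two genders with percentages rather than raw counts keeps the comparison fair when the groups differ in size.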

4) For this question I chose #38, Miles on Car

a. The histogram shows that the shape of the data is skewed right. There is 1 severe outlier.

b. Interpretation: Since the data is not normally distributed, I would use the median and the interquartile range to describe the center and spread. The median is 80,498.5 miles and the interquartile range is 92,250 miles, so the middle 50% of the data spans 92,250 miles. The total range of the data is 700,000 miles, far larger than the interquartile range, which shows the impact of the outlier.
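To illustrate why the median and interquartile range are preferred here, the sketch below computes all three summaries on a small made-up sample. The values are hypothetical, not the survey data, and the "inclusive" quartile method is an assumption; statistical packages may compute quartiles slightly differently.

```python
import statistics

def center_and_spread(values):
    """Return (median, interquartile range, total range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1, max(values) - min(values)

# HYPOTHETICAL mileages in thousands; 700 plays the role of the severe outlier.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 700]
median, iqr, total_range = center_and_spread(sample)
print(median, iqr, total_range)
```

In this made-up sample the single outlier stretches the total range to 690 while the interquartile range stays at 40, which is the same pattern described for the mileage data.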

c. Interpretation: Since points fall outside the confidence interval, there is an indication that the data is not normally distributed.

d. Interpretation: Since the p-value is very low, we reject the hypothesis that the data came from a normal distribution.

5) I chose the quantitative variable #28 Followers on Twitter and the categorical variable #24 Passport?

b. Interpretation: Both centers are in the bin between 0 and 1000. The median for the no dataset is 47.5, and the median for the yes dataset is 22.5.

c. Interpretation: Both datasets have a small interquartile range: 144.5 for no and 111.5 for yes. The dataset that said yes to passport has an extreme outlier that gives it a large total range.

d. Because the dataset has an extreme outlier, the outlier needs to be taken out and the data evaluated again.

Interpretation (with the outlier removed): Both centers are in the bin between 0 and 150. The median for the no dataset is 47.5, and the median for the yes dataset is 22. Both datasets have a small interquartile range: 144.5 for no and 110 for yes. The IQR and total range are now similar for both datasets.

6) I chose #33 Govt. Healthcare USA and #36 driving and out of gas as my 2 categorical variables.

a. No, I do not expect there to be an association between the two variables.

c. Interpretation: The mosaic plot suggests that the variables are independent, because the category divisions line up across the plot. Therefore, there is no apparent association.
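A numeric counterpart to reading the mosaic plot is the chi-square statistic, which compares observed counts with the counts expected if the two variables were independent. The sketch below is an illustration only: the tables are hypothetical, since the report does not list the counts behind the plot.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# HYPOTHETICAL tables, not the survey counts.
independent = [[10, 20], [30, 60]]   # rows have identical proportions
associated = [[20, 10], [10, 20]]    # proportions differ between rows
print(chi_square_statistic(independent), chi_square_statistic(associated))
```

When the row proportions match exactly, as they would when the divisions in a mosaic plot line up, the statistic is 0; larger values indicate a stronger departure from independence.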

7) a. I would expect GPA and number of hours per semester to be positively correlated.

Interpretation: The correlation of -0.0341 shows a very weak negative relationship. The correlation is not statistically significant at the 5% significance level.
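The significance claim can be checked by hand with the usual t statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below assumes all n = 304 students answered both questions, which the report does not confirm; fewer complete responses would change the result only slightly.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# r is from the report; n = 304 is an ASSUMPTION (the full sample size).
t = correlation_t_statistic(-0.0341, 304)
print(round(t, 3))
```

The result is about -0.59, well inside the two-sided 5% cutoff of roughly 1.97 for 302 degrees of freedom, which agrees with the conclusion that the correlation is not significant.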

8)

#03 Height (in.) = 67.520128 + 0.1011066 * #22 Hours per week games on computer, Xbox etc.

a. Interpretation: The slope means that for each additional hour per week spent on games, the predicted height increases by 0.1011066 inches.
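The fitted line can be turned into a small prediction function to make the slope interpretation concrete. This is a sketch using the reported intercept and slope; the function name and the example inputs are made up for illustration.

```python
def predicted_height(hours_per_week, intercept=67.520128, slope=0.1011066):
    """Predicted height in inches from hours per week of games, using the fitted line."""
    return intercept + slope * hours_per_week

# Example inputs are hypothetical.
print(predicted_height(0))    # the intercept: predicted height at 0 hours
print(predicted_height(10) - predicted_height(9))  # one extra hour adds the slope
```

Passing a different intercept and slope lets the same function be reused for any other fitted line of this form.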

Interpretation: The outlier condition for linear regression does not hold, because there are 2 highly influential outliers.

b. Interpretation: 5% of the variation in height is explained by the variation in hours/week of games.

c. Redo the analysis without the 2 extreme outliers.

#03 Height (in.) = 67.137658 + 0.2435117 * #22 Hours per week games on computer, Xbox etc.

a. Interpretation: The slope says that for each additional hour of games played per week, the predicted height increases by 0.2435117 inches.

Interpretation: 8% of the variation in height is explained by the variation in hours/week of games.
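In simple linear regression, the share of variation explained (R-squared) is the square of the correlation, so the correlation can be recovered from it, taking its sign from the slope. The sketch below applies this to the reported 8% and the slope of 0.2435117; since the 8% is presumably rounded, the result is approximate.

```python
import math

def implied_correlation(r_squared, slope):
    """Correlation implied by R-squared in simple regression; sign follows the slope."""
    r = math.sqrt(r_squared)
    return r if slope >= 0 else -r

print(round(implied_correlation(0.08, 0.2435117), 2))
```

A correlation of about 0.28 is still weak, which matches the small share of variation explained.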

Interpretation: The 3 assumptions for linear regression hold.

d. The lurking variable is weight.

Interpretation: We know that height and weight have a relationship (as shown in the scatterplot that compares the two variables). Weight and hours/week of games also have a relationship. This can make it look as though height is related to hours. All 3 variables have p-values below 5%, which indicates that they are statistically significant.

9) a. There are 143 levels.

b. No, there is not. Good surveys will not have open-ended questions that need lots of cleanup. The data would have to be cleaned up manually by fitting the responses into a smaller set of cleaned-up levels.
