
STAT151 3rd March 2014 Lecture Assignment 2

Q3.72-page 143

A. In the given regression equation ŷ = 209.9 + 25.5x, which comes from a recent analysis of data for the 50 U.S. states, y is the violent crime rate (measured as the number of violent crimes per 100,000 people in the state) and x is the poverty rate (the percent of people in the state living below the poverty level). The slope, +25.5, is the amount by which the predicted value ŷ changes when x increases by one unit. Thus we can predict that an increase of one percentage point in the poverty rate corresponds to an increase of 25.5 violent crimes per 100,000 people in the predicted violent crime rate.

B. Hawaii -> ŷ = 209.9 + 25.5 (8.0) = 413.9
Mississippi -> ŷ = 209.9 + 25.5 (24.7) = 839.75
Thus, using the given range of state poverty rates, we can predict that the state violent crime rate will range from 413.9 (Hawaii) to 839.75 (Mississippi) violent crimes per 100,000 people.

C. The linear association (correlation) between the two variables is positive because the slope is positive: a higher poverty rate (x value) goes with a higher predicted violent crime rate (y value).

Q3.92-page 145

B. A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest. In the context of the data gathered from several countries for the United Nations, the negative correlation found between the birth rate and per capita television ownership can be explained by differences in per capita income. In the case of a low per capita income, the individual (and partner) use all their income to satisfy basic needs (such as

hunger and shelter) and are left with little or no expendable income to spend on luxury goods such as a TV or on contraception. Thus, because the lurking variable of per capita income is associated with both the response variable and the explanatory variable and affects the relationship between the two, we can infer that the observed negative correlation does not imply a causal relationship between television ownership and the birth rate.

Q3.94-page 146

A. Table 1. A contingency table summarizing the number of individuals hired or not for a white-collar position in relation to gender

                 Hired
Gender      Yes     No     Total Applicants
Male         30    170     200
Female       40    160     200
Total:       70    330     400

Table 2. A contingency table summarizing the number of individuals hired or not for a blue-collar position in relation to gender

                 Hired
Gender      Yes     No     Total Applicants
Male        300    100     400
Female       85     15     100
Total:      385    115     500

B.

Table 3. A contingency table summarizing the number of individuals hired or not for a position in relation to gender

                 Hired
Gender      Yes     No     Total Applicants
Male        330    270     600
Female      125    175     300
Total:      455    445     900
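The denial rates implied by Tables 1 to 3 can be checked with a short sketch. The counts are copied from the tables; the names `tables`, `denial_rate` and `rates` are illustrative choices, not part of the assignment.

```python
# Counts copied from Tables 1 to 3: (hired, not hired) for each gender.
tables = {
    "white-collar": {"Male": (30, 170), "Female": (40, 160)},
    "blue-collar": {"Male": (300, 100), "Female": (85, 15)},
    "combined": {"Male": (330, 270), "Female": (125, 175)},
}


def denial_rate(hired, not_hired):
    """Share of applicants who were not hired."""
    return not_hired / (hired + not_hired)


# Denial rate for every (position, gender) pair.
rates = {
    position: {gender: denial_rate(*counts) for gender, counts in by_gender.items()}
    for position, by_gender in tables.items()
}

for position, by_gender in rates.items():
    for gender, rate in by_gender.items():
        print(f"{position:12s} {gender:6s} denial rate = {rate:.1%}")
```

Within each type of position the female denial rate comes out lower than the male one, while in the combined table it comes out higher.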

Thus, by calculating the denial rates from Table 3 (270/600 = 45% for male applicants and 175/300 ≈ 58.3% for female applicants), it can be verified that the denial rate percentages for both male and female applicants are indeed the ones quoted by the federal Equal Employment Opportunity Commission (EEOC) enforcement official.

C. Simpson's paradox serves as a caution when interpreting an association between variables, as an association does not automatically imply causation. After analyzing the two contingency tables in part A, the single contingency table in part B, and the denial rates for men and women both overall and for each type of position, the presence of a potential lurking variable can be noticed. In the case of the white-collar positions (considered less "physically laborious"), which had an equal number of applicants of each sex, the denial rate (calculated as the number of male or female applicants not hired for a white-collar position divided by the total number of male or female applicants for a white-collar position) is 85% for males and only 80% for females. In the case of the blue-collar positions, potentially due to the nature of the job (traditionally considered more physically laborious), a lurking variable can be seen in the difference between the number of male applicants (400) and the number of female applicants (100), and the denial rate is 25% for males and 15% for females. Due to the difference in the number of applicants of each sex, the probability of finding a higher percentage of skilled workers for the positions is higher within the bigger pool of male applicants. Within each type of position the denial rate is lower for women, yet in the combined table it is higher for women, because two thirds of the female applicants applied for white-collar positions, where the denial rate is high for both sexes. Thus, this can be considered an example of Simpson's paradox, as it illustrates the dramatic influence that a lurking variable can have when interpreting the association between two variables.

Q R1.16-page 204

C.

By calculating the z-score of the given observation we can determine how many standard deviations the observation falls from the mean. Since our observation has a z-score of approximately 3.4, it falls 3.4 standard deviations above the mean. By the 3 standard deviation criterion, this observation is a potential outlier, which implies that the value of 11,067 would be considered unusually high in the context of a bell-shaped distribution of EU energy values.

Q R1.20-page 205

B. From the given graph we can infer that the variety of Iris blossom which exhibits the most variation in petal length is the Versicolor, as the spread of the observation points (represented on the graph by the blue capital letter V) along the length axis is noticeably wider than that of the Setosa petals (represented on the graph by the blue capital letter S).

C. From the given graph we can infer that the variety of Iris blossom for which the correlation between petal length and petal width is the strongest is the Setosa, as the spread of its observation points (represented on the graph by the blue capital letter S) is noticeably narrower in both length and width than that of the Versicolor (represented on the graph by the blue capital letter V). Thus, from the highly clustered distribution of the Setosa petals we conclude that the correlation between their length and their width is very strong.

D. From the given graph, we can approximate that the length of a Setosa petal (represented on the graph by the blue capital letter S) ranges between 1 cm and 2 cm, while the length of a Versicolor petal (represented on the graph by the blue capital letter V) ranges between approximately 3 cm and 5 cm. Thus, given only the length of the petal, one can easily determine the variety of Iris blossom: if the length is between 1 and 2 centimeters it is a Setosa, and if it is between 3 and 5 centimeters it is a Versicolor. The graph also shows that the length of a Setosa sepal (represented on the graph by the red lowercase s) ranges between approximately 4 cm and 6 cm, while the length of a Versicolor sepal (represented on the graph by the red lowercase v) ranges between approximately 5 cm and 7 cm. Thus, given only the length of the sepal, one can determine with certainty that the blossom is a Setosa if the length is less than 5 centimeters and a Versicolor if the length is more than 6 centimeters.
A sepal length between 5 centimeters and 6 centimeters leaves the variety uncertain, as both species can exhibit this length, and may lead to an erroneous identification. Further information is required in order to correctly identify the species in that case.

Q5.26-page 229

A. Table 4. A contingency table displaying the occurrence probability of each outcome (ordering something or not) after receiving a Fall or Winter catalog

                 Winter
Fall        Yes     No      Total:
Yes         0.30    0.10    0.40
No          0.05    0.55    0.60
Total:      0.35    0.65    1

D. Let F denote buying from the Fall catalog: P(F) = P({YY, YN}) = 0.30 + 0.10 = 0.40
Let W denote buying from the Winter catalog: P(W) = P({YY, NY}) = 0.30 + 0.05 = 0.35
If F and W are independent, then P(F and W) = P(F) × P(W).
P(F) × P(W) = 0.40 × 0.35 = 0.14, whereas P(F and W) = P({YY}) = 0.30, and 0.30 ≠ 0.14.
Thus, by showing that P(F and W) does not equal P(F) × P(W), we can determine that F and W are not independent events. A customer who has ordered something after receiving the Fall catalog is more likely to place another order after receiving the Winter catalog [P(YY) = 0.30], while a customer who did not place an order after receiving the Fall catalog is considerably less likely to do so after receiving the Winter catalog [P(NY) = 0.05]. Thus it is not normally expected for the customer choices to be independent.
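The independence check can also be sketched in a few lines. The joint probabilities are copied from Table 4; the variable names and the two conditional probabilities at the end are additions for illustration and are not part of the original answer.

```python
import math

# Joint probabilities from Table 4, keyed by (fall order, winter order).
joint = {("Y", "Y"): 0.30, ("Y", "N"): 0.10, ("N", "Y"): 0.05, ("N", "N"): 0.55}

# Marginal probabilities of ordering from each catalog.
p_fall = sum(p for (fall, _), p in joint.items() if fall == "Y")
p_winter = sum(p for (_, winter), p in joint.items() if winter == "Y")
p_both = joint[("Y", "Y")]

# Independence would require P(F and W) = P(F) * P(W).
independent = math.isclose(p_both, p_fall * p_winter)

# Conditional probabilities of a Winter order, given the Fall outcome.
p_winter_given_fall = p_both / p_fall
p_winter_given_no_fall = joint[("N", "Y")] / (1 - p_fall)

print(f"P(F) * P(W) = {p_fall * p_winter:.2f}, P(F and W) = {p_both:.2f}")
print(f"independent: {independent}")
print(f"P(W | F) = {p_winter_given_fall:.2f}")
print(f"P(W | not F) = {p_winter_given_no_fall:.2f}")
```

Comparing the two conditional probabilities is another way to see the dependence: they would be equal if F and W were independent.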

Q5.28-page 239

            Binge Drinker
Gender      Yes     Total:
Male        0.50
Female      0.34
Total:      0.84

Let A denote (Binge Drinker = yes) and B denote (Male).

P(A | B) =

=>

Q5.88-page 258

A. In the context of screening for acute myocardial infarction (AMI) using creatine kinase (CK) as a biochemical marker, the sensitivity of 37% is the proportion of actual positives that are correctly identified as such (i.e. the percentage of individuals suffering from AMI who are correctly identified as having the condition). In notation, the sensitivity can be expressed as P(POS | AMI).

B. When screening for acute myocardial infarction (AMI), commonly referred to as a heart attack, using creatine kinase (CK) as a biochemical marker, the specificity of 87% is the proportion of actual negatives that are correctly identified as such (i.e. the percentage of individuals without AMI who are correctly identified as not suffering from it). In notation, the specificity can be expressed as P(NEG | AMI^c).

C. Fig. 1 A tree diagram for acute myocardial infarction (AMI) diagnosis screening using creatine kinase (CK) as a biochemical marker
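The branch probabilities of such a tree can be sketched numerically. The sensitivity (0.37) and specificity (0.87) come from parts A and B; the prevalence of AMI is not stated here, so `prevalence` is a hypothetical input and the value used in the example call is purely illustrative.

```python
def tree_probabilities(prevalence, sensitivity=0.37, specificity=0.87):
    """Joint probability of each path through a two-level diagnostic tree.

    prevalence is P(AMI); it is an assumed input, not a value from the text.
    """
    no_ami = 1 - prevalence
    return {
        ("AMI", "POS"): prevalence * sensitivity,        # true positive
        ("AMI", "NEG"): prevalence * (1 - sensitivity),  # false negative
        ("no AMI", "POS"): no_ami * (1 - specificity),   # false positive
        ("no AMI", "NEG"): no_ami * specificity,         # true negative
    }


# Illustrative call with a made-up prevalence of 10%.
for path, probability in tree_probabilities(prevalence=0.10).items():
    print(path, round(probability, 4))
```

Each path probability is the product of the probabilities along its branches, and the four paths together sum to 1.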

Q6.20- page 287

A. Since the given distribution is a standard normal distribution with a mean of μ = 0 and a standard deviation of σ = 1, we can find the z-score of the shaded region by looking up the given probability of 0.20 in Table A (the Standard Normal Cumulative Probabilities Table). Doing so gives a z-score of -0.84, which is the value with a left-tail probability of 0.20. Because the distribution is bell-shaped and symmetric about the mean (μ = 0), and the shaded area of 0.20 lies in the right tail, the z-score for the right-tail probability has the same magnitude as the left-tail z-score (-0.84) but a positive sign: z = 0.84.
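The table lookup above can be cross-checked with Python's standard library, as a rough sketch. `statistics.NormalDist` with its default arguments is the standard normal distribution; the helper name `right_tail_z` is an illustrative choice.

```python
from statistics import NormalDist


def right_tail_z(tail_probability):
    """z-score that has the given probability to its right under N(0, 1)."""
    # A right-tail area p corresponds to a cumulative (left) area of 1 - p.
    return NormalDist().inv_cdf(1 - tail_probability)


z = right_tail_z(0.20)
print(f"z for a right-tail probability of 0.20: {z:.2f}")
```

The same helper applies to any other right-tail probability, and by symmetry the left-tail z-score is its negative.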

B. i) A standard normal distribution has a mean of μ = 0 and a standard deviation of σ = 1, so we can find the z-score for the right-tail probability of 0.05 by looking up the given probability in Table A (the Standard Normal Cumulative Probabilities Table). Doing so gives a z-score of -1.64, the value with a left-tail probability of 0.05. Because the distribution is bell-shaped and symmetric and the given probability lies to the right of the mean (μ = 0), the z-score for the right-tail probability has the same magnitude as the left-tail z-score (-1.64) but a positive sign: z = 1.64.

ii) The z-score for the right-tail probability of 0.005 can be found by looking up the given probability in Table A (the Standard Normal Cumulative Probabilities Table). Doing so gives a z-score of -2.57, the value with a left-tail probability of 0.005. Because the distribution is bell-shaped and symmetric with a mean of μ = 0 and a standard deviation of σ = 1, and the given probability lies to the right of the mean, the z-score for the right-tail probability has the same magnitude as the left-tail z-score (-2.57) but a positive sign: z = 2.57.

Q6.30- page 288

A. The height required in this part has a z-score of 0.33. After standardizing we are dealing with a standard normal distribution, which is symmetric, so the right-tail probability above z = 0.33 equals the left-tail (cumulative) probability below z = -0.33. Looking up the z-score of -0.33 in Table A (the Standard Normal Cumulative Probabilities Table) gives a probability of 0.3707. Thus approximately 37% of 10-year-old boys are tall enough to ride the roller coaster.

B. The height required in this part has a z-score of -1.00. Because the standard normal distribution is symmetric, the right-tail probability above z = -1.00 equals the cumulative probability below z = 1.00. Looking up the z-score of 1.00 in Table A (the Standard Normal Cumulative Probabilities Table) gives a probability of 0.8413. Thus approximately 84% of 10-year-old boys are tall enough to ride the roller coaster.

C. To find the probability of a height between 50 in (tall enough to ride the roller coaster in part B.) and 56 in (tall enough to ride the roller coaster in part A.), we take the difference between the two probabilities: 0.8413 - 0.3707 = 0.4706.

Therefore, about 47% of 10-year-old boys are tall enough to ride the coaster in part B. (50 in), but not tall enough to ride the coaster in part A. (56 in).
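The three roller-coaster probabilities can be reproduced from the z-scores quoted in parts A and B, as a minimal sketch. Only the z-scores 0.33 and -1.00 are taken from the answer; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z_56_in = 0.33   # z-score of the 56 in requirement (part A)
z_50_in = -1.00  # z-score of the 50 in requirement (part B)

# "Tall enough" means at or above the requirement: a right-tail probability.
p_tall_enough_56 = 1 - std_normal.cdf(z_56_in)
p_tall_enough_50 = 1 - std_normal.cdf(z_50_in)

# Tall enough for the 50 in coaster but not for the 56 in coaster.
p_between = p_tall_enough_50 - p_tall_enough_56

print(f"P(height >= 56 in) = {p_tall_enough_56:.4f}")
print(f"P(height >= 50 in) = {p_tall_enough_50:.4f}")
print(f"P(50 in <= height < 56 in) = {p_between:.4f}")
```

Subtracting the right-tail probabilities gives the same result as subtracting the corresponding cumulative probabilities, since the two differences describe the same interval.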
