# STAT 151 31 June 2014 Lecture Assignment 1

## Q 1.28 - page 21

A. The population of the survey is the school's entire student body, while the sample consists of the 100 students who participated in the survey.

B. i) Descriptive statistics refers to methods of summarizing the collected data.

In the given study, descriptive statistical analysis is used to organize the data collected from the sample into simple visual summaries (such as graphs) without distorting or losing information. Summarizing the entire data set graphically makes it easier to comprehend and interpret, so it can be used more effectively.

ii) Inferential statistics refers to methods of making decisions or predictions about the population based on data obtained from a sample of that population. In the given study, inferential statistical analysis is used to extrapolate the results obtained from the sample of 100 students to the preference of the entire student body regarding a several-day period between the end of classes and the start of final exams as a means of reducing student stress levels.

## Q 2.28 - page 46

A. Examining the data displayed in Fig. 1 and observing the downward trend of the curve from 1945 to 1970, we can infer that the whooping cough vaccine was very effective in reducing the incidence of whooping cough in the United States following its development in 1940.

B. Since 1993 the incidence rate of whooping cough in the United States has fluctuated around 3, which is noticeably higher than the rate of 2.1 observed in 1970. Taking that into account, together with the increased incidence rates in 2006-2008, we can conclude that the United States, although keeping outbreaks of whooping cough under control, is not close to eradicating it.

C. A histogram of the incidence rates since 1935 would not address the question about the success of the whooping cough vaccination in the United States. It would not display the data points in time order, so it would be impossible to see time trends such as the sudden fall in incidence after the vaccine was introduced, which is what shows its success.

## Q 2.60 - page 64

Judging from the given information and calculating the median of the data set, (73 + 75)/2 = 74, we can determine that the distribution is left-skewed, since the mean (70.4) is smaller than the median (74).
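The comparison used in this answer can be written out as a short sketch. The function name `skew_direction` is a hypothetical helper introduced here for illustration, and comparing the mean with the median is only a rule of thumb for the direction of skew, not a guarantee. The only numbers used are the ones quoted in the answer (the two middle values 73 and 75, and the mean 70.4).

```python
def skew_direction(mean, median):
    """Rule-of-thumb skew direction from the mean and median (hypothetical helper)."""
    if mean < median:
        return "left"       # mean pulled toward a longer lower tail
    if mean > median:
        return "right"      # mean pulled toward a longer upper tail
    return "symmetric"      # no skew suggested by this comparison


# The median of an even-sized data set is the average of the two middle values.
median = (73 + 75) / 2
mean = 70.4                 # mean quoted in the answer

print(median)                        # 74.0
print(skew_direction(mean, median))  # left
```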

## Q 2.68 - page 72

The five-number summary (in cents): Min = 2.5, Q1 = 36, Median = 60, Q3 = 100, Max = 205.

A. i) 75% of the states have cigarette taxes greater than 36 cents. ii) 25% of the states have cigarette taxes greater than 1 dollar (100 cents).

B. The middle 50% of the observations lie between Q1 = 36 and Q3 = 100.

C. IQR = Q3 - Q1 = 100 - 36 = 64, so 1.5 × IQR = 96.

Lower boundary: Q1 - 1.5 × IQR = 36 - 96 = -60. The minimum (2.5) is above -60, so it is not an outlier.

Upper boundary: Q3 + 1.5 × IQR = 100 + 96 = 196. The maximum (205) is above 196, so it is an outlier.

Thus we can infer that the maximum of 205 is a potential outlier in our data set, as it falls more than 1.5 × IQR above the third quartile.

D. Based on the five-number summary, the distribution is not bell-shaped. The median (60) is closer to Q1 than to Q3, and the maximum (205 cents) is an outlier far above Q3, so the distribution is expected to be skewed to the right.

## Q 2.78 - page 74

A. The z-score of 2.28 indicates that Canada, with an observation of 16.5 (carbon dioxide emissions), falls 2.28 standard deviations above the mean of carbon dioxide emissions for the EU nations. Although this is not a potential outlier (it is less than 3 standard deviations from the mean), it is more than 2 standard deviations above the mean, so Canada's carbon dioxide emissions are quite high compared to those of nations in the EU.

B. The z-score of -0.92 indicates that Sweden, with an observation of 5.0 (carbon dioxide emissions), falls 0.92 standard deviations below the mean of carbon dioxide emissions for the EU nations. Since this is within 1 standard deviation of the mean, Sweden's carbon dioxide emissions are fairly typical of those of nations in the EU, if somewhat below average.

## Q 2.124 - page 85

The z-score of -2.16 indicates that the cereal with a sodium value of 0 falls 2.16 standard deviations below the mean. Although this is not a potential outlier (it is less than 3 standard deviations from the mean), it is more than 2 standard deviations below the mean, so a sodium value of 0 is quite unusual for a cereal.

## Q 4.8 - page 158

A. The study is observational: women were not randomly assigned to be offered screening or not (screening was not imposed as an experimental treatment), and the study uses a historical comparison group as its control group.

B. The study's explanatory variable is whether the subjects (the women) were offered mammography screening or not; the response variable is whether a woman died of breast cancer (the breast cancer death rate).

C. I believe the study does not prove that being offered mammography screening causes a reduction in death rates associated with breast cancer, as the women were not randomly assigned to screening and the use of a historical comparison group allows various lurking variables to influence the data.

## Q 4.30 - page 170

A. Because of undercoverage resulting from poor sampling design, the sampling frame lacks representation from parts of the population. It does not include individuals who do not have access to the internet or who are simply not aware of the study's existence. Thus, the responses of the subjects who volunteered to take part in the study might differ from the answers of those who were not included in the sampling frame.

B. The sample used in the study is a nonrandom convenience sample. Since the sampling design (a volunteer sample of online responses) was not random, not all members of the population are likely to be represented. The subjects who volunteered, predominantly men or individuals with at least a bachelor's degree, represent only a restricted segment of the population. They may be more likely to volunteer than other segments, such as women or individuals without a bachelor's degree, because they have a stronger opinion about the issue or are more likely to visit the MSNBC site. Due to the sampling design, not all members of the population have an equal chance of taking part in the study, and they are thus not equally represented in the results.

C. Due to the nature of the study (an online survey) there is a potential for response bias. Subjects may have distorted their answers toward what they think is socially acceptable or pleasing to the researcher (in accordance with what they believe to be the survey's purpose). The questions of the online survey could also be confusing, long, or leading, or they could be presented in an order that strongly influences the subjects' answers, which would ultimately lead to biased results.

## Q 4.34 - page 176

A. In the experiment, the experimental units are the subjects (the 51 patients between the ages of 3 and 22). The two treatments used in the study are duct-tape therapy (covering the wart with a piece of duct tape) and cryotherapy (freezing the wart by applying a quick, narrow blast of liquid nitrogen). The explanatory variable is the type of treatment used, and the response variable is whether the treatment was successful in removing the wart.

B. By using randomization, we attempt to balance the treatment groups by making them similar with respect to the distribution of potential lurking variables. The researchers should use randomization to assign subjects to the two treatments. They could use random numbers as follows: Number the study subjects (patients) from 01 to 51. Pick a two-digit random number between 01 and 51. If the number is 32, for example, the subject numbered 32 is put in the duct-tape therapy group. Continue picking two-digit numbers, ignoring repeats, until 25 distinct values between 01 and 51 have been chosen. The remaining 26 subjects form the cryotherapy group.

Randomization enables the researchers to attribute any difference in success rates to the treatments themselves, not to lurking variables or researcher bias. Because of the nature of the treatments, neither the subjects nor those in contact with them can be blinded, so the researchers must acknowledge this as a potential source of bias when drawing conclusions from the experiment.
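The random assignment described in part B can be sketched in a few lines of Python. This is only an illustration: it uses a software random number generator in place of a random number table, and the function name `assign_treatments` and the seed value are assumptions made here, not part of the study. Drawing 25 distinct subject numbers without replacement is equivalent to picking two-digit numbers and ignoring repeats.

```python
import random


def assign_treatments(n_subjects=51, n_duct_tape=25, seed=None):
    """Randomly split subjects 1..n_subjects into two treatment groups (hypothetical helper)."""
    rng = random.Random(seed)
    subjects = list(range(1, n_subjects + 1))

    # Sampling without replacement gives 25 distinct subject numbers.
    duct_tape = sorted(rng.sample(subjects, n_duct_tape))

    # Everyone not drawn forms the cryotherapy group.
    chosen = set(duct_tape)
    cryotherapy = [s for s in subjects if s not in chosen]
    return duct_tape, cryotherapy


# The seed is arbitrary; it only makes the illustration reproducible.
duct_tape, cryotherapy = assign_treatments(seed=2014)
print(len(duct_tape), len(cryotherapy))  # 25 26
```

Every subject has the same chance of landing in either group, which is what protects the comparison from lurking variables.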

## Q 4.54 - page 188

A. The experiment uses an alternative design in which every treatment is observed for every subject. It is therefore a randomized block design in which each person (volunteer) is a block.

B. To avoid biased results, the study should be conducted as a double-blind experiment, in which neither the subjects (volunteers) nor those in contact with them know which treatment is being administered.

C. To reduce possible bias, the order of the treatments should be randomly assigned within each block so that the subjects do not all receive the treatments in the same order. Each subject will thus receive a placebo, a low dose, and a high dose of the drug (so all blocks are on an equal footing in any subsequent comparison), but in a randomly assigned order.
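The within-block randomization in part C can be sketched as follows. The number of volunteers is not stated in the answer, so the six subject IDs used here are a placeholder assumption, and `randomize_orders` is a hypothetical helper name. Each subject (block) receives all three treatments, and only the order is shuffled.

```python
import random

TREATMENTS = ["placebo", "low dose", "high dose"]


def randomize_orders(subject_ids, seed=None):
    """Give each subject (block) all three treatments in an independently shuffled order."""
    rng = random.Random(seed)
    orders = {}
    for sid in subject_ids:
        order = TREATMENTS[:]   # copy so the master list is never modified
        rng.shuffle(order)      # random order within this block
        orders[sid] = order
    return orders


# Six volunteers and the seed are placeholders for illustration only.
orders = randomize_orders(range(1, 7), seed=1)
for sid, order in orders.items():
    print(sid, order)
```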