
z5115443

MATH1041 COMPUTING ASSESSMENT- SEMESTER 2/2018.

QUESTION 1
The sample mean of the 31 observations: x̄ = 57.578 kg (1)

The sample standard deviation of the 31 observations: s = 5.4159 kg (2)

(3) [Figure: normal quantile plot of sheep bodyweight]

The quantile plot above indicates that the sheep bodyweights are approximately normally distributed:
the points lie close to a straight line and most values are grouped around the sample mean. Although
there appear to be 2 outliers, with a sample of this size (n = 31) they are unlikely to distort the
results of the analysis substantially. We can therefore be reasonably confident that the sheep
bodyweights observed on the drought-affected farm are approximately normally distributed.

QUESTION 2

In this scenario, the null hypothesis (H0) is that the true mean body weight of the sheep recorded is 60
kilograms, consistent with the observation by scientists five years ago, prior to the drought.

H0 : µ = 60

As the null hypothesis is µ = 60 kg, the alternative hypothesis is that the drought being experienced
has changed the true mean weight of the sheep. It is not limited to a decrease or an increase in
weight, but covers a change in the true mean in either direction. Hence:

Ha : µ ≠ 60

The next step in the hypothesis test is computing the value of the test statistic. In this instance, as the
population standard deviation is not known, the t statistic based on the sample standard deviation is
used: t = (x̄ − µ) / (s / √n). To calculate its value, its components must be identified.

Rohit Louis
z5115443

x̄ = 57.578

µ = 60

s = 5.4159

n = 31

t statistic = -2.4899

P value (4) = 0.01854562

Substituting these values into the t statistic formula gives t = -2.4899. The appropriate sampling
distribution for this statistic is T ~ t(n - 1), in this case a t distribution with 30 degrees of freedom.
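As a rough check, the t statistic can be recomputed in R from the summary values quoted above. This is a minimal sketch rather than part of the original workings; the variable names (xbar, mu0, s, n, se, t_stat, df) are chosen here purely for illustration.

```r
# Summary values as reported in the text (rounded)
xbar <- 57.578   # sample mean (kg)
mu0  <- 60       # hypothesised mean under H0 (kg)
s    <- 5.4159   # sample standard deviation (kg)
n    <- 31       # number of observations

se     <- s / sqrt(n)         # estimated standard error of the sample mean
t_stat <- (xbar - mu0) / se   # one-sample t statistic
df     <- n - 1               # degrees of freedom of the reference t distribution

round(t_stat, 4)
df
```

With the raw observations available, t.test(x, mu = 60) should report the same statistic; because only rounded summary values are used here, the final decimal place may differ slightly.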

Next, the P value is obtained using the t statistic generated. In order to reach a conclusion with
respect to the alternative hypothesis Ha : µ ≠ 60, the P value is acquired through a 2-sided test (steps
outlined below).

When this P value (0.0185) is compared against a significance level of α = 0.05, it falls below that
threshold, so the null hypothesis is rejected. Based on the sample data, there is moderate to strong
evidence that the true mean of sheep bodyweight today does not equal the figure of 60 kilograms
observed five years ago. Because the sample mean (57.578 kg) lies below 60 kg, the data are
consistent with the drought having had an adverse impact on the mean bodyweight of the sheep.

P = 2 × P( T ≥ 2.4899 ), where T ~ t(30)

  = 2 × ( 1 − P( T < 2.4899 ) )

  = 2 * ( 1 - pt( 2.4899, 30 ) )

  = 0.01854562
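The two-sided P value steps above can be sketched in R as follows. The names t_stat, df, alpha, p_upper, p_lower and reject_h0 are illustrative assumptions; the second form, which uses the lower tail directly, is simply an equivalent way of writing the same calculation.

```r
t_stat <- -2.4899   # t statistic reported in the text
df     <- 30        # degrees of freedom (n - 1)
alpha  <- 0.05      # significance level used in the text

# Two-sided P value: twice the upper-tail area beyond |t|
p_upper <- 2 * (1 - pt(abs(t_stat), df))

# Equivalent form using the lower tail at -|t| (the t distribution is symmetric)
p_lower <- 2 * pt(-abs(t_stat), df)

reject_h0 <- p_upper < alpha   # TRUE means H0 is rejected at level alpha
p_upper
reject_h0
```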

However, before we accept this conclusion, the assumptions underlying the test need to be checked.
These assumptions are:

1. The observations (X1…X31) are approximately normally distributed.

2. The data provided consist of 31 independent observations.

In the case of point 2, we can say that the assumption is satisfied because the weight of one sheep
on the farm does not directly influence the weight of another sheep. Therefore, the assumption of
sample data independence holds.

In the case of point 1, the quantile plot in Question 1 indicates that the observations are
approximately normally distributed, with the exception of 2 outliers. With a sample of 31, these are
unlikely to distort the analysis substantially. Therefore, the assumption of normality is reasonable.

The next step in the process is the calculation of a 95% confidence interval for the true mean of sheep
bodyweight. The interval is built by a method that captures the true mean in 95% of repeated samples,
so we can be 95% confident that the true mean lies between the two values we compute.

The formula is given by:

x̄ ± t* × s / √n


x̄ = 57.578

µ = 60

s = 5.4159

n = 31

t* (5) = 2.0422

t* is a quantile of the t(n – 1) distribution. In this case it is t(30), as there are 31 observations and
therefore 30 degrees of freedom. To calculate t* in R, the cumulative probability (0.975, which leaves
2.5% in each tail for a 95% interval) and the degrees of freedom (30) are entered into the qt function.
In this instance, t* = 2.0422.

Therefore, the 95% confidence interval is (55.59, 59.56) kg. This means that we can be 95% confident
that the true mean of sheep bodyweight falls between these two values. The confidence interval does
not include the value of 60, which is consistent with our earlier rejection of the null hypothesis.
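The interval can be reproduced from the summary values with a short R sketch. The names conf_level, t_star, margin and ci are assumptions made for illustration, and the inputs are the rounded figures quoted in the text.

```r
xbar <- 57.578   # sample mean (kg)
s    <- 5.4159   # sample standard deviation (kg)
n    <- 31       # number of observations

conf_level <- 0.95
# Quantile leaving (1 - conf_level) / 2 in the upper tail of t(n - 1)
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)

margin <- t_star * s / sqrt(n)   # margin of error
ci <- c(lower = xbar - margin, upper = xbar + margin)

round(t_star, 4)
round(ci, 2)
```

The upper limit sits below 60 kg, which matches the two-sided test rejecting H0 at the 5% level.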

QUESTION 3

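(6) [Figure: comparative boxplot of bodyweight for male and female sheep]

(7) [Output: summary of bodyweight for male and female sheep]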

From the comparative boxplot, it can be observed that male sheep have larger minimum and
maximum weights than female sheep. In addition, they have a larger interquartile range. Female
sheep recorded a lower median bodyweight and a higher standard deviation than male sheep.
The range (maximum minus minimum) of female weights is also larger, by 1.41 kilograms. From the
boxplot and the summary output, there appear to be no significant outliers in the data set.


(8) [Figure: scatterplot of bodyweight (kg) against head to tail length (m)]

The scatterplot above shows a very weak positive linear association between the two quantitative
variables, bodyweight (kg) and head to tail length (m). There appear to be two notable outliers:
a sheep with an HTL in excess of 3 metres and a bodyweight below 55 kg, and a sheep weighing
more than 50 kg with an HTL below 1 metre. Based on the scatterplot, it can be concluded that
there is no meaningful linear relationship between BW and HTL.

The most appropriate statistic for quantifying the linear relationship between two quantitative
variables is the correlation coefficient r. It ranges from -1 to 1: the closer r is to either extreme, the
stronger the linear relationship, and the closer it is to 0, the weaker. In this case, comparing
bodyweight to HTL, the correlation coefficient is r = 0.1791 (9). As this number is close to 0, there is
little evidence of a meaningful linear relationship between HTL and bodyweight.
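To make the definition of r concrete, the sketch below computes it by hand and compares the result with R's cor function. The bw and htl vectors are hypothetical values invented for illustration, not the assessment data, and r_manual and r_builtin are illustrative names.

```r
# Hypothetical measurements, NOT the assessment data
bw  <- c(52.1, 55.4, 57.0, 58.3, 60.2, 61.5, 63.8)   # bodyweight (kg)
htl <- c(1.48, 1.62, 1.39, 1.55, 1.71, 1.44, 1.60)   # head to tail length (m)

n <- length(bw)

# Pearson r: sum of products of standardised values, divided by n - 1
z_bw  <- (bw - mean(bw)) / sd(bw)
z_htl <- (htl - mean(htl)) / sd(htl)
r_manual <- sum(z_bw * z_htl) / (n - 1)

r_builtin <- cor(bw, htl)   # built-in Pearson correlation

all.equal(r_manual, r_builtin)
```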

Linear regression aims to model the relationship between an explanatory and a response variable. It
is not recommended in this case because there is no clear linear relationship between head to tail
length and bodyweight, as the regression attempt below shows. Furthermore, the fitted line predicts
almost none of the observed values accurately, even within the range of the sample data, so a
regression analysis in this situation is neither effective nor necessary.


(10) [Figure: BW and HTL scatterplot with fitted regression line]
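One way to put a number on how little a straight line would explain is the R-squared of the fitted model, which for simple linear regression equals r squared. For the reported r = 0.1791 this is roughly 0.032, so about 3% of the variation. The sketch below illustrates the identity on hypothetical data (the bw and htl values are invented, not the assessment data; fit and r_squared are illustrative names).

```r
# Hypothetical measurements, NOT the assessment data
bw  <- c(52.1, 55.4, 57.0, 58.3, 60.2, 61.5, 63.8)   # bodyweight (kg)
htl <- c(1.48, 1.62, 1.39, 1.55, 1.71, 1.44, 1.60)   # head to tail length (m)

# Response on the left of ~, explanatory variable on the right
fit <- lm(htl ~ bw)

r_squared <- summary(fit)$r.squared   # proportion of variation in htl explained by bw

# In simple linear regression, R-squared equals the squared correlation
all.equal(r_squared, cor(bw, htl)^2)
```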

QUESTION 4

The summary statistics for the HTL measurements (the five-number summary plus the mean), in
metres, are as follows (11):

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
0.750   1.405    1.520   1.592   1.710    3.310
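Note that R's summary function reports six values, because it adds the mean to the five-number summary. A minimal sketch of the difference, using a hypothetical htl vector (not the assessment data):

```r
# Hypothetical head to tail lengths in metres, NOT the assessment data
htl <- c(0.75, 1.31, 1.40, 1.46, 1.49, 1.52, 1.58, 1.66, 1.71, 1.90, 3.31)

five <- fivenum(htl)   # Tukey's five numbers: min, lower hinge, median, upper hinge, max
six  <- summary(htl)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

five
six
```

The hinges returned by fivenum can differ slightly from the quartiles shown by summary, since the two functions use different quartile conventions.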

(12) [Figure: histogram of HTL]


(13) [Figure: histogram of log-transformed HTL]

(14) [Figure: histogram of square-root-transformed HTL]

The histogram of the untransformed HTL values is strongly right skewed and asymmetric in shape.
It has a mean of 1.592 metres, a median of 1.520 metres and a mode (15) of 1.49 metres. Finally, the
range of the data (max - min) is 2.56 metres.

The log-transformed histogram can be described as approximately normal, with a fairly symmetric
shape. On the log10 scale (so the values are no longer in metres), it has a mean of 0.188, a median
of 0.181 and a mode (16) of 0.173. Finally, the range of these observations is 0.644.

The square-root-transformed histogram has less of a right skew than the untransformed HTL
histogram, but a skew is still present and the shape remains asymmetric. On the square-root scale
(so the values are no longer in metres), it has a mean of 1.251, a median of 1.232 and a mode (17)
of 1.220. Finally, the range of these observations is 0.953.

It can be seen that the log transformation has done a good job of reducing the skewness of the data.
The resulting histogram has an almost symmetric shape. This is because the log transformation
compresses large values more strongly than small ones, which pulls in the long right tail. While it is
not perfect, the result is much better to work with for the purposes of data analysis. In contrast, the
square-root transformation is weaker than the log transformation and does less to correct the
skewness of the data. Therefore, the log transformation is the method that has reduced skewness
most effectively.
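The comparison above is visual. As a hedged illustration of how it could be quantified, the sketch below computes a moment-based sample skewness for right-skewed data before and after each transformation. The skewness helper is a hypothetical function written for this example (base R has no built-in skewness function), and x is synthetic data built from evenly spaced lognormal quantiles, not the HTL measurements.

```r
# Hypothetical helper: moment-based sample skewness (0 for a symmetric sample)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Synthetic right-skewed data: evenly spaced quantiles of a lognormal distribution.
# The meanlog and sdlog values are arbitrary choices for illustration.
x <- qlnorm(ppoints(200), meanlog = 0.4, sdlog = 0.3)

skews <- c(raw   = skewness(x),
           sqrt  = skewness(sqrt(x)),
           log10 = skewness(log10(x)))
round(skews, 3)
```

On data of this kind the square root reduces the skewness only partly, while the logarithm removes it almost entirely, which mirrors the ordering described for the HTL histograms.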


RStudio Workings (see reference numbers)


(1) > mean(`z5115443[1576]`$BW)

(2) > sd(`z5115443[1576]`$BW)

(3) > qqnorm(`z5115443[1576]`$BW, ylab = "Weight (Kgs)",
      xlab = "Distribution of Weight", main = "Sheep Weight Quantile Plot")
> qqline(`z5115443[1576]`$BW)

(4) > 2 * (1 - pt(2.4899, 30))
[1] 0.01854562

(5) > qt(0.975, 30)
[1] 2.042272

(6) > m <- `z5115443[1576]`[`z5115443[1576]`$SEX > 0, ]
> f <- `z5115443[1576]`[`z5115443[1576]`$SEX < 1, ]
> boxplot(m$BW, f$BW, names = c("Male", "Female"), xlab = "Gender",
      ylab = "Weight (kg)", main = "Comparative Bodyweight Box-Plot")

(7) > summary(m$BW)
> summary(f$BW)

(8) > plot(`z5115443[1576]`$BW, `z5115443[1576]`$HTL, ylab = "HEAD TO TAIL LENGTH (MTRS)",
      xlab = "BODYWEIGHT (KGS)", main = "BW AND HTL SCATTERPLOT")

(9) > cor(`z5115443[1576]`$BW,`z5115443[1576]`$HTL)

(10) > plot(`z5115443[1576]`$BW, `z5115443[1576]`$HTL, ylab = "HEAD TO TAIL LENGTH (MTRS)",
      xlab = "BODYWEIGHT (KGS)", main = "BW AND HTL SCATTERPLOT")
> # HTL is on the y-axis and BW on the x-axis, so the model must be HTL ~ BW
> abline(lm(HTL ~ BW, data = `z5115443[1576]`), col = 'red')

(11) > summary(`z5115443[1576]`$HTL)

(12) > hist(`z5115443[1576]`$HTL, main = "Head to Tail Length Histogram",
      xlab = "HTL Length (Metres)")

(13) > logtrans_HTL <- log10(`z5115443[1576]`$HTL)
> hist(logtrans_HTL, main = "Head to Tail Length Histogram- Log Transformed",
      xlab = "log(HTL Length (Metres))")

(14) > trans_sqrt <- sqrt(`z5115443[1576]`$HTL)
> hist(trans_sqrt, main = "Head to Tail Length Histogram- Square Root Transformed",
      xlab = "sqrt(HTL Length (Metres))")

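(15) > # Returns the most frequent value in v (the first one encountered if there is a tie)
> getmode <- function(v) {
      uniqv <- unique(v)                            # distinct values, in order of appearance
      uniqv[which.max(tabulate(match(v, uniqv)))]   # value with the highest count
  }
> getmode(`z5115443[1576]`$HTL)

(16) > getmode(logtrans_HTL)

(17) > getmode(trans_sqrt)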
