
BES Tutorial Sample Solutions, S1/13

WEEK 3 TUTORIAL EXERCISES (To be discussed in the week starting March 18)
1. Using the car data from Week 2, Question 3:
(a) Redo Q3(c) using EXCEL to confirm that the
frequency histogram is given by Figure 3.1.

[Figure 3.1: Revised histogram for age of cars. Vertical axis: Frequency; horizontal axis: Age.]

(b) Calculate the mean, median and mode for this sample of data and use them to further describe the distribution of ages.
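Mean: $\bar{x} = \dfrac{5 + 5 + 6 + \cdots + 24 + 11}{20} = \dfrac{146}{20} = 7.3$

Ordering the data from lowest to highest:
2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24

Median $= \dfrac{6 + 6}{2} = 6$ (the average of the 10th and 11th observations)
Mode = 6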
The sample mean (7.3) lies to the right of the mode and median (both 6), suggesting that the sample distribution is skewed to the right. The cause appears to be a large outlier: one car had an age of 24, which is very different from the ages of the other cars. Given the skewness and the outlier, the median is probably a better measure of central tendency. Hence a typical second-hand car is 6 years old.
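The hand calculations above can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming the 20 ordered ages listed above are the complete sample; the variable names are illustrative only.

```python
import statistics

# Ages (years) of the 20 second-hand cars, ordered from lowest to highest
ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]

mean_age = statistics.mean(ages)      # sum / count = 146 / 20
median_age = statistics.median(ages)  # even n: average of the 10th and 11th values
mode_age = statistics.mode(ages)      # most frequently occurring value

print(f"mean = {mean_age}, median = {median_age}, mode = {mode_age}")
```

Because the mean is pulled upwards by the single 24-year-old car while the median and mode are not, the printout mirrors the right-skew argument made in the text.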
Alternatively, the EXCEL output is:

Age
Mean                  7.3
Standard Error        1.126476
Median                6
Mode                  6
Standard Deviation    5.037752
Sample Variance       25.37895
Kurtosis              5.712234
Skewness              2.0983
Range                 22
Minimum               2
Maximum               24
Sum                   146
Count                 20
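The dispersion and shape rows of the EXCEL output can be reproduced in the same way. This sketch assumes that EXCEL's Descriptive Statistics tool uses the sample (n - 1) variance and the bias-adjusted formulas of its SKEW and KURT worksheet functions; it is an illustration of those formulas, not EXCEL's own code.

```python
import math
import statistics

ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]
n = len(ages)

mean = statistics.mean(ages)
variance = statistics.variance(ages)   # sample variance, divisor n - 1
sd = statistics.stdev(ages)            # sample standard deviation
std_error = sd / math.sqrt(n)          # standard error of the mean

# Standardised deviations feed both shape measures
z = [(x - mean) / sd for x in ages]

# Adjusted sample skewness (assumed to match EXCEL's SKEW)
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Adjusted excess kurtosis (assumed to match EXCEL's KURT)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

data_range = max(ages) - min(ages)

for label, value in [("Standard Error", std_error), ("Standard Deviation", sd),
                     ("Sample Variance", variance), ("Kurtosis", kurtosis),
                     ("Skewness", skewness), ("Range", data_range)]:
    print(f"{label:<20}{value:.6g}")
```

A positive skewness well above zero is the numerical counterpart of the mean lying to the right of the median.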

(c) If the largest observation were removed from this data set, how would the three measures of central tendency you have calculated change?
Mean: $\bar{x} = \dfrac{5 + 5 + 6 + \cdots + 6 + 11}{19} = \dfrac{122}{19} \approx 6.4$ (now closer to the median)
Median = 6, the 10th of the 19 ordered observations (unchanged)
Mode = 6 (unchanged)
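A short sketch of the same comparison, assuming the 24-year-old car is the single largest observation to be dropped. Note that with 19 observations the median is the middle (10th) ordered value rather than an average of two values.

```python
import statistics

ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]

# Remove the single largest observation (the 24-year-old car)
trimmed = sorted(ages)[:-1]

mean_after = statistics.mean(trimmed)      # 122 / 19
median_after = statistics.median(trimmed)  # odd n: the 10th ordered value
mode_after = statistics.mode(trimmed)

print(f"before: mean = {statistics.mean(ages)}, after: mean = {mean_after:.1f}")
print(f"median = {median_after}, mode = {mode_after}")
```

Only the mean moves, which is why the median and mode are described as resistant to outliers.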

2. For the following statistical population, compute the mean, range, variance and standard deviation: 3, 3, 5, 12, 13, 14, 17, 20, 21, 21.
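Mean: $\mu = \dfrac{3 + 3 + 5 + 12 + 13 + 14 + 17 + 20 + 21 + 21}{10} = 12.9$
Range $= 21 - 3 = 18$
Variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = \dfrac{(3 - 12.9)^2 + \cdots + (21 - 12.9)^2}{10} = 45.89$
Standard deviation: $\sigma = \sqrt{45.89} = 6.7742$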
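Because these ten values are treated as a population, the variance divides by N rather than N - 1. A minimal sketch using the population functions of Python's `statistics` module, with a by-definition check of the variance formula:

```python
import statistics

population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]
N = len(population)

mu = statistics.fmean(population)
data_range = max(population) - min(population)
sigma_sq = statistics.pvariance(population)  # population variance, divisor N
sigma = statistics.pstdev(population)        # population standard deviation

# The same variance computed directly from its definition
sigma_sq_by_hand = sum((x - mu) ** 2 for x in population) / N

print(mu, data_range, round(sigma_sq, 2), round(sigma, 4))
```

Using `statistics.variance` here instead would give the sample variance (divisor N - 1), which is the wrong formula for a population.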

3. For the population in Q2 above, what would happen to each of the measures you have calculated if:
(a) 4 were added to each data point (observation)?
The mean would increase by 4, but the range, variance
and standard deviation would be unchanged.
(b) each data point were multiplied by 2?
The mean, range and standard deviation would be
multiplied by 2, whilst the variance would be multiplied
by 4.
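The answers to (a) and (b) can be confirmed numerically. This sketch uses a hypothetical helper, `summarise`, that returns the four population measures from Q2 as a tuple:

```python
import statistics

population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]

def summarise(data):
    """Hypothetical helper: (mean, range, population variance, population s.d.)."""
    return (statistics.fmean(data), max(data) - min(data),
            statistics.pvariance(data), statistics.pstdev(data))

base = summarise(population)
shifted = summarise([x + 4 for x in population])  # (a) add 4 to every value
scaled = summarise([2 * x for x in population])   # (b) multiply every value by 2

for label, stats in [("original", base), ("plus 4", shifted), ("times 2", scaled)]:
    print(label, [round(v, 4) for v in stats])
```

Adding a constant moves every value (and the mean) by the same amount, so deviations from the mean are unchanged; multiplying by 2 doubles every deviation, which doubles the range and standard deviation and quadruples the squared deviations that make up the variance.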
4. Calculate the 90th percentile for the following set of
data:
-2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7,
9.8, 10.45
For p = 90 and n = 13, we have
$L_p = (n + 1)\dfrac{p}{100} = (14)\dfrac{90}{100} = 12.6$,
implying that the 90th percentile lies 60% of the distance between the 12th and 13th observations. Then:
90th percentile $= 9.8 + (0.6)(10.45 - 9.8) = 10.19$
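The location rule $L_p = (n + 1)p/100$ with linear interpolation can be written as a small function. The name `percentile_location` is hypothetical; as a cross-check, `statistics.quantiles` with `method="exclusive"` places the i-th of n sorted points at i/(n + 1) and so should agree with this rule.

```python
import statistics

data = [-2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7, 9.8, 10.45]

def percentile_location(values, p):
    """Hypothetical helper: percentile at location L_p = (n + 1) * p / 100,
    interpolating linearly between the neighbouring ordered observations."""
    x = sorted(values)
    n = len(x)
    loc = (n + 1) * p / 100
    if loc <= 1:          # below the first observation: clamp
        return x[0]
    if loc >= n:          # beyond the last observation: clamp
        return x[-1]
    k = int(loc)          # whole part: the k-th ordered observation (1-based)
    frac = loc - k        # fractional part: share of the gap to the next one
    return x[k - 1] + frac * (x[k] - x[k - 1])

p90 = percentile_location(data, 90)

# Deciles via the (n + 1)-based "exclusive" method; index 8 is the 90th percentile
p90_check = statistics.quantiles(data, n=10, method="exclusive")[8]

print(round(p90, 2), round(p90_check, 2))
```

Other software (including some EXCEL percentile functions) uses different location rules, so results can differ slightly from the (n + 1) convention used here.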

5. SIA: Migrant wealth.
Suppose the Minister for Immigration is interested in research on the assimilation of migrant households (a household where the chief income-earner is foreign-born). The Household, Income and Labour Dynamics in Australia (HILDA) survey is a representative survey of Australian households. Using 4,669 household observations for 2002 from HILDA, we find there are 3,567 households classified as Australian-born and 1,102 classified as migrants. One key consideration is how migrant households are doing in terms of wealth compared with Australian-born households. Using these data, we find the following:
Summary statistics for net household wealth ($A)

                    Mean      10th percentile   Median    90th percentile
Australian-born     236,064   1,545             123,020   560,006
Migrant             248,970   1,720             131,152   524,372

(a) What can you say about the distribution of net household wealth for both Australian-born and migrant households by looking at just the mean and the median figures?
The wealth distribution is skewed quite heavily to the right for both Australian-born and migrant households. The mean is much larger than the median, suggesting that more than 50% of each sample have less than average wealth, while less than 50% of each sample have more than average wealth. In other words, there is a fair amount of wealth inequality in both samples.
(b) More generally, what can you say about the distribution of wealth for migrant households compared to that for Australian-born households? In particular, which type of household has greater variation in wealth?
Based on just the mean and the median, a typical migrant family appears to be slightly wealthier than a typical Australian-born family: both figures are larger for the migrant sample than for the Australian-born sample. This is also the case for the 10th percentile. By contrast, the 90th percentile is greater for the Australian-born sample than for the migrant sample. These figures suggest that, while typical migrant families are better off than typical Australian-born families in terms of wealth, migrant families are less likely to be very poor or very rich compared with Australian-born families. In other words, Australian-born families have greater variation in household wealth than migrant families.
(c) Suppose the minister has net household wealth of $600,000. What can you say about their financial circumstances relative to other Australian-born households?
The minister's household has greater wealth than at least 90% of Australian-born households, since $600,000 exceeds the Australian-born 90th percentile of $560,006. The minister is amongst the wealthiest 10% of Australian-born households.
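The comparisons in (a) to (c) can be expressed as simple arithmetic on the four reported statistics. In this sketch the mean-to-median ratio and the 90th minus 10th percentile spread (the interdecile range) are summaries introduced here for illustration, not measures used in the original answers; the figures are those of the summary table above.

```python
# Net household wealth ($A), taken from the summary table for Q5
wealth = {
    "Australian-born": {"mean": 236_064, "p10": 1_545, "median": 123_020, "p90": 560_006},
    "Migrant": {"mean": 248_970, "p10": 1_720, "median": 131_152, "p90": 524_372},
}

for group, s in wealth.items():
    mean_to_median = s["mean"] / s["median"]  # ratio above 1 points to right skew
    interdecile = s["p90"] - s["p10"]         # spread of the middle 80% of households
    print(f"{group}: mean/median = {mean_to_median:.2f}, "
          f"90th - 10th percentile = {interdecile:,}")

# (c): is $600,000 above the Australian-born 90th percentile?
minister = 600_000
above_p90 = minister > wealth["Australian-born"]["p90"]
print("Minister above Australian-born 90th percentile:", above_p90)
```

The larger interdecile spread for Australian-born households is consistent with the conclusion in (b) that they show greater variation in wealth.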

6. SIA: Sydney housing prices.
Figure 3.2 depicts a scatter plot of Sydney housing prices versus distance from Sydney. The unit of observation is a suburb, price is the mean of the median price of houses sold in each suburb for two quarters (September and December 2002) and distance is measured in kilometers from Sydney's CBD.
(a) What would you expect the correlation to be between price and distance?
There appears to be an inverse relationship between distance to the CBD and price, so we would expect the correlation to be negative.
(b) Does it appear that there is a linear relationship between the two variables?
The relationship does not look linear, largely because of the large variability in prices for suburbs close to the CBD. (These observations also tend to distort what the relationship looks like for the bulk of the data. If you were to eliminate these outliers, it is not clear what the relationship would look like for the remainder of the data.)
(c) What other key features of these data can be
determined from the plot?

[Figure 3.2: House prices in Sydney suburbs versus distance to CBD. Vertical axis: Price $; horizontal axis: Distance to CBD (kms).]

- We have already mentioned the large variability in prices for suburbs close to the CBD. This can be stated more formally: the variance of prices close to the CBD (the conditional variance) is much larger than the variance of prices further away from the CBD.
- There are other outliers around 30 km from the CBD (Clareville, Palm Beach and Whale Beach). There is no suspicion that these outliers are due to errors; all are feasible observations.
- We can see that the price and distance variables are both skewed to the right.
- There are numerous suburbs where there were no sales. Most of these are suburbs relatively close to the CBD.
- What should we do with the zero-sales observations when we analyse the data? They are not data errors of the kind that sometimes occur, but they are not real zeros either, as we don't know what the price would have been had there been sales in the period in question.

7. Anzac Grange wants to develop guidelines for setting prices of cars according to the car's age. They hire a business consultant who chooses a sample of 117 second-hand passenger car advertisements collected from www.drive.com.au and retrieves data on the age and price of the cars.
(a) The business consultant first calculates the correlation coefficient between age and price and finds it to be -0.278. Interpret this result.
Correlation coefficients lie between -1 and 1. A negative
value suggests an inverse relationship between the
variables (which makes sense). A magnitude of 0.278
suggests that the relationship is not very strong.
(b) Sketch what you think the scatter diagram from which
it was calculated might look like. Suppose the
business consultant constructs a simple linear
regression model using price as the dependent
variable, and age as the independent variable. What
do you think the estimated regression line might look
like here? (Note: We will return to this particular
example in week 12 and address this question more
formally.)
Below is a possible scatter diagram with a linear regression line superimposed. Any scatter diagram that answers the question should have the key feature of being consistent with a negative correlation, i.e. a negatively sloped line of best fit.

[Figure: Price-age scatter with OLS regression line superimposed. Vertical axis: Price; horizontal axis: Age; series: Price and Linear (Predicted Price).]

