
# Unit 1 – Bivariate Distributions

1) A local wine-tasting contest featured fifty different wines from nearby towns. For each wine, the
number of fermentation years and the alcohol content were recorded as a pair (x, y). The data for the
fifty entries are the following:

(3,11) (4,13) (3,11) (3,12) (3,12) (2,13) (3,11) (2,13) (2,13) (2,12) (4,12) (2,12) (3,12) (3,11)
(2,12) (4,12) (4,12) (4,13) (4,13) (4,12) (3,13) (3,12) (4,12) (4,12) (2,13) (2,12) (3,13) (3,11)
(3,13) (2,11) (3,11) (3,13) (2,12) (2,12) (4,12) (3,12) (2,11) (3,11) (3,13) (3,11) (3,12) (3,12) (3,12)
(3,12) (2,12)

(a) Write the table of the joint distribution of (X, Y).
(b) Write the tables of the marginal distributions of X and of Y.
(c) Write the distribution of fermentation years for the wines with 12% alcohol.
(d) Write the distribution of the alcohol content for the wines with at least 3 years of
fermentation.
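Counting the pairs directly is a quick way to check the tables in (a) to (d). Below is a minimal Python sketch using `collections.Counter`. Note that the listing above contains 45 pairs rather than fifty, so the totals here are out of 45.

```python
from collections import Counter

# (fermentation years X, alcohol % Y) pairs, copied from the listing above
pairs = [
    (3, 11), (4, 13), (3, 11), (3, 12), (3, 12), (2, 13), (3, 11), (2, 13), (2, 13), (2, 12),
    (4, 12), (2, 12), (3, 12), (3, 11), (2, 12), (4, 12), (4, 12), (4, 13), (4, 13), (4, 12),
    (3, 13), (3, 12), (4, 12), (4, 12), (2, 13), (2, 12), (3, 13), (3, 11), (3, 13), (2, 11),
    (3, 11), (3, 13), (2, 12), (2, 12), (4, 12), (3, 12), (2, 11), (3, 11), (3, 13), (3, 11),
    (3, 12), (3, 12), (3, 12), (3, 12), (2, 12),
]

n = len(pairs)
joint = Counter(pairs)                         # (a) n_ij for each (x, y)
marginal_x = Counter(x for x, _ in pairs)      # (b) n_i. (fermentation years)
marginal_y = Counter(y for _, y in pairs)      # (b) n_.j (alcohol content)
x_given_y12 = Counter(x for x, y in pairs if y == 12)     # (c) X given Y = 12
y_given_x3plus = Counter(y for x, y in pairs if x >= 3)   # (d) Y given X >= 3

# Print the joint table: one row per X value, columns Y = 11, 12, 13, then n_i.
for x in sorted(marginal_x):
    row = [joint[(x, y)] for y in sorted(marginal_y)]   # Counter returns 0 for empty cells
    print(x, row, marginal_x[x])
```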

2) The table below gives the number of pickup vans (X) and trucks (Y) of four different transport
companies. Determine whether the two variables are independent.

| $x_i$ | $y_j$ | $n_{ij}$ |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 2 | 2 |
| 1 | 3 | 9 |
| 2 | 3 | 6 |
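The independence criterion $f_{ij} = f_{i\cdot} f_{\cdot j}$ can be checked cell by cell. Below is a minimal Python sketch that reads the table as $(x_i, y_j, n_{ij})$ triples. It assumes every (x, y) combination appears in the table, which holds here. A missing combination would be a cell with $n_{ij} = 0$ and would need checking too.

```python
from fractions import Fraction

# (x_i, y_j, n_ij): vans X, trucks Y and the frequency of each pair
rows = [(1, 2, 3), (2, 2, 2), (1, 3, 9), (2, 3, 6)]

N = sum(n for _, _, n in rows)
n_x = {}  # marginal frequencies n_i. of X
n_y = {}  # marginal frequencies n_.j of Y
for x, y, n in rows:
    n_x[x] = n_x.get(x, 0) + n
    n_y[y] = n_y.get(y, 0) + n

# Independent iff f_ij == f_i. * f_.j for every (x, y) cell.
# All four (x, y) combinations are listed, so no zero cells are missing.
independent = all(
    Fraction(n, N) == Fraction(n_x[x], N) * Fraction(n_y[y], N)
    for x, y, n in rows
)
print(independent)  # prints True
```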

3) The table below shows the number of workers in a certain company, classified by weekly working
hours (X) and monthly salary in euros (Y).

| X \ Y | 1750-2750 | 2750-3750 | 3750-4250 | 4250-4750 | 4750-6250 |
|---|---|---|---|---|---|
| 31-35 | 5 | 4 | 2 | 1 | 0 |
| 35-37 | 1 | 2 | 4 | 3 | 3 |
| 37-41 | 0 | 3 | 4 | 2 | 6 |

(a) Find the average number of weekly working hours. If every employee works two extra hours per
week, what will the new average be?
(b) Find the most common salary among employees working more than 35 hours a week.
(c) Determine the quartiles of the number of working hours for employees whose salaries are
between 2750 and 4750 euros.
(d) Study the concentration of salaries.
(e) Determine the distribution of working hours for employees with salaries below 3750 euros, in
relative frequencies.
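Below is a minimal Python sketch of parts (a), (b) and (e). It assumes the interval midpoints (33, 36 and 39 hours) as class marks. The salary intervals have different widths (1000, 500 or 1500 euros), so the sketch picks the modal salary class by frequency density rather than by raw frequency. The exact mode within that class depends on the interpolation formula used in class, so it is not computed here.

```python
from fractions import Fraction

hours = [33, 36, 39]   # assumed class marks of 31-35, 35-37, 37-41
salary = [(1750, 2750), (2750, 3750), (3750, 4250), (4250, 4750), (4750, 6250)]
n = [  # n[i][j]: workers in hour class i and salary class j
    [5, 4, 2, 1, 0],
    [1, 2, 4, 3, 3],
    [0, 3, 4, 2, 6],
]

# (a) mean of X from its marginal distribution
n_x = [sum(row) for row in n]
N = sum(n_x)
mean_x = Fraction(sum(h * k for h, k in zip(hours, n_x)), N)
new_mean_x = mean_x + 2   # adding a constant to every value adds it to the mean

# (b) modal salary class for employees working more than 35 h (last two rows);
# with unequal widths, compare densities n_j / width_j instead of raw counts
over_35 = [n[1][j] + n[2][j] for j in range(len(salary))]
density = [Fraction(k, hi - lo) for k, (lo, hi) in zip(over_35, salary)]
modal_class = salary[density.index(max(density))]

# (e) hours distribution for salaries below 3750 (first two salary classes)
below = [row[0] + row[1] for row in n]
rel_below = [Fraction(k, sum(below)) for k in below]

print(float(mean_x), float(new_mean_x), modal_class, [float(f) for f in rel_below])
```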

4) The government of Spain requires people with any of the following diseases to report it to their
local health center: tuberculosis, typhus, measles and meningitis. The information collected for
patients aged 25 or younger is presented in the following table, in thousands of people.
Age

Meningitis 0.6 0.7 0.8 0.7 1.5

(a) Determine the marginal distributions of the variables age and disease.
(b) Determine the age distribution of tuberculosis cases, and the distribution of diseases in the
10-15 age group.
(c) Compute the average age of all patients, as well as the average age of the patients with
meningitis.

5) Let X and Y be variables with the following joint absolute frequencies:

X 2 13 15 20 23 25
Y
4 5 13 28 1 4 18
7 4 20 33 3 2 6
14 3 16 11 4 3 7
15 14 8 13 16 5 2
17 8 14 24 21 3 3
Calculate:
(a) The marginal distributions of variables X and Y.
(b) The conditional distributions X/Y=13 and Y/X=15.
(c) The means and variances of X, Y, X/Y=13 and Y/X=15.
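Below is a minimal Python sketch of (a) to (c). It reads the rows of the table (4, 7, 14, 15, 17) as the values of X and the columns (2, 13, 15, 20, 23, 25) as the values of Y. That orientation is an assumption, chosen because it is the only reading in which both X/Y=13 and Y/X=15 are defined. Variances divide by N, matching the descriptive formulas used in this unit.

```python
from fractions import Fraction

# Assumed orientation: rows are X values, columns are Y values
x_vals = [4, 7, 14, 15, 17]
y_vals = [2, 13, 15, 20, 23, 25]
n = [  # n[i][j]: frequency of (x_vals[i], y_vals[j])
    [5, 13, 28, 1, 4, 18],
    [4, 20, 33, 3, 2, 6],
    [3, 16, 11, 4, 3, 7],
    [14, 8, 13, 16, 5, 2],
    [8, 14, 24, 21, 3, 3],
]


def mean_var(values, counts):
    """Mean and variance (dividing by N) of a frequency distribution."""
    total = sum(counts)
    mean = Fraction(sum(v * c for v, c in zip(values, counts)), total)
    var = Fraction(sum(v * v * c for v, c in zip(values, counts)), total) - mean**2
    return mean, var


# (a) marginal distributions
n_x = [sum(row) for row in n]
n_y = [sum(row[j] for row in n) for j in range(len(y_vals))]
# (b) conditional distributions: column Y = 13 and row X = 15
x_given_y13 = [row[y_vals.index(13)] for row in n]
y_given_x15 = n[x_vals.index(15)]
# (c) means and variances
for name, vals, counts in [("X", x_vals, n_x), ("Y", y_vals, n_y),
                           ("X|Y=13", x_vals, x_given_y13), ("Y|X=15", y_vals, y_given_x15)]:
    m, v = mean_var(vals, counts)
    print(name, float(m), float(v))
```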

6) An economist studies the relationship between two variables over the 20th century: the price of
gold (X) and the net profits of stock markets in the same years, in constant 2017 euros (Y). The
economist now wants to apply the following transformations to these variables:
(a) Gold prices, given in dollars, must be expressed in constant 2017 euros. We call the new variable W.
(b) Net profits must be increased by p%, and an extra profit of E € from an additional investment
must be added. We call the new variable Z.

We have the following information:

$S_X^2 = 123$
$S_W^2 = 78.72$
$S_{XY} = 66$
$S_{WZ} = 58.08$

Obtain p.
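Two facts solve this. A change of scale multiplies variances by the square of the factor and covariances by the product of the factors, and adding constants changes neither. Assume W = aX + c with a > 0 (a currency and price-level conversion) and Z = (1 + p/100)Y + E. Then $S_W^2 = a^2 S_X^2$ and $S_{WZ} = a(1 + p/100)S_{XY}$. The Python check below is a sketch under those assumptions.

```python
import math

s2_x, s2_w = 123, 78.72   # variances of X and W
s_xy, s_wz = 66, 58.08    # covariances of (X, Y) and (W, Z)

# Assumed W = a*X + c with a > 0  =>  S2_W = a^2 * S2_X (c does not affect spread)
a = math.sqrt(s2_w / s2_x)
# Assumed Z = b*Y + E with b = 1 + p/100  =>  S_WZ = a * b * S_XY
b = s_wz / (a * s_xy)
p = (b - 1) * 100
print(round(a, 4), round(b, 4), round(p, 4))
```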

7) A company surveyed 200,000 people about the kind of beer they drink. The results, classified by
consumer age, are shown in the following table (in thousands):

| Age \ Kind of beer | Low fermentation | High fermentation | Gluten free |
|---|---|---|---|
| 0-25 | 55 | 12 | 3 |
| 26-45 | 33 | 30 | 2 |
| 46-60 | 30 | 19 | 1 |
| 61-85 | 5 | 9 | 1 |

(a) Obtain the relative frequency distribution of the kinds of beer analyzed.
(b) Calculate the average age of low-fermentation beer consumers. Is this average representative?
(c) Which kind of beer is most commonly consumed by consumers aged 26 to 60?
(d) What is the age of the youngest consumer among the oldest 20%?
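Below is a minimal Python sketch of (a) to (c). The class marks are an assumption: the midpoints of the stated limits (12.5, 35.5, 53 and 73). Using boundaries 25, 45 and 60 instead would shift them slightly. Representativeness in (b) is judged here by the coefficient of variation, where a smaller value means a more representative mean.

```python
from fractions import Fraction

kinds = ["Low fermentation", "High fermentation", "Gluten free"]
ages = [(0, 25), (26, 45), (46, 60), (61, 85)]
marks = [Fraction(lo + hi, 2) for lo, hi in ages]   # assumed class marks
n = [  # thousands of consumers: n[i][j] = age class i, beer kind j
    [55, 12, 3],
    [33, 30, 2],
    [30, 19, 1],
    [5, 9, 1],
]
N = sum(map(sum, n))

# (a) relative distribution of beer kinds
col = [sum(row[j] for row in n) for j in range(len(kinds))]
rel = [Fraction(c, N) for c in col]

# (b) mean age of low-fermentation consumers and its coefficient of variation
low = [row[0] for row in n]
m = Fraction(sum(x * c for x, c in zip(marks, low)), sum(low))
var = Fraction(sum(x * x * c for x, c in zip(marks, low)), sum(low)) - m**2
cv = float(var) ** 0.5 / float(m)

# (c) most consumed kind among ages 26-60 (second and third age classes)
mid = [n[1][j] + n[2][j] for j in range(len(kinds))]
most_common = kinds[mid.index(max(mid))]

print([float(r) for r in rel], float(m), round(cv, 3), most_common)
```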

8) In a survey of 480 families, we obtained the following data on monthly income (X) and savings
account deposits (Y) in banks.
X
0-200 200-500 500-2000 2000-10000
Y
50-100 40 12 8 -
100-150 16 48 12 4
150-250 8 80 92 20
250-500 4 40 72 24

Assuming that the class marks are representative for each interval:
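(a) Calculate the values of $n_{1\cdot}$, $n_{2\cdot}$, $n_{\cdot 2}$ and $n_{\cdot 3}$.
(b) Express, as proportions (per unit), the values of $f_{12}$, $f_{23}$, $f_{34}$, $f_{42}$, $f_{2\cdot}$, $f_{3\cdot}$, $f_{\cdot 2}$ and $f_{\cdot 4}$.
(c) Express, as percentages, the values of $f_{13}$, $f_{21}$, $f_{32}$, $f_{44}$, $f_{1\cdot}$, $f_{3\cdot}$, $f_{\cdot 3}$ and $f_{\cdot 4}$.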
(d) Express, as proportions (per unit), the values of $f(X_1 \mid Y=350)$, $f(X_2 \mid Y=1250)$, $f(Y_1 \mid X=375)$ and $f(Y_2 \mid X=200)$.
(e) Express, as percentages, the values of $f(X_3 \mid Y=1250)$, $f(X_4 \mid Y=6000)$, $f(Y_2 \mid X=375)$ and $f(Y_3 \mid X=200)$.
(f) Calculate the marginal means of X and Y.
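Below is a minimal Python sketch of the marginal frequencies and the marginal means in (f). It takes the class marks to be interval midpoints, as the statement allows. It reads the table as laid out above, with the income classes X across the top and the deposit classes Y down the side. The empty cell marked "-" is taken as 0; with that reading, the frequencies add up to the 480 families of the statement.

```python
from fractions import Fraction

# Columns read as X (income), rows as Y (deposits), following the layout above
x_limits = [(0, 200), (200, 500), (500, 2000), (2000, 10000)]
y_limits = [(50, 100), (100, 150), (150, 250), (250, 500)]
x_marks = [Fraction(a + b, 2) for a, b in x_limits]   # 100, 350, 1250, 6000
y_marks = [Fraction(a + b, 2) for a, b in y_limits]   # 75, 125, 200, 375
n = [  # n[j][i]: families with Y in class j and X in class i ("-" taken as 0)
    [40, 12, 8, 0],
    [16, 48, 12, 4],
    [8, 80, 92, 20],
    [4, 40, 72, 24],
]
N = sum(map(sum, n))
n_x = [sum(row[i] for row in n) for i in range(len(x_limits))]   # marginal of X
n_y = [sum(row) for row in n]                                     # marginal of Y


def mean(marks, counts):
    """Mean of a grouped distribution using class marks."""
    return Fraction(sum(m * k for m, k in zip(marks, counts)), sum(counts))


mean_x = mean(x_marks, n_x)
mean_y = mean(y_marks, n_y)
print(N, n_x, n_y, float(mean_x), float(mean_y))
```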

9) A company has started selling a new product. In a survey, 10 people were asked to rate the
following variables from 1 (worst) to 5 (best); the results appear in the table below. The
variables are:
X1 = “product global valuation”    X2 = “price-quality ratio”
X3 = “product capability”    X4 = “advertising campaign”

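| X1 | X2 | X3 | X4 |
|---|---|---|---|
| 1 | 2 | 2 | 1 |
| 3 | 4 | 3 | 4 |
| 5 | 3 | 4 | 5 |
| 3 | 4 | 3 | 4 |
| 2 | 3 | 2 | 2 |
| 3 | 4 | 3 | 4 |
| 4 | 5 | 5 | 5 |
| 1 | 2 | 2 | 1 |
| 3 | 4 | 2 | 3 |
| 4 | 2 | 4 | 4 |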
Calculate:
(a) The four one-dimensional marginal distributions.
(b) The distribution of the advertising campaign rating (X4) for respondents who gave the product
a global valuation (X1) of 3.
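Counting each column, and then only the rows with X1 = 3, is enough for both parts. Below is a minimal Python sketch with `collections.Counter`.

```python
from collections import Counter

# One row per respondent: (X1 global valuation, X2 price-quality, X3 capability, X4 advertising)
answers = [
    (1, 2, 2, 1), (3, 4, 3, 4), (5, 3, 4, 5), (3, 4, 3, 4), (2, 3, 2, 2),
    (3, 4, 3, 4), (4, 5, 5, 5), (1, 2, 2, 1), (3, 4, 2, 3), (4, 2, 4, 4),
]

# (a) the four one-dimensional marginal distributions
marginals = [Counter(row[k] for row in answers) for k in range(4)]
# (b) distribution of X4 among respondents with X1 == 3
x4_given_x1_3 = Counter(row[3] for row in answers if row[0] == 3)

for k, dist in enumerate(marginals, start=1):
    print(f"X{k}", sorted(dist.items()))
print("X4 | X1=3", sorted(x4_given_x1_3.items()))
```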

10) The following table shows the joint distribution of the cultivated area in hectares (X) and the
wheat production in tons (Y) in a certain province for the year 2000:

X
Y [1.5, 2.5) [2.5, 3.5) [3.5, 4.5) [4.5, 5.5)
(1, 2] 3 4 6 9
(2, 3] 4 5 8 11
(3, 4] 5 8 11 13
(4, 5] 4 7 9 10
(a) Obtain the means and variances of the marginal variables.
(b) Obtain the mean and variance of X conditioned on $3.5 \le Y < 4.5$.
(c) Find the covariance.
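Below is a minimal Python sketch of (a) to (c), using interval midpoints as class marks. The marginal means, the variances and the covariance do not depend on which axis is called X, but (b) does. The class [3.5, 4.5) appears among the column intervals, so the sketch conditions the row variable on that column. Treat that reading of the table as an assumption.

```python
from fractions import Fraction

col_marks = [2, 3, 4, 5]   # midpoints of [1.5, 2.5), ..., [4.5, 5.5)
row_marks = [Fraction(3, 2), Fraction(5, 2), Fraction(7, 2), Fraction(9, 2)]  # (1, 2], ..., (4, 5]
n = [  # n[i][j]: row class i, column class j
    [3, 4, 6, 9],
    [4, 5, 8, 11],
    [5, 8, 11, 13],
    [4, 7, 9, 10],
]
N = sum(map(sum, n))


def mean_var(marks, counts):
    """Mean and variance (dividing by N) of a grouped distribution."""
    total = sum(counts)
    m = Fraction(sum(x * k for x, k in zip(marks, counts)), total)
    v = Fraction(sum(x * x * k for x, k in zip(marks, counts)), total) - m**2
    return m, v


# (a) marginal distributions
row_tot = [sum(r) for r in n]
col_tot = [sum(r[j] for r in n) for j in range(len(col_marks))]
# (b) row variable conditioned on the column class [3.5, 4.5) (third column)
row_given_col3 = [r[2] for r in n]
# (c) covariance: (1/N) * sum n_ij x_i y_j minus the product of the means
m_row, _ = mean_var(row_marks, row_tot)
m_col, _ = mean_var(col_marks, col_tot)
cov = Fraction(sum(n[i][j] * row_marks[i] * col_marks[j]
                   for i in range(4) for j in range(4)), N) - m_row * m_col

print(mean_var(row_marks, row_tot), mean_var(col_marks, col_tot))
print(mean_var(row_marks, row_given_col3), cov)
```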

11) The following data show the results of a survey: for each Spanish region, the number of
unemployed per 1000 economically active people (X) and the number of civil servants per 1000 people
(Y). Use the data to:

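| Region | X: unemployed per 1000 active people | Y: civil servants per 1000 people |
|---|---|---|
| Andalucía | 283 | 58 |
| Aragón | 135 | 65.8 |
| Asturias | 146 | 53.8 |
| Canarias | 249 | 55.6 |
| Cantabria | 129 | 55.4 |
| Castilla la M. | 221 | 57.3 |
| Castilla y L. | 148 | 66.6 |
| Cataluña | 149 | 40.7 |
| C. Valenciana | 192 | 45.4 |
| Galicia | 163 | 54.7 |
| Islas Baleares | 138 | 48 |
| La Rioja | 109 | 52 |
| Murcia | 186 | 55 |
| Navarra | 100 | 51.9 |
| País Vasco | 123 | 52.5 |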

(a) Build the scatter plot and comment on it.
(b) Build the frequency table.
(c) Analyze the independence of the variables involved.
(d) Calculate the covariance between the two variables.
(e) What trends can be detected in this distribution?
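Each region contributes one distinct pair, so the raw-data covariance formula (N = 15, dividing by N) applies directly. Below is a minimal Python sketch for (d). The helper `covariance` is an illustrative name, not a library function.

```python
# (X = unemployed per 1000 active people, Y = civil servants per 1000 people)
data = {
    "Andalucía": (283, 58.0),
    "Aragón": (135, 65.8),
    "Asturias": (146, 53.8),
    "Canarias": (249, 55.6),
    "Cantabria": (129, 55.4),
    "Castilla la M.": (221, 57.3),
    "Castilla y L.": (148, 66.6),
    "Cataluña": (149, 40.7),
    "C. Valenciana": (192, 45.4),
    "Galicia": (163, 54.7),
    "Islas Baleares": (138, 48.0),
    "La Rioja": (109, 52.0),
    "Murcia": (186, 55.0),
    "Navarra": (100, 51.9),
    "País Vasco": (123, 52.5),
}
xs = [x for x, _ in data.values()]
ys = [y for _, y in data.values()]


def covariance(xs, ys):
    """Raw-data covariance S_xy = (1/N) * sum((x_i - mean_x) * (y_i - mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


s_xy = covariance(xs, ys)
# The sign is positive; the covariance alone does not say how strong the relation is
print(round(s_xy, 3))
```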

12) A survey studies the relationship between income in millions of euros (X) and number of
employees (Y) per company.

| Company | Income (X) | Nº of employees (Y) |
|---|---|---|
| ACME | 95.877 | 124 |
| Soilent Green | 89.328 | 136 |
| Spektra | 84.482 | 117 |
| Tyco | 66.536 | 130 |
| Umbrella Coorp. | 51.272 | 122 |
| Capsule Coorp. | 49.179 | 125 |
| Talgo | 46.659 | 106 |
| EcoFinance | 36.969 | 69 |
| BestInverse | 34.095 | 60 |
| Carnap SA | 32.187 | 101 |
| Nox SRL | 29.652 | 55 |
| Forge World | 29.564 | 90 |
| Ombuddies | 29.402 | 53 |

In class we saw how to obtain the covariance from the frequency table of a bivariate distribution.
You can build that table from this dataset and apply the formula.
An alternative is the formula that computes the covariance directly from the raw data. It can be
used only when no pair of values (X, Y) occurs more than once:
$$S_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$$

where $x_i$ and $y_i$ represent the different values in the table.

Find:
(a) The covariance between variables X and Y. Interpret the result.
(b) A local currency (LC) has the exchange rate 1 LC = 2 €. If we convert our incomes into this
local currency, how will this affect the covariance?
(c) If every company increases its number of employees by 10, how will this affect the covariance?
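The raw-data formula translates directly into code. Below is a minimal Python sketch; the helper name `covariance` is illustrative. It also checks (b) and (c) numerically. Converting euros to LC divides every income by 2, which halves the covariance. Adding 10 employees to every company shifts Y by a constant, which leaves the covariance unchanged.

```python
# Income X (millions of €) and number of employees Y, in table order
incomes = [95.877, 89.328, 84.482, 66.536, 51.272, 49.179, 46.659,
           36.969, 34.095, 32.187, 29.652, 29.564, 29.402]
employees = [124, 136, 117, 130, 122, 125, 106, 69, 60, 101, 55, 90, 53]


def covariance(xs, ys):
    """Raw-data covariance S_xy = (1/N) * sum((x_i - mean_x) * (y_i - mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


s_xy = covariance(incomes, employees)                         # (a) positive: X and Y tend to rise together
s_lc = covariance([x / 2 for x in incomes], employees)        # (b) 1 LC = 2 €, so x € = x/2 LC
s_plus10 = covariance(incomes, [y + 10 for y in employees])   # (c) constant shift of Y
print(round(s_xy, 3), round(s_lc, 3), round(s_plus10, 3))
```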

13) We simultaneously throw 24 pairs of dice (X, Y) and obtain the following results:
(1,2) (2,3) (2,1) (3,1) (4,6) (1,6) (4,1) (5,2) (3,6) (3,4) (5,3) (4,2)
(2,5) (5,1) (4,2) (1,6) (6,2) (5,1) (1,6) (3,4) (3,5) (4,1) (4,2) (6,5)

(a) Draw the scatter plot.
(b) Which formula should we use to calculate the covariance?
(c) Obtain the covariance.
(d) What is the average result of die Y when die X is even?
(e) What percentage of the results of die Y are even when die X is 3?
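Several pairs repeat; (1, 6) and (4, 2) each appear three times. The sketch below therefore builds the frequency table first and applies the frequency-table form $S_{xy} = \frac{1}{N}\sum_{i,j} n_{ij} x_i y_j - \bar{x}\bar{y}$. It is a minimal Python sketch for (c) to (e), and it uses exact fractions so the results stay readable.

```python
from collections import Counter
from fractions import Fraction

rolls = [
    (1, 2), (2, 3), (2, 1), (3, 1), (4, 6), (1, 6), (4, 1), (5, 2), (3, 6), (3, 4), (5, 3), (4, 2),
    (2, 5), (5, 1), (4, 2), (1, 6), (6, 2), (5, 1), (1, 6), (3, 4), (3, 5), (4, 1), (4, 2), (6, 5),
]
N = len(rolls)
table = Counter(rolls)   # n_ij for each distinct (x, y) pair

# (c) frequency-table covariance
mean_x = Fraction(sum(x * k for (x, _), k in table.items()), N)
mean_y = Fraction(sum(y * k for (_, y), k in table.items()), N)
s_xy = Fraction(sum(x * y * k for (x, y), k in table.items()), N) - mean_x * mean_y

# (d) mean of Y over the rolls where X is even
y_even_x = [y for x, y in rolls if x % 2 == 0]
mean_y_even_x = Fraction(sum(y_even_x), len(y_even_x))

# (e) percentage of even Y among the rolls where X == 3
y_x3 = [y for x, y in rolls if x == 3]
pct_even_y_x3 = 100 * Fraction(sum(1 for y in y_x3 if y % 2 == 0), len(y_x3))

print(s_xy, float(s_xy), mean_y_even_x, float(pct_even_y_x3))
```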