
STATISTICS – APPLICATIONS

Application 1

The numbers of part-time employees at 9 randomly selected firms in the tourism domain are presented below:

4 10 12 9 16 18 18 22 8

a. Describe the population, the sample, the statistical unit, and the variable. Classify the variable and specify
its measurement scale.
b. Determine the mean, the median and the modal number of “part-time” employees and interpret the results
obtained.
c. Analyze the homogeneity of the data-set.
d. Determine and interpret the quartiles of the data.
e. Identify the outliers in the data-set.
f. Analyze the skewness and the kurtosis of the data-set.
g. Compute the mean and the variance of a binary variable, if its favorable case is given by firms with at least
16 part-time employees.
h. Fill in the “Descriptive Statistics” table:

Number of part-time employees

Mean …
Median …
Mode …
Standard Deviation …
Sample Variance …
Kurtosis -0,95
Skewness -0,65
Range …
Minimum …
Maximum …
Sum …
Count …

Solution:

a. Total population: all the firms in the tourism domain.
Sample: the 9 selected firms.
Statistical unit: one firm.
Variable: the number of part-time employees.
It is an attributive, numerical, non-binary, discrete variable.
Measurement scale: ratio scale.

b. The following notation is used:

X: the variable (number of part-time employees)
n = 9: the sample size
n ≤ 30: small sample
$x_i$, $i = \overline{1,9}$: the values of the variable

We need to determine the mean, the median and the mode.

The simple arithmetic mean (used with ungrouped data):

$\bar{x} = \frac{\sum x_i}{n} = \frac{4 + 10 + 12 + \dots + 8}{9} = \frac{117}{9} = 13$ employees
Interpretation: On average, one firm in the sample has 13 part-time employees.

The Median (Me):

The following steps will be taken:

1. Sort the values in ascending order:

4 8 9 10 12 16 18 18 22

2. Identify the Median position (location):

$Me_{\text{pos}} = \frac{n+1}{2} = \frac{9+1}{2} = 5$

3. The 5th value is the middle value. This is the Median: Me = 12 employees.

Interpretation:

50% of the firms in the sample have fewer than 12 part-time employees and 50% have more.

The Mode (Mo):

It is the value that occurs most often (the value with the maximum frequency). This is 18, the only value which appears twice. Thus, Mo = 18 employees.

Interpretation: The most frequent number of part-time employees among the firms in the sample is 18.
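
The three indicators of central tendency can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable name `employees` is an assumption introduced here for illustration.

```python
import statistics

# Number of part-time employees for the 9 sampled firms (from the problem statement)
employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

mean = statistics.mean(employees)      # simple arithmetic mean
median = statistics.median(employees)  # middle value of the sorted series
mode = statistics.mode(employees)      # most frequent value

print(mean, median, mode)
```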

c. The homogeneity of the series is analyzed using the coefficient of variation. Thus, the series is
homogeneous (and the mean is representative) if the value of the coefficient of variation is lower than 35%.
The coefficient of variation is given by:

$v = \frac{s}{\bar{x}} \cdot 100$

For this, we first determine the variance ($s^2$), then the standard deviation ($s$). Working on the ranked data series:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(4-13)^2 + (8-13)^2 + \dots + (22-13)^2}{9-1} = \frac{272}{8} = 34$

$s = \sqrt{s^2} = \sqrt{34} = 5{,}83$ employees

$v = \frac{5{,}83}{13} \cdot 100 = 44{,}85\% > 35\%$, so the series is not homogeneous, and the mean is not representative.
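
The homogeneity check can be reproduced as follows. This is a minimal sketch with the standard `statistics` module; the 35% threshold is the one stated in the text, and the variable names are illustrative assumptions.

```python
import statistics

employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

mean = statistics.mean(employees)
variance = statistics.variance(employees)  # sample variance, divides by n - 1
std_dev = statistics.stdev(employees)      # square root of the sample variance
cv = std_dev / mean * 100                  # coefficient of variation, in percent

is_homogeneous = cv < 35  # threshold taken from the text
print(f"s^2 = {variance}, s = {std_dev:.2f}, v = {cv:.2f}%, homogeneous: {is_homogeneous}")
```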

d. The quartiles of the data-set are values which divide the ranked data-series into 4 equal parts. There are 3
quartiles: Q1, Q2 (=Me) and Q3.
We can determine the quartiles (Q1 and Q3) by following the same steps as in the Median case.
Once we have ranked the data, we identify the Q1 position:

$Q_{1,\text{pos}} = \frac{n+1}{4} = \frac{9+1}{4} = 2{,}5$

Q1 is the average of the 2nd and the 3rd values in the ranked data set:

$Q_1 = \frac{8+9}{2} = 8{,}5$

Interpretation: 25% of the firms have fewer than 8,5 (about 9) part-time employees, and 75% of the firms have more.
Q2 = Me = 12 employees.
Then we identify the Q3 position:

$Q_{3,\text{pos}} = \frac{3(n+1)}{4} = \frac{3(9+1)}{4} = 3 \cdot 2{,}5 = 7{,}5$

Q3 is the average of the 7th and the 8th values in the ranked data set:

$Q_3 = \frac{18+18}{2} = 18$ employees

Interpretation: 75% of the firms have fewer than 18 part-time employees, while 25% of the firms have more than 18 part-time employees.
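
The quartile positions used above, k(n + 1)/4 with averaging between neighbouring values, correspond to the "exclusive" method of `statistics.quantiles` in Python's standard library. The following sketch assumes that correspondence and is meant only as a cross-check of the hand computation.

```python
import statistics

employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

# method="exclusive" places quartile k at position k * (n + 1) / 4 in the
# sorted data and interpolates linearly between the neighbouring values.
q1, q2, q3 = statistics.quantiles(employees, n=4, method="exclusive")

print(q1, q2, q3)
```

Other quartile conventions (for example the "inclusive" method) can give slightly different values on small samples, so the method should match the one used in the course.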

e. Outliers are values that meet one of the following two conditions:
$x_i < Q_1 - 1{,}5 \cdot IQR$ or $x_i > Q_3 + 1{,}5 \cdot IQR$
IQR = Q3 - Q1 = 18 - 8,5 = 9,5, where IQR is the interquartile range.
Q1 - 1,5 · IQR = 8,5 - 1,5 · 9,5 = -5,75
Q3 + 1,5 · IQR = 18 + 1,5 · 9,5 = 32,25
There are no values in the data-series lower than -5,75 or higher than 32,25, so we’ll conclude that there
are no outliers in the data set.
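
The outlier rule can be sketched in a few lines. The fence names `lower_fence` and `upper_fence` are illustrative assumptions, and the quartiles are computed with the "exclusive" method of `statistics.quantiles`, which matches the (n + 1)-based positions used in part d.

```python
import statistics

employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

q1, _, q3 = statistics.quantiles(employees, n=4, method="exclusive")
iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are outliers
upper_fence = q3 + 1.5 * iqr     # values above this are outliers

outliers = [x for x in employees if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```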

f. The skewness is analyzed using Pearson’s coefficient of skewness or Fisher’s coefficient of skewness.
Pearson’s coefficient of skewness is given by the following relation:
$sk(P) = \frac{\bar{x} - Mo}{s} = \frac{13 - 18}{5{,}83} = -0{,}86$
$sk(P) < 0$, so there is a negative skewness: large values predominate in the data set.
As $sk(P)$ is close to -1, the (negative) skewness is strong.
Alternatively, we can use Fisher’s coefficient of skewness, which is displayed in the Descriptive Statistics table:
Skewness = -0,65 < 0, so there is a negative skewness: large values prevail in the data series.
As 0,5 < |sk| < 1, the skewness is medium.
The kurtosis is analyzed using the Fisher’s coefficient of kurtosis, which is displayed in the Descriptive
Statistics table: kurtosis = k = -0,95<0, which means that the distribution of firms by the number of part-
time employees is less curved (flatter) than the normal distribution, and the values are less concentrated
around the mean than in the normal distribution.
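
Pearson's mode-based coefficient from part f can be recomputed from the raw data. The sketch below covers only that coefficient; the Fisher coefficients are taken from the given table in the text and are not recomputed here. Variable names are illustrative assumptions.

```python
import statistics

employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

mean = statistics.mean(employees)
mode = statistics.mode(employees)
std_dev = statistics.stdev(employees)  # sample standard deviation

# Pearson's (mode-based) coefficient of skewness: (mean - mode) / s
sk_pearson = (mean - mode) / std_dev
print(round(sk_pearson, 2))
```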

g. We create the following binary variable:

- favorable case: firms with at least 16 part-time employees;
- unfavorable case: firms with fewer than 16 part-time employees.

We count the firms that meet this condition (having at least 16 part-time employees). Let m be this number.

m = 4 (there are four values greater than or equal to 16: 16, 18, 18, 22)

The mean of the binary variable is given by:

$f = \frac{m}{n} = \frac{4}{9} = 0{,}44$ (44%)

The variance of the binary variable is given by: $s_b^2 = f \cdot (1 - f) = 0{,}44 \cdot (1 - 0{,}44) = 0{,}25$
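
The binary variable can be built explicitly as a 0/1 indicator, which makes the favorable case unambiguous. This is a sketch; the names `indicator` and `variance_binary` are assumptions introduced for illustration.

```python
employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

# 1 for the favorable case (at least 16 part-time employees), 0 otherwise
indicator = [1 if x >= 16 else 0 for x in employees]

m = sum(indicator)             # number of favorable cases
n = len(indicator)             # sample size
f = m / n                      # mean of the binary variable
variance_binary = f * (1 - f)  # variance of the binary variable

print(m, round(f, 2), round(variance_binary, 2))
```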

h. We fill in the Descriptive Statistics table with the values of the indicators previously determined.

Descriptive Statistics table (the variable: number of part-time employees)

Indicator            Value    Notation
Mean                 13       $\bar{x}$
Standard Error       1,94     $\sigma_{\bar{x}}$
Median               12       Me
Mode                 18       Mo
Standard Deviation   5,83     s
Sample Variance      34       $s^2$
Kurtosis             -0,95    k
Skewness             -0,65    sk
Range                18       R = xmax - xmin
Minimum              4        xmin
Maximum              22       xmax
Sum                  117      $\sum x_i$
Count                9        n

Application 2

For 150 clients of a cosmetics store, randomly selected, the monthly amounts of money spent on acquiring a certain
product were recorded (lei):

Amount spent (lei)    40   50   60   70   80   90
Number of clients      8   12   24   60   30   16

a) Analyze the shape of the clients’ distribution by the amount spent, using an appropriate chart.
b) Compute the relative frequencies and the ascending cumulative relative frequencies, and interpret the third value.
c) What is the average monthly amount spent by one client in the sample? Is it representative? Why?
d) Fill in the following statements:
- Half of the clients spent less than …….. lei on acquiring the product.
- Most of the clients spent …. lei on acquiring the product.
e) Analyze the skewness of the data, using an appropriate indicator.
f) Determine the mean and the variance for a binary variable, knowing that its favorable case is defined by
the clients who spent at most 60 lei on acquiring the product.

Solution:

X = the variable = the amount of money spent

n = 150: the sample size

n > 30: large sample

r = 6 (number of groups)

$x_i$, $i = \overline{1,6}$: the values of the variable (discrete variable)

a) The frequency polygon:

[Figure: frequency polygon titled "The clients' distribution by the amount spent (lei)"; vertical axis from 0 to 70, horizontal axis from 0 to 100.]

The distribution is approximately normal, with a slight negative skewness: large values prevail in the data set.

b) We determine the relative frequencies $n_i^*(\%) = \frac{n_i}{n} \cdot 100$. The results are shown in the 3rd column of the table below.
We compute the ascending cumulative relative frequencies $F_{ai}^*(\%)$ (column 4).
The third value shows that 29,33% of the clients in the sample spent at most 60 lei on acquiring the product (meaning 40, 50 or 60 lei).
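
Columns 3 and 4 of the table can be generated as in the following sketch. It uses `itertools.accumulate` from the standard library; the list names are assumptions introduced here.

```python
from itertools import accumulate

amounts = [40, 50, 60, 70, 80, 90]   # x_i, amount spent (lei)
clients = [8, 12, 24, 60, 30, 16]    # n_i, number of clients
n = sum(clients)

relative = [ni / n * 100 for ni in clients]                      # n_i* (%)
cumulative_counts = list(accumulate(clients))                    # F_ai
cumulative_relative = [c / n * 100 for c in cumulative_counts]   # F_ai* (%)

for x, r, c in zip(amounts, relative, cumulative_relative):
    print(f"{x} lei: {r:.2f}%  cumulative {c:.2f}%")
```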

c) The mean is determined as the weighted arithmetic mean (used for grouped data)
We use column no. 5 in the table below.
$\bar{x} = \frac{\sum x_i \cdot n_i}{\sum n_i} = \frac{10400}{150} = 69{,}33 \approx 69$ lei

Interpretation: On average, one client in the sample spent 69 lei per month on acquiring the cosmetic product.

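Column headers: xi = amount spent (lei); ni = number of clients; ni* (%) = relative frequency; Fai* (%) = ascending cumulative relative frequency; Fai = ascending cumulative absolute frequency.

xi (lei)   ni    ni* (%)   Fai* (%)   xi·ni   xi-x̄   (xi-x̄)²   (xi-x̄)²·ni   Fai
(1)        (2)   (3)       (4)        (5)     (6)     (7)        (8)           (9)
40         8     5,33      5,33       320     -29     841        6728          8
50         12    8         13,33      600     -19     361        4332          20
60         24    16        29,33      1440    -9      81         1944          44
70         60    40        69,33      4200    1       1          60            104
80         30    20        89,33      2400    11      121        3630          134
90         16    10,67     100,00     1440    21      441        7056          150
Total      150   100       -          10400   -       -          23750         -
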
We analyze the representativeness of the mean using the coefficient of variation:

$v = \frac{s}{\bar{x}} \cdot 100$

For this, we first determine the variance ($s^2$), then the standard deviation ($s$):

$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot n_i}{n} = \frac{23750}{150} = 158{,}33$

Columns 6, 7 and 8 of the table above were computed for this purpose.

$s = \sqrt{s^2} = \sqrt{158{,}33} = 12{,}58$ lei

$v = \frac{12{,}58}{69} \cdot 100 = 18{,}23\% < 35\%$, so the series is homogeneous and the mean is representative.
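
The grouped-data computation in part c can be sketched as follows. Two choices mirror the worked solution and are flagged as such in the comments: the mean is rounded to 69 before the deviations are computed, and the variance divides by n rather than n - 1. Variable names are illustrative assumptions.

```python
import math

amounts = [40, 50, 60, 70, 80, 90]   # x_i
clients = [8, 12, 24, 60, 30, 16]    # n_i
n = sum(clients)

weighted_sum = sum(x * ni for x, ni in zip(amounts, clients))
mean = weighted_sum / n              # weighted arithmetic mean
mean_rounded = round(mean)           # the worked solution uses 69 for the deviations

sum_sq = sum((x - mean_rounded) ** 2 * ni for x, ni in zip(amounts, clients))
variance = sum_sq / n                # divides by n, as in the text for this large sample
std_dev = math.sqrt(variance)
cv = std_dev / mean_rounded * 100    # coefficient of variation, in percent

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}, v = {cv:.2f}%")
```

Using the unrounded mean instead of 69 changes the sum of squares slightly, so small differences in the last decimals are expected.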

d) In order to fill in the first statement, we determine the Median of the data set (Me).

The Median is determined by performing the following steps:

- we compute the ascending absolute cumulative frequencies: see column 9 in the above table (Fai)
- we determine the Median position in the data set:

$Me_{\text{pos}} = \frac{\sum n_i + 1}{2} = \frac{n+1}{2} = \frac{151}{2} = 75{,}5$

- we find the first Fai ≥ Me pos. This is 104.
- we determine the value of the variable (in the first column of the table) corresponding to the previously determined cumulative frequency. This value is the Median.

Me = 70 lei

Interpretation: 50% of the clients spent less than 70 lei on acquiring the product, and 50% spent more. We fill in the first statement with “70”.

The second statement will be filled in with the Mode of the data set.

The Mode is the value “xi” with the highest frequency. The highest frequency is 60 (see the column with ni), so Mo = 70 lei.

Interpretation: The amount most frequently spent on acquiring the product is 70 lei.
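
The median and mode steps for grouped data can be sketched like this. The code follows the procedure described above (median position, first cumulative frequency that reaches it, highest frequency); the variable names are assumptions.

```python
from itertools import accumulate

amounts = [40, 50, 60, 70, 80, 90]   # x_i
clients = [8, 12, 24, 60, 30, 16]    # n_i
n = sum(clients)

median_pos = (n + 1) / 2
cumulative = list(accumulate(clients))   # F_ai

# Median: first value whose cumulative frequency reaches the median position
median = next(x for x, c in zip(amounts, cumulative) if c >= median_pos)

# Mode: value with the highest frequency
mode = amounts[clients.index(max(clients))]

print(median_pos, median, mode)
```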

e) The skewness is analyzed using the Pearson’s coefficient of skewness, given by the relation:
$sk(P) = \frac{\bar{x} - Mo}{s} = \frac{69 - 70}{12{,}58} = -0{,}08$

$sk(P) < 0$, so there is a negative skewness: large values predominate in the data series.

As $sk(P) \approx 0$, the skewness is weak.

f) We create the following binary variable:


- favorable case: clients who spent at most 60 lei on acquiring the product;
- unfavorable case: clients who spent more than 60 lei on acquiring the product

We count the clients who meet this condition (spending at most 60 lei on acquiring the product). Let m be this number.

m = 8 + 12 + 24 = 44 clients (8 clients who spent 40 lei + 12 clients who spent 50 lei + 24 clients who spent 60 lei)

The mean of the binary variable is given by:

$f = \frac{m}{n} = \frac{44}{150} = 0{,}29$ (29%)

The variance of the binary variable is given by: $s_b^2 = f \cdot (1 - f) = 0{,}29 \cdot (1 - 0{,}29) = 0{,}21$

Application 3.

For 45 randomly selected firms, the number of employees in the previous year was recorded. After processing the data, the following results were obtained:

Number of employees

Mean                 ….
Median               80
Mode                 72
Standard Deviation   …..
Sample Variance      244,42
Kurtosis             -0,33
Skewness             0,28
Range                65
Minimum              50
Maximum              ….
Sum                  3735
Count                …

a. Describe the central tendency, the variability and the shape of the data series, using appropriate indicators.
b. Knowing that:
- 25% of the firms in the sample have less than 78 employees
- the interquartile range is 8,
specify if the minimum and the maximum values are outliers.

Solution:

a. X: the variable (number of employees)

n = 45: the sample size
n > 30: large sample
$x_i$, $i = \overline{1,45}$: the values of the variable

I. Central tendency:

Mean:
$\bar{x} = \frac{\sum x_i}{n} = \frac{Sum}{Count} = \frac{3735}{45} = 83$ employees
Interpretation: On average, a firm in the sample has 83 employees.

Median:

Me = 80 employees (see the Descriptive Statistics table)

Interpretation: 50% of the firms have fewer than 80 employees, while 50% have more.

Mode:

Mo = 72 employees

Interpretation: the most frequent number of employees among the firms in the sample is 72.

II. Variability:

It is characterized using the following indicators:

- Range: R = xmax-xmin=65 employees

Interpretation: the difference between the maximum and the minimum number of employees is 65 employees.

- Sample variance: $s^2 = 244{,}42$

- Standard deviation: $s = \sqrt{244{,}42} = 15{,}63$ employees

Interpretation: the number of employees in a firm differs, on average, by 15,63 ≈ 16 employees from the sample mean.

- Coefficient of variation:

$v = \frac{s}{\bar{x}} \cdot 100 = \frac{15{,}63}{83} \cdot 100 = 18{,}83\% < 35\%$

Interpretation: the series is homogeneous, the mean is representative.

III. The shape of the distribution of firms by the number of employees

Skewness: Fisher’s coefficient of skewness: skewness = sk = 0,28 > 0 and 0 < |sk| < 0,5

Interpretation: the series has a weak positive skewness (the value of the coefficient is positive and lies between 0 and 0,5); small values prevail in the data set.

Kurtosis: coefficient of kurtosis: kurtosis = k = -0,33 < 0

Interpretation: the distribution is platykurtic (lower, less curved, flatter than the normal distribution), so the values
are less concentrated around the mean than in the normal distribution.

b. From the statement: “25% of the firms in the sample have less than 78 employees” we can say that Q1 = 78.

From the statement: “The interquartile range is 8” we can say that IQR = 8
IQR=Q3-Q1 so Q3 = Q1 + IQR = 78+8=86.
Outliers are values that meet one of the following two conditions:
$x_i < Q_1 - 1{,}5 \cdot IQR$ or $x_i > Q_3 + 1{,}5 \cdot IQR$

Q1 - 1,5 · IQR = 78 - 1,5 · 8 = 66
Q3 + 1,5 · IQR = 86 + 1,5 · 8 = 98
x_min = 50
x_max = x_min + R = 50 + 65 = 115, where R is the range
As x_min = 50 < 66 and x_max = 115 > 98, the minimum and the maximum values are outliers (extreme values).
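
The reasoning in part b works only from summary values, which the following sketch makes explicit. All inputs come from the problem statement; the variable names are assumptions introduced for illustration.

```python
# Values given in the problem statement
q1 = 78            # 25% of the firms have fewer than 78 employees
iqr = 8            # interquartile range
x_min = 50         # minimum
value_range = 65   # range R = x_max - x_min

q3 = q1 + iqr
x_max = x_min + value_range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

min_is_outlier = x_min < lower_fence
max_is_outlier = x_max > upper_fence
print(lower_fence, upper_fence, x_max, min_is_outlier, max_is_outlier)
```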

Application 4.

For 10 supermarkets located in two areas of a town (A1, A2), the profit obtained in the previous year (million lei) was recorded. The data, grouped by the location area of each supermarket, are presented in the following table:

Location area    Profit in previous year (million lei)
A1               20; 23; 26; 23; 28
A2               18; 15; 21; 16; 20

a) Compute the average profit for each location area and identify the more representative mean.
b) To what extent is the variability in the profit explained by random factors?

a) The two variables are:

- the grouping variable: location area (qualitative variable)
- the variable of interest (X): the profit (quantitative variable)

n = 10 (sample size)
n1 = 5
n2 = 5
r = 2 (number of groups)

We compute the group means, the group variances, the group standard deviations and the group coefficients of variation:

Group 1 (Area 1)

$\bar{x}_1 = \frac{20+23+26+23+28}{5} = \frac{120}{5} = 24$ mill. lei

$s_1^2 = \frac{(20-24)^2 + (23-24)^2 + (26-24)^2 + (23-24)^2 + (28-24)^2}{5-1} = 9{,}5$

$s_1 = \sqrt{s_1^2} = \sqrt{9{,}5} = 3{,}08$ mill. lei

$v_1 = \frac{s_1}{\bar{x}_1} \cdot 100 = \frac{3{,}08}{24} \cdot 100 = 12{,}83\%$

Group 2 (Area 2)

$\bar{x}_2 = \frac{18+15+21+16+20}{5} = \frac{90}{5} = 18$ mill. lei

$s_2^2 = \frac{(18-18)^2 + (15-18)^2 + (21-18)^2 + (16-18)^2 + (20-18)^2}{5-1} = 6{,}5$

$s_2 = \sqrt{s_2^2} = \sqrt{6{,}5} = 2{,}55$ mill. lei

$v_2 = \frac{s_2}{\bar{x}_2} \cdot 100 = \frac{2{,}55}{18} \cdot 100 = 14{,}16\%$

As v1 and v2 < 35% both groups are homogeneous, both means are representative.

Because v1 < v2 the first group is more homogeneous, the first mean is more representative.

The results previously determined are summarized in the following table:

SUMMARY
Groups   Count (ni)   Sum   Average (x̄i)   Variance (si²)   Standard deviation (si)   vi (%)
A1       5            120   24              9,5              3,08                      12,83
A2       5            90    18              6,5              2,55                      14,16
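
The group-level indicators can be recomputed from the raw profits with the sketch below. It uses the standard `statistics` module; the dictionary layout and names such as `summary` are assumptions. Because the code works with unrounded standard deviations, the last decimal of the coefficients of variation may differ slightly from the hand computation.

```python
import statistics

# Profit in the previous year (million lei), grouped by location area
profits = {"A1": [20, 23, 26, 23, 28], "A2": [18, 15, 21, 16, 20]}

summary = {}
for area, values in profits.items():
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # sample variance (n - 1)
    std_dev = statistics.stdev(values)
    summary[area] = {
        "count": len(values),
        "sum": sum(values),
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "cv": std_dev / mean * 100,           # coefficient of variation (%)
    }

# The group with the smaller coefficient of variation has the more representative mean
more_representative = min(summary, key=lambda area: summary[area]["cv"])
print(more_representative)
```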

b) The overall mean:

$\bar{x} = \frac{\sum \bar{x}_i \cdot n_i}{\sum n_i} = \frac{24 \cdot 5 + 18 \cdot 5}{10} = \frac{120 + 90}{10} = 21$ mill. lei

The Sum of Squares Between Groups:

$SSB = \sum (\bar{x}_i - \bar{x})^2 \cdot n_i = (24-21)^2 \cdot 5 + (18-21)^2 \cdot 5 = 90$

The Sum of Squares Within Groups:

$SSW = \sum s_i^2 \cdot (n_i - 1) = 9{,}5 \cdot 4 + 6{,}5 \cdot 4 = 64$

The Total Sum of Squares: SST = SSB + SSW = 90 + 64 = 154

The coefficient of determination:

$R^2 = \frac{SSB}{SST} = \frac{90}{154} = 0{,}58$ (58%)
58% of the total variability in the profit is explained by the location area.

100-58=42%

42% of the total variability in the profit is explained by random factors (factors other than the location area).
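
The decomposition SST = SSB + SSW can be verified numerically. In the sketch below SST is computed directly from the raw data rather than as a sum, so the identity serves as a check; the variable names are assumptions.

```python
import statistics

profits = {"A1": [20, 23, 26, 23, 28], "A2": [18, 15, 21, 16, 20]}

all_values = [x for values in profits.values() for x in values]
overall_mean = statistics.mean(all_values)

# Between-group sum of squares: group size times squared distance of group mean from overall mean
ssb = sum(len(v) * (statistics.mean(v) - overall_mean) ** 2 for v in profits.values())

# Within-group sum of squares: sample variance times (n_i - 1) for each group
ssw = sum(statistics.variance(v) * (len(v) - 1) for v in profits.values())

# Total sum of squares, computed directly from the observations
sst = sum((x - overall_mean) ** 2 for x in all_values)

r_squared = ssb / sst                      # share explained by the location area
random_share = (1 - r_squared) * 100       # share left to other (random) factors, in percent
print(ssb, ssw, sst, round(r_squared, 2))
```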

Application 5 (variant of application 4)

For 10 supermarkets located in two areas of a town (A1, A2), the profit obtained in the previous year (million lei) was recorded. The data, grouped by the location area of each supermarket, were processed and the results are presented in the following table:

Groups   Count (ni)   Sum   Variance (si²)
A1       5            120   9,5
A2       5            90    6,5

a) Which group is more homogeneous with respect to the profit? Why?
b) Determine the % influence of the location area on the profit variability.
c) Analyze the representativeness of the overall mean.

a) The two variables are:

- the grouping variable: location area (qualitative variable)
- the variable of interest (X): the profit (quantitative variable)

n = 10 (sample size)
n1 = 5
n2 = 5
r = 2 (number of groups)

The group means follow from the Sum and Count columns:

$\bar{x}_1 = \frac{120}{5} = 24$ mill. lei, $\bar{x}_2 = \frac{90}{5} = 18$ mill. lei

We compute the standard deviations and the coefficients of variation for the two groups:

$s_1 = \sqrt{s_1^2} = \sqrt{9{,}5} = 3{,}08$ mill. lei

$v_1 = \frac{s_1}{\bar{x}_1} \cdot 100 = \frac{3{,}08}{24} \cdot 100 = 12{,}83\%$

$s_2 = \sqrt{s_2^2} = \sqrt{6{,}5} = 2{,}55$ mill. lei

$v_2 = \frac{s_2}{\bar{x}_2} \cdot 100 = \frac{2{,}55}{18} \cdot 100 = 14{,}16\%$

As v1 and v2 < 35%, both groups are homogeneous, both means are representative.
Because v1 < v2 the first group is more homogeneous, the first mean is more representative.

b) The overall mean:

$\bar{x} = \frac{\sum \bar{x}_i \cdot n_i}{\sum n_i} = \frac{24 \cdot 5 + 18 \cdot 5}{10} = \frac{120 + 90}{10} = 21$ mill. lei

The Sum of Squares Between Groups:

$SSB = \sum (\bar{x}_i - \bar{x})^2 \cdot n_i = (24-21)^2 \cdot 5 + (18-21)^2 \cdot 5 = 90$

The Sum of Squares Within Groups:

$SSW = \sum s_i^2 \cdot (n_i - 1) = 9{,}5 \cdot 4 + 6{,}5 \cdot 4 = 64$

The Total Sum of Squares: SST = SSB + SSW = 90 + 64 = 154

The coefficient of determination:

$R^2 = \frac{SSB}{SST} = \frac{90}{154} = 0{,}58$ (58%)
58% of the total variability in the profit is explained by the location area.

100-58=42%

42% of the total variability in the profit is explained by random factors (factors other than the location area).

c) We determine the total variance:

$s_T^2 = \frac{SST}{n-1} = \frac{154}{9} = 17{,}11$

The total standard deviation (at sample level):

$s_T = \sqrt{s_T^2} = \sqrt{17{,}11} = 4{,}14$ mill. lei

We determine the total coefficient of variation:

$v_T = \frac{s_T}{\bar{x}} \cdot 100 = \frac{4{,}14}{21} \cdot 100 \approx 19{,}7\%$

$v_T < 35\%$, so the overall mean is representative.
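
Since this variant provides only the summary table, the whole chain from group counts, sums and variances to the total coefficient of variation can be sketched without the raw data. The tuple layout and variable names below are assumptions introduced for illustration.

```python
import math

# Summary table from the problem statement: (count, sum, sample variance) per group
groups = {"A1": (5, 120, 9.5), "A2": (5, 90, 6.5)}

n = sum(count for count, _, _ in groups.values())
overall_mean = sum(total for _, total, _ in groups.values()) / n

# Between-group and within-group sums of squares from the summary values
ssb = sum(count * (total / count - overall_mean) ** 2 for count, total, _ in groups.values())
ssw = sum(var * (count - 1) for count, _, var in groups.values())
sst = ssb + ssw

total_variance = sst / (n - 1)
total_std = math.sqrt(total_variance)
total_cv = total_std / overall_mean * 100   # total coefficient of variation (%)

print(round(total_variance, 2), round(total_std, 2), round(total_cv, 1))
```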
