Application 1
The number of part-time employees for each of 9 randomly selected firms in the tourism domain is presented below:
4 10 12 9 16 18 18 22 8
a. Describe the population, the sample, the statistical unit, and the variable. Classify the variable and specify
its measurement scale.
b. Determine the mean, the median and the modal number of “part-time” employees and interpret the results
obtained.
c. Analyze the homogeneity of the data-set.
d. Determine and interpret the quartiles of the data.
e. Identify the outliers in the data-set.
f. Analyze the skewness and the kurtosis of the data-set.
g. Compute the mean and the variance of a binary variable, if its favorable case is given by firms with at least
16 part-time employees.
h. Fill in the “Descriptive Statistics” table:
Mean …
Median …
Mode …
Standard Deviation …
Sample Variance …
Kurtosis -0,95
Skewness -0,65
Range …
Minimum …
Maximum …
Sum …
Count …
Solution:
b. The simple arithmetic mean (used with ungrouped data):
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{4 + 10 + 12 + \dots + 8}{9} = \frac{117}{9} = 13 \text{ employees} \]
Interpretation: On average, one firm in the sample has 13 part-time employees.
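The median: the ranked data are
4 8 9 10 12 16 18 18 22
With n = 9, the median is the value in position (n + 1)/2 = 5, so Me = 12 employees.
Interpretation: 50% of the firms in the sample have fewer than 12 part-time employees and 50% have more.
The mode is the value that occurs most often (the value with the maximum frequency). Here it is 18, the only value
that appears twice. Thus, Mo = 18 employees.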
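The three central tendency indicators can be checked with a short Python sketch. It uses only the standard library `statistics` module and the nine values from the problem statement; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Part-time employees for the 9 sampled firms (from the problem statement).
employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

x_bar = mean(employees)   # arithmetic mean: sum of the values divided by n
me = median(employees)    # middle value of the ranked data (n is odd here)
mo = mode(employees)      # most frequent value

print(f"mean = {x_bar}, median = {me}, mode = {mo}")
```

Because 18 is the only repeated value, `mode` has a single candidate here; with ties it returns the first mode it encounters.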
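c. The homogeneity of the series is analyzed using the coefficient of variation. The series is
homogeneous (and the mean is representative) if the coefficient of variation is lower than 35%.
The coefficient of variation is given by:
\[ v = \frac{s}{\bar{x}} \cdot 100 \]
For this, we first determine the variance (s²), then the standard deviation (s):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{272}{8} = 34, \qquad s = \sqrt{34} \approx 5,83 \text{ employees} \]
\[ v = \frac{5,83}{13} \cdot 100 \approx 44,85\% > 35\% \]
Interpretation: the series is heterogeneous, so the mean is not representative.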
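The variance, the standard deviation and the coefficient of variation for part c can be reproduced with the sketch below. It assumes the sample formulas (divisor n - 1), which is consistent with the value s = 5,83 used later in the solution; the 35% threshold is the one stated in the text.

```python
from statistics import mean, variance, stdev

# Part-time employees for the 9 sampled firms (from the problem statement).
employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

x_bar = mean(employees)
s2 = variance(employees)   # sample variance, divides by n - 1
s = stdev(employees)       # square root of the sample variance
v = s / x_bar * 100        # coefficient of variation, in percent

print(f"s^2 = {s2}, s = {s:.2f}, v = {v:.2f}%")
print("homogeneous" if v < 35 else "heterogeneous")
```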
d. The quartiles of the data-set are values which divide the ranked data-series into 4 equal parts. There are 3
quartiles: Q1, Q2 (=Me) and Q3.
We can determine the quartiles (Q1 and Q3) by following the same steps as in the Median case.
Once we have ranked the data, we identify the Q1 position:
\[ Q_1\,\text{pos} = \frac{n + 1}{4} = \frac{9 + 1}{4} = 2,5 \]
Q1 is the average of the 2nd and the 3rd values in the ranked data-set:
\[ Q_1 = \frac{8 + 9}{2} = 8,5 \]
Interpretation: 25% of the firms have fewer than 8,5 (~9) part-time employees, and 75% of the firms have
more.
Q2 = Me = 12 employees.
Then we identify the Q3 position:
\[ Q_3\,\text{pos} = \frac{3(n + 1)}{4} = \frac{3(9 + 1)}{4} = 3 \cdot 2,5 = 7,5 \]
Q3 is the average of the 7th and the 8th values in the ranked data-set:
\[ Q_3 = \frac{18 + 18}{2} = 18 \text{ employees} \]
Interpretation: 75% of the firms have less than 18 part-time employees, while 25% of the firms have more
than 18 part-time employees.
e. Outliers are values that meet one of the following two conditions:
xi < Q1 - 1,5 x IQR or xi > Q3 + 1,5 x IQR
IQR = Q3 – Q1 = 18-8,5=9,5, where IQR is the interquartile range.
Q1-1,5 x IQR = 8,5 – 1,5 x 9,5 = -5,75
Q3+1,5 x IQR = 18 + 1,5 x 9,5 = 32,25
There are no values in the data-series lower than -5,75 or higher than 32,25, so we’ll conclude that there
are no outliers in the data set.
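The quartile positions and the outlier fences from parts d and e can be sketched in Python as follows. The helper `quartile` is hypothetical (not part of any library); it applies the position rule k(n + 1)/4 from the text and assumes linear interpolation between neighbouring values, which coincides with "average of the two values" when the position ends in ,5. It does not guard against positions falling outside the data for very small samples.

```python
def quartile(ranked, k):
    """k-th quartile via the 1-based position k * (n + 1) / 4, as in the text."""
    pos = k * (len(ranked) + 1) / 4
    lower = int(pos)        # rank of the value just below the position
    frac = pos - lower      # 0.5 means halfway to the next ranked value
    if frac == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

ranked = sorted([4, 10, 12, 9, 16, 18, 18, 22, 8])
q1, q2, q3 = (quartile(ranked, k) for k in (1, 2, 3))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
```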
f. The skewness is analyzed using the Pearson’s coefficient of skewness or the Fisher’s coefficient of
skewness.
The Pearson’s coefficient of skewness is given by the following relation:
\[ sk(P) = \frac{\bar{x} - Mo}{s} = \frac{13 - 18}{5,83} \approx -0,86 \]
sk(P) < 0, so there is a negative skewness: large values predominate in the data set.
As |sk(P)| is close to 1, the (negative) skewness is strong.
Or we can use the Fisher’s coefficient of skewness, which is displayed in the Descriptive Statistics table:
Skewness=-0,65<0 so there is a negative skewness, large values prevail in the data series.
As 0,5 < |𝑠𝑘| < 1 there is a medium skewness.
The kurtosis is analyzed using the Fisher’s coefficient of kurtosis, which is displayed in the Descriptive
Statistics table: kurtosis = k = -0,95<0, which means that the distribution of firms by the number of part-
time employees is less curved (flatter) than the normal distribution, and the values are less concentrated
around the mean than in the normal distribution.
g. We count the firms that meet the condition (having at least 16 part-time employees). Let m be this
number:
m = 4 (there are four values equal to or higher than 16: 16, 18, 18, 22)
The mean of the binary variable is the relative frequency of the favorable case:
\[ f = \frac{m}{n} = \frac{4}{9} = 0,44 \; (44\%) \]
The variance of the binary variable is given by:
\[ s_b^2 = f \cdot (1 - f) = 0,44 \cdot (1 - 0,44) \approx 0,25 \]
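A minimal sketch of the binary-variable computation, using exact fractions so that rounding happens only at the end. The threshold of 16 comes from the problem statement; the variable names are illustrative.

```python
from fractions import Fraction

# Part-time employees for the 9 sampled firms (from the problem statement).
employees = [4, 10, 12, 9, 16, 18, 18, 22, 8]

# Favorable case: firms with at least 16 part-time employees.
m = sum(1 for x in employees if x >= 16)
f = Fraction(m, len(employees))   # mean of the binary variable
var_b = f * (1 - f)               # variance of the binary variable

print(f"m = {m}, f = {float(f):.2f}, variance = {float(var_b):.2f}")
```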
h. We fill in the Descriptive Statistics table with the values of the indicators previously determined.
Application 2
For 150 clients of a cosmetics store, randomly selected, the monthly amounts of money spent on acquiring a certain
product were recorded (lei):
Solution:
xi, i = 1, …, 6: the distinct values of the variable
a) The frequency polygon:
It’s an approximately normal distribution, with a negative skewness towards large values, so large values prevail in
the data set.
b) We determine the relative frequencies ni* (%) = ni/n*100. The results are shown in the 3rd column of the
table below.
We compute the ascending cumulative relative frequencies Fai* (%) (column 4)
The third value shows that 29,33% of the clients in the sample spent at most 60 lei on acquiring the product
(meaning 40 or 50 or 60 lei)
c) The mean is determined as the weighted arithmetic mean (used for grouped data)
We use column no. 5 in the table below.
\[ \bar{x} = \frac{\sum x_i \cdot n_i}{\sum n_i} = \frac{10400}{150} = 69,33 \approx 69 \text{ lei} \]
Interpretation: On average, one client in the sample spent 69 lei per month on acquiring the cosmetic product.
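The variance and the standard deviation (columns 6, 7 and 8 of the table above were computed for this purpose):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot n_i}{n} = \frac{23750}{150} = 158,33, \qquad s = \sqrt{158,33} \approx 12,58 \text{ lei} \]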
d) In order to fill in the first statement, we determine the Median of the data set (Me).
- we compute the ascending absolute cumulative frequencies: see column 9 in the above table (Fai)
- we determine the Median position in the data-set:
\[ Me\,\text{pos} = \frac{\sum n_i + 1}{2} = \frac{n + 1}{2} = \frac{151}{2} = 75,5 \]
- we find the first Fai > Me pos. This is 104.
- we determine the values of the variable (in the first column of the table) corresponding to the
previously determined cumulative frequency. This value is the Median.
Me = 70 lei
Interpretation: 50% of the clients spent less than 70 lei on acquiring the product, and 50% spent more. We fill in the
first statement with “70”.
The second statement will be filled in with the Mode of the data set.
The Mode is the value xi with the highest frequency. The highest frequency is 60 (see the ni column), so
Mo = 70 lei.
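The grouped-data steps (weighted mean, median via cumulative frequencies, mode, variance) can be sketched as below. The frequency table itself is missing from this copy, so the one in the code is an assumption: the frequencies 8, 12, 24 and 60 for 40, 50, 60 and 70 lei are quoted in the solution, while the rows for 80 and 90 lei (30 and 16 clients) are inferred so that the totals match n = 150 and Σ xi·ni = 10400.

```python
from math import sqrt

# ASSUMPTION: reconstructed table (value in lei -> number of clients).
# Rows 40..70 are quoted in the solution; rows 80 and 90 are inferred from the totals.
table = {40: 8, 50: 12, 60: 24, 70: 60, 80: 30, 90: 16}

n = sum(table.values())
x_bar = sum(x * ni for x, ni in table.items()) / n   # weighted arithmetic mean

# Median: first value whose ascending cumulative frequency exceeds (n + 1) / 2.
me_pos = (n + 1) / 2
cumulative = 0
for x in sorted(table):
    cumulative += table[x]
    if cumulative > me_pos:
        me = x
        break

mo = max(table, key=table.get)   # value with the highest frequency

# The solution rounds the mean to 69 lei before computing the variance (divisor n).
s2 = sum((x - round(x_bar)) ** 2 * ni for x, ni in table.items()) / n
s = sqrt(s2)

print(f"n = {n}, mean = {x_bar:.2f}, Me = {me}, Mo = {mo}, s^2 = {s2:.2f}, s = {s:.2f}")
```

With this assumed table the sketch reproduces the figures quoted in the solution (10400, 104, 23750), which is the only check available on the reconstruction.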
e) The skewness is analyzed using the Pearson’s coefficient of skewness, given by the relation:
\[ sk(P) = \frac{\bar{x} - Mo}{s} = \frac{69 - 70}{12,58} = -0,08 \]
sk(P) < 0, so there is a negative skewness (very weak, as the coefficient is close to 0): large values predominate in the data series.
f) We count the clients who meet the condition (spending at most 60 lei on acquiring the product).
Let m be this number.
m = 8 + 12 + 24 = 44 clients (8 clients who spent 40 lei + 12 clients who spent 50 lei + 24 clients who spent 60
lei)
\[ f = \frac{m}{n} = \frac{44}{150} = 0,29 \; (29\%) \]
The variance of the binary variable is given by:
\[ s_b^2 = f \cdot (1 - f) = 0,29 \cdot (1 - 0,29) \approx 0,21 \]
Application 3.
For 45 randomly selected firms, the number of employees in the previous year was recorded. After processing the
data, the following results were recorded:
Number of employees
Mean                 ….
Median               80
Mode                 72
Standard Deviation   …..
Sample Variance      244.42
Kurtosis             -0.33
Skewness             0.28
Range                65
Minimum              50
Maximum              ….
Sum                  3735
Count                …
a. Describe the central tendency, the variability and the shape of the data series, using appropriate indicators.
b. Knowing that:
- 25% of the firms in the sample have less than 78 employees
- the interquartile range is 8,
specify if the minimum and the maximum values are outliers.
Solution:
I. Central tendency:
Mean:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{Sum}{Count} = \frac{3735}{45} = 83 \text{ employees} \]
Interpretation: On average, a firm in the sample has 83 employees.
Median:
Me = 80 employees
Interpretation: 50% of the firms have fewer than 80 employees, while 50% have more.
Mode:
Mo=72 employees
II. Variability:
- Range: R = 65 employees
Interpretation: the difference between the maximum and the minimum number of employees is 65 employees.
- Standard deviation: s = √244,42 ≈ 15,63 employees
Interpretation: the number of employees in a firm differs, on average, by 15,63 ~ 16 employees from the sample
mean.
- Coefficient of variation:
\[ v = \frac{s}{\bar{x}} \cdot 100 = \frac{15,63}{83} \cdot 100 = 18,83\% < 35\% \]
Interpretation: the series is homogeneous, the mean is representative.
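The entries missing from the Application 3 table follow from the given ones. The sketch below derives them in Python; the count of 45 comes from the problem statement, and the standard deviation is rounded to two decimals before computing v, an assumption made only to match the rounding used in the solution.

```python
from math import sqrt

# Values given in the problem statement and in the Descriptive Statistics table.
count = 45                    # number of firms
total = 3735                  # Sum
sample_variance = 244.42      # Sample Variance
minimum, value_range = 50, 65

x_bar = total / count                 # Mean
s = sqrt(sample_variance)             # Standard Deviation
maximum = minimum + value_range       # Maximum = Minimum + Range
v = round(s, 2) / x_bar * 100         # coefficient of variation, using s rounded as in the text

print(f"mean = {x_bar}, s = {s:.2f}, max = {maximum}, v = {v:.2f}%")
```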
III. Shape:
- Skewness = 0,28
Interpretation: the series has a weak positive skewness (the value of the coefficient is positive and lies between
0 and 0,5); small values prevail in the data set.
- Kurtosis = -0,33
Interpretation: the distribution is platykurtic (lower, less curved, flatter than the normal distribution), so the values
are less concentrated around the mean than in the normal distribution.
b. From the statement: “25% of the firms in the sample have less than 78 employees” we can say that Q1 = 78.
From the statement: “The interquartile range is 8” we can say that IQR = 8
IQR=Q3-Q1 so Q3 = Q1 + IQR = 78+8=86.
Outliers are values that meet one of the following two conditions:
xi < Q1 - 1,5 x IQR or xi > Q3 + 1,5 x IQR
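The check for part b can be carried through with the values already established (Q1 = 78, IQR = 8, Minimum = 50, Range = 65). The sketch below is one way to do it in Python; the conclusion it prints is derived from these inputs and does not appear in this copy of the original solution.

```python
# Inputs taken from the statement of part b and from the table of Application 3.
q1, iqr = 78, 8
q3 = q1 + iqr                  # 86
minimum = 50
maximum = minimum + 65         # Minimum + Range

lower_fence = q1 - 1.5 * iqr   # values below this are outliers
upper_fence = q3 + 1.5 * iqr   # values above this are outliers

min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"minimum {minimum} is an outlier: {min_is_outlier}")
print(f"maximum {maximum} is an outlier: {max_is_outlier}")
```

Under these inputs the fences are 66 and 98, so both the minimum and the maximum fall outside them.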
Application 4.
For 10 supermarkets located in two areas of a town (A1, A2), the profit obtained in the
previous year (million lei) was recorded. The data, grouped by the location area of each supermarket, are presented in
the following table:
Location area Profit in previous year (million lei)
A1 20; 23; 26; 23; 28
A2 18; 15; 21; 16; 20
a) Compute the average profit value for each location area and identify the most representative mean.
b) To what extent is the variability in the profit value explained by random factors?
We compute the group means, the group variances, the group standard deviations and the group coefficients of
variation:
Group 1 (Area 1)
\[ \bar{x}_1 = \frac{20 + 23 + 26 + 23 + 28}{5} = \frac{120}{5} = 24 \text{ mill. lei} \]
\[ s_1^2 = \frac{(20 - 24)^2 + (23 - 24)^2 + (26 - 24)^2 + (23 - 24)^2 + (28 - 24)^2}{5 - 1} = 9,5 \]
\[ v_1 = \frac{s_1}{\bar{x}_1} \cdot 100 = \frac{3,08}{24} \cdot 100 = 12,83\% \]
Group 2 (Area 2)
\[ \bar{x}_2 = \frac{18 + 15 + 21 + 16 + 20}{5} = \frac{90}{5} = 18 \text{ mill. lei} \]
\[ s_2^2 = \frac{(18 - 18)^2 + (15 - 18)^2 + (21 - 18)^2 + (16 - 18)^2 + (20 - 18)^2}{5 - 1} = 6,5 \]
\[ v_2 = \frac{s_2}{\bar{x}_2} \cdot 100 = \frac{2,55}{18} \cdot 100 = 14,16\% \]
As v1 and v2 < 35% both groups are homogeneous, both means are representative.
Because v1 < v2 the first group is more homogeneous, the first mean is more representative.
SUMMARY
Groups   Count (ni)   Sum   Average (x̄i)   Variance (si²)   Standard Deviation (si)   vi (%)
A1       5            120   24             9,5              3,08                      12,83
A2       5            90    18             6,5              2,55                      14,16
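The Sum of Squares Within Groups:
\[ SSW = \sum (n_i - 1) \cdot s_i^2 = 4 \cdot 9,5 + 4 \cdot 6,5 = 64 \]
The Sum of Squares Between Groups, with the overall mean 210/10 = 21 mill. lei:
\[ SSB = \sum n_i \cdot (\bar{x}_i - \bar{x})^2 = 5 \cdot (24 - 21)^2 + 5 \cdot (18 - 21)^2 = 90 \]
The Total Sum of Squares: SST = SSW + SSB = 64 + 90 = 154
The coefficient of determination:
\[ R^2 = \frac{SSB}{SST} = \frac{90}{154} = 0,58 \; (58\%) \]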
58% of the total variability in the profit is explained by the location area.
100-58=42%
42% of the total variability in the profit is explained by random factors (others than the location area).
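The decomposition of the total variability can be verified with a short Python sketch built directly from the ten profit values in the table. The names `ssw`, `ssb` and `sst` are illustrative labels for the within-group, between-group and total sums of squares.

```python
from statistics import mean

# Profit in the previous year (million lei), grouped by location area.
groups = {"A1": [20, 23, 26, 23, 28], "A2": [18, 15, 21, 16, 20]}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Within groups: squared deviations of each value from its own group mean.
ssw = sum((x - mean(values)) ** 2 for values in groups.values() for x in values)
# Between groups: squared deviations of group means from the grand mean, weighted by group size.
ssb = sum(len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values())
# Total: squared deviations of each value from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

r2 = ssb / sst
print(f"SSW = {ssw}, SSB = {ssb}, SST = {sst}, R^2 = {r2:.2f}, random share = {1 - r2:.2f}")
```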
For 10 supermarkets located in two areas of a town (A1, A2), the profit obtained in the
previous year (million lei) was recorded. The data, grouped by the location area of each supermarket, are processed
and the results are presented in the following table:
We compute the standard deviations and the coefficients of variation for the two groups:
\[ v_1 = \frac{s_1}{\bar{x}_1} \cdot 100 = \frac{3,08}{24} \cdot 100 = 12,83\% \]
\[ v_2 = \frac{s_2}{\bar{x}_2} \cdot 100 = \frac{2,55}{18} \cdot 100 = 14,16\% \]
As v1 and v2 < 35%, both groups are homogeneous, both means are representative.
Because v1 < v2 the first group is more homogeneous, the first mean is more representative.
100-58=42%
42% of the total variability in the profit is explained by random factors (others than the location area).