
DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY

ROUNDING OF DATA

Rounding a number to the nearest unit (tenth, or other decimal place) reduces it to the
number of significant digits warranted in the particular computation. When the remainder
to be rounded off is “exactly 5,” the convention is to round to the nearest even number.
By this practice the additions due to rounding will tend to counterbalance the subtractions
due to rounding in the long run.

Examples:
1. 18.758 rounded to the nearest tenth = 18.8
2. 15.449 rounded to the nearest hundredth = 15.45
3. 15.449 rounded to the nearest tenth = 15.4
4. 18.05 rounded to the nearest tenth = 18.0
5. 89.1750 rounded to the nearest hundredth = 89.18
6. $63.50 rounded to the nearest dollar = $64 since 3 is odd and 5 is followed by
zero.
7. $64.50 rounded to the nearest dollar = $64 since 4 is even.
8. $64.52 rounded to the nearest dollar = $65 since 5 is not followed by zero.
9. 27.27 to the nearest tenth = 27.3
10. 27.27 to the nearest unit = 27
11. 188.549 to four significant digits = 188.5
12. 325.455 to the nearest hundredth = 325.46
13. 325.455 to the nearest tenth = 325.5, since the 5 in the hundredths position is
followed by a nonzero digit.
14. 325.455 to the nearest unit = 325
15. 0.05049 to two significant digits = 0.050
16. 0.05050 to two significant digits = 0.050 (zero before second 5 is considered
even)
17. 0.05050 to one significant digit = 0.05

The result of rounding a number such as 72.8 to the nearest unit is 73, since 72.8 is closer
to 73 than to 72. Similarly, 72.8146 rounded to the nearest hundredth (that is, to two decimal
places) is 72.81, since 72.8146 is closer to 72.81 than to 72.82.

In rounding 72.465 to the nearest hundredth, however, we are faced with a dilemma, since
72.465 is just as far from 72.46 as from 72.47. It has become the practice in such cases to
round so that the digit preceding the 5 ends up even. Thus 72.465 is rounded to 72.46,
183.575 is rounded to 183.58, and 116,500,000 rounded to the nearest million is 116,000,000,
which can also be written as 116 million. This practice is especially useful in minimizing
cumulative rounding errors when a large number of operations is involved.
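The round-half-to-even practice described above can be tried out with a short sketch. It uses Python's standard `decimal` module; the helper name `round_half_even` and its `step` argument are illustrative choices, not part of the original notes.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, step: str) -> Decimal:
    """Round a decimal string to the given step ("0.01", "0.1", "1", ...).

    Exact ties (a remainder of exactly 5) go to the even neighbour.
    """
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_EVEN)

print(round_half_even("72.465", "0.01"))   # 72.46
print(round_half_even("183.575", "0.01"))  # 183.58
print(round_half_even("63.50", "1"))       # 64
print(round_half_even("64.50", "1"))       # 64
print(round_half_even("64.52", "1"))       # 65
```

The values are passed as strings on purpose: a binary float usually cannot store a number such as 72.465 exactly, so the "exactly 5" case would be lost before any rounding took place.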

POSITION OF DIGITS AT THE LEFT AND RIGHT OF THE DECIMAL POINT

LEFT OF DECIMAL POINT      RIGHT OF DECIMAL POINT

Unit                       Tenth
Ten                        Hundredth
Hundred                    Thousandth
Thousand                   Ten Thousandth
Ten Thousand               Hundred Thousandth
Hundred Thousand           Millionth
Million                    Ten Millionth
Ten Million                Hundred Millionth
Hundred Million

SIGNIFICANT DIGITS

As a general rule, leading zeros, such as in $00389 or in 0.00389, are never considered to
be significant digits. Both have only 3 significant figures. Embedded zeros which are
followed by at least one significant digit, such as in $3800.57, are all considered to be
significant (here we have 6 significant figures). Trailing zeros may or may not be
significant, according to the level of accuracy in the original data.
Examples:
1. If “$3800” has two significant digits, then the measurement was made to the
nearest hundred dollars, the maximum error is $50, and the true value is located
between $3750 and $3850.
2. If “$3800” has three significant digits, then the measurement was made to the
nearest ten dollars, the maximum error is $5, and the true value is located
between $3795 and $3805.
3. If “$3800” has four significant digits, then the measurement was made to the
nearest dollar, the maximum error is 50 cents, and the true value is located
between $3799.50 and $3800.50.

If a height is accurately recorded as 65.4 inches, it means that the true
height lies between 65.35 and 65.45 inches. The accurate digits, apart from zeros needed
to locate the decimal point, are called the significant digits or significant figures of the
number.
Examples:
1. 65.4 has 3 significant figures
2. 4.5300 has 5 significant figures.
3. .0018 = 0.0018 = \(1.8 \times 10^{-3}\) has 2 significant figures.
4. .001800 = 0.001800 = \(1.800 \times 10^{-3}\) has 4 significant figures.
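The link between the number of significant digits and the range in which the true value lies (the $3800 and 65.4 inch cases above) can be sketched as follows. The function name `implied_interval` is a hypothetical helper introduced only for this illustration.

```python
from decimal import Decimal

def implied_interval(value: str, sig_digits: int):
    """Return (low, high) bounds implied by recording `value`
    with `sig_digits` significant digits."""
    d = Decimal(value)
    # Size of one unit in the last significant position,
    # e.g. 100 for "3800" with 2 digits, 0.1 for "65.4" with 3 digits.
    unit = Decimal(1).scaleb(d.adjusted() - sig_digits + 1)
    half = unit / 2          # the maximum error
    return d - half, d + half

print(implied_interval("3800", 2))   # bounds 3750 and 3850
print(implied_interval("3800", 3))   # bounds 3795 and 3805
print(implied_interval("65.4", 3))   # bounds 65.35 and 65.45
```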

Numbers associated with enumerations or counts, as opposed to measurements, are of
course exact and so have an unlimited number of significant figures. In some of these
cases, however, it may be difficult to decide which figures are significant without further
information. For example, the number 186,000,000 may have 3, 4, ..., 9 significant
figures. If it is known to have 5 significant figures, it would be better to record the
number as 186.00 million or 1.8600 x 10 raised to the eighth power.

SCIENTIFIC NOTATION

1. 10 raised to 1 = 10; 10 raised to 2 = 10 x 10 = 100
2. 10 raised to 0 = 1; 10 raised to –1 = 0.1; 10 raised to –2 = 0.01; 10
raised to –5 = 0.00001
3. 864,000,000 = 8.64 x 10 raised to 8
4. 0.00003416 = 3.416 x 10 raised to –5

Note that multiplying a number by 10 raised to +8 has the effect of moving the decimal
point of the number 8 places to the right. Multiplying a number by 10 raised to –6 has the
effect of moving the decimal point of the number 6 places to the left.
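As a small illustration of the notation above, Python's built-in `e` format writes a number in scientific notation. The wrapper `to_scientific` is a name assumed here for convenience; `8.64e+08` is Python's way of writing 8.64 x 10 raised to 8.

```python
def to_scientific(x: float, sig_digits: int) -> str:
    """Format x in scientific notation with the given number of significant digits."""
    # The 'e' format keeps one digit before the decimal point,
    # so sig_digits - 1 digits follow it.
    return f"{x:.{sig_digits - 1}e}"

print(to_scientific(864_000_000, 3))   # 8.64e+08
print(to_scientific(0.00003416, 4))    # 3.416e-05
```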

COMPUTATIONS

In performing calculations involving multiplication, division and extraction of roots of
numbers, the final result can have no more significant figures than the number with the
fewest significant figures.
Examples:
1. 73.24 x 4.52 = 331
2. 1.648/0.023 = 72
3. Square root of 38.7 = 6.22
4. 8.416 x 50 = 420.8 if 50 is exact.

In performing additions and subtractions of numbers, the final result has no more
significant figures after the decimal point than the number with the fewest significant
figures after the decimal point.
Examples:
1. 3.16 + 2.7 = 5.9
2. 83.42 – 72 = 11
3. 47.816 – 25 = 22.816 if 25 is exact.
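The two rules above can be checked against the worked examples with a brief sketch. `round_sig` and `round_places` are hypothetical helper names; both use round-half-to-even, which is an assumption carried over from the rounding section rather than something these examples depend on.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(value: Decimal, sig_digits: int) -> Decimal:
    """Round to a number of significant digits (multiplication, division, roots)."""
    quantum = Decimal(1).scaleb(value.adjusted() - sig_digits + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)

def round_places(value: Decimal, places: int) -> Decimal:
    """Round to a number of decimal places (addition, subtraction)."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)

# Fewest significant figures among the factors decides the result.
print(round_sig(Decimal("73.24") * Decimal("4.52"), 3))    # 331
print(round_sig(Decimal("1.648") / Decimal("0.023"), 2))   # 72
print(round_sig(Decimal("38.7").sqrt(), 3))                # 6.22

# Fewest decimal places among the terms decides the result.
print(round_places(Decimal("3.16") + Decimal("2.7"), 1))   # 5.9
print(round_places(Decimal("83.42") - Decimal("72"), 0))   # 11
```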

INDEX OR SUBSCRIPT NOTATION

Let the symbol Xj (read “X sub j”) denote any of the n values X1, X2, X3, ..., Xn assumed
by a variable X. The letter j in Xj, which can stand for any of the numbers 1, 2, 3, 4, ..., n,
is called a subscript or index. Clearly any letter other than j, such as i, k, p, q, could have
been used as well.

SUMMATION NOTATIONS

The symbol \(\sum_{j=1}^{n} X_j\) is used to denote the sum of all the \(X_j\)'s from j = 1 to j = n, i.e. by
definition

\[ \sum_{j=1}^{n} X_j = X_1 + X_2 + X_3 + \cdots + X_n \]

When no confusion can result, we shall denote this sum simply by \(\sum X\), \(\sum X_j\) or \(\sum_j X_j\).

The symbol \(\sum\) is the Greek capital letter sigma, denoting sum.

MEASURES OF CENTRAL TENDENCY

By definition, the measures of central tendency are commonly referred to as averages.
They are values that convey certain representative and significant characteristics of a set of
data.

In statistics, the term average is quite precise. It is a single figure that represents a group
of data values, summarizing them in one value. It is the center point around which the
values cluster, and so typifies the individual observations in the data.

The measures of central tendency include:


1. Mean or arithmetic mean or arithmetic average. This is described as a
summary value. It is a computational average that is determined from
all the values in a given data set and is subject to further
manipulations.
a. Weighted arithmetic mean. Sometimes we associate with the numbers
x1, x2, ..., xn certain weighting factors or weights w1, w2, ..., wn
depending on the significance or importance attached to the numbers.
In this case,

\[ x' = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wx}{\sum w} \]

b. Properties of the arithmetic mean


1. The algebraic sum of the deviations of a set of numbers from
their arithmetic mean is zero.
2. The sum of the squares of the deviations of a set of numbers
xj from any number a is a minimum if and only if a = x'.
3. If f1 numbers have mean m1, f2 numbers have mean m2, ...,
fn numbers have mean mn, then the mean of all the numbers is:

\[ x' = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_n m_n}{f_1 + f_2 + \cdots + f_n} \]

4. If A is any guessed or assumed arithmetic mean (which
may be any number) and if dj = xj – A are the deviations of xj
from A, then:

\[ x' = A + \frac{\sum d}{n} \]

or, when each value xj occurs with frequency fj and \(n = \sum f\),

\[ x' = A + \frac{\sum fd}{n} \]

2. Median. This is a positional value. It depends on the location of a single
value in a given data set and therefore cannot be further manipulated. Just as the
median divides a data set into two equal parts, a data set can likewise be divided into:
a. Quartiles, Q. Four equal parts, or 25% each.
b. Deciles, D. Ten equal parts, or 10% each.
c. Percentiles, P. One hundred equal parts, or 1% each.
3. Mode. This is a frequency value. It represents the most frequent, and in that sense most
likely, value of a given data set.
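The four properties of the arithmetic mean listed under item 1 can be verified numerically. The sketch below uses a small made-up data set (3, 5, 6, 10) and an arbitrary assumed mean A = 5; neither comes from the notes. `Fraction` is used so that the checks are exact rather than subject to floating-point error.

```python
from fractions import Fraction

values = [3, 5, 6, 10]               # illustrative data, not from the text
n = len(values)
mean = Fraction(sum(values), n)      # 24/4 = 6

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in values)

# Property 2: the sum of squared deviations is smallest at the mean.
def sum_sq(a):
    return sum((x - a) ** 2 for x in values)

# Property 3: combine group means using the group sizes as weights.
groups = [(2, Fraction(4)), (2, Fraction(8))]   # (size, mean) of [3, 5] and [6, 10]
combined = sum(f * m for f, m in groups) / sum(f for f, _ in groups)

# Property 4: mean recovered from an assumed mean A and deviations d = x - A.
A = 5
mean_from_A = A + Fraction(sum(x - A for x in values), n)

print(deviation_sum)                 # 0
print(sum_sq(mean), sum_sq(A))       # 26 30
print(combined, mean_from_A)         # 6 6
```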

Ungrouped Data

Mean

Example 1: The following data show the mathematics grades of 7 first year high school
students taking tutorial classes: 72%, 69%, 89%, 65%, 76%, 76%, and 83%.
Compute the mean, median and mode of the grades.

To compute the mean, the following formula is used:

\[ x' = \frac{\sum x}{n} \]

where
x' = mean
n = the number of data values
\(\sum x\) = the sum of the data values

x' = 530/7 = 75.71%

Therefore, the average mathematics grade of the 7 first year high school students who are
taking tutorial classes is 75.71%. This value summarizes and represents the grades of all
the students in this example. Although widely used, the mean is too sensitive in that it is
affected by extreme high or low values.

Weighted Mean

Example 2: If a college student got a grade of 1.25 in a 5-unit chemistry course, a 1.5 in a
3-unit algebra course, a 1.0 in a 1-unit P.E. course, a 2.25 in a 3-unit history course and a
1.5 in a 3-unit course in English, find his average grade.

\[ x' = \frac{\sum fx}{\sum f} \]

where f = the respective weight (here, the number of units) of each individual observation.

x' = [1.25(5) + 1.5(3) + 1.0(1) + 2.25(3) + 1.5(3)] / (5 + 3 + 1 + 3 + 3)
= 23.0 / 15
= 1.53 (average grade)
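The weighted mean in Example 2 can be reproduced with a few lines of code. This is a minimal sketch; the function name `weighted_mean` is an illustrative choice. The weighted sum is 23.0 and the total weight is 15 units, so the result rounds to 1.53.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [1.25, 1.5, 1.0, 2.25, 1.5]   # chemistry, algebra, P.E., history, English
units = [5, 3, 1, 3, 3]                # course units used as weights

print(round(weighted_mean(grades, units), 2))   # 1.53
```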

Median

To compute for the median, the following steps are given:


1. Arrange the data from lowest to highest or vice versa.
2. Determine the middle value. If the number of data values is odd, like 5, 7,
and so on, the middle value is the median. However, if the number of data
values is even, like 4, 6, and so on, then the median is the average of the two
middle values. In either case the position of the median is

position of md = (n + 1) / 2

Median is simply the value of the middle item (or the mean of the values of
the two middle items) when the data are arranged in an increasing or
decreasing order of magnitude.

If we have an odd number of items, there is always a middle item whose value
is the median. For example, the median of the five numbers 5, 10, 2, 7, and 8
is 7, and the median of the nine numbers 3, 5, 6, 9, 9, 10, 10, 12, and 13 is 9.
Note that there are two 9’s in this last example and that we do not refer to
either of them as the median. The median is a number and not an item,
namely, the value of the middle item. Generally speaking, if there are n items
and n is odd, the median is the value of the (n + 1)/2-th largest item. Thus, the
median of 25 numbers is given by the value of the (25 + 1)/2 = 13th largest, and the
median of 49 numbers is given by the value of the (49 + 1)/2 = 25th largest.

If we have an even number of items, there is never a middle item and the
median is defined as the mean of the values of the two middle items. For
instance, the median of the six numbers 3, 6, 8, 10, 13, and 15 is (8 + 10)/2 =
9. It is halfway between the two middle values (here the 3rd and the 4th) and if
we interpret it correctly, the formula (n + 1)/2 again gives the position of the
median. For the six given numbers the median is thus the value of the
(6 + 1)/2 = 3.5th largest, and we interpret this as “halfway between the values of the
third and the fourth.” Similarly, the median of 100 numbers is given by the
value of the (100 + 1)/2 = 50.5th largest item, or halfway between the values of the 50th
and the 51st.

It is important to note that the formula (n + 1)/2 is not a formula for the median itself;
it merely tells us the position of the median, namely, the number of items we
have to count until we reach the item whose value is the median (or the two
items whose values have to be averaged to obtain the median).

Arranging the data values in Example 1 from lowest to highest, we have
65%, 69%, 72%, 76%, 76%, 83%, 89%. Since the number of data values is
7, there is a middle value, 76%. This is the median.

Therefore, about 50% of the students got grades above 76%, while 50% of
them got scores below 76%. Take note that the median is considered
positional because it is only concerned with the middle or midpoint value. It is
not affected by extreme high or low values compared to the mean.

3. Just as the median divides the data into two parts, a data set can likewise be divided
into quartiles, deciles or percentiles.
4. To compute the first quartile:

Q1 = 1(n + 1) / 4
= 1(7 + 1) / 4
= 2, hence Q1 is the 2nd data value, or 69%. Take note that if the answer has a
decimal part, then we have to interpolate.

INTERPRETATION OF QUARTILE, DECILES AND PERCENTILES

With the data arranged in ascending order, the stated percentage of the data values lies
at or below the computed value of Q, D or P. For example, for Q3, where the grade is 83%:
75% of the students (the fraction corresponding to Q3) scored 83% or lower, and 25%
scored 83% or higher.

For D2, where the grade is 67%: 20% of the students (the fraction corresponding to D2)
scored 67% or lower, and 80% scored 67% or higher.

For P90, where the grade is 83%: 90% of the students (the fraction corresponding to P90)
scored 83% or lower, and 10% scored 83% or higher.

REFER TO SAMPLE PROBLEM (GROUPED DATA).



5. To compute for the third quartile, Q3, we have:

Q3 = 3(n + 1) / 4
= 3(7 + 1) / 4
= 24 / 4
= 6, hence the 6th data or 83%.

6. To compute a decile, D, say D8:

D8 = 8(n + 1) / 10
= 8(7 + 1) / 10
= 6.4, hence the value lies between the 6th and 7th data values. Interpolate.

7. To compute a percentile, P, say P23:

P23 = 23(n + 1) / 100
= 23(7 + 1) / 100
= 184 / 100
= 1.84, hence the value lies between the 1st and 2nd data values. Interpolate.

8. To determine the mode, no formula is needed. The process is done by
inspection. In Example 1, determine which data value appears the greatest
number of times. The modal value here is 76%. It appears twice. Take note
that a given data set can have more than one mode. If it has two modes, it is
specifically called bimodal. If it has more than two, then it is referred to as
polymodal. If no value appears more than once, there is no mode.

In summary, therefore, the mean in Example 1 is equal to 75.71%, while the
median and mode are both 76%.
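The positional formulas for ungrouped data, k(n + 1)/2, k(n + 1)/4, k(n + 1)/10 and k(n + 1)/100, together with the interpolation step, can be sketched in one helper. The names `positional_value` and `modes` are hypothetical, linear interpolation between neighbouring values is assumed because the notes only say "interpolate", and clamping positions that fall outside the data to the first or last value is an added assumption.

```python
from collections import Counter

def positional_value(data, k, parts):
    """Value at position k(n + 1)/parts in the sorted data (1-based).

    parts = 2 gives the median, 4 quartiles, 10 deciles, 100 percentiles.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / parts
    whole = int(position)              # item reached by counting
    fraction = position - whole        # how far to move toward the next item
    if whole < 1:                      # assumption: clamp to the first value
        return ordered[0]
    if whole >= n:                     # assumption: clamp to the last value
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

def modes(data):
    """All values with the highest count; empty list when nothing repeats.
    Assumes non-empty data."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

grades = [72, 69, 89, 65, 76, 76, 83]   # Example 1
print(positional_value(grades, 1, 2))   # median: 76.0
print(positional_value(grades, 1, 4))   # Q1: 69.0
print(positional_value(grades, 3, 4))   # Q3: 83.0
print(modes(grades))                    # [76]
```

Calling `positional_value(grades, 8, 10)` or `positional_value(grades, 23, 100)` carries out the interpolation that the D8 and P23 steps leave to the reader.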

GROUPED DATA

To compute for the mean, median and mode given a grouped data, let us use the
frequency distribution we earlier arrived at. Take note that there are other formulas for
solving these measures. However, since we want to make statistics as simple as possible,
then we have the following:

Mean

To compute the mean, the following formula is used:

\[ x' = A + \left( \frac{\sum fd}{n} \right) i \]

where
x' = the mean
A = the assumed mean, i.e. the midpoint of the class interval where d = 0
f = frequency value
d = deviation coded value
i = the class size
n = the number of data values

Example 1:

Class interval   Class mark   Frequency (f)   Cumulative frequency (F)   Deviation (d)   fd
25-31            28.0         4               4                          -2              -8
32-38            35.0         7               11                         -1              -7
39-45            42.0         3               14                          0               0
46-52            49.0         2               16                          1               2
53-59            56.0         2               18                          2               4
60-66            63.0         2               20                          3               6
                              ------                                                     ------
Total                         20                                                         -3

1. In computing for the mean, deviation coded values, d, are assigned to each
class interval.
2. More specifically, the deviation coded value 0 is assigned to the middle class
interval. Since the number of class intervals in the example is even (6),
there is no middle class interval. Instead, move one class interval up from the
middle and assign 0 there (here, 39-45). Then number the intervals consecutively
–1, –2 going up the table and 1, 2, 3 going down.
3. Multiply the deviation coded values by the respective
frequencies to get fd, then add the products: \(\sum fd = -3\).
4. Get the assumed mean, 42.0. This is the midpoint of the class interval 39-45,
where d = 0.
5. Compute the mean with i = 7 and n = 20.
Thus, x’ = 42.0 + (-3 / 20)(7) = 42.0 – 1.05 = 40.95

Therefore, the average age of the 20 students pursuing graduate studies is about 41.0 years.

Other solution for the mean


(1).
Class Marks (x)   Frequency (f)   fx
28.0              4               112.0
35.0              7               245.0
42.0              3               126.0
49.0              2                98.0
56.0              2               112.0
63.0              2               126.0
                  ----            -------
Total             20              819.0

\[ x' = \frac{\sum fx}{\sum f} \]

= 819 / 20
= 40.95
≈ 41

(2).
Class Marks (x)   Deviation, d = x - A   Frequency (f)   fd
28.0              -14                    4               -56.0
35.0               -7                    7               -49.0
42.0*               0                    3                 0.0
49.0                7                    2                14.0
56.0               14                    2                28.0
63.0               21                    2                42.0
                                         ----            ------
Total                                    20              -21.0

*Assumed Mean, A

\[ x' = A + \frac{\sum fd}{n} \]

= 42.0 + (-21) / 20
= 42.0 - 1.05 = 40.95
≈ 41.0
The average age of the 20 students pursuing graduate studies is 41.0 years.
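The direct solution and the coded-deviation solution for the grouped mean can be compared side by side in code. This is only a sketch under the table's values; the function names are illustrative, and the coded deviations are computed as (class mark - A) / i instead of being assigned by hand.

```python
def grouped_mean_direct(class_marks, frequencies):
    """Mean as sum(f * x) / sum(f)."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(class_marks, frequencies)) / n

def grouped_mean_coded(class_marks, frequencies, assumed_mean, class_size):
    """Mean as A + (sum(f * d) / n) * i, with d the coded deviation."""
    n = sum(frequencies)
    coded = [(x - assumed_mean) / class_size for x in class_marks]   # -2, -1, 0, 1, 2, 3
    sum_fd = sum(f * d for f, d in zip(frequencies, coded))          # -3
    return assumed_mean + sum_fd / n * class_size

class_marks = [28, 35, 42, 49, 56, 63]
frequencies = [4, 7, 3, 2, 2, 2]

print(grouped_mean_direct(class_marks, frequencies))                  # 40.95
print(round(grouped_mean_coded(class_marks, frequencies, 42, 7), 2))  # 40.95
```

Any class mark can serve as the assumed mean; choosing one near the middle just keeps the coded values small when the work is done by hand.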

Median

To compute the median, the formula is:

md = L.L. + [(n/2 – F) / f] (i)


where,
md = the median
L. L. = the real lower limit of the class interval
f = the frequency value
F = the cumulative frequency value
i = the class size
n = the number of data values

Class Interval   Real Limits   Frequency (f)   Cumulative Frequency (F)

25-31            24.5-31.5     4               4
32-38            31.5-38.5     7               11
39-45            38.5-45.5     3               14
46-52            45.5-52.5     2               16
53-59            52.5-59.5     2               18
60-66            59.5-66.5     2               20
                               ------
Total                          20

1. In computing for the median, first divide n by 2. This divisor is constant.
2. Using the resulting quotient, 10, as a basis, refer to the cumulative frequency
column. Find the value that is equal to or less than, but nearest to, 10. This
value, 4, is F in the formula.
3. Then move one class interval down and get its frequency value, 7, and its real
lower limit, 31.5.
4. Compute the median with i = 7. Thus,

md = 31.5 + [(10 - 4) / 7](7)
= 31.5 + 6 = 37.5

Therefore, about 50% of the students pursuing graduate studies are older than 37 years
and 6 months (37.5 years), while 50% of them are younger than 37.5 years.

Similarly, we can compute for quartiles, Q1 and Q3 as shown in the following formula:

Q1 = L.L. + [(n/4 – F) / f ](i)

and
Q3 = L.L. + [(3n/4 – F) / f ](i)

The formulas for finding the respective deciles and the percentiles of a given data are as
follows:

D8 = L.L. + [(8n/10 – F) / f ](i)


and
P23 = L.L. + [(23n/100 – F) / f ](i)

Refer to Dean Young’s Statistics Made Simple for Illustrations pages 36-37.

Take note that all these statistical positional formulas are extensions of the median. They
differ only with respect to how the data values are divided, whether 25%, 75%, 10%, or
1%.
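Since the quartile, decile and percentile formulas differ from the median formula only in the fraction kn/parts, one function can cover all of them. The sketch below is an illustration under that reading; `grouped_quantile` and its parameter names are assumptions. It picks the first class whose cumulative frequency reaches kn/parts, which gives the same result as the "equal to or less than, but nearest" rule in the steps above.

```python
def grouped_quantile(lower_limits, frequencies, class_size, k, parts):
    """L.L. + ((k * n / parts - F) / f) * i for grouped data.

    lower_limits are the real lower limits of the classes, in ascending order.
    """
    n = sum(frequencies)
    target = k * n / parts             # n/2 for the median, 3n/4 for Q3, ...
    cumulative = 0                     # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) * class_size / f
        cumulative += f
    raise ValueError("k/parts must not exceed 1")

lower_limits = [24.5, 31.5, 38.5, 45.5, 52.5, 59.5]
frequencies = [4, 7, 3, 2, 2, 2]

print(grouped_quantile(lower_limits, frequencies, 7, 1, 2))     # median: 37.5
print(grouped_quantile(lower_limits, frequencies, 7, 1, 4))     # Q1
print(grouped_quantile(lower_limits, frequencies, 7, 3, 4))     # Q3
print(grouped_quantile(lower_limits, frequencies, 7, 8, 10))    # D8
print(grouped_quantile(lower_limits, frequencies, 7, 23, 100))  # P23
```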

Mode

When we want to find the mode of a frequency distribution, we first identify the modal
class, which is defined as the class interval containing the largest number of values.

To determine the mode, the formula is given as follows:

mo = L.L. + [du / (du + dl)] (i), where

mo = the mode
L.L. = the real lower limit of the class interval
du = the difference of the highest frequency with the frequency above it
dl = the difference of the highest frequency with the frequency below it
i = the class size

Class Interval   Real Limits   Frequency (f)

25-31            24.5-31.5     4     (frequency above the modal class)   du = 7 – 4 = 3
32-38            31.5-38.5     7     (modal class, highest frequency)
39-45            38.5-45.5     3     (frequency below the modal class)   dl = 7 – 3 = 4
46-52            45.5-52.5     2
53-59            52.5-59.5     2
60-66            59.5-66.5     2
                               ------
Total                          20

1. To compute the mode, refer to the frequency column and determine the
highest value, 7.
2. Get the difference of this value, 7 and the frequency above it. This will yield,
du = 3.
3. Get the difference of this value, 7 and the frequency below it. This will yield,
dl = 4.
4. Compute for the mode with i = 7.

mo = 31.5 + [3 / (3 + 4)](7)
= 31.5 + 3
= 34.5

Therefore, the ages of the graduate students are most heavily concentrated around
34.5 years. In summary, the mean is 41 years, the median is 37.5 years and the mode is
34.5 years.
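The mode formula for grouped data can be sketched in the same way. The function name `grouped_mode` is illustrative. Treating a missing neighbour as frequency 0 when the modal class is the first or last row is an assumption, since the notes do not cover that case. If several classes tie for the highest frequency, the sketch simply uses the first one.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """L.L. + (du / (du + dl)) * i for the class with the highest frequency."""
    m = max(range(len(frequencies)), key=frequencies.__getitem__)   # modal class index
    f_modal = frequencies[m]
    f_above = frequencies[m - 1] if m > 0 else 0                     # row above in the table
    f_below = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # row below in the table
    du = f_modal - f_above
    dl = f_modal - f_below
    return lower_limits[m] + du * class_size / (du + dl)

lower_limits = [24.5, 31.5, 38.5, 45.5, 52.5, 59.5]
frequencies = [4, 7, 3, 2, 2, 2]

print(grouped_mode(lower_limits, frequencies, 7))   # 34.5
```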

EMPIRICAL RELATION AMONG MEAN, MEDIAN AND MODE



For unimodal distributions that are moderately skewed, the following relation holds
approximately:

Mean – Mode = 3(Mean – Median)

OTHER MEASURES OF CENTRAL TENDENCY

THE ROOT MEAN SQUARE (R.M.S.)

The R.M.S. or quadratic mean of a set of numbers x1, x2, x3, ..., xn is given by:

\[ \text{R.M.S.} = \sqrt{\frac{\sum x_j^2}{n}} \]

Example: The R.M.S. of the set of numbers 1, 3, 4, 5, 7 (n = 5) is:

1 square = 1
3 square = 9
4 square = 16
5 square = 25
7 square = 49

(1 + 9 + 16 + 25 + 49) / 5 = 20

square root of 20 is 4.47 (R.M.S.)
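The same R.M.S. computation, written as a short function (the name `root_mean_square` is an illustrative choice):

```python
import math

def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))

print(round(root_mean_square([1, 3, 4, 5, 7]), 2))   # 4.47
```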

GEOMETRIC MEAN

The geometric mean G of a set of n numbers x1, x2, x3, ..., xn is the nth root of
the product of the numbers.

For ungrouped data

\[ G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} \]

Example: The geometric mean of the numbers 2, 4, 8 (n = 3) is:



(2)(4)(8) = 64

cube root of 64 is 4.

Grouped data:

\[ G = \sqrt[n]{x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_n^{f_n}} \]

This is normally solved by the logarithm method.

\[ \log G = \frac{1}{n} \log\left( x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n} \right)
          = \frac{1}{n} \left( f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n \right)
          = \frac{\sum f \log x}{n} \]

where x1, x2, ..., xn = class marks (midpoints), f1, f2, ..., fn = the corresponding
frequencies, and \(n = \sum f\).

To solve for G, get the antilog of log G.
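The logarithm method can be sketched as below, using base-10 logarithms and 10 raised to log G as the antilog. The function name and the optional `frequencies` argument are assumptions made for this illustration; leaving the frequencies out treats every value as occurring once, which is the ungrouped case. The grouped call uses made-up frequencies 1, 2, 1, not data from the notes.

```python
import math

def geometric_mean(values, frequencies=None):
    """Antilog of sum(f * log x) / n, where n = sum(f). Values must be positive."""
    if frequencies is None:
        frequencies = [1] * len(values)          # ungrouped data
    n = sum(frequencies)
    log_g = sum(f * math.log10(x) for x, f in zip(values, frequencies)) / n
    return 10 ** log_g                           # antilog

print(round(geometric_mean([2, 4, 8]), 6))              # 4.0
print(round(geometric_mean([2, 4, 8], [1, 2, 1]), 6))   # 4.0
```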

HARMONIC MEAN

The harmonic mean H of a set of n numbers x1, x2, x3, ..., xn is the reciprocal of the
arithmetic mean of the reciprocals of the numbers.

For ungrouped data:

\[ H = \frac{1}{\frac{1}{n} \sum_{j=1}^{n} \frac{1}{x_j}} = \frac{n}{\sum \frac{1}{x}} \]

or

\[ \frac{1}{H} = \frac{1}{n} \sum \frac{1}{x} \]

Example:
The harmonic mean of the numbers 2, 4, 8 (n = 3) is:

H = 3 / [(1/2) + (1/4) + (1/8)]


= 3 /(7/8)
= 3.43

For grouped data:

\[ \frac{1}{H} = \frac{1}{n} \sum \frac{f}{x} \quad \text{or} \quad H = \frac{n}{\sum \frac{f}{x}} \]

where
\(n = f_1 + f_2 + \cdots = \sum f\)
f = frequencies
x = class marks
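Both forms of the harmonic mean can be handled by one small function. As before, the name and the optional `frequencies` argument are illustrative assumptions, and the grouped call uses made-up frequencies.

```python
def harmonic_mean(values, frequencies=None):
    """n / sum(f / x), where n = sum(f). Values must be nonzero."""
    if frequencies is None:
        frequencies = [1] * len(values)          # ungrouped data
    n = sum(frequencies)
    return n / sum(f / x for x, f in zip(values, frequencies))

print(round(harmonic_mean([2, 4, 8]), 2))        # 3.43
print(harmonic_mean([2, 4, 8], [1, 2, 1]))       # grouped form with made-up frequencies
```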

RELATIONS AMONG ARITHMETIC (x’), GEOMETRIC (G) AND HARMONIC (H) MEAN

\[ G \le x' \]

\[ G \ge H \]

\[ H \le G \le x' \]
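The ordering of the three means can be checked for the numbers 2, 4, 8 used in the earlier examples. This sketch relies on Python's standard `statistics` module and assumes Python 3.8 or later, where `geometric_mean` is available.

```python
import statistics

values = [2, 4, 8]

arithmetic = statistics.mean(values)             # 14/3, about 4.67
geometric = statistics.geometric_mean(values)    # about 4
harmonic = statistics.harmonic_mean(values)      # 24/7, about 3.43

print(harmonic <= geometric <= arithmetic)       # True
```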
