Mathematical Sciences Foundation
www.mathscifound.org

Copyright © Mathematical Sciences Foundation

Descriptive Statistics
Descriptive statistics comprises statistical methods for the collection, presentation, and characterization of a set of data in order to describe the various features of that data set.
In general, the methods of descriptive statistics include graphic methods and numerical measures. Bar charts, line graphs, etc. make up the graphic methods, whereas numerical measures include measures of central tendency, dispersion, and position.

Measures of Central Tendency / Statistical Averages

Averages condense the information contained in a data set into a single number.

• This number is helpful in getting an overview of statistical data.

• This number is helpful in making comparisons between two or more data sets.

Characteristics of a Good Average
• It should be easy to calculate.

• It should be easy to comprehend.

• It should not be affected too much by fluctuations of the sample.

Statistics
Statistics is the science of collecting,
describing and interpreting data.

Measures of Central Tendency

Mean, Median, Mode

Mean (Arithmetic Mean)
It is the sum of the observations divided by the total number of observations.

Mathematically,
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

The arithmetic mean is affected by extreme values.

Consider a situation where two samples differ in only one value.

Samples differing in one value
                  Data Set 1   Data Set 2
                  6            6
                  10           10
                  5            38
                  7            7
                  4            4
                  8            8
Arithmetic Mean   6.667        12.167

If we delete the extreme value 38 from Data Set 2, then the new arithmetic mean is 7.000.

Data Set 2 (after deleting 38)
                  6
                  10
                  7
                  4
                  8
Arithmetic Mean   7.000

Summary:
        Data Set 1   Data Set 2   Data Set 2 after deleting the extreme value
Mean    6.667        12.167       7.000

This motivates another measure, called the “Trimmed Mean”.

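The effect of a single extreme value on the mean can be checked with a short Python sketch. It uses the standard `statistics` module; the variable names are illustrative only and do not come from the slides.

```python
from statistics import mean

data_set_1 = [6, 10, 5, 7, 4, 8]
data_set_2 = [6, 10, 38, 7, 4, 8]  # differs from data_set_1 in one value only

# Drop the extreme value 38 from Data Set 2.
data_set_2_without_38 = [x for x in data_set_2 if x != 38]

mean_1 = mean(data_set_1)
mean_2 = mean(data_set_2)
mean_2_without_38 = mean(data_set_2_without_38)

print(round(mean_1, 3), round(mean_2, 3), round(mean_2_without_38, 3))
```

One changed observation moves the mean from about 6.667 to about 12.167, and removing it brings the mean back to 7.
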
Trimmed Mean
It is the mean computed after excluding a percentage of the data points from the top and bottom tails of a data set.
Note:
The trimmed mean should be calculated when one wishes to exclude outlying data from the analysis.

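A minimal Python sketch of a trimmed mean follows. For simplicity it assumes the amount of trimming is given as a number of values per tail rather than as a percentage; the function name `trimmed_mean` and its parameter `per_tail` are hypothetical.

```python
from statistics import mean

def trimmed_mean(values, per_tail):
    """Mean after dropping `per_tail` values from each end of the sorted data."""
    if per_tail < 0 or 2 * per_tail >= len(values):
        raise ValueError("per_tail must leave at least one value")
    ordered = sorted(values)
    return mean(ordered[per_tail:len(ordered) - per_tail])

data_set_2 = [6, 10, 38, 7, 4, 8]
# Dropping one value per tail removes 4 and 38 and averages 6, 7, 8, 10.
print(trimmed_mean(data_set_2, 1))
```

Note that trimming is symmetric: the smallest value is removed together with the outlier 38, which is why the result differs from the mean obtained by deleting 38 alone.
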
Median
It is the value that occupies the middle position when the data are arranged in increasing or decreasing order.

Median
Consider a data set of size n. Arrange the data in increasing or decreasing order. The median is calculated in the following way:
If n is odd: the median is the $\left(\frac{n+1}{2}\right)$th term.
If n is even: the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th terms.

For example, suppose we need to find the median of the data set: 20, 13, 16, 17, 11, 19, 12, 18

Ranked data: 11, 12, 13, 16, 17, 18, 19, 20

Number of terms in the data = 8
Median = mean of the $\left(\frac{8}{2}\right)$th and $\left(\frac{8}{2}+1\right)$th terms
$$= \frac{16 + 17}{2} = 16.5$$

Note:
The median is the number in the middle of an ordered set of numbers (observations); that is, half the numbers have values that are greater than the median, and half have values that are less.

The median is not affected by extreme values.
Example: Consider two sets of data

Data set 1: 6, 7, 8, 9, 9, 10
Data set 2: 6, 7, 8, 9, 9, 1100

In both cases the median is 8.5.

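The odd/even rule for the median can be sketched in Python as below. The function is written out by hand to mirror the rule; it is an illustration, not how Calc computes MEDIAN internally.

```python
def median(values):
    """Median via the odd/even rule, using 1-based term positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([20, 13, 16, 17, 11, 19, 12, 18]))
print(median([6, 7, 8, 9, 9, 10]), median([6, 7, 8, 9, 9, 1100]))
```

Replacing 10 by 1100 changes only the last position of the ordered data, so the two middle terms, and hence the median, stay the same.
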
Mode
It is the value which occurs most frequently in a set of observations.

Characteristic of Mode
• The mode is not affected by extreme values.

Limitation of Mode
• Sometimes the mode may not be a true representative of the central value of a data set.

For example, in the data set 2, 3, 4, 5, 6, 10, 10 the mode is 10, which is the largest value rather than a central one.

Comparison: Mean, Median and Mode
The mean and median of a data set are unique, whereas a data set can have more than one mode.
Example:
Consider the data set 1, 1, 1, 2, 2, 2, 3, 4, 5. The mean is 2.333, the median is 2, but there are two modes, namely 1 and 2.

Comparison: Mean, Median and Mode
Consider the following data:
100, 100, 100, 421, 422, 423, 424, 425.

Mean = 301.875, Median = 421.5, Mode = 100

For such data the median is the best measure of central tendency among the three measures.

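The two comparison examples can be reproduced with Python's standard `statistics` module. This is only a cross-check of the numbers above; `multimode` (Python 3.8 or later) is used because it returns every value that shares the highest frequency.

```python
from statistics import mean, median, multimode

bimodal = [1, 1, 1, 2, 2, 2, 3, 4, 5]
skewed = [100, 100, 100, 421, 422, 423, 424, 425]

# multimode returns every value that attains the highest frequency.
print(round(mean(bimodal), 3), median(bimodal), multimode(bimodal))
print(mean(skewed), median(skewed), multimode(skewed))
```
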
Averages in Open Office Calc

What is Open Office Calc?
A powerful spreadsheet program that

• performs numerical computations

• can organize/summarize huge data sets

• carries out advanced statistical and financial analysis by solving complicated mathematical models

How to access Open Office Calc

Applications → Office → OpenOffice.org Spreadsheet

A First Look at Open Office Calc

[Screenshot of the Calc window with the Input Line and the Name Box labelled]

Points to note

• Rows are numbered as 1, 2, 3, …

• Columns are marked as A, B, C, …

• The Name Box always displays the currently selected cell

Open Office Calc as a desk calculator
You can perform simple operations like addition, multiplication and division.

Simply select a cell and enter the expression in the Input Line. Remember to begin the expression with an equals sign (=).

To compute 5+7 you have to enter =5+7

Various Arithmetic Operations
Operation        Symbol
Addition         +
Subtraction      -
Multiplication   *
Division         /
Raise to power   ^

Averages using Calc functions

AVERAGE
Calculates the arithmetic mean of numeric arguments.

Syntax: AVERAGE(number1, number2, ...)
number1, number2, ... are the numeric arguments for which you want the average.
Example: Refer to the worksheet “Averages”

Remarks: AVERAGE

• The arguments must be numbers, arrays or references that contain numbers.

• If an array or reference argument contains text or empty cells, those values are ignored; however, cells with the value zero are included.

• Arguments that contain TRUE evaluate as 1; arguments that contain FALSE evaluate as 0 (zero).

AVERAGEA
Calculates the arithmetic mean of the values in the list of arguments.
In addition to numbers, text and logical values such as TRUE and FALSE are also included in the calculation.

Syntax: AVERAGEA(value1, value2, ...)
value1, value2, ... are the arguments for which you want the average.
Example: Refer to the worksheet “AverageA”

Remarks: AVERAGEA

• The arguments must be numbers, arrays or references.

• Array or reference arguments that contain text evaluate as 0 (zero). If the calculation should not include text values in the average, use the AVERAGE function.

• Arguments that contain TRUE evaluate as 1; arguments that contain FALSE evaluate as 0 (zero).

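The difference between the two functions, as the remarks describe it, can be sketched in Python. This is an emulation of the stated rules only, not Calc's implementation: the functions `average` and `average_a` are hypothetical, empty cells are modelled as `None`, and treating empty cells as ignored in `average_a` is an assumption.

```python
def average(cells):
    """Mean as the AVERAGE remarks describe: text and empty cells (None) are skipped."""
    # bool is a subclass of int, so True counts as 1 and False as 0.
    numbers = [float(c) for c in cells if isinstance(c, (bool, int, float))]
    if not numbers:
        raise ValueError("no numeric values to average")
    return sum(numbers) / len(numbers)

def average_a(cells):
    """Mean as the AVERAGEA remarks describe: text counts as 0."""
    values = []
    for c in cells:
        if c is None:
            continue  # assumption: empty cells are ignored here as well
        values.append(0.0 if isinstance(c, str) else float(c))
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)

column = [10, 20, "n/a", None, True, 0]  # made-up worksheet column
print(average(column), average_a(column))
```

With this column, `average` divides by 4 (the text cell is skipped) while `average_a` divides by 5 (the text cell counts as 0), so the second result is smaller.
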
TRIMMEAN
Returns the mean of the interior of a data set.
Syntax: TRIMMEAN(array, alpha)
array is the array or range of values to trim and average.

alpha is the fraction of data points to exclude from the calculation. For example, if alpha = 0.2, 4 points are trimmed from a data set of 20 points (2 from the top and 2 from the bottom of the set).
Example: Refer to the worksheet “Trimmean”

Remarks: TRIMMEAN

• If alpha < 0 or alpha > 1, TRIMMEAN returns an error value.

• TRIMMEAN rounds the number of excluded data points down to the nearest multiple of 2. If alpha = 0.1, 10 percent of 30 data points equals 3 points. For symmetry, TRIMMEAN excludes a single value from the top and from the bottom of the data set.

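The rounding rule in the remarks can be sketched in Python as follows. The helper `trimmean_excluded` is hypothetical and only counts how many points would be dropped; the small tolerance is an assumption added to keep floating-point products such as 0.1 × 30 from landing on the wrong side of a whole number.

```python
import math

def trimmean_excluded(n_points, alpha):
    """Number of points dropped in total, rounded down to a multiple of 2."""
    if alpha < 0 or alpha > 1:
        raise ValueError("alpha must lie between 0 and 1")
    # The tolerance guards against products that fall just below
    # the intended whole number in floating point.
    raw = math.floor(n_points * alpha + 1e-9)
    return raw - (raw % 2)

print(trimmean_excluded(20, 0.2))  # 4 in total, i.e. 2 per tail
print(trimmean_excluded(30, 0.1))  # 3 is rounded down to 2, i.e. 1 per tail
```
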
MEDIAN
Returns the median of the given numbers.

Syntax: MEDIAN(number1, number2, ...)
number1, number2, ... are the numerical arguments for which you want the median.
Example: Refer to the worksheet “Averages”

MODE
Returns the most frequently occurring, or repetitive, value in an array or range of data.

Syntax: MODE(number1, number2, ...)
number1, number2, ... are the arguments for which you want to calculate the mode.
Example: Refer to the worksheet “Averages”

Measures of Dispersion

Let’s look at an example of three data sets:

             Observations         Mean
Data set 1   7   8   10  11  9    9
Data set 2   4   6   9   12  14   9
Data set 3   2   5   9   13  16   9

[Figure: dot plots of the three data sets, each on an axis running from 2 to 16]

To capture the sense of the data, we need to measure the central location as well as the spread. The spread is measured by the various measures of dispersion.

The numerical values of the various measures of dispersion describe the amount of spread, or variability, in the data.

These measures give large values for data that are more spread out and small values for data that are less spread out.

Characteristics of an Ideal Measure of Dispersion
• It should be easy to calculate and easy to understand.

• It should be affected as little as possible by fluctuations of sampling.

Common Measures of Dispersion
• RANGE
• MEAN DEVIATION
• VARIANCE
• STANDARD DEVIATION

Range
Range is the difference between the largest and the smallest value in the data.

It can be determined by:

Range = Highest value - Lowest value

It gives a quick measurement of the spread.

Limitations of Range
It does not measure the spread of the majority of the data; it only measures the spread between the highest and lowest values.

[Figure: two charts of different distributions, each with a vertical axis running from 0 to 600]
The range in both these distributions is the same, i.e. 300.

Deviations from a Central Value
One way to measure the spread of a data set is to measure the distance of each data point $x_i$ from a central value, say A (which could be the mean, median or mode).
We define the deviation of $x_i$ from A to be $x_i - A$.
Note:
The sum of the deviations about the mean is zero, and consequently the mean deviation about the mean is also zero, which is not a useful statistic.
One way to remove this neutralizing effect is to ignore the signs of the deviations.

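The claim that the deviations about the mean sum to zero follows directly from the definition of the mean. A short derivation, writing $\bar{x}$ for the mean of N data points so that $\sum_{i=1}^{N} x_i = N\bar{x}$:

```latex
\sum_{i=1}^{N} (x_i - \bar{x})
  = \sum_{i=1}^{N} x_i - N\bar{x}
  = N\bar{x} - N\bar{x}
  = 0
```

The positive and negative deviations cancel exactly, which is why a measure of spread has to use absolute values or squares.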
Mean Absolute Deviation
The mean absolute deviation is the mean of the absolute values of the deviations from the mean of the data,
i.e. $$\text{Mean absolute deviation} = \frac{\sum_{i=1}^{N} |x_i - \bar{x}|}{N},$$
where $\bar{x}$ is the mean of the data.

Variance
The mean of the squares of the deviations about the mean is called the variance,
i.e. $$\text{variance} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N},$$
where $\bar{x}$ is the mean and N is the size of the population.

Standard Deviation
The positive square root of the variance is called the standard deviation,

i.e. $$\text{standard deviation} = \sqrt{\text{variance}}$$

or $$\text{standard deviation} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}},$$

where $\bar{x}$ is the mean and N is the size of the population.

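The four measures can be compared on the three data sets that share the mean 9. The sketch below uses Python's `statistics.pvariance` and `statistics.pstdev`, which divide by N as in the population formulas above; `value_range` and `mean_absolute_deviation` are hypothetical helper names.

```python
from statistics import mean, pstdev, pvariance

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

data_sets = {
    "Data set 1": [7, 8, 10, 11, 9],
    "Data set 2": [4, 6, 9, 12, 14],
    "Data set 3": [2, 5, 9, 13, 16],
}

for name, values in data_sets.items():
    print(name, mean(values), value_range(values),
          mean_absolute_deviation(values),
          pvariance(values), round(pstdev(values), 3))
```

All three means are 9, but every measure of dispersion grows from Data set 1 to Data set 3, matching the visual impression that the third set is the most spread out.
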
Measures of dispersion using Calc functions

Range
There is no built-in function to calculate the range directly. We can calculate the range by taking the difference of the maximum value and the minimum value of the data set.
The following formula can be used to calculate the range:
=MAX(value1, value2, …) - MIN(value1, value2, …)
Example: Refer to the worksheet “Dispersion”

AVEDEV
Returns the average of the absolute deviations of data points from their mean.

Syntax: AVEDEV(number1, number2, ...)
number1, number2, ... are 1 to 30 arguments for which you want the average of the absolute deviations.
Example: Refer to the worksheet “Dispersion”

VARP
Calculates the variance based on the entire population.

Syntax: VARP(number1, number2, ...)
number1, number2, ... are 1 to 30 number arguments corresponding to a population.
Example: Refer to the worksheet “Dispersion”

VARPA
Calculates the variance based on the entire population. In addition to numbers, text and logical values such as TRUE and FALSE are included in the calculation.

Syntax: VARPA(value1, value2, ...)
value1, value2, ... are 1 to 30 value arguments corresponding to a population.

STDEVP
Calculates the standard deviation based on the entire population given as arguments.

Syntax: STDEVP(number1, number2, ...)
number1, number2, ... are 1 to 30 number arguments corresponding to a population.
Example: Refer to the worksheet “Dispersion”

MEASURES OF POSITION

PERCENTILE
Consider the data set $x_1, x_2, \ldots, x_n$.
Percentiles are the numbers which divide the ordered data set into 100 equal-sized subsets.
For any data set, there are 99 percentiles, denoted by $P_1, P_2, \ldots, P_{99}$.

For instance, $P_2$, the second percentile, is a number such that at most 2% of the data points are less than it and at most 98% of the data points are greater than it.

How to find percentile of a data set?
Suppose $x_1, x_2, \ldots, x_{101}$ is a data set arranged in increasing order, i.e., $x_1 \le x_2 \le \cdots \le x_{100} \le x_{101}$. Here

$P_1 = x_2$ because at most 1% of the data points are less than $x_2$ and at most 99% of the data points are more than $x_2$.

$P_{20} = x_{21}$ because at most 20% of the data points are less than $x_{21}$ and at most 80% of the data points are more than $x_{21}$.

How to find percentile of a data set?
Suppose $x_1, x_2, \ldots, x_{10}$ is a data set arranged in increasing order, i.e., $x_1 \le x_2 \le \cdots \le x_{10}$.
Here we do not have data points that can divide the data set into 100 equal parts. In such a situation, percentiles are calculated in the following way:

$x_1 \quad x_2 \quad x_3 \quad x_4 \quad x_5 \quad x_6 \quad x_7 \quad x_8 \quad x_9 \quad x_{10}$

Here we have 9 intervals. The complete data constitutes 100%. We distribute this 100% over the 9 intervals so that each interval contains $\frac{100\%}{9} \approx 11.1\%$.

[Figure: the points $x_1, \ldots, x_{10}$ on a line, with each of the 9 intervals between consecutive points labelled 11.1%]

Hence,
$x_2 = P_{11.1}$, $x_3 = P_{22.2}$, $x_4 = P_{33.3}$, ...

Suppose we want to find $P_{20}$.
As $x_2 = P_{11.1}$ and $x_3 = P_{22.2}$, $P_{20}$ lies between $x_2$ and $x_3$.

To find the exact value of $P_{20}$ we follow these steps:

Step 1: Count the number of intervals between the data points. If there are n data points, then there will be n − 1 intervals. In the above example there are 10 − 1 = 9 intervals.

$x_1 \quad x_2 \quad x_3 \quad x_4 \quad x_5 \quad x_6 \quad x_7 \quad x_8 \quad x_9 \quad x_{10}$

Step 2: To find $P_m$ we calculate the number
$$p = \frac{(n-1)}{100}\, m$$
and write p as the sum of its integer part i and fractional part f: $p = i + f$.
In our example, we wish to find $P_{20}$. Hence,
$$p = \frac{(10-1)}{100} \times 20 = 1.8 = 1 + 0.8$$

Thus, i = 1, f = 0.8.

Step 3: The $m$th percentile $P_m$ is given by
$$P_m = x_{i+1} + f\,(x_{i+2} - x_{i+1})$$

Thus, in our example
$$P_{20} = x_{1+1} + 0.8\,(x_{1+2} - x_{1+1}) = x_2 + 0.8\,(x_3 - x_2)$$

Example: Find the 20th percentile of the data set
12, 13, 15, 18, 19, 20, 23, 24, 29
Step 1: There are 9 data points. Thus the number of intervals = 9 − 1 = 8.

Step 2: To calculate $P_{20}$ we find the number
$$p = \frac{(9-1)}{100} \times 20 = 1.6 = 1 + 0.6$$
Thus, i = 1 and f = 0.6.

Step 3: Thus $P_{20}$ is given by
$$P_{20} = x_{1+1} + 0.6\,(x_{1+2} - x_{1+1}) = x_2 + 0.6\,(x_3 - x_2) = 13 + 0.6\,(15 - 13) = 14.2$$

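The three steps translate into a short Python function. This is a sketch of the interpolation rule described in the steps; the function name `percentile` and its handling of whole-number p are choices made for this illustration.

```python
import math

def percentile(values, m):
    """m-th percentile (0 <= m <= 100) via the three steps, by linear interpolation."""
    if not values:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 <= m <= 100:
        raise ValueError("m must lie between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Steps 1 and 2: n - 1 intervals, p = (n - 1) * m / 100 = i + f.
    p = (n - 1) * m / 100
    i = math.floor(p)
    f = p - i
    if f == 0:
        return ordered[i]  # p is a whole number: P_m is the data point x_{i+1}
    # Step 3: x_{i+1} + f * (x_{i+2} - x_{i+1}); Python lists are 0-based,
    # so x_{i+1} is ordered[i].
    return ordered[i] + f * (ordered[i + 1] - ordered[i])

data = [12, 13, 15, 18, 19, 20, 23, 24, 29]
print(round(percentile(data, 20), 3))
```

For a data set of 101 ordered points the same function returns the 2nd point for m = 1 and the 21st point for m = 20, in agreement with the earlier discussion.
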
Quartile
Consider the 25th, 50th and 75th percentiles, i.e., $P_{25}$, $P_{50}$ and $P_{75}$. These percentiles divide the ordered data set into four equal parts. These percentiles are known as Quartiles.
$P_{25}$ is known as the first quartile and is denoted by $Q_1$.
$P_{50}$ is known as the second quartile and is denoted by $Q_2$. It is also equal to the median.
$P_{75}$ is known as the third quartile and is denoted by $Q_3$.

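As a cross-check, the quartiles of a small data set can be computed with Python's `statistics.quantiles` (Python 3.8 or later). The choice of `method="inclusive"` is an assumption made here because that method interpolates over the n − 1 intervals between sorted data points, as in the percentile steps; other quantile conventions give different values.

```python
from statistics import median, quantiles

data = [12, 13, 15, 18, 19, 20, 23, 24, 29]

# method="inclusive" interpolates linearly over the n - 1 intervals
# between the sorted data points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)
print(q2 == median(data))
```
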
Percentiles using Open Office Calc functions

PERCENTILE
Returns the kth percentile of values in a range.

Syntax: PERCENTILE(data, alpha)
data is the range of data.
alpha is the percentile value in the range 0…1, inclusive.
Note: For the 1st percentile, alpha = 0.01; for the 15th percentile, alpha = 0.15; and so on.

Example: Refer to the worksheet “Percentile”

QUARTILE
Returns the quartile of a data set.

Syntax: QUARTILE(array, quart)
array is the array or cell range of numeric values for which the quartile value is to be calculated.
quart indicates the quartile to be calculated.
Note: For the 1st quartile, quart = 1; for the 2nd quartile, quart = 2; and for the 3rd quartile, quart = 3.
Example: Refer to the worksheet “Percentile”

Histogram
A histogram is a graphical display based on
the frequency table.

FREQUENCY function in OPEN OFFICE CALC
The FREQUENCY function can be used to construct a frequency table.

Class Interval   Classes
<=7000           7000
7000-7500        7500
7500-8000        8000
8000-8500        8500
8500-9000        9000
9000-9500        9500
9500-10000       10000
10000-10500      10500
10500-11000      11000
11000-11500      11500
11500-12000      12000
12000-12500      12500

FREQUENCY function (cont..)

[Screenshots of the numbered steps in Calc; only the caption of step 3 survives]
3. Select the data range and class range

FREQUENCY function (cont..)

Class Interval   Classes   Frequency
<=7000           7000      2
7000-7500        7500      1
7500-8000        8000      3
8000-8500        8500      0
8500-9000        9000      12
9000-9500        9500      15
9500-10000       10000     23
10000-10500      10500     20
10500-11000      11000     15
11000-11500      11500     4
11500-12000      12000     3
12000-12500      12500     2

[Figure: bar graph of the frequency table, with the classes 7000 to 12500 on the horizontal axis and frequencies from 0 to 25 on the vertical axis]

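The counting behind such a frequency table can be sketched in Python. The sketch assumes each class collects the values greater than the previous upper limit and less than or equal to its own upper limit, as the Class Interval column suggests; the function `frequency`, the extra overflow bucket, and the sample values are illustrative and are not the worksheet data.

```python
from bisect import bisect_left

def frequency(data, class_upper_bounds):
    """Count values per class; class k holds bounds[k-1] < value <= bounds[k]."""
    bounds = sorted(class_upper_bounds)
    # One extra bucket collects values above the largest upper bound.
    counts = [0] * (len(bounds) + 1)
    for value in data:
        # bisect_left finds the first bound that is >= value.
        counts[bisect_left(bounds, value)] += 1
    return counts

classes = [7000, 7500, 8000, 8500]             # upper limits, as in the Classes column
sample = [6900, 7000, 7200, 7500, 7900, 9100]  # made-up values, not the worksheet data
print(frequency(sample, classes))
```

The resulting counts can be plotted as a bar graph to obtain the histogram.
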
