Objectives
Review Week 1
Measures of Central Tendency
Measures of Dispersion
Sample Statistics
Frequency Distribution
Mean and Variance from Frequency Table
Set : Introduction
A set is a well-defined list, collection or class of objects.
The objects could be anything : numbers, names, people, cities. These objects are called the elements or members of the set.
Example 1: The numbers 1, 3, 5, 7, 9, 11, 13, ...
Example 2: The solutions of the equation $x^2 - 4x + 3 = 0$
Example 3: The rivers in Australia
Set Notation
Sets are usually denoted by capital letters A, B, P, X, ... The elements are usually represented by lowercase letters a, b, p, x, ...
There are two forms for presenting a set:
Tabular form: A = {1, 3, 5, 7, 9, 11, ...}
Set-builder form: A = {x | x is odd}
March 20, 2012
Subsets
If every element of a set A is also a member of a set B, then A is called a subset of B.
In other words, if $x \in A \Rightarrow x \in B$ for all x, then A is a subset of B.
It is written as $A \subseteq B$ or $B \supseteq A$.
A is called a proper subset of B if $A \subseteq B$ and A is not equal to B.
[Figure: Venn diagrams of sets A and B, and of sets S and R, inside the universal set U]
Set Operations
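Let A and B represent two sets. The definitions, in compact form, are:
1. $A \cup B = \{ x \mid x \in A \text{ or } x \in B \text{ or } x \in \text{both} \}$
2. $A \cap B = \{ x \mid x \in A \text{ and } x \in B \}$
3. $A - B = \{ x \mid x \in A \text{ and } x \notin B \}$
4. $A' = \{ x \mid x \notin A \}$
5. $A \,\Delta\, B = \{ x \mid x \in A \text{ or } x \in B \text{ but } x \notin \text{both} \}$
6. #A = number of elements in set A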
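The six definitions map directly onto Python's built-in `set` type. The sketch below uses made-up sets A, B and a made-up universal set U (needed for the complement); none of these values come from the slides.

```python
# Hypothetical example sets; U plays the role of the universal set.
U = set(range(1, 11))
A = {1, 3, 5, 7, 9}
B = {1, 2, 3, 4, 5}

union = A | B                  # 1. in A or in B (or in both)
intersection = A & B           # 2. in A and in B
difference = A - B             # 3. in A but not in B
complement = U - A             # 4. not in A (relative to U)
symmetric_difference = A ^ B   # 5. in A or in B, but not in both
cardinality = len(A)           # 6. #A

print(union, intersection, difference)
print(complement, symmetric_difference, cardinality)
```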
Introduction
Statistics is the medium for describing the center, spread, and shape of a data set. Two components
Statistical Methods are employed to make judgements in the face of uncertainty and variation.
Copyright Box Hill Institute
Mean
For a given set of n numbers $x_1, x_2, x_3, \ldots, x_n$, the mean, denoted by $\mu$, is
$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
Example: Consider the following set of numbers: S = {1, 2, 3, 4, 5, 6, 7, 8, 9}
The mean of the set S is
$$\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{9} = \frac{45}{9} = 5$$
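The same calculation can be checked in a couple of lines of Python. This is only a sketch of the arithmetic in the example; the variable names are illustrative.

```python
# Mean of the example set S: sum of the values divided by how many there are.
S = [1, 2, 3, 4, 5, 6, 7, 8, 9]
total = sum(S)          # 45
mean = total / len(S)   # 45 / 9

print(total, mean)      # 45 5.0
```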
Median
For a given set of n numbers $x_1, x_2, x_3, \ldots, x_n$, the median is a value such that half of the values are larger than it and the other half are smaller.
In other words, the median is the middlemost number of the ordered list.
Median
Example: Consider the following set of numbers: S = {1, 6, 3, 8, 2, 4, 9}
To find the median, we first need to order the list: S = {1, 2, 3, 4, 6, 8, 9}
The middlemost number is 4, which is the median of the set.
What happens when we have to find the median of a set with an even number of elements?
For example: Find the median of S = {1, 6, 3, 8, 2, 12, 4, 9}
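One common convention for the even case, assumed here because the slides leave the question open, is to average the two middle values of the ordered list. The helper name `middle_value` is illustrative.

```python
def middle_value(values):
    """Median: the middle value of the sorted list, or the average
    of the two middle values when the count is even (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = middle_value([1, 6, 3, 8, 2, 4, 9])        # ordered: 1 2 3 4 6 8 9
even_case = middle_value([1, 6, 3, 8, 2, 12, 4, 9])   # ordered: 1 2 3 4 6 8 9 12

print(odd_case)   # 4
print(even_case)  # 5.0, the average of 4 and 6
```

Python's standard `statistics.median` follows the same convention.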
Mode
The mode of a data set is the value that occurs most often.
If two, three, or more values share the highest frequency, the data is bimodal, trimodal, or multimodal.
Example: R = {2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2}
The number that appears most often is 2 (four times), which is the mode of R.
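Counting occurrences makes the mode easy to verify. The sketch below tallies the example R with `collections.Counter` and collects every value that reaches the highest count, so it would also report bimodal or multimodal data.

```python
from collections import Counter

R = [2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2]

counts = Counter(R)                 # value -> number of occurrences
highest = max(counts.values())      # the largest frequency
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, highest)  # [2] 4
```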
Measures of Dispersion
Consider two sets:
S = {5, 5, 5, 5, 5, 5}
R = {0, 0, 0, 10, 10, 10}
For both sets, the mean is 5, but they are two different data sets. Is it good practice to use only the mean, median, or mode to describe them?
Measures of Dispersion
We use another descriptive statistic to evaluate the data, called a measure of dispersion. It is a measure of the scatter of the values about the mean.
Measures of Dispersion
What happens to the measure of dispersion
if the values are concentrated near the mean? If they are distributed far from the mean?
Measures of Dispersion
If the values are concentrated near the mean of the data set, the measure is small. If they are distributed far from the mean of the data set, the measure will be large.
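The two sets S and R from the earlier slide illustrate this. The sketch below assumes the usual definition of variance as the average squared deviation from the mean (dividing by n); the helper name `population_variance` is illustrative.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by n; assumed definition)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

S = [5, 5, 5, 5, 5, 5]
R = [0, 0, 0, 10, 10, 10]

var_S = population_variance(S)   # every value equals the mean
var_R = population_variance(R)   # every value is 5 away from the mean

print(var_S, var_R)  # 0.0 25.0
```

Both sets have mean 5, but only the dispersion measure tells them apart.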
Variance (method 2)
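$$\sigma^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2$$
where
$$\sum x = x_1 + x_2 + x_3 + \cdots + x_n, \qquad \sum x^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2$$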
The variance is a non-negative number.
The positive square root of the variance is the standard deviation.
The simplest measure of variability is the sample range, $x_{\max} - x_{\min}$.
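A short sketch tying these quantities together. It computes the variance with the shortcut formula (mean of the squares minus the square of the mean) and checks it against the average squared deviation from the mean, which is assumed here to be the first method. The data values are made up for illustration.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, not from the slides
n = len(data)

# Shortcut formula: (sum of x^2) / n - ((sum of x) / n)^2
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
variance_shortcut = sum_x2 / n - (sum_x / n) ** 2

# Assumed definition: average squared deviation from the mean
mean = sum_x / n
variance_definition = sum((x - mean) ** 2 for x in data) / n

std_dev = math.sqrt(variance_shortcut)   # positive square root of the variance
sample_range = max(data) - min(data)     # x_max - x_min

print(variance_shortcut, variance_definition, std_dev, sample_range)
# 4.0 4.0 2.0 7
```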
$$\sigma^2 = \frac{\cdots}{5} = 50.8$$
Sample Space
The set of all possible outcomes of a statistical experiment is called a sample space or sample.
Each outcome is called an element, a member, or a sample point.
A group of samples is called a population.
Sample Statistics
Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic.
A sample, along with inferential statistics, allows us to draw conclusions about the population, with inferential statistics making clear use of elements of probability.
Sample Mean
For a given sample of n numbers $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is
$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
Weighted Mean
For a given set of data $X = \{ x_1, x_2, \ldots, x_n \}$ and corresponding non-negative weights $W = \{ w_1, w_2, \ldots, w_n \}$, the weighted mean (or weighted average) is given by
$$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
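A minimal sketch of the weighted mean, using made-up scores and weights. The function name `weighted_mean` and the data are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [80, 90, 70]    # hypothetical data
weights = [2, 3, 5]      # hypothetical non-negative weights

result = weighted_mean(scores, weights)   # (160 + 270 + 350) / 10
print(result)  # 78.0
```

With equal weights the weighted mean reduces to the ordinary mean.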
Sample Variance
For a given sample of n numbers $x_1, x_2, x_3, \ldots, x_n$, the sample variance, denoted by $S^2$, is given by
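A minimal sketch of a sample variance calculation. It assumes the common convention of dividing the sum of squared deviations from the sample mean by n - 1; some texts divide by n instead, so the divisor is left adjustable. The function name, the `ddof` parameter, and the data are illustrative.

```python
def sample_variance(sample, ddof=1):
    """Sum of squared deviations from the sample mean, divided by (n - ddof).

    ddof=1 gives the n - 1 convention (assumed here); ddof=0 divides by n.
    """
    n = len(sample)
    if n - ddof <= 0:
        raise ValueError("sample is too small for this divisor")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)

sample = [1, 2, 3, 4, 5]                      # hypothetical sample
s2_n_minus_1 = sample_variance(sample)        # 10 / 4
s2_n = sample_variance(sample, ddof=0)        # 10 / 5

print(s2_n_minus_1, s2_n)  # 2.5 2.0
```

Python's `statistics.variance` uses the n - 1 divisor, and `statistics.pvariance` divides by n.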
Frequency Distributions
For large samples (or populations) it is difficult to observe various characteristics or to compute statistics. Therefore it is useful to organize or group the raw data. The data is arranged in intervals of equal width.
Frequency Distributions
The intervals are called classes or categories. The number of individuals or elements in each class, called the class frequency, is determined. The resulting arrangement is called a frequency distribution or frequency table.
Frequency Distribution
Example: Height of students in XYZ university (frequency table)

Height (cm)    Number of Students
155-159        5
160-164        18
165-169        42
170-174        27
175-179        8
Total          100
Frequency Distribution
In the previous example, the first category, 155-159, is called a class interval. The corresponding class frequency is 5. The midpoint of the class interval is called the class mark.
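As a quick check, the class marks for the height table can be computed as the average of each interval's lower and upper limits. The variable names are illustrative.

```python
# Class intervals (lower limit, upper limit) from the height frequency table.
classes = [(155, 159), (160, 164), (165, 169), (170, 174), (175, 179)]

# Class mark = midpoint of the class interval.
class_marks = [(low + high) / 2 for low, high in classes]

print(class_marks)  # [157.0, 162.0, 167.0, 172.0, 177.0]
```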
Frequency Histogram
[Figure: frequency histogram of Number of Students against Height (cm) for the classes 155-159, 160-164, 165-169, 170-174, 175-179]
Frequency Polygon
[Figure: frequency polygon of Number of Students against Height (cm) for the classes 155-159, 160-164, 165-169, 170-174, 175-179]
Frequency Graphs
In a histogram, the sum of the rectangular areas is 100. A frequency polygon is a graph connecting the midpoints of the tops of the histogram rectangles. In a bar graph, the sum of the ordinates is 1.
Relative Frequency
In relative frequency, each class frequency is expressed as a percentage of the total rather than as a count. In the histogram, the vertical axis shows relative frequency instead of frequency.
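For the height table, the relative frequencies follow by dividing each class frequency by the total. A minimal sketch, with illustrative variable names:

```python
# Class frequencies from the height table (total of 100 students).
labels = ["155-159", "160-164", "165-169", "170-174", "175-179"]
frequencies = [5, 18, 42, 27, 8]

total = sum(frequencies)
relative_percent = [100 * f / total for f in frequencies]

for label, pct in zip(labels, relative_percent):
    print(label, pct)

# The percentages always add up to 100 (up to rounding).
print(sum(relative_percent))
```

Because the total here is exactly 100, each percentage equals its class frequency.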
In the previous example, what happens if we have a student with a height of 159.7 cm?
The class intervals are chosen such that they are continuous as shown
Class interval    Class mark, x    Frequency, f    f.x            f.x.x
a0 - a1           x1               f1              f1.x1          f1.x1.x1
a1 - a2           x2               f2              f2.x2          f2.x2.x2
...               ...              ...             ...            ...
an-1 - an         xn               fn              fn.xn          fn.xn.xn
All                                Total Σf        Total Σf.x     Total Σf.x.x
Class interval    Class midpoint, x    Frequency, f
1.5 - 1.9         1.7                  2
2.0 - 2.4         2.2                  1
2.5 - 2.9         2.7                  4
3.0 - 3.4         3.2                  15
3.5 - 3.9         3.7                  10
4.0 - 4.4         4.2                  5
4.5 - 4.9         4.7                  3
Total                                  40

Total Σf.x = 136.5
Total Σf.x.x = 484.75
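The totals 136.5 and 484.75 can be reproduced from the class midpoints and frequencies. The sketch below then applies the shortcut variance formula to the grouped data, using Σf in place of n. That step is an assumption about how the totals are meant to be used, since the slides stop at the totals.

```python
midpoints = [1.7, 2.2, 2.7, 3.2, 3.7, 4.2, 4.7]   # class midpoints, x
frequencies = [2, 1, 4, 15, 10, 5, 3]             # class frequencies, f

sum_f = sum(frequencies)                                          # total f
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))       # total f.x
sum_fxx = sum(f * x * x for f, x in zip(frequencies, midpoints))  # total f.x.x

# Assumed use of the totals: grouped mean and shortcut variance.
grouped_mean = sum_fx / sum_f
grouped_variance = sum_fxx / sum_f - grouped_mean ** 2

print(sum_f, round(sum_fx, 2), round(sum_fxx, 2))
print(round(grouped_mean, 4), round(grouped_variance, 4))
```

With these figures the grouped mean is about 3.41 and the grouped variance about 0.47. Both are approximations, because every value in a class is treated as if it sat at the class midpoint.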
Summary
There are three main measures of central tendency : Mean, Mode and Median. There are two main measures of dispersion : Variance and Standard Deviation. The organization or grouping of raw data in a table is called Frequency distribution.