
Descriptive statistics

Dr. N.K. Dashora narandrad@gmail.com

Definition
A set of brief descriptive coefficients that summarizes a given data set, which can represent either an entire population or a sample. The measures used to describe the data set are measures of central tendency and measures of variability (dispersion). Measures of central tendency include the mean, median and mode. Measures of variability include the standard deviation (or variance) and the minimum and maximum values. Kurtosis and skewness, which describe the shape of the distribution, are usually reported alongside them.

Definition
Mathematical methods (such as the mean, median and standard deviation) that summarize and interpret some of the properties of a set of data (a sample) but do not infer the properties of the population from which the sample was drawn.

What is data ?
Descriptive statistics are numbers that are used to summarize and describe data. The word "data" refers to the information that has been collected from an experiment, a survey, a historical record, etc. "Data" is plural. One piece of information is called a "datum."

Example
If we are analyzing population data for Indian states, a descriptive statistic might be the percentage of people below 5 years of age in each state. Several descriptive statistics are often used at one time to give a full picture of the data.

No Inference
Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand. Generalizing from our data to another set of cases is the business of inferential statistics. Here we focus on (mere) descriptive statistics.

Median
To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in a sample of five persons weighing 100, 100, 130, 140, and 150 pounds, the median would be 130 pounds, since 130 pounds is the middle weight.
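
The odd/even rule can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `median_of` is hypothetical, and the five weights (100, 100, 130, 140, 150 pounds) are assumed to be the sample the slide refers to.

```python
import statistics

def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)              # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd count: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of two

weights = [100, 100, 130, 140, 150]       # pounds (assumed sample)
print(median_of(weights))                 # 130
print(median_of([3, 5, 5, 7]))            # 5.0

# Cross-check against the standard library
assert median_of(weights) == statistics.median(weights)
```

The standard library's `statistics.median` applies the same rule, so in practice it can be used directly.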

Data set

Month   HIGH       LOW        CLOSE
Feb     18542.20   17926.8    18008.15
Jan     20664.80   18038.48   18327.76
Dec     20552.03   19074.57   20509.09
Nov     21108.64   18954.82   19521.25
Oct     20854.55   19768.96   20032.34
Sept    20267.98   18027.12   20069

15. BSE Sensitive Index and NSE Nifty Index of Ordinary Share Prices - Mumbai (2010, 2011)

Dates shown: Jan. 17, Jan. 18, Jan. 19, Jan. 21
BSE SENSEX (1978-79=100): 17051.14 18882.25 19092.05 18978.32 19046.54 19007.53
S & P CNX NIFTY (3.11.1995=1000): 5094.15 5654.75 5724.05 5711.60 5696.50

Mode
The mode is the most frequently appearing value in the population or sample. Suppose we draw a sample of five persons and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds. Since more persons weigh 100 pounds than any other weight, the mode would equal 100 pounds.
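
As a rough sketch, the mode can be found by counting how often each value occurs. The helper name `modes_of` is hypothetical. It returns a list because a data set may have more than one most-frequent value.

```python
from collections import Counter

def modes_of(values):
    """Return every value that ties for the highest frequency, in sorted order."""
    counts = Counter(values)              # value -> number of occurrences
    top = max(counts.values())            # highest frequency observed
    return sorted(v for v, c in counts.items() if c == top)

weights = [100, 100, 130, 140, 150]       # pounds, from the example above
print(modes_of(weights))                  # [100]
print(modes_of([1, 1, 2, 2, 3]))          # [1, 2]
```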

Mean
The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Returning to the example of the five persons, the mean weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds. In general,

$$\text{Mean} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$

where $X_1, \dots, X_n$ are the values of the variable and $n$ is the number of observations.
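
The same arithmetic in Python, as a minimal sketch (the function name `mean_of` is illustrative only):

```python
def mean_of(values):
    """Add all observations and divide by the number of observations."""
    return sum(values) / len(values)

weights = [100, 100, 130, 140, 150]   # pounds
print(sum(weights))                   # 620
print(mean_of(weights))               # 124.0
```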

Data and presentation

Find the mean, median and mode

No.     1   2   3   4   5   6   7   8
Value   26  25  23  12  15  18  09  12

Proportion and Percentage


A proportion refers to the fraction of the total that possesses a certain attribute. For example, we might ask what proportion of persons in our sample weigh less than 135 pounds. Since 3 of the 5 persons weigh less than 135 pounds, the proportion would be 3/5 or 0.60. A percentage is another way of expressing a proportion: it is equal to the proportion times 100. In our example of the five persons, the percentage of the total who weigh less than 135 pounds would be 100 * (3/5), or 60 percent.
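
A small Python sketch of the same calculation. The helper `proportion` and its `predicate` argument are hypothetical names chosen for illustration.

```python
def proportion(values, predicate):
    """Fraction of the values for which predicate(value) is true."""
    matches = sum(1 for v in values if predicate(v))
    return matches / len(values)

weights = [100, 100, 130, 140, 150]            # pounds
p = proportion(weights, lambda w: w < 135)     # 3 of the 5 weights qualify
print(p)                                       # 0.6
print(round(100 * p))                          # percentage, rounded to a whole number
```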

Notation
Of the various measures, the mean and the proportion are most important. The notation used to describe these measures appears below:
X: Refers to a population mean.
x: Refers to a sample mean.
P: The proportion of elements in the population that has a particular attribute.
p: The proportion of elements in the sample that has a particular attribute.
Q: The proportion of elements in the population that does not have a specified attribute. Note that Q = 1 - P.
q: The proportion of elements in the sample that does not have a specified attribute. Note that q = 1 - p.
Note that capital letters refer to population parameters and lower-case letters refer to sample statistics.

Exercise
Please rewrite these notations from memory. This is a self-checking exercise.

Measures of Variability
Some parameters describe the amount of variation among the values in a data set. For example, consider a population of four values {6, 6, 6, 6}. Here every value is equal, so there is no variation. The set {3, 5, 5, 7}, on the other hand, has some variation, since some of the values differ. Three parameters are commonly used to quantify the amount of variation in a set of values: the range, the variance, and the standard deviation. Other measures, such as the mean deviation, also exist.

Notation
$\sigma^2$: The variance of the population.
$\sigma$: The standard deviation of the population.
$s^2$: The variance of the sample.
$s$: The standard deviation of the sample.
$\mu$: The population mean.
$\bar{x}$: The sample mean.
N: Number of observations in the population.
n: Number of observations in the sample.
P: The proportion of elements in the population that has a particular attribute.
p: The proportion of elements in the sample that has a particular attribute.
Q: The proportion of elements in the population that does not have a specified attribute. Note that Q = 1 - P.
q: The proportion of elements in the sample that does not have a specified attribute. Note that q = 1 - p.

Exercise
Rewrite the notations given in the last slide. This is a self-checking exercise.

The Range
The range is the simplest measure of variation. It is the difference between the largest and smallest values in the data set.
Range = Maximum value - Minimum value
Therefore, the range of the four values {3, 5, 5, 7} would be 7 - 3, or 4.
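
In Python the range is a single line. The sketch below uses the hypothetical name `value_range` to avoid clashing with the built-in `range`.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

print(value_range([3, 5, 5, 7]))   # 4
print(value_range([6, 6, 6, 6]))   # 0 (no variation)
```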

Variance of a Random Variable


It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by $\sigma^2$, and the variance of a sample by $s^2$. The variance of a population is the average squared deviation from the population mean, as defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.

Variance of a Sample
The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate the unknown population variance from known sample data, this is the formula to use.
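
The two definitions differ only in the divisor, which a short Python sketch makes explicit. The function names are illustrative, and the data set {3, 5, 5, 7} is borrowed from the earlier discussion of variability.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [3, 5, 5, 7]                          # mean is 5, squared deviations sum to 8
print(population_variance(data))             # 2.0
print(round(sample_variance(data), 3))       # 2.667
```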

Example 1
A population consists of four observations: {1, 3, 5, 7}. What is the variance?
Solution: First, we need to compute the population mean.

$$\mu = \frac{1 + 3 + 5 + 7}{4} = 4$$

Then we plug all of the known values into the formula for the variance of a population, as shown below:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
$$\sigma^2 = \frac{(1-4)^2 + (3-4)^2 + (5-4)^2 + (7-4)^2}{4}$$
$$\sigma^2 = \frac{(-3)^2 + (-1)^2 + (1)^2 + (3)^2}{4}$$
$$\sigma^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$$

Example 2
A sample consists of four observations: {1, 3, 5, 7}. What is the variance?
Solution: This problem is handled exactly like the previous problem, except that we use the formula for calculating sample variance rather than the formula for calculating population variance. The sample mean is $\bar{x} = (1 + 3 + 5 + 7)/4 = 4$.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{(1-4)^2 + (3-4)^2 + (5-4)^2 + (7-4)^2}{4 - 1}$$
$$s^2 = \frac{(-3)^2 + (-1)^2 + (1)^2 + (3)^2}{3}$$
$$s^2 = \frac{9 + 1 + 1 + 9}{3} = \frac{20}{3} = 6.667$$

Difference
Is there any difference between the two? Which is larger? Why is it larger? What happens as the sample size increases?
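
One way to explore these questions numerically is sketched below using Python's standard `statistics` module (`pvariance` divides by N, `variance` divides by n - 1). The helper `variance_pair` and the artificial data sets 1, 2, ..., n are assumptions made for illustration only.

```python
import statistics

def variance_pair(values):
    """Return (population variance, sample variance) computed on the same values."""
    return statistics.pvariance(values), statistics.variance(values)

# The four observations {1, 3, 5, 7}
pop_var, samp_var = variance_pair([1.0, 3.0, 5.0, 7.0])
print(pop_var)                 # 5.0
print(round(samp_var, 3))      # 6.667

# Illustrative data sets of growing size: 1.0, 2.0, ..., n
for n in (4, 40, 400):
    values = [float(i) for i in range(1, n + 1)]
    pop_var, samp_var = variance_pair(values)
    # On the same data the ratio is always n / (n - 1), which approaches 1
    print(n, round(samp_var / pop_var, 4))
```

Because the numerators are identical, the two results differ only by the factor n / (n - 1), so the gap shrinks as the number of observations grows.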

Variance of a Proportion
The variance formulas introduced in the previous section can be used with confidence for any random variable, even proportions. However, for proportions the formulas can be expressed in a form that is easier to compute. With an infinite population or when sampling with replacement, the variance of a population proportion is defined by the following formula:

$$\sigma^2 = \frac{PQ}{n}$$

where $P$ is the population proportion, $Q$ equals $1 - P$, and $n$ is the sample size.

Given the same constraints (infinite population or sampling with replacement), the variance of the sample proportion is defined by a slightly different formula:

$$s^2 = \frac{pq}{n - 1}$$

where $n$ is the number of elements in the sample, $p$ is the sample estimate of the true proportion, and $q$ is equal to $1 - p$. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate the unknown population variance from known sample data, this is the formula to use.

Many statistics texts present only the formula for the variance of the population proportion. If the sample size is very large, both formulas give similar results. When the sample size is small, it is better to use the formula that matches the data: the sample formula, with n - 1 in the denominator, when working from a sample.
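
The effect of sample size on the two formulas can be sketched as follows. The function names are hypothetical, and the proportion 3/5 is borrowed from the earlier weight example. Using the same value for P and p is an assumption made only so that the two divisors can be compared directly. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def population_proportion_variance(P, n):
    """PQ / n, with Q = 1 - P."""
    return P * (1 - P) / n

def sample_proportion_variance(p, n):
    """pq / (n - 1), with q = 1 - p."""
    if n < 2:
        raise ValueError("need at least two observations")
    return p * (1 - p) / (n - 1)

prop = Fraction(3, 5)            # assumed value used for both P and p
for n in (5, 50, 500):
    pop = population_proportion_variance(prop, n)
    samp = sample_proportion_variance(prop, n)
    # The two results move closer together as n grows
    print(n, float(pop), float(samp))
```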

Standard Deviation of a Random Variable
The standard deviation is the square root of the variance. It is important to distinguish between the standard deviation of a population and the standard deviation of a sample. They have different notation, and they are computed differently. The standard deviation of a population is denoted by $\sigma$, and the standard deviation of a sample by $s$. The standard deviation of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$

where $\sigma$ is the population standard deviation, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.

Standard Deviation of a Sample
The standard deviation of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where $s$ is the sample standard deviation, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. The sample variance $s^2$ is an unbiased estimate of the population variance. Its square root $s$ is the usual estimate of the population standard deviation, although $s$ itself is slightly biased for small samples. Therefore, if you need to estimate the unknown population standard deviation from known sample data, this is the formula to use.
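
As a quick sketch, both standard deviations can be obtained in Python by taking the square root of the corresponding variance. The observations {1, 3, 5, 7} are reused here as an assumed data set. `statistics.pstdev` and `statistics.stdev` are the standard-library equivalents.

```python
import math
import statistics

data = [1.0, 3.0, 5.0, 7.0]

# Square root of the population variance (divide by N)
pop_sd = math.sqrt(statistics.pvariance(data))
# Square root of the sample variance (divide by n - 1)
samp_sd = math.sqrt(statistics.variance(data))

print(round(pop_sd, 3))     # 2.236
print(round(samp_sd, 3))    # 2.582

# The library functions give the same results
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(samp_sd, statistics.stdev(data))
```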

Find the mean, median, standard deviation, variance and range.

Roll Number   1   2   3   4   5   6   7   8   9   10
Marks         12  15  08  24  17  27  15  17  12  13