Basics For Understanding

Basics for Understanding Statistics
Data
The word “data” refers to the information that has been collected from an
experiment, a survey, an historical record, etc. (By the way, “data” is plural.
One piece of information is called a “datum.”)
Statistical data are usually obtained by counting or measuring items. Most data
can be put into the following categories:
• Qualitative - data are measurements that each fall into one of
several categories (hair color, ethnic group, and other attributes of
the population).
• Quantitative - data are observations that are measured on a
numerical scale (distance traveled to college, number of children in
a family, etc.).
Qualitative data are generally described by words or letters. They are not as
widely used as quantitative data because many numerical techniques do not
apply to them. For example, it does not make sense to find an
average hair color or blood type.
Qualitative data can be separated into two subgroups:
• Dichotomic, if the variable takes one of two options (gender:
male or female).
• Polynomic, if the variable takes one of more than two options
(education: primary school, secondary school, or university).
Quantitative data are always numbers and are the result of counting or
measuring attributes of a population.
Quantitative data can be separated into two subgroups:
• Discrete, if the data are the result of counting (the number of students
of a given ethnic group in a class, the number of books on a shelf, etc.).
• Continuous, if the data are the result of measuring (distance traveled,
weight of luggage, etc.).
Frequency Distribution
A frequency distribution shows the frequency, or number of occurrences, in each
of several categories. Frequency distributions are used to summarize large
volumes of data values.
When the raw data are measured on a quantitative scale, either interval or ratio,
categories or classes must be designed for the data values before a frequency
distribution can be formulated.
E.g., the ages of twenty-five students of Class B:
21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25, 21, 21, 25, 26, 26, 27, 28,
26, 27, 27, 22, 22
Frequency Distribution
No.   Age (x)   Frequency (f)
1     21        3
2     22        4
3     23        5
4     24        3
5     25        3
6     26        3
7     27        3
8     28        1
Total           25
Note: Here the number of observations, n, is 25. This is the total frequency,
i.e. the sum of the frequency column. It is not 8, which is only the number of
distinct age values in the table.
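As a minimal sketch, the frequency distribution for the ages listed above can be tallied with Python's standard `collections.Counter`. The variable names (`ages`, `frequency`, `n`, `rows`) are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Ages of the 25 students of Class B, as listed in the example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

# Count how often each distinct age occurs.
frequency = Counter(ages)

print("No.  Age (x)  Frequency (f)")
for number, age in enumerate(sorted(frequency), start=1):
    print(f"{number:<4} {age:<8} {frequency[age]}")

n = sum(frequency.values())   # total frequency = number of observations
rows = len(frequency)         # number of distinct ages (rows in the table)
print("n =", n, "| rows =", rows)
```

The last line makes the point of the note explicit: n is the sum of the frequencies (25), while the number of rows (8) only counts the distinct ages.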
Frequency distributions are a good way to present the essential aspects of data
collections in concise and understandable terms.
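When quantitative data are grouped into classes before counting, the same tally works on class labels instead of raw values. The sketch below assumes a class width of 2 years starting at the lowest age; both the width and the helper name `class_label` are hypothetical choices made only for illustration.

```python
from collections import Counter

# Ages of the 25 students of Class B, as listed in the example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

class_width = 2      # assumed class width, chosen for illustration
lowest = min(ages)   # the first class starts at the smallest age


def class_label(age):
    """Return the class interval (as text) that contains the given age."""
    lower = lowest + ((age - lowest) // class_width) * class_width
    upper = lower + class_width - 1
    return f"{lower}-{upper}"


# Count the observations that fall into each class.
grouped = Counter(class_label(age) for age in ages)

for label in sorted(grouped):
    print(label, grouped[label])
```

Whatever class width is chosen, the class frequencies should still add up to the number of observations.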
Normally distributed data

The histogram is bell-shaped, like the probability density function of a Normal
distribution. It appears, therefore, that the data can be modelled by a Normal
distribution. (Other methods for checking this assumption are available.)
Similarly, the histogram can be used to see whether data look as if they are
from an Exponential or Uniform distribution.
Skewed data
The relatively few large observations can have an undue influence when
comparing two or more sets of data. It might be worthwhile using a
transformation e.g. taking logarithms.
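The effect of taking logarithms can be sketched on a small, made-up data set. The values below are hypothetical and chosen only to be right-skewed; base-10 logarithms are one possible choice of transformation.

```python
import math
import statistics

# Hypothetical right-skewed data: most values are small, one is very large.
values = [1, 2, 2, 3, 4, 5, 100]

# Log transformation (only valid for strictly positive values).
logged = [math.log10(v) for v in values]

raw_gap = statistics.mean(values) - statistics.median(values)
log_gap = statistics.mean(logged) - statistics.median(logged)

print("raw data:   mean - median =", round(raw_gap, 3))
print("log10 data: mean - median =", round(log_gap, 3))
```

On the raw scale the single large observation pulls the mean far above the median; after the transformation the two are much closer, so the large value has less influence on comparisons.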
Bimodality
This may indicate the presence of two subpopulations with different
characteristics. If the subpopulations can be identified it might be better to
analyze them separately.
Outliers
The data appear to follow a pattern with the exception of one or two values. You
need to decide whether the strange values are simply mistakes, are to be
expected or whether they are correct but unexpected.
Population and Sample

In statistics, we often rely on a sample (that is, a small subset of a larger set
of data) to draw inferences about the larger set. The larger set is known as
the population from which the sample is drawn.
• Population - the entire set of individuals or objects of interest, or the
measurements obtained from all individuals or objects of interest.
• Sample - a portion, or part, of the population of interest.
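The relationship between a population and a sample can be sketched as follows. The population here is simulated and entirely hypothetical (1,000 ages between 18 and 65), and simple random sampling without replacement is assumed; the notes themselves do not prescribe a sampling method.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 1,000 people.
population = [random.randint(18, 65) for _ in range(1000)]

# A sample is a portion of the population, drawn here at random
# without replacement.
sample = random.sample(population, k=50)

print("population size:", len(population))
print("sample size:    ", len(sample))
print("population mean:", statistics.mean(population))
print("sample mean:    ", statistics.mean(sample))
```

The sample mean will usually be close to, but not identical with, the population mean; drawing conclusions about the population from the sample is the inference step.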
Categories of Variables: Independent and Dependent Variables

Variables are properties or characteristics of some event, object, or person that
can take on different values or amounts (as opposed to constants such as π that
do not vary).
Types of variables
STATISTICS
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions. Put another way,
“statistics” refers to a range of techniques and procedures for analyzing,
interpreting, displaying, and making decisions based on data.
There are two types of Statistics:

Descriptive Statistics and Inferential Statistics
Descriptive statistics are numbers that are used to summarize and describe
data. Descriptive statistics are just descriptive. They do not involve generalizing
beyond the data at hand. Researchers use descriptive statistics to
report on populations and samples.
Generalizing from our data to another set of cases is the business of inferential
statistics.
DESCRIPTIVE STATISTICS
• Central Tendency (or Groups’ “Middle Values”)
  ◦ Mean
  ◦ Median
  ◦ Mode
• Variation (or Summary of Differences Within Groups)
  ◦ Range
  ◦ Interquartile Range
  ◦ Variance
  ◦ Standard Deviation
Measures of Central Tendency
• Central tendency is the middle point of a distribution.
• Measures of central tendency are also called measures of location.
Three measures of central tendency are:
• Mean
• Median
• Mode
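As a brief illustration, the three measures can be computed with Python's standard `statistics` module. The data are the 25 student ages from the frequency distribution example; reusing them here is a choice made for this sketch.

```python
import statistics

# Ages of the 25 students of Class B, as listed in the earlier example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

mean = statistics.mean(ages)      # sum of values / number of values = 600 / 25
median = statistics.median(ages)  # middle (13th) value of the sorted data
mode = statistics.mode(ages)      # most frequent value

print("mean   =", mean)    # 24
print("median =", median)  # 24
print("mode   =", mode)    # 23
```

For these data the mean and median coincide, while the mode sits slightly lower because 23 is the single most common age.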
Measures of Dispersion
• Range
• Semi-Interquartile Range
• Mean Absolute Deviation
• Variance and Standard Deviation
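The measures listed above can be sketched on the same 25 student ages. Several choices here are assumptions rather than anything the notes specify: the class is treated as a whole population (so variance divides by n), the mean absolute deviation is taken about the mean, and quartiles come from `statistics.quantiles` (Python 3.8 or later) with its default method.

```python
import statistics

# Ages of the 25 students of Class B, as listed in the earlier example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

mean = statistics.mean(ages)

# Range: largest value minus smallest value.
value_range = max(ages) - min(ages)

# Semi-interquartile range: half the distance between the quartiles.
q1, _, q3 = statistics.quantiles(ages, n=4)
semi_iqr = (q3 - q1) / 2

# Mean absolute deviation about the mean.
mad = sum(abs(x - mean) for x in ages) / len(ages)

# Population variance and standard deviation (divide by n).
variance = statistics.pvariance(ages)
std_dev = statistics.pstdev(ages)

print("range                    =", value_range)
print("semi-interquartile range =", semi_iqr)
print("mean absolute deviation  =", round(mad, 2))
print("variance                 =", round(variance, 2))
print("standard deviation       =", round(std_dev, 2))
```

If the 25 ages were instead regarded as a sample from a larger group, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the corresponding calls.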
Arithmetic Mean: Properties

• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The arithmetic mean is the only measure of central tendency where the
sum of the deviations of each value from the mean is zero.
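The last property can be checked numerically. The sketch below uses the 25 student ages from the earlier example and, for contrast, also sums the deviations from the mode (23); the names are illustrative.

```python
import math

# Ages of the 25 students of Class B, as listed in the earlier example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

mean = sum(ages) / len(ages)

# Deviations of each value from the mean; positives and negatives cancel.
deviations_from_mean = [x - mean for x in ages]
total_from_mean = sum(deviations_from_mean)

# For comparison: deviations from another point (here the mode, 23).
total_from_mode = sum(x - 23 for x in ages)

# With floating-point data the total may differ from zero by rounding error,
# so compare with a small tolerance rather than testing for exact equality.
print("sum of deviations from the mean:", total_from_mean)
print("is zero (within tolerance):     ", math.isclose(total_from_mean, 0, abs_tol=1e-9))
print("sum of deviations from 23:      ", total_from_mode)
```

The reason is purely algebraic: the deviations sum to (sum of values) - n × mean, and the mean is defined as (sum of values) / n.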
Arithmetic Mean: Advantages

1. It is rigidly defined
2. It is based on all the values in the distribution
3. It is least affected by fluctuations of sampling
4. It is easily calculated
5. It is most easily understood
6. It is most amenable to algebraic treatment
Arithmetic Mean: Disadvantages

1. Means can be badly affected by outliers (data points with extreme values
unlike the rest)
2. Outliers can make the mean a bad measure of central tendency or
common experience.
3. The mean cannot be computed for open-ended classes.
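The sensitivity of the mean to outliers can be shown with a small made-up example. The figures below are hypothetical (think of them as incomes in thousands) and serve only to illustrate the point.

```python
import statistics

# Hypothetical data without an extreme value.
values = [30, 32, 35, 38, 40]

# The same data with one extreme value (an outlier) added.
with_outlier = values + [500]

print("without outlier: mean =", statistics.mean(values),
      " median =", statistics.median(values))
print("with outlier:    mean =", statistics.mean(with_outlier),
      " median =", statistics.median(with_outlier))
```

One extreme value moves the mean from 35 to 112.5, a figure that describes none of the six observations well, while the median only moves from 35 to 36.5.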
Median: Properties
1. It is an average of position
2. It is affected more by the number of items than by the extreme values
3. The sum of deviations about the median, signs ignored, is never greater
than the sum of deviations taken from any other point
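The third property can be checked by brute force on the 25 student ages used earlier. The helper name `total_absolute_deviation` is hypothetical, and only whole-number candidate points are tried to keep the sketch short.

```python
import statistics

# Ages of the 25 students of Class B, as listed in the earlier example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]


def total_absolute_deviation(point, data):
    """Sum of the deviations of the data from `point`, signs ignored."""
    return sum(abs(x - point) for x in data)


median = statistics.median(ages)
candidates = range(min(ages), max(ages) + 1)

for point in candidates:
    print(point, total_absolute_deviation(point, ages))

# The candidate with the smallest total should be the median.
best = min(candidates, key=lambda p: total_absolute_deviation(p, ages))
print("smallest total at:", best, "| median:", median)
```

The total is smallest at 24, which is the median of these data.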
Median: Advantages
1. It is easily calculated and is not disturbed by the extreme values
2. It is more typical of the series
3. The median may be located even when the data are incomplete (e.g.,
when the class intervals are irregular and the final classes have open
ends)
Median: Disadvantages
1. The median is not as well suited to algebraic treatment as the arithmetic,
geometric, or harmonic mean
2. It is not as generally familiar as the arithmetic mean
Mode: Properties
1. It is not affected by the extreme values
2. It is the most typical value of the distribution
Mode: Advantages
1. Since it is the most typical value, it is the most important descriptive
average
2. It is easy to locate the approximate mode (but determining the true
mode requires extensive calculations)
3. Since the mode is usually an actual value, it indicates the precise value of
an important part of the series
Mode: Disadvantages
1. Unless the number of items is fairly large and the distribution reveals a
distinct central tendency, the mode has no significance
2. It is not capable of mathematical treatment
3. In a small number of items the mode may not exist
Geometric Mean: Properties
1. It is a calculated value and depends upon the size of all the items
2. It gives less importance to extreme items than does the arithmetic mean
3. For any series of positive items it is never larger than the arithmetic mean
(the two are equal only when all items are identical)
4. It exists ordinarily only for positive values of the variate
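The relationship between the geometric and arithmetic means can be sketched with `statistics.geometric_mean` (available from Python 3.8). The two small data sets are hypothetical and chosen so the results are easy to verify by hand.

```python
import math
import statistics

# Hypothetical positive values: geometric mean = sqrt(2 * 8) = 4.
values = [2, 8]
gm = statistics.geometric_mean(values)
am = statistics.mean(values)
print("values:", values, "| geometric mean ~", round(gm, 6), "| arithmetic mean =", am)

# When all items are identical, the two means coincide.
identical = [5, 5, 5]
gm_same = statistics.geometric_mean(identical)
am_same = statistics.mean(identical)
print("values:", identical, "| geometric mean ~", round(gm_same, 6),
      "| arithmetic mean =", am_same)

# The geometric mean is the n-th root of the product of the n items.
by_hand = math.prod(values) ** (1 / len(values))
print("n-th root of the product:", by_hand)
```

Results from `geometric_mean` are floating-point values, so they are best compared with a tolerance rather than for exact equality.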
Geometric Mean: Advantages
1. Since it is less affected by the extremes, it is a more typical average than the
arithmetic mean
2. Since it gives equal weight to equal ratios of change, it is particularly well
adapted when ratios of change are to be averaged
3. It is capable of algebraic treatment
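Why the geometric mean suits ratios of change can be seen in a small, hypothetical example: a quantity doubles in one period (factor 2.0) and halves in the next (factor 0.5), so it ends where it started. The starting value of 100 and the variable names are assumptions made for this sketch.

```python
import math
import statistics

# Hypothetical ratios of change over two periods: doubling, then halving.
growth_factors = [2.0, 0.5]
start = 100.0

# What actually happens when the changes are applied in turn.
end = start * math.prod(growth_factors)

gm = statistics.geometric_mean(growth_factors)  # average factor per period
am = statistics.mean(growth_factors)

# Applying each "average" factor for the same number of periods.
periods = len(growth_factors)
end_using_gm = start * gm ** periods
end_using_am = start * am ** periods

print("actual end value:         ", end)
print("using the geometric mean: ", round(end_using_gm, 6))
print("using the arithmetic mean:", round(end_using_am, 6))
```

Compounding the geometric mean reproduces the true end value, whereas the arithmetic mean of the factors (1.25) would suggest growth that did not occur.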
Geometric Mean: Disadvantages
1. Its computation is relatively difficult
2. It cannot be determined if there is any negative value in the distribution,
or where one of the items has a zero value
3. It is not a widely known average
Harmonic Mean: Disadvantages
1. It is difficult to compute
2. It is not easily understandable
