Basics for Understanding Statistics

Data
The word “data” refers to the information that has been collected from an
experiment, a survey, an historical record, etc. (By the way, “data” is plural.
One piece of information is called a “datum.”)

Statistical data are usually obtained by counting or measuring items. Most data
can be put into the following categories:
 Qualitative - data are measurements that each fall into one of
several categories (hair color, ethnic group, and other attributes of
the population).
 Quantitative - data are observations that are measured on a
numerical scale (distance traveled to college, number of children in
a family, etc.).
Qualitative data are generally described by words or letters. They are not as
widely used as quantitative data because many numerical techniques do not
apply to qualitative data. For example, it does not make sense to find an
average hair color or blood type.
Qualitative data can be separated into two subgroups:
 Dichotomous (the variable takes one of two options, e.g. gender:
male or female)
 Polytomous (the variable takes one of more than two options,
e.g. education: primary school, secondary school, or university)
Quantitative data are always numbers and are the result of counting or
measuring attributes of a population.
Quantitative data can be separated into two subgroups:
 Discrete (the result of counting: the number of students of a
given ethnic group in a class, the number of books on a shelf, etc.)
 Continuous (the result of measuring: distance traveled,
weight of luggage, etc.)

Frequency Distribution
A frequency distribution shows the frequency, or number of occurrences, in each
of several categories. Frequency distributions are used to summarize large
volumes of data values.
When the raw data are measured on a quantitative scale, either interval or ratio,
categories or classes must be designed for the data values before a frequency
distribution can be formulated.
Example: the ages of twenty-five students of Class B
21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25, 21, 21, 25, 26, 26, 27, 28,
26, 27, 27, 22, 22
Frequency Distribution

No.    Age (x)    Frequency
1      21         3
2      22         4
3      23         5
4      24         3
5      25         3
6      26         3
7      27         3
8      28         1
Total             25

Note: Here the number of observations, n, is 25, which is the total of the
frequency column (equivalently, the final cumulative frequency). It is not 8,
which is only the number of distinct age values.
Frequency distributions are a good way to present the essential aspects of data
collections in concise and understandable terms.
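
The tally can also be done by machine. The following is a minimal Python sketch, assuming the ages are typed in exactly as listed in the example; the variable names are illustrative only.

```python
from collections import Counter

# Ages of the twenty-five students of Class B, as listed in the example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

# Counter maps each distinct age to the number of times it occurs.
frequency = Counter(ages)

print("No.  Age (x)  Frequency")
for row, age in enumerate(sorted(frequency), start=1):
    print(f"{row:<4} {age:<8} {frequency[age]}")

# n is the total frequency, not the number of rows in the table.
n = sum(frequency.values())
distinct_ages = len(frequency)
print(f"n = {n}, distinct ages = {distinct_ages}")
```

Summing the frequency column recovers n, while counting the rows only gives the number of distinct values.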

Normally distributed data


When the histogram is bell-shaped, like the probability density function of a
Normal distribution, it appears that the data can be modelled by a Normal
distribution. (Other methods for checking this assumption are available.)
Similarly, the histogram can be used to see whether data look as if they are
from an Exponential or Uniform distribution.

Skewed data
In skewed data, the relatively few large observations can have an undue
influence when comparing two or more sets of data. It might be worthwhile
using a transformation, e.g. taking logarithms.
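
As a rough illustration of why a logarithmic transformation can help, the sketch below uses a small made-up sample (not data from these notes) with a few large observations. It compares the mean and median on the raw scale, then averages on the log scale and transforms back.

```python
import math
import statistics

# Hypothetical skewed sample: mostly small values plus a few large ones.
values = [1, 2, 2, 3, 3, 4, 5, 8, 40, 120]

mean_raw = statistics.mean(values)
median_raw = statistics.median(values)

# Natural-log transform; this requires every value to be strictly positive.
logged = [math.log(v) for v in values]
mean_log = statistics.mean(logged)

# Transform the log-scale mean back to the original units for comparison.
back_transformed = math.exp(mean_log)

print(f"raw mean = {mean_raw:.2f}, raw median = {median_raw:.2f}")
print(f"mean taken on the log scale, back-transformed = {back_transformed:.2f}")
```

On the raw scale the two largest values pull the mean far above the median; on the log scale they carry much less weight.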

Bimodality
This may indicate the presence of two subpopulations with different
characteristics. If the subpopulations can be identified it might be better to
analyze them separately.

Outliers
The data appear to follow a pattern with the exception of one or two values. You
need to decide whether the strange values are simply mistakes, are to be
expected, or are correct but unexpected.

Population and Sample


In statistics, we often rely on a sample (that is, a small subset of a larger set
of data) to draw inferences about the larger set. The larger set is known as
the population from which the sample is drawn.
 Population - the entire set of individuals or objects of interest, or the
measurements obtained from all individuals or objects of interest
 Sample - a portion, or part, of the population of interest
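
The relationship between the two can be sketched in Python. The population below is invented purely for illustration, and simple random sampling without replacement is an assumption; the notes do not prescribe a sampling method.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result on each run

# Hypothetical population: the ages of 1,000 individuals of interest.
population = [random.randint(18, 65) for _ in range(1000)]

# Sample: a portion of the population, drawn at random without replacement.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean is used as an estimate of the population mean; the two are usually close but not identical.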

Categories of Variables: Independent and Dependent Variables


Variables are properties or characteristics of some event, object, or person that
can take on different values or amounts (as opposed to constants such as π that
do not vary).
Types of variables

STATISTICS
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions. Put another way,
“statistics” refers to a range of techniques and procedures for analyzing,
interpreting, displaying, and making decisions based on data.

There are two types of Statistics:


Descriptive Statistics and Inferential Statistics
Descriptive statistics are numbers that are used to summarize and describe
data. They are purely descriptive: they do not involve generalizing
beyond the data at hand. Researchers use descriptive statistics to
report on populations and samples.
Generalizing from our data to another set of cases is the business of inferential
statistics.
DESCRIPTIVE STATISTICS
 Central Tendency (or Groups’ “Middle Values”)
     Mean
     Median
     Mode
 Variation (or Summary of Differences Within Groups)
     Range
     Interquartile Range
     Variance
     Standard Deviation
Measures of Central Tendency
 Central tendency is the middle point of a distribution.
 Measures of central tendency are also called measures of location.
The three measures of central tendency are:
 Mean
 Median
 Mode
Measures of Dispersion
 Range
 Semi-Interquartile Range
 Mean Absolute Deviation
 Variance and Standard Deviation
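
The measures listed above can be computed with Python's standard library. This is a sketch under two assumptions that the notes do not state: the twenty-five ages are treated as a whole population (so the population forms of variance and standard deviation are used), and quartiles follow the default method of `statistics.quantiles`. Other quartile conventions can give slightly different values.

```python
import statistics

# Ages of the twenty-five students of Class B from the frequency example.
ages = [21, 22, 23, 24, 23, 24, 23, 23, 24, 23, 25, 22, 25,
        21, 21, 25, 26, 26, 27, 28, 26, 27, 27, 22, 22]

mean_age = statistics.mean(ages)

# Range: largest value minus smallest value.
data_range = max(ages) - min(ages)

# Semi-interquartile range: half the distance between the first and
# third quartiles.
q1, _, q3 = statistics.quantiles(ages, n=4)
semi_iqr = (q3 - q1) / 2

# Mean absolute deviation: average distance of each value from the mean.
mean_abs_dev = sum(abs(x - mean_age) for x in ages) / len(ages)

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(ages)
std_dev = statistics.pstdev(ages)

print(data_range, semi_iqr, mean_abs_dev, variance, round(std_dev, 3))
```

If the ages were instead a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.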

Arithmetic Mean: Properties


 Every set of interval-level and ratio-level data has a mean.
 All the values are included in computing the mean.
 A set of data has a unique mean.
 The mean is affected by unusually large or small data values.
 The arithmetic mean is the only measure of central tendency where the
sum of the deviations of each value from the mean is zero.
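
Two of these properties are easy to check numerically. The sketch below uses a small made-up data set; the extreme value of 90 is hypothetical and is added only to show its effect on the mean.

```python
import statistics

# Hypothetical data set, chosen so that the mean is a whole number.
values = [21, 22, 23, 24, 30]

mean_value = statistics.mean(values)

# Deviations of each value from the mean; their sum is zero.
deviation_sum = sum(x - mean_value for x in values)

# Adding one unusually large value shifts the mean noticeably.
with_outlier = values + [90]
mean_with_outlier = statistics.mean(with_outlier)

print(f"mean = {mean_value}, sum of deviations = {deviation_sum}")
print(f"mean after adding 90 = {mean_with_outlier}")
```

With non-integer data the sum of deviations may come out as a tiny non-zero number because of floating-point rounding.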

Arithmetic Mean: Advantages


1. It is rigidly defined
2. It is based on all the values in the distribution
3. It is least affected by fluctuations of sampling
4. It is easily calculated
5. It is most easily understood
6. It is most amenable to algebraic treatment

Arithmetic Mean: Disadvantages


1. Means can be badly affected by outliers (data points with extreme values
unlike the rest)
2. Outliers can make the mean a bad measure of central tendency or
common experience.
3. The mean cannot be computed for distributions with open-ended classes.

Median: Properties
1. It is an average of position
2. It is affected more by the number of items than by the extreme values
3. The sum of deviations about the median, signs ignored, is never greater
than the sum of deviations taken from any other point
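
The third property can be checked by brute force. The sketch below uses a made-up data set with one extreme value and compares the sum of absolute deviations about the median, about the mean, and about every whole-number point in between.

```python
import statistics

# Hypothetical data set with one extreme value.
values = [2, 3, 5, 9, 41]


def sum_abs_dev(point):
    """Sum of deviations from `point`, signs ignored."""
    return sum(abs(v - point) for v in values)


median_value = statistics.median(values)
mean_value = statistics.mean(values)

at_median = sum_abs_dev(median_value)
at_mean = sum_abs_dev(mean_value)

# Search every whole-number point between the smallest and largest value.
best_point = min(range(min(values), max(values) + 1), key=sum_abs_dev)

print(f"about the median ({median_value}): {at_median}")
print(f"about the mean ({mean_value}): {at_mean}")
print(f"point with the smallest sum: {best_point}")
```

The search lands on the median, and the extreme value (41) moves the mean but not the median.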

Median: Advantages
1. It is easily calculated and is not disturbed by the extreme values
2. It is more typical of the series
3. The median may be located even when the data are incomplete (e.g.,
when the class intervals are irregular and the final classes have open
ends)

Median: Disadvantages

1. The median is not as well suited to algebraic treatment as the arithmetic,
geometric, or harmonic mean
2. It is not as generally familiar as the arithmetic mean

Mode: Properties

1. It is not affected by the extreme values
2. It is the most typical value of the distribution

Mode: Advantages

1. Since it is the most typical value, it is the most important descriptive
average
2. It is easy to locate the approximate mode (but the determination of the true
mode requires extensive calculations)
3. Since the mode is usually an actual value, it indicates the precise value of
an important part of the series

Mode: Disadvantages

1. Unless the number of items is fairly large and the distribution reveals a
distinct central tendency, the mode has no significance
2. It is not capable of mathematical treatment
3. In a small number of items the mode may not exist
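
The last point can be seen with `statistics.multimode`, which returns every value that shares the highest frequency. Both data sets below are made up for illustration.

```python
import statistics

# Hypothetical data with a clear most-frequent value.
ages = [21, 22, 22, 23, 23, 23, 24]
modes = statistics.multimode(ages)

# Hypothetical small data set in which every value occurs exactly once.
tiny = [21, 22, 23]
tiny_modes = statistics.multimode(tiny)

# A distinct mode exists only when exactly one value has the top frequency.
has_distinct_mode = len(tiny_modes) == 1

print(f"modes of ages: {modes}")
print(f"modes of tiny: {tiny_modes}, distinct mode: {has_distinct_mode}")
```

When all values tie for the highest frequency, no single value stands out as the mode.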

Geometric Mean: Properties

1. It is a calculated value and depends upon the size of all the items
2. It gives less importance to extreme items than does the arithmetic mean
3. For any series of positive items it is never larger than the arithmetic
mean; the two are equal only when all the items are equal
4. It exists ordinarily only for positive values of the variate

Geometric Mean: Advantages

1. Since it is less affected by the extremes, it is a more typical average than
the arithmetic mean
2. Since it gives equal weight to equal ratios of change, it is particularly well
adapted when ratios of change are to be averaged
3. It is capable of algebraic treatment
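
Averaging ratios of change is the classic use. In the sketch below the three growth factors are hypothetical (a 10% rise, a 50% rise, then a 20% fall), and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
import statistics

# Hypothetical growth factors for three successive periods.
growth_factors = [1.10, 1.50, 0.80]

geometric = statistics.geometric_mean(growth_factors)
arithmetic = statistics.mean(growth_factors)

# Overall change across all periods is the product of the factors.
total_growth = math.prod(growth_factors)
periods = len(growth_factors)

print(f"geometric mean  = {geometric:.4f}")
print(f"arithmetic mean = {arithmetic:.4f}")
print(f"total growth    = {total_growth:.4f}")
print(f"geometric mean compounded  = {geometric ** periods:.4f}")
print(f"arithmetic mean compounded = {arithmetic ** periods:.4f}")
```

Compounding the geometric mean over the three periods reproduces the total growth, whereas compounding the arithmetic mean overstates it.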

Geometric Mean: Disadvantages

1. Its computation is relatively difficult
2. It cannot be determined if there is any negative value in the distribution,
or where one of the items has a zero value
3. It is not a widely known average

Harmonic mean

1. It is difficult to compute
2. It is not easily understandable
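
These notes do not define the harmonic mean. Under the usual definition (the number of items divided by the sum of their reciprocals), the calculation can be sketched as follows; the two speeds are hypothetical.

```python
import statistics

# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

# Usual definition: number of items divided by the sum of reciprocals.
by_formula = len(speeds) / sum(1 / s for s in speeds)

# The standard library offers the same calculation.
by_library = statistics.harmonic_mean(speeds)

print(f"harmonic mean by formula = {by_formula:.2f}")
print(f"harmonic mean by library = {by_library:.2f}")
```

For two legs of equal distance the harmonic mean gives the overall average speed, which is lower than the arithmetic mean of 50.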
