Primary Data:
Statistical data, as we have seen, can be either primary or secondary. Primary data are
those which are collected for the first time and so are in crude (raw) form. Secondary data
are those which have already been collected.
Primary data are always collected from the source, either by the investigator personally
or through agents. There are different methods of collecting primary data, each with its
relative merits and demerits, and the investigator has to choose a particular method to
collect the information. The choice depends to a large extent on the preliminaries to data
collection. Some of the commonly used methods are discussed below.
1. Direct Personal Observation:
This is a very general method of collecting primary data. Here the investigator directly
contacts the informants, solicits their cooperation and enumerates the data. The
information is collected through direct personal interviews.

2. Indirect Oral Interviews:
This is an indirect method of collecting primary data. Here information is not collected
directly from the source but by interviewing persons closely connected with the problem.
This method is applied, for example, when investigating cases of theft or murder.
Information relating to a person's private life, or which the informant hesitates to
reveal, is better collected by this method.
3. Mailed Questionnaire Method:
This is a very commonly used method of collecting primary data. Here information is
collected through a questionnaire. A questionnaire is a document prepared by the
investigator containing a set of questions. These questions relate to the problem of
enquiry directly or indirectly. The questionnaires are mailed to the informants with a
formal request to answer the questions and send them back.

4. Schedule Method:
If the informants are largely uneducated or non-responsive, data cannot be collected by
the mailed questionnaire method. In such cases, the schedule method is used. Here the
questionnaires are sent through enumerators, who are persons appointed by the
investigator for the purpose. They meet the informants directly with the questionnaire,
explain the scope and objective of the enquiry, and solicit their cooperation. The
enumerators put the questions to the informants, record their answers in the
questionnaire and compile them. The success of this method depends on the sincerity and
efficiency of the enumerators, so an enumerator should be sweet-tempered, good-natured,
trained and well-behaved.
5. From Local Agents:
Sometimes primary data are collected from local agents or correspondents. These
agents are appointed by the sponsoring authorities. They are well versed in the
local conditions such as language, communication, food habits and traditions. Being on
the spot and well acquainted with the nature of the enquiry, they are capable of
furnishing reliable information.

Secondary Data:
Secondary data are second-hand information. Unlike primary data, they are not collected from the
source. In other words, secondary data are those which have already been collected, so they may be
relatively less accurate than primary data. Secondary data are generally used when the time available
for the enquiry is short and its accuracy can be compromised to some extent. Secondary data
can be collected from a number of sources, which can broadly be classified into two categories:
i) Published sources
ii) Unpublished sources
Published Sources:
Secondary data are mostly collected from published sources. Some important sources of published
data are the following:
1. Published reports of Central and State Governments and local bodies.
2. Statistical abstracts, census reports and other reports published by different ministries of the
Government.
3. Official publications of foreign governments.
4. Reports and publications of trade associations, chambers of commerce, financial institutions etc.
5. Journals, magazines and periodicals.
Unpublished Sources:
Statistical data can also be collected from various unpublished sources. Some of the important
unpublished sources from which secondary data can be collected are:
1. The research works carried out by scholars, teachers and professionals.
2. The records maintained by private firms and business enterprises. They may prefer not to publish
the information, regarding it as a business secret.
3. Records and statistics maintained by various departments and offices of the Central and State
Governments, Corporations, Undertakings etc.

Data Classification:
A well-planned data classification system makes essential data easy to find
and retrieve. This can be of particular importance for risk management, legal
discovery, and compliance. Written procedures and guidelines for data
classification should define what categories and criteria the organization will
use to classify data and specify the roles and responsibilities of employees
within the organization regarding data stewardship. Once a data classification
scheme has been created, two sets of standards should be addressed: security
standards that specify appropriate handling practices for each category, and
storage standards that define the data's lifecycle requirements.
To be effective, a classification scheme should be simple enough that all
employees can execute it properly. Here is an example of what a data
classification scheme might look like:

Category 4: Highly sensitive corporate and customer data that if disclosed
could put the organization at financial or legal risk.
Example: Employee social security numbers, customer credit card numbers
Category 3: Sensitive internal data that if disclosed could negatively affect
operations.
Example: Contracts with third-party suppliers, employee reviews
Category 2: Internal data that is not meant for public disclosure.
Example: Sales contest rules, organizational charts
Category 1: Data that may be freely disclosed to the public.
Example: Contact information, price lists
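As a minimal sketch, the four categories could be encoded as an ordered enumeration so that
handling rules can compare sensitivity levels. The names DataCategory, EXAMPLES and
may_disclose_publicly are hypothetical, and treating unknown items as the most restrictive
category is an assumption of this sketch, not something the scheme itself prescribes.

```python
from enum import IntEnum


class DataCategory(IntEnum):
    """Hypothetical encoding of the example scheme; higher means more sensitive."""
    PUBLIC = 1            # may be freely disclosed
    INTERNAL = 2          # not meant for public disclosure
    SENSITIVE = 3         # disclosure could negatively affect operations
    HIGHLY_SENSITIVE = 4  # disclosure could create financial or legal risk


# A few of the example items listed in the scheme, mapped to their category.
EXAMPLES = {
    "price list": DataCategory.PUBLIC,
    "organizational chart": DataCategory.INTERNAL,
    "supplier contract": DataCategory.SENSITIVE,
    "customer credit card number": DataCategory.HIGHLY_SENSITIVE,
}


def may_disclose_publicly(item):
    """Return True only for items classified as Category 1.

    Unclassified items fall back to the most restrictive category
    (an assumption of this sketch).
    """
    category = EXAMPLES.get(item, DataCategory.HIGHLY_SENSITIVE)
    return category == DataCategory.PUBLIC
```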
Discrete data can only take particular values. There may potentially be an infinite number of
those values, but each is distinct and there's no grey area in between. Discrete data can be
numeric -- like numbers of apples -- but it can also be categorical -- like red or blue, or male
or female, or good or bad.
Continuous data are not restricted to defined separate values, but can occupy any value
over a continuous range. Between any two continuous data values there may be an infinite
number of others. Continuous data are always essentially numeric.

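An ogive is a curve showing the cumulative frequency distribution of a data set.

A histogram is a bar graph of a frequency distribution: adjacent bars are drawn over the
class intervals and, for classes of equal width, their heights show the class frequencies.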
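The relation between the two graphs can be sketched in a few lines: the class frequencies
give the bar heights of a histogram, and their running total gives the ordinates of a
"less than" ogive. The marks, the class boundaries and the function name frequency_table
are made up for illustration.

```python
from itertools import accumulate


def frequency_table(values, edges):
    """Count the values in each class [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            # The last class also includes its upper boundary.
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


marks = [12, 25, 37, 41, 8, 19, 33, 45, 27, 50]  # made-up sample
edges = [0, 10, 20, 30, 40, 50]                  # class boundaries

counts = frequency_table(marks, edges)           # histogram bar heights
cumulative = list(accumulate(counts))            # ogive ordinates
# Each ogive point pairs a class's upper boundary with its cumulative frequency.
ogive_points = list(zip(edges[1:], cumulative))
```

The last cumulative frequency always equals the number of observations, which is a quick
check on the table.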

The term correlation, with reference to two or more variables, signifies
that the variables are related in some way. Correlation analysis
determines whether a relationship between two variables exists, and
how strong that relationship is.

Regression analysis, on the other hand, uses the existing data to
determine a mathematical relationship between the variables, which
can then be used to estimate the value of the dependent variable for
any given value of the independent variable.
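As a rough illustration of the difference, the sketch below computes Pearson's correlation
coefficient (strength of a linear relationship) and fits a least-squares straight line
(the mathematical relationship used for estimation). Pearson's r and simple linear
regression are only one possible choice of method; the data and the names pearson_r and
fit_line are hypothetical.

```python
from math import sqrt


def _sums(x, y):
    """Return the means of x and y and the sums of products of deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation coefficient: between -1 and +1."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # independent variable (made-up data)
score = [52, 55, 61, 64, 68]   # dependent variable (made-up data)

r = pearson_r(hours, score)
slope, intercept = fit_line(hours, score)
estimate_at_6 = intercept + slope * 6   # estimated score for 6 hours
```

Here r is close to +1, indicating a strong positive linear relationship, and the fitted line
is then used to estimate the dependent variable at a value of the independent variable
that was not observed.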

Skewness
Skewness is a measure of the degree of asymmetry of a distribution.
1. Symmetric
Mean, median, and mode are all the same here; the distribution is mound shaped, and no skewness is
apparent. The distribution is described as symmetric.
2. Skewed Left
The mean typically lies to the left of the median, and the long tail is on the left.
3. Skewed Right
The mean typically lies to the right of the median, and the long tail is on the right.
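The three cases can be checked numerically. The sketch below uses the moment coefficient
of skewness (third central moment divided by the 1.5th power of the second), which is one
common definition and an assumption here, since the text does not name a formula. The
sample values are made up.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]          # mean 3, median 3
skewed_right = [1, 2, 2, 3, 10]      # mean 3.6, median 2, long tail on the right
skewed_left = [-10, -3, -2, -2, -1]  # mean -3.6, median -2, long tail on the left

# Zero for the symmetric sample, positive for right skew, negative for left skew.
results = [skewness(s) for s in (symmetric, skewed_right, skewed_left)]
```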

Both standard deviation and mean deviation are measures of variation (spread around a
central value such as the mean) in data.
Mean absolute deviation (MAD):
As the name suggests, it is the mean (average) of the absolute deviations of the data points
from the mean, i.e. we subtract the mean from each data point, take the absolute (non-negative)
value of each difference, sum these values and divide by the number of observations.
Note: The sum of deviations from the mean in any data series is zero, so we take absolute values.
Standard deviation:
It is the square root of the mean of the squared deviations of the data points from the mean,
i.e. we subtract the mean from each data point, square each difference (which is again
non-negative), sum the squares, divide by the number of observations and take the square
root. The quantity before taking the square root is the variance.
Standard deviation is the more commonly used measure of variation.
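A short worked example may help compare the two measures. The sketch below divides by the
number of observations, as described above (the population form); the data set and function
names are illustrative only. Note that the mean of the squared deviations is the variance,
and the standard deviation is its square root.

```python
from math import sqrt


def mean_absolute_deviation(data):
    """Mean of the absolute deviations from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def population_variance(data):
    """Mean of the squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def population_sd(data):
    """Standard deviation: square root of the variance."""
    return sqrt(population_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample with mean 5

# Signed deviations cancel out, which is why absolute or squared values are used.
signed_sum = sum(x - 5 for x in data)

mad = mean_absolute_deviation(data)   # 12 / 8
var = population_variance(data)       # 32 / 8
sd = population_sd(data)              # square root of the variance
```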

If excess kurtosis < 0, the distribution is platykurtic. Platykurtic distributions have a peak that
is lower than that of the Normal: the peak is flat and broad, and the tails are thin. Uniform
distributions are platykurtic.
A mesokurtic distribution has excess kurtosis = 0. The Gaussian (Normal) distribution, whatever its
parameters, is mesokurtic. The binomial with probability of success close to 1/2 is also considered
to be approximately mesokurtic when the number of trials is large.

If excess kurtosis > 0, the distribution is leptokurtic. Leptokurtic distributions have a high and
narrow peak and heavier tails than the Normal. A good example is the Student's t distribution.
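As a small numerical check, excess kurtosis can be computed from the second and fourth
central moments. The formula m4 / m2^2 - 3 (population moments) is the usual definition
and is assumed here; both samples are made up, one spread evenly and one concentrated at
the centre with two extreme values.

```python
def excess_kurtosis(data):
    """Fourth central moment over squared variance, minus 3 (the Normal's value)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))          # evenly spread values, like a uniform distribution
peaked = [0] * 8 + [-10, 10]       # most values at the centre, two far out in the tails

k_flat = excess_kurtosis(flat)     # negative: platykurtic
k_peaked = excess_kurtosis(peaked) # positive: leptokurtic
```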

The median is simply the point where 50% of the data is above and 50% is
below. It is an intuitive measure of central tendency that represents a "typical" or
"middle" value well.
Quartile Deviation (QD) is half the difference between the upper quartile (Q3) and the
lower quartile (Q1) of a distribution: QD = (Q3 - Q1) / 2.
Decile: Each of ten equal groups into which a population can be divided according to the
distribution of values of a particular variable.
Percentile: Each of the 100 equal groups into which a population can be divided according to
the distribution of values of a particular variable.
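These measures can be obtained with Python's standard statistics module, as sketched below.
The function statistics.quantiles returns the cut points that separate the equal groups
(3 for quartiles, 9 for deciles, 99 for percentiles). Several conventions exist for
interpolating quartiles; the "inclusive" method used here is an assumption of this sketch,
and the sample is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up sample

# Cut points; the "inclusive" interpolation rule is an assumption of this sketch.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")       # 9 cut points
percentiles = statistics.quantiles(data, n=100, method="inclusive")  # 99 cut points

median = statistics.median(data)   # same value as the second quartile q2
qd = (q3 - q1) / 2                 # quartile deviation
```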
