
Data

Data are numbers or observations, usually obtained by some process of counting or
measurement. Collectively they are the raw material of statistics. Examples are the
heights and weights of the students of a class. As another example, we can record
the number of telephones that several workers install on a given day, or that one
worker installs per day over a period of several days, and call the results our data.
A collection of data is called a data set, and a single observation is called a
data point.
Reasons for obtaining data
1. Data are needed to provide the necessary input to a survey.
2. Data are needed to provide the necessary input to a study.
3. Data are needed to measure performance of an ongoing service or production
process.
4. Data are needed to evaluate conformance to standards.
5. Data are needed to assist in formulating alternative courses of action in a
decision making process.
6. Data are needed to satisfy our curiosity.
Cross-sectional and time series data. Cross-sectional data are data collected at
the same or approximately the same point in time. Time series data are data
collected over several time periods.
Variable
Consider a man as an element of a population. He possesses certain characteristics
such as height, weight, age and hair color. Each of these characteristics varies from
man to man, either in magnitude or in quality, and is therefore called a variable.
There are three basic ways of classifying a data set: (i) by the number of variables
(univariate, bivariate or multivariate), (ii) by the kind of information (numbers or
categories) represented by each variable and (iii) by whether the data set is a time
sequence or comprises cross-sectional data (cross-sectional is just a fancy way of
saying that no time sequence is involved, e.g., the first-quarter 1996 earnings of eight
aerospace firms).
Univariate (one-variable) data sets have just one piece of information recorded for
each item. Example: only the heights of the students of a class.
Bivariate (two-variable) data sets have exactly two pieces of information recorded
for each item. Example: the heights and weights of the students of a class.
For bivariate data, in addition to looking at each variable as a univariate data set, you
can study the relationship between the two variables and predict one variable from
the other.

Multivariate (many-variable) data sets have three or more pieces of information
recorded for each item. Example: the heights, weights and forearm lengths of the
students of a class. With multivariate data too, you can look at each variable
individually, examine the relationships among the variables and predict one variable
from the others.
Types of Data
Variables may be either quantitative or qualitative. A quantitative variable can be
measured, while a qualitative variable can only be categorized. A characteristic used
to classify an individual into different categories is called an attribute. For example,
the height of a man, the yield of a crop and the price of a commodity are quantitative
variables, while hair color is a qualitative variable. Some categories of hair color are
black hair, golden hair and white hair.
A quantitative or measurable variable may be of two types: discrete and continuous.
Discrete variable
When a variable can assume only isolated values, it is called a discrete variable. For
example, if the number of children in a family is the variable of interest, it is obvious
that it cannot assume fractional values and it is a discrete variable.
Continuous variable
A continuous variable is one that can take any value within some range. The height
of a man is a continuous variable since it can take any value, which may be either an
integral number of inches or a fraction of an inch.
Another example of types of data

Data Type               Question Type                                          Response
Categorical             Do you currently own U.S. Government Savings Bonds?   Yes / No
Numerical, discrete     To how many magazines do you currently subscribe?     3 (number)
Numerical, continuous   How tall are you?                                      67 inches

Source: Berenson and Levine, Basic Business Statistics: Concepts and Applications,
seventh edition, 1999, Types of Data, page 25.
Levels of Measurement and Types of Measurement Scales (Source: Berenson
and Levine, page 27)
Statistical data may be broadly classified as categorical and numerical.
Categorical data are of two types, nominal and ordinal, while numerical data are
measured on an interval scale or a ratio scale.

Nominal data. Qualitative measurements whose categories have no inherent order are
nominal, regardless of whether the categories are designated by names (red, white,
male) or numerals (June 20, Room 10). Examples: religion, political affiliation,
urban-rural residence, etc.
Ordinal data. When there is an ordered relationship among the categories, the
variable is said to be an ordinal variable. For example, we can classify level of
knowledge as good, average and poor, or education level as illiterate, primary and
secondary.
Attributes. The distinct categories of a qualitative variable are sometimes called
attributes. A worker who is reported to be smoking is assigned to the category
smoker. His smoking behavior is used to classify him as a smoker, and thus it is an
attribute.
Interval Scale: Data generated through the measurement of an interval variable are
called interval data.
A thermometer, for example, measures temperature in degrees, which are the same
size at any point on the scale. The difference between 20 °C and 21 °C is the same
as the difference between 12 °C and 13 °C.
Other examples: I.Q. test scores, calendar time (3 p.m. to 6 p.m.), etc.
Ratio Scale: Ratio data have all the ordering and distance properties of interval
data. In addition, a zero point can be meaningfully designated. For example, it is
quite meaningful to say that a 4-foot-tall boy is twice as tall as a 2-foot-tall boy.
Other examples: height, weight, fat consumed (in g), distance (in km), etc.
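The difference between the two numerical scales can be checked with a little arithmetic. The following is a minimal sketch, not taken from the cited textbooks: the temperatures 10 and 20 degrees Celsius and the helper name celsius_to_fahrenheit are chosen here only for illustration. It shows that a ratio of temperatures changes with the unit (interval scale), while differences stay comparable and a ratio of heights does not depend on the unit (ratio scale).

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
c_low, c_high = 10.0, 20.0  # illustrative temperatures, not from the source
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Differences remain comparable: equal Celsius gaps give equal Fahrenheit gaps.
gap_a = celsius_to_fahrenheit(21) - celsius_to_fahrenheit(20)
gap_b = celsius_to_fahrenheit(13) - celsius_to_fahrenheit(12)

# Ratio scale: zero height means "no height", so ratios survive a unit change.
INCHES_PER_FOOT = 12
CM_PER_INCH = 2.54
tall_ft, short_ft = 4, 2  # the 4-foot and 2-foot boys from the text
ratio_feet = tall_ft / short_ft
ratio_cm = (tall_ft * INCHES_PER_FOOT * CM_PER_INCH) / (
    short_ft * INCHES_PER_FOOT * CM_PER_INCH
)

print("Temperature ratio in Celsius:   ", ratio_celsius)
print("Temperature ratio in Fahrenheit:", round(ratio_fahrenheit, 2))
print("Gaps in Fahrenheit:", round(gap_a, 1), "and", round(gap_b, 1))
print("Height ratio in feet:", ratio_feet, "and in cm:", round(ratio_cm, 2))
```

Saying that 20 °C is "twice as hot" as 10 °C is therefore not meaningful, while saying that one boy is twice as tall as another is.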
Example of nominal scaling

Categorical Variable          Categories
Automobile ownership          Yes, No
Political party affiliation   Democrat, Republican, Independent, Other

Example of Ordinal Data
Job classification such as president, vice-president, department head and
associate department head, recorded for each of a group of executives.
Example of ordinal scaling

Categorical Variable    Ordered Categories
Product satisfaction    Lowest-Highest: Very Unsatisfied, Fairly Unsatisfied, Neutral,
                        Fairly Satisfied, Very Satisfied
Faculty rank            Highest-Lowest: Professor, Associate Professor,
                        Assistant Professor, Lecturer

Numerical Variable                               Level of Measurement
Temperature (in degrees Celsius or Fahrenheit)   Interval
Calendar time (Gregorian, Hebrew or Islamic)     Interval
Height (in inches or centimeters)                Ratio
Weight (in pounds or kilograms)                  Ratio
Age (in years or days)                           Ratio
Salary (in American dollars or Japanese yen)     Ratio

Sources of data
Primary data. When the investigator collects first-hand data for the purpose at hand,
such data are known as primary data.
Secondary data. When the investigator obtains the data from published or
unpublished government, industrial or individual sources, such data constitute
secondary data.
Techniques of data collection. There are six important techniques of data collection,
namely (i) census, (ii) sample survey, (iii) focus group discussion, (iv) telephone
interview, (v) data collection through electronic media and (vi) a designed
experiment to obtain the necessary data.
Classification
Classification is the process of arranging individuals in groups or classes according
to their affinities.
Types of Classification.
Broadly, the data can be classified on the following four bases:
1. Geographical, i.e., area-wise, e.g., cities, districts, etc.
2. Chronological, i.e., on the basis of time.
3. Qualitative, i.e., according to some attributes.
4. Quantitative, i.e., in terms of magnitudes.

1. Geographical Classification.
In this type of classification, data are classified on the basis of geographical or
locational differences between the various items, such as states, cities, regions, zones,
areas, etc. For instance, the production of foodgrains in India may be presented
state-wise in the following manner:
State-wise Estimates of Production of Foodgrains: 1987-88

Name of State      Total Foodgrains (thousand tonnes)
Andhra Pradesh          9,690.5
Bihar                   9,074.5
Haryana                 6,301.9
Punjab                 17,065.4
Uttar Pradesh          28,095.7
All India            1,38,414.3

Geographical classifications are usually listed in alphabetical order for easy
reference.
2. Chronological classification. In this type, statistical data are classified according
to the time of occurrence, such as years, months, weeks, days, hours, etc. For
example, census data are reported every decade, national income is reported
every year, and departmental sales are reported every month or week.
Time series are also called chronological classifications. They are further classified
into data for a period of time and data at a point of time. Statistical data regarding
population, imports, exports, sales of a firm, etc., also come under this
classification.
Chronological classification is illustrated below:
Population of India from 1921 to 1981

Year    Population (in million)
1921    248
1931    276
1941    313
1951    357
1961    438
1971    536
1981    684

3. Qualitative classification. When the data are classified according to some
quality or attribute, such as sex, honesty, intelligence, literacy, blindness, colour,
deafness, religion, marital status, etc., the classification is termed qualitative or
descriptive. In this type we can only find out the presence or absence of the
attribute in the given units.
This again can be classified into two types:
(a) Simple classification.
(b) Manifold classification.
(a) Simple classification. If the data are classified into only two classes, such
as literate and illiterate, honest and dishonest, or skilled and unskilled, the
classification is termed simple classification. This classification is normally a
dichotomy, i.e., twofold; for example,

Population              Population
    Male                    Literate
    Female                  Illiterate

(b) Manifold classification. In manifold classification, the universe is classified
on the basis of more than one attribute at a time; for example, we may first
divide the population into males and females on the attribute of sex, then
further divide them on the basis of literacy, and so on:

Population
    Male
        Literate
            Married
            Unmarried
        Illiterate
            Married
            Unmarried
    Female
        Literate
            Married
            Unmarried
        Illiterate
            Married
            Unmarried
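
As a rough illustration of the two kinds of qualitative classification, the sketch below counts a handful of records first by one attribute (simple classification) and then by three attributes at once (manifold classification). The eight records are hypothetical and invented purely for this example; only the attribute names come from the text.

```python
from collections import Counter

# Hypothetical records, invented for illustration:
# (sex, literacy, marital status) for eight people.
people = [
    ("Male", "Literate", "Married"),
    ("Male", "Literate", "Unmarried"),
    ("Male", "Illiterate", "Married"),
    ("Female", "Literate", "Married"),
    ("Female", "Literate", "Married"),
    ("Female", "Illiterate", "Unmarried"),
    ("Female", "Literate", "Unmarried"),
    ("Male", "Literate", "Married"),
]

# Simple (twofold) classification: one attribute with two classes.
by_sex = Counter(sex for sex, _, _ in people)

# Manifold classification: all three attributes considered together.
by_all = Counter(people)

print("Population:", len(people))
for sex in ("Male", "Female"):
    print(f"  {sex}: {by_sex[sex]}")
    for literacy in ("Literate", "Illiterate"):
        for status in ("Married", "Unmarried"):
            n = by_all[(sex, literacy, status)]
            print(f"    {literacy}, {status}: {n}")
```

Each innermost line corresponds to one leaf of the classification tree, and the leaf counts add up to the population total.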

4. Quantitative Classification. Quantitative classification refers to the classification
of data according to some characteristic that can be measured, such as height,
weight, income, sales, profits, production, etc. For example, the students of a
college may be classified according to weight as follows:

Weight (in lbs)    No. of Students
90-100                  50
100-110                200
110-120                260
120-130                360
130-140                 90
140-150                 40
Total                1,000


Such a distribution is known as an empirical frequency distribution or simple
frequency distribution. Series that can be described by a continuous variable are
called continuous series. Series represented by a discrete variable are called
discrete series. The following are two examples of discrete and continuous
frequency distributions:
Examples of discrete and continuous frequency distributions:

(a) Discrete
No. of Children    No. of Families
0                       10
1                       40
2                       80
3                      100
4                      250
5                      150
6                       50
Total                  680

(b) Continuous
Weight (lbs.)      No. of Persons
100-110                 10
110-120                 15
120-130                 40
130-140                 45
140-150                 20
150-160                  4
Total                  134
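
A continuous frequency distribution like the one above is obtained by counting how many observations fall in each class interval. The sketch below shows one way this might be done. The twelve weights are hypothetical, and the rule that a class includes its lower limit but excludes its upper limit is an assumption made here, since the notes do not say how a value such as exactly 110 lbs is assigned.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class interval.

    Assumption for this sketch: each class includes its lower limit and
    excludes its upper limit, so 100 is counted in 100-110, not in 90-100.
    Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical weights in lbs, invented for illustration only.
weights = [92, 104, 118, 121, 125, 133, 109, 127, 141, 115, 99, 122]

table = frequency_distribution(weights, start=90, width=10, n_classes=6)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(count for _, _, count in table))
```

For a discrete series such as the number of children per family, no intervals are needed; counting the occurrences of each distinct value is enough.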

Tabulation of Data
Tabulation. By tabulation we mean a systematic presentation of numerical data in
columns and rows in accordance with some salient features or
characteristics. Columns are vertical arrangements and rows are horizontal
arrangements.
Objects
Tabulation helps in understanding complex numerical data by presenting them in a
simple and clear way, so that similar and dissimilar facts are separated.
Parts of Tabulation
A good statistical table is an art. The following parts must be present in all tables:
1. Table number
2. Title
3. Head-note
4. Caption
5. Stubs
6. Body of the table
7. Foot-note
8. Source-note


1. Table number. A table should always be numbered for identification and
future reference. Each column should also be numbered, as shown in the
illustration.
2. Title of the table. Each table should be given a suitable title, written at the
top of the table. It must describe the contents of the table and explain (1)
what the data are, (2) where the data are, (3) the time or period of the data and
(4) how the data are classified. Example: District-wise distribution of infant
mortality in Bangladesh in 2007.
3. Head-note. It is a statement given below the title and enclosed in brackets; for
example, the unit of measurement is written as a head-note, such as "in millions"
or "in crores".
4. Captions. These are the headings for the vertical columns. They must be brief and
self-explanatory. They have main headings and sub-headings and must be written
in small letters. Example: main heading: Population; sub-headings: Male and Female.
5. Stubs. These are the headings or designations for the horizontal rows. Stubs are
wider than columns. Example: Ages: 14-19, 19-24, etc.
6. Body of the table. It contains the numerical information. It is the most important
part of the table. The arrangement in the body is generally from left to right in
rows and from top to bottom in columns.
7. Foot-note. If any explanation or elaboration regarding any item is necessary,
foot-notes should be given. Example: a truncated class interval, i.e., every class
interval is 5 years but the last one is 3 years.
8. Source-note. It refers to the source from which the information has been taken. It
is useful for the reader to check the figures and gather additional information.
Example: Source: Pocket Statistical Year Book, 2007.

STRUCTURE OF A TABLE

Table Number
Title
(Head-note, if any)

Stub Heading  |            Caption            |  Total
              |  Col. Heading | Col. Heading  |
    (1)       |      (2)      |     (3)       |   (4)
--------------+---------------+---------------+--------
Stub entries  |             Body              |
Total         |                               |

Foot-note:
Source:
Raw Data
Information or observation before it is arranged and analyzed is called raw data. It is
raw because it is unprocessed by statistical methods.
Example of raw data
Yards produced yesterday by each of 30 carpet looms

16.2  15.4  16.0  16.6  15.9  15.8  16.0  16.8  16.9  16.8
15.7  16.4  15.2  15.8  15.9  16.1  15.6  15.9  15.6  16.0
16.4  15.8  15.7  16.2  15.6  15.9  16.3  16.3  16.0  16.3

Source: Levin and Rubin, Statistics for Management, seventh edition, 1997, page 8.
Data Array
A data array is one of the simplest ways to present data. It arranges values in
ascending or descending order.
The carpet data above, rearranged as a data array in ascending order, are as
follows (read down each column):

15.2  15.7  15.9  16.0  16.2  16.4
15.4  15.7  15.9  16.0  16.3  16.6
15.6  15.8  15.9  16.0  16.3  16.8
15.6  15.8  15.9  16.1  16.3  16.8
15.6  15.8  16.0  16.2  16.4  16.9

Ordered Array
If we place the raw data in order, from the smallest to the largest observation, the
ordered sequence obtained is called an ordered array.
Advantages of data arrays
1. We can quickly notice the lowest and highest values in the data.
2. We can easily divide the data into sections.
3. We can see whether any values appear more than once in the array.
4. We can observe the distance between succeeding values in the data.
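
The four advantages listed above can be tried out on the carpet loom data. This is a minimal sketch: the thirty values are those of the raw data example, while the choice of three equal sections and the variable names are assumptions made only for illustration.

```python
from collections import Counter

# Yards produced by each of the 30 carpet looms (raw data from the example).
yards = [
    16.2, 15.4, 16.0, 16.6, 15.9, 15.8, 16.0, 16.8, 16.9, 16.8,
    15.7, 16.4, 15.2, 15.8, 15.9, 16.1, 15.6, 15.9, 15.6, 16.0,
    16.4, 15.8, 15.7, 16.2, 15.6, 15.9, 16.3, 16.3, 16.0, 16.3,
]

# The data array: the same values in ascending order.
array = sorted(yards)

# 1. The lowest and highest values are simply the two ends of the array.
lowest, highest = array[0], array[-1]

# 2. Sections: here three sections of ten looms each (an arbitrary choice).
sections = [array[i:i + 10] for i in range(0, len(array), 10)]

# 3. Values that appear more than once, with how often they appear.
repeats = {value: n for value, n in Counter(array).items() if n > 1}

# 4. Distances between succeeding values (rounded to hide float noise).
gaps = [round(b - a, 1) for a, b in zip(array, array[1:])]

print("Lowest:", lowest, "Highest:", highest)
for number, section in enumerate(sections, start=1):
    print(f"Section {number}: {section[0]} to {section[-1]}")
print("Repeated values:", repeats)
print("Largest gap between neighbours:", max(gaps))
```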

Disadvantages of data arrays


In spite of these advantages, sometimes a data array is not helpful. Because it lists
every observation, it is a cumbersome form for displaying large quantities of data. We
need to compress the information and still be able to use it for interpretation and
decision making. One way to do this is the stem and leaf display.
The Stem and Leaf Display
The stem and leaf display is a valuable and versatile tool for organizing a set of data
and understanding how the values are distributed and clustered over the range of the
observations in the set of data. A stem and leaf display separates data entries into
leading digits, or stems, and trailing digits, or leaves.
How to construct a stem and leaf display
1. Define the stem and leaf you wish to use. You will probably wish to choose the
stem so that the number of possible stems in the display is not too large.
2. Write the stems in a column, from the smallest stem at the top to the largest at
the bottom.
3. Record the leaf for each observation in the row corresponding to its stem.
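The three steps above can be sketched in a few lines of Python. This is an illustrative sketch, not a procedure from the cited textbooks: the function name stem_and_leaf and its parameters leaf_unit and ordered are hypothetical, and the function assumes non-negative values whose last kept digit is the leaf. The measurements used are the twenty values of Example 1 below, read row by row.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1, ordered=True):
    """Return a dict mapping each stem to its list of leaves.

    Step 1: each value is expressed in units of leaf_unit; its last digit
            is the leaf and the remaining leading digits are the stem.
    Step 2: stems run from the smallest to the largest, without gaps.
    Step 3: every leaf is recorded in the row of its stem.
    Assumes non-negative values (a simplification for this sketch).
    """
    rows = defaultdict(list)
    for value in values:
        scaled = round(value / leaf_unit)
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    stems = range(min(rows), max(rows) + 1)
    return {
        s: (sorted(rows.get(s, [])) if ordered else list(rows.get(s, [])))
        for s in stems
    }


measurements = [
    0.02, 0.08, 0.14, 0.32, 0.27, 0.08, 0.01, 0.11,
    0.20, 0.06, 0.01, 0.03, 0.12, 0.22, 0.42, 0.33,
    0.18, 0.09, 0.02, 0.07,
]

display = stem_and_leaf(measurements, leaf_unit=0.01, ordered=False)
for stem, leaves in display.items():
    # A stem of 0 stands for .0, a stem of 1 for .1, and so on.
    print(f".{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Passing ordered=True instead sorts the leaves within each row, which gives the ordered form of the display.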
Advantages
1. It gives us a good graphical picture of a small data set.
2. If we want to recover the original data from a stem and leaf display, we can
readily reconstruct the values of the observations by recombining the leaves
with the stems.
3. The construction of the stem and leaf display automatically arranges the
observations in ordered sets. This makes it easy to arrange the observations
from smallest to largest and, for example, to find the observation in the middle
of this ordered arrangement.
Disadvantages
One disadvantage of the stem and leaf display is that it is awkward to control the
number of stems. A more obvious disadvantage is that the stem and leaf
display is unsuitable when the number of observations in the data set is large,
because the number of leaves in the stem rows becomes too large.
Source: Statistics for Business and Economics, page 29, and
Basic Business Statistics: Concepts and Applications, page 55.
Example 1
Construct a stem and leaf display for the following measurements:
.02  .08  .14  .32  .27  .08  .01  .11
.20  .06  .01  .03  .12  .22  .42  .33
.18  .09  .02  .07


Solution
To construct a stem and leaf display, first we define the stem and leaf. For the stem,
use the first digit after the decimal point, thus giving stems .0, .1, .2, .3 and .4. The
leaf is the second digit after the decimal point. We then write the stems in a column
and record the leaf for each observation.

Stem | Leaf
 .0  | 2 8 8 1 6 1 3 9 2 7
 .1  | 4 1 2 8
 .2  | 7 0 2
 .3  | 2 3
 .4  | 2

Example 2
The table contains the price-earnings (P/E) ratios for samples of firms from the
electronics industry and the auto parts industry.
Auto Parts                          Electronics
Firm                   P/E ratio    Firm                   P/E ratio
Lear Siegler              11        AMP                       28
Purolator                 15        Raytheon                  13
Easco                     14        General Instrument        14
Genuine Parts             15        Intel                     55
Federal Mogul             12        Avnet                     27
PPG Industries            12        Perkin Elmer              24
AO Smith                  35        TRW                       15
Borg-Warner               12        Motorola                  26
Hoover Universal          12        Hewlett-Packard           22
Libbey-Owens-Ford         23        Honeywell                 12
Dana                      23        American District         11
Champion Spark Plug       18        Corning Glass Works       15
Dayco                     39        Gould                     18
Sheller-Globe             15        EG&G                      22
Arvin Industries          16        Varian Associates         26

a. Construct a stem and leaf display for each of these data sets.
b. What do your stem and leaf displays suggest about the level of the P/E
ratios of firms in the electronics industry as compared to firms in the
auto parts industry? Explain.
Auto Parts
Stem | Leaf
  1  | 1 5 4 5 2 2 2 2 8 5 6
  2  | 3 3
  3  | 5 9

Electronics
Stem | Leaf
  1  | 3 4 5 2 1 5 8
  2  | 8 7 4 6 2 2 6
  3  |
  4  |
  5  | 5

From the stem and leaf charts, we can see that for auto parts the P/E ratio is usually
below 20, while for electronics it is usually below 30.
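
The reading of the two displays can be backed up by simple counts. The sketch below uses the P/E ratios from the table of Example 2; the helper name count_below and the use of the median as a summary are choices made here for illustration, not part of the original solution.

```python
import statistics

# P/E ratios from the table in Example 2.
auto_parts = [11, 15, 14, 15, 12, 12, 35, 12, 12, 23, 23, 18, 39, 15, 16]
electronics = [28, 13, 14, 55, 27, 24, 15, 26, 22, 12, 11, 15, 18, 22, 26]


def count_below(values, limit):
    """Number of values strictly below the given limit."""
    return sum(value < limit for value in values)


for name, ratios in (("Auto parts", auto_parts), ("Electronics", electronics)):
    print(name)
    print("  firms:", len(ratios))
    print("  below 20:", count_below(ratios, 20))
    print("  below 30:", count_below(ratios, 30))
    print("  median P/E:", statistics.median(ratios))
```

Most auto parts firms fall in the first stem (below 20), while the electronics firms are split between the first and second stems, so the electronics P/E ratios tend to be higher.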
Levin and Rubin, page 132
Table 3.22: Grades on midterm quiz of 27 students

79  78  78  67  76  87  85  73  66
99  84  72  66  57  94  84  72  63
51  48  50  61  71  82  93  100  89

To produce a stem and leaf display for the data in Table 3.22, we make a vertical list
of stems (the leading digits of each data item) like this:
 4
 5
 6
 7
 8
 9
10
Then we draw a vertical line to the right of these stems, and list the leaves (the next
digit for all the stems) to the right of the line in the order that we encountered them in
the original data set:

 4 | 8
 5 | 7 1 0
 6 | 7 6 6 3 1
 7 | 9 8 8 6 3 2 2 1
 8 | 7 5 4 4 2 9
 9 | 9 4 3
10 | 0


Finally, we arrange all of the leaves in each row in rank order.

 4 | 8
 5 | 0 1 7
 6 | 1 3 6 6 7
 7 | 1 2 2 3 6 8 8 9
 8 | 2 4 4 5 7 9
 9 | 3 4 9
10 | 0


If we pick the row 9 | 3 4 9, it means there are three items in the data set that
begin with nine (93, 94 and 99).
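
Reading a row back into data values is just a matter of recombining each leaf with its stem. The short sketch below illustrates this for the ordered display of the quiz grades; the function name recover_values is hypothetical, and the display is typed in by hand from the rows of the ordered display.

```python
# Ordered stem and leaf display of the 27 quiz grades (stem -> leaves).
display = {
    4: [8],
    5: [0, 1, 7],
    6: [1, 3, 6, 6, 7],
    7: [1, 2, 2, 3, 6, 8, 8, 9],
    8: [2, 4, 4, 5, 7, 9],
    9: [3, 4, 9],
    10: [0],
}


def recover_values(rows):
    """Rebuild the observations: value = stem * 10 + leaf."""
    return [stem * 10 + leaf for stem in sorted(rows) for leaf in rows[stem]]


grades = recover_values(display)

# Because the leaves are in rank order, the recovered list is already sorted,
# so the middle observation can be read off directly.
middle = grades[len(grades) // 2]

print("Row 9 gives:", recover_values({9: display[9]}))
print("Number of grades:", len(grades))
print("Middle grade:", middle)
```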
