
Introduction to Biostatistics

Statistics is the scientific field that deals with the collection, classification,
presentation, description, analysis and interpretation of data.

It includes:

Descriptive statistics:
which is concerned with the summary measures of data for a sample of a population.

Analytic statistics:
is concerned with using data from a sample to draw conclusions about the population.


Vital statistics:
is the ongoing collection, by government agencies, of data relating to
events such as births, deaths, marriages and divorces, as well as health- and
disease-related conditions reportable by local health authorities.


Biostatistics is the application of statistical procedures in the fields of the biological sciences and medicine.

Uses in Medicine:

Statistics is necessary for both clinical and preventive medicine.

Physicians need basic statistical knowledge to evaluate and critically appraise
research published in medical journals.

Statistics helps in assessing diagnostic tests and the effects of new drugs and
treatment modalities.

Epidemiologists need to know how to calculate rates, to compare between


Data are the basic building blocks of statistics and refer to the individual values
measured or observed. Data can be derived from a total population or from a sample.

Methods of collection of data

1. By conducting a survey:

Data are collected from the population in the field of the study using a designed
questionnaire. There are two types of surveys:
a) Comprehensive surveys:

Data are collected from every member of the population (the total population present
in the field of the study).

Requires a great deal of time, effort and money.

Used only in a census.

b) Sample survey:

Data are collected from a representative sample, and the results can be generalized
to the total population.

Commonly used for data collection since it requires less effort, time and money.

2. Data collected from records:

The data already exist. Sources of these data include:
Population census.
Hospital records.
School health records.
Vital statistics; these are published yearly and contain data about births,
deaths and morbidity.
Textbooks and scientific journals.

Types of data:

a) Constant data:

These are observations which do not vary from one person to another, such as the
number of eyes, fingers or ears.

b) Variables:

These are observations which vary from one person to another or from one
group to another. They are:

1. Quantitative variables:
These may be continuous or discrete.

a- Continuous quantitative variables:

These are obtained by measurement, and their values can be integers or fractions.

Examples: weight, height, hemoglobin, age, income, volume of urine.

b- Discrete (discontinuous) quantitative variables:

These are obtained by counting (enumeration), and their values are always integers.
Examples: pulse, family size, number of live births, number of abortions.
2. Qualitative variables:

These are expressed as qualities; they cannot be counted or measured, only
categorized.

They can be ordinal or nominal.

a- Ordinal qualitative:

Can be put in order, e.g. degree of success: excellent, very good, good, fair.
b- Nominal qualitative:

Cannot be put in order, and is further subdivided into:

1. Dichotomous (e.g. sex and Yes/No variables).

2. Multichotomous (e.g. marital status, blood groups).

Methods of presentation:

Numerical presentation:


Graphical presentation

1- Numerical presentation:

a) Simple numerical presentation (ungrouped or unclassified data):

This method is used when we are dealing with a small number of observations (5, 7 or 10
observations). Examples: the weights of five infants are 8, 6, 9, 5 and 7 kg.
The heights of six students are 148, 161, 162, 170, 172 and 168 cm.

b) Tabular presentation (grouped data):

A table is the best and most convenient method for summarizing a large mass of data.

Types of tables:

1. Simple frequency distribution table

2. Table of an association or contingency table

3. Comparing frequency distribution table

4. Two way table or two way classification

1-Simple frequency distribution table:

Used when the variable is either qualitative or quantitative.
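A simple frequency distribution table can be sketched in Python by counting how often each value or category occurs and expressing each count as a percentage of the total. The blood-group observations below are hypothetical, chosen only for illustration, and `frequency_table` is an illustrative helper name, not a standard function.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, percent) rows, sorted by value."""
    counts = Counter(observations)
    total = sum(counts.values())
    return [(value, freq, 100 * freq / total) for value, freq in sorted(counts.items())]

# Hypothetical blood groups of 10 patients (illustrative data only)
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A"]
for value, freq, pct in frequency_table(blood_groups):
    print(f"{value:>3}  {freq:>3}  {pct:5.1f}%")
```

For an ordinal variable, the rows would normally be listed in their natural order (e.g. excellent, very good, good, fair) rather than sorted alphabetically.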

2-Table of an association or contingency table:

It is used to show the relation between a condition and a characteristic, e.g. the relation
between smoking and lung cancer.
a) Two by two table:

i.e. two columns by two rows.

b) r × c table:

Generally used for contingency tables other than 2 × 2 tables (r rows by c columns).
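A contingency table is a cross-tabulation: each record contributes one count to the cell for its (row, column) combination. The sketch below assumes hypothetical smoking and lung-cancer records, for illustration only; `contingency_table` is an illustrative helper, and the same function handles r × c tables by passing more levels.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row, column) pairs into rows of counts."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical (smoking status, lung cancer) records, illustrative only
records = [("smoker", "cancer"), ("smoker", "no cancer"), ("smoker", "cancer"),
           ("non-smoker", "no cancer"), ("non-smoker", "no cancer"),
           ("non-smoker", "cancer"), ("smoker", "no cancer"), ("non-smoker", "no cancer")]
rows = ["smoker", "non-smoker"]
cols = ["cancer", "no cancer"]
table = contingency_table(records, rows, cols)   # a 2 x 2 table

for label, row in zip(rows, table):
    print(f"{label:<11}", row)
```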

3-Comparing frequency distribution table:

Distribution of two (or more) different groups according to one variable.

N.B.: For a direct comparison, the groups should have the same total frequency;
otherwise, calculate each frequency as a percentage of its group total.
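The percentage conversion in the N.B. can be sketched as follows. The two groups and their counts are hypothetical, chosen so that the group totals differ (50 vs 200) while the percentages remain directly comparable; `percent_of_total` is an illustrative helper name.

```python
def percent_of_total(frequencies):
    """Express each frequency as a percentage of the group total (1 decimal place)."""
    total = sum(frequencies)
    return [round(100 * f / total, 1) for f in frequencies]

# Hypothetical counts in three age groups for two groups of different sizes
group_a = [10, 30, 10]    # total 50
group_b = [40, 100, 60]   # total 200
print(percent_of_total(group_a))  # [20.0, 60.0, 20.0]
print(percent_of_total(group_b))  # [20.0, 50.0, 30.0]
```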

4-Two way table or two way classification:

One group is classified according to two variables, e.g. weight and height, or age
and blood pressure, to find any correlation between these two variables.

2- Graphical presentation:

1. The line graph

2. The bar chart

3. The histogram

4. The frequency polygon

5. Pie Chart

1- The line graph:

Used to compare information over time
(the time variable is a special type of continuous quantitative variable).

2- The bar chart:

Represents data of the two subtypes of qualitative variables (ordinal and nominal)
and of discrete quantitative variables.

3- The histogram:

Suitable for a continuous quantitative variable. It is used only when the table is a
simple frequency distribution.

4- The frequency polygon:

This type is used when the variable is continuous quantitative and the
table is of the simple or complex type.

Each interval in the table is represented by a single point opposite its frequency
on Y axis and opposite the mid-point of the interval on X axis. Then every two
consecutive points are connected by a straight line.
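The points of a frequency polygon can be computed as (midpoint, frequency) pairs, one per interval, as described above. The weight intervals and frequencies below are hypothetical, and `polygon_points` is an illustrative helper name.

```python
def polygon_points(intervals, frequencies):
    """Return (midpoint on X axis, frequency on Y axis) for each interval."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(intervals, frequencies)]

# Hypothetical weight intervals (kg) and their frequencies
intervals = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 5, 2, 1]
points = polygon_points(intervals, frequencies)
print(points)  # [(45.0, 2), (55.0, 5), (65.0, 2), (75.0, 1)]
```

Joining consecutive points with straight lines could then be done with a plotting library, for example `matplotlib.pyplot.plot(xs, ys, marker="o")` with the x and y values unpacked from `points`.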

5- Pie chart:

It can be used for all four types of variables when they are presented in a simple
frequency distribution table.

The circle is divided into a number of sectors equal to the number of
categories or intervals in the table; the division of the circle usually starts at 12
o'clock and proceeds clockwise.

Each sector is proportional to the frequency of its category. This is determined
by calculating the angle of each sector:

$$\text{Angle} = \frac{\text{frequency of category or interval}}{\text{total frequency}} \times 360^\circ$$
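The sector-angle formula can be checked with a short sketch; the four category frequencies are hypothetical, and `sector_angles` is an illustrative helper name. The angles always add up to 360 degrees.

```python
def sector_angles(frequencies):
    """Angle of each pie sector in degrees: frequency / total frequency x 360."""
    total = sum(frequencies)
    return [f * 360 / total for f in frequencies]

# Hypothetical frequencies of four categories (total = 100)
angles = sector_angles([10, 20, 30, 40])
print(angles)       # [36.0, 72.0, 108.0, 144.0]
print(sum(angles))  # 360.0
```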



Summary measures of data are subdivided into two types:

A) Measures of central tendency or averages.

B) Measures of dispersion
A) Measures of central tendency:

These are computed values around which most of the observations tend to
concentrate. They are:

1. The arithmetic mean.

2. The median.
3. The mode.

1. The arithmetic mean:

Arithmetic mean = Sum of all observations /Number of observations
Computation of arithmetic mean from ungrouped data:

The formula for computation of the arithmetic mean for ungrouped data is:

$$\bar{X} = \frac{\sum X}{n}$$
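The ungrouped formula translates directly into code. This sketch reuses the weights of the five infants from the simple numerical presentation example; `arithmetic_mean` is an illustrative name, and the standard library's `statistics.mean` gives the same value.

```python
def arithmetic_mean(observations):
    """Sum of all observations divided by the number of observations."""
    return sum(observations) / len(observations)

weights_kg = [8, 6, 9, 5, 7]  # weights of five infants from the earlier example
print(arithmetic_mean(weights_kg))  # 7.0
```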
Advantages of arithmetic mean:

1. It takes all observations into consideration.

2. It is the best average for quantitative data to be used in statistical analysis.

Disadvantages of arithmetic mean:

1. It cannot be used with qualitative variables.

2. It is affected by extreme observations.

3. It cannot be used for open-ended tables.
Computation of arithmetic mean from grouped data (weighted mean):

1. Find the midpoint (X) of each interval:
$$X = \frac{\text{lower limit} + \text{upper limit}}{2}$$

2. Multiply the frequency (f) by the midpoint (X) for each interval.

3. Find the sum of these products, $\sum fX$.

4. Find the arithmetic mean:
$$\bar{X} = \frac{\sum fX}{\sum f}$$
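The four steps above can be sketched as one small function. The interval table is hypothetical (midpoints 45, 55, 65 and 75 kg with frequencies 2, 5, 2 and 1), and `grouped_mean` is an illustrative name.

```python
def grouped_mean(intervals, frequencies):
    """Weighted mean of a grouped frequency table using interval midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]   # step 1
    products = [f * x for f, x in zip(frequencies, midpoints)]        # step 2
    return sum(products) / sum(frequencies)                           # steps 3 and 4

# Hypothetical weight table (kg): sum fX = 570, sum f = 10
intervals = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 5, 2, 1]
print(grouped_mean(intervals, frequencies))  # 57.0
```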

2-The median:

The median is the value that lies in the middle of the ordered observations.
Computation of the median from ungrouped data:
A) When n is odd:
The steps are:

1. Observations are arranged in ascending or descending order of magnitude.

2. Determine the rank of the median, given by $\frac{n+1}{2}$.
3. Using the obtained rank, refer back to the ordered observations to find the value of the median.
B) When n is even:
The steps are:

1. Observations are arranged in ascending or descending order of magnitude.

2. Determine the ranks of the two middle observations, given by
$\frac{n}{2}$ and $\frac{n}{2} + 1$.

3. Refer back to the ordered observations and, using the obtained ranks, determine
the two middle values.

4. $\text{Median} = \frac{\text{sum of the two middle values}}{2}$
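Both cases (n odd and n even) can be sketched with the rank rules above, converting the 1-based ranks to Python's 0-based list indices. The data are the infants' weights and students' heights from the simple numerical presentation example; `median` here is an illustrative function, not `statistics.median` (which gives the same results).

```python
def median(observations):
    """Median via ranks: (n + 1)/2 for odd n, n/2 and n/2 + 1 for even n."""
    ordered = sorted(observations)            # step 1: order the observations
    n = len(ordered)
    if n % 2 == 1:
        rank = (n + 1) // 2                   # 1-based rank of the median
        return ordered[rank - 1]
    rank_low, rank_high = n // 2, n // 2 + 1  # ranks of the two middle values
    return (ordered[rank_low - 1] + ordered[rank_high - 1]) / 2

print(median([8, 6, 9, 5, 7]))                 # 7
print(median([148, 161, 162, 170, 172, 168]))  # 165.0, i.e. (162 + 168) / 2
```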

Advantages of the median:

1. It can be used with quantitative and qualitative ordinal variables.

2. It can be used in open-ended tables.
3. It is useful for summarizing data with extreme values, as it is not affected by
extreme values. For example, if the lengths of hospital stay (in days) for five
patients are 1, 2, 4, 3 and 5, the median is 3 days and the mean is
(1 + 2 + 3 + 4 + 5)/5 = 3 days.

If the length of stay of one patient is 150 days instead of 5, the median is
still 3 days, while the mean becomes (1 + 2 + 3 + 4 + 150)/5 = 32 days.
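The hospital-stay example can be reproduced with the standard library's `statistics.mean` and `statistics.median`, showing that one extreme value moves the mean but not the median.

```python
from statistics import mean, median

stay_days = [1, 2, 4, 3, 5]
with_outlier = [1, 2, 4, 3, 150]  # one patient stays 150 days instead of 5

print(median(stay_days), mean(stay_days))        # median 3, mean 3
print(median(with_outlier), mean(with_outlier))  # median 3, mean 32
```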
Disadvantages of the median:

1. It cannot be used with qualitative nominal variables.

2. It is not easy to use in statistical analysis.

3-The mode:
The mode is the observation with the highest frequency, i.e. the most
frequent observation.

Determination of the mode from ungrouped data:

This is done by finding the observation which has the highest frequency.
e.g. the weights of five children are 9, 8, 12, 7 and 8 kg.

Eight is the observation with the highest frequency.

The mode = 8 kg.

A similar procedure can be used for finding the mode of qualitative data.
Determination of mode from grouped data:
Two methods can be used:

a) The modal interval: the interval opposite the highest frequency.
b) The midpoint of the modal interval (used only for quantitative data): in this
method the modal interval is determined as before, and the midpoint is obtained
as follows:

$$\text{Midpoint} = \frac{\text{lower limit of modal interval} + \text{upper limit of modal interval}}{2}$$
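Both grouped-data methods can be sketched together: find the interval with the highest frequency, then take its midpoint. The table is hypothetical, and `modal_interval_midpoint` is an illustrative name; as an assumption of this sketch, a tie between intervals returns the first one.

```python
def modal_interval_midpoint(intervals, frequencies):
    """Return the modal interval and its midpoint.

    If several intervals share the highest frequency, the first one is returned.
    """
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    lower, upper = intervals[i]
    return intervals[i], (lower + upper) / 2

# Hypothetical weight table (kg)
intervals = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 5, 2, 1]
print(modal_interval_midpoint(intervals, frequencies))  # ((50, 60), 55.0)
```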

Advantages of the mode:

1- It can be used with all types of variables.

2- It is not affected by extreme or outlying observations.
3- It can be used to determine the average from an open-ended table.
Disadvantages of the mode:

1- Sometimes the mode cannot be determined; this happens when all
observations have the same frequency (i.e. a uniform distribution).

2- Sometimes we may obtain two modes (bimodal) or more (multimodal) from
the same group of data.

e.g. 22, 24, 26, 28, 24, 26

Mode = 24 & 26
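A sketch of mode determination for ungrouped data that follows the rules above: it returns every value with the highest frequency (so bimodal and multimodal data give several modes) and returns an empty list when all values occur equally often. The standard library's `statistics.multimode` instead returns every value in that uniform case, which is why the sketch handles it explicitly; `modes` is an illustrative name.

```python
from collections import Counter

def modes(observations):
    """Values with the highest frequency; [] if all values are equally frequent."""
    counts = Counter(observations)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # uniform frequencies: the mode cannot be determined
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([9, 8, 12, 7, 8]))          # [8]
print(modes([22, 24, 26, 28, 24, 26]))  # [24, 26] (bimodal)
print(modes([3, 5, 7]))                 # [] (every value occurs once)
```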

B) Measures of dispersion

1. Range:
It is a simple measure of dispersion; by definition, the range is the difference
between the largest and smallest observations.

From the above two examples, the range for the first group = 36 - 28 = 8 years and for the
second group = 62 - 8 = 54 years.

From a table, the range is calculated as follows:

Range = largest possible observation - smallest possible observation, i.e. the
upper limit of the last interval - the lower limit of the first interval.
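Both forms of the range can be sketched in a few lines: from raw observations, and from a grouped table given as (lower, upper) interval limits. The weights reuse the infants' example; the interval table is hypothetical, and both function names are illustrative.

```python
def value_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)

def table_range(intervals):
    """Upper limit of the last interval minus lower limit of the first interval."""
    return intervals[-1][1] - intervals[0][0]

print(value_range([8, 6, 9, 5, 7]))                           # 4
print(table_range([(40, 50), (50, 60), (60, 70), (70, 80)]))  # 40
```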

2. Standard Deviation (S):

It is the most commonly used measure of dispersion and generally the best. It is defined
as the positive square root of the variance, and it measures the deviation of
observations from the arithmetic mean.

N.B.: If n is equal to or greater than 30, we divide by n instead of n - 1.
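A sketch of the standard deviation, assuming the usual sample formula $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, with the divisor switched to n when n is 30 or more as the N.B. states. The data are the infants' weights (mean 7 kg, sum of squared deviations 10), and `standard_deviation` is an illustrative name.

```python
import math

def standard_deviation(observations):
    """Square root of the variance, using the divisor rule from the N.B."""
    n = len(observations)
    mean = sum(observations) / n
    squared_deviations = sum((x - mean) ** 2 for x in observations)
    divisor = n if n >= 30 else n - 1  # n - 1 for small samples, n when n >= 30
    return math.sqrt(squared_deviations / divisor)

print(round(standard_deviation([8, 6, 9, 5, 7]), 3))  # 1.581, i.e. sqrt(10 / 4)
```

For reference, the standard library's `statistics.stdev` always divides by n - 1 and `statistics.pstdev` always divides by n, regardless of sample size.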

Normal Distribution:

The normal distribution, also called the Gaussian distribution, is a

theoretical, continuous, symmetrical, unimodal distribution of infinite


The normal distribution curve is bell-shaped, with lower and upper tails,
and is determined by the mean and the standard deviation of the

The mean, median and mode of a normally distributed population are equal.


The standard normal distribution curve has the following properties:

Mean = 0

Standard deviation = 1
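The properties of the standard normal curve can be checked with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose default parameters are mean 0 and standard deviation 1. The sketch shows symmetry around the mean and that half of the area lies below the mean, so the median coincides with the mean.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mu = 0.0, sigma = 1.0
print(standard.mean, standard.stdev)  # 0.0 1.0

# Symmetry: equal density at equal distances on either side of the mean
print(standard.pdf(-1.5) == standard.pdf(1.5))  # True

# Half of the area lies below the mean, so the median equals the mean
print(standard.cdf(0.0))  # 0.5
```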