Uploaded by tazeb Abebe

1.1. Introduction

1.1.1 Definitions and classification of Statistics

Definition

Statistics can be defined in two senses:

a) Statistics in its plural sense: Statistics refers to numerical facts, figures or quantitative information describing aspects of social and economic phenomena. In this sense statistics are the raw data themselves, such as statistics of births, statistics of deaths, statistics of imports and exports, etc.

b) Statistics in its singular sense: Statistics, as a branch of scientific method, deals with the planning and design of data collection, and with the organization, presentation, analysis and interpretation of data and the drawing of conclusions based on the data.

Classification

Statistics can be divided into two broad areas.

1. Descriptive Statistics is concerned with summarizing or describing important features of the available

data without going beyond the data themselves. It is concerned with summary calculations, graphs,

charts and tables.

2. Inferential statistics is a method used to generalize from a sample to a population. It involves the use of data from samples to make inferences about the population from which the samples are drawn.

For example, the average income of all families (the population) in Ethiopia can be estimated from

figures obtained from a few hundred (the sample) families.

Statistical techniques based on probability theory are required.

The stages or steps in any statistical investigation are

1. Collection of data: The process of measuring, gathering and assembling the raw data upon which the statistical investigation is to be based. Data can be collected in a variety of ways; one of the most common methods is the survey, which can itself be carried out in different ways, such as by questionnaire or interview.

2. Organization of data: Summarization of data in some meaningful way. Organization of data may involve editing, coding and classification of the collected data.

3. Presentation of the data: In this stage the collected and organized data are presented with some

systematic order to facilitate statistical analysis. The organized data are presented with the help of tables,

diagrams and graphs.

4. Analysis of data: The process of extracting numerical descriptions of the data, mainly through elementary mathematical operations (such as computing the mean or the standard deviation).

5. Interpretation of data: This involves giving meaning to the analyzed data and drawing conclusions.

Statistical techniques based on probability theory are required.

A (statistical) population is the complete set of possible measurements for which inferences are to be made. The population represents the target of an investigation, and the objective of the investigation is to draw conclusions about the population; hence it is sometimes called the target population.

Examples

Population of trees under specified climatic conditions

Population of animals fed a certain type of diet

Population of farms having a certain type of natural fertility

Population of households, etc

There are two ways of investigation: Census and sample survey.

Census: a complete enumeration of the population. In most real problems a census cannot be carried out, so we take a sample.

Sample: A sample from a population is the set of measurements that are actually collected in the course of an investigation. It should be selected using a pre-defined sampling technique in such a way that it represents the population well. A sample is a subset of the population.

In practice, we usually do not conduct a census; instead we conduct a sample survey.

Parameter: Characteristic or measure obtained from a population.

Statistic: Characteristic or measure obtained from a sample.

Sampling: The process or method of sample selection from the population.

Sample size: The number of elements or observations to be included in the sample.

1.1.4. Applications, Uses and Limitations of statistics.

Applications of statistics:

In almost all fields of human endeavor

Almost all human beings encounter numerical facts in their daily lives

Applicable in specific processes, e.g. the development of certain drugs or assessing the extent of environmental pollution

In industries especially in quality control area.

Uses of statistics

The main function of statistics is to enlarge our knowledge of complex phenomena. Some uses of statistics:

It presents facts in a definite and precise form.

Data reduction.

Measuring the magnitude of variations in data.

Furnishes a technique of comparison

Estimating unknown population characteristics.

Formulating and testing hypotheses.

Studying the relationship between two or more variables.

Forecasting future events.

Limitations of statistics

As a science statistics has its own limitations.

Some of the limitations:

It deals only with aggregates of facts and not with individual data items.

Statistical data are only approximately, not mathematically, correct.

Statistics can be easily misused and therefore should be used by experts.

Variable: It is an attribute or characteristic that can assume different values.

Variables are divided into two types: qualitative and quantitative variables.

1. Qualitative variables are nonnumeric variables and cannot be measured.

2. Quantitative Variables are numerical variables and can be measured. Examples include balance in

checking account, number of children in family.


Note that quantitative variables are either discrete or continuous

Discrete variable: It assumes a finite or countable number of possible values. It is usually obtained by

counting.

Example: number of children in a family, number of cars at a traffic light

Continuous variable: It can assume any value within the defined range. Continuous variables are usually

obtained by measuring. Example: weight in kg, height, time, air pressure in a tire

Measurement scales

Proper knowledge about the nature and type of data to be dealt with is essential in order to specify and apply

the proper statistical method for their analysis and inferences. Measurement scale refers to the property of

value assigned to the data based on the properties of order, distance and fixed zero.

Order

The property of order exists when an object that has more of the attribute than another object, is given a

bigger number by the rule system.

Distance

The property of distance is concerned with the relationship of differences between objects. If a measurement

system possesses the property of distance it means that the unit of measurement means the same thing

throughout the scale of numbers. More precisely, an equal difference between two numbers reflects an equal

difference in the "real world" between the objects that were assigned the numbers.

True zero is related to the property of absolute absence of characteristic under consideration.

The property of fixed zero (true zero) is necessary for ratios between numbers to be meaningful.

Scale types

Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. Each possesses different properties of measurement systems.

Nominal Scales

Nominal scales are measurement systems that possess none of the three properties stated above.

Level of measurement which classifies data into mutually exclusive, all inclusive categories in which

no order or ranking can be imposed on the data.

No arithmetic and relational operation can be applied.


Examples:

Sex (Male or Female),

Marital status (married, single, widowed, divorced)

Country code

Regional differentiation of Ethiopia.

Ordinal Scales

Ordinal Scales are measurement systems that possess the property of order, but not the property of distance.

The property of fixed zero is not important if the property of distance is not satisfied.

Level of measurement which classifies data into categories that can be ranked. Differences between the ranks are not meaningful.

Arithmetic operations are not applicable but relational operations are applicable.

Ordering is the sole property of ordinal scale.

Example: Rating scales (Excellent, Very good, Good, Fair, poor), Military status.

Interval Scales

Interval scales are measurement systems that possess the properties of Order and distance, but not the

property of fixed zero.

Level of measurement which classifies data that can be ranked and differences are meaningful.

However, there is no meaningful zero, so ratios are meaningless.

All arithmetic operations except division are applicable.

Relational operations are also possible.

Examples: Your score on an individual intelligence test as a measure of your intelligence; temperature in °C.

A temperature of 0°C does not mean that there is no temperature. Furthermore, a temperature of 30°C in town X on a specific day is not twice as warm as 15°C on another day in the same town.

Ratio Scales

Ratio scales are measurement systems that possess all three properties: order, distance, and fixed zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted; e.g. the ratio of the first person's height to another person's height is 1.32, whereas this is not possible with interval scales.

Level of measurement which classifies data that can be ranked, differences are meaningful, and there

is a true zero. True ratios exist between the different units of measure.

All arithmetic and relational operations are applicable.

Examples: Weight, Height, Number of students, Age

Exercises: Classify the following different measurement systems into one of the four types of scales.

1. Your checking account number as a name for your account.

2. Your checking account balance as a measure of the amount of money you have in that account

3. Your score on the first statistics test as a measure of your knowledge of statistics

4. A response to the statement "Abortion is a woman's right" where "Strongly Disagree" = 1, "Disagree" =

2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a measure of attitude toward abortion.

5. Times for swimmers to complete a 50-meter race

6. Months of the year Meskerm, Tikimit…

7. Socioeconomic status of a family when classified as low, middle and upper classes.

8. Blood type of individuals, A, B, AB and O.

9. Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost no pollen and 10 that it is rampant, but for which the values do not represent actual counts of grains of pollen.

10. Region numbers of Ethiopia

11. The number of students in a college

12. The net wages of a group of workers

13. The height of the men in a town

1.2.1 Methods of data collection

The statistical data may be classified under two categories, depending upon the sources – (1) Primary data

(2) Secondary data.

Primary Data: are those data, which are collected by the investigator himself for the purpose of a specific

inquiry or study. Such data are original in character and are mostly generated by surveys conducted by

individuals or research institutions.

Secondary Data: When an investigator uses data, which have already been collected by others, such data are

called "Secondary Data".


The secondary data can be obtained from journals, reports, government publications, publications of

professionals and research organizations.

According to the role of time, data are classified into cross-section and time series data. Cross-section data are a set of observations taken at one point in time, while time series data are a set of observations collected over a sequence of times, usually at equal intervals (weekly, monthly, quarterly, yearly, etc.).

Before any statistical work can be done data must be collected. Depending on the type of variable and the

objective of the study different data collection methods can be employed. In the collection of data we have

to be systematic. If data are collected haphazardly, it will be difficult to answer our research questions in a

conclusive way.

• Observation

• Interview (face-to-face or telephone interviews)

• Questionnaire (mailed and self-administered questionnaire)

• Using available information

• Focus group discussions (FGD)

• Other data collection techniques: life histories, case studies, etc.

i) Observation: It includes all methods, from simple visual observation to the use of sophisticated equipment and facilities such as X-ray machines and microscopes. An observation guide should be prepared prior to data collection.

Advantages: Gives relatively more detailed, accurate and context-related information.

Disadvantages: The investigator's or observer's own biases, prejudices and desires may influence the data, and more resources and skilled personnel are needed when sophisticated equipment is used.

ii) Interview

An interview can be conducted face to face or by telephone.

Advantage:

- suitable for use with illiterates

- permits clarifications of questions

- higher response rate than self-administered questionnaire

Disadvantage:

- presence of interviewer can influence the response

- more costly than self-administered questionnaire

iii) Questionnaire (Mailed and self-administered questionnaire)

A questionnaire is a list of questions arranged in a predetermined sequence for a predetermined purpose.

Self-administered questionnaires: under this method, the questionnaire is distributed by hand to the

respondents. The use of self-administered questionnaires is simpler and cheaper; such questionnaires can be

administered to many persons simultaneously (e.g. to a class of students).

Mailed Questionnaire Method

- The questionnaires are sent by post to the informants.

Limitations of questionnaire:

The method can be used only if the respondents are educated.

The response rates tend to be relatively low.

Informants may not return the completed questionnaire, and even if they do, they may have filled it in incorrectly.

It may not give the investigator a chance to explain the questions or ask supplementary and follow up

questions.

Types of questions used in a questionnaire

Depending on how questions are asked and recorded, we can distinguish two major possibilities: open-ended questions and closed-ended questions.

a) Open-ended questions: Open-ended questions permit free responses that should be recorded in the respondent's own words. The respondent is not given any possible answers to choose from. Such questions are useful to obtain information on:

Facts with which the researcher is not very familiar

Opinions, attitudes, suggestions of informants, or Sensitive issues

b) Closed-ended questions: Closed-ended questions offer a list of possible options or answers from which the respondents must choose. When designing closed-ended questions one should try to:

Offer a list of options that are exhaustive and mutually exclusive

Keep the number of options as few as possible.

The data collected in a survey are called raw data. In most cases, useful information is not immediately evident from the mass of unsorted data. Collected data need to be organized so as to condense the information they contain and show patterns of variation clearly. Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this purpose, different techniques of data organization and presentation, such as ordered arrays, tables and diagrams, are used.

Statistical Tables

A statistical table is an orderly and systematic presentation of data in rows and columns. Rows are horizontal

and columns are vertical arrangements. The use of tables for organizing, for example qualitative data,

involves grouping the data into mutually exclusive categories of the variables and counting the number of

occurrences (frequency) to each category.

The simple frequency table is used when the individual observations involve only a single variable, whereas the cross tabulation is used to obtain the frequency distribution of one variable by the subsets of another variable.

Examples:

Simple or one-way table

Table 1: Immunization status of 210 children in a certain Woreda

Immunization status     Number of children   Percent (%)
Not immunized           75                   35.7
Partially immunized     57                   27.1
Fully immunized         78                   37.2

Two-way table: This table shows two characteristics and is formed when either the row or the column is

divided into two or more parts.

Table 2: Immunization status by marital status of the women of childbearing age in a town.

                  Immunization status
Marital status    Immunized   Non immunized   Total
Married           156         294             450
Divorced          10          18              28
Widowed           7           7               14

Frequency distributions

For data to be more easily appreciated and to draw quick comparisons, it is often useful to arrange the data in

the form of a table, or in one of a number of different graphical forms.

Frequency: is the number of times a certain value of the variables is repeated in the given data. It is the

number of observations belonging to a given value or a group.


Frequency distribution: is a table which contains the values and the corresponding frequencies. From the

definition, a frequency distribution has two parts, namely- the values of the variables on the one hand and the

number of observations (frequency) corresponding to the values of the variables on the other.

Array (ordered array): is a serial arrangement of numerical data in ascending or descending order.

There are two types of frequency distributions: categorical (qualitative) and numerical (quantitative).

1. Categorical frequency distribution: Here data are classified according to non-numerical categories.

To construct a categorical frequency distribution, the categories contained in the frequency

distribution must be mutually exclusive and exhaustive. In other words, an element must be counted

in one and only one category.

Example: Seniors of a high school were interviewed on their plan after completing high school. The

following data give plans of 548 seniors of a high school.

Seniors' plan                                  Number of seniors
Plan to attend college                         240
May attend college                             146
Plan to or may attend a vocational school      57
Will not attend any school                     105
Total                                          548

2. Numerical frequency distribution: Here data are classified according to numerical size. Numerical frequency distributions are either discrete or continuous according to whether the variable is discrete or continuous.

Example: 10,392 persons were surveyed by a social scientist who wants to study the age of persons arrested

in a country. We can construct a continuous frequency distribution for this data, since age is a continuous

variable. In connection with large sets of data, a good overall picture and sufficient information can often be

conveyed by grouping the data into a number of class intervals as shown below.

Age (years)     Number of persons
Under 18        1,748
18 – 24         3,325
25 – 34         3,149
35 – 44         1,323
45 – 54         512
55 and over     335
Total           10,392

This kind of frequency distribution is called a grouped frequency distribution. Frequency distributions present data in a relatively compact form, give a good overall picture, and contain information that is adequate for many purposes, but there are usually some things which can be determined only from the original data. For instance, the above grouped frequency distribution cannot tell how many of the arrested persons are 19 years old, or how many are over 62.

Class frequency (f): refers to the number of observations belonging to a class.

Class limits: are the lowest (called lower class limit, LCL) and highest (called upper class limit, UCL) values that can be included in a class.

Unit of measurement (U): the distance between two possible consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.

Class boundaries: are the values that fall halfway between the class limits of adjacent classes. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. Each class has a lower class boundary (LCB) and an upper class boundary (UCB).

Then UCB = UCL + ½*U and LCB = LCL – ½*U.

Class mark (class midpoint, mi): is the value located halfway between the lower and upper class limits of that class. The class mark of the ith class, denoted by mi, is

mi = ½*(LCL + UCL) = ½*(LCB + UCB).

Class width (class size-w): is the difference between the upper and lower class boundaries of the class, that

is, w = UCB – LCB. It is also the difference between the lower limits of any two consecutive classes or the

difference between any two consecutive class marks.
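The definitions above can be checked with a few lines of Python. This is a minimal sketch: the function name `class_summary` is hypothetical, and the class 10 – 14 with U = 1 is chosen only for illustration.

```python
def class_summary(lcl, ucl, unit=1):
    """Return boundaries, class mark and width for one class given its limits."""
    lcb = lcl - unit / 2        # LCB = LCL - U/2
    ucb = ucl + unit / 2        # UCB = UCL + U/2
    mark = (lcl + ucl) / 2      # class mark = (LCL + UCL) / 2
    width = ucb - lcb           # class width = UCB - LCB
    return {"LCB": lcb, "UCB": ucb, "mark": mark, "width": width}


# Illustrative class 10 - 14 measured to the nearest whole unit (U = 1).
summary = class_summary(10, 14, unit=1)
print(summary)
```

Note that the class mark computed from the limits equals the one computed from the boundaries, since the two half-units cancel.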


Cumulative frequencies: when the frequencies of two or more classes are added up, the resulting totals are called cumulative frequencies. These frequencies help us to find the total number of items whose values are less than or greater than some value.

More than cumulative frequency: it is the total frequency of all values greater than or equal to the lower

class boundary of a given class.

Less than Cumulative frequency: it is the total frequency of all values less than or equal to the upper class

boundary of a given class.

Relative frequency: it is the frequency of each value or class divided by the total frequency
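As an illustration of these three definitions, the sketch below computes the less than cumulative, more than cumulative and relative frequencies for the age distribution of arrested persons given earlier. The variable names are assumptions made for this example.

```python
from itertools import accumulate

# Class frequencies from the age table (Under 18, 18-24, ..., 55 and over).
classes = ["Under 18", "18-24", "25-34", "35-44", "45-54", "55 and over"]
freq = [1748, 3325, 3149, 1323, 512, 335]

total = sum(freq)
# Running total from the first class downwards.
less_than_cf = list(accumulate(freq))
# Running total from the last class upwards.
more_than_cf = list(accumulate(reversed(freq)))[::-1]
# Each frequency divided by the total frequency.
relative = [f / total for f in freq]

for row in zip(classes, freq, less_than_cf, more_than_cf, relative):
    print("{:<12} {:>6} {:>6} {:>6} {:.4f}".format(*row))
```

The last less than cumulative frequency and the first more than cumulative frequency both equal the total number of observations, which is a quick consistency check.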

Steps for constructing a grouped frequency distribution:

1. Determine the number of classes to use, preferably between 5 and 20. The approximate number of classes (K) can be obtained from Sturges' formula: K = 1 + 3.322×log(n), where n is the number of observations and log is the base-10 logarithm.

2. Determine the class size (class width) as: W = (Maximum value – Minimum value)/K = Range/K.

3. Pick a suitable starting point less than or equal to the minimum value. The starting point is called the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.

4. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.

5. Find the boundaries by subtracting U/2 from the lower limits and adding U/2 to the upper limits.

6. Find the frequency and relative frequency of each class.
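The construction steps can be sketched in Python as follows. The function name `class_limits` is hypothetical, and two rounding choices are assumptions not fixed by the text: K is rounded to the nearest integer and W is rounded up to a whole number, which suits data recorded in whole units (U = 1). The inputs n = 80, minimum 10 and maximum 44 are those of the leisure-time example in this section.

```python
import math


def class_limits(n, minimum, maximum, unit=1):
    """Sketch of the construction steps for whole-number data (unit = 1)."""
    k = round(1 + 3.322 * math.log10(n))        # Sturges' formula, rounded (assumption)
    width = math.ceil((maximum - minimum) / k)  # Range / K, rounded up (assumption)
    lower = [minimum + i * width for i in range(k)]
    # Add one more class if the last upper limit would fall short of the maximum.
    if lower[-1] + width - unit < maximum:
        lower.append(lower[-1] + width)
    # Upper limit of a class = lower limit of the next class minus U.
    upper = [lo + width - unit for lo in lower]
    return k, width, list(zip(lower, upper))


k, width, limits = class_limits(n=80, minimum=10, maximum=44)
print(k, width)
print(limits)
```

Starting at the minimum value is only one admissible choice; any convenient starting point less than or equal to the minimum would do.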

Example: Construct a grouped frequency distribution of the following data on the amount of time (in hours)

that 80 college students devoted to leisure activities during a typical school week:

23 24 18 14 20 24 24 26 23 21 16 15 19 20 22 14 13 20 19 27 29 22 38 28 34 44

23 19 21 31 16 28 19 18 12 27 15 21 25 16 30 17 22 29 29 18 25 20 16 11 17 12

15 24 25 21 22 17 18 15 21 20 23 18 17 15 16 26 23 22 11 16 18 20 23 19 17 15

20 10


Solution:

Using the above formula: K = 1 + 3.322 × log(80) = 7.32 ≈ 7 classes. Maximum value = 44 and minimum value = 10, so Range = 44 – 10 = 34 and class width W = 34/7 = 4.857 ≈ 5.

Let 10 be the lower limit of the first class. That is, LCL1 = 10, LCL2 = 10 + W = 10 + 5 = 15, etc.

10, 15, 20, 25, 30, 35, and 40 are the lower class limits.

Find the upper class limits; e.g. the first upper class limit is UCL1 = LCL2 – U = 15 – 1 = 14, UCL2 = 14 + W = 14 + 5 = 19, etc.

14, 19, 24, 29, 34, 39, and 44 are the upper class limits.

Time spent (hours)   Frequency
10 – 14              8
15 – 19              28
20 – 24              27
25 – 29              12
30 – 34              3
35 – 39              1
40 – 44              1

The class boundaries are calculated by: UCB = UCL + ½*U and LCB = LCL – ½*U.

Example: consider the above example and determine the class boundaries.

With U = 1: UCB1 = UCL1 + ½*U = 14 + ½ = 14.5 and LCB1 = LCL1 – ½*U = 10 – ½ = 9.5, etc.

The class marks are calculated as: m1 = ½*(UCL1 + LCL1) = ½*(UCB1 + LCB1) = 12, m2 = ½*(UCL2 + LCL2) = 17, etc.

So, the complete frequency distribution table with cumulative frequencies is as follows.

Class limit   Class boundary   Class mark (mi)   Frequency (fi)   Relative frequency   Less than cf   More than cf
10 – 14       9.5 – 14.5       12                8                0.1                  8              80
15 – 19       14.5 – 19.5      17                28               0.35                 36             72
20 – 24       19.5 – 24.5      22                27               0.3375               63             44
25 – 29       24.5 – 29.5      27                12               0.15                 75             17
30 – 34       29.5 – 34.5      32                3                0.0375               78             5
35 – 39       34.5 – 39.5      37                1                0.0125               79             2
40 – 44       39.5 – 44.5      42                1                0.0125               80             1
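The table can be reproduced by tallying the 80 raw observations into the seven classes. The sketch below is one way to do this in Python; the variable names are assumptions, and the class limits are the ones derived in the solution.

```python
from itertools import accumulate

# Leisure time (hours) of the 80 college students, as listed in the example.
data = [
    23, 24, 18, 14, 20, 24, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13, 20, 19, 27,
    29, 22, 38, 28, 34, 44, 23, 19, 21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
    30, 17, 22, 29, 29, 18, 25, 20, 16, 11, 17, 12, 15, 24, 25, 21, 22, 17, 18, 15,
    21, 20, 23, 18, 17, 15, 16, 26, 23, 22, 11, 16, 18, 20, 23, 19, 17, 15, 20, 10,
]

limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]

# Count how many observations fall between each pair of class limits (inclusive).
freq = [sum(1 for x in data if lo <= x <= hi) for lo, hi in limits]
n = sum(freq)
relative = [f / n for f in freq]
less_than_cf = list(accumulate(freq))
more_than_cf = list(accumulate(reversed(freq)))[::-1]

for (lo, hi), f, r, lcf, mcf in zip(limits, freq, relative, less_than_cf, more_than_cf):
    print(f"{lo} - {hi}  {f:>3}  {r:.4f}  {lcf:>3}  {mcf:>3}")
```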


Diagrammatic and graphical presentation of Data

An appropriately drawn graph or diagram allows readers to grasp rapidly the overall pattern of the data presented. The relationship between numbers of various magnitudes can usually be seen more quickly and easily from a graph or diagram than from a table.

Bar charts and pie charts are commonly used diagrammatic presentations for qualitative data.

Histograms, frequency polygons and ogive curves are graphical presentations of quantitative continuous data.

Type of Diagrams

1) Bar Chart:

There are different types of bar charts; the most important ones are the simple bar chart, the component bar chart and the multiple bar chart.

a) Simple bar chart: It is a one-dimensional chart in which the bar represents the whole of the

magnitude. The height or length of each bar indicates the size (frequency) of the figure represented.

Consider the data on immunization status of children (Table 1)

[Figure: simple bar chart of number of children by immunization status: not immunized 75, partially immunized 57, fully immunized 78]

Fig. 1 Immunization status

b) Component Bar chart: Bars are sub-divided into component parts of the figure. These sorts of

diagrams are constructed when each total is built up from two or more component figures. This is

done by dividing the bars into parts representing the components and shading them accordingly.

Consider the data on immunization status of women by marital status (Table 2).

[Figure: component bar chart of immunized and non immunized women by marital status: single 58 and 177, married 156 and 294, divorced 10 and 18, widowed 7 and 7]

Fig. 2. Immunization status by marital status of women 15-49 years

c) Multiple bar charts: In this type of chart the component figures are shown as separate bars adjoining each other. The height of each bar represents the actual frequency of the component figure. It is used when the distributional pattern of more than one variable is to be shown and comparisons of each component are desired.

Example of multiple bar chart: consider that data on immunization status of women by marital status.

[Figure: multiple bar chart of immunized and non immunized women by marital status: single 58 and 177, married 156 and 294, divorced 10 and 18, widowed 7 and 7]

Fig. 3. Immunization status by marital status of women 15-49 years

2) Pie chart: A pie chart is a circle representing categorical data, divided into sectors whose central angles are proportional to the amounts associated with each category. The share of each category can be expressed either as a percentage or as an angle.

That is, the central angle of a category = (amount of the category / total amount) × 360°, and the proportion of a category = (frequency of the category / total frequency) × 100%.
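Applying these two formulas to the immunization data of Table 1 gives the sector angles and percentages of the pie chart. The following is a small sketch; the dictionary names are assumptions made for the example.

```python
# Frequencies from Table 1 (immunization status of 210 children).
counts = {"Not immunized": 75, "Partially immunized": 57, "Fully immunized": 78}
total = sum(counts.values())

# Central angle = (amount of the category / total amount) * 360 degrees.
angles = {name: c / total * 360 for name, c in counts.items()}
# Proportion = (frequency of the category / total frequency) * 100 percent.
percents = {name: c / total * 100 for name, c in counts.items()}

for name in counts:
    print(f"{name:<20} {angles[name]:6.1f} degrees  {percents[name]:5.1f} %")
```

The angles always add up to 360° and the percentages to 100%, apart from rounding in the printed values.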

[Figure: pie chart of immunization status: not immunized (NI) 36%, partially immunized (PI) 27%, fully immunized (FI) 37%]

Type of Graphs

The following are the most commonly used graphical presentations of data.

1) Histograms: A histogram is the graph of the frequency distribution of continuous measurement variables.

It is constructed on the basis of the following principles:

a) The horizontal axis is a continuous scale running from one extreme end of the distribution to the other. It

should be labeled with the name of the variable and the units of measurement.

b) For each class in the distribution a vertical rectangle is drawn with (i) its base on the horizontal axis extending from the lower class boundary to the upper class boundary, so that there is never a gap between adjacent rectangles, and (ii) a base width determined by the width of the class interval. If a distribution with unequal class intervals is to be presented by means of a histogram, it is necessary to adjust for the varying widths of the class intervals.

Example: Consider the data on time (in hours) that 80 college students devoted to leisure activities during a

typical school week. Draw the histogram

2) Frequency Polygon: If we join the midpoints of the tops of the adjacent rectangles of the histogram with line segments, a frequency polygon is obtained. When the polygon is continued to the X-axis just outside the range of the data, the total area under the polygon is equal to the total area under the histogram.

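Example: Consider the above data on time spent on leisure activities.

[Figure: frequency polygon with time spent (hours) on the horizontal axis (0 to 45) and frequency on the vertical axis (0 to 30); plotted frequencies 8, 28, 27, 12, 3, 1, 1]

Fig 5: Frequency polygon curve on time spent for leisure activities by students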

3) Ogive or Cumulative Frequency Curve: When the cumulative frequencies of a distribution are graphed, the resulting curve is called an ogive. Ogives are of two types, namely the "less than" ogive and the "more than" ogive.

Less than ogive: in this case the "less than" cumulative frequencies are plotted against the upper class boundaries of their respective classes, and adjacent points are joined by lines.

More than ogive: in this case the "more than" cumulative frequencies, scaled on the Y-axis, are plotted against the lower class boundaries of their respective classes, scaled on the X-axis, and adjacent points are joined by lines.

Example: Consider the above data on time spent on leisure activities.

[Figure: less than ogive and more than ogive plotted against the class boundaries 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5; less than cumulative frequencies 0, 8, 36, 63, 75, 78, 79, 80; more than cumulative frequencies 80, 72, 44, 17, 5, 2, 1, 0]

Fig 7: Cumulative frequency curve for amount of time college students devoted to leisure activities

2. SUMMARIZING OF DATA

2.1. MEASURES OF CENTRAL TENDENCY

When we want to make comparison between groups of numbers it is good to have a single value that is

considered to be a good representative of each group. This single value is called the average of the group.

Averages are also called measures of central tendency.

Objectives

Since the number of sample points is frequently large and it is easy to lose track of the overall picture by

looking at all the data at once, the data must be summarized as briefly as possible.

Some objectives of measuring central tendency:

To comprehend (understand) the data easily.

To facilitate comparison.

To make further statistical analysis.

Let X1, X2, X3, …, Xn be n measurements, where n is the total number of observations and Xi is the ith observation.

The symbol $\sum_{i=1}^{n} X_i$ (read as "the sum of Xi where i runs from 1 to n") is mathematical shorthand for X1 + X2 + X3 + ... + Xn. That is,

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$$

Example: Suppose the following were scores made on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8.

$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$$

Properties of Summation

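1. $\sum_{i=1}^{n} a = na$, where a is a constant.

2. $\sum_{i=1}^{n} bX_i = b\sum_{i=1}^{n} X_i$, where b is a constant.

3. $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where a and b are constants.

4. $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$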

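Example: Consider the following data and determine the sums below.

Xi   5   7   7   6   8
Yi   6   7   8   7   8

a) $\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$

b) $\sum_{i=1}^{5} Y_i = 36$

c) $\sum_{i=1}^{5} 10 = 10 \times 5 = 50$

d) $\sum_{i=1}^{5} (X_i + Y_i) = \sum_{i=1}^{5} X_i + \sum_{i=1}^{5} Y_i = 69$

e) $\sum_{i=1}^{5} (X_i - Y_i) = -3$

f) $\sum_{i=1}^{5} X_i Y_i = 241$

g) $\sum_{i=1}^{5} X_i^2 = 223$

h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right) = 1188$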
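The sums in this example, and the summation properties they rely on, can be verified numerically. The sketch below uses the X and Y values of the example; the constants a = 10 and b = 3 are arbitrary values chosen only to illustrate the property for a + bXi.

```python
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]
n = len(X)
a, b = 10, 3  # arbitrary constants, chosen for illustration only

sum_x = sum(X)                                  # sum of Xi
sum_y = sum(Y)                                  # sum of Yi
sum_const = sum(10 for _ in range(n))           # sum of the constant 10, n times
sum_plus = sum(x + y for x, y in zip(X, Y))     # sum of (Xi + Yi)
sum_minus = sum(x - y for x, y in zip(X, Y))    # sum of (Xi - Yi)
sum_xy = sum(x * y for x, y in zip(X, Y))       # sum of Xi*Yi
sum_x2 = sum(x ** 2 for x in X)                 # sum of Xi squared
product = sum_x * sum_y                         # (sum of Xi) * (sum of Yi)
sum_linear = sum(a + b * x for x in X)          # sum of (a + b*Xi)

print(sum_x, sum_y, sum_const, sum_plus, sum_minus, sum_xy, sum_x2, product)
print(sum_linear == n * a + b * sum_x)
```

The comparison of the sum of products with the product of sums shows that the two are different quantities in general.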

The different measures of central tendency are the mean (arithmetic, geometric and harmonic), the mode and the median.

The arithmetic mean is defined as the sum of the magnitudes of the items divided by the number of items.

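Suppose X1, X2, X3, …, Xn are n observed values in a sample of size n. Then the arithmetic mean of the sample, denoted by $\bar{X}$, is given as:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

For a population, the arithmetic mean is denoted by μ and is given as:

$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N},$$

where N stands for the total number of observations in the population.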

Example: Suppose the sample consists of birth weights (in grams) of live born infants at a private hospital in

a certain city during a 1-week period. These sample birth weights are:

3265, 3323, 2581, 2759, 3260, 3649, 2841, 3248, 3245, 3200, 3609, 3314, 3484,

3031, 2838, 3101, 4146, 2069, 3541, 2834.

Find the arithmetic mean of the sample birth weights.

Solution: $\bar{X} = \frac{1}{20}\sum_{i=1}^{20} X_i = \frac{1}{20}(3265 + 3323 + \cdots + 2834) = \frac{63338}{20} = 3166.9$ grams.
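As a quick check of this calculation, the sample mean can be computed directly from the listed weights. The sketch below uses plain Python; the variable names are illustrative.

```python
# Birth weights (grams) from the example.
birth_weights = [
    3265, 3323, 2581, 2759, 3260, 3649, 2841, 3248, 3245, 3200,
    3609, 3314, 3484, 3031, 2838, 3101, 4146, 2069, 3541, 2834,
]

total = sum(birth_weights)       # sum of all observations
n = len(birth_weights)           # number of observations
sample_mean = total / n          # arithmetic mean = total / n

print(total, n, sample_mean)
```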


If X is a variable having values X1, X2, …, Xk occurring with frequencies f1, f2, …, fk respectively, then its arithmetic mean is given by:

$$\bar{X} = \frac{X_1 f_1 + X_2 f_2 + \cdots + X_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} X_i f_i}{\sum_{i=1}^{k} f_i}$$

Example: Suppose the X values are 3, 5, 4, 2, 7 and 6 with corresponding frequencies of 2, 1, 3, 2, 1 and 1 respectively. Find the mean of the data.

Xi               3    5    4    2    7    6

frequency, fi    2    1    3    2    1    1

Solution: $\bar{X} = \frac{3(2) + 5(1) + \cdots + 6(1)}{2 + 1 + \cdots + 1} = \frac{40}{10} = 4$.

This method is applicable where the entire range of observations has been grouped into a continuous

frequency distribution. In such cases the mean of the distribution is computed as:

$$\bar{X} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i},$$

where

k is number of classes,

mi is the midpoint of the ith class and

fi is the ith class frequency.

Example: Calculate the mean for grouped data on the amount of time (in hours) that 80 college students

devoted to leisure activities during a typical school week given below:

Time spent (hours) Frequency

10 – 14 8

15 – 19 28

20 – 24 27

25 – 29 12

30 – 34 3

35 – 39 1

40 - 44 1

Solution:

First find the class marks (midpoints).

Find the product of each frequency and its class mark.

Find the mean using the formula.

The class marks of the distribution are: 12, 17, 22, 27, 32, 37, 42.

Then the mean of the data is computed as:

$$\bar{X} = \frac{\sum_{i=1}^{7} m_i f_i}{\sum_{i=1}^{7} f_i} = \frac{12(8) + 17(28) + \cdots + 42(1)}{8 + 28 + \cdots + 1} = \frac{1665}{80} \approx 20.8 \text{ hours.}$$
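The three steps of the grouped-mean calculation can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for this illustration only.

```python
# (lower class limit, upper class limit, frequency) for the leisure-time data.
classes = [
    (10, 14, 8), (15, 19, 28), (20, 24, 27), (25, 29, 12),
    (30, 34, 3), (35, 39, 1), (40, 44, 1),
]

# Step 1: class marks (midpoints) of each class.
class_marks = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]

# Step 2: sum of the products of frequency and class mark.
sum_mf = sum(m * f for m, f in zip(class_marks, frequencies))

# Step 3: divide by the total frequency.
n = sum(frequencies)
grouped_mean = sum_mf / n

print(class_marks, sum_mf, n, grouped_mean)
```

Because every observation in a class is replaced by the class mark, the result is an approximation to the mean of the raw data.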

2) The sum of the squares of deviations from the arithmetic mean is less than the sum of squares of deviations about any other value,

i.e. $\sum (X_i - \bar{X})^2 < \sum (X_i - A)^2$, for any $A \neq \bar{X}$.

3) If we have means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ of k groups, measured in the same units and based on n1, n2, n3, …, nk observations respectively, then the mean of all the observations in all groups, often called the combined mean, is given by

$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$

Example: The mean final exam mark of one class of 50 students is 30, and the mean mark of another class of 100 students in the same final exam is 40. What is the mean mark of all 150 students?

Solution: $\bar{X}_c = \frac{50 \times 30 + 100 \times 40}{50 + 100} = \frac{5500}{150} \approx 36.7$.

4) If a wrong figure has been used when calculating the mean, then the correct mean can be obtained

without repeating the whole process using:

$$\text{Correct mean} = \text{wrong mean} + \frac{\text{correct value} - \text{wrong value}}{n},$$

where n = number of observations.

Example: The average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight was misread as 40 kg instead of 80 kg.

Calculate the correct average weight.

Correct mean $= 65 + \frac{80 - 40}{10} = 65 + 4 = 69$ kg.
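The combined-mean and corrected-mean properties are both simple enough to express as small helper functions. The function names below are hypothetical, introduced only for this sketch.

```python
def combined_mean(means, sizes):
    """Mean of all observations given each group's mean and size."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Adjust a mean of n observations after one value was misread."""
    return wrong_mean + (correct_value - wrong_value) / n


# Two classes: 50 students with mean 30, 100 students with mean 40.
exam_mean = combined_mean([30, 40], [50, 100])

# Mean weight of 10 students was 65, with 80 misread as 40.
weight_mean = corrected_mean(65, 10, wrong_value=40, correct_value=80)

print(round(exam_mean, 1), weight_mean)
```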


5) The effect of transforming original series on the mean.

a) If a constant k is added to / subtracted from/ every observation then the new mean will be the

old mean ± k respectively.

b) If every observation is multiplied by a constant k, then the new mean will be k*old mean.

Example: The mean of a set of numbers is 500.

a. If 10 is added to each of the numbers in the set, then what will be the mean of the new set?

New mean = 500+10 =510

b. If each of the numbers in the set are multiplied by -5, then what will be the mean of the new set?

New mean = -5*500= -2500

Example: The mean of n observations X1, X2, …, Xn is known to be 12. A new set of observations is obtained by the linear transformation Yi = 2Xi – 0.5 (i = 1, 2, …, n). What will be the mean of the new set of observations?

Solution: New mean = 2 * old mean – 0.5 = 2*12 – 0.5 = 23.5.

Merits of the arithmetic mean:

• It is based on all values.
• It is easy to calculate and simple to understand.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is not much affected by fluctuations of sampling.

Demerits of the arithmetic mean:

• It is affected by extreme observations.
• It cannot be used in the case of open-end classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty, beauty.
• Sometimes it leads to wrong conclusions if the details of the data from which it is obtained are not available.

Weighted Mean

In computing the arithmetic mean we gave equal importance to each observation. When averaging quantities, however, it is often necessary to account for the fact that not all of them are equally important in the phenomenon being described. To give the quantities being averaged their proper degree of importance, we assign them relative importance values called weights and then calculate a weighted mean.

In general, the weighted mean $\bar{X}_w$ of a set of values X1, X2, …, Xn, whose relative importance is expressed numerically by a corresponding set of weights W1, W2, …, Wn, is given by:

$$\bar{X}_w = \frac{X_1 W_1 + X_2 W_2 + \cdots + X_n W_n}{W_1 + W_2 + \cdots + W_n} = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

Example: A student obtained results 60, 75, 63, 59, and 55 in English, Biology, Mathematics, Physics and Chemistry examinations respectively. Find the student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.

Solution: $\bar{X}_w$ = (60*1 + 75*2 + 63*1 + 59*3 + 55*3)/(1+2+1+3+3) = 615/10 = 61.5.
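A minimal Python sketch of the weighted mean follows. The helper name `weighted_mean` is an assumption of this illustration; it also shows, for contrast, the unweighted mean of the same marks.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


marks = [60, 75, 63, 59, 55]      # English, Biology, Mathematics, Physics, Chemistry
weights = [1, 2, 1, 3, 3]

student_weighted_mean = weighted_mean(marks, weights)
student_simple_mean = sum(marks) / len(marks)   # equal weights, for comparison

print(student_weighted_mean, student_simple_mean)
```

The weighted mean is lower than the simple mean here because the heaviest weights fall on the two lowest marks.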

The mode

The mode is the value of the observation that occurs with the greatest frequency. A particular disadvantage is that, with a small number of observations, there may be no mode. In addition, there may sometimes be more than one mode, as in a bimodal (two-peak) distribution.

Example: Find the modal values for the following data:

(a) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg).

(b) 10, 10, 9, 9, 8, 12, 15, 5 (modal value = 9 and 10). Hence, it is possible for a frequency distribution to

have more than one mode.

Note: Distributions with one mode are called unimodal, those with two modes are called bimodal, and

those with more than two modes are called multimodal.

To find the modal value for a grouped (continuous) frequency distribution, first find the modal class, which is the class with the highest frequency. Then compute the modal value using the formula:

$$\text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w,$$

where

$L_{mo}$ = lower class boundary of the modal class;
$w$ = the class width of the modal class;
$\Delta_1 = f_{mo} - f_1$;
$\Delta_2 = f_{mo} - f_3$;
$f_{mo}$ = frequency of the modal class;
$f_1$ = frequency of the class immediately preceding the modal class;
$f_3$ = frequency of the class immediately succeeding the modal class.

Note: The modal class is a class with the highest frequency.

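Example: Consider the following grouped quantitative data. Calculate the modal value of the data.

Class limits    Class boundaries    Frequency
6 – 11          5.5 – 11.5          2
12 – 17         11.5 – 17.5         2
18 – 23         17.5 – 23.5         7
24 – 29         23.5 – 29.5         4
30 – 35         29.5 – 35.5         3
36 – 41         35.5 – 41.5         2

Solution: The modal class is 18 – 23, the class with the highest frequency. Hence $L_{mo} = 17.5$, $w = 6$, $\Delta_1 = f_{mo} - f_1 = 7 - 2 = 5$ and $\Delta_2 = f_{mo} - f_3 = 7 - 4 = 3$.

$$\text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w = 17.5 + \frac{5}{5 + 3} \times 6 = 21.25$$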
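The grouped-mode formula can be sketched as a small function. Two choices here are assumptions of this illustration, not statements from the notes: a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of grouped data: L_mo + d1 / (d1 + d2) * w.

    boundaries  -- list of (lower boundary, upper boundary) per class
    frequencies -- list of class frequencies, in the same order
    """
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f_mo = frequencies[i]
    # Assumption: a non-existent neighbouring class counts as frequency 0.
    f_before = frequencies[i - 1] if i > 0 else 0
    f_after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = f_mo - f_before
    d2 = f_mo - f_after
    lower, upper = boundaries[i]
    # Raises ZeroDivisionError if both neighbours tie with the modal class.
    return lower + d1 / (d1 + d2) * (upper - lower)


boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

mode_value = grouped_mode(boundaries, frequencies)
print(mode_value)
```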

The Median

An alternative measure of location, perhaps second in popularity to the arithmetic mean, is the median. In a

distribution, median is the value of the variable which divides it in to two equal halves. In an ordered series

of data median is an observation lying exactly in the middle of the series. It is the middle most value in the

sense that the number of values less than the median is equal to the number of values greater than it.

Suppose there are n observations in a sample. If these observations are ordered from smallest to largest, then the sample median for ungrouped data is defined as:

(1) the $\left(\frac{n+1}{2}\right)$th observation, if n is odd;

(2) the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations, if n is even.

Example: Find the median of the following numbers.

(a) 6, 2, 8, 9, 4 (b) 5, 2, 1, 8, 3,7, 8, 9.

Solution: a) Ascending order: 2, 4, 6, 8, 9 (n = 5)

Median = the $\left(\frac{5+1}{2}\right)$th = 3rd observation = 6.

b) Ascending order: 1, 2, 3, 5, 7, 8, 8, 9 (n = 8)

Median = $\frac{\text{4th} + \text{5th}}{2} = \frac{5 + 7}{2} = 6$.
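The odd/even rule for the ungrouped median translates directly into code. The sketch below is illustrative; the function name `sample_median` is chosen here, and note that Python lists are indexed from 0, so the kth observation sits at index k - 1.

```python
def sample_median(values):
    """Median of ungrouped data using the positional rule."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation, i.e. index n // 2.
        return ordered[middle]
    # n even: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


median_a = sample_median([6, 2, 8, 9, 4])
median_b = sample_median([5, 2, 1, 8, 3, 7, 8, 9])
print(median_a, median_b)
```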

Median for Grouped Data

For a grouped (continuous) frequency distribution, the median is calculated as:

$$\text{Median} = L + \frac{\frac{n}{2} - cf}{f} \times w,$$

where

L = lower class boundary of the median class
w = width of the median class
n = total frequency of the sample
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class.

The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{n}{2}$.

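Example: Find the median of the following grouped data.

Class      Frequency    Cumulative frequency (less than type)
40 – 44    7            7
45 – 49    10           17
50 – 54    22           39
55 – 59    15           54
60 – 64    12           66
65 – 69    6            72
70 – 74    3            75

Solution: $\frac{n}{2} = \frac{75}{2} = 37.5$

39 is the first cumulative frequency to be greater than or equal to 37.5.

Therefore, 50 – 54 is the median class. L = 49.5, n = 75, w = 5, cf = 17, f = 22

$$\text{Median} = L + \frac{\frac{n}{2} - cf}{f} \times w = 49.5 + \frac{(37.5 - 17) \times 5}{22} = 54.16$$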

Note:

Median is a positional average and hence not influenced by extreme observations.

Median can be calculated in the case of open end intervals.

Median can be located even if the data are incomplete.

When a distribution is arranged in order of magnitude of items, the median is the value of the middle term. Other measures that depend on their position in the distribution, namely quartiles, deciles, and percentiles, are collectively called quantiles.

Quartiles: Quartiles are measures that divide the frequency distribution into four equal parts. The values of the variable corresponding to these divisions are denoted $Q_1$, $Q_2$, and $Q_3$, often called the first, the second, and the third quartile respectively.

$Q_1$ is a value such that 25% of the items are less than or equal to it. $Q_2$ has 50% of the items with values less than or equal to it, and $Q_3$ has 75% of the items with values less than or equal to it.

The kth quartile $Q_k$ for ungrouped data is the value of the item at the $\left(\frac{k(n+1)}{4}\right)$th position, where k = 1, 2, 3 and n is the total number of observations.

The three quartiles for grouped data can be computed as follows:

Calculate $\frac{kn}{4}$ and search for the minimum cumulative frequency which is greater than or equal to $\frac{kn}{4}$, k = 1, 2, 3.

The class corresponding to this cumulative frequency is the kth quartile class. This is the class where $Q_k$ lies.

Thus,

$$Q_k = L + \frac{w\left(\frac{kn}{4} - cf\right)}{f}, \quad k = 1, 2, 3,$$

where

L = lower class boundary of the kth quartile class
n = the total number of observations
cf = the less than cumulative frequency corresponding to the class immediately preceding the kth quartile class
w = the class width of the quartile class and
f = frequency of the kth quartile class

Deciles: Deciles are measures that divide the frequency distribution into ten equal parts. The values of the variable corresponding to these divisions are denoted $D_1, D_2, \ldots, D_9$, often called the first, the second, …, the ninth decile respectively.

To find $D_k$ (k = 1, 2, …, 9) we count $\frac{kn}{10}$ of the observations, beginning from the lowest class.

$$D_k = L + \frac{w\left(\frac{kn}{10} - cf\right)}{f}, \quad k = 1, 2, 3, \ldots, 9,$$

where

L = lower class boundary of the kth decile class
n = the total number of observations
cf = the less than cumulative frequency corresponding to the class immediately preceding the kth decile class
w = the class width of the decile class
f = frequency of the kth decile class

Percentiles: Percentiles are measures that divide the frequency distribution into one hundred equal parts. The values of the variable corresponding to these divisions are denoted $P_1, P_2, \ldots, P_{99}$, often called the first, the second, …, the ninety-ninth percentile respectively.

To find $P_k$ (k = 1, 2, …, 99) we count $\frac{kn}{100}$ of the observations, beginning from the lowest class.

For grouped data we have the following formula:

$$P_k = L + \frac{w\left(\frac{kn}{100} - cf\right)}{f}, \quad k = 1, 2, 3, \ldots, 99,$$

where

L = lower class boundary of the kth percentile class
n = the total number of observations
cf = the less than cumulative frequency corresponding to the class immediately preceding the kth percentile class
w = the class width of the percentile class
f = frequency of the kth percentile class

Note: To compute quantiles, we first sort the data in ascending order.

$Q_2 = D_5 = P_{50}$ = median, $P_{25} = Q_1$, $P_{75} = Q_3$, and $D_i = P_{10i}$, i = 1, 2, 3, …, 9.

Example: Considering the following distribution

Calculate: a) All quartiles b) The 7thdecile c) The 90th percentile.

Class limit Frequency Cumulative freq.(less than type)

141 – 150 17 17

151 – 160 29 46

161 – 170 42 88

171 – 180 72 160

181 – 190 84 244

191 – 200 107 351

201 – 210 49 400

211 – 220 34 434

221 – 230 31 465

231 – 240 16 481

241 – 250 12 493

Solution: a) Quartiles

Q1: Determine the class containing the first quartile.

$\frac{n}{4} = \frac{493}{4} = 123.25$. Hence, 171 – 180 is the class containing the first quartile.

L = 170.5, n = 493, w = 10, cf = 88, f = 72

$$Q_1 = L + \frac{w\left(\frac{n}{4} - cf\right)}{f} = 170.5 + \frac{10(123.25 - 88)}{72} = 175.40$$

Q2: $\frac{2n}{4} = 246.5$. Hence, 191 – 200 is the class containing the second quartile.

L = 190.5, n = 493, w = 10, cf = 244, f = 107

$$Q_2 = L + \frac{w\left(\frac{2n}{4} - cf\right)}{f} = 190.5 + \frac{10(246.5 - 244)}{107} = 190.73$$

Q3: $\frac{3n}{4} = 369.75$. Hence, 201 – 210 is the class containing the third quartile.

L = 200.5, n = 493, w = 10, cf = 351, f = 49

$$Q_3 = L + \frac{w\left(\frac{3n}{4} - cf\right)}{f} = 200.5 + \frac{10(369.75 - 351)}{49} = 204.33$$

b) D7: Determine the class containing the 7th decile.

$\frac{7n}{10} = 345.1$. Hence, 191 – 200 is the class containing the seventh decile.

L = 190.5, n = 493, w = 10, cf = 244, f = 107

$$D_7 = L + \frac{w\left(\frac{7n}{10} - cf\right)}{f} = 190.5 + \frac{10(345.1 - 244)}{107} = 199.95$$

c) P90: Determine the class containing the 90th percentile.

$\frac{90n}{100} = 443.7$. Hence, 221 – 230 is the class containing the 90th percentile.

L = 220.5, n = 493, w = 10, cf = 434, f = 31

$$P_{90} = L + \frac{w\left(\frac{90n}{100} - cf\right)}{f} = 220.5 + \frac{10(443.7 - 434)}{31} = 223.63$$
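Quartiles, deciles and percentiles for grouped data all use the same formula with a different divisor, so one function can cover them. The sketch below is only an illustration: the name `grouped_quantile` and its argument layout are assumptions, and it expects equal class widths.

```python
from itertools import accumulate


def grouped_quantile(lower_boundaries, width, frequencies, k, parts):
    """kth quantile when the data are split into `parts` equal parts.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    Implements L + w * (k*n/parts - cf) / f.
    """
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))   # less-than cumulative frequencies
    # Quantile class: first class whose cumulative frequency >= k*n/parts.
    # The comparison is done in integers to avoid rounding problems.
    i = next(j for j, c in enumerate(cumulative) if c * parts >= k * n)
    cf = cumulative[i - 1] if i > 0 else 0       # cumulative frequency before it
    return lower_boundaries[i] + width * (k * n / parts - cf) / frequencies[i]


# Distribution from the example: classes 141-150, 151-160, ..., 241-250.
lower_boundaries = [140.5 + 10 * j for j in range(11)]
frequencies = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q1, q2, q3 = (grouped_quantile(lower_boundaries, 10, frequencies, k, 4)
              for k in (1, 2, 3))
d7 = grouped_quantile(lower_boundaries, 10, frequencies, 7, 10)
p90 = grouped_quantile(lower_boundaries, 10, frequencies, 90, 100)

print(round(q1, 2), round(q2, 2), round(q3, 2), round(d7, 2), round(p90, 2))
```

Calling the function with `k=1, parts=2` gives the grouped median, in line with the note that $Q_2 = D_5 = P_{50}$ = median.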


2.2. Measures of variation (dispersion)

Introduction

Measures of central tendency help us describe a set of data by a single number or typical value. However, they do not provide any information about the extent to which the values differ from one another or from the average value. Hence, to increase our understanding of the pattern of the data, we must also measure its dispersion, which indicates the degree to which the numerical data tend to spread about an average value. The scatter or spread of the items of a distribution is known as dispersion or variation. Measures of dispersion also enable us to compare several samples with similar averages.

Consider the following data sets:

Set 1: 60 40 30 50 60 40 70 50

Set 2: 50 49 49 51 48 50 53 50

Set 3: 50 50 50 50 50 50 50 50

The three data sets have a mean of 50, but obviously set 1 is more “spread out” than set 2 and set 3 has no

variability.

Objectives

The general object of measuring dispersion is to obtain a single summary figure which adequately exhibits

whether the distribution is compact or spread out.

• To judge the reliability of measures of central tendency

• To control variability itself.

• To compare two or more groups of numbers in terms of their variability.

• To make further statistical analysis.

The measures of dispersion which are expressed in terms of the original unit of a series are termed as

absolute measures. Such measures are not suitable for comparing the variability of two distributions which

are expressed in different units of measurement and different average size. Relative measures of dispersions

are a ratio or percentage of a measure of absolute dispersion to an appropriate measure of central tendency

and are thus pure numbers independent of the units of measurement. For comparing the variability of two

distributions (even if they are not measured in the same unit), we compute the relative measure of dispersion

instead of absolute measures of dispersion.


Types of Measures of Dispersion

It is useful for comparing variation in two or more distributions where units of measurements are the same.

Various measures of dispersions are in use. The most commonly used measures of dispersions are:

1) Range and Relative Range

2) Quartile Deviation and Coefficient of Quartile Deviation

3) Mean Deviation and Coefficient of Mean Deviation

4) Standard Deviation and Coefficient of Variation.

The range is the largest value minus the smallest value in a data set. The range is greatly affected by extreme

values. Range = largest value – smallest value.

The following two distributions have the same range, 13, yet appear to differ greatly in the amount of

variability.

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45

Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45

For this reason, among others, the range is not the most important measure of variability.

Merits:

• It is rigidly defined.

• It is easy to calculate and simple to understand.

Demerits:

• It is not based on all observation.

• It is highly affected by extreme observations.

• It is affected by fluctuation in sampling.

• It cannot be computed in the case of open end distribution.

• It is very sensitive to the size of the sample.

Relative Range (RR)

It is also sometimes called the coefficient of range and is given by:

$$RR = \frac{\text{Highest value} - \text{lowest value}}{\text{Highest value} + \text{lowest value}}$$

Example:

1. Find the relative range of the above two distribution. (Exercise!)


2. If the range and relative range of a series are 4 and 0.25 respectively. Then what is the value of:

a) Smallest observation (Ans. 6)

b) Largest observation (Ans. 10)

The interquartile range is the difference between the third and the first quartiles of a set of items: IQR = Q3 – Q1. The semi-interquartile range (quartile deviation, Q.D) is half of the interquartile range:

$$Q.D = \frac{Q_3 - Q_1}{2}$$

Coefficient of Quartile Deviation (C.Q.D)

$$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Remark: Q.D and C.Q.D include only the middle 50% of the observations.

The mean deviation of a set of items is defined as the arithmetic mean of the absolute deviations from a given average. Depending on the type of average used, we have different mean deviations.

Mean deviation about the mean for a data set x1, x2, …, xn:

$$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{X}|}{n}$$

For a frequency distribution where the values X1, X2, X3, …, Xk occur f1, f2, f3, …, fk times respectively:

$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{\sum_{i=1}^{k} f_i}$$

If the data are given in the form of a frequency distribution of k classes in which mi and fi are the class mark and frequency of the ith class respectively, then the mean deviation is given by:

$$MD = \frac{\sum_{i=1}^{k} f_i |m_i - \bar{X}|}{\sum_{i=1}^{k} f_i}$$

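Steps to calculate M.D:

1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviation of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

Example: Calculate the mean deviation about the mean for the following data.

Xi    10    8    9    7    6
fi     8    9   13    6    3

Solution: First find the mean: $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$ = (10*8 + 8*9 + … + 6*3)/(8 + 9 + … + 3) = 329/39 ≈ 8.4. Then

Xi                     10       8       9       7       6
fi                      8       9      13       6       3
$|X_i - \bar{X}|$      1.6     0.4     0.6     1.4     2.4
$f_i|X_i - \bar{X}|$  12.8     3.6     7.8     8.4     7.2

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{12.8 + 3.6 + 7.8 + 8.4 + 7.2}{8 + 9 + 13 + 6 + 3} = \frac{39.8}{39} \approx 1.02$$

Interpretation: each value deviates on average 1.02 from the arithmetic mean, 8.4.

Coefficient of Mean Deviation (C.M.D)

$$C.M.D = \frac{M.D}{\text{mean}}$$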
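The three steps for the mean deviation can be followed in code for this frequency distribution. This is an illustrative sketch with names chosen here; it uses the unrounded mean, so intermediate figures differ slightly from a hand calculation with 8.4, but the result agrees to two decimals.

```python
values = [10, 8, 9, 7, 6]
frequencies = [8, 9, 13, 6, 3]

n = sum(frequencies)

# Step 1: arithmetic mean of the frequency distribution.
mean = sum(x * f for x, f in zip(values, frequencies)) / n

# Step 2: absolute deviation of each value from the mean.
abs_deviations = [abs(x - mean) for x in values]

# Step 3: frequency-weighted average of the absolute deviations.
mean_deviation = sum(f * d for f, d in zip(frequencies, abs_deviations)) / n

# Coefficient of mean deviation: mean deviation relative to the mean.
coefficient_md = mean_deviation / mean

print(round(mean, 1), round(mean_deviation, 2), round(coefficient_md, 3))
```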

The variance

The variance is the "average squared deviation from the mean": it measures the average of the squared deviations of the observations from the mean.

Suppose we have a population of N observations, say X1, X2, X3, …, XN. Then we define the population variance as:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - N\mu^2}{N}$$

But most of the time we have a sample of n observations, say X1, X2, X3, …, Xn, from the population of N. Then we define the sample variance as:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{or} \quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}, \quad \text{or} \quad S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}$$

This measure of variation is universally used to show the scatter of the individual measurements around the mean of all the measurements in a given distribution. But the disadvantage is that the units of variance are the square of the units of the original observations. The easiest way around this difficulty is to use the square root of the variance, called the standard deviation, as a measure of variability.

Standard deviation

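The population and the sample standard deviations, denoted by σ and S respectively, are defined as:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}, \quad \text{where } \mu \text{ is the population mean}$$

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}, \quad \text{where } \bar{X} \text{ is the sample mean}$$

For frequency distribution data, the population and sample variances are given as:

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N}, \quad \text{where } N = \sum f_i$$

$$S^2 = \frac{\sum f_i (x_i - \bar{X})^2}{n - 1}, \quad \text{where } n = \sum f_i$$

The sample variance for a grouped frequency distribution is given by

$$S^2 = \frac{\sum f_i (m_i - \bar{X})^2}{n - 1}, \quad \text{where } n = \sum f_i \text{ and } m_i = \text{midpoint of the } i\text{th class}$$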

Example: Areas of sprayable surfaces with DDT from a sample of 15 houses are as follows (m²): 101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145. Find the variance and standard deviation.

Solution: The mean of the sample is 125 ($\bar{X} = 125$). Then

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(101 - 125)^2 + (105 - 125)^2 + \cdots + (145 - 125)^2}{14} = \frac{2502}{14} = 178.71$$

Hence, the standard deviation is $S = \sqrt{178.71} = 13.37$.
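The sample variance and standard deviation of the areas can be checked both from the definition and with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. The variable names are illustrative.

```python
import statistics

areas = [101, 105, 110, 114, 115, 124, 125, 125,
         130, 133, 135, 136, 137, 140, 145]

n = len(areas)
mean = sum(areas) / n

# From the definition: sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in areas)
variance_by_hand = sum_sq_dev / (n - 1)

# Same quantities from the standard library (sample versions).
variance_lib = statistics.variance(areas)
std_dev_lib = statistics.stdev(areas)

print(mean, sum_sq_dev, round(variance_by_hand, 2), round(std_dev_lib, 2))
```

For a whole population, `statistics.pvariance` and `statistics.pstdev` divide by N instead.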

Examples: Find the variance and standard deviation of the following grouped sample data

Class Frequency

40-44 7

45-49 10

50-54 22

55-59 15

60-64 12

65-69 6

70-74 3

Sample mean $\bar{X}$ = 55, n = 75

mi (midpoint)              42      47     52     57     62     67     72     Total
$f_i(m_i - \bar{X})^2$    1183    640    198     60    588    864    867     4400

Then $S^2 = \frac{\sum f_i (m_i - \bar{X})^2}{n - 1} = \frac{4400}{74} = 59.46$

and $S = \sqrt{59.46} = 7.71$
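The grouped calculation can be reproduced with the class midpoints and frequencies. This is a minimal sketch under the usual assumption that each observation is represented by its class midpoint; names are illustrative.

```python
from math import sqrt

midpoints = [42, 47, 52, 57, 62, 67, 72]
frequencies = [7, 10, 22, 15, 12, 6, 3]

n = sum(frequencies)
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# f_i * (m_i - mean)^2 for each class, as in the table.
weighted_sq_devs = [f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)]

variance = sum(weighted_sq_devs) / (n - 1)   # sample variance
std_dev = sqrt(variance)

print(mean, weighted_sq_devs, round(variance, 2), round(std_dev, 2))
```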

Note:

If the standard deviation of X1, X2, ….., Xn is S, then the standard deviation of

a) X1+ k, X2+k, …, Xn+k will also be S (where k =constant)

b) kX1, kX2, …, kXn will be |k|S.

c) c+kX1, c+kX2, …, c+kXn will be |k|S (c and k are constants)

Example 1: The standard deviation of n observations X1, X2, …, Xn is known to be 3. A new set of observations is obtained by the linear transformation Yi = 2Xi – 0.5 (i = 1, 2, …, n). What will be the standard deviation of the new set of observations?

Solution: new standard deviation = |k|S = 2*3 = 6

Example 2: The mean and the standard deviation of a set of numbers are respectively 500 and 10.

a) If 10 is added to each of the numbers in the set, then what will be the variance and standard deviation

of the new set?

b) If each of the numbers in the set are multiplied by -5, then what will be the variance and standard

deviation of the new set?

Solutions: a) The variance and standard deviation will remain the same (variance = 10² = 100, standard deviation = 10).

b) New standard deviation = |k|S = 5*10 = 50, so the new variance = 50² = 2500.

The coefficient of variation (CV) is defined by

$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

For sample data,

$$CV = \frac{S}{\bar{X}} \times 100\%.$$

The coefficient of variation is most useful in comparing the variability of several different samples, each

with different means. This is because a higher variability is usually expected when the mean increases, and

the CV is a measure that accounts for this variability.

CV is a relative measure free from unit of measurement.

Examples: An analysis of the weekly wages paid (in Birr) to workers in two firms A and B belonging to the

same industry gives the following results.

In which firm are the wages more variable?

Value         Firm A    Firm B
Mean wage     56        72
Variance      100       121

Solution: $C.V_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{10}{56} \times 100\% = 17.86\%$ and

$C.V_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{11}{72} \times 100\% = 15.28\%$.

Since $C.V_A > C.V_B$, there is greater variability in individual wages in firm A.
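The comparison of the two firms can be sketched in Python. Note that the table gives variances, so the standard deviation is obtained as the square root first. The helper name `coefficient_of_variation` is an assumption of this illustration.

```python
from math import sqrt


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Firm A: mean wage 56, variance 100. Firm B: mean wage 72, variance 121.
cv_a = coefficient_of_variation(sqrt(100), 56)
cv_b = coefficient_of_variation(sqrt(121), 72)

more_variable = "A" if cv_a > cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

Firm B has the larger standard deviation in absolute terms, yet firm A is more variable relative to its mean, which is why a relative measure is used for the comparison.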

The standard score (Z-score) is the number of standard deviations that a given value X is below or above the mean.

The standard score of any value Xi is defined as

$$Z_i = \frac{X_i - \text{mean}}{\text{standard deviation}}$$

$$Z_i = \frac{X_i - \bar{X}}{S} \quad \text{(for sample data sets)}$$

Values above the mean have positive z-scores and values below the mean have negative Z-scores. Z-scores

are generally meaningless by themselves unless they are compared to the distribution or scores from some

reference group.

Note: A Z-score less than -2 or greater than 2 is considered an unusually low or high value.

Example 1: Two sections were given introduction to statistics examinations. The following information was

given.

Value Section 1 Section 2

Mean 78 90

Standard deviation 6 5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking who

performed better?

Solution: $Z_A = \frac{X_A - \bar{X}}{S} = \frac{90 - 78}{6} = 2$ and

$Z_B = \frac{X_B - \bar{X}}{S} = \frac{95 - 90}{5} = 1$

Student A performed better relative to his section because the score of student A is two standard deviation

above the mean score of his section while, the score of student B is only one standard deviation above the

mean score of his section.

Example 2: Two groups of people were trained to perform a certain task and tested to find out which group is

faster to learn the task. For the two groups the following information was given:

Value Group one Group two

Mean 10.4 min 11.9 min

Stan.dev. 1.2 min 1.3 min

Relatively speaking:

a) Which group is more consistent(less variable) in its performance?

b) Suppose a person A from group one takes 9.2 minutes while person B from Group

two takes 9.3 minutes, who was faster in performing the task? Why?

Solutions:

a) Use the coefficient of variation.

$CV_1 = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% = 11.54\%$

$CV_2 = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% = 10.92\%$

Since $CV_2 < CV_1$, group 2 is more consistent (less variable).

b) Calculate the standard scores of A and B.

$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1$ and

$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$

Person B is faster because the time taken by person B is two standard deviations shorter than the average time taken by group 2, while the time taken by person A is only one standard deviation shorter than the average time taken by group 1.
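Both standard-score examples can be checked with a one-line helper. The function name `z_score` is chosen for this sketch. In the first example a larger Z is better (exam marks); in the second a smaller Z is better (time taken).

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Example 1: exam marks in two sections.
z_student_a = z_score(90, mean=78, std_dev=6)
z_student_b = z_score(95, mean=90, std_dev=5)

# Example 2: task completion times (minutes) in two groups.
z_person_a = z_score(9.2, mean=10.4, std_dev=1.2)
z_person_b = z_score(9.3, mean=11.9, std_dev=1.3)

print(z_student_a, z_student_b, round(z_person_a, 2), round(z_person_b, 2))
```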


3. Elementary probability

A deterministic model is one in which every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables. Such a model hypothesizes exact relationships and is suitable when the prediction error is negligible.

In a non-deterministic (stochastic/probabilistic) model, randomness is present, and variable states are described not by unique values but by probability distributions. Although individual outcomes cannot be predicted exactly, a definite pattern or regularity appears, which makes it possible to construct a precise mathematical model. Such a model hypothesizes two components: a deterministic component and a random error.

Random experiments

An experiment is the process by which an observation (measurement) is obtained. The results of an experiment may not be the same even when it is repeated under identical conditions. Such experiments are called random experiments.

Example:

a. If we toss a fair die, the result of the experiment is one of the numbers in the set S = {1, 2, 3, 4, 5, 6}.

b. If an experiment consists of measuring “lifetimes” of electric light bulbs produced by a company,

then the result of the experiment is a time t in hours that lies in some interval say, 0 ≤ t ≤ 4000 where

we assume that no bulb lasts more than 4000 hours.

Sample space

Sample space is the set of all possible outcomes of a random experiment. It is denoted by S. Each

outcome is called sample point.

Simple event: If an event E consists of a single outcome, then it is called a simple or elementary event.

Complement of an event: The complement of event A (denoted by $A^c$ or $\bar{A}$) consists of all the sample points in the sample space that are not in A.

Mutually exclusive events: Two events A and B are said to be mutually exclusive if they cannot occur simultaneously (i.e. $A \cap B = \emptyset$). The intersection of mutually exclusive sets is the empty set.

Independent events: Two events are said to be independent if the occurrence of one is not affected by, and

does not affect, the other. If two events are not independent, then they are said to be dependent.

Equally likely out comes: If each out come in an experiment has the same chance to occur, then the

outcomes are said to be equally likely.

Example: In an experiment of rolling a fair die, S = {1, 2, 3, 4, 5, 6}, each sample point is an equally likely

outcome. It is possible to define many events on this sample space as follows:

Example: In tossing a coin the sample space S is S = {Head, Tail} The events will be

A = { Head, Tail }, B = { Head}, C = { Tail } and D = {}.

Definition

A set is a collection of well-defined objects. These objects are called elements. Sets are usually denoted by capital letters and elements by small letters. Membership in a given set is denoted by $\in$ (belongs to the set) and non-membership by $\notin$ (does not belong to the set).

Description of sets: Sets can be described by any of the following three ways. That is the complete listing

method (all element of the set are listed), the partial listing method (the elements of the set can be indicated

by listing some of the elements of the set) and the set builder method (using an open proposition to describe

elements that belongs to the set).

Example: The possible outcomes in tossing a six side die

S = {1, 2, 3, 4, 5, 6} or S = {1, 2, . . ., 6} or S = {x: x is an outcome in tossing a six side die}

Types of set

Universal set: is a set that contains all elements of the set that can be considered the objects of that particular

discussion.

Empty or null set: is a set which has no element, denoted by {} or $\emptyset$.

Finite set: is a set which contains a finite number of elements. (eg.{x: x is an integer, 0 < x < 5})

Infinite set: is a set which contains an infinite number of elements. (eg. {x : x $\in \mathbb{R}$, x > 0})

Subset: If every element of set A is also an element of set B, then A is called a subset of B, denoted by $A \subseteq B$.

Proper subset: For two sets A and B, if A is a subset of B and B is not a subset of A, then A is said to be a proper subset of B, denoted by $A \subset B$.

Equal sets: Two sets A and B are said to be equal if every element of A is also an element of B and every element of B is also an element of A.

Equivalent sets: Two sets A and B are said to be equivalent if there is a one to one correspondence between

elements of the two sets.

Set Operation and their Properties

There are many ways of operating two or more set to get another set. Some of them are discussed below.

Union of sets: The union of two sets A and B is the set which contains the elements that belong to either of the two sets. The union of two sets is denoted by $A \cup B$ (A union B).

Intersection of sets: The intersection of two sets A and B is the set which contains the elements that belong to both A and B. The intersection of two sets is denoted by $A \cap B$ (A intersection B).

Disjoint sets: are two sets whose intersection is the empty set.

Absolute complement or complement: Let U be the universal set and A a subset of U. Then the complement of A, denoted by $A^c$, is the set which contains the elements of U that do not belong to A.

Relative complement (or difference): The difference of set A with respect to set B, written as $A \cap B^c$ (or A – B), is the set which contains the elements of A that do not belong to B.

Symmetric difference: The symmetric difference of two sets A and B, denoted by $A \Delta B$, is the set which contains the elements that belong to A but not to B together with the elements that belong to B but not to A. That is, $A \Delta B = (A \cap B^c) \cup (B \cap A^c)$.

Let U be the universal set and sets A, B, C are sets in the universe, the following properties will hold true.

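1. $A \cup B = B \cup A$ (Union of sets is commutative)

2. $A \cup (B \cup C) = (A \cup B) \cup C = A \cup B \cup C$ (Union of sets is associative)

3. $A \cap B = B \cap A$ (Intersection of sets is commutative)

4. $A \cap (B \cap C) = (A \cap B) \cap C = A \cap B \cap C$ (Intersection of sets is associative)

5. $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (Union of sets is distributive over intersection)

6. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ (Intersection of sets is distributive over union)

7. If $A \subseteq B$, then $B^c \subseteq A^c$

8. $A \cup \emptyset = A$ and $A \cap \emptyset = \emptyset$

9. $A \cup U = U$ and $A \cap U = A$

10. $(A \cup B)^c = A^c \cap B^c$ (De Morgan's first rule)

11. $(A \cap B)^c = A^c \cup B^c$ (De Morgan's second rule)

12. $A = (A \cap B) \cup (A \cap B^c)$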
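Several of these properties can be spot-checked with Python's built-in `set` type. The universal set and the sets A, B and C below are arbitrary examples invented for this sketch; a check on one example illustrates a property but does not prove it.

```python
# Hypothetical example sets inside a small universal set U.
U = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}


def complement(s):
    """Elements of the universal set U that are not in s."""
    return U - s


# Distributive laws (properties 5 and 6).
union_over_intersection = (A | (B & C)) == ((A | B) & (A | C))
intersection_over_union = (A & (B | C)) == ((A & B) | (A & C))

# De Morgan's rules (properties 10 and 11).
de_morgan_first = complement(A | B) == (complement(A) & complement(B))
de_morgan_second = complement(A & B) == (complement(A) | complement(B))

# Property 12: A splits into the part inside B and the part outside B.
partition_of_a = A == ((A & B) | (A & complement(B)))

# Symmetric difference: (A and not B) together with (B and not A).
symmetric_difference = (A & complement(B)) | (B & complement(A))

print(union_over_intersection, intersection_over_union,
      de_morgan_first, de_morgan_second, partition_of_a,
      sorted(symmetric_difference))
```

Python also provides the operator `A ^ B` for the symmetric difference and `A - B` for the relative complement.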

In many problems of probability, we are interested in events that are actually combinations of two or more

events formed by unions, intersections, and complements. Since the concept of set theory is of vital

importance in probability theory, we need a brief review.

The union of two sets A and B, A ∪ B, is the set with all elements in A or B or both.

The intersection of A and B, A ∩ B, is the set that contains all elements in both A and B.

The complement of A, Aᶜ, is the set that contains all elements in the universal set U that are not found in A.
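The set operations reviewed above map directly onto Python's built-in set type. The following is a minimal sketch; the universal set U and the sets A and B are hypothetical values chosen only for illustration, not taken from the notes.

```python
# Hypothetical universal set and two events, chosen only for illustration.
U = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B                  # elements in A or B or both
intersection = A & B           # elements in both A and B
complement_A = U - A           # elements of U that are not in A
difference = A - B             # relative complement of B in A
symmetric_difference = A ^ B   # in exactly one of A and B

# De Morgan's rules, checked on these particular sets.
de_morgan_1 = (U - (A | B)) == ((U - A) & (U - B))
de_morgan_2 = (U - (A & B)) == ((U - A) | (U - B))

# Property 12: A splits into the part inside B and the part outside B.
decomposition = A == ((A & B) | (A & (U - B)))

print(union, intersection, complement_A, difference, symmetric_difference)
print(de_morgan_1, de_morgan_2, decomposition)
```

A check on one pair of sets illustrates the identities but does not prove them.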

If a sample space has a finite number of points, it is called a finite sample space. If it has as many points as there are natural numbers 1, 2, 3, …, it is called a countably infinite sample space. If it has as many points as there are in some interval, such as 0 < x < 1, it is called an uncountably infinite sample space. A sample space that is finite or countably infinite is often called a discrete sample space, while one that is uncountably infinite is called a continuous sample space.

Equally Likely Outcomes

Equally likely outcomes are outcomes of an experiment that have the same chance of occurring (they are equally probable). This is commonly assumed for finite sample spaces; it cannot hold for a countably infinite sample space, because equal positive probabilities would not sum to 1.

If the sample space has n equally likely outcomes, then the probability of the i-th sample point xi is P(xi) = 1/n, where xi can be the first, second, ..., or the nth outcome.

Example: In the experiment of tossing a fair die, the outcomes are equally likely (each outcome is equally probable). Hence, P(xi = 1) = P(xi = 2) = P(xi = 3) = P(xi = 4) = P(xi = 5) = P(xi = 6) = 1/6.

3.5. Counting Techniques

In many cases the number of sample points in a sample space is not very large, and so direct enumeration or

counting of sample points used to obtain probabilities is not difficult. However, problems arise where direct

counting becomes a practical impossibility. To avoid such difficulties we apply the fundamental principles of

counting (counting techniques).

Multiplication Rule

Suppose a task is completed in k stages by carrying out a number of subtasks in each one of the k stages. If

in the first stage the task can be accomplished in n1 different ways and after this in the second stage the task

can be accomplished in n2 different ways, . . . , and finally in the kth stage the task can be accomplished in nk

different ways, then the overall task can be done in n1 × n2 × · · · × nk different ways.

Example: Suppose that a person has 2 different pairs of trousers and 3 shirts. In how many ways can he wear his trousers and shirts? Solution: Each of the 2 pairs of trousers can be combined with each of the 3 shirts, giving 2 × 3 = 6 ways.

Example: How many four-digit numbers can be formed from the digits 1, 2, 5, 6 and 9 if each digit can be used only once? Solution: We have a total of 5 × 4 × 3 × 2 = 120 four-digit numbers.
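When the counts are small, the multiplication rule can be checked by listing every possibility. The sketch below does this for the two examples using the standard itertools module; the labels T1, T2, S1, S2, S3 are placeholder names introduced only for this illustration.

```python
from itertools import permutations, product

# Placeholder labels for the 2 pairs of trousers and 3 shirts.
trousers = ["T1", "T2"]
shirts = ["S1", "S2", "S3"]

# Every (trousers, shirt) pair: stage 1 has 2 choices, stage 2 has 3.
outfits = list(product(trousers, shirts))

# Four-digit numbers from 1, 2, 5, 6, 9 with no digit repeated:
# ordered selections of 4 digits out of 5.
digits = [1, 2, 5, 6, 9]
four_digit_numbers = list(permutations(digits, 4))

print(len(outfits))             # number of trousers-shirt combinations
print(len(four_digit_numbers))  # number of four-digit numbers
```

Direct enumeration like this becomes impractical quickly, which is why the counting formulas are used instead.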

Permutations

Suppose that we are given n distinct objects and wish to arrange r of these objects in a line. Since there are n

ways of choosing the 1st object, and after this is done, n - 1 ways of choosing the 2nd object, . . . , and finally

n - r + 1 ways of choosing the rth object, it follows by the fundamental principle of counting that the number

of different arrangements or permutations is given by n(n - 1)(n - 2) . . . (n - r + 1) = nPr where it is noted that

the product has r factors.

We call nPr the number of permutations of n objects taken r at a time. It is given by

nPr = n! / (n − r)!

When r = n, the above equation becomes nPn = n! / (n − n)! = n! / 0! = n!, which is called n factorial.

Note: 0! = 1

Example: In one year, three awards (research, teaching, and service) will be given to a class of 25 graduate

students in a statistics department. If each student can receive at most one award, how many possible

selections are there?

Solution: Since the awards are distinguishable, it is a permutation problem. The total number of sample points is

25P3 = 25! / (25 − 3)! = 25! / 22! = (25)(24)(23) = 13,800.

Example: A president and a treasurer are to be chosen from a student club consisting of 50 people. How many different choices of officers are possible if there are no restrictions?

Solution: 50P2 = 50! / (50 − 2)! = 50! / 48! = (50)(49) = 2450.

Remark

If a set consists of n objects of which n1 are of one type (i.e., indistinguishable from each other), n2 are of a second type, . . . , and nk are of a kth type, then the number of different permutations of the objects is given by:

nPn1, n2, ..., nk = n! / (n1! n2! . . . nk!)

Example: How many different letter arrangements can be made from the letters in the word

“STATISTICS”?

Solution: Here we have 10 letters in total, with two letters (S and T) appearing 3 times each, the letter I appearing twice, and the letters A and C appearing once each. Therefore, there are

10! / (3! 3! 2! 1! 1!) = 50,400

letter arrangements.
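The permutation counts in the examples above can be reproduced with Python's standard math module (math.perm is available from Python 3.8). This is a small sketch; the helper name distinct_arrangements is introduced here for illustration and is not part of the notes.

```python
from collections import Counter
from math import factorial, perm

# nPr for the two worked examples.
awards = perm(25, 3)      # 3 distinct awards among 25 students
officers = perm(50, 2)    # president and treasurer among 50 people


def distinct_arrangements(word):
    """Number of distinct orderings of the letters of `word`:
    n! divided by the factorial of each letter's multiplicity."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total


print(awards, officers, distinct_arrangements("STATISTICS"))
```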

Combinations

In permutation we are interested in the order of arrangement of the objects. In many problems, however, we

are interested only in selecting or choosing objects without regard to order. Such selections are called

combinations.

The total number of combinations of r objects selected from n (also called the combinations of n objects taken r at a time) is denoted by C(n, r) or nCr and is given by

nCr = n! / (r! (n − r)!)

Example: In how many ways can a committee of 2 students be formed out of 6?

Solution: 6C2 = 6! / (2! 4!) = (6 × 5) / 2! = 15.

Example: Out of 5 male workers and 7 female workers of a factory, a task force consisting of 5 workers is to

be formed. In how many ways can this be done if the task force will consist of

(a) 2 male and 3 female workers?

(b) all female workers?

(c) at least 3 male workers?

Solution:

a) 5C2 × 7C3 = [5! / (2! 3!)] × [7! / (3! 4!)] = 10 × 35 = 350

b) 5C0 × 7C5 = [5! / (0! 5!)] × [7! / (5! 2!)] = 1 × 21 = 21

c) 5C3 × 7C2 + 5C4 × 7C1 + 5C5 × 7C0 = 210 + 35 + 1 = 246
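The combination counts above can be checked with math.comb from the Python standard library (Python 3.8 or later). The variable names below are illustrative only.

```python
from math import comb

committee = comb(6, 2)  # committees of 2 students out of 6

males, females, size = 5, 7, 5

# (a) exactly 2 male and 3 female workers
two_male_three_female = comb(males, 2) * comb(females, 3)

# (b) all female workers
all_female = comb(males, 0) * comb(females, 5)

# (c) at least 3 male workers: add the cases of 3, 4 and 5 males
at_least_three_male = sum(
    comb(males, m) * comb(females, size - m) for m in range(3, size + 1)
)

print(committee, two_male_three_female, all_female, at_least_three_male)
```

Dividing each of the last three counts by comb(12, 5), the total number of task forces, turns them into classical probabilities.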

In any random experiment there is always uncertainty as to whether a particular event will or will not occur.

As a measure of the chance, or probability, with which we can expect the event to occur, it is convenient to

assign a number between 0 and 1. If we are sure or certain that the event will occur, we say that its

probability is 100% or 1, but if we are sure that the event will not occur, we say that its probability is zero.

There are different procedures by means of which we can define or estimate the probability of an event.

These procedures are discussed below:

1. Classical Approach

Let S be a sample space associated with a certain random experiment and consisting of finitely many sample points, n say, each of which is equally likely to occur whenever the random experiment is carried out. Then the probability of any event A consisting of k sample points (0 ≤ k ≤ n) is given by P(A) = k/n.

Example: What is the probability that an odd number will turn up in rolling a fair die?

Solution: S = {1, 2, 3, 4, 5, 6}; let A = {1, 3, 5}. For a fair die, P(1) = P(2) = . . . = P(6) = 1/6; then

P(A) = k/n = 3/6 = 1/2.

Example: In an experiment of tossing a fair coin three times, find the probability of getting exactly two heads.

Solution: For each toss, there are two possible outcomes, head (H) or tail (T). Thus, the number of possible outcomes is n = 2 × 2 × 2 = 8.

The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Let E1 be the event of getting exactly two heads, so E1 = {HHT, HTH, THH} and k = 3. Therefore, P(E1) = k/n = 3/8.

Example: Out of 5 male workers and 7 female workers of a factory, a task force consisting of 5 workers is to

be formed. What is the probability that the task force will consist of

(a) 2 male and 3 female workers?

(b) all female workers?

(c) at least 3 male workers?

Solution: The total number of possible task forces is n(S) = 12C5 = 792.

a) Let A = 2 male and 3 female workers; n(A) = 5C2 × 7C3 = 350. Hence, P(A) = n(A)/n(S) = 350/792 = 0.442.

b) P(all female) = (5C0 × 7C5) / 12C5 = 21/792 = 0.0265.

c) P(at least 3 male) = 246/792 = 0.311.

Let N(A) be the number of times an event A occurs in N repetitions of a random experiment, and assume that the relative frequency of A, N(A)/N, converges to a limit as N → ∞. This limit is denoted by P(A) and is called the probability of A.

Example: Records show that 60 out of 100,000 bulbs produced are defective. What is the probability that a newly produced bulb is defective?

Solution: Let A be the event that the newly produced bulb is defective. Then

P(A) = 60/100,000 = 0.0006.

Axioms of probability

Probability is a function defined for each event of a sample space S, taking on values in the real line ℝ, and satisfying the following three properties (or axioms of probability). We write P(A) for the probability that event A occurs.

Axiom 1: P(A) ≥ 0 for every event A

Axiom 2: P(S) = 1, where S = sample space (sure or certain event)

Axiom 3: If A1, A2, A3, …, An are mutually exclusive events (meaning Ai ∩ Aj = ∅ for i ≠ j), then

P(A1 ∪ A2 ∪ A3 ∪ . . . ∪ An) = P(A1) + P(A2) + P(A3) + . . . + P(An) = Σ_{i=1}^{n} P(Ai)

Theorem 3: P(∅) = 0, where ∅ is the empty set

Theorem 4: If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

More generally, if A, B, C are any three events, then

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)

Theorem 5: For any events A and B, P(A) = P(A ∩ B) + P(A ∩ Bᶜ), since (A ∩ B) and (A ∩ Bᶜ) are mutually exclusive.

Example: In a class of 200 students, 138 are enrolled in a mathematics course, 115 are enrolled in statistics, and 91 are enrolled in both. What percent of these students take

a) either mathematics or statistics b) neither mathematics nor statistics

c) statistics but not mathematics d) mathematics but not statistics

Solution:

Let M = the student takes mathematics and S = the student takes statistics.

a) P(M ∪ S) = P(M) + P(S) − P(M ∩ S) = 138/200 + 115/200 − 91/200 = 0.69 + 0.575 − 0.455 = 0.81

81% of the students take either course

b) P((M ∪ S)ᶜ) = 1 − P(M ∪ S) = 1 − 0.81 = 0.19. 19% of students take neither course

c) P(S ∩ Mᶜ) = P(S) − P(S ∩ M) = 0.575 − 0.455 = 0.12.

12% of students take statistics but not mathematics

d) P(M ∩ Sᶜ) = P(M) − P(S ∩ M) = 0.69 − 0.455 = 0.235

23.5% of students take mathematics but not statistics
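The four answers in the enrolment example follow from Theorems 4 and 5 and can be reproduced with exact fractions. The sketch below uses only the counts given in the example; the variable names are illustrative.

```python
from fractions import Fraction

total = 200
p_math = Fraction(138, total)
p_stat = Fraction(115, total)
p_both = Fraction(91, total)

# Theorem 4: P(M or S) = P(M) + P(S) - P(M and S)
p_either = p_math + p_stat - p_both

# Complement of the union
p_neither = 1 - p_either

# Theorem 5 rearranged: P(S and not M) = P(S) - P(S and M), and likewise for M
p_stat_only = p_stat - p_both
p_math_only = p_math - p_both

for name, p in [("either", p_either), ("neither", p_neither),
                ("statistics only", p_stat_only), ("mathematics only", p_math_only)]:
    print(f"{name}: {float(p):.3f}")
```

The three disjoint pieces (statistics only, mathematics only, both) add up to the probability of the union, which is a useful consistency check.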

4. Conditional Probability and Independence

4.1. Conditional Probability

Definition: The conditional probability of an event A, given that event B has occurred with P(B) > 0, denoted by P(A|B), is defined as

P(A|B) = P(A ∩ B) / P(B), P(B) ≠ 0.

Note:

P(S|B) =1 , for any event B and S = sample space

P(Aᶜ|B) = 1 – P(A|B)

Example: A fair die is tossed once. What is the probability that the die shows a 4 given that the die shows an even number?

Solution: Let A = {4} and B = {2, 4, 6}; then A ∩ B = {4} and

P(A|B) = P(A ∩ B) / P(B) = (1/6) / (3/6) = 1/3.

Example: A random sample of 200 adults are classified below by sex and their level

of education attained.

Education     Male    Female
Elementary    38      45
Secondary     28      50
College       22      17

If a person is picked at random from this group, find the probability that

(a) the person is a male given that the person has a secondary education

(b) the person does not have a college education given that the person is a female.

Solution: Let S = the person has secondary education, M = the person is male, F = the person is female, C = the person has college education and Cᶜ = the person does not have college education.

a) P(M|S) = P(M ∩ S) / P(S) = (28/200) / (78/200) = 28/78 = 14/39

b) P(Cᶜ|F) = P(Cᶜ ∩ F) / P(F) = (95/200) / (112/200) = 95/112
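Conditional probabilities from a two-way table amount to restricting attention to the rows or columns where the given event holds. The sketch below encodes the education-by-sex table as a dictionary; the function name conditional and the predicate style are choices made for this illustration.

```python
from fractions import Fraction

# Counts from the table, keyed by (education, sex).
counts = {
    ("Elementary", "Male"): 38, ("Elementary", "Female"): 45,
    ("Secondary", "Male"): 28, ("Secondary", "Female"): 50,
    ("College", "Male"): 22, ("College", "Female"): 17,
}


def conditional(event, given):
    """P(event | given) = n(event and given) / n(given) for equally likely people."""
    n_given = sum(n for key, n in counts.items() if given(key))
    n_both = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(n_both, n_given)


p_male_given_secondary = conditional(
    event=lambda key: key[1] == "Male",
    given=lambda key: key[0] == "Secondary",
)
p_not_college_given_female = conditional(
    event=lambda key: key[0] != "College",
    given=lambda key: key[1] == "Female",
)

print(p_male_given_secondary, p_not_college_given_female)
```

The common factor 1/200 cancels in the ratio, which is why the function can work with raw counts.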

4.2 Multiplication Rule

If in an experiment the events A and B can both occur, then P(A ∩ B) = P(A)P(B|A), provided P(A) > 0. Thus,

the probability that both A and B occur is equal to the probability that A occurs multiplied by the conditional

probability that B occurs, given that A occurs.

Example: Suppose that we have a fuse box containing 20 fuses, of which 5 are defective. If 2 fuses are

selected at random and removed from the box in succession without replacing the first, what is the

probability that both fuses are defective?

Solution: let A be the event that the first fuse is defective and B the event that the second fuse is defective;

then A ∩ B = both fuses are defective

The probability of first removing a defective fuse is 5/20 = 1/4 (that is, P(A) = 1/4); then the probability that the second fuse is defective given that the first fuse is defective is 4/19 (i.e., P(B|A) = 4/19).

Hence, P(A ∩ B) = P(A)P(B|A) = (1/4)(4/19) = 1/19.

In general, if, in an experiment, the events A1, A2, . . . , Ak can occur, then

P(A1 ∩ A2 ∩ · · · ∩ Ak) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) · · · P(Ak|A1 ∩ A2 ∩ · · · ∩ Ak−1).

Example: Three balls are drawn in succession, without replacement, from a box containing 6 red and 4 blue balls. Find the probability that all three of them are red.

Solution: First we define the events

A1: the first ball is a red,

A2: the second ball is red,

A3: the third ball is red

Required: P(A1 ∩ A2 ∩ A3) = ?

Now, P(A1) = 6/10 = 3/5, P(A2|A1) = 5/9, P(A3|A1 ∩ A2) = 4/8 = 1/2

P(all three are red) = P(A1 ∩ A2 ∩ A3) = (3/5)(5/9)(1/2) = 1/6
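The general multiplication rule can be sketched as a running product of conditional probabilities. The helper below, prob_all_of_type, is a hypothetical name introduced for illustration; it covers only the special case used in the fuse and ball examples, where every draw must be of the same type and sampling is without replacement.

```python
from fractions import Fraction
from math import comb


def prob_all_of_type(n_type, n_total, draws):
    """P(every one of `draws` items is of the given type), drawing without
    replacement: the product P(A1) P(A2|A1) ... P(Ak|A1 ... Ak-1)."""
    p = Fraction(1)
    for i in range(draws):
        # After i successful draws, n_type - i of the n_total - i items remain.
        p *= Fraction(n_type - i, n_total - i)
    return p


p_two_defective_fuses = prob_all_of_type(5, 20, 2)
p_three_red_balls = prob_all_of_type(6, 10, 3)

# Cross-check with the classical approach: favourable / total selections.
check_fuses = Fraction(comb(5, 2), comb(20, 2))
check_balls = Fraction(comb(6, 3), comb(10, 3))

print(p_two_defective_fuses, p_three_red_balls)
```

The agreement between the sequential product and the counting ratio shows that the order of drawing does not matter for this event.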

4.3 Theorem of total probability and Bayes’ Theorem

Partition of sample space: A collection of events {B1, B2, . . . , Bn} of a sample space S is called a partition of S if B1, B2, . . . , Bn are mutually exclusive and B1 ∪ B2 ∪ · · · ∪ Bn = S.

If the events B1, B2, . . . , Bn constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, . . . , n, then for any event A of S,

P(A) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3) + . . . + P(A ∩ Bn)

= P(B1)P(A|B1) + P(B2)P(A|B2) + . . . + P(Bn)P(A|Bn)

Example: In a certain assembly plant, three machines, B1, B2, and B3, make 30%, 45%, and 25%,

respectively, of the products. It is known from past experience that 2%, 3%, and 2% of the products made by

each machine, respectively, are defective. Now, suppose that a finished product is randomly selected. What

is the probability that it is defective?

A: the product is defective,

B1: the product is made by machine B1,

B2: the product is made by machine B2,

B3: the product is made by machine B3.

Then, P(B1) = 0.3, P(B2) = 0.45, P(B3) = 0.25 , P(A|B1) =0.02, P(A|B2) = 0.03, P(A|B3) = 0.02

Applying the theorem of total probability,

P(A) = P(B1)P(A|B1) + P(B2)P(A|B2) + P(B3)P(A|B3).

= (0.3)(0.02)+ (0.45)(0.03)+ (0.25)(0.02)= 0.006+0.0135+ 0.005= 0.0245

Suppose that B1, B2, . . . , Bn form a partition of the sample space S (they are mutually exclusive events whose union is S). Then, if A is any event with P(A) > 0, we have the following theorem (Bayes' theorem):

P(Br|A) = P(Br)P(A|Br) / Σ_{i=1}^{n} P(Bi)P(A|Bi), r = 1, 2, . . . , n

Example: With reference to the above example, if a product was chosen randomly and found to be defective, what is the probability that it was made by machine B3?

Solution: Using Bayes' rule,

P(B3|A) = P(B3)P(A|B3) / [P(B1)P(A|B1) + P(B2)P(A|B2) + P(B3)P(A|B3)]

and then substituting the probabilities calculated in the above example, we have

P(B3|A) = 0.005 / (0.006 + 0.0135 + 0.005) = 0.005 / 0.0245 = 10/49
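The total probability theorem and Bayes' rule use the same products P(Bi)P(A|Bi): the first sums them, the second divides one of them by that sum. A minimal sketch with the machine data from the example (dictionary names are illustrative):

```python
from fractions import Fraction

# Share of production per machine, P(Bi), and defect rate per machine, P(A|Bi).
priors = {"B1": Fraction(30, 100), "B2": Fraction(45, 100), "B3": Fraction(25, 100)}
defect_rates = {"B1": Fraction(2, 100), "B2": Fraction(3, 100), "B3": Fraction(2, 100)}

# Theorem of total probability: P(A) = sum of P(Bi) P(A|Bi).
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' rule: P(Bi|A) = P(Bi) P(A|Bi) / P(A), for every machine.
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(float(p_defective))
for machine, p in posteriors.items():
    print(machine, p)
```

The posterior probabilities over all machines sum to 1, since the machines partition the sample space.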

Example: An instructor has taught probability for many years. The instructor has found that 80% of students

who do the homework pass the exam, while 10% of students who don't do the homework pass the exam. If

60% of the students do the homework,

a) What percent of students pass the exam?

b) Of students who pass the exam, what percent did the homework?

Solution: Let

A: the student passes the exam

B: the student does the homework

Bᶜ: the student does not do the homework

Now, P(A|B) = 0.8, P(A|Bᶜ) = 0.1, P(B) = 0.6, P(Bᶜ) = 0.4

a) Applying the theorem of total probability,

P(A) = P(B)P(A|B) + P(Bᶜ)P(A|Bᶜ) = (0.6)(0.8) + (0.4)(0.1) = 0.48 + 0.04 = 0.52

52% of students pass the exam

b) Applying Bayes' rule,

P(B|A) = P(B)P(A|B) / [P(B)P(A|B) + P(Bᶜ)P(A|Bᶜ)] = 0.48 / (0.48 + 0.04) = 0.9231

About 92.3% of the students who pass the exam did the homework

Definition: Two events A and B are said to be independent (in the probability sense), if P(A∩ B) = P(A)

P(B).

In other words, two events A and B are independent when the probability that A occurs is not affected by the occurrence or non-occurrence of B, and vice versa.

Remark

If two events A and B are independent, then P(B|A) = P(B) for P(A) > 0, and P(A|B) = P(A) for P(B) > 0.

The definition of independent events can be extended to more than two events as follows:

Definition: The events A1, A2, . . . , An are said to be independent (statistically or stochastically or in the probability sense) if, for all possible choices of k out of n events (2 ≤ k ≤ n), the probability of their intersection equals the product of their probabilities. More formally, a collection of events A = {A1, A2, . . . , An} is mutually independent if for any subcollection Ai1, Ai2, . . . , Aik of A with 2 ≤ k ≤ n, we have

P(Ai1 ∩ . . . ∩ Aik) = P(Ai1) . . . P(Aik)

NB: If at least one of these relations fails, the events are said to be dependent.

The three events, A1, A2 and A3,are independent if the following four conditions are satisfied.

P(A1∩A2) = P(A1) P(A2),

P(A1∩A3) = P(A1) P(A3),

P(A2∩A3) = P(A2) P(A3),

P(A1∩A2∩A3) = P(A1) P(A2) P(A3).

The first three conditions simply assert that any two events are independent, a property known as pairwise

independence. But the fourth condition is also important and does not follow from the first three. Conversely,

the fourth condition does not imply the first three conditions.

NB: The equality P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3) is not enough for independence.

Example: Consider two independent rolls of a fair die, and the following events:

A = {1st roll shows 1, 2, or 3},

B = {1st roll shows 3, 4, or 5},

C = {the sum of the two rolls is 9}.

We have P(A ∩ B) = 1/6 ≠ (1/2)(1/2) = P(A)P(B);

P(A ∩ C) = 1/36 ≠ (1/2)(4/36) = P(A)P(C);

P(B ∩ C) = 1/12 ≠ (1/2)(4/36) = P(B)P(C).

Thus the three events A, B, and C are not independent, and indeed no two of these events are independent. On the other hand, we have

P(A ∩ B ∩ C) = 1/36 = (1/2)(1/2)(4/36) = P(A)P(B)P(C)
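Because two rolls of a die give only 36 equally likely outcomes, the probabilities in this example can be verified by enumeration. The sketch below is one way to do so; the function names prob, in_A, in_B and in_C are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first roll, second roll).
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def in_A(s):
    return s[0] in (1, 2, 3)     # 1st roll shows 1, 2, or 3


def in_B(s):
    return s[0] in (3, 4, 5)     # 1st roll shows 3, 4, or 5


def in_C(s):
    return s[0] + s[1] == 9      # the sum of the two rolls is 9


p_A, p_B, p_C = prob(in_A), prob(in_B), prob(in_C)
p_AB = prob(lambda s: in_A(s) and in_B(s))
p_AC = prob(lambda s: in_A(s) and in_C(s))
p_BC = prob(lambda s: in_B(s) and in_C(s))
p_ABC = prob(lambda s: in_A(s) and in_B(s) and in_C(s))

print(p_AB, p_A * p_B)          # pairwise product rule fails
print(p_ABC, p_A * p_B * p_C)   # triple product rule holds
```

This confirms that the triple product condition alone does not give independence.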

Note: If the events A and B are independent, then each of the following pairs of events is also independent:

A and Bᶜ; Aᶜ and B; Aᶜ and Bᶜ

Example: If A and B are independent, then show that A and Bᶜ are also independent.

Proof:

We need to show that P(A ∩ Bᶜ) = P(A)P(Bᶜ).

From set and probability theory, P(A) = P(A ∩ B) + P(A ∩ Bᶜ)

So, P(A ∩ Bᶜ) = P(A) – P(A ∩ B)

= P(A) – P(A)P(B), since A and B are independent (given)

= P(A)[1 – P(B)]

= P(A)P(Bᶜ), hence proved.

5. One-Dimensional Random Variables

5.1. Definitions of Random Variables

Let S be the sample space of an experiment and let X be a real-valued function defined over S; then X is called a random variable (or stochastic variable).

A random variable, usually shortened to r.v. (rv), is a function defined on a sample space S and taking values in the real line ℝ; it is denoted by a capital letter, such as X, Y, Z. Thus, the value of the r.v. X at the sample point s is X(s), and the set of all values of X, that is, the range of X, is usually denoted by X(S) or RX.

The difference between a r.v. and a function is that the domain of a r.v. is a sample space S, unlike the usual concept of a function, whose domain is a subset of ℝ or of a Euclidean space of higher dimension. The

usage of the term “random variable” employed here rather than that of a function may be explained by the

fact that a r.v is associated with the outcomes of a random experiment. Of course, on the same sample space,

one may define many distinct r.vs.

Example 1: Consider tossing three distinct coins once, so that the sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. Then a random variable X can be defined by X(s) = the number of heads (H's) in the outcome s.

Example 2: Consider rolling two distinct dice once. The sample space is S = {(1, 1), (1, 2), . . . , (2, 1), . . . , (6, 1), (6, 2), . . . , (6, 6)}, and a r.v. X of interest may be defined by X(s) = the sum of the numbers in the pair s.

In the examples discussed above we saw r.v.s taking different kinds of values. Random variables can be divided into two broad categories: discrete and continuous random variables.

A random variable X is called discrete (or of the discrete type) if X takes on a finite or countably infinite number of values; that is, either finitely many values such as x1, . . . , xn, or countably infinitely many values such as x0, x1, x2, . . . .

Alternatively, we can describe a discrete random variable as one that

o typically takes whole-number values (like 0, 1, 2, 3, etc.)

o takes a finite or countably infinite number of values

o jumps from one value to the next and cannot take any values in between.

Example 3: In Example 1 and 2 above, the random variables defined are discrete r.v.s.

Example 4:

Experiment                                       Random Variable (X)        Variable values
Children of one gender in a family               Number of girls            0, 1, 2, …
Answer 23 questions of an exam                   Number of correct          0, 1, 2, ..., 23
Count cars at toll between 11:00 am & 1:00 pm    Number of cars arriving    0, 1, 2, ..., n

If X is a discrete random variable, the function given by f(x) = P(X = x) for each x within the range of X is

called the probability distribution or probability mass function of X.

Remark

The probability distribution (mass) function f(x) of a discrete random variable X satisfies the following two conditions:

1. f(x) ≥ 0 for every x

2. Σ_x f(x) = 1, where the summation is taken over all possible values of x.

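Example 5: A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If a school makes a random purchase of 2 of these computers, find the probability distribution for the number of defectives. Check that f(x) defines a pmf.

Solution: Let X be a random variable whose values x are the possible numbers of defective computers purchased by the school. Then x can only take the values 0, 1, and 2. Choosing 2 computers out of 20 without replacement, with 3 defective and 17 non-defective, gives

f(0) = (3C0 × 17C2) / 20C2 = 136/190 = 68/95

f(1) = (3C1 × 17C1) / 20C2 = 51/190

f(2) = (3C2 × 17C0) / 20C2 = 3/190

Each value is non-negative and f(0) + f(1) + f(2) = 190/190 = 1, so f(x) is a pmf.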
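The two pmf conditions can be checked mechanically for Example 5. The sketch below assumes the purchase is a simple random sample without replacement, so that the number of defectives is counted by combinations; the function name f and its keyword parameters are illustrative.

```python
from fractions import Fraction
from math import comb


def f(x, defective=3, total=20, sample=2):
    """P(X = x): choose x of the defective items and sample - x of the good
    ones, out of all ways to choose `sample` items (assumes no replacement)."""
    good = total - defective
    return Fraction(comb(defective, x) * comb(good, sample - x), comb(total, sample))


pmf = {x: f(x) for x in range(3)}  # x can only be 0, 1 or 2

condition_1 = all(p >= 0 for p in pmf.values())   # f(x) >= 0
condition_2 = sum(pmf.values()) == 1              # probabilities sum to 1

print(pmf, condition_1, condition_2)
```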

5.3. Continuous Random Variables

A r.v. X is called continuous (or of the continuous type) if X takes all values in a proper interval I ⊆ ℝ. Alternatively, we can describe a continuous random variable as one that

o takes whole or fractional values

o is obtained by measuring

o takes an infinite number of values in an interval.

Example 7: Consider recording the lifetime of an electronic device or of an electrical appliance. Here S is the interval (0, T) or, for some justifiable reasons, S = (0, ∞), and a r.v. X of interest is X(s) = s, s ∈ S. This random variable and those in the table below are continuous r.v.s.

Experiment                       Random Variable (X)    Variable values
Weigh 100 People                 Weight                 45.1, 78, ...
Measure Part Life                Hours                  900, 875.9, …
Measure Time Between Arrivals    Inter-Arrival time     0, 1.3, 2.78, ...

A function with values f(x), defined over the set of all real numbers, is called a probability density function of the continuous random variable X if and only if

P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any real constants a ≤ b.

A probability density function is also referred to as a probability density (p.d.f.), a probability function, or simply a density.

Remarks

The probability density function f(x) of the continuous random variable X has the following properties:

1. f(x) ≥ 0 for all x, that is, for −∞ < x < ∞

2. ∫_{−∞}^{∞} f(x) dx = 1

If X is a continuous random variable and a and b are real constants with a ≤ b, then

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

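Example 9: Suppose that the r.v. X is continuous with the pdf

f(x) = 2x, 0 < x < 1; 0, otherwise.

a) Check that f(x) is a pdf; b) Find P(X < 0.5); c) Find P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3).

Solution:

a) f(x) ≥ 0 for all x, and ∫_{−∞}^{∞} f(x) dx = ∫_0^1 2x dx = x² |_0^1 = 1. Hence, f(x) is the pdf of some r.v. X. Note that ∫_{−∞}^{∞} f(x) dx = ∫_0^1 f(x) dx, since f(x) is zero in the other two intervals, (−∞, 0] and [1, ∞).

b) P(X < 0.5) = ∫_0^{0.5} 2x dx = x² |_0^{0.5} = 0.25.

c) Let A = {X ≤ 1/2} and B = {1/3 ≤ X ≤ 2/3}, so that A ∩ B = {1/3 ≤ X ≤ 1/2}. Then P(X ≤ 1/2 | 1/3 ≤ X ≤ 2/3) = P(A|B) = P(A ∩ B) / P(B), where

P(A ∩ B) = ∫_{1/3}^{1/2} 2x dx = 5/36 and P(B) = ∫_{1/3}^{2/3} 2x dx = 1/3.

Therefore, P(A|B) = (5/36) / (1/3) = (5/36) × 3 = 5/12.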
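The integrals in Example 9 can be approximated numerically as a sanity check. The sketch below uses a simple midpoint rule written from scratch (the helper integrate and its steps parameter are assumptions of this illustration, not a library routine); for a linear density such as 2x the midpoint rule is exact up to rounding error.

```python
def f(x):
    """Density from Example 9: 2x on (0, 1), zero elsewhere."""
    return 2 * x if 0 < x < 1 else 0.0


def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h


total_area = integrate(f, 0, 1)              # should be close to 1
p_below_half = integrate(f, 0, 0.5)          # P(X < 0.5)
p_A_and_B = integrate(f, 1 / 3, 1 / 2)       # P(1/3 <= X <= 1/2)
p_B = integrate(f, 1 / 3, 2 / 3)             # P(1/3 <= X <= 2/3)
p_A_given_B = p_A_and_B / p_B                # conditional probability

print(round(total_area, 6), round(p_below_half, 6), round(p_A_given_B, 6))
```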

The cumulative distribution function, or the distribution function, for a random variable X is a function defined by F(x) = P(X ≤ x), where x is any real number, i.e., −∞ < x < ∞. Thus, the distribution function specifies, for all real values x, the probability that the random variable is less than or equal to x. It has the following properties:

1. 0 ≤ F(x) ≤ 1 for all x in ℝ

2. F(x) is non-decreasing [i.e., F(x) ≤ F(y) if x ≤ y].

3. F(x) is continuous from the right [i.e., lim_{h→0⁺} F(x + h) = F(x) for all x]

4. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1

If X is a discrete random variable, the function given by

F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t), for all x in ℝ,

where the sum runs over the values t of X with t ≤ x and f(t) is the value of the probability distribution (p.m.f.) of X at t, is called the distribution function, or the cumulative distribution function, of X. If X takes on only a finite number of values x1 < x2 < . . . < xn, then the distribution function is given by:

F(x) = 0,                            −∞ < x < x1
     = f(x1),                        x1 ≤ x < x2
     = f(x1) + f(x2),                x2 ≤ x < x3
       . . .
     = f(x1) + . . . + f(xn) = 1,    xn ≤ x < ∞

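Example 10: Find the cumulative distribution function of the random variable X, given that f(0) = 1/16, f(1) = 1/4, f(2) = 3/8, f(3) = 1/4, and f(4) = 1/16.

Solution: Adding up the probabilities of all values not exceeding x gives

F(x) = 0,        x < 0
     = 1/16,     0 ≤ x < 1
     = 5/16,     1 ≤ x < 2
     = 11/16,    2 ≤ x < 3
     = 15/16,    3 ≤ x < 4
     = 1,        x ≥ 4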
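For a discrete random variable the distribution function is a running total of the pmf, so it can be built with itertools.accumulate. The sketch below uses the pmf values given in Example 10; the function name F mirrors the notation of the notes.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
pmf = [Fraction(1, 16), Fraction(1, 4), Fraction(3, 8), Fraction(1, 4), Fraction(1, 16)]

# Running totals f(x1), f(x1) + f(x2), ..., one per jump point of F.
cdf_steps = list(accumulate(pmf))


def F(x):
    """F(x) = P(X <= x): the running total at the largest value not exceeding x."""
    result = Fraction(0)
    for value, cumulative in zip(values, cdf_steps):
        if x >= value:
            result = cumulative
    return result


print(cdf_steps)
print(F(-1), F(2.5), F(4))
```

Between jump points F stays constant, which is why F(2.5) equals F(2).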

5.4.2.Distribution Functions of Continuous Random Variables

If X is a continuous random variable and the value of its probability density at t is f(t), then the function given by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt, −∞ < x < ∞,

is called the distribution function, or the cumulative distribution function, of the continuous r.v. X.

Theorem: If f(x) and F(x) are the values of the probability density and the distribution function of X at x, then

P(a ≤ X ≤ b) = F(b) − F(a)

for any real constants a and b with a ≤ b, and

f(x) = dF(x)/dx

where the derivative exists.

Example 11: (a) Find the constant c such that the function f(x) is the density function of a r.v. X, where f(x) is given by

f(x) = cx², 0 < x < 3; 0, otherwise.

(b) Compute P(1 < X < 2).

Solution:

a) ∫_{−∞}^{∞} f(x) dx = ∫_0^3 cx² dx = c · x³/3 |_0^3 = 27c/3 = 9c = 1, so c = 1/9.

b) P(1 < X < 2) = ∫_1^2 f(x) dx = ∫_1^2 cx² dx = c · x³/3 |_1^2 = (1/27)(8 − 1) = 7/27.

6. Functions of Random Variables

In standard statistical methods, the result of statistical hypothesis testing, estimation, or even statistical

graphics does not involve a single random variable but, rather, functions of one or more random variables.

As a result, statistical inference requires the distributions of these functions. In many situations in statistics, it is therefore necessary to derive the probability distribution of a function of one or more random variables.

Let X be a random variable defined on a sample space S, and let Y be a function of X; then Y is also a random variable. Let Rx and Ry denote the range spaces of X and Y, that is, the sets of values that X and Y can take. Let C ⊆ Ry and let B ⊆ Rx be defined as B = {x ∈ Rx: Y(x) ∈ C}; then the events B and C are called equivalent events. More generally, if B and C are two events defined on different sample spaces, saying that they are equivalent means that one occurs if and only if the other one occurs. Similarly, let E be an experiment with sample space S, let X be a random variable defined on S, and let Rx be its range space. Let B be an event with respect to Rx, that is, B ⊆ Rx, and suppose that A is defined as A = {s ∈ S: X(s) ∈ B}; then we say that A and B are equivalent events.

Example 1: In tossing two coins the sample space S = {HH, HT, TH, TT}. Let the random variable X =

Number of heads, Rx = {0, 1, 2}. Let B ⊆ Rx and B = {1}. Moreover X (HT) = X (TH) = 1. If A =

{HT, TH} then A and B are equivalent events.

Example 2: Let X be the discrete random variable giving the score of a die and let Y = X²; then Y is a discrete random variable, as X is discrete. The range space of X is Rx = {1, 2, 3, 4, 5, 6} and the range space of Y is Ry = {1, 4, 9, 16, 25, 36}. Now,

{Y =4} is equivalent to {X=2}

{Y < 9} is equivalent to {X <3}

{Y ≤25} is equivalent to {X ≤5}etc.

Let B be an event in the range space Rx of the random variable X. We define P(B) as P(B) = P(A), where A = {s ∈ S: X(s) ∈ B}. From this definition, we see that if two events are equivalent then their probabilities are equal.

6.2. Functions of discrete random variables

If X is a discrete random variable and Y is a function of X, then it follows immediately that Y is also discrete; a function of a continuous random variable is often, though not always, continuous. Suppose that X is a discrete random variable with probability distribution p(x). Let Y = g(X) define a one-to-one transformation between the values of X and Y so that the equation y = g(x) can be uniquely solved for x in terms of y, say x = w(y). Then the probability distribution of Y is p(y) = p[w(y)].

Example: Let X be a random variable with probability distribution p(x) = (3/4)(1/4)^(x−1), x = 1, 2, 3, . . . ; find the probability distribution of the random variable Y = X².

Solution: Since the values of X are all positive, the transformation defines a one-to-one correspondence between the x and y values, y = x² and x = √y. Hence p(y) = p(√y) = (3/4)(1/4)^(√y−1), y = 1, 4, 9, . . . , and 0 elsewhere.

Example: If X is the number of heads obtained in four tosses of a balanced coin, find the probability distribution of H(X) = 1/(1 + X).

Solution: The sample space is S = {HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, HTTH, TTHH, THTH, THHT, HTTT, TTTH, TTHT, THTT, TTTT}, so the probability distribution of X is

x       0       1       2       3       4
p(x)    1/16    4/16    6/16    4/16    1/16

Then, using the relation y = 1/(1 + x) to substitute values of Y for values of X, we find the probability distribution of Y:

y       1       1/2     1/3     1/4     1/5
p(y)    1/16    4/16    6/16    4/16    1/16
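Transforming a discrete distribution amounts to moving each probability from x to g(x) and adding up whatever lands on the same value. The sketch below applies this to the coin example; the helper name transform is hypothetical, and the second transformation, (x − 2)², is an extra illustration not taken from the notes, included to show a mapping that is not one-to-one.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

# pmf of X = number of heads in four tosses of a balanced coin.
p_x = {x: Fraction(comb(4, x), 16) for x in range(5)}


def transform(pmf, g):
    """pmf of Y = g(X): each probability p(x) is moved to y = g(x);
    probabilities landing on the same y are added together."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)


# One-to-one case from the example: Y = 1 / (1 + X).
p_y = transform(p_x, lambda x: Fraction(1, 1 + x))

# A mapping that is not one-to-one (illustration only): Z = (X - 2)^2.
p_z = transform(p_x, lambda x: (x - 2) ** 2)

print(p_y)
print(p_z)
```

In the one-to-one case the probabilities are simply relabelled, matching p(y) = p[w(y)]; in the second case several x values share one z value, so their probabilities accumulate.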

A straightforward method of obtaining the probability density function of a function of a continuous random variable consists of first finding its distribution function and then obtaining the probability density by differentiation. Thus, if X is a continuous random variable with probability density f(x), then the probability density of Y = H(X) is obtained by first determining an expression for the probability

G(y) = P(Y ≤ y) = P(H(X) ≤ y)

and then differentiating:

g(y) = dG(y)/dy

To find the probability distribution of the random variable Y = u(X) when X is a continuous random variable

and the transformation is one-to-one, we shall need the following definition.

Suppose that X is a continuous random variable with probability distribution f(x). Let Y = g(X) define a one-

to-one correspondence between the values of X and Y so that the equation y = g(x) can be uniquely solved for

x in terms of y, say x = w(y). Then the probability distribution of Y is f(y) = f[w(y)]|J|, where J = w’(y) and is

called the Jacobian of the transformation.

Remarks

Suppose that X1 and X2 are discrete random variables with joint probability distribution p(x1, x2). Let Y1 = g1(X1, X2) and Y2 = g2(X1, X2) define a one-to-one transformation between the points (x1, x2) and (y1, y2) so that the equations y1 = g1(x1, x2) and y2 = g2(x1, x2) may be uniquely solved for x1 and x2 in terms of y1 and y2, say x1 = w1(y1, y2) and x2 = w2(y1, y2). Then the joint probability distribution of Y1 and Y2 is g(y1, y2) = p[w1(y1, y2), w2(y1, y2)].

To find the joint probability distribution of the random variables Y1 = g1(X1, X2) and Y2 = g2(X1, X2) when X1 and X2 are continuous and the transformation is one-to-one, we need an additional definition as follows:

Suppose that X1 and X2 are continuous random variables with joint probability distribution f(x1, x2). Let Y1 = g1(X1, X2) and Y2 = g2(X1, X2) define a one-to-one transformation between the points (x1, x2) and (y1, y2) so that the equations y1 = g1(x1, x2) and y2 = g2(x1, x2) may be uniquely solved for x1 and x2 in terms of y1 and y2, say x1 = w1(y1, y2) and x2 = w2(y1, y2). Then the joint probability distribution of Y1 and Y2 is g(y1, y2) = f[w1(y1, y2), w2(y1, y2)]|J|, where the Jacobian is the 2 × 2 determinant

J = | ∂x1/∂y1   ∂x1/∂y2 |
    | ∂x2/∂y1   ∂x2/∂y2 |

and ∂x1/∂y1 is simply the derivative of x1 = w1(y1, y2) with respect to y1 holding y2 constant, that is, the partial derivative of x1 with respect to y1. The other partial derivatives are defined in a similar manner.

Example: Let X be a continuous random variable with probability distribution

f(x) = x/12 for 1 < x < 5; 0 elsewhere.

Find the probability distribution of the random variable Y = 2X − 3.

Solution: The inverse solution of y = 2x − 3 yields x = (y + 3)/2, from which we obtain J = w'(y) = dx/dy = 1/2. Therefore, we find the density function of Y to be

f(y) = [(y + 3)/24] × (1/2) = (y + 3)/48, −1 < y < 7, and 0 elsewhere.
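The change-of-variable formula f(y) = f[w(y)]|J| can be checked numerically for the example Y = 2X − 3. In the sketch below, the midpoint-rule helper integrate and the names f_x, w and g_y are assumptions of this illustration; the check is that the transformed density integrates to 1 and that P(Y ≤ 3) agrees with P(X ≤ 3), since y = 3 corresponds to x = 3.

```python
def f_x(x):
    """Density of X from the example: x/12 on (1, 5), zero elsewhere."""
    return x / 12 if 1 < x < 5 else 0.0


def w(y):
    """Inverse transformation: x = (y + 3) / 2."""
    return (y + 3) / 2


J = 0.5  # dx/dy for the inverse transformation


def g_y(y):
    """Density of Y = 2X - 3 via the change-of-variable formula."""
    return f_x(w(y)) * abs(J)


def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h


total_area = integrate(g_y, -1, 7)   # the density of Y should integrate to 1
p_y_le_3 = integrate(g_y, -1, 3)     # P(Y <= 3)
p_x_le_3 = integrate(f_x, 1, 3)      # P(X <= 3), the equivalent event

print(round(total_area, 6), round(p_y_le_3, 6), round(p_x_le_3, 6))
```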

Example: Let X1 and X2 be two continuous random variables with joint probability distribution f(x1, x2) = 4x1x2, 0 < x1 < 1, 0 < x2 < 1, and 0 elsewhere. Find the joint probability distribution of Y1 = X1² and Y2 = X1X2.

Solution: The inverse solutions of y1 = x1² and y2 = x1x2 are x1 = √y1 and x2 = y2/√y1, from which we obtain J = 1/(2y1).

Finally, from the above definition, the joint probability distribution of Y1 and Y2 is g(y1, y2) = 2y2/y1, y2² < y1 < 1, 0 < y2 < 1, and 0 elsewhere.

7. Two or More Dimension Random Variables

7.1. Definitions of Two-dimensional Random Variables

We are often interested simultaneously in two outcomes rather than one. Then with each one of these

outcomes a random variable is associated, thus we are furnished with two random variables or a 2-

dimensional random vector denoted by (X, Y).

o Let (X, Y) be a two-dimensional random variable. (X, Y) is called a two-dimensional discrete random variable if the possible values of (X, Y) are finite or countably infinite in number. That is, the possible values of (X, Y) may be represented as (xi, yj), i = 1, 2, …, n, … and j = 1, 2, …, m, ….

o Let (X, Y) be a two-dimensional random variable. (X, Y) is called a two-dimensional continuous random variable if (X, Y) can assume all values in some uncountable set of Euclidean space. That is, (X, Y) can assume values in a rectangle {(x, y): a ≤ x ≤ b and c ≤ y ≤ d} or in a circle {(x, y): x² + y² ≤ 1}, etc.

If X and Y are two random variables, the probability distribution for their simultaneous occurrence can be

represented by a function with values p(x, y) for any pair of values (x, y) within the range of the random

variables X and Y. It is customary to refer to this function as the joint probability distribution of X and Y.

o Let (X, Y) be a two-dimensional discrete random variable whose possible values may be represented as (xi, yj), i = 1, 2, …, n, … and j = 1, 2, …, m, …. In the discrete case, p(x, y) = P(X = x, Y = y); that is, the values p(x, y) give the probability that outcomes x and y occur at the same time. The function p(x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if:

1. p(xi, yj) ≥ 0 for all (xi, yj)

2. $\sum_{x} \sum_{y} p(x, y) = 1$

Example: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens, and 3

green pens. If X is the number of blue pens selected and Y is the number of red pens selected, then find the

joint probability mass function p(x, y) and verify that it is pmf.

Solution: The possible pairs of values (x, y) are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0). Now, p(0, 1), for example, represents the probability that a red and a green pen are selected. The total number of equally likely ways of selecting any 2 pens from the 8 is $\binom{8}{2} = 28$. The number of ways of selecting 1 red from 2 red pens and 1 green from 3 green pens is $\binom{2}{1}\binom{3}{1} = 6$. Hence, p(0, 1) = 6/28 = 3/14. Similar calculations yield the probabilities for the other cases, which are presented in the following table.

Joint Probability Distribution

            x = 0    x = 1    x = 2    py(y)
y = 0       3/28     9/28     3/28     15/28
y = 1       3/14     3/14     0        3/7
y = 2       1/28     0        0        1/28
px(x)       5/14     15/28    3/28     1

All probabilities are nonnegative and sum to 1, which shows that p(x, y) is a probability mass function. Note that the joint probability mass function of the above table can be represented by the formula

$$p(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{\binom{8}{2}},$$

for x = 0, 1, 2; y = 0, 1, 2; and 0 ≤ x + y ≤ 2.
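The counting formula for the pens example can be evaluated exactly with rational arithmetic. The sketch below is only an illustration: the function name pens_pmf and the dictionary layout are assumptions, not part of the notes.

```python
from fractions import Fraction
from math import comb


def pens_pmf(x, y):
    """p(x, y): x blue and y red pens among 2 drawn from 3 blue, 2 red, 3 green."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    ways = comb(3, x) * comb(2, y) * comb(3, 2 - x - y)
    return Fraction(ways, comb(8, 2))


# Tabulate the joint pmf on the grid x, y = 0, 1, 2.
table = {(x, y): pens_pmf(x, y) for x in range(3) for y in range(3)}
total = sum(table.values())

for (x, y), p in sorted(table.items()):
    print(f"p({x}, {y}) = {p}")
print("sum =", total)
```

Using Fraction keeps every entry as an exact ratio, so the check that the probabilities sum to 1 involves no rounding.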

Example: Consider two discrete random variables, X and Y, where x = 1 or x = 2, and y = 0 or y = 1. The bivariate probability mass function for X and Y is defined as p(x, y) = (0.25 + x − y)/5. Tabulate the joint probability function and verify that the properties of a discrete joint probability mass function are satisfied.

Solution: Since X takes on two values (1 or 2) and Y takes on two values (0 or 1), there are 2 × 2 = 4 possible combinations of X and Y. These four (x, y) pairs are (1, 0), (1, 1), (2, 0), and (2, 1). Substituting these values into the formula for p(x, y), we obtain the following joint probabilities.

            x = 1    x = 2
y = 0       0.25     0.45
y = 1       0.05     0.25

All values are nonnegative and the probabilities sum to 1, which shows that p(x, y) is a probability mass function.

o Let (X, Y) be a two-dimensional continuous random variable assuming all values in some region R of the Euclidean plane; that is, (X, Y) can assume values in a rectangle {(x, y): a ≤ x ≤ b and c ≤ y ≤ d} or in a circle {(x, y): x² + y² ≤ 1}, etc. Then the function f(x, y) is a joint density function of the continuous random variables X and Y if:

1) f(x, y) ≥ 0 for all (x, y) ∈ R, and

2) $\iint_R f(x, y)\,dx\,dy = 1$

Example: The joint probability density function of two continuous random variables X and Y is given by f(x, y) = c(2x + y), where x and y can assume all values such that 0 ≤ x ≤ 2, 0 ≤ y ≤ 3, and f(x, y) = 0 otherwise.

a) Find the value of the constant c.

b) Find P(X ≤ 2, Y ≤ 1).

Solution: (a) The density must integrate to 1:

$$\int_0^3 \int_0^2 c(2x + y)\,dx\,dy = c\int_0^3 \left[x^2 + yx\right]_0^2 dy = c\int_0^3 (4 + 2y)\,dy = c\left[4y + y^2\right]_0^3 = 21c = 1,$$

so c = 1/21.

(b) $$P(X \le 2, Y \le 1) = \frac{1}{21}\int_0^1 \int_0^2 (2x + y)\,dx\,dy = \frac{1}{21}\int_0^1 \left[x^2 + yx\right]_0^2 dy = \frac{1}{21}\int_0^1 (4 + 2y)\,dy = \frac{1}{21}\left[4y + y^2\right]_0^1 = \frac{1}{21}(4 + 1) = \frac{5}{21}.$$
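The two integrals in this example can be checked numerically. The following sketch uses a plain midpoint rule on a grid; the function names and the grid size n are assumptions made for illustration, and any other quadrature would serve equally well.

```python
def double_midpoint(func, x_range, y_range, n=200):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    (ax, bx), (ay, by) = x_range, y_range
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += func(x, y)
    return total * hx * hy


def unnormalized(x, y):
    """The density of the example without its constant c."""
    return 2 * x + y


# (a) c is the reciprocal of the integral over 0 <= x <= 2, 0 <= y <= 3.
c = 1 / double_midpoint(unnormalized, (0, 2), (0, 3))

# (b) P(X <= 2, Y <= 1) integrates the normalized density over 0 <= y <= 1.
prob = c * double_midpoint(unnormalized, (0, 2), (0, 1))

print(round(c, 6), round(prob, 6))
```

Because the integrand is linear in x and y, the midpoint rule reproduces the integrals up to floating-point rounding.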

A function closely related to the probability distribution is the cumulative distribution function (CDF). If (X, Y) is a two-dimensional random variable, the cumulative distribution function is defined as follows.

Let (X, Y) be a two-dimensional discrete random variable. Then the joint distribution function, or joint cumulative distribution function (CDF), of (X, Y) is defined by

$$F(x, y) = P(X \le x, Y \le y) = \sum_{s \le x} \sum_{t \le y} p(s, t)$$

for −∞ < x < ∞ and −∞ < y < ∞, where p(s, t) is the joint probability mass function of (X, Y) at (s, t).

Let (X, Y) be a two-dimensional continuous random variable. Then the joint distribution function, or joint cumulative distribution function (CDF), of (X, Y) is defined by

$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt$$

for −∞ < x < ∞ and −∞ < y < ∞, where f(s, t) is the joint probability density function of (X, Y) at (s, t).

Remark:

If F(x, y) is the joint cumulative distribution function of a two-dimensional continuous random variable (X, Y) with joint p.d.f. f(x, y), then $f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$ wherever the derivative exists.

Marginal Probability Distributions

With a two-dimensional random variable (X, Y) we associate two one-dimensional random variables, X and Y.

Sometimes we may be interested in the probability distribution of X or Y alone. Given the joint probability

distribution p(x, y) of the discrete random variables X and Y, the probability distribution px(x) of X alone is

obtained by summing p(x, y) over the values of Y. Similarly, the probability distribution py(y) of Y alone is

obtained by summing p(x, y) over the values of X. We define px(x) and py(y) to be the marginal

distributions of X and Y, respectively. When X and Y are continuous random variables, summations are

replaced by integrals.

If X and Y are discrete random variables and p(x, y) is the value of their joint probability mass function at (x, y), the function given by $p_x(x) = \sum_{y} p(x, y)$ for each x within the range of X is called the marginal distribution of X. Similarly, the function given by $p_y(y) = \sum_{x} p(x, y)$ for each y within the range of Y is called the marginal distribution of Y.

The term marginal is used here because, in the discrete case, the values of px(x) and py(y) are just the marginal totals of the respective columns and rows when the values of p(x, y) are displayed in a rectangular table.

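Example: Consider two discrete random variables, X and Y, with the following joint probability mass function. Find the marginal distributions of X and Y.

            x = 1    x = 2
y = 0       0.25     0.45
y = 1       0.05     0.25

Solution: Summing each column gives px(x), and summing each row gives py(y).

x        1      2      Total
px(x)    0.3    0.7    1

y        0      1      Total
py(y)    0.7    0.3    1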
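The row and column sums for this 2 × 2 table can be reproduced in a few lines. The sketch below stores the joint pmf as a dictionary keyed by (x, y); this layout and the function name marginals are illustrative choices rather than anything prescribed by the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of the example, keyed by (x, y); exact fractions avoid rounding.
joint = {
    (1, 0): Fraction("0.25"), (2, 0): Fraction("0.45"),
    (1, 1): Fraction("0.05"), (2, 1): Fraction("0.25"),
}


def marginals(joint_pmf):
    """Return (px, py) obtained by summing the joint pmf over the other variable."""
    px, py = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        px[x] += p   # sum over y for fixed x
        py[y] += p   # sum over x for fixed y
    return dict(px), dict(py)


px, py = marginals(joint)
print({x: float(p) for x, p in px.items()})
print({y: float(p) for y, p in py.items()})
```

The same function applies unchanged to larger tables such as the ballpoint pens example.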

Example: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens, and 3 green pens. Let X be the number of blue pens selected and Y the number of red pens selected, with the joint probability mass function p(x, y) shown below. Verify that the column and row totals are the marginal distributions of X and Y, respectively.

            x = 0    x = 1    x = 2
y = 0       3/28     9/28     3/28
y = 1       3/14     3/14     0
y = 2       1/28     0        0

Solution: The column totals give px(x) and the row totals give py(y).

x        0       1        2       Total
px(x)    5/14    15/28    3/28    1

y        0        1      2       Total
py(y)    15/28    3/7    1/28    1

If X and Y are continuous random variables and f(x, y) is the value of their joint probability density function at (x, y), the function given by $f_x(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ for −∞ < x < ∞ is called the marginal distribution of X. Similarly, the function given by $f_y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$ for −∞ < y < ∞ is called the marginal distribution of Y.

Remark

The fact that the marginal distributions px(x) and py(y) are indeed the probability distributions of the

individual variables X and Y alone can be verified by showing that the conditions of probability distributions

stated in the one-dimensional case are satisfied.

In the one-dimensional random variable case, we stated that the value x of the random variable X represents an event that is a subset of the sample space. If we use the definition of conditional probability as stated in the previous chapter, P(B | A) = P(A ∩ B)/P(A), provided P(A) > 0, where A and B are now the events defined by X = x and Y = y, respectively, then

P(Y = y | X = x) = P(X = x, Y = y)/P(X = x) = p(x, y)/px(x), provided px(x) > 0,

where X and Y are discrete random variables. It is clear that the function p(x, y)/px(x), which is strictly a function of y with x fixed, satisfies all the conditions of a probability distribution. This is also true when f(x, y) and fx(x) are the joint probability density function and marginal distribution, respectively, of continuous random variables. As a result, it is extremely important that we make use of the special type of distribution of the form f(x, y)/fx(x) in order to be able to effectively compute conditional probabilities. This type of distribution is called a conditional probability distribution; the formal definitions are given as follows.

o The probability of the numerical event X = x, given that the event Y = y has occurred, is the conditional probability of X = x given Y = y. A table, graph or formula that gives these probabilities for all values of Y is called the conditional probability distribution of X given Y and is denoted by the symbol p(x | y).

Therefore, let X and Y be discrete random variables and let p(x, y) be their joint probability mass function. The conditional probability distribution of X given that Y = y is defined as p(x | y) = p(x, y)/py(y), provided py(y) > 0. Similarly, the conditional probability distribution of Y given that X = x is defined as p(y | x) = p(x, y)/px(x), provided px(x) > 0.

Again, let X and Y be continuous random variables and let f(x, y) be their joint probability density function. The conditional probability distribution of X given that Y = y is defined as f(x | y) = f(x, y)/fy(y), provided fy(y) > 0. Similarly, the conditional probability distribution of Y given that X = x is defined as f(y | x) = f(x, y)/fx(x), provided fx(x) > 0.

Example: The joint probability mass function of two discrete random variables X and Y is given by p(x, y) = cxy for x = 1, 2, 3 and y = 1, 2, 3, and zero otherwise. Find the conditional probability distribution of X given Y and of Y given X.

Solution: First, Σx Σy cxy = c(1·1 + 1·2 + … + 3·2 + 3·3) = 36c = 1, so c = 1/36 and p(x, y) = xy/36. The marginals are px(x) = Σy xy/36 = x/6 and py(y) = Σx xy/36 = y/6.

Therefore, p(x | y) = p(x, y)/py(y) = (xy/36)/(y/6) = x/6 for x = 1, 2, 3, and p(y | x) = p(x, y)/px(x) = (xy/36)/(x/6) = y/6 for y = 1, 2, 3.

Example: A software program is designed to perform two tasks, A and B. Let X represent the number of IF-THEN statements in the code for task A and let Y represent the number of IF-THEN statements in the code for task B. The joint probability distribution p(x, y) for the two discrete random variables is given in the accompanying table.

            x = 0    x = 1    x = 2    x = 3    x = 4    x = 5
y = 0       0.000    0.050    0.025    0.000    0.025    0.000
y = 1       0.200    0.050    0.000    0.300    0.000    0.000
y = 2       0.100    0.000    0.000    0.000    0.100    0.150

Find the conditional probability of X = 0 given Y = 1 and of Y = 2 given X = 5.

Solution: p(X = 0 | Y = 1) = p(0, 1)/py(1) = 0.2/0.55 = 4/11, where py(1) = 0.200 + 0.050 + 0.300 = 0.55. Similarly, p(Y = 2 | X = 5) = p(5, 2)/px(5) = 0.150/0.150 = 1.
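Conditional probabilities from a joint table reduce to dividing one cell by a row or column total. The sketch below does this for the IF-THEN table; it assumes the column labels are x = 0, 1, …, 5 (reading the garbled header entry "10" as 1), and the function names are hypothetical.

```python
# joint[y][x] = p(x, y); rows are y = 0, 1, 2 and columns are x = 0..5.
# Assumption: the second column label of the printed table is x = 1.
joint = [
    [0.000, 0.050, 0.025, 0.000, 0.025, 0.000],
    [0.200, 0.050, 0.000, 0.300, 0.000, 0.000],
    [0.100, 0.000, 0.000, 0.000, 0.100, 0.150],
]


def p_x_given_y(x, y):
    """p(x | y) = p(x, y) / py(y), defined only when py(y) > 0."""
    p_y = sum(joint[y])
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return joint[y][x] / p_y


def p_y_given_x(y, x):
    """p(y | x) = p(x, y) / px(x), defined only when px(x) > 0."""
    p_x = sum(row[x] for row in joint)
    if p_x == 0:
        raise ValueError("conditioning event has probability zero")
    return joint[y][x] / p_x


print(p_x_given_y(0, 1))   # P(X = 0 | Y = 1)
print(p_y_given_x(2, 5))   # P(Y = 2 | X = 5)
```

The explicit zero check mirrors the proviso py(y) > 0 or px(x) > 0 in the definitions.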

Example: The joint density function for the random variables (X, Y), where X is the unit temperature change and Y is the proportion of spectrum shift that a certain atomic particle produces, is f(x, y) = 10xy², 0 < x < y < 1, and 0 elsewhere.

(a) Construct the conditional probability distribution of Y given X.

(b) Find the probability that the spectrum shifts more than half of the total observations, given that the temperature is increased by 0.25 units.

Solution: (a) The marginal density of X is

$$f_x(x) = \int_x^1 10xy^2\,dy = \frac{10}{3}x(1 - x^3), \quad 0 < x < 1,$$

so that

$$f(y \mid x) = \frac{f(x, y)}{f_x(x)} = \frac{10xy^2}{\int_x^1 10xy^2\,dy} = \frac{3y^2}{1 - x^3}, \quad 0 < x < y < 1.$$

(b) $$P(Y > 1/2 \mid X = 1/4) = \int_{1/2}^{1} f(y \mid x = 1/4)\,dy = \int_{1/2}^{1} \frac{3y^2}{1 - (1/4)^3}\,dy = \frac{8}{9}.$$

If the conditional probability distribution of X given Y does not depend on y, then the joint probability distribution of X and Y becomes the product of the marginal distributions of X and Y. It should make sense to the reader that if the conditional probability distribution of X given Y does not depend on y, then the outcome of the random variable Y has no impact on the outcome of the random variable X. In other words, we say that X and Y are independent random variables. We now offer the following formal definition of statistical independence.

o Let X and Y be two discrete random variables with joint probability mass function p(x, y) and marginal distributions px(x) and py(y), respectively. The random variables X and Y are said to be statistically independent if and only if p(x, y) = px(x)py(y) for all (x, y) within their range.

o Let X and Y be two continuous random variables with joint probability density function f(x, y) and

marginal distributions fx(x) and fy(y), respectively. The random variables X and Y are said to be

statistically independent if and only if f(x, y) = fx(x)fy(y), for all (x, y) within their range.

Note that, checking for statistical independence of discrete random variables requires a more thorough

investigation, since it is possible to have the product of the marginal distributions equal to the joint

probability distribution for some but not all combinations of (x, y). If you can find any point (x, y) for which

p(x, y) is defined such that p(x, y) ≠px(x)py(y), the discrete variables X and Y are not statistically independent.

Remark

If we know the joint probability distribution of X and Y, we can find the marginal probability

distributions, but if we have the marginal probability distributions, we may not have the joint

probability distribution unless X and Y are statistically independent.

Theorem:

a) Let (X, Y) be a two-dimensional discrete random variable. Then X and Y are independent if and only if p(xi | yj) = px(xi) for all i and j, or equivalently p(yj | xi) = py(yj) for all i and j.

b) Let (X, Y) be a two-dimensional continuous random variable. Then X and Y are independent if and only if f(x | y) = fx(x) for all (x, y), or equivalently f(y | x) = fy(y) for all (x, y).

Example: Let X and Y be binary random variables; that is, 0 and 1 are the only possible outcomes for each of X and Y. Suppose p(0, 0) = 0.3, p(1, 1) = 0.2, and the marginal probability mass function of X at x = 0 and x = 1 is 0.6 and 0.4, respectively. Then

(a) Construct the joint probability mass function of X and Y;

(b) Calculate the marginal probability mass function of Y.

Solution: Since px(0) = p(0, 0) + p(0, 1), we have p(0, 1) = 0.6 − 0.3 = 0.3; likewise p(1, 0) = 0.4 − 0.2 = 0.2. The joint table (a), with the marginal of Y (b) in the last column, is:

            x = 0    x = 1    py(y)
y = 0       0.3      0.2      0.5
y = 1       0.3      0.2      0.5
px(x)       0.6      0.4      1
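The definition of independence for discrete variables can be tested cell by cell: every entry of the joint table must equal the product of its row and column totals. The sketch below applies this test to the binary example and, for contrast, to the earlier 2 × 2 table with entries 0.25, 0.45, 0.05, 0.25. The function name is_independent and the dictionary layout keyed by (x, y) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint_pmf):
    """True if p(x, y) == px(x) * py(y) for every pair (x, y) on the grid."""
    xs = sorted({x for x, _ in joint_pmf})
    ys = sorted({y for _, y in joint_pmf})
    px = {x: sum(joint_pmf.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint_pmf.get((x, y), 0) for x in xs) for y in ys}
    return all(joint_pmf.get((x, y), 0) == px[x] * py[y]
               for x, y in product(xs, ys))


# Binary example: keys are (x, y).
binary = {
    (0, 0): Fraction("0.3"), (1, 0): Fraction("0.2"),
    (0, 1): Fraction("0.3"), (1, 1): Fraction("0.2"),
}

# Earlier 2 x 2 example with x in {1, 2} and y in {0, 1}.
earlier = {
    (1, 0): Fraction("0.25"), (2, 0): Fraction("0.45"),
    (1, 1): Fraction("0.05"), (2, 1): Fraction("0.25"),
}

print(is_independent(binary))
print(is_independent(earlier))
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding; with floats a tolerance would be needed instead.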

Example: Let X and Y be the life lengths of two electronic devices. Suppose that their joint p.d.f. is given by $f(x, y) = e^{-(x + y)}$ for x ≥ 0 and y ≥ 0, and 0 elsewhere. Are these two random variables independent?

Solution: If X and Y are independent, then the product of their marginal distributions should equal the joint p.d.f. The marginals are $f_x(x) = \int_0^\infty e^{-(x+y)}\,dy = e^{-x}$ for x ≥ 0 and, by symmetry, $f_y(y) = e^{-y}$ for y ≥ 0.

Now $f_x(x) f_y(y) = e^{-x} e^{-y} = e^{-(x+y)} = f(x, y)$ for x ≥ 0, y ≥ 0, which implies that X and Y are statistically independent.

8. Expectation

8.1. Expectation of a Random Variable

The data we analyze in engineering and the sciences often results from observing a process. Consequently,

we can describe process data with numerical descriptive measures, such as its mean and variance. Therefore,

the expectation of X is very often called the mean of X and is denoted by E(X). The mean, or expectation, of

the random variable X gives a single value that acts as a representative or average of the values of X, and for

this reason it is often called a measure of central tendency.

Let X be a discrete random variable which takes values x1, …, xn with corresponding probabilities P(X = xi) = p(xi), i = 1, …, n. Then the expectation of X (or mathematical expectation or mean value of X) is denoted by E(X) and is defined as:

$$E(X) = x_1 p(x_1) + \dots + x_n p(x_n) = \sum_{i=1}^{n} x_i\,p(x_i) = \sum_{x} x\,p(x)$$

Example: A school class of 120 students is driven in 3 buses to a symphonic performance. There are 36

students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the

120 students is randomly chosen. Let X denote the number of students on the bus of that randomly

chosen student, and find E[X].

Solution: Since the randomly chosen student is equally likely to be any of the 120 students, it follows that:

P{X = 36} = 36/120 = 3/10, P{X = 40} = 40/120 = 1/3, P{X = 44} = 44/120 = 11/30.

Hence E(X) = 36 × 3/10 + 40 × 1/3 + 44 × 11/30 = 1208/30 = 40.2667.

Example: Let a fair die be rolled once. Find the mean number rolled, say X.

Solution: Since S = {1, 2, 3, 4, 5, 6} and all outcomes are equally likely with probability 1/6, we have

E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 21/6 = 3.5.

Example: A lot of 12 TV sets includes two that are defective. If two of the sets are chosen at random, find the expected number of defective sets.

Solution: Let X be the number of defective sets chosen. Then the possible values of X are 0, 1, 2. Using the conditional probability rule, we get

P(X = 0) = P(both non-defective) = (10/12)(9/11) = 15/22,

P(X = 2) = P(both defective) = (2/12)(1/11) = 1/66,

P(X = 1) = P(first defective and second good) + P(first good and second defective) = (2/12)(10/11) + (10/12)(2/11) = 10/66 + 10/66 = 10/33.

$$E(X) = \sum_{i=0}^{2} x_i\,P(X = x_i) = 0 \times \frac{15}{22} + 1 \times \frac{10}{33} + 2 \times \frac{1}{66} = \frac{1}{3}.$$

The mathematical expectation of a continuous r-v is defined in a similar way to that of a discrete r-v, with the exception that summations are replaced by integrations over the specified domains. Let the random variable X be continuous with p.d.f. f(x). Its expectation is defined by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

provided this integral exists.

Example: The density function of a random variable X is given by f(x) = x/2 for 0 < x < 2, and f(x) = 0 otherwise. Find E(X).

Solution: $E(X) = \int_0^2 x f(x)\,dx = \int_0^2 \tfrac{1}{2}x^2\,dx = \left[\tfrac{1}{6}x^3\right]_0^2 = \tfrac{4}{3}.$

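Example: Find the expected value of the random variable X with the CDF F(x) = x³, 0 < x < 1.

Solution: The density is the derivative of the CDF, f(x) = F′(x) = 3x² for 0 < x < 1. Hence $E(X) = \int_0^1 x f(x)\,dx = \int_0^1 3x^3\,dx = \left[\tfrac{3}{4}x^4\right]_0^1 = \tfrac{3}{4}.$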

The statistics that we will subsequently use for making inferences are computed from the data contained in a sample. The sample measurements can be viewed as observations on n random variables, x1, x2, x3, …, xn. Since the sample statistics are functions of the random variables x1, x2, x3, …, xn, they also will be random variables and will possess probability distributions. To describe these distributions, we will define the expected value (or mean) of functions of random variables.

Now let us consider a new random variable g(X), which depends on X; that is, each value of g(X) is determined by the value of X. In particular, let X be a discrete random variable with probability function p(x). Then Y = g(X) is also a discrete random variable, and the probability function of Y is

$$p(y) = P(Y = y) = \sum_{\{x:\, g(x) = y\}} P(X = x) = \sum_{\{x:\, g(x) = y\}} p(x),$$

and hence we can define the expectation of functions of random variables as:

(a) If X is a discrete random variable and p(xi) = P(X = xi) is the p.m.f., we will have

$$E(Y) = E(g(X)) = \sum_{i} g(x_i)\,p(x_i)$$

(b) If X is a continuous random variable with p.d.f. f(x), we will have

$$E(Y) = E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

The reader should note that the way to calculate the expected value, or mean, shown here is different from the way to calculate the sample mean described in Introduction to Statistics, where the sample mean is obtained by using data. Here, for a random variable, the expected value is calculated by using the probability distribution. However, the mean is usually understood as a “center” value of the underlying distribution if we use the expected value.

Example: Suppose that a balanced die is rolled once. If X is the number that shows up, find the expected value of g(X) = 2X² + 1.

Solution: Since each possible outcome has the probability 1/6, we get

$$E(g(X)) = \sum_{x=1}^{6} (2x^2 + 1)\cdot\frac{1}{6} = (2\cdot 1^2 + 1)\cdot\frac{1}{6} + \dots + (2\cdot 6^2 + 1)\cdot\frac{1}{6} = \frac{94}{3}.$$
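The discrete formulas for E(X) and E(g(X)) are the same weighted sum, so one small function covers both. The sketch below is illustrative only: the name expectation and the dictionary representation of the p.m.f. are assumptions, not notation from the notes.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_face = expectation(die)                        # E(X)
mean_g = expectation(die, lambda x: 2 * x**2 + 1)   # E(2X^2 + 1)

print(mean_face, mean_g)
```

Passing a different function g, for example `lambda x: x**2`, gives E(X²), which is the quantity needed later for the variance.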

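Let X and Y be random variables with joint probability distribution p(x, y) [or f(x, y)] and let H = g(X, Y) be a real-valued function of (X, Y). Then the means, or expected values, of the random variables XY and g(X, Y) are:

$E[XY] = \sum_{x}\sum_{y} xy\,p(x, y)$ if X and Y are discrete random variables.

$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x, y)\,dx\,dy$ if X and Y are continuous random variables.

$E[g(X, Y)] = \sum_{x}\sum_{y} g(x, y)\,p(x, y)$ if X and Y are discrete random variables.

$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$ if X and Y are continuous random variables.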

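Example: Let X and Y have the joint density function f(x, y) = (2/7)(x + 2y) for 0 < x < 1, 1 < y < 2, and f(x, y) = 0 otherwise. Find the expected value of g(X, Y) = X/Y.

Solution:

$$E[g(X, Y)] = \iint g(x, y) f(x, y)\,dx\,dy = \frac{2}{7}\int_1^2\int_0^1 \frac{x}{y}(x + 2y)\,dx\,dy = \frac{2}{7}\int_1^2\int_0^1 \left(\frac{x^2}{y} + 2x\right)dx\,dy$$

$$= \frac{2}{7}\int_1^2 \left[\frac{x^3}{3y} + x^2\right]_0^1 dy = \frac{2}{7}\int_1^2 \left(\frac{1}{3y} + 1\right)dy = \frac{2}{7}\left\{\tfrac{1}{3}(\ln 2 - \ln 1) + 1\right\} = 0.35172$$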

Remark

In calculating E(X) over a two-dimensional space, one may use either the joint probability distribution of X and Y or the marginal distribution of X:

$E[X] = \sum_{x}\sum_{y} x\,p(x, y) = \sum_{x} x\,p_x(x)$ if X is a discrete random variable.

$E[X] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} x f_x(x)\,dx$ if X is a continuous random variable, where fx(x) is the marginal distribution of X.

$E[Y] = \sum_{x}\sum_{y} y\,p(x, y) = \sum_{y} y\,p_y(y)$ if Y is a discrete random variable.

$E[Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} y f_y(y)\,dy$ if Y is a continuous random variable, where fy(y) is the marginal distribution of Y.

Let X be a random variable. The variance of X, denoted by V(X), Var(X) or σ²x, is defined as:

$$V(X) = E\left[(X - E(X))^2\right] = E(X^2) - [E(X)]^2,$$

where $E(X^2) = \sum_{i} x_i^2 f(x_i)$ when X is discrete, with the sum replaced by an integral when X is continuous.

Note that the positive square root of V(X) is called the standard deviation of X and is denoted by σx. Unlike the variance, the standard deviation is measured in the same units as X (and E(X)) and serves as a yardstick for measuring deviations of X from E(X).

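Example: Find the expected value and the variance of the r-v with density

f(x) = x if 0 < x < 1, f(x) = 2 − x if 1 < x < 2, and f(x) = 0 elsewhere.

Solution:

$$E(X) = \int x f(x)\,dx = \int_0^1 x\cdot x\,dx + \int_1^2 x(2 - x)\,dx = \int_0^1 x^2\,dx + \int_1^2 (2x - x^2)\,dx$$

$$= \left[\frac{x^3}{3}\right]_0^1 + \left[x^2 - \frac{x^3}{3}\right]_1^2 = \frac{1}{3} + \left(4 - \frac{8}{3}\right) - \left(1 - \frac{1}{3}\right) = \frac{1}{3} + \frac{4}{3} - \frac{2}{3} = 1.$$

$$E(X^2) = \int x^2 f(x)\,dx = \int_0^1 x^2\cdot x\,dx + \int_1^2 x^2(2 - x)\,dx = \int_0^1 x^3\,dx + \int_1^2 (2x^2 - x^3)\,dx$$

$$= \left[\frac{x^4}{4}\right]_0^1 + \left[\frac{2x^3}{3} - \frac{x^4}{4}\right]_1^2 = \frac{1}{4} + \left(\frac{16}{3} - 4\right) - \left(\frac{2}{3} - \frac{1}{4}\right) = \frac{1}{4} + \frac{4}{3} - \frac{5}{12} = \frac{7}{6}.$$

$$V(X) = E(X^2) - [E(X)]^2 = \frac{7}{6} - 1^2 = \frac{1}{6}.$$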

Properties of Expectation and Variance

There are cases where our interest may not only be in the expected value of a r-v X, but also in the expected value of a r-v related to X. In general, such relations are useful to explain the properties of the mean and the variance.

o If b is a constant, then E(b) = b.

o If a and b are constants, then E(aX + b) = aE(X) + b.

o Let X and Y be any two random variables. Then E(X + Y) = E(X) + E(Y). This can be generalized to n random variables; that is, if X1, X2, X3, …, Xn are random variables, then E(X1 + X2 + X3 + … + Xn) = E(X1) + E(X2) + E(X3) + … + E(Xn).

o Let X and Y be any two random variables. If X and Y are independent, then E(XY) = E(X)E(Y).

o Let (X, Y) be a two-dimensional random variable with a joint probability distribution. Let Z = H1(X, Y) and W = H2(X, Y). Then E(Z + W) = E(Z) + E(W).

o For constant values a and b, V(aX + b) = a²V(X).

o Variance is not independent of change of scale, i.e., V(aX) = a²V(X).

o Variance is independent of change of origin, i.e., V(X + b) = V(X).

o Variance of a constant is zero, i.e., V(b) = 0.

o Let X1, X2, X3, …, Xn be n independent random variables. Then V(X1 + X2 + X3 + … + Xn) = V(X1) + V(X2) + V(X3) + … + V(Xn).

o If (X, Y) is a two-dimensional random variable and X and Y are independent, then V(X + Y) = V(X) + V(Y) and V(X − Y) = V(X) + V(Y).

Example: A continuous random variable X has probability density given by $f(x) = 2e^{-2x}$ for x > 0, and f(x) = 0 for x ≤ 0. For a constant K, find

(a) The variance of X (b) The standard deviation of X (c) Var(KX) (d) Var(K + X)

Solution: (a) $V(X) = \int_0^\infty x^2 f(x)\,dx - [E(X)]^2 = \int_0^\infty 2x^2 e^{-2x}\,dx - \left[\int_0^\infty 2x e^{-2x}\,dx\right]^2 = 2(1/2)^2 - (1/2)^2 = 1/4$

(b) SD(X) = √V(X) = √(1/4) = 1/2

(c) V(KX) = K²V(X) = K²/4

(d) V(K + X) = V(X) = 1/4

Example: Let X be a random variable with p.d.f. f (x) = 3x2, for 0 < x <1.

(a) Calculate the Var (X). (b) If the random variable Y is defined by Y = 3X − 2, calculate the Var(Y).

8.4. Chebyshev’s Inequality

Let X be a random variable with mean E(X) = μ and variance σ², and let k be any positive constant. Then the probability that X will assume a value within k standard deviations of the mean is at least 1 − 1/k². That is, P(μ − kσ < X < μ + kσ) ≥ 1 − 1/k².

Note that Chebyshev's theorem holds for any distribution of observations, and for this reason the results are usually weak. The value given by the theorem is a lower bound only. That is, we know that the probability of a random variable falling within two standard deviations of the mean can be no less than 3/4, but we never know how much more it might actually be. Only when the probability distribution is known can we determine exact probabilities. For this reason we call the theorem a distribution-free result. The use of Chebyshev's theorem is relegated to situations where the form of the distribution is unknown.

Example: A random variable X has a mean μ = 8, a variance σ² = 9, and an unknown probability distribution. Find

(a) P(−4 < X < 20),

(b) P(|X − 8| ≥ 6).

Solution: Because the distribution is unknown, a normal table cannot be used; only the bounds from Chebyshev's inequality are available, with σ = 3.

(a) P(−4 < X < 20) = P[8 − (4)(3) < X < 8 + (4)(3)] ≥ 1 − 1/4² = 15/16.

(b) P(|X − 8| ≥ 6) = 1 − P(|X − 8| < 6) = 1 − P[8 − (2)(3) < X < 8 + (2)(3)] ≤ 1 − (1 − 1/2²) = 1/4.
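Chebyshev's inequality translates directly into a one-line bound. The sketch below evaluates it for μ = 8 and σ = 3; the function name chebyshev_lower_bound is a hypothetical helper, and the results are bounds on the probabilities, not exact values.

```python
def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 on P(mu - k*sigma < X < mu + k*sigma)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k**2)


mu, sigma = 8, 3

# (a) The interval (-4, 20) is mu +/- k*sigma with k = (20 - mu) / sigma = 4.
k_a = (20 - mu) / sigma
bound_a = chebyshev_lower_bound(k_a)        # P(-4 < X < 20) is at least this

# (b) |X - 8| >= 6 is the complement of being within k = 6 / sigma = 2 deviations.
k_b = 6 / sigma
bound_b = 1 - chebyshev_lower_bound(k_b)    # P(|X - 8| >= 6) is at most this

print(bound_a, bound_b)
```

For k ≤ 1 the expression 1 − 1/k² is not positive, so the function returns 0, which reflects that the inequality gives no information in that range.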

8.5. Covariance and Correlation Coefficient

8.5.1.Covariance

The covariance between two random variables is a measure of the nature of the association between the two. If large values of X often result in large values of Y or small values of X result in small values of Y, positive X − μX will often result in positive Y − μY and negative X − μX will often result in negative Y − μY. Thus, the product (X − μX)(Y − μY) will tend to be positive. On the other hand, if large X values often result in small Y values, the product (X − μX)(Y − μY) will tend to be negative. The sign of the covariance indicates whether the relationship between two dependent random variables is positive or negative.

Cov(X, Y) = σxy = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

N.B.: When X and Y are statistically independent, it can be shown that the covariance is zero:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E[X − E(X)] E[Y − E(Y)] = 0.

Thus if X and Y are independent, they are also uncorrelated. However, the converse is not true, as illustrated by the following example.

Example: The pair of random variables (X, Y) takes the values (1, 0), (0, 1), (−1, 0), and (0, −1), each with probability 1/4.

Solution: The marginal p.m.f.'s of X and Y are symmetric around 0, and E[X] = E[Y] = 0. Furthermore, for all possible value pairs (x, y), either x or y is equal to 0, which implies that XY = 0 and E[XY] = 0. Therefore, Cov(X, Y) = E[XY] − E[X]E[Y] = 0. Yet X and Y are not independent: for example, P(X = 1, Y = 0) = 1/4, while P(X = 1)P(Y = 0) = (1/4)(1/2) = 1/8.
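The four-point example can be verified mechanically: the covariance comes out as zero, while the product rule for independence fails at the point (1, 0). The variable names in this sketch are illustrative.

```python
from fractions import Fraction

# The four equally likely points of the example, keyed by (x, y).
points = [(1, 0), (0, 1), (-1, 0), (0, -1)]
pmf = {pt: Fraction(1, 4) for pt in points}

e_x = sum(x * p for (x, _), p in pmf.items())
e_y = sum(y * p for (_, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - e_x * e_y

# Independence would require p(1, 0) == P(X = 1) * P(Y = 0).
p_x1 = sum(p for (x, _), p in pmf.items() if x == 1)
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)
dependent = pmf[(1, 0)] != p_x1 * p_y0

print("Cov =", cov, "| dependent:", dependent)
```

A single failing cell is enough to rule out independence, so the check stops at one point instead of scanning the whole grid.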

Properties of covariance

o Cov(X, Y) = Cov (Y, X)

o Cov (X, X) = Var(X)

o Cov(KX, Y) = K Cov(X, Y) for a constant K

o Var (X ± Y) = Var (X) + Var (Y) ± 2 Cov (X, Y)


8.5.2. Correlation Coefficient

Although the covariance between two random variables does provide information regarding the nature of the

relationship, the magnitude of σXY does not indicate anything regarding the strength of the relationship, since

σXY is not scale-free. Its magnitude will depend on the units used to measure both X and Y. There is a scale-

free version of the covariance called the correlation coefficient that is used widely in statistics.

Let X and Y be random variables with covariance Cov(X, Y) and standard deviations σx and σy, respectively. The correlation coefficient (or coefficient of correlation) ρxy of two random variables X and Y that have nonzero variances is defined as:

$$\rho_{xy} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{E(XY) - E(X)E(Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{E\{[X - E(X)][Y - E(Y)]\}}{\sqrt{V(X)V(Y)}}.$$

It should be clear to the reader that ρxy is free of the units of X and Y. The correlation coefficient satisfies the inequality −1 ≤ ρxy ≤ 1 and it assumes a value of zero when σxy = 0.

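Example: Let X and Y be random variables having joint probability density function f(x, y) = x + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and f(x, y) = 0 elsewhere. Find Cor(X, Y).

Solution: By symmetry, E(X) = E(Y) = 7/12 and E(X²) = E(Y²) = 5/12, so Var(X) = Var(Y) = 5/12 − (7/12)² = 11/144. Also E(XY) = 1/3. Hence

$$\operatorname{Cor}(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{1/3 - (7/12)(7/12)}{\sqrt{(11/144)(11/144)}} = \frac{-1/144}{11/144} = -\frac{1}{11} \approx -0.0909.$$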
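Since a correlation coefficient must lie between −1 and 1, a numerical cross-check of this example is worthwhile. The sketch below approximates the required moments of f(x, y) = x + y on the unit square with a midpoint rule; the helper name expect and the grid size n = 200 are assumptions, and the result is an approximation rather than an exact value.

```python
from math import sqrt


def expect(g, n=200):
    """Midpoint-rule approximation of E[g(X, Y)] for f(x, y) = x + y on [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * (x + y)   # integrand g(x, y) * f(x, y)
    return total * h * h


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2
cov = expect(lambda x, y: x * y) - e_x * e_y

rho = cov / sqrt(var_x * var_y)
print(round(rho, 3))
```

The approximation should agree with the exact ratio (E(XY) − E(X)E(Y)) / √(Var(X)Var(Y)) to within the quadrature error, which shrinks as n grows.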

9. Common Probability distributions

9.1. Common Discrete Distributions and their Properties

9.1.1. Binomial distribution

In this sub-unit, we shall study one of the most popular discrete probability distributions, namely, the

Binomial distribution. It simplifies many probability problems which, otherwise, might be very tedious and

complicated while listing all the possible outcomes of an experiment.

Repeated trials play an important role in probability and statistics, especially when the number of trials (n) is fixed, the probability of success θ (also written p) is the same for each trial, and the trials are all independent. Several random variables arise in connection with repeated trials. The one we shall study here concerns the total number of successes. Examples of such experiments are:

Asking 200 people whether they watch BBC news.

Rolling a die 10 times to see if a 5 appears.

A random variable X has a Binomial distribution, and is referred to as a Binomial random variable, if and only if its probability distribution is given by

$$f(x; n, \theta) = \binom{n}{x}\theta^x (1 - \theta)^{n - x} \quad \text{for } x = 0, 1, \dots, n.$$

In general, the binomial distribution has the following characteristics:

Only two possible outcomes on each trial: success (S) or failure (F).

P(S) = θ (fixed at any trial).

The n trials are independent.

Mean: $E(X) = \mu = \sum_{x} x\,p(x) = \sum_{x=1}^{n} x\binom{n}{x}\theta^x (1 - \theta)^{n - x} = n\theta$

Variance: Var(X) = E[(X − E(X))²] = nθ(1 − θ)

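Remark

Writing p for the probability of success and q = 1 − p, the mean of the Binomial distribution is

$$\begin{aligned}
E(X) &= \sum_{x=0}^{n} x\,P(X = x) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} \\
&= \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} \\
&= \sum_{x=1}^{n} x\,\frac{n(n-1)!}{x(x-1)!\,(n-x)!}\,p\,p^{x-1} q^{n-x} \\
&= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\,p^{x-1} q^{n-x} \\
&= np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{n-x} \\
&= np\,(q + p)^{n-1} = np\,(1)^{n-1} \qquad [\,q + p = 1\,] \\
&= np.
\end{aligned}$$

The mean of the binomial distribution is np.

The variance of the Binomial distribution is

$$V(X) = E(X^2) - [E(X)]^2 \qquad (1)$$

Now,

$$\begin{aligned}
E(X^2) &= \sum_{x=0}^{n} x^2 \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} [x(x-1) + x] \binom{n}{x} p^x q^{n-x} \\
&= \sum_{x=0}^{n} x(x-1)\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} + \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} \\
&= \sum_{x=2}^{n} x(x-1)\,\frac{n(n-1)(n-2)!}{x(x-1)(x-2)!\,(n-x)!}\,p^2 p^{x-2} q^{n-x} + E(X) \\
&= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,(n-x)!}\,p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x=2}^{n} \binom{n-2}{x-2} p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2 (q + p)^{n-2} + np = n(n-1)p^2 + np \qquad (2)
\end{aligned}$$

Putting (2) in (1) we get

$$V(X) = n(n-1)p^2 + np - (np)^2 = np(np - p + 1 - np) = np(1 - p) = npq.$$

The variance of the Binomial distribution is npq.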

Example: A machine that produces stampings for automobile engines is malfunctioning and producing 5% defectives. The defective and non-defective stampings proceed from the machine in a random manner. If the next five stampings are tested, find the probability that three of them are defective.

Solution: Let x equal the number of defectives in n = 5 trials. Then x is a binomial random variable with p, the probability that a single stamping will be defective, equal to 0.05, and q = 1 − 0.05 = 0.95. The required probability is

$$P(X = 3) = \binom{5}{3}(0.05)^3 (1 - 0.05)^{5-3} = \frac{5!}{3!\,(5-3)!}(0.05)^3 (0.95)^2 = \frac{5\times4\times3\times2\times1}{(3\times2\times1)(2\times1)}(0.05)^3 (0.95)^2 = 10(0.05)^3(0.95)^2 \approx 0.0011.$$

Example: If the probability is 0.20 that a person traveling on a certain airplane flight will request a

vegetarian lunch, what is the probability that three of 10 people traveling on this flight will request a

vegetarian lunch?

$$P(X = 3) = \binom{10}{3}(0.2)^3 (0.8)^7 = 0.201.$$
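Both binomial examples use the same probability function, so a small helper makes the arithmetic easy to reproduce. The sketch below is illustrative; the name binomial_pmf is an assumption and not notation from the notes.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


# Stampings: n = 5 trials, p = 0.05, three defectives.
p_stampings = binomial_pmf(3, 5, 0.05)

# Vegetarian lunch: n = 10 passengers, p = 0.20, three requests.
p_lunch = binomial_pmf(3, 10, 0.20)

# The mean sum_x x * P(X = x) should agree with n * p.
mean_lunch = sum(x * binomial_pmf(x, 10, 0.20) for x in range(11))

print(round(p_stampings, 6), round(p_lunch, 3), round(mean_lunch, 6))
```

Summing the probability function over x = 0, …, n is a convenient sanity check, since the total must be 1.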

9.1.2. Poisson Distribution

The Poisson probability distribution, named for the French Mathematician S.D. Poisson (1781-1840),

provides a model for the relative frequency of the number of “rare events” that occurs in a unit of time, area,

volume, etc.

Examples of events whose relative frequency distributions can be modelled by Poisson probability distributions are:

The number of new jobs submitted to a computer in any one minute,

The number of fatal accidents per month in a manufacturing plant,

The number of customers arriving during a given period of time,

The number of bacteria per small volume of fluid.

The properties of Poisson random variables are the following.

The experiment consists of counting the number of times X a particular event occurs during a given unit (of time, area, volume, etc.),

The probability that an event occurs in a given unit is the same for all units,

The number of events that occur in one unit is independent of the number that occur in other units.

A random variable X has a Poisson distribution with parameter λ, and is referred to as a Poisson random variable, if and only if its probability distribution is given by:

$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

Mean: $E(X) = \mu = \sum_{x=0}^{\infty} x \frac{\lambda^x e^{-\lambda}}{x!} = \lambda$

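Remark

The mean and variance for a Poisson distribution are both λ.

$$E(X) = \sum_{x=0}^{\infty} x \frac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$$

and, letting y = x - 1,

$$E(X) = \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

$$
\begin{aligned}
E(X^2) &= \sum_{x=0}^{\infty} x^2 \frac{\lambda^x e^{-\lambda}}{x!}
= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{(x - 1 + 1)\lambda^{x-1}}{(x-1)!} \\
&= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{(x-1)\lambda^{x-1}}{(x-1)!} + \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
\end{aligned}
$$

(rewriting x – 1 as y) then

$$E(X^2) = \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{y \lambda^{y}}{y!} + \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} = \lambda e^{-\lambda} \cdot \lambda e^{\lambda} + \lambda e^{-\lambda} e^{\lambda} = \lambda^2 + \lambda,$$

hence $Var(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.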

Example: Suppose that customers enter a waiting line at random at a rate of 4 per minute. Assuming that the number entering the line during a given time interval has a Poisson distribution, find the probability that exactly one customer enters during a given one-minute interval of time.

Solution: Given λ = 4 per minute,

$$P(X = 1) = \frac{4^1 e^{-4}}{1!} = 4e^{-4} = 0.0733.$$
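As a hedged illustration, the waiting-line example and the claim that the Poisson mean and variance both equal λ can be checked numerically. The sketch below uses only the standard library. The function name `poisson_pmf` and the cut-off of 60 terms for the infinite sums are assumptions made for this illustration (for λ = 4 the terms beyond that are negligibly small).

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lambda) = lambda^x * e^(-lambda) / x!  for x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)


lam = 4  # customers per minute, as in the example

# Probability that exactly one customer enters in a one-minute interval
p_one = poisson_pmf(1, lam)
print(round(p_one, 4))  # about 0.0733

# Truncated sums approximating E(X) and Var(X); both should be close to lambda
terms = range(60)  # assumed cut-off for the infinite series
mean = sum(x * poisson_pmf(x, lam) for x in terms)
variance = sum(x**2 * poisson_pmf(x, lam) for x in terms) - mean**2
print(mean, variance)
```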

The geometric distribution arises in a binomial-experiment situation when trials are carried out independently (with constant probability p of success) until the first success occurs. The random variable X denoting the number of required trials is geometrically distributed with parameter p.

Often we will be interested in measuring the length of time before some event occurs, for example, the length of time a customer must wait in line until receiving service, or the length of time until a piece of equipment fails. For this application, we view each unit of time as a Bernoulli trial and consider a series of trials identical to those described for the binomial experiment. Unlike the binomial experiment, where X is the total number of successes, the random variable of interest here is X, the number of trials (time units) until the first success is observed.

A random variable X has a geometric distribution with parameter p, and is referred to as a geometric random variable, if and only if its probability distribution is given by:

$$p(x; p) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots,$$

where p is the probability of success and x is the number of trials until the first success occurs.

Mean: $E(X) = \mu = \sum_{x=1}^{\infty} x\,p(1-p)^{x-1} = \frac{1}{p}$

Variance: $Var(X) = E[(X - E(X))^2] = \frac{1-p}{p^2} = \frac{q}{p^2}$

Example: The probability is 0.75 that an applicant for a driver's license will pass the road test on any given try. What is the probability that an applicant will finally pass the test on the fourth try?

Solution: Assuming that trials are independent, we substitute x = 4 and p = 0.75 into the formula for the geometric distribution, to get:

$$p(x) = p(1-p)^{x-1} = 0.75(1 - 0.75)^{4-1} = 0.75(0.25)^3 = 0.011719$$
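The driver's-license example, together with the formulas 1/p and q/p² for the mean and variance, can be illustrated with a short Python sketch. The function name `geometric_pmf` and the cut-off of 200 trials used to approximate the infinite sums are assumptions made here for illustration only.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): the first success occurs on trial x, for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)


p = 0.75  # probability of passing the road test on any given try

# Probability of finally passing on the fourth try
p_fourth = geometric_pmf(4, p)
print(p_fourth)  # 0.01171875

# Truncated sums approximating the mean and variance
trials = range(1, 200)  # assumed cut-off; later terms are negligible for p = 0.75
mean = sum(x * geometric_pmf(x, p) for x in trials)
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in trials)
print(mean, 1 / p)               # both close to 1.3333
print(variance, (1 - p) / p**2)  # both close to 0.4444
```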

9.2.1. Uniform Distribution

One of the simplest continuous distributions in all of statistics is the continuous uniform distribution. This distribution is characterized by a density function that is “flat,” and thus the probability is uniform in a closed interval, say [a, b]. Suppose you were to randomly select a number X represented by a point in the interval a ≤ x ≤ b. The density function of X is represented graphically as follows.

Note that the density function forms a rectangle with base b − a and constant height $\frac{1}{b-a}$ to ensure that the area under the rectangle equals one. As a result, the uniform distribution is often called the rectangular distribution.

A random variable of the type shown in the above graph is called a uniform random variable. Therefore, the probability density function for a uniform random variable X with parameters a and b is given by:

$$
f(x) =
\begin{cases}
\dfrac{1}{b-a}, & a \le x \le b \\
0, & \text{elsewhere}
\end{cases}
$$

Mean: $E(X) = \mu = \int_a^b x \frac{1}{b-a}\,dx = \frac{a+b}{2}$

Variance: $Var(X) = E[(X - E(X))^2] = \frac{(b-a)^2}{12}$

Example: The department of transportation has determined that the winning (low) bid X (in dollars) on a road construction contract has a uniform distribution with probability density function $f(x) = \frac{5}{8d}$ if $\frac{2d}{5} < x < 2d$, where d is the department of transportation estimate of the cost of the job. (a) Find the mean and SD of X. (b) What fraction of the winning bids on road construction contracts are greater than the department of transportation estimate?

Solution: (a) Here a = 2d/5 and b = 2d, so

$$E(X) = \int_{2d/5}^{2d} x \frac{5}{8d}\,dx = \frac{2d/5 + 2d}{2} = \frac{6d}{5}$$

$$V(X) = E[(X - E(X))^2] = \frac{(2d - 2d/5)^2}{12} = \frac{(8d/5)^2}{12} = \frac{16d^2}{75}, \quad SD(X) = \frac{4d}{5\sqrt{3}} \approx 0.46d$$

(b) $$P(X > d) = \int_{d}^{2d} \frac{5}{8d}\,dx = \frac{5}{8d}\,[x]_{d}^{2d} = \frac{5}{8d}(2d - d) = \frac{5}{8}$$
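The general uniform formulas (a + b)/2 and (b − a)²/12 can be applied to the bid example in a few lines of Python. This is only a sketch: the function names are illustrative, and d = 1.0 is a hypothetical cost estimate chosen so that the results read as multiples of d, with the bid interval taken as 2d/5 < x < 2d.

```python
from math import sqrt


def uniform_mean(a: float, b: float) -> float:
    """Mean of a uniform random variable on [a, b]."""
    return (a + b) / 2


def uniform_variance(a: float, b: float) -> float:
    """Variance of a uniform random variable on [a, b]."""
    return (b - a) ** 2 / 12


def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo < X < hi): width of the overlap with [a, b] times the height 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


d = 1.0                  # hypothetical estimate; results scale with d
a, b = 2 * d / 5, 2 * d  # bid interval 2d/5 < x < 2d

mean = uniform_mean(a, b)                 # 6d/5
sd = sqrt(uniform_variance(a, b))         # 4d / (5 * sqrt(3)), roughly 0.46d
fraction_above = uniform_prob(d, b, a, b) # P(X > d) = 5/8
print(mean, sd, fraction_above)
```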

9.2.2. Normal Distribution

A random variable X is normal or normally distributed with parameters μ and σ² (abbreviated N(μ, σ²)) if it is continuous with probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty;\ \sigma > 0 \text{ and } -\infty < \mu < \infty.$$

The parameters μ and σ² are the mean and the variance, respectively, of the normal random variable.

1. The curve is bell-shaped.

2. The mean, median and mode are equal and located at the center of the distribution.

3. The curve is symmetrical about the mean and it is uni-modal.

4. The curve is continuous, i.e., for each X, there is a corresponding Y value.

5. It never touches the X axis.

6. The total area under the curve is 1, and the area on each side of the mean is 0.5000.

7. The areas under the curve that lie within one, two and three standard deviations of the mean are approximately 0.68 (68%), 0.95 (95%) and 0.997 (99.7%) respectively.
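The areas quoted in property 7 can be checked with the error function from Python's standard library, using the identity P(μ − kσ < X < μ + kσ) = erf(k/√2), which holds for any normal random variable. This is a minimal sketch; the function name `area_within` is an illustrative choice.

```python
from math import erf, sqrt


def area_within(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```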

Graphically, it can be shown as:


9.2.2.1. Standard Normal Distribution

If we want to compute the probability P(a < X < b), we have to evaluate the area under the normal curve f(x) on the interval (a, b). This means we need to integrate the function f(x) defined above. Obviously, the integral is not easily evaluated. That is,

$$P(a < X < b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$

cannot be integrated directly.

But this is easily evaluated using a table of probabilities prepared for a special kind of normal distribution,

called the standard normal distribution.

If X is a normal random variable with mean μ and variance σ², then the variable $Z = \frac{X-\mu}{\sigma}$ is the standardized normal random variable. In particular, if μ = 0 and σ = 1, the density function is called the standardized normal density, and its graph has the same bell shape as that of any other normal distribution.

Convert all normal random variables to standard normal in order to easily obtain the area under the curve

with the help of the standard normal table.

Let X be a normal random variable with mean μ and standard deviation σ. Then we define the standard normal variable Z as $Z = \frac{X-\mu}{\sigma}$. The pdf of Z is, thus, given by:

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty.$$

Properties of the Standard Normal Curve (Z):

1. The highest point occurs at μ=0.

2. It is a bell-shaped curve that is symmetric about the mean, μ=0. One half of the curve is a mirror image of the other half, i.e., the area under the curve to the right of μ=0 is equal to the area under the curve to the left of μ=0, and each equals ½.

3. The total area under the curve equals one.

4. Empirical Rule:

Approximately 68% of the area under the curve is between -1 and +1.

Approximately 95% of the area under the curve is between -2 and +2.

Approximately 99.7% of the area under the curve is between -3 and +3.

i. Draw the picture

ii. Shade the desired area /region

iii. If the area/region is:

between 0 and any Z value, then look up the Z value in the table,

in any tail, then look up the Z value to get the area and subtract the area from 0.5000,

between two Z values on the same side of the mean, then look up both Z values from the table

and subtract the smaller area from the larger,

between two Z values on opposite sides of the mean, then look up both Z values and add the

areas,

less than any Z value to the right of the mean, then look up the Z value from the table to get the

area and add 0.5000 to the area,

greater than any Z value to the left of the mean, then look up the Z value and add 0.5000 to the

area,

in any two tails, then look up the Z values from the table, subtract the areas from 0.5000 and

add the answers.

Note that finding the area under the curve is the same as finding the probability of choosing any Z value at

random.


Example: Find the probabilities that a random variable having the standard normal distribution will take on a value

a) Less than 1.72; b) Less than -0.88;

c) Between 1.30 and 1.75; d) Between -0.25 and 0.45.

Solution:

a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.

b) P(Z < -0.88) = P(Z > 0.88) = 0.5 - P(0 < Z < 0.88) = 0.5 - 0.3106 = 0.1894.

c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) - P(0 < Z < 1.30) = 0.4599 - 0.4032 = 0.0567.

d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45)

= P(0 < Z < 0.25) + P(0 < Z < 0.45) = 0.0987 + 0.1736 = 0.2723.
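For readers without a printed Z-table, the four answers above can be approximated in Python. The sketch below is an alternative to the table lookup, not the method used in these notes: it computes the cumulative probability P(Z < z) from the error function in the standard library. The names `phi` and `area_0_to` are illustrative.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def area_0_to(z: float) -> float:
    """Table-style area between 0 and a non-negative z value."""
    return phi(z) - 0.5


part_a = phi(1.72)                         # less than 1.72
part_b = phi(-0.88)                        # less than -0.88
part_c = area_0_to(1.75) - area_0_to(1.30) # between 1.30 and 1.75
part_d = area_0_to(0.25) + area_0_to(0.45) # between -0.25 and 0.45

for label, value in zip("abcd", (part_a, part_b, part_c, part_d)):
    print(label, round(value, 4))
```

The computed values agree with the table-based answers to about three decimal places; small differences in the fourth decimal come from the rounding of the individual table entries.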

Remark

The curve of any continuous probability distribution or density function is constructed so that the area under the curve bounded by the two ordinates x = a and x = b equals the probability that the random variable X assumes a value between x = a and x = b. Thus, for the normal curve:

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P(z_1 < Z < z_2),$$

where $z_1 = \frac{a-\mu}{\sigma}$ and $z_2 = \frac{b-\mu}{\sigma}$. Now, we need only to get the readings from the Z-table corresponding to z1 and z2 to get the required probabilities, as we have done in the preceding example.

Example 9.5: If the scores for an IQ test have a mean of 100 and a standard deviation of 15, find the probability that IQ scores will fall below 112.

Solution: Let Y denote the IQ score, so that Y ~ N(100, 225). Then

$$P(Y < 112) = P\left[\frac{Y-\mu}{\sigma} < \frac{112-100}{15}\right] = P[Z < 0.800] = 0.500 + P(0 < Z < 0.800) = 0.500 + 0.2881 = 0.7881$$
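The standardization step can also be sketched in code: convert the raw score to a Z value, then evaluate the standard normal cumulative probability. This is an illustrative sketch using the error function in place of the Z-table; the function name `normal_cdf` is an assumption of this example.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for X ~ N(mu, sigma^2), computed by standardizing to Z."""
    z = (x - mu) / sigma  # Z = (X - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


# IQ example: mean 100, standard deviation 15
p_below_112 = normal_cdf(112, 100, 15)
print(round(p_below_112, 4))  # 0.7881

# The same function gives P(a < X < b) as a difference of two values
p_between = normal_cdf(115, 100, 15) - normal_cdf(85, 100, 15)
print(round(p_between, 4))    # 0.6827, i.e. within one standard deviation
```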

The exponential distribution is an important density function that is employed as a model for the relative frequency distribution of the length of time between random arrivals at a service counter when the probability of a customer arrival in any one unit of time is equal to the probability of arrival during any other. It is also used as a model for the length of life of industrial equipment or products when the probability that an “old” component will operate at least t additional time units, given it is now functioning, is the same as the probability that a “new” component will operate at least t time units. Equipment subject to periodic maintenance and parts replacement often exhibits this property of “never growing old”.

The exponential distribution is related to the Poisson probability distribution. In fact, it can be shown that if the number of arrivals at a service counter follows a Poisson probability distribution with the mean number of arrivals per unit of time equal to $\frac{1}{\beta}$, then the length of time between successive arrivals has an exponential distribution with mean β.

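The continuous random variable X has an exponential distribution, with parameter β, if its density function is given by:

$$f(x) = \frac{e^{-x/\beta}}{\beta}, \quad x \ge 0,\ \beta > 0.$$

Mean: $E(X) = \mu = \int_0^{\infty} x \frac{e^{-x/\beta}}{\beta}\,dx = \beta$

Variance: $Var(X) = E[(X - E(X))^2] = \int_0^{\infty} x^2 \frac{e^{-x/\beta}}{\beta}\,dx - \beta^2 = 2\beta^2 - \beta^2 = \beta^2$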

Example: Let X be an exponential random variable with pdf $f(x) = \frac{e^{-x/2}}{2}$, x ≥ 0. Find the mean and variance of X.

Solution: Here β = 2, so $E(X) = \mu = \int_0^{\infty} x \frac{e^{-x/2}}{2}\,dx = 2$ and $Var(X) = E[(X - E(X))^2] = \beta^2 = 4$.

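Example: The probability density of X is

$$
f(x) =
\begin{cases}
3e^{-3x}, & x > 0 \\
0, & \text{elsewhere.}
\end{cases}
$$

What are the mean and variance of this pdf?

Solution: This distribution is exponential with β = 1/3, and its mean and variance are obtained as follows:

$$E(X) = \int_0^{\infty} x\,3e^{-3x}\,dx = \frac{1}{3} \quad \text{and} \quad V(X) = \int_0^{\infty} x^2\,3e^{-3x}\,dx - \left(\frac{1}{3}\right)^2 = \frac{2}{9} - \frac{1}{9} = \frac{1}{9}.$$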
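The integrals in the two exponential examples can be approximated numerically as a sanity check on the closed forms β and β². The sketch below uses a simple midpoint rule written with the standard library only. The function names, the number of steps, and the truncation of the infinite upper limit at 50β are all assumptions made for this illustration.

```python
from math import exp


def exponential_pdf(x: float, beta: float) -> float:
    """f(x) = exp(-x / beta) / beta for x >= 0."""
    return exp(-x / beta) / beta


def integrate(f, lower: float, upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))


def exponential_moments(beta: float) -> tuple[float, float]:
    """Approximate (mean, variance) by numerical integration."""
    upper = 50 * beta  # assumed truncation point; the tail beyond it is negligible
    mean = integrate(lambda x: x * exponential_pdf(x, beta), 0.0, upper)
    second = integrate(lambda x: x**2 * exponential_pdf(x, beta), 0.0, upper)
    return mean, second - mean**2


mean_2, var_2 = exponential_moments(2.0)          # first example, beta = 2
mean_third, var_third = exponential_moments(1 / 3)  # second example, f(x) = 3e^(-3x)
print(round(mean_2, 4), round(var_2, 4))
print(round(mean_third, 4), round(var_third, 4))
```

The approximations come out close to 2 and 4 for the first example and close to 1/3 and 1/9 for the second, matching the closed-form results.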
