
What is Statistics?

McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved.
What is Meant by Statistics?

Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting numerical data to assist in making more effective decisions.
OR: a set of mathematically based tools and techniques that transform raw data
into a few summary measures.

1-2
Why Study Statistics?

1. Numerical information is everywhere.
2. Statistical techniques are used to make decisions that affect our daily lives.
3. Knowledge of statistical methods will help you understand how decisions are
made and give you a better understanding of how they affect you.
4. Statistics is a vital tool in research.
No matter what line of work you select, you will find yourself faced with
decisions where an understanding of data analysis is helpful.

1-3
Why Study Statistics

• Strengthens management decisions by providing evidence-based information or a
quantifiable basis.
• This increases confidence in decision making.
TASK 1: Give a detailed account of how and why managers use statistics.

1-4
What is Meant by Statistics?

• In more common usage, statistics refers to numerical information.
Examples: the average starting salary of college graduates, the
number of deaths due to alcoholism last year, the change in the
Dow Jones Industrial Average from yesterday to today, and the
number of home runs hit by the Chicago Cubs during the 2007
season.
• We often present statistical information in graphical form to capture the
reader's attention and to portray a large amount of information.

1-5
Formal Definition of Statistics

STATISTICS The science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions.
Some examples of the need for data collection.
1. Research analysts for Merrill Lynch evaluate many facets of a
particular stock before making a “buy” or “sell” recommendation.
2. The marketing department at Colgate-Palmolive Co., a manufacturer
of soap products, has the responsibility of making recommendations
regarding the potential profitability of a newly developed group of
face soaps having fruit smells.
3. The United States government is concerned with the present
condition of our economy and with predicting future economic trends.
4. Managers must make decisions about the quality of their product or
service.

1-6
Who Uses Statistics?

Statistical techniques are used extensively by marketing, accounting, quality
control, consumers, professional sports people, hospital administrators,
educators, politicians, physicians, etc.

1-7
Types of Statistics – Descriptive Statistics and
Inferential Statistics

Descriptive Statistics - methods of organizing, summarizing, and presenting data
in an informative way.
EXAMPLE 1: The United States government reports the population of the
United States was 179,323,000 in 1960; 203,302,000 in 1970;
226,542,000 in 1980; 248,709,000 in 1990, and 265,000,000 in 2000.

EXAMPLE 2: According to the Bureau of Labor Statistics, the average hourly
earnings of production workers were $17.90 for April 2008.

1-8
Types of Statistics – Descriptive Statistics and
Inferential Statistics

Inferential Statistics: A decision, estimate, prediction, or generalization
about a population, based on a sample.

Note: In statistics the words population and sample have a broader meaning.
A population or sample may consist of individuals or objects.

1-9
Population versus Sample

A population is a collection of all possible individuals, objects, or
measurements of interest under study.

A sample is a portion, part, or subset of data values drawn from a population
of interest.

1-10
Why take a sample instead of studying every
member of the population?

1. Prohibitive cost of a census
2. Destruction of the item being studied may be required
3. Not possible to test or inspect all members of the population being studied
4. Time constraints

1-11
A Sampling Unit

• A sampling unit is the object being measured, counted or observed with
respect to the random variable under study.
• The question is: “What is a random variable?”
• A random variable is any attribute of interest on which data is collected
and analysed.
• Examples: brand of coffee preferred, daily occupancy rates, output per worker.

1-12
Population Parameter

• A measure that describes a characteristic of a population, e.g. population
average, population proportion.

• It is a parameter because it uses all the population data to compute its value.

1-13
Sample Statistic

• A measure that describes a characteristic of a sample, e.g. sample average
and sample proportion.
• Give any two examples of appropriate sample statistics in business.

1-14
Usefulness of a Sample in Learning about a
Population

Using a sample to learn something about a population is done extensively in
business, agriculture, politics, and government.

EXAMPLE: Television networks constantly monitor the popularity of their programs
by hiring Nielsen and other organizations to sample the preferences of TV viewers.

1-15
Components of Statistics

1. Descriptive statistics: condenses data into a few summary descriptive measures.
2. Inferential statistics: generalises sample findings to the broader population.
3. Statistical modelling: builds models of relationships between variables by
constructing equations (or models) to estimate one of these variables based on
the values of related variables.

1-16
Data and Data Quality

 Define data.
• The usefulness of data depends on the quality of the data collected.
 The quality of data depends on:
i) The data type
ii) The source of data
iii) Methods of data collection.

1-17
Types of Variables

A. Qualitative or attribute variable - the characteristic being studied is
nonnumeric or categorical.
EXAMPLES: Gender, religious affiliation, type of automobile owned,
state of birth, eye color.

B. Quantitative variable - information is reported numerically, as real numbers.
EXAMPLES: balance in your checking account, age of a student,
minutes remaining in class, number of children in a family, price of
a product, distance travelled by a car.

1-18
Quantitative Variables - Classifications

Quantitative variables can be classified as either discrete or continuous.

A. Discrete variables can only assume certain values, and there are usually
“gaps” between values. The data are whole numbers (integers).
EXAMPLE: the number of bedrooms in a house, or the number of
hammers sold at the local Home Depot (1, 2, 3, …, etc.).

B. Continuous variables can assume any value within a specified range, that is,
any number that can occur in an interval.
EXAMPLE: The pressure in a tire, the weight of a pork chop, the height of
students in a class, the weight of passengers’ hand luggage (0.8 kg), or the
volume of fuel (50 l; 45.5 l).
1-19
Summary of Types of Variables

1-20
Four Levels of Measurement or
Measurement scales
Nominal level - data that is classified into categories and cannot be arranged
in any particular order.
EXAMPLES: eye color, gender, religious affiliation.

Ordinal level – data arranged in some order, but the differences between data
values cannot be determined or are meaningless.
EXAMPLE: During a taste test of 4 soft drinks, Mellow Yellow was ranked
number 1, Sprite number 2, Seven-up number 3, and Orange Crush number 4.

Interval level - similar to the ordinal level, with the additional property
that meaningful amounts of differences between data values can be determined.
There is no natural zero point.
EXAMPLE: Temperature on the Fahrenheit scale.

Ratio level - the interval level with an inherent zero starting point.
Differences and ratios are meaningful for this level of measurement.
EXAMPLES: Monthly income of surgeons, or distance traveled by manufacturer’s
representatives per month.

1-21
Nominal-Level Data

Properties:
1. Observations of a qualitative variable can
only be classified and counted.
2. There is no particular order to the labels.
Examples: gender (1 = male, 2 = female); mode of transport (1 = bus, 2 = car,
3 = train, 4 = bicycle). The numbers are labels only and imply no order.

1-22
Ordinal-Level Data

Properties:
1. Data classifications are represented by sets of labels or names (e.g. high,
medium, low; or lower, middle, upper) that have relative values.
2. Because of the relative values, the data classified can be ranked or ordered.
3. Ordinal-level data is associated with categorical data.

1-23
Interval-Level Data

Properties (interval-level data is associated with numeric data):
1. Data classifications are ordered according to the amount of the
characteristic they possess.
2. They possess two properties:
– Rank order: same as ordinal data
– Distance: how much more or less of a characteristic an object possesses
Example: Women’s dress sizes
listed on the table.

1-24
Examples of interval-scaled data

• How would you rate your chances of getting employed after completing your
first degree?
1. Very low 2. Low 3. Moderate 4. High 5. Very high
• How satisfied are you with your current job description?
1. Very dissatisfied 2. Dissatisfied 3. Satisfied 4. Very satisfied
• An interval scale has no natural zero point.

1-25
Ratio-Level Data

• Consists of all real numbers associated with quantitative variables, e.g.
employee ages, customer incomes, distance travelled (km), number of shopping
trips per month (0; 1; 2; 3; etc.).
 Ratio level is the “highest” level of measurement.
Properties:
1. Data classifications are ordered according to the amount of the
characteristics they possess.
2. Equal differences in the characteristic are represented by equal
differences in the numbers assigned to the classifications.

1-26
Properties of Ratio data

• Has all the properties of numbers (order, distance and an absolute origin of
zero), which allow such data to be added, subtracted, multiplied or divided.
• Is the strongest data for statistical analysis, since the greatest amount of
statistical information can be extracted from it.
• More statistical methods can be applied to ratio data than to any other type.
• The zero property means that ratios can be computed, e.g. 5 is half of 10.

1-27
Why Know the Level of Measurement of Data?

• The level of measurement of the data dictates the calculations that can be
done to summarize and present the data.
• It also determines the statistical tests that should be performed on the data.

1-28
Summary of the Characteristics for
Levels of Measurement

1-29
Data Sources

• Sources of data can be internal, that is, sourced from within the company.
• Data can also be sourced externally, from outside the organisation, e.g. from
external databases.
• Most commonly, researchers use primary data, which is recorded for the first
time at source.
• Sometimes secondary data is used, that is, data that already exists.
1-30
Primary and Secondary data

Question: Discuss the advantages and disadvantages of using:
i. Primary data.
ii. Secondary data.

1-31
Data collection methods

i. Observation.
ii. Panels
iii. Focus groups
iv. Surveys
– Face-to-face interviews
– Telephone interviews
– Computer-assisted interviews
– Personal interviews
– E-surveys
1-32
What are the advantages and
disadvantages of gathering data
using each method listed
above?

1-33
Describing Data:
Frequency Tables, Frequency
Distributions, and Graphic Presentation

Frequency Table and Frequency Distribution

FREQUENCY TABLE A grouping of qualitative data into mutually exclusive classes
showing the number of observations in each class.

Class interval: The class interval is obtained by subtracting the lower limit
of a class from the lower limit of the next class.
Class frequency: The number of observations in each class.
Class midpoint: A point that divides a class into two equal parts. This is the
average of the upper and lower class limits.

FREQUENCY DISTRIBUTION A grouping of data into mutually exclusive classes
showing the number of observations in each class.
1-35
Pie Charts and Bar Charts
PIE CHART A chart that shows the proportion or percent that each class
represents of the total number of frequencies.

BAR CHART A graph in which the classes are reported on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are
proportional to the heights of the bars.
1-36
Relative Class Frequencies

• Class frequencies can be converted to relative class frequencies to show the
fraction of the total number of observations in each class.
• A relative frequency captures the relationship between a class total and the
total number of observations.

1-37
EXAMPLE – Creating a Frequency
Distribution Table

Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show
the typical selling price on various dealer lots. The table on the right reports
only the price of the 80 vehicles sold last month at Whitner Autoplex.

1-38
Constructing a Frequency Table - Example

• Step 3: Set the individual class limits.

• Step 4: Tally the vehicle selling prices into the classes.

• Step 5: Count the number of items in each class.

1-39
Relative Frequency Distribution

To convert a frequency distribution to a relative frequency distribution, each
of the class frequencies is divided by the total number of observations.
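
The conversion can be sketched in a few lines of Python. The class labels and frequencies below are hypothetical (the Whitner Autoplex table is not reproduced in these notes); they were chosen only so that the frequencies total 80.

```python
# Hypothetical class frequencies, selling price in $ thousands (illustrative only).
class_frequencies = {
    "15 up to 20": 10,
    "20 up to 25": 30,
    "25 up to 30": 25,
    "30 up to 35": 15,
}

total = sum(class_frequencies.values())  # total number of observations

# Divide each class frequency by the total to get its relative frequency.
relative_frequencies = {
    label: count / total for label, count in class_frequencies.items()
}

for label, rel in relative_frequencies.items():
    print(f"{label:>12}: {rel:.4f}")
```

The relative frequencies always sum to 1, which is a quick check on the arithmetic.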

1-40
Graphic Presentation of a Frequency
Distribution

The three commonly used graphic forms are:
• Histograms
• Frequency polygons
• Cumulative frequency distributions

1-41
Histogram

HISTOGRAM A graph in which the classes are marked on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are
represented by the heights of the bars and the bars are drawn adjacent to each
other.

1-42
Frequency Polygon

• A frequency polygon also shows the shape of a distribution and is similar to
a histogram.

• It consists of line segments connecting the points formed by the
intersections of the class midpoints and the class frequencies.

1-43
Histogram Versus Frequency Polygon

• Both provide a quick picture of the main characteristics of the data (highs,
lows, points of concentration, etc.).
• The histogram has the advantage of depicting each class as a rectangle, with
the height of the rectangular bar representing the number in each class.
• The frequency polygon has an advantage over the histogram: it allows us to
compare two or more frequency distributions directly.

1-44
Cumulative Frequency Distribution

1-45
Describing Data:
Numerical Measures

Parameter Versus Statistics

PARAMETER A measurable characteristic of a population.

STATISTIC A measurable characteristic of a sample.

1-47
Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of
population values. The sample mean is the sum of all the sample values divided by the total number of sample
values.
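
A minimal Python sketch of the two means. The six values are hypothetical and are treated here as a complete population; the sample is simply its first three values.

```python
from statistics import mean

# Hypothetical data: treat these six values as the entire population.
population = [56, 42, 64, 38, 50, 50]

# Population mean: sum of all population values divided by their count.
mu = sum(population) / len(population)

# A sample is a subset of the population; the arithmetic is the same.
sample = population[:3]
x_bar = mean(sample)

print(mu, x_bar)
```

The calculation is identical in both cases. What differs is whether the data cover the whole population (a parameter) or only part of it (a statistic).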

EXAMPLE:

1-48
The Median

MEDIAN The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to
the smallest.

PROPERTIES OF THE MEDIAN


1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency
when such values occur.
3. It can be computed for ratio-level, interval-level, and ordinal-level data.
4. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended
class.

EXAMPLES:
The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.

The heights of four basketball players, in inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80.
Thus the median is 75.5 (the average of the two middle values, 75 and 76).
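
Both examples can be checked with Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even.

```python
from statistics import median

ages = [21, 25, 19, 20, 22]   # odd number of values
heights = [76, 73, 80, 75]    # even number of values

median_age = median(ages)        # middle value of the sorted ages
median_height = median(heights)  # average of the two middle heights

print(median_age, median_height)
```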

1-49
The Mode

MODE The value of the observation that appears most frequently.
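
A short Python sketch of finding the mode. The data are hypothetical. `statistics.multimode` (Python 3.8 or later) is used for the second list because a data set can have more than one mode.

```python
from statistics import mode, multimode

# Hypothetical shoe sizes sold in one morning.
sizes = [8, 9, 9, 10, 9, 11, 8]
most_common_size = mode(sizes)    # the single most frequent value

# Hypothetical data with two values tied for most frequent (bimodal).
ratings = [1, 2, 2, 3, 3]
all_modes = multimode(ratings)    # every value that shares the top frequency

print(most_common_size, all_modes)
```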

1-50
The Relative Positions of the Mean,
Median and the Mode

1-51
The Geometric Mean

 Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
 It has a wide application in business and economics because we are often interested in finding the
percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or
build on each other.
 The geometric mean will always be less than or equal to the arithmetic mean.
• The formula for the geometric mean is written: \( GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)} \)

EXAMPLE:
Suppose you receive a 5 percent increase in salary this year and a 15 percent
increase next year. The average annual percent increase is 9.886, not 10.0. Why is
this so? We begin by calculating the geometric mean.

\( GM = \sqrt{(1.05)(1.15)} = 1.09886 \), so the average annual increase is 9.886 percent.
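
The salary example can be reproduced in Python. This is a minimal sketch; the variable names are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

# Growth factors: a 5 percent and a 15 percent increase.
growth_factors = [1.05, 1.15]

# Geometric mean: the n-th root of the product of the n factors.
gm = math.prod(growth_factors) ** (1 / len(growth_factors))
average_increase_pct = (gm - 1) * 100

# Arithmetic mean of the two percentage increases, for comparison.
arithmetic_pct = (5 + 15) / 2

print(round(gm, 5), round(average_increase_pct, 3), arithmetic_pct)
```

The geometric mean is lower than the arithmetic mean of 10 percent because the second raise is applied to a salary that already includes the first one.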

1-52
Measures of Dispersion
 A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from
that standpoint, but it does not tell us anything about the spread of the data.
 For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade
across on foot without additional information? Probably not. You would want to know something about the variation
in the depth.
 A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

 RANGE

 MEAN DEVIATION

 VARIANCE AND STANDARD DEVIATION

1-53
EXAMPLE – Mean Deviation

EXAMPLE:
The number of cappuccinos sold at the Starbucks location in the Orange County
Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60,
and 80. Determine the mean deviation for the number of cappuccinos sold.

Step 1: Compute the mean

\( \bar{x} = \frac{\sum x}{n} = \frac{20 + 40 + 50 + 60 + 80}{5} = 50 \)

Step 2: Subtract the mean (50) from each of the observations, convert to positive if difference
is negative

Step 3: Sum the absolute differences found in step 2 then divide by the number of
observations
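
The three steps can be followed in a short Python sketch using the cappuccino data from the example (the variable names are illustrative).

```python
sold = [20, 40, 50, 60, 80]

# Step 1: compute the mean.
mean_sold = sum(sold) / len(sold)

# Step 2: absolute difference between each observation and the mean.
absolute_deviations = [abs(x - mean_sold) for x in sold]

# Step 3: sum the absolute differences and divide by the number of observations.
mean_deviation = sum(absolute_deviations) / len(sold)

print(mean_sold, absolute_deviations, mean_deviation)
```

On average, daily cappuccino sales deviate from the mean of 50 by 16 cups.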

1-54
Variance and Standard Deviation

VARIANCE The arithmetic mean of the squared deviations from the mean.

STANDARD DEVIATION The square root of the variance.

 The variance and standard deviations are nonnegative and are zero only
if all observations are the same.
 For populations whose values are near the mean, the variance and
standard deviation will be small.
 For populations whose values are dispersed from the mean, the
population variance and standard deviation will be large.
 The variance overcomes the weakness of the range by using all the
values in the population. PLEASE NOTE THAT THE DENOMINATOR IS
“N” for a population but “n-1” for sample variance
1-55
EXAMPLE – Population Variance and
Population Standard Deviation

The number of traffic citations issued during the last twelve months in Beaufort County, South Carolina, is
reported below:

What is the population variance?

Step 1: Find the mean.

\( \mu = \frac{\sum x}{N} = \frac{19 + 17 + \dots + 34 + 10}{12} = \frac{348}{12} = 29 \)

Step 2: Find the difference between each observation and the mean, and square that difference.
Step 3: Sum all the squared differences found in step 2.

Step 4: Divide the sum of the squared differences by the number of items in the population.

\( \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1{,}488}{12} = 124 \)

1-56
Sample Variance and
Standard Deviation

\( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \)

Where:
\( s^2 \) is the sample variance
\( X \) is the value of each observation in the sample
\( \bar{X} \) is the mean of the sample
\( n \) is the number of observations in the sample

EXAMPLE
The hourly wages for a sample of part-time
employees at Home Depot are: $12, $20,
$16, $18, and $19.

What is the sample variance?

PLEASE NOTE THAT THE DENOMINATOR IS “N” for a population but “n-1” for sample
variance
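
A minimal Python sketch of the Home Depot wage example. `statistics.variance` and `statistics.stdev` divide by n - 1, while `statistics.pvariance` divides by N, so the same five values give different results depending on whether they are treated as a sample or as a population.

```python
from statistics import pvariance, stdev, variance

wages = [12, 20, 16, 18, 19]   # hourly wages in dollars

sample_variance = variance(wages)   # sum of squared deviations / (n - 1)
sample_std_dev = stdev(wages)       # square root of the sample variance

# For contrast only: the same data treated as a whole population (divide by N).
population_variance = pvariance(wages)

print(sample_variance, round(sample_std_dev, 3), population_variance)
```

The mean wage is $17, the squared deviations sum to 40, and 40 / (5 - 1) gives a sample variance of 10.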

1-57
Chebyshev’s Theorem and Empirical Rule

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to
the company’s profit-sharing plan is $51.54, and the standard deviation is
$7.51. At least what percent of the contributions lie within plus 3.5 standard
deviations and minus 3.5 standard deviations of the mean?
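
Chebyshev's theorem states that, for any set of observations, the proportion of values lying within k standard deviations of the mean is at least 1 - 1/k², for any k greater than 1. A minimal Python sketch of the question above (the function name is illustrative):

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

# Dupree Paint example: k = 3.5 standard deviations.
min_fraction = chebyshev_min_fraction(3.5)

print(f"At least {min_fraction:.1%} of contributions")
```

So at least about 92 percent of the contributions lie within 3.5 standard deviations of the mean, whatever the shape of the distribution.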

1-58
The Arithmetic Mean and Standard
Deviation of Grouped Data

EXAMPLE:
Determine the arithmetic mean vehicle selling price given in the frequency
table below.

EXAMPLE
Compute the standard deviation of the vehicle selling prices in the frequency
table below.

PLEASE NOTE THAT THE DENOMINATOR IS “N” for a population variance but “n-1” for
sample variance

1-59
Describing Data:
Displaying and Exploring Data

Dot Plots

 A dot plot groups the data as little as possible and the identity of an individual observation is not lost.
 To develop a dot plot, each observation is simply displayed as a dot along a horizontal number line
indicating the possible values of the data.
 If there are identical observations or the observations are too close to be shown individually, the dots are
“piled” on top of each other.

EXAMPLE
Reported below are the number of vehicles sold in the last 24 months at Smith Ford Mercury Jeep, Inc.,
in Kane, Pennsylvania, and Brophy Honda Volkswagen in Greenville, Ohio. Construct dot plots
and report summary statistics for the two small-town Auto USA lots.

1-61
Stem-and-Leaf
 Stem-and-leaf display is a statistical technique to present a set of data. Each numerical value is divided
into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf. The stems are located
along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.
 Two disadvantages to organizing the data into a frequency distribution:
(1) The exact identity of each value is lost
(2) Difficult to tell how the values within each class are distributed.

EXAMPLE
Listed in Table 4–1 is the number of 30-second radio advertising spots purchased by each of the 45
members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a
stem-and-leaf display. Around what values do the number of advertising spots tend to cluster?
What is the fewest number of spots purchased by a dealer? The largest number purchased?

1-62
Quartiles, Deciles and Percentiles
 The standard deviation is the most widely used measure of dispersion.

 Alternative ways of describing spread of data include determining the location of values that divide a set of
observations into equal parts.

 These measures include quartiles, deciles, and percentiles.

 To formalize the computational procedure, let Lp refer to the location of a desired percentile. So if we wanted
to find the 33rd percentile we would use L33 and if we wanted the median, the 50th percentile, then L50.

 The number of observations is n, so if we want to locate the median, its position is at (n + 1)/2, or we could
write this as (n + 1)(P/100), where P is the desired percentile

1-63
Percentiles - Example
EXAMPLE
Listed below are the commissions earned last month by a sample of 15 brokers at Salomon
Smith Barney’s Oakland, California, office.

$2,038 $1,758 $1,721 $1,637 $2,097 $2,047 $2,205 $1,787


$2,287 $1,940 $2,311 $2,054 $2,406 $1,471 $1,460

Locate the median, the first quartile, and the third quartile for the commissions earned.

Step 1: Organize the data from lowest to largest value

$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940


$2,038 $2,047 $2,054 $2,097 $2,205 $2,287 $2,311
$2,406

Step 2: Compute the first and third quartiles. Locate L25 and L75 using:

\( L_{25} = (15 + 1)\frac{25}{100} = 4 \) and \( L_{75} = (15 + 1)\frac{75}{100} = 12 \)

Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively:
First quartile (value at L25) = $1,721
Third quartile (value at L75) = $2,205
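
The location formula can be sketched in Python with the commission data. The function names are illustrative. The linear interpolation used when the location is not a whole number is an assumption, since the example above only involves whole-number positions.

```python
commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]

def percentile_location(n, p):
    """Location L_p = (n + 1) * P / 100, a 1-based position in the sorted data."""
    return (n + 1) * p / 100

def percentile(values, p):
    """Value at L_p; interpolates linearly between neighbours (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    loc = percentile_location(n, p)
    whole = int(loc)      # whole part of the position
    frac = loc - whole    # fractional part of the position
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])

q1 = percentile(commissions, 25)
median_value = percentile(commissions, 50)
q3 = percentile(commissions, 75)

print(q1, median_value, q3)
```

With 15 observations the median sits at position (15 + 1)(50/100) = 8, which is $2,038 in the ordered list.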
1-64
Boxplot - Example

Step 1: Create an appropriate scale along the horizontal axis.

Step 2: Draw a box that starts at Q1 (15 minutes) and ends at Q3 (22
minutes). Inside the box we place a vertical line to represent the median (18 minutes).

Step 3: Extend horizontal lines from the box out to the minimum value (13
minutes) and the maximum value (30 minutes).

1-65
Skewness
 Another characteristic of a set of data is the shape.
 There are four shapes commonly observed: symmetric, positively skewed, negatively skewed, bimodal.

• The coefficient of skewness can range from -3 up to 3.
– A value near -3 indicates considerable negative skewness.
– A value such as 1.63 indicates moderate positive skewness.
– A value of 0, which will occur when the mean and median are equal, indicates
that the distribution is symmetrical and that there is no skewness present.
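
The notes do not give a formula. The sketch below assumes Pearson's coefficient of skewness, sk = 3(mean - median)/s, which is zero when the mean and median are equal, as described above. Both data sets are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sample std dev."""
    return 3 * (mean(values) - median(values)) / stdev(values)

symmetric = [1, 2, 3, 4, 5]        # mean equals median
right_skewed = [1, 2, 3, 4, 15]    # one large value pulls the mean above the median

sk_symmetric = pearson_skewness(symmetric)
sk_right = pearson_skewness(right_skewed)

print(sk_symmetric, round(sk_right, 2))
```

A positive result means the mean exceeds the median (a longer right tail), and a negative result means the opposite.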

1-66
Contingency Tables – An Example

A manufacturer of preassembled windows produced 50 windows yesterday. This
morning the quality assurance inspector reviewed each window for all quality
aspects. Each was classified as acceptable or unacceptable and by the shift on
which it was produced. Thus we reported two variables on a single item. The two
variables are shift and quality. The results are reported in the following table.

Using the contingency table, the quality of the three shifts can be compared.
For example:
1. On the day shift, 3 out of 20 windows, or 15 percent, are defective.
2. On the afternoon shift, 2 out of 15, or 13 percent, are defective.
3. On the night shift, 1 out of 15, or 7 percent, are defective.
4. Overall, 12 percent of the windows are defective.
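
The percentages can be recomputed from the counts quoted above. This is a minimal Python sketch; the dictionary layout is illustrative and is not the original table.

```python
# Defective and total windows per shift, as quoted in the example.
windows = {
    "day":       {"defective": 3, "total": 20},
    "afternoon": {"defective": 2, "total": 15},
    "night":     {"defective": 1, "total": 15},
}

# Defect rate for each shift.
defect_rates = {
    shift: counts["defective"] / counts["total"]
    for shift, counts in windows.items()
}

# Overall rate: all defective windows over all windows produced.
total_defective = sum(c["defective"] for c in windows.values())
total_windows = sum(c["total"] for c in windows.values())
overall_rate = total_defective / total_windows

for shift, rate in defect_rates.items():
    print(f"{shift}: {rate:.0%}")
print(f"overall: {overall_rate:.0%}")
```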
1-67
Probability Concepts

Probability, Experiment, Outcome, Event:
Defined

PROBABILITY A value between zero and one, inclusive, describing the relative
possibility (chance or likelihood) an event will occur.

 An experiment is a process
that leads to the occurrence
of one and only one of several
possible observations.
 An outcome is the particular
result of an experiment.
 An event is the collection of
one or more outcomes of an
experiment.

1-69
Approaches in Probability

 Classical Approach
• It is an approach in which the set of possible outcomes is known with
certainty, i.e. the outcomes of a given experiment are fixed.
• The outcomes are equally likely, e.g. when rolling a die or tossing a coin
(head or tail).

1-70
Relative Frequency Approach

• Historical data is used to assess the probability of an event happening.
• It does not yield equally likely probabilities.
• The probabilities are not constant (static) as they are in the classical
approach, e.g. the number of accidents occurring, or the recurrence of drought
every 10 years.

1-71
Classical and Empirical Probability

Consider an experiment of rolling a six-sided die. What is the probability of
the event “an even number of spots appear face up”?
The possible outcomes are one, two, three, four, five, or six spots.
There are three “favorable” outcomes (a two, a four, and a six) in the
collection of six equally likely possible outcomes, so the probability is
3/6 = .5.

The empirical approach to probability is based on what is called the law of
large numbers. The key to establishing probabilities empirically is that more
observations will provide a more accurate estimate of the probability.

EXAMPLE:
On February 1, 2003, the Space Shuttle Columbia exploded. This was the second
disaster in 113 space missions for NASA. On the basis of this information,
what is the probability that a future mission is successfully completed?

Probability of a successful flight
= Number of successful flights / Total number of flights
= 111/113 = 0.98
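
Both calculations can be written out in Python. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Classical approach: count favorable outcomes among equally likely outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
favorable = [spots for spots in outcomes if spots % 2 == 0]
p_even = Fraction(len(favorable), len(outcomes))

# Empirical approach: relative frequency observed in past data.
total_flights = 113
successful_flights = total_flights - 2   # two disasters in 113 missions
p_success = Fraction(successful_flights, total_flights)

print(p_even, round(float(p_success), 2))
```

The classical probability comes from counting outcomes before any trial is run. The empirical one is an estimate that would change as more missions are observed.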

1-72
Subjective/ Judgemental Approach

• Probabilities are derived from someone’s personal judgement or evaluation of
the chance of an event of interest happening.
• This approach can be impulsive; the probabilities generated are debatable or
subject to scrutiny.

1-73
Mutually Exclusive Events and
Collectively Exhaustive Events

 Events are mutually exclusive if the occurrence of any one event means that
none of the others can occur at the same time.
 Events are collectively exhaustive if at least one of the events must occur
when an experiment is conducted.
• The sum of the probabilities of all collectively exhaustive and mutually
exclusive events is 1.0 (or 100%).

collectively exhaustive
and mutually exclusive
events

 Events are independent if the occurrence of one event does not affect the
occurrence of another.

1-74
Subjective Probability - Example

• If there is little or no past experience or information on which to base a
probability, it may be arrived at subjectively.

• Illustrations of subjective probability are:
1. Estimating the likelihood the New England Patriots will play in the
Super Bowl next year.
2. Estimating the likelihood you will be married before the age of 30.
3. Estimating the likelihood the U.S. budget deficit will be reduced by
half in the next 10 years.

1-75
Summary of Types of Probability

1-76
Rules of Addition
• Special Rule of Addition - If two events A and B are mutually exclusive, the
probability of one or the other event’s occurring equals the sum of their
probabilities.
P(A or B) = P(A) + P(B)
• The General Rule of Addition - If A and B are two events that are not
mutually exclusive, then P(A or B) is given by the following formula:
P(A or B) = P(A) + P(B) - P(A and B)

EXAMPLE:
An automatic Shaw machine fills plastic bags with a mixture of beans, broccoli,
and other vegetables. Most of the bags contain the correct weight, but because
of the variation in the size of the beans and other vegetables, a package might
be underweight or overweight. A check of 4,000 packages filled in the past month
revealed:

What is the probability that a particular package will be either underweight or
overweight?

P(A or C) = P(A) + P(C) = .025 + .075 = .10
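
A minimal Python sketch of the addition rule applied to this example. The function name is illustrative. Exact fractions are used so that .025 + .075 comes out as exactly .10.

```python
from fractions import Fraction

def p_a_or_b(p_a, p_b, p_a_and_b=0):
    """General rule of addition; with p_a_and_b = 0 it is the special rule."""
    return p_a + p_b - p_a_and_b

p_a = Fraction("0.025")   # P(A) from the calculation above
p_c = Fraction("0.075")   # P(C) from the calculation above

# A package cannot be both underweight and overweight, so the events are
# mutually exclusive and the joint probability is zero.
p_a_or_c = p_a_or_b(p_a, p_c)

print(float(p_a_or_c))
```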

1-77
The Complement Rule
The complement rule is used to determine the probability of an event occurring
by subtracting the probability of the event not occurring from 1.
P(A) + P(~A) = 1
or P(A) = 1 - P(~A).

EXAMPLE
An automatic Shaw machine fills plastic bags with a mixture of beans, broccoli,
and other vegetables. Most of the bags contain the correct weight, but because
of the variation in the size of the beans and other vegetables, a package might
be underweight or overweight. Use the complement rule to show the probability of
a satisfactory bag is .900

P(B) = 1 - P(~B)
= 1 – P(A or C)
= 1 – [P(A) + P(C)]
= 1 – [.025 + .075]
= 1 - .10
= .90
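
The same steps in a short Python sketch. Exact fractions are used to avoid floating-point noise, and the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction("0.025")   # P(A) from the calculation above
p_c = Fraction("0.075")   # P(C) from the calculation above

# "Not satisfactory" means A or C; the two are mutually exclusive.
p_not_b = p_a + p_c

# Complement rule: P(B) = 1 - P(~B).
p_b = 1 - p_not_b

print(float(p_b))
```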

1-78
The General Rule of Addition and Joint Probability

JOINT PROBABILITY A probability that measures the likelihood two or more events
will happen concurrently.

The Venn Diagram shows the result of a survey of 200 tourists who visited
Florida during the year. The survey revealed that 120 went to Disney World, 100
went to Busch Gardens and 60 visited both.

What is the probability a selected person visited either Disney World or Busch
Gardens?

P(Disney or Busch) = P(Disney) + P(Busch) - P(both Disney and Busch)
= 120/200 + 100/200 – 60/200
= .60 + .50 – .30
= .80
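
A minimal Python sketch of the tourist example. The joint probability is subtracted so that the 60 tourists who visited both attractions are not counted twice.

```python
from fractions import Fraction

tourists = 200
p_disney = Fraction(120, tourists)
p_busch = Fraction(100, tourists)
p_both = Fraction(60, tourists)    # joint probability: visited both

# General rule of addition.
p_disney_or_busch = p_disney + p_busch - p_both

print(float(p_disney_or_busch))
```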

1-79
Special and General Rules of Multiplication

• The special rule of multiplication requires that two events A and B are independent.
• Two events A and B are independent if the occurrence of one has no effect on the probability of the occurrence of the other.
• This rule is written: P(A and B) = P(A)P(B)

EXAMPLE
A survey by the American Automobile Association (AAA) revealed 60 percent of its members made airline reservations last year. Two members are selected at random. What is the probability both made airline reservations last year?

Solution:
The probability the first member made an airline reservation last year is .60, written as P(R1) = .60.
The probability that the second member selected made a reservation is also .60, so P(R2) = .60.
Since the number of AAA members is very large, you may assume that R1 and R2 are independent.
P(R1 and R2) = P(R1)P(R2) = (.60)(.60) = .36

• The general rule of multiplication is used to find the joint probability that two events will occur when they are not independent.

EXAMPLE
A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the others blue. He gets dressed in the dark, so he just grabs a shirt and puts it on. He plays golf two days in a row and does not do laundry.
What is the likelihood both shirts selected are white?

Solution:
The event that the first shirt selected is white is W1. The probability is P(W1) = 9/12.
The event that the second shirt selected is also white is W2. The conditional probability that the second shirt selected is white, given that the first shirt selected is also white, is P(W2 | W1) = 8/11.
To determine the probability of 2 white shirts being selected we use the formula P(A and B) = P(A)P(B | A):
P(W1 and W2) = P(W1)P(W2 | W1) = (9/12)(8/11) = 0.55
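Both multiplication rules can be reproduced side by side. The sketch below uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

# Special rule (independent events): two AAA members both made reservations.
p_r1 = Fraction(60, 100)
p_r2 = Fraction(60, 100)
p_both_reserved = p_r1 * p_r2

# General rule (dependent events): two white shirts drawn without replacement.
p_w1 = Fraction(9, 12)           # 9 of 12 shirts are white
p_w2_given_w1 = Fraction(8, 11)  # 8 white shirts remain among 11
p_both_white = p_w1 * p_w2_given_w1

print(float(p_both_reserved))          # 0.36
print(round(float(p_both_white), 2))   # 0.55
```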
1-80
Contingency Tables
A CONTINGENCY TABLE is a table used to classify sample observations according to two or more identifiable characteristics.

EXAMPLE:
A sample of executives was surveyed about their loyalty to their company. One of the questions was, “If you were given an offer by another company equal to or slightly better than your present position, would you remain with the company or take the other position?” The responses of the 200 executives in the survey were cross-classified with their length of service with the company. What is the probability of randomly selecting an executive who is loyal to the company (would remain) and who has more than 10 years of service?

Event A1 happens if a randomly selected executive will remain with the company despite an equal or slightly better offer from another company. Since there are 120 executives out of the 200 in the survey who would remain with the company, P(A1) = 120/200, or .60.

Event B4 happens if a randomly selected executive has more than 10 years of service with the company. Thus, P(B4 | A1) is the conditional probability that an executive who would remain with the company has more than 10 years of service. Of the 120 executives who would remain, 75 have more than 10 years of service, so P(B4 | A1) = 75/120.

By the general rule of multiplication, P(A1 and B4) = P(A1)P(B4 | A1) = (120/200)(75/120) = .375.
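The joint probability asked for in the executive survey follows from the two counts quoted on the slide. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

total_executives = 200
would_remain = 120             # event A1
remain_and_over_10_years = 75  # executives in both A1 and B4

p_a1 = Fraction(would_remain, total_executives)
p_b4_given_a1 = Fraction(remain_and_over_10_years, would_remain)

# General rule of multiplication: P(A1 and B4) = P(A1) * P(B4 | A1)
p_joint = p_a1 * p_b4_given_a1
print(float(p_joint))  # 0.375
```

The product equals 75/200, the share of all surveyed executives who are both loyal and have more than 10 years of service.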

1-81
Tree Diagrams

A tree diagram is useful for portraying conditional and joint probabilities. It is particularly useful for analyzing business decisions involving several stages.
A tree diagram is a graph that is helpful in organizing calculations that involve several stages. Each segment in the tree is one stage of the problem. The branches of a tree diagram are weighted by probabilities.

1-82
Permutation and Combination

A permutation is any arrangement of r objects selected from n possible objects. The order of arrangement is important in permutations.

A combination is the number of ways to choose r objects from a group of n objects without regard to order.

EXAMPLE (combination)
There are 12 players on the Carolina Forest High School basketball team. Coach Thompson must pick five players among the twelve on the team to comprise the starting lineup. How many different groups are possible?

\( {}_{12}C_{5} = \frac{12!}{5!\,(12-5)!} = 792 \)

EXAMPLE (permutation)
Suppose that in addition to selecting the group, he must also rank each of the players in that starting lineup according to their ability.

\( {}_{12}P_{5} = \frac{12!}{(12-5)!} = 95{,}040 \)
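The two counts for the basketball example can be verified with Python's standard library (`math.comb` and `math.perm`, available from Python 3.8). The variable names are illustrative.

```python
import math

players, lineup = 12, 5

groups = math.comb(players, lineup)          # order ignored: combinations
ranked_lineups = math.perm(players, lineup)  # order matters: permutations

print(groups, ranked_lineups)  # 792 95040
```

Each group of five can be ranked in 5! = 120 ways, which is why the permutation count is 120 times the combination count.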

1-83
Probability Distributions

What is a Probability Distribution?
PROBABILITY DISTRIBUTION A listing of all the outcomes of an experiment and the probability associated
with each outcome.

CHARACTERISTICS OF A PROBABILITY DISTRIBUTION


1. The probability of a particular outcome is between 0 and 1 inclusive.
2. The outcomes are mutually exclusive events.
3. The list is exhaustive. So the sum of the probabilities of the various events is equal to 1.

Experiment: Toss a coin three times. Observe the number of heads. The possible results are: zero heads, one head, two heads, and three heads.

What is the probability distribution for the number of heads?
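One way to obtain this distribution is to list all eight equally likely sequences of three tosses and count the heads in each. The sketch below assumes a fair coin, which the slide implies but does not state.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many sequences produce each number of heads.
head_counts = Counter(seq.count("H") for seq in outcomes)

distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(head_counts.items())
}
for heads, prob in distribution.items():
    print(heads, prob)
```

The result is 1/8, 3/8, 3/8, and 1/8 for zero, one, two, and three heads, and the probabilities sum to 1 as the third characteristic requires.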

1-85
Random Variables
RANDOM VARIABLE A quantity resulting from an experiment that, by chance, can assume different values.

DISCRETE RANDOM VARIABLE A random variable that can assume only certain clearly separated values. It is usually the result of counting something.

EXAMPLES
1. The number of students in a class.
2. The number of children in a family.
3. The number of cars entering a carwash in an hour.
4. The number of home mortgages approved by Coastal Federal Bank last week.

CONTINUOUS RANDOM VARIABLE A random variable that can assume an infinite number of values within a given range. It is usually the result of some type of measurement.

EXAMPLES
1. The length of each song on the latest Tim McGraw album.
2. The weight of each student in this class.
3. The temperature outside as you are reading this book.
4. The amount of money earned by each of the more than 750 players currently on Major League Baseball team rosters.
1-86
The Mean and Variance of a Discrete
Probability Distribution

MEAN
• The mean is a typical value used to represent the central location of a probability distribution.
• The mean of a probability distribution is also referred to as its expected value: \( \mu = \sum [x P(x)] \)

VARIANCE AND STANDARD DEVIATION
• The variance measures the amount of spread in a distribution: \( \sigma^2 = \sum [(x - \mu)^2 P(x)] \)
• The computational steps are:
1. Subtract the mean from each value, and square this difference.
2. Multiply each squared difference by its probability.
3. Sum the resulting products to arrive at the variance.

The standard deviation is found by taking the positive square root of the variance.

1-87
Mean, Variance, and Standard
Deviation of a Probability Distribution - Example

John Ragsdale sells new cars for Pelican Ford. John usually sells the largest number of cars on Saturday. He has developed the following probability distribution for the number of cars he expects to sell on a particular Saturday.

[Table: probability distribution for the number of cars sold; worked MEAN and VARIANCE computations]

STANDARD DEVIATION: \( \sigma = \sqrt{\sigma^2} = \sqrt{1.290} = 1.136 \)
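The computational steps for the mean, variance, and standard deviation of a discrete distribution can be sketched as follows. The probability table for this example is not present in this copy, so the distribution below is an assumption: it was chosen only because it reproduces the variance of 1.290 and standard deviation of 1.136 shown on the slide.

```python
import math

# ASSUMED distribution (number of cars sold -> probability); the slide's own
# table is missing here. These values reproduce the slide's variance of 1.290.
distribution = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.30, 4: 0.10}

# Mean (expected value): sum of each value times its probability.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared deviation from the mean, weighted by probability, summed.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 3), round(variance, 3), round(std_dev, 3))  # 2.1 1.29 1.136
```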
1-88
Binomial Probability Distribution
• A widely occurring discrete probability distribution
• Characteristics of a Binomial Probability Distribution
1. There are only two possible outcomes on a particular trial of an experiment.
2. The outcomes are mutually exclusive.
3. The random variable is the result of counts.
4. Each trial is independent of any other trial.

EXAMPLE
There are five flights daily from Pittsburgh via US Airways into the Bradford, Pennsylvania, Regional Airport. Suppose the probability that any flight arrives late is .20.

What is the probability that none of the flights are late today?

What is the average number of late flights?

What is the variance of the number of late flights?
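The three questions about late flights can be answered with the standard binomial formulas, which the slide does not print: P(x) = nCx π^x (1 − π)^(n − x), mean nπ, and variance nπ(1 − π). The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 5, 0.20  # five daily flights, each late with probability .20

p_none_late = binomial_pmf(0, n, pi)
mean_late = n * pi
variance_late = n * pi * (1 - pi)

print(round(p_none_late, 4), round(mean_late, 2), round(variance_late, 2))
# 0.3277 1.0 0.8
```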

1-89
Binomial Distribution - Example
EXAMPLE
Five percent of the worm gears produced by an automatic, high-speed Carter-Bell milling machine are defective.

What is the probability that out of six gears selected at random none will be defective? Exactly one? Exactly two? Exactly three? Exactly four? Exactly five? Exactly six out of six?

[Figures: Binomial – Shapes for Varying π (n constant); Binomial – Shapes for Varying n (π constant)]
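For the worm gear example, the whole binomial distribution with n = 6 and π = .05 can be tabulated in one pass. This is a sketch using the standard binomial formula; the list name `probabilities` is illustrative.

```python
from math import comb

n, pi = 6, 0.05  # six gears sampled, 5 percent defective

# probabilities[x] is the chance that exactly x of the six gears are defective.
probabilities = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

for x, p in enumerate(probabilities):
    print(f"P({x} defective) = {p:.4f}")
```

With π this small the distribution is strongly skewed to the right: almost all of the probability sits on zero or one defective gear.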

1-90
Poisson Probability Distribution
The Poisson probability distribution describes the number of times some event occurs during a specified
interval. The interval may be time, distance, area, or volume.

Assumptions of the Poisson Distribution


(1) The probability is proportional to the length of the interval.
(2) The intervals are independent.

Examples include:
• The number of misspelled words per page in a newspaper.
• The number of calls per hour received by Dyson Vacuum Cleaner Company.
• The number of vehicles sold per day at Hyatt Buick GMC in Durham, North Carolina.
• The number of goals scored in a college soccer game.

1-91
Poisson Probability Distribution - Example

EXAMPLE
Assume baggage is rarely lost by Northwest Airlines. Suppose a random sample of 1,000 flights shows a total of 300 bags were lost. Thus, the arithmetic mean number of lost bags per flight is 0.3 (300/1,000). If the number of lost bags per flight follows a Poisson distribution with µ = 0.3, find the probability of not losing any bags.

Use Appendix B.5 to find the probability that no bags will be lost on a particular flight.

What is the probability exactly one bag will be lost on a particular flight?
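Instead of the appendix table, the two probabilities can be computed from the standard Poisson formula P(x) = µ^x e^(−µ) / x!, which the slide does not print. The function name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean count is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

mu = 300 / 1000  # mean number of lost bags per flight

p_no_bags = poisson_pmf(0, mu)
p_one_bag = poisson_pmf(1, mu)

print(round(p_no_bags, 4))  # 0.7408
print(round(p_one_bag, 4))  # 0.2222
```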

1-92
More About the Poisson Probability
Distribution

•The Poisson probability distribution is always positively skewed and the random variable has no
specific upper limit.

•The Poisson distribution for the lost bags illustration, where µ=0.3, is highly skewed.

•As µ becomes larger, the Poisson distribution becomes more symmetrical.

1-93
Normal Probability Distribution

1. It is bell-shaped and has a single peak at the center of the distribution.
2. It is symmetrical about the mean.
3. It is asymptotic: the curve gets closer and closer to the X-axis but never actually touches it.
4. The location of a normal distribution is determined by the mean, µ; the dispersion or spread of the distribution is determined by the standard deviation, σ.
5. The arithmetic mean, median, and mode are equal.
6. The total area under the curve is 1.00; half the area under the normal curve is to the right of this center point and the other half to the left of it.

[Figures: Family of Distributions: Different Means and Standard Deviations; Equal Means and Different Standard Deviations; Different Means and Equal Standard Deviations]

1-94
The Standard Normal Probability Distribution

• The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
• It is also called the z distribution.
• A z-value is the signed distance between a selected value, designated X, and the population mean, µ, divided by the population standard deviation, σ.
• The formula is: \( z = \frac{X - \mu}{\sigma} \)

1-95
The Normal Distribution – Example

The weekly incomes of shift foremen in the glass industry follow the normal probability distribution with a mean of $1,000 and a standard deviation of $100.
What is the z value for the income, let’s call it X, of a foreman who earns $1,100 per week? For a foreman who earns $900 per week?
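The z values for the two incomes follow from z = (X − µ)/σ. A minimal sketch, with an illustrative helper name `z_value`:

```python
def z_value(x, mean, std_dev):
    """Signed distance of x from the mean, measured in standard deviations."""
    return (x - mean) / std_dev

z_1100 = z_value(1100, mean=1000, std_dev=100)
z_900 = z_value(900, mean=1000, std_dev=100)

print(z_1100)  # 1.0
print(z_900)   # -1.0
```

An income of $1,100 is one standard deviation above the mean, and $900 is one standard deviation below it.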

1-96
The Empirical Rule - Example

As part of its quality assurance program, the Autolite Battery Company conducts tests on battery life. For a particular D-cell alkaline battery, the mean life is 19 hours. The useful life of the battery follows a normal distribution with a standard deviation of 1.2 hours.

Answer the following questions.
1. About 68 percent of the batteries failed between what two values?
2. About 95 percent of the batteries failed between what two values?
3. Virtually all of the batteries failed between what two values?
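Under the Empirical Rule, the three answers are the intervals one, two, and three standard deviations either side of the mean. The sketch below uses an illustrative helper, `empirical_interval`.

```python
def empirical_interval(mean, std_dev, k):
    """Interval of k standard deviations either side of the mean."""
    return mean - k * std_dev, mean + k * std_dev

mean_life, std_dev = 19.0, 1.2  # hours

for k, share in [(1, "About 68%"), (2, "About 95%"), (3, "Virtually all")]:
    low, high = empirical_interval(mean_life, std_dev, k)
    print(f"{share}: {low:.1f} to {high:.1f} hours")
```

This gives 17.8 to 20.2 hours, 16.6 to 21.4 hours, and 15.4 to 22.6 hours respectively.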

1-97
Normal Distribution – Finding Probabilities

EXAMPLE
The weekly income of a shift foreman in the glass industry is normally distributed with a mean of $1,000 and a standard deviation of $100.

What is the likelihood of selecting a foreman whose weekly income is between $1,000 and $1,100?

1-98
Normal Distribution – Finding Probabilities
(Example 2)

Refer to the information regarding the weekly income of shift foremen in the glass industry. The distribution of weekly incomes follows the normal probability distribution with a mean of $1,000 and a standard deviation of $100.

What is the probability of selecting a shift foreman in the glass industry whose income is:
Between $790 and $1,000?
Between $840 and $1,200?
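The foreman income probabilities, including the $1,000 to $1,100 interval from the previous example, can be computed without a z table using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; `prob_between` is an illustrative helper, and the results may differ from table lookups in the fourth decimal place because the tables round z to two decimals.

```python
from statistics import NormalDist

income = NormalDist(mu=1000, sigma=100)  # weekly income in dollars

def prob_between(low, high):
    """Area under the normal curve between two income values."""
    return income.cdf(high) - income.cdf(low)

p_1000_1100 = prob_between(1000, 1100)
p_790_1000 = prob_between(790, 1000)
p_840_1200 = prob_between(840, 1200)

print(round(p_1000_1100, 3))  # 0.341
print(round(p_790_1000, 3))   # 0.482
print(round(p_840_1200, 3))   # 0.922
```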

1-99
Using Z in Finding X Given Area - Example

Layton Tire and Rubber Company wishes to set a minimum mileage guarantee on its new MX100 tire. Tests reveal the mean mileage is 67,900 with a standard deviation of 2,050 miles and that the distribution of miles follows the normal probability distribution. Layton wants to set the minimum guaranteed mileage so that no more than 4 percent of the tires will have to be replaced.
What minimum guaranteed mileage should Layton announce?

Solve for x using the formula:

\( z = \frac{x - \mu}{\sigma} = \frac{x - 67{,}900}{2{,}050} \)

The value of z is found using the 4% information. The area between 67,900 and x is 0.4600, found by 0.5000 - 0.0400. Using Appendix B.1, the area closest to 0.4600 is 0.4599, which gives a z value of -1.75. Then substituting into the equation:

\( -1.75 = \frac{x - 67{,}900}{2{,}050} \), then solving for x:

\( -1.75(2{,}050) = x - 67{,}900 \)

\( x = 67{,}900 - 1.75(2{,}050) \)

\( x = 64{,}312 \)
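The same guarantee can be found by inverting the standard normal distribution directly rather than searching Appendix B.1. The sketch below uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later). Because it does not round z to two decimals, the result differs from the table-based 64,312 by about a mile.

```python
from statistics import NormalDist

mean_miles, std_miles = 67_900, 2_050
replace_share = 0.04  # at most 4 percent of tires fall below the guarantee

# z value with 4 percent of the area to its left (about -1.75).
z = NormalDist().inv_cdf(replace_share)

# Rearranging z = (x - mean) / std gives x = mean + z * std.
guarantee = mean_miles + z * std_miles

print(round(z, 2))  # -1.75
print(round(guarantee))
```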

1-100
Normal Approximation to the Binomial

• The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n.

• The normal probability distribution is generally a good approximation to the binomial probability distribution when nπ and n(1 - π) are both greater than 5. This is because as n increases, a binomial distribution gets closer and closer to a normal distribution.

1-101
Normal Approximation to the Binomial - Example

Suppose the management of the Santoni Pizza Restaurant found that 70 percent of its new
customers return for another meal. For a week in which 80 new (first-time) customers
dined at Santoni’s, what is the probability that 60 or more will return for another meal?

P(X ≥ 60) = 0.063 + 0.048 + … + 0.001 = 0.197

1-102


Normal Approximation to the Binomial - Example

Step 1: Find the mean and the variance of the binomial distribution, and find the z value corresponding to an X of 59.5 (X - .5, the continuity correction factor).

Step 2: Determine the area from 59.5 and beyond.
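The two steps can be sketched for the Santoni example with n = 80 and π = .70. The mean nπ and variance nπ(1 − π) are the standard binomial formulas, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, pi = 80, 0.70  # 80 new customers, 70 percent return

# Step 1: mean, variance, and the z value for X = 59.5 (60 minus the .5 correction).
mean = n * pi
variance = n * pi * (1 - pi)
std_dev = math.sqrt(variance)
z = (59.5 - mean) / std_dev

# Step 2: area under the standard normal curve from z and beyond.
p_60_or_more = 1 - NormalDist().cdf(z)

print(round(mean, 1), round(variance, 1), round(z, 2))  # 56.0 16.8 0.85
print(round(p_60_or_more, 2))  # 0.2
```

The approximation lands close to the exact binomial sum of 0.197 quoted for this example.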

1-103