BUSINESS ANALYTICS:
DATA ANALYSIS AND
DECISION MAKING
Chapter 2
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Populations and Samples
A population includes all of the entities of interest
in a study (people, households, machines, etc.)
Examples:
All potential voters in a presidential election
All subscribers to cable television
All invoices submitted for Medicare reimbursement by
nursing homes
A sample is a subset of the population, often
randomly chosen and preferably representative of
the population as a whole.
Examples: Gallup, Harris, other polls today
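As a minimal sketch of drawing a random sample from a population, assume a hypothetical population of 1,000 subscriber ID numbers. The IDs, the sample size of 50, and the seed are illustrative choices, not values from the text.

```python
import random

# Hypothetical population: ID numbers for 1,000 cable subscribers.
population = list(range(1, 1001))

# Fix the seed so the illustration is reproducible.
rng = random.Random(42)

# Draw a simple random sample of 50 members, without replacement.
sample = rng.sample(population, 50)

print(len(sample))         # number of sampled members
print(sorted(sample)[:5])  # a few of the sampled IDs
```

Sampling without replacement guarantees that no member of the population appears twice in the sample.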
Data Sets, Variables, and
Observations
A data set is usually a rectangular array of data,
with variables in columns and observations in rows.
A variable (or field or attribute) is a characteristic
of members of a population, such as height, gender,
or salary.
An observation (or case or record) is a list of all
variable values for a single member of a
population.
Example 2.1:
Questionnaire Data.xlsx
Objective: To illustrate variables and observations in a typical data
set.
Solution: Data set includes observations on 30 people who
responded to a questionnaire on the president’s environmental
policies.
Variables include: age, gender, state, children, salary, opinion.
Include a row that lists variable names.
Include a column that shows an index of the observation.
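The layout described above (a header row of variable names, an index column, one row per observation) can be sketched in Python as follows. The three rows and all of their values are invented for illustration; only the variable names come from the example.

```python
# Header row: an index column followed by the variable names.
header = ["Person", "Age", "Gender", "State", "Children", "Salary", "Opinion"]

# Hypothetical observations (rows); the real file has 30 of them.
rows = [
    [1, 35, "Male", "Minnesota", 1, 65400, 5],
    [2, 61, "Female", "Texas", 2, 62000, 1],
    [3, 35, "Male", "Ohio", 0, 63200, 3],
]

# Each observation becomes a mapping from variable name to value.
observations = [dict(zip(header, row)) for row in rows]

# A variable is a column: the same field read across all observations.
ages = [obs["Age"] for obs in observations]

print(observations[0]["State"])
print(ages)
```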
Types of Data
(slide 1 of 5)
Typical Time Series Data Set
(slide 5 of 5)
Descriptive Measures for
Categorical Variables
There are only a few possibilities for describing a
categorical variable, all based on counting:
Count the number of categories.
Give the categories names.
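A minimal sketch of counting for a categorical variable, using Python's collections.Counter. The data values are invented, and counting the observations in each category is an assumed natural next step that goes slightly beyond the two items listed above.

```python
from collections import Counter

# Hypothetical values of a categorical variable for six observations.
genders = ["F", "M", "F", "F", "M", "F"]

counts = Counter(genders)    # number of observations in each category
n_categories = len(counts)   # number of distinct categories

# Share of all observations that falls in each category.
shares = {cat: n / len(genders) for cat, n in counts.items()}

print(n_categories)
print(counts["F"], counts["M"])
```

Counts like these are what a column chart or pie chart of a categorical variable would display.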
Example 2.2:
Supermarket Transactions.xlsx (slide 1 of 3)
Descriptive Measures for
Numerical Variables
There are many ways to summarize numerical
variables, both with numerical summary measures
and with charts.
To learn how the values of a variable are
distributed, ask:
What are the most “typical” values?
How spread out are the values?
What are the “extreme” values on either end?
Is the chart of the values symmetric about some middle
value, or is it skewed in some direction? Does it have
any other peculiar features besides possible skewness?
Example 2.3:
Baseball Salaries 2011.xlsx (slide 1 of 2)
Measures of Central Tendency
(slide 1 of 3)
Minimum, Maximum,
Percentiles, and Quartiles
For any percentage p, the pth percentile is the value
such that a percentage p of all values are less than it.
The quartiles divide the data into four groups, each
with (approximately) a quarter of all observations.
The first, second and third quartiles are the percentiles
corresponding to p = 25%, p = 50%,
and p = 75%.
By definition, the second quartile (p = 50%) is equal to the
median.
The minimum and maximum values can be calculated
with Excel’s MIN and MAX functions, and the percentiles
and quartiles with Excel’s PERCENTILE and QUARTILE
functions.
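For readers working outside Excel, the same measures can be sketched with Python's standard statistics module. The data are illustrative, and the choice of method="inclusive" is an assumption made here to mimic the way Excel's QUARTILE interpolates between data points.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative data, not from the text

# n=4 splits the data into four groups, so three cut points are returned.
# method="inclusive" treats the smallest and largest values as the
# 0th and 100th percentiles (assumed to match Excel's QUARTILE).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(min(values), max(values))  # 1 9
print(q1, q2, q3)                # 3.0 5.0 7.0

# The second quartile coincides with the median.
print(q2 == statistics.median(values))  # True
```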
Measures of Variability
(slide 1 of 3)
Calculating Variance and
Standard Deviation
Empirical Rules for Interpreting
Standard Deviation (slide 1 of 3)
The interpretation of the standard deviation can be
stated as three empirical rules.
If the values of a variable are approximately normally
distributed (symmetric and bell-shaped), then the
following rules hold:
Approximately 68% of the observations are within one
standard deviation of the mean.
Approximately 95% of the observations are within two
standard deviations of the mean.
Approximately 99.7% of the observations are within three
standard deviations of the mean.
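The three rules can be checked on simulated data. This is only a sketch: the sample is generated, not real, and the mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Simulated, approximately normal data (all parameters are arbitrary).
values = [rng.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

# The printed shares should land near 0.68, 0.95, and 0.997.
for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

For skewed data the same calculation can give noticeably different percentages, which is why the rules are stated only for approximately normal distributions.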
Empirical Rules for Baseball Salaries
(slide 2 of 3)
Empirical Rules for Interpreting
Standard Deviation (slide 3 of 3)
The mean absolute deviation (MAD) is the
average of the absolute deviations from the mean.
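A short sketch comparing MAD with the variance and standard deviation, on illustrative data. It assumes the usual conventions: deviations are measured from the mean, a sample variance divides by n - 1, and a population variance divides by n.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
mean = statistics.mean(values)      # 5

# Mean absolute deviation: the average distance from the mean.
mad = sum(abs(v - mean) for v in values) / len(values)

pop_sd = statistics.pstdev(values)        # population version, divisor n
sample_var = statistics.variance(values)  # sample version, divisor n - 1
sample_sd = statistics.stdev(values)      # square root of sample_var

print(mad)     # 1.5
print(pop_sd)  # 2.0
```

Because deviations are squared before averaging, the standard deviation gives more weight to large deviations than MAD does.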
Measures of Shape
(slide 1 of 2)
Numerical Summary Measures in the
Status Bar and with StatTools
If you select multiple cells, summary measures
appear for the selected cells in the status bar at the
bottom of the Excel window.
You can choose the summary measures that appear by
right-clicking the status bar and selecting your favorites.
Although Excel’s built-in functions can be used to
calculate a number of summary measures, a much
quicker way is to use the StatTools add-in.
Example 2.3 (Continued):
Baseball Salaries 2011.xlsx
Objective: To learn the
fundamentals of StatTools and
use it to generate summary
measures of baseball salaries.
Solution: First, define a
StatTools data set, by selecting
any cell in the data set and
clicking the Data Set Manager
button.
Then generate summary
measures for the Salary
variable, by selecting One-
Variable Summary from the
Summary Statistics dropdown
list and filling in the dialog box
that appears.
Charts for Numerical Variables
There are many graphical ways to indicate the
distribution of a numerical variable.
For cross-sectional variables:
Histograms
Box plots
For time series variables:
Time series graphs
Histograms
Example 2.3 (Continued):
Baseball Salaries 2011.xlsx (slide 1 of 2)
Example 2.4:
Late or Lost Baggage.xlsx (slide 1 of 2)
Objective: To fine-tune a
histogram for a variable with
integer counts.
Solution: Data set lists the number
of bags that were either late or
lost for 456 flights.
In the Histogram dialog box,
request 9 bins and set the
minimum and maximum to -0.5
and 8.5.
StatTools divides the range into 9
equal-length bins.
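The reason for the -0.5 and 8.5 endpoints can be seen in a small sketch: with 9 bins over that range, each bin has length 1 and is centered on one integer count from 0 to 8. The bag counts below are invented, and the rule used for assigning values to bins is an assumption of this sketch, not a description of StatTools internals.

```python
# Hypothetical bag counts per flight; the real file covers 456 flights.
bag_counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 8]

lo, hi, n_bins = -0.5, 8.5, 9
width = (hi - lo) / n_bins  # 1.0, so each bin is centered on an integer

# Bin boundaries: -0.5, 0.5, 1.5, ..., 8.5
edges = [lo + i * width for i in range(n_bins + 1)]

# Count the observations in each bin [edge_i, edge_{i+1}).
freq = [0] * n_bins
for x in bag_counts:
    index = int((x - lo) // width)
    freq[index] += 1

print(edges)
print(freq)  # freq[k] is the number of flights with exactly k bags
```

Since no integer falls on a bin boundary, there is no ambiguity about which bin a count belongs to.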
Box Plots
A box plot (or box-whisker plot) is an alternative
type of chart for showing the distribution of a
variable.
The elements of a generic box plot are shown below:
Example 2.3 (Continued):
Baseball Salaries 2011.xlsx
Objective: To illustrate the features of a box plot,
particularly how it indicates skewness.
Solution: In StatTools, select Box-Whisker Plot from
the Summary Graphs dropdown list and fill in the
dialog box.
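The numbers behind a box plot can be sketched as follows. The salary values are invented, and the 1.5 IQR rule for flagging points beyond the whiskers is a common convention assumed here; the text above does not specify the rule StatTools applies.

```python
import statistics

# Illustrative right-skewed values (hypothetical salaries, in $1000s).
salaries = [400, 420, 450, 500, 600, 800, 1200, 2500, 9000]

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the length of the box

# Assumed convention: points more than 1.5 IQRs beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

# A mean well above the median is one sign of skewness to the right.
mean = statistics.mean(salaries)

print(q1, median, q3)
print(outliers)
print(mean > median)
```

In this sketch all flagged points lie on the high side and the mean exceeds the median, the pattern a box plot of right-skewed data displays.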
Time Series Data
Our main interest in time series variables is how
they change over time, and this information is lost in
traditional summary measures and in histograms or
box plots.
For time series data, a time series graph is used.
This is a graph of the values of one or more time
series, using time on the horizontal axis.
This is always the place to start a time series analysis.
Example 2.5:
Crime in US.xlsx (slide 1 of 3)
Objective: To see how time series graphs help to detect trends in crime
data.
Solution: Data set contains annual data on violent and property crimes for
the years 1960 to 2010.
In StatTools, designate a StatTools data set.
Then select Time Series Graph from the Time Series and Forecasting
dropdown list and fill in the resulting dialog box.
Example 2.5:
Crime in US.xlsx (slide 2 of 3)
Population Totals
Example 2.6:
DJIA Monthly Close.xlsx (slide 1 of 2)
Outliers
An outlier is a value or an entire observation (row)
that lies well outside of the norm.
Some statisticians define an outlier as any value more
than three standard deviations from the mean, but this
is only a rule of thumb.
Even if values are not unusual by themselves, there
still might be unusual combinations of values.
When dealing with outliers, it is best to run the
analyses two ways: with the outliers and without
them.
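A minimal sketch of the three-standard-deviation rule of thumb, followed by the recommended comparison with and without the flagged values. The data are invented: 20 values near 50 plus one value of 120.

```python
import statistics

values = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 50, 49, 51, 120]

mean = statistics.mean(values)
sd = statistics.stdev(values)

# Flag any value more than three standard deviations from the mean.
flagged = [v for v in values if abs(v - mean) > 3 * sd]
kept = [v for v in values if v not in flagged]

# Run the analysis two ways: with the outliers and without them.
print(flagged)
print(round(mean, 2), round(sd, 2))
print(statistics.mean(kept), round(statistics.stdev(kept), 2))
```

Note that an outlier inflates the very standard deviation used to detect it, so in small data sets this rule can fail to flag values that look clearly unusual.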
Missing Values
Most real data sets have gaps in the data.
There are two issues: how to detect these missing
values and what to do about them.
The more important issue is what to do about them:
One option is to simply ignore them. Then you will have to
be aware of how the software deals with missing values.
Another option is to fill in missing values with the average of
nonmissing values, but this isn’t usually a very good option.
A third option is to examine the nonmissing values in the row
of a missing value; these values might provide clues on what
the missing value should be.
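The first two options can be sketched in a few lines. The salary values are invented, and None is used here as an assumed marker for a missing value.

```python
import statistics

# None marks a missing salary; the numbers are invented.
salaries = [52000, None, 61000, 48000, None, 75000]

# Detect the gaps.
observed = [s for s in salaries if s is not None]
n_missing = len(salaries) - len(observed)

# Option 1: ignore missing values when summarizing.
mean_observed = statistics.mean(observed)

# Option 2: replace each missing value with the mean of the observed ones.
imputed = [mean_observed if s is None else s for s in salaries]

print(n_missing, mean_observed)
print(round(statistics.stdev(observed)), round(statistics.stdev(imputed)))
```

The second print hints at one drawback of filling in averages: the mean is unchanged, but the filled-in data set looks less spread out than the observed values really are.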
Excel Tables for Filtering,
Sorting, and Summarizing
Tables are a tool introduced in Excel 2007.
You now have the ability to designate a rectangular
data set as a table and then employ a number of
powerful tools for analyzing tables.
These tools include:
Filtering
Sorting
Summarizing
Example 2.7:
Catalog Marketing.xlsx (slide 1 of 2)
Filtering
Finding records that match particular criteria is called
filtering.
One way to filter is to create an Excel table, which
automatically provides dropdown arrows next to the
field names that allow you to filter.
There are also three ways to filter on any rectangular
data set with variable names:
1. Use the Filter button from the Sort & Filter dropdown list
on the Home ribbon.
2. Use the Filter button from the Sort & Filter group on the
Data ribbon.
3. Right-click any cell in the data set and select Filter. You
get several options, the most popular of which is Filter by
Selected Cell’s Value.
Example 2.7 (Continued):
Catalog Marketing.xlsx (slide 1 of 2)
Objective: To investigate the types of filters that can be
applied to the HyTex data.
Solution: There is almost no limit to the filters you can
apply, but here are a few possibilities:
Filter on one or more values in a field.
Filter on more than one field.
Filter on a continuous numerical field.
Top 10 and Above/Below Average filters.
Filter on a text field.
Filter on a date field.
Filter on color or icon.
Use a custom filter.
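Outside Excel, a few of these filter types can be sketched with list comprehensions. The records, field names, and values below are hypothetical stand-ins, not the actual HyTex data.

```python
# Hypothetical customer records; field names and values are invented.
customers = [
    {"Person": 1, "Region": "South", "Children": 0, "AmountSpent": 218},
    {"Person": 2, "Region": "West", "Children": 2, "AmountSpent": 2632},
    {"Person": 3, "Region": "South", "Children": 1, "AmountSpent": 3048},
    {"Person": 4, "Region": "East", "Children": 3, "AmountSpent": 435},
]

# Filter on one value in a field.
south = [c for c in customers if c["Region"] == "South"]

# Filter on more than one field, including a numerical condition.
south_big = [c for c in south if c["AmountSpent"] > 1000]

# Above-average filter on a numerical field.
avg = sum(c["AmountSpent"] for c in customers) / len(customers)
above_avg = [c["Person"] for c in customers if c["AmountSpent"] > avg]

print([c["Person"] for c in south])
print([c["Person"] for c in south_big])
print(above_avg)
```

As in Excel, each filter leaves the underlying records untouched and only selects the rows that meet the criteria.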
Example 2.7 (Continued):
Catalog Marketing.xlsx (slide 2 of 2)
Results from a Typical Filter