
Quantitative Methods in Management
Term II
4 credits
MGT 408
Business Statistics: A First Course
David M. Levine
Kathryn A. Szabat
David F. Stephan
P.K. Viswanathan
Pearson, 7e
Additional Readings
• Statistics for Business and Economics – Anderson, Sweeney, Williams

• Business Statistics – Ken Black

• Business Statistics – J.K. Sharma


Assessment
• Assignment (one): 40 marks

• End-term objective questions (concepts and simple problems): 60 marks


Subject Outline
• Introduction ch-1
• Data collection, classification and presentation ch-2
• Measures of central tendency and dispersion ch-3
• Correlation and Regression analysis ch-12
• Probability concepts ch-4
• Probability distributions – Binomial and Poisson ch-5
• Probability distribution – Normal ch-6
• Sampling techniques ch-7
• Estimation and inferential statistics ch-8
• Testing of Hypothesis – Non-Parametric (Chi-square) ch-9, 11
• Bayesian Analysis and decision theory ch-15
Quantitative Methods in Management
• Introduction
• Definition
• Importance and limitations
• Applications
• Terminologies
• Scale of measurement
• Type of variables
• Qualitative, quantitative
• Time series and cross sectional
• Types of statistics
• Sources of data
• Classification of data
• Statistical software
INTRODUCTION
DAY 1
PAGES : 1-32
Introduction….
• Of the 18,000 food makers, the largest 20 now account
for nearly 54% of checkout sales.
• The Consumer Price Index (CPI) declined 0.3% in April.
• The average compensation package for CEOs across 50
large corporations was Rs. 1 million.
• E-commerce sites spend an average of Rs. 5000 to
acquire each customer.
• Stocks account for 75% of the average investor’s
portfolio.
• The Hindu reaches 46% of the region’s households
during weekdays and 61% on Sundays
• CFOs were asked which initiative they would
put on hold in an uncertain economy:
• 32% Expansion
• 23% M&A
• 10% New Product Launch
• 18% Technology upgrade
• 9% None
• 8% Any other
Are These Numbers Useful In Making Decisions?
• A survey of 1,179 adults 18 and over reported that 54% thought that 15 seconds was an acceptable online ad length before seeing free content.

• A survey reported women were more likely than men to cite seeing photos or videos, sharing with many people at once, seeing entertaining or funny posts, learning about ways to help others, and receiving support from people in your network as reasons to use Facebook.

• A study found the number of times a specific product was mentioned in comments in the Twitter social messaging service could be used to make accurate predictions of sales trends for that product.
Without Statistics You Can’t

• Determine if the numbers in these studies are useful information

• Validate claims of predictability or causality

• See patterns that large amounts of data sometimes reveal


In Today’s Business World we Cannot
Escape From Data

• In today’s digital world ever increasing amounts of data are gathered, stored, reported on, and available for further study.

• Data are facts about the world and are constantly reported as numbers by an ever increasing number of sources.

• Information based decision making using statistical analysis is absolutely essential in the present environment characterized by intense competition, onslaught of new products and services, globalization, and revolution of information technology.
Each Business Person Faces A Choice Of How To Deal
With This Explosion Of Data

• They can ignore it and hope for the best.

• They can count on other people’s summaries of data and hope they
are correct.

• They can develop their own capability and insight into data by
learning about statistics and its application to business.
Statistics Is Evolving So Businesses Can Use The
Vast Amount Of Data Available

The emerging field of Business Analytics makes “extensive use of:
• Data
• Statistical and quantitative analysis
• Explanatory & predictive models
• Fact based management
to drive decisions and actions.”
DATA
Data: facts about the world (a value associated with something or, collectively, a list of values associated with something).
[Figure: pyramid with DATA at the base, then Information, then Knowledge, leading up to Decision making]
What is Statistics?
“Statistics is a way to get information from data”
Data → Statistics → Information
The word “statistics” is derived from the Latin word ‘status’, meaning a state.
Statistics is a tool for creating new understanding from a set of numbers.
Statistics – A way of thinking
Methods that allow you to work with data effectively
Methods that help you make better decisions
DEFINITION
STATISTICS
COLLECTION
COMPILATION
CLASSIFICATION
PRESENTATION
ANALYSIS &
INTERPRETATION OF DATA
Statistics
• Art and Science of Collecting and Understanding DATA:
• DATA = Recorded Information
• e.g., Sales, Productivity, Quality, Costs, Return, …
• Why? Because you want:
• Best use of imperfect information:
• e.g., 50,000 customers, 1,600 workers, 386,000 transactions,…
• Good decisions in uncertain conditions:
• e.g., new product launch: Fail? OK? Make you rich?
• Competitive Edge
• e.g., for you and your business!
To Properly Apply Statistics Follow A Framework To Minimize
Possible Errors
DCOVA

• Define the data you want to study in order to solve a problem or meet an objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data collected to reach conclusions and
present results
Using The DCOVA Framework Helps To Apply
Statistics To:

• Summarize & visualize business data

• Reach conclusions from those data

• Make reliable predictions about business activities

• Improve business processes


Changing face of statistics
• Business analytics
• Use statistical methods to analyze and explore data to uncover unforeseen relationships.
• Use management science methods to develop optimization models that impact an organization’s strategy, planning, and operations

• Big data
• Collections of data that cannot be easily browsed or analyzed using traditional
methods.
• Use information systems’ methods to collect and process data sets of all sizes,
including very large data sets that would otherwise be hard to examine efficiently

• Integral role of software in statistics


The Growth Of “Big Data” Spurs The Use Of
Business Analytics
• “Big Data” is still a fuzzy concept.

• Very large data sets are arising because of the automatic collection of high volumes of data at very fast rates.

• Attributes that distinguish “Big Data” from well structured large data sets are “volume” of data, “velocity” of the data collection, and “variety” of the data.
Business Analytics Has Already Been Applied In Many
Business Decision-Making Contexts

• Human resource (HR) managers understanding relationships between HR drivers, key business outcomes, employee skills, capabilities, and motivation.

• Financial analysts determining why certain trends occur to predict future financial environments.

• Marketers driving loyalty programs and customer marketing decisions to drive sales.

• Supply chain managers planning and forecasting based on product distribution and optimizing sales distribution based on key inventory measures.
Statistics: An Important Part of Your Business
Education
• You need analytical skills for the increasingly data-driven environment of business.

• Studies show an increase in productivity, innovation, and competitiveness for organizations that embrace business analytics.

• To quote Hal Varian, the chief economist at Google Inc., “the sexy job in the next 10 years will be statisticians. And I’m not kidding.”
Activities of Statistics
1. Designing the study:
• First step
• Plan for data-gathering
• Random sample (control bias and error)

2. Exploring the data:
• First step (once you have data)
• Look at, describe, summarize the data
• Are you on the right track?
Activities of Statistics (continued)
3. Modeling the data
• A framework of assumptions and equations
• Parameters represent important aspects of the data
• Helps with estimation and hypothesis testing
4. Estimating an unknown:
• Best “guess” based on data
• Wrong - but by how much?
• Confidence interval - “we’re 95% sure that the unknown is between …”
Activities of Statistics (continued)
5. Hypothesis testing:
• Data decide between two possibilities
• Does “it” really work? [or is “it” just randomly better?]
• Is financial statement correct? [or is error material?]
• Whiter, brighter wash?
• Is the difference statistically significant?
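As a rough sketch of activity 4 (estimating an unknown), the following computes a 95% confidence interval for a mean. The delivery-time values are hypothetical, and the interval uses the normal approximation (z of about 1.96) for simplicity; with a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical sample: delivery times in minutes (not from the course material).
sample = [32, 29, 35, 31, 30, 34, 33, 28, 36, 32]

n = len(sample)
x_bar = statistics.mean(sample)   # best "guess" for the unknown mean
s = statistics.stdev(sample)      # sample standard deviation (divisor n - 1)

# Normal-approximation multiplier for 95% confidence (an assumption of this sketch).
z = statistics.NormalDist().inv_cdf(0.975)

margin = z * s / math.sqrt(n)     # "wrong, but by how much?"
lower, upper = x_bar - margin, x_bar + margin

print(f"Estimate: {x_bar:.1f} minutes")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f} minutes")
```

The estimate alone says nothing about its reliability; the interval reports the same estimate together with its margin of error.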
Why a Manager Needs to Know about
Statistics
• To know how to properly present information
• To know how to draw conclusions about populations
based on sample information
• To know how to improve processes
• To know how to obtain reliable forecasts
Why Learn Statistics?
to make better sense of the ubiquitous use of numbers:
• Business memos
• Business research
• Technical reports
• Technical journals
• Newspaper articles
• Magazine articles
Statistical View of the World
• Data are imperfect
• We do the best we can -- Statistics helps!
• Events are random
• Can’t be right 100% of the time
• Use statistical methods
• Along with common sense and good judgment
• Be skeptical!
• Statistics can be used to support contradictory conclusions
• Look at who funded the study?
Applications in
Business and Economics
• Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
• Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.
• Marketing
Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
• Production
A variety of statistical quality control charts are used
to monitor the output of a production process.
Statistics in Business: Examples
• Advertising
• Effective? Which commercial? Which markets?
• Quality control
• Defect rate? Cost? Are improvements working?
• Finance
• Risk - How high? How to control? At what cost?
• Accounting
• Audit to check financial statements. Is error material?
• Other
• Economic forecasting, background info, measuring and controlling productivity
(human and machine), …
IMPORTANCE OF STATISTICS
• It simplifies complexity
• It measures periodic changes
• Facts are properly presented
• Formulation of policies
• Enlarge human experience and knowledge
• Helps in comparison
• Forecasting
• Testing a hypothesis

LIMITATIONS OF STATISTICS
• Only quantitative data
• Does not study individual events
• Results are true only on averages
• Does not give importance to all items
• Can be misused
• Single purpose only
Defining and collecting data
Chapter 1
TERMS AND
TERMINOLOGIES
Define - the variables that you want to study to solve a problem or meet an
objective
Collect - the data for those variables from appropriate sources.
* Data are the facts and figures collected, analyzed, and summarized for
presentation and interpretation.
• Data Set:
• Measurements of items
• e.g., Yearly sales volume for your 23 salespeople
• e.g., Cost and number produced, daily, for the past month
• All the data collected in a particular study are referred to as the data set for the study
• Elementary Units:
• The items being measured
• e.g., Salespeople, Days, Companies, Catalogs, …
• Elements are the entities on which data are collected
• A Variable:
• The type of measurement being done
• e.g., Sales volume, Cost, Productivity, Number of defects, …
• A variable is a characteristic of interest for the elements.
• The set of measurements obtained for a particular element is called an observation
• A data set with n elements contains n observations

* The total number of data values in a complete data set is the number of
elements multiplied by the number of variables.
Data, Data Sets,
Elements, Variables, and Observations
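Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                73.10              0.86
EnergySouth    N                 74.00              1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                 17.60              0.13

Element names: the entries in the Company column
Variables: Stock Exchange, Annual Sales, Earn/Share
Observation: one row of values (for example, the Dataram row)
Data set: the whole table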
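To make the vocabulary concrete, here is a small sketch that stores the company table as a Python dictionary and counts its elements, variables, and data values. The key names (`exchange`, `annual_sales_musd`, `eps_usd`) are illustrative choices, not part of the course material.

```python
# The data set from the slide: one entry per element (company).
data_set = {
    "Dataram":      {"exchange": "NQ", "annual_sales_musd": 73.10,  "eps_usd": 0.86},
    "EnergySouth":  {"exchange": "N",  "annual_sales_musd": 74.00,  "eps_usd": 1.67},
    "Keystone":     {"exchange": "N",  "annual_sales_musd": 365.70, "eps_usd": 0.86},
    "LandCare":     {"exchange": "NQ", "annual_sales_musd": 111.40, "eps_usd": 0.33},
    "Psychemedics": {"exchange": "N",  "annual_sales_musd": 17.60,  "eps_usd": 0.13},
}

elements = list(data_set)                          # the entities data are collected on
variables = list(next(iter(data_set.values())))    # the characteristics of interest
observation = data_set["Keystone"]                 # all measurements for one element

n_elements = len(elements)
n_variables = len(variables)
total_values = n_elements * n_variables            # elements multiplied by variables

print(n_elements, "elements,", n_variables, "variables,", total_values, "data values")
```

A data set with n elements contains n observations, so here there are 5 observations of 3 values each.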
How Many Variables?
• Univariate data set: One variable measured for each
elementary unit
• e.g., Sales for the top 30 computer companies.
• Can do: Typical summary, diversity, special features
• Bivariate data set: Two variables
• e.g., Sales and # Employees for top 30 computer firms
• Can also do: relationship, prediction
• Multivariate data set: Three or more variables
• e.g., Sales, # Employees, Inventories, Profits, …
• Can also do: predict one from all other variables
Types of Variables
• Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”

• Numerical (quantitative) variables have values that represent quantities.

• Time series or cross sectional data


LEVELS OF
MEASUREMENTS
Scales of Measurement
Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Levels of Data Measurement

• Nominal: lowest level of measurement
• Ordinal
• Interval
• Ratio: highest level of measurement
Nominal Level Data
• Numbers are used to classify or categorize
• Data are labels or names used to identify an attribute of the element.
• A nonnumeric label or numeric code may be used.

• Students of a university are classified by the school in which they are enrolled using a nonnumeric
label such as Business, Humanities, Education, and so on.

• Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2
denotes Humanities, 3 denotes Education, and so on).

Example: Employment Classification


• 1 for Educator
• 2 for Construction Worker
• 3 for Manufacturing Worker
Example: Ethnicity
• 1 for African-American
• 2 for Anglo-American
• 3 for Hispanic-American
Ordinal Level Data
• Numbers are used to indicate rank or order
• Relative magnitude of numbers is meaningful
• Differences between numbers are not comparable
• The data have the properties of nominal data and the order or rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.

Example: Ranking productivity of employees


Example: Taste test ranking of three brands of soft drink
Example: Position within an organization
• 1 for President
• 2 for Vice President
• 3 for Plant Manager
• 4 for Department Supervisor
• 5 for Employee

• Students of a university are classified by their class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.

• Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Example of Ordinal Measurement
[Figure: six numbered entrants (1 to 6) at a finish line; the order of finish is an ordinal measurement]
Ordinal Data

Faculty and staff should receive preferential treatment for parking space.

1 = Strongly Agree, 2 = Agree, 3 = Neutral, 4 = Disagree, 5 = Strongly Disagree
Numbers or Categories?
• Quantitative Variable: Meaningful numbers
• e.g., Sales, # Employees
• Can add, rank, count
• Qualitative Variable: Categories
• Ordinal Variable: Categories with meaningful ordering
• e.g., Bond rating (AA, A, B, …), Diamonds (VSI, SI, …)
• Can rank, count
• Nominal Variable: categories without meaningful ordering
• e.g., State, Type of business, Field of study
• Can count
Interval Level Data
• Distances between consecutive integers are equal
• The data have the properties of ordinal data, and the interval between observations is
expressed in terms of a fixed unit of measure.
• Interval data are always numeric.

• Relative magnitude of numbers is meaningful


• Differences between numbers are comparable
• Location of origin, zero, is arbitrary
• Vertical intercept of unit of measure transform function is not zero
• Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points
more than Kevin.

Example: Fahrenheit Temperature


Example: Calendar Time
Example: Monetary Utility
Ratio Level Data
• Highest level of measurement
• Relative magnitude of numbers is meaningful
• Differences between numbers are comparable
• Location of origin, zero, is absolute (natural)
• Vertical intercept of unit of measure transform function is zero

Examples: Height, Weight, and Volume


Example: Monetary Variables, such as Profit and Loss, Revenues,
and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover,
and Quick Ratio.
• The data have all the properties of interval data and the ratio of
two values is meaningful.
• Variables such as distance, height, weight, and time use the
ratio scale.
• This scale must contain a zero value that indicates that nothing
exists for the variable at the zero point.
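The difference between interval and ratio data can be checked numerically. The sketch below uses hypothetical temperatures and weights: changing the unit of an interval scale (Fahrenheit to Celsius) changes the ratio of two values, while changing the unit of a ratio scale (kilograms to pounds) does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius (the zero point shifts)."""
    return (f - 32) * 5 / 9

KG_TO_LB = 2.20462  # approximate conversion factor

# Interval scale: two hypothetical temperatures.
warm_f, cool_f = 80.0, 40.0
warm_c, cool_c = fahrenheit_to_celsius(warm_f), fahrenheit_to_celsius(cool_f)

ratio_f = warm_f / cool_f   # looks like "twice as hot" in Fahrenheit
ratio_c = warm_c / cool_c   # a different ratio in Celsius

# Ratio scale: two hypothetical weights.
heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print("Temperature ratio:", ratio_f, "in Fahrenheit,", round(ratio_c, 2), "in Celsius")
print("Weight ratio:", ratio_kg, "in kg,", round(ratio_lb, 2), "in lb")
```

Because the temperature ratio depends on the unit chosen, "twice as hot" has no meaning on an interval scale, whereas "twice as heavy" holds in any unit.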
Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.
Categorical Data
• Labels or names used to identify an attribute of each element

• Often referred to as qualitative data

• Use either the nominal or ordinal scale of measurement

• Can be either numeric or nonnumeric

• Appropriate statistical analyses are rather limited
Quantitative Data

Quantitative data indicate how many or how much:

• discrete, if measuring how many

• continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.
Scales of Measurement

Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio
Types of Data
Data
  Categorical (defined categories)
    Examples: Marital Status, Political Party, Eye Color
  Numerical
    Discrete (counted items)
      Examples: Number of Children, Defects per hour
    Continuous (measured characteristics)
      Examples: Weight, Voltage
Example

Firm    Sales     Industry Group        S&P Rating
IBM     66,346    Office Equipment      A
Exxon   59,023    Fuel                  A-
GE      40,482    Conglomerates         A+
AT&T    34,357    Telecommunications    A-

Example (continued)
The table above is cross-sectional, multivariate data (3 variables):
• Firm: elementary units
• Sales: quantitative variable
• Industry Group: nominal qualitative variable
• S&P Rating: ordinal qualitative variable
Usage Potential of Various
Levels of Data
Ratio
Interval
Ordinal

Nominal
Data Level, Operations,
and Statistical Methods
Data Level   Meaningful Operations                                                        Statistical Methods
Nominal      Classifying and Counting                                                     Nonparametric
Ordinal      All of the above plus Ranking                                                Nonparametric
Interval     All of the above plus Addition, Subtraction, Multiplication, and Division    Parametric
Ratio        All of the above                                                             Parametric


Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example: data detailing the number of building permits issued in February 2010 in each of the counties of Ohio
Time Series Data

Time series data are collected over several time periods.

Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months
Time-Series or Cross-Sectional?
• Time-Series Data: Data values recorded in meaningful sequence
• Elementary units might be days or quarters or years
• e.g., Daily Dow-Jones stock market average close for the past 90 days
• e.g., Your firm’s quarterly sales over the past 5 years
• Cross-Sectional Data: No meaningful sequence
• e.g., Sales of 30 companies
• e.g., Productivity of each sales division
• Easier than time series!
Example
Year Unemployment Rate
2003 5.7%
2004 5.4%
2005 4.9%
2006 4.4%
2007 5.0%
2008 7.3%
2009 9.9%
2010 9.4%
This is time series data: the elementary unit is defined by “Year”, and Unemployment Rate is quantitative data.
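Because time series values come in a meaningful sequence, it makes sense to compare each period with the one before it. As a small sketch using the unemployment table above, the following computes the year-over-year change in percentage points; the variable names are illustrative.

```python
# Unemployment rate (%) by year, taken from the table above.
unemployment = {
    2003: 5.7, 2004: 5.4, 2005: 4.9, 2006: 4.4,
    2007: 5.0, 2008: 7.3, 2009: 9.9, 2010: 9.4,
}

years = sorted(unemployment)  # the order of the elementary units matters

# Change from the previous year, in percentage points (rounded to one decimal).
changes = {
    year: round(unemployment[year] - unemployment[prev], 1)
    for prev, year in zip(years, years[1:])
}

largest_rise_year = max(changes, key=changes.get)

for year, change in changes.items():
    print(year, f"{change:+.1f}")
print("Largest rise:", largest_rise_year)
```

The same calculation would be meaningless for cross-sectional data such as the sales of 30 companies, where the rows have no natural order.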
Stock Market – Time Series
• Dow Jones Stock Index, monthly since 1928

[Figure: Dow Jones Industrial Stock Market Index, monthly from 1928 to early 2011; horizontal axis: Year (1920 to 2010); vertical axis: index value (0 to 16,000)]
Basic Vocabulary of Statistics

POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic of
a sample.
Population vs. Sample
Population: measures used to describe the population are called parameters.
Sample: measures computed from sample data are called statistics.

Population - the set of all elements of interest in a particular study
Sample - a subset of the population
Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census - collecting data for the entire population; a complete enumeration of every item in a population
Sample survey - collecting data for a sample

Populations have parameters (descriptive measures of the population). Samples have statistics (descriptive measures of the sample).

Symbols for
Population Parameters
$\mu$ denotes population mean
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
Symbols for
Sample Statistics

$\bar{x}$ denotes sample mean
$S^2$ denotes sample variance
$S$ denotes sample standard deviation
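As an illustration of these symbols, the sketch below computes the measures with Python's standard `statistics` module, using the five Annual Sales values ($M) from the company table earlier. It assumes the usual conventions: the sample variance divides by n - 1, while the population variance divides by n.

```python
import statistics

# Annual sales ($M) of the five companies in the earlier data set.
annual_sales = [73.10, 74.00, 365.70, 111.40, 17.60]
n = len(annual_sales)

# Treating the five values as a sample.
x_bar = statistics.mean(annual_sales)      # sample mean
s2 = statistics.variance(annual_sales)     # sample variance (divisor n - 1)
s = statistics.stdev(annual_sales)         # sample standard deviation

# Treating the same five values as the whole population.
sigma2 = statistics.pvariance(annual_sales)  # population variance (divisor n)
sigma = statistics.pstdev(annual_sales)      # population standard deviation

print(f"mean = {x_bar:.2f}")
print(f"sample variance = {s2:.2f}, sample std dev = {s:.2f}")
print(f"population variance = {sigma2:.2f}, population std dev = {sigma:.2f}")
```

Which pair to report depends on whether the data are regarded as a sample or as the entire population of interest.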
Types of statistics
Descriptive
inferential
Types of Statistics
• Statistics
• The branch of mathematics that transforms data into useful
information for decision makers.

• Descriptive Statistics: collecting, summarizing, and describing data

• Inferential Statistics: drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics

• Collect data
• e.g., Survey

• Present data
• e.g., Tables and graphs

• Characterize data
• e.g., Sample mean: $\bar{X} = \dfrac{\sum X_i}{n}$
Inferential Statistics
• Estimation
• e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
• e.g., Test the claim that the
population mean weight is 120
pounds
Drawing conclusions about a large group of individuals based on a subset of the
large group.
Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.

Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
Probability
• “Inverse” of statistics
[Diagram: probability runs from “the world” to “what you see”; statistics runs in the opposite direction]

• Statistics: generalizes from data to the world
• Probability: “What if …” Assuming you know how the world works, what data
are you likely to see?
• Examples of probability:
• Flip coin, stock market, future sales, IRS audit, …
• Foundation for statistical inference
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or
decision about a population based on a sample.

[Diagram: a sample is drawn from the population; the sample’s statistic is used to make an inference about the population’s parameter]

What can we infer about a Population’s Parameters based on a Sample’s Statistics?
Process of Inferential Statistics
1. Select a random sample from the population.
2. Calculate the sample statistic $\bar{x}$.
3. Use $\bar{x}$ to estimate the population parameter $\mu$.
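The process can be sketched as a small simulation. The population below is hypothetical (10,000 simulated weights centred near 120 pounds), and in practice the population mean would be unknown; it is computed here only to show how close the sample estimate comes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 weights in pounds.
population = [rng.gauss(120, 15) for _ in range(10_000)]
mu = statistics.mean(population)        # parameter (normally unknown)

# Select a random sample, then calculate the statistic.
sample_size = 50
sample = rng.sample(population, sample_size)
x_bar = statistics.mean(sample)         # statistic used to estimate mu

estimation_error = x_bar - mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
print(f"estimation error:            {estimation_error:+.2f}")
```

Rerunning with a different seed gives a different sample and a different estimate, which is why an estimate is usually reported together with a measure of its uncertainty.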
Sources of data collection
Collecting Data Correctly Is A Critical Task
• Need to avoid data flawed by biases, ambiguities, or other types of errors.

• Results from flawed data will be suspect or in error.

• Even the most sophisticated statistical methods are not very useful when the data are flawed.
Developing Operational Definitions Is Crucial To
Avoid Confusion / Errors
• An operational definition is a clear and precise statement that provides a common understanding of meaning.

• In the absence of an operational definition, miscommunications and errors are likely to occur.

• Arriving at operational definition(s) is a key part of the Define step of DCOVA.
Why Collect Data?
• A marketing research analyst needs to assess the effectiveness of a new television advertisement.

• A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.

• An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured is conforming to company standards.

• An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.
Sources of Data
• Primary Sources: The data collector is the one using the data for analysis
  • Data from a political survey
  • Data collected from an experiment
  • Observed data
  • Production data from your factory
  • Your firm’s marketing studies

• Secondary Sources: The person performing data analysis is not the data collector
  • Analyzing census data
  • Examining data from print journals or data published on the internet
  • Government data: economics and demographics
  • Media reports – TV, newspapers, Internet
  • Companies that specialize in gathering data
Sources of data fall into five categories
• Data distributed by an organization or an individual

• The outcomes of a designed experiment

• The responses from a survey

• The results of conducting an observational study

• Data collected by ongoing business activities


Examples Of Data Distributed By
Organizations or Individuals
• Financial data on a company provided by investment services.

• Industry or market data from market research firms and trade associations.

• Stock prices, weather conditions, and sports statistics in daily newspapers.
Examples of Data From A Designed
Experiment
• Consumer testing of different versions of a product to help determine which product should be pursued further.

• Material testing to determine which supplier’s material should be used in a product.

• Market testing on alternative product promotions to determine which promotion to use more broadly.
Data Sources
• Statistical Studies - Experimental

In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Examples of Survey Data
• A survey asking people which laundry detergent has the best stain-removing abilities.

• Political polls of registered voters during political campaigns.

• People being surveyed to determine their satisfaction with a recent product or service experience.
Examples of Data Collected From
Observational Studies
• Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.

• Measuring the time it takes for customers to be served in a fast food establishment.

• Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.
Data Sources
• Statistical Studies - Observational

In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Examples of Data Collected From Ongoing
Business Activities
• A bank studies years of financial transactions to help identify patterns of fraud.

• Economists utilize data on searches done via Google to help forecast future economic conditions.

• Marketing companies use tracking data to evaluate the effectiveness of a web site.
Structured Data Follows An Organizing Principle &
Unstructured Data Does Not
• A Stock Ticker Provides Structured Data:
• The stock ticker repeatedly reports a company name, the number of shares last traded, the bid price, and the percent change in the stock price.
• Due to their inherent structure, data from tables and forms are structured data.
• E-mails from five people concerning stock trades are an example of unstructured data.
• In these e-mails you cannot count on the information being shared in a specific order or format.
• This book deals exclusively with structured data
All Of The Methods In our study Deal With
Structured Data
• To use the techniques in this book on unstructured data you need to convert the unstructured data into structured data.

• For many of the questions you might want to answer, the starting point can / will be tabular data.
Data Can Be Formatted and / or Encoded In
More Than One Way
• Some electronic formats are more readily usable than others.

• Different encodings can impact the precision of numerical variables and can also impact data compatibility.

• As you identify and choose sources of data you need to consider / deal with these issues.
Data Cleaning Is Often A Necessary Activity
When Collecting Data
• You will often find “irregularities” in the data
• Typographical or data entry errors
• Values that are impossible or undefined
• Missing values
• Outliers
• When found, these irregularities should be reviewed and addressed
• Both Excel and Minitab can be used to address irregularities
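The slide names Excel and Minitab; as an alternative illustration, the same checks can be sketched in a few lines of Python. The call durations are hypothetical, and the outlier rule used here (values more than 1.5 interquartile ranges beyond the quartiles) is one common convention chosen for this sketch, not something the course material prescribes.

```python
import statistics

# Hypothetical call durations in minutes; None marks a missing value.
durations = [4.5, 5.0, None, 6.2, -1.0, 5.5, 4.8, 48.0, 5.1]

# Missing values: record where they occur so they can be reviewed.
missing_positions = [i for i, v in enumerate(durations) if v is None]

# Impossible values: a duration cannot be negative.
impossible = [v for v in durations if v is not None and v < 0]

# Values that remain after setting aside missing and impossible entries.
valid = [v for v in durations if v is not None and v >= 0]

# Outliers by the 1.5 x IQR rule (an assumed convention).
q1, _median, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in valid if v < low_fence or v > high_fence]

print("Missing at positions:", missing_positions)
print("Impossible values:", impossible)
print("Outliers:", outliers)
```

Flagged values should be reviewed rather than deleted automatically: an outlier may be a data entry error or a genuine but unusual observation.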
After Collection It Is Often Helpful To Recode
Some Variables
• Recoding a variable can either supplement or replace the
original variable.
• Recoding a categorical variable involves redefining categories.
• Recoding a quantitative variable involves changing this
variable into a categorical variable.
• When recoding be sure that the new categories are mutually
exclusive (categories do not overlap) and collectively
exhaustive (categories cover all possible values).
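As a sketch of recoding a quantitative variable into a categorical one, the function below sorts annual sales ($M) into three size classes. The cut points (50 and 200) and the class names are hypothetical. The bins are half-open intervals, so no value falls into two categories (mutually exclusive) and every non-negative value falls into one (collectively exhaustive).

```python
def recode_sales(sales_musd):
    """Recode annual sales ($M) into a size category.

    Bins: [0, 50) -> "small", [50, 200) -> "medium", [200, inf) -> "large".
    The cut points are illustrative assumptions.
    """
    if sales_musd < 0:
        raise ValueError("sales cannot be negative")
    if sales_musd < 50:
        return "small"
    if sales_musd < 200:
        return "medium"
    return "large"

# Annual sales ($M) from the company data set shown earlier.
annual_sales = {
    "Dataram": 73.10,
    "EnergySouth": 74.00,
    "Keystone": 365.70,
    "LandCare": 111.40,
    "Psychemedics": 17.60,
}

# The recoded variable supplements the original rather than replacing it.
size_class = {firm: recode_sales(sales) for firm, sales in annual_sales.items()}

for firm, label in size_class.items():
    print(firm, annual_sales[firm], label)
```

Boundary values deserve a deliberate choice: here a firm with sales of exactly 50 is "medium", not "small".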
Data Acquisition Considerations

Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it
is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Examples of Types of Variables
Question                                                        Responses         Variable Type
Do you have a Facebook profile?                                 Yes or No         Categorical (Qualitative)
How many text messages have you sent in the past three days?    ---------------   Numerical (discrete)
How long did the mobile app update take to download?            ---------------   Numerical (continuous)
• For each of the following variables, determine whether the variable is
categorical or numerical. If the variable is numerical, determine
whether the variable is discrete or continuous.
• Number of cellphones in the household.
• Monthly data usage ( in MB)
• Number of text messages exchanged per month
• Voice usage per month ( in minutes)
• Whether the cellphone is used for email.
• Name of the internet service provider
• Time, in hours, spent surfing the internet per week
• Whether the individual uses a mobile phone to connect to the internet
• Number of online purchases made in a month
• Organizing the data … Editing/ Coding/
Data Mining
• Search for patterns in large data sets
• Business data: marketing, finance, production ...
• Collected for some purpose, often useful for others
• From government or private companies
• Makes use of
• Statistics – all the basic activities, and
• Prediction, classification, clustering
• Computer science – efficient algorithms (instructions) for
• Collecting, maintaining, organizing, analyzing data
• Optimization – calculations to achieve a goal
• Maximize or minimize (e.g. sales or costs)
Computers and Statistical Analysis

• Statisticians often use computer software to perform the statistical computations required with large amounts of data.
• To facilitate computer usage, many of the data sets in this book are available on the website that accompanies the text.
• The data files may be downloaded in either Minitab or Excel formats.
• Also, the Excel add-in StatTools can be downloaded from the website.
• Chapter-ending appendices cover the step-by-step procedures for using Minitab, Excel, and StatTools.
Statistical software
• MS Excel
• Minitab
• SAS
• SPSS
• StatTools

(Chapter 1, pages 1-32)
