Basic 2

Statistics for Decision Making
Dr S G Deshmukh
Mechanical Engineering Department Indian Institute of Technology Delhi

sgdeshmukh@indiatimes.com
What is a Decision?
Decision
A reasoned choice among alternatives
Examples:
Where to advertise a new product
What stock to buy
What movie to see
Where to go for dinner
Where to locate a new plant
Which mode of transportation to choose
Decision Elements
Decision Statement
What are we trying to decide?
Alternative:
What are the options?
Decision Criteria:
How are we going to judge the merits of each alternative?
Decision making process

Intelligence
Sensing, finding, identifying, and defining problem/opportunity
Design
Diagnosing the problem/opportunity Generating alternatives
Choice
Choosing the best alternative
Types of Decisions
Type of structure - Nature of task
Structured Unstructured
Level of decision making - Scope

Strategic Managerial Operational
Nature of Decision
Structured Problems
  Routine and repetitive, with a standard solution
  Well-defined decision making procedure
  Given a well-defined set of inputs, a well-defined set of outputs is defined
Semi-structured Problems
  Have some structured aspects
  Some of the inputs, outputs, or procedures are not well defined
Unstructured Problems
  All phases of the decision making process are unstructured
  Inputs, outputs, and procedures are not well defined
Scope of Decision
Operational Planning and Control:
  Focus on efficient and effective execution of specific tasks.
  These decisions affect activities taking place right now.
  E.g. What should today's production level be?
Management Control and Tactical Planning:
  Focus on effective utilization of resources.
  Longer-range planning horizon.
  E.g. What should next year's production level be?
Strategic Planning:
  Long-range goals and policies for resource allocation.
  E.g. What new products should be offered?
DECISION PROCESS
[Figure: the decision process across environments. Labels: Placid & Uniform Environment, Simple & Well-structured Problems; Turbulent & Difficult Environment, Complex & Ill-structured Problems; Quantitative Models, Judgment, Intuition; Intelligence, Information and Data.]
Intuitive Decision Making

High Level of Uncertainty
Little Precedent to Follow
Variables Less Scientifically Predictable
When Facts Are Limited
When Facts Do Not Clearly Point the Way to Go
When Analytical Data Are of Little Use
Several Alternatives, With Good Arguments for Each
Time Is Limited
Satisficing Model
Implicit Favorite Model
Observation..

We face numerous decisions in life & business. We can use Statistics to analyze the potential outcomes of decision alternatives.
Few examples..1..
AMUL (Largest Milk producer in India)
Must determine product mix
Schedules must meet timely requirements for perishable items
Developed an optimization model (integrated production-distribution model) to determine the above:
  Increase in annual revenue
  Better utilization of capacity
  Pricing schedules
Role of statistics here

Formats for data collection
Categorization of data
Analysis of data
Interpretation
Uncertainties with coefficients in the model
Sensitivity analysis
Few examples..2..
Samsung Electronics
  Leading consumer electronics manufacturer
  Semiconductor facilities cost $2-3 billion
  High equipment utilization is key
  Developed a comprehensive planning and scheduling system to control WIP
  Benefits: cut cycle times in half
Role of Statistics here..

Data formats
Formulation of hypotheses
Compare the existing vs proposed scheduling system
Conclude with a certain degree of confidence
Simulation of various scenarios
Quantitative Analysis
Quantitative Analysis Process
  Model Development
  Data Preparation
  Model Solution
  Report Generation
Transforming Model Inputs into Output

Uncontrollable Inputs (Environmental Factors)
Controllable Inputs (Decision Variables)
Mathematical Model
Output (Projected Results)
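
As a rough sketch of the diagram above, a mathematical model can be written as a function that maps controllable and uncontrollable inputs to a projected result. The profit model, the function name, and all numbers below are hypothetical and chosen only for illustration; they are not from the lecture.

```python
def projected_profit(units_produced, demand, unit_price=10.0, unit_cost=6.0):
    """Hypothetical model.

    Controllable input (decision variable): units_produced
    Uncontrollable input (environmental factor): demand
    Output (projected result): profit
    """
    units_sold = min(units_produced, demand)
    return unit_price * units_sold - unit_cost * units_produced


# Evaluate three decision alternatives under one assumed demand scenario.
for quantity in (80, 100, 120):
    print(quantity, projected_profit(quantity, demand=100))
```

Changing the assumed demand and re-running the loop is a simple form of the scenario analysis mentioned earlier.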
Real-World System -> Assumed Real-World System -> MODEL
  Definition of the Problem
  Construction of the Model
  Solution of the Model
  Validation of the Model
  Implementation of the Final Result
General Modeling Scheme
[Figure: general modeling scheme. Labels: REALITY, MODEL, SOLUTION; Assumptions, Approximations; Algorithm, Heuristic; Analysis, Interpretation, Implementation.]
Characteristics of Models

A model is an abstraction of reality
Models are usually simplified versions of the things they represent
A valid model accurately represents the relevant characteristics of the object or decision being studied
Benefits of Modeling

Economy - it is often less costly to analyze decision problems using models.
Timeliness - models often deliver needed information more quickly than their real-world counterparts.
Feasibility - models can be used to do things that would be impossible in the real world.
Models give us insight & understanding that improve decision making.
What Are Statisticians Supposed to Do?
Statisticians collect and analyze data, then calculate results using a specific design. They are able to draw conclusions and make decisions in the face of uncertainty.
What Statisticians Do
Statisticians look for patterns in data to help make decisions in business, industry, and the biological, physical, psychological, and social sciences.
Statisticians help make important advances in scientific research and work in opinion polling, market research, survey management, data analysis, statistical experiments, and education.
Statisticians use quantitative abilities, statistical knowledge, and computing and communication skills to collaborate with other scientists on challenging problems.
Statistics
The science of data to answer research questions

Formulate a research question(s) (hypothesis)
Collect data
Analyze and summarize data
Draw conclusions to answer research question(s)
Statistical Inference
In the presence of variation
Answers Questions from Everyday Life

Business: Will a new marketing strategy be profitable?
Industry: Will a product's life exceed the warranty period?
Medicine: Will this year's Dengue vaccine reduce the chance of Dengue?
Education: Will technology improve learning?
Government: Will a change in interest rates affect inflation?
Variation
What if everyone:
Looked the same
Thought the same
Believed the same
How many people would you have to interview to know everything about the population with regard to looks, thoughts, and beliefs?
Variation
Populations with variation

Everyone looks different
Everyone thinks differently
Everyone believes differently
Interviews or observations are required on multiple members of the population for valid conclusions about population characteristics.
Variation
Variation is everywhere
Individuals
Repeated measurements on the same individual
Almost everything varies over time
Because variation is everywhere, statistical conclusions are not certain.

Probability statement
Confidence statement
Margin of error
Can Statistics Be Trusted?

There are three kinds of lies: Lies, damned lies, and statistics.
--Mark Twain
It is easy to lie with statistics. But it is easier to lie without them.

--Frederick Mosteller
Figures won't lie, but liars will figure.

--Charles Grosvenor
Where the Data Come From is Important

Good data: intelligent human effort
Bad data: laziness, lack of understanding, or a desire to mislead
Know where the data come from
Understand statistics
Example: Did you know that 45% of statistics are made up on the spot?
Manipulating the Facts

Data collection: sampling and measurement biases, ignoring influential variables
Data summarization: graphically misrepresenting data, choosing misleading statistics
Statistical inference: reporting invalid conclusions and interpretations
Manipulating Data Collection
Sampling biases:
One group in a population is overrepresented compared to another.
Example: "New Longitudinal Study Finds that Having a Working Mother Does No Significant Harm to Children."
The sample was not representative of average or higher income families.
Manipulating Data Collection
Sampling biases:
One group in a population is overrepresented compared to another.
Example: Ms Agony asked readers of her column in Reader's Digest whether they would have children again if they had it to do over. 70% of respondents said NO.
Was the sample representative of all parents?
Her invitation attracted parents who regretted having children.
Based on scientific studies selecting random samples of parents, most parents do not regret having children.
Manipulating Data Production
Ignoring influential variables:

Reporting results without considering important influential variables.
Example: Differences in pay due to gender

As of 2004, full-time employed women earned on average only about 76 percent as much as full-time employed men.
Does this difference show that women are discriminated against?
Occupation has been ignored. More men have received training for higher paying jobs.
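
The effect of an ignored variable can be illustrated with a small sketch. The records below are entirely hypothetical (invented occupations, salaries, and counts); they only show how an overall gap can appear even when pay is identical within each occupation, and they say nothing about real pay data.

```python
# Hypothetical records: (gender, occupation, salary).
# Within each occupation, both groups are paid the same.
records = (
    [("M", "engineer", 100)] * 8 + [("M", "clerk", 50)] * 2
    + [("F", "engineer", 100)] * 3 + [("F", "clerk", 50)] * 7
)


def mean_salary(gender, occupation=None):
    """Average salary for a gender, optionally restricted to one occupation."""
    salaries = [
        s for g, o, s in records
        if g == gender and (occupation is None or o == occupation)
    ]
    return sum(salaries) / len(salaries)


# Overall comparison ignores occupation and shows a gap.
overall_ratio = mean_salary("F") / mean_salary("M")
# Comparison within an occupation shows no gap.
engineer_ratio = mean_salary("F", "engineer") / mean_salary("M", "engineer")
print(round(overall_ratio, 2), engineer_ratio)
```

In this invented data set the whole overall gap comes from the different mix of occupations, which is why the influential variable has to be examined before drawing a conclusion.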
Manipulating Data Summarization
Graphically misrepresenting data
Manipulating Statistical Inference

Reporting invalid conclusions and interpretations
Example: "New Jail Decreases Crime"
Did the new jail really cause the decrease in crime?
Or did the decrease just happen when the new jail opened?
Understanding Data: Individuals & Variables
Individuals: objects described by a set of data. May be people, animals, or things.
  Also called subjects or units.
Variables: any characteristic of an individual. A variable can take different values for different individuals.
Variables
A variable can be:
Numerical/Quantitative:
  age: 21 years, 12 weeks
  length: 5 cm, 24.2 miles
  pulse: 72 bpm
Categorical/Qualitative:
  sex: Male/Female
  color: red/blue/green/...
  political party: Republican/Democrat/other
Variables
Can variables described by numbers ever be categorical?

Ranges of numbers
  Age categories
PAN Number
PIN Code
Variables
What variables would we be interested in? Are they categorical or numerical?

Who supports expanded bus schedules?
Does a new diet help weight loss?
Does taking aspirin prevent heart attacks?
Which rivers are polluted?
Statistical Concepts & Tools

Data representation
Various probability distributions
  Discrete (Binomial, Geometric, Poisson, Uniform etc.)
  Continuous (Uniform, Exponential, Normal etc.)
Central Limit Theorem
Moment generating functions
Distribution of sample means
Point estimates
Confidence interval
Type I and Type II errors
Hypothesis testing
Regression
ANOVA
Design of experiments (DOE)
Non-parametric tests
Population Versus Sample
Population: the whole
  a collection of persons, objects, or items under study
Census: gathering data from the entire population
Sample: a portion of the whole
  a subset of the population
Parameter vs. Statistic
Parameter: descriptive measure of the population
  Usually represented by Greek letters
Statistic: descriptive measure of a sample
  Usually represented by Roman letters
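
The distinction can be sketched in a few lines. The population below is simulated (an assumption made purely for illustration), and the variable names mirror the usual Greek/Roman convention.

```python
import random
import statistics

random.seed(1)

# Hypothetical population of 10,000 values (simulated, not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameters: computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # divides by N

# Statistics: computed from a random sample of the population.
sample = random.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # divides by n - 1

print(f"mu={mu:.2f} sigma={sigma:.2f} x_bar={x_bar:.2f} s={s:.2f}")
```

A different seed gives a different sample and therefore different statistics, while the parameters of the fixed population stay the same.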

Levels of Data Measurement

Nominal: lowest level of measurement
Ordinal
Interval
Ratio: highest level of measurement
Producing Data/Collecting Data

Sample Surveys vs. Experiments
  Sample surveys: population snapshot
  Experiments: impose treatment on subjects/units, observe response to imposed treatment
Common concern: Bias
  Bias: systematically favors certain outcomes
Commonly used tables

Standard normal variate
t
Chi-square
F
Non-parametric
Central Limit Theorem
Most theory about sample means assumes that the mean comes from a normal distribution. The Central Limit Theorem says that for any population with finite variance, if the sample size n is large enough, the sample means will be approximately normally distributed, with mean equal to the population mean $\mu$ and standard deviation equal to the population standard deviation divided by the square root of n ($\sigma/\sqrt{n}$).
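
A small simulation can make the theorem concrete. This is only a sketch: the exponential population, the sample size, and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)

n = 50        # size of each sample
reps = 4000   # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1,
# and a strongly skewed (non-normal) shape.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / n ** 0.5   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

A histogram of `sample_means` would look close to a bell curve even though the underlying population is skewed.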
Normal Distribution
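Mother of all!
Standard normal variate (Z): for $X \sim N(\mu, \sigma^2)$, $Z = (X - \mu)/\sigma$
$\chi^2$ (Chi-Square): square of Z
t distribution: small sample size
F distribution: ratio of two $\chi^2$ variates (each divided by its degrees of freedom)
Approximation to discrete distributions: Binomial etc.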
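Confidence Interval to Estimate $\mu$ when n is Large

Point estimate: $\bar{X} = \frac{\sum X}{n}$
Interval estimate: $\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$, or
$\bar{X} - Z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z \frac{\sigma}{\sqrt{n}}$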
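
The interval estimate above can be computed as in the following sketch. The function name and the example numbers are illustrative assumptions; when the population standard deviation is not supplied, the sample standard deviation is used in its place, which is reasonable only for large n.

```python
from statistics import NormalDist, fmean, stdev


def z_confidence_interval(data, confidence=0.95, sigma=None):
    """Large-sample confidence interval for the population mean."""
    n = len(data)
    x_bar = fmean(data)
    # Use the known sigma if given, otherwise estimate it with s.
    sd = sigma if sigma is not None else stdev(data)
    # Z value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sd / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Hypothetical case: 100 observations averaging 50, known sigma of 10.
low, high = z_confidence_interval([50.0] * 100, confidence=0.95, sigma=10.0)
print(round(low, 2), round(high, 2))
```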
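Distribution of Sample Means for $(1-\alpha)100\%$ Confidence
[Figure: distribution of sample means $\bar{X}$ centered at $\mu$, with area $1-\alpha$ between $-Z_{\alpha/2}$ and $+Z_{\alpha/2}$ and area $\alpha/2$ in each tail.]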
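Probability Interpretation of the Level of Confidence

$\mathrm{Prob}\left[\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$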
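95% Confidence Intervals for $\mu$
[Figure: confidence intervals built around several sample means $\bar{X}$; about 95% of such intervals contain $\mu$.]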
Estimating the Population Variance

Population parameter: $\sigma^2$
Estimator of $\sigma^2$: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
$\chi^2$ formula for a single variance: $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$
degrees of freedom = n - 1
Confidence Interval for $\sigma^2$
$\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$
df = n - 1
$\alpha$ = 1 - level of confidence
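
As a sketch of how this interval is evaluated, the function below takes the two chi-square table values as arguments instead of computing them, since the Python standard library has no chi-square quantile function. The data are hypothetical; the table values 19.0228 and 2.7004 are the usual entries for df = 9 at 95% confidence.

```python
from statistics import variance


def variance_confidence_interval(data, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: chi-square value with area alpha/2 to its right
    chi2_lower: chi-square value with area 1 - alpha/2 to its right
    Both are read from a chi-square table for df = n - 1.
    """
    n = len(data)
    s2 = variance(data)   # sample variance, divides by n - 1
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Hypothetical measurements, n = 10, so df = 9.
data = [4.1, 5.2, 3.9, 4.8, 5.5, 4.4, 5.0, 4.6, 4.3, 5.1]
low, high = variance_confidence_interval(data, 19.0228, 2.7004)
print(f"S^2 = {variance(data):.4f}, interval = ({low:.4f}, {high:.4f})")
```

The interval is not symmetric around $S^2$ because the chi-square distribution is skewed.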
Selected $\chi^2$ Distributions
[Figure: $\chi^2$ density curves for df = 3, df = 5, and df = 10.]
Statistical Significance
Significance is a statistical term that tells how sure you are that a difference or relationship exists. To say that a significant difference or relationship exists tells only half the story. We might be very sure that a relationship exists, but is it a strong, moderate, or weak relationship? After finding a significant relationship, it is important to evaluate its strength. Significant relationships can be strong or weak, and significant differences can be large or small: with a large enough sample, even a small difference can be statistically significant.
One-Tailed and Two-Tailed Significance Tests
One important concept in significance testing is whether you use a one-tailed or two-tailed test of significance. The answer is that it depends on your hypothesis. When your research hypothesis states the direction of the difference or relationship, you use a one-tailed probability.
For example, a one-tailed test would be used to test these null hypotheses:
  Females will not score significantly higher than males on an IQ test.
  Blue collar workers will not buy significantly more product than white collar workers.
  Superman is not significantly stronger than the average person.
In each case, the null hypothesis (indirectly) predicts the direction of the difference.
A two-tailed test would be used to test these null hypotheses:
  There will be no significant difference in IQ scores between males and females.
  There will be no significant difference in the amount of product purchased between blue collar and white collar workers.
  There is no significant difference in strength between Superman and the average person.
For a symmetric test statistic such as z or t, and a difference in the predicted direction, the one-tailed probability is exactly half the value of the two-tailed probability.
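
The relationship between the two probabilities can be checked numerically. The sketch below assumes a z statistic and a one-tailed alternative of "greater than"; the function name is illustrative.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic.

    The one-tailed value assumes the alternative hypothesis predicts
    a positive difference (upper tail).
    """
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed


one, two = p_values(1.96)
print(round(one, 4), round(two, 4))
```

If the observed z were negative, the upper-tail probability would be larger than 0.5 and the halving relationship would no longer hold, which is why the direction has to be fixed before looking at the data.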
56
Steps in a Test of Hypothesis
1. Define the problem: determine H0 and HA. Select $\alpha$.
2. Collect data.
3. Calculate $\bar{x}$ as an estimate of $\mu$ and s as an estimate of $\sigma$.
4. Check assumptions: sample size n is reasonably large ($n \ge 30$), so the normal distribution can be used and $\sigma$ estimated with s. Check for outliers or strong skewness in the population distribution.
5. Calculate the standard score.
6. Compare with the tabulated value.
7. Make conclusions in the context of the problem.
If your statistic is higher than the critical value from the table
Your finding is significant.
You reject the null hypothesis. If the null hypothesis were true, a difference or relationship this large would rarely happen by chance, and p is less than the chosen alpha level ($p < \alpha$).
If your statistic is lower than the critical value from the table
Your finding is not significant.
You fail to reject the null hypothesis. A difference or relationship this large could plausibly have happened by chance, and p is greater than the chosen alpha level ($p > \alpha$).
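
The decision rule can be sketched for a two-tailed large-sample test of a mean. The function name and the data are hypothetical, and the standard library's `NormalDist` stands in for the printed table of critical values.

```python
from statistics import NormalDist, fmean, stdev


def z_test_mean(data, mu0, alpha=0.05):
    """Two-tailed large-sample test of H0: mu = mu0 against HA: mu != mu0."""
    n = len(data)
    # Standard score: estimate mu with x-bar and sigma with s.
    z = (fmean(data) - mu0) / (stdev(data) / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > critical   # equivalent to p < alpha
    return z, critical, p, reject


# Hypothetical sample of 40 observations with mean 100.
data = [98 + (i % 5) for i in range(40)]
for mu0 in (100, 99):
    z, critical, p, reject = z_test_mean(data, mu0)
    print(mu0, round(z, 2), round(critical, 2), round(p, 4), reject)
```

For mu0 = 100 the statistic is below the critical value and H0 is not rejected; for mu0 = 99 it is above and H0 is rejected.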
Partition of Total Sum of Squares in RBD
SST (Total Sum of Squares) = SSC (Treatment Sum of Squares) + SSR (Sum of Squares Blocks) + SSE (Sum of Squares Error)
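
The partition can be verified on a small table. The numbers below are hypothetical; rows are taken to be blocks and columns to be treatments, which is an assumption about layout rather than something stated on the slide.

```python
# Hypothetical randomized block design: 4 blocks (rows) x 3 treatments (columns).
data = [
    [10, 12, 14],
    [11, 14, 15],
    [9, 11, 16],
    [12, 13, 17],
]
n_blocks, n_treat = len(data), len(data[0])

grand_mean = sum(map(sum, data)) / (n_blocks * n_treat)
treat_means = [sum(row[j] for row in data) / n_blocks for j in range(n_treat)]
block_means = [sum(row) / n_treat for row in data]

# Total, treatment (column), block (row), and error sums of squares.
sst = sum((x - grand_mean) ** 2 for row in data for x in row)
ssc = n_blocks * sum((m - grand_mean) ** 2 for m in treat_means)
ssr = n_treat * sum((m - grand_mean) ** 2 for m in block_means)
sse = sum(
    (data[i][j] - treat_means[j] - block_means[i] + grand_mean) ** 2
    for i in range(n_blocks)
    for j in range(n_treat)
)

print(round(sst, 3), round(ssc + ssr + sse, 3))
```

Removing the block sum of squares from the error term is what makes the design more sensitive than a completely randomized one when blocks differ.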
Regression and Correlation
Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable. Correlation is a measure of the degree of relatedness of two variables.
Simple Regression Analysis
Bivariate (two variables) linear regression: the most elementary regression model
Dependent variable: the variable to be predicted, usually called Y
Independent variable: the predictor or explanatory variable, usually called X
Regression Models
Deterministic Regression Model: $Y = \beta_0 + \beta_1 X$
Probabilistic Regression Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
$\beta_0$ and $\beta_1$ are population parameters
$\beta_0$ and $\beta_1$ are estimated by sample statistics $b_0$ and $b_1$
Equation of the Simple Regression Line

$\hat{Y} = b_0 + b_1 X$
where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
Least Squares Analysis

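$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$

$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$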
Least Squares Analysis

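$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$

$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$

$b_1 = \frac{SS_{XY}}{SS_{XX}}$

$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$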
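
The computational formulas translate directly into code. The function name and the data points below are hypothetical and serve only to show the arithmetic.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the simple regression line Y-hat = b0 + b1 * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SS_XY = sum(XY) - (sum X)(sum Y) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # SS_XX = sum(X^2) - (sum X)^2 / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical data lying close to the line Y = 2X.
b0, b1 = least_squares([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(b0, 2), round(b1, 2))
```

For points that lie exactly on a line, the function returns that line's intercept and slope, which is a quick sanity check on any implementation.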
Parametric vs Nonparametric Statistics
Parametric statistics are statistical techniques based on assumptions about the population from which the sample data are collected.
  Assume that the data being analyzed are randomly selected from a normally distributed population.
  Require quantitative measurements that yield interval or ratio level data.
Nonparametric statistics are based on fewer assumptions about the population and the parameters.
  Sometimes called distribution-free statistics.
  A variety of nonparametric statistics are available for use with nominal or ordinal data.

RUN TEST
MANN-WHITNEY
KOLMOGOROV-SMIRNOV
CHI-SQUARE
KRUSKAL-WALLIS
Etc.
Which Test to use?

Goal                                        | Measurement (from Gaussian Population) | Rank, Score, or Measurement (from Non-Gaussian Population)
Describe one group                          | Mean, SD                               | Median, interquartile range
Compare one group to a hypothetical value   | One-sample t test                      | Wilcoxon test
Compare two unpaired groups                 | Unpaired t test                        | Mann-Whitney test
Compare two paired groups                   | Paired t test                          | Wilcoxon test
Compare three or more unmatched groups      | One-way ANOVA                          | Kruskal-Wallis test
Compare three or more matched groups        | Repeated-measures ANOVA                | Friedman test
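
The table can also be written as a simple lookup. This is only a sketch: the dictionary keys and the function name are illustrative, and the entries simply restate the table.

```python
# Each goal maps to (test for Gaussian data, test for non-Gaussian or ranked data).
TESTS = {
    "describe one group": ("Mean, SD", "Median, interquartile range"),
    "compare one group to a hypothetical value": ("One-sample t test", "Wilcoxon test"),
    "compare two unpaired groups": ("Unpaired t test", "Mann-Whitney test"),
    "compare two paired groups": ("Paired t test", "Wilcoxon test"),
    "compare three or more unmatched groups": ("One-way ANOVA", "Kruskal-Wallis test"),
    "compare three or more matched groups": ("Repeated-measures ANOVA", "Friedman test"),
}


def choose_test(goal, gaussian):
    """Look up the suggested method for a goal and a distributional assumption."""
    parametric, nonparametric = TESTS[goal.lower()]
    return parametric if gaussian else nonparametric


print(choose_test("Compare two unpaired groups", gaussian=False))
```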

Web based Decision Tree to choose a Statistical test
http://www.edu.rcsed.ac.uk/statistics/A%20simple%20algorithm%20to%20help%20decide%20the%20statistical%20test%20to%20use.htm
Applications of statistics..
Statistical quality control
Simulation
Six Sigma
Checklist for A Statistical Project ..1..

Statement of purpose/question of interest
Summary of data collection
  e.g. random sample, stratified sample, available data
  Identify possible sources of bias
  Why do you believe the sample was representative?
Summarize the data (concise, well-labeled, easy to read)
  Numerical or quantitative data
    Graphs: pie diagram or histogram
    Measures of central tendency (e.g. mean or median)
    Measures of spread (e.g. range, SD, IQR)
    A check for outliers (e.g. z scores)
    A check for normality (probability plot, 68-95-99.7 rule) if needed by your analysis
  Qualitative data
    Graphs: pie chart or bar graph
    Proportion in each category
Checklist for A Statistical Project ..2..

Statistical inference
  Quantitative data: e.g. confidence intervals for mean(s), hypothesis test for mean(s), regression, ANOVA
  Qualitative data
  Include a discussion of why your method is appropriate
Diagnostics
  Verification of any assumptions made during statistical inference
Interpretation/Explanation of results
  What does it all mean?
  Use the above summaries to justify your interpretation
  Suggest reasons for what you have observed
Overall conclusion, recommendations, future questions
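
The numerical part of the "summarize the data" step might look like the sketch below. The function name, the cutoff of 3 for z scores, and the data are assumptions made for illustration.

```python
import statistics


def summarize(values, z_cutoff=3.0):
    """Central tendency, spread, and a simple z-score outlier check."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": median,
        "range": max(values) - min(values),
        "sd": sd,
        "iqr": q3 - q1,
        # Flag observations more than z_cutoff standard deviations from the mean.
        "outliers": [v for v in values if abs(v - mean) / sd > z_cutoff],
    }


# Hypothetical measurements with one suspicious value.
values = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15,
          13, 14, 12, 15, 14, 13, 14, 13, 12, 60]
summary = summarize(values)
print(summary)
```

Note how the single extreme value pulls the mean well above the median, which is one reason the checklist asks for both.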
Observation..
The objective of all experimental design, as well as of statistical methods in general, is to get the greatest amount of accurate information for the outlay of manpower, time, and money. Without a working knowledge of statistical methods no analyst can expect to reach that goal.
Statistics about the course MEL761

LTP structure: 3-0-2
Registered students: 27
Lab sessions: 12
Quizzes: 3
Minors: 2
Mini project: 1
Major: 1
