
Statistics for Decision Making

Dr S G Deshmukh

Mechanical Engineering Department, Indian Institute of Technology Delhi


sgdeshmukh@indiatimes.com

What is a Decision?
Decision
A reasoned choice among alternatives

Examples:
Where to advertise a new product
What stock to buy
What movie to see
Where to go for dinner
Where to locate a new plant
Which mode of transportation to choose

Decision Elements
Decision Statement:
What are we trying to decide?

Alternatives:
What are the options?

Decision Criteria:
How are we going to judge the merits of each alternative?

Decision making process


Intelligence
Sensing, finding, identifying, and defining problem/opportunity

Design
Diagnosing the problem/opportunity
Generating alternatives

Choice
Choosing the best alternative

Types of Decisions
Type of structure (nature of task)
Structured, Unstructured

Level of decision making (scope)
Strategic, Managerial, Operational

Nature of Decision
Structured problems
Routine and repetitive, with a standard solution
Well-defined decision-making procedure
Given a well-defined set of inputs, a well-defined set of outputs is produced

Semi-structured problems
Have some structured aspects
Some of the inputs, outputs, or procedures are not well defined

Unstructured problems
All phases of the decision-making process are unstructured
Input set, output set, and procedures are not well defined

Scope of Decision
Operational Planning and Control:
Focus on efficient and effective execution of specific tasks. These decisions affect activities taking place right now. E.g., what should today's production level be?

Management Control and Tactical Planning


Focus on effective utilization of resources over a longer planning horizon. E.g., what should next year's production level be?

Strategic Planning
Long-range goals and policies for resource allocation. E.g., what new products should be offered?

Figure: Decision process. Problems range from simple and well-structured (placid and uniform environment) to complex and ill-structured (turbulent and difficult environment); decision makers draw on intuition, judgment, and quantitative models, all built on intelligence, information, and data.

Intuitive Decision Making


High level of uncertainty
Little precedent to follow
Variables less scientifically predictable
When facts are limited
When facts do not clearly point the way to go
When analytical data are of little use
Several alternatives, with good arguments for each
Time is limited

Satisficing model
Implicit favorite model

Observation..

We face numerous decisions in life & business. We can use Statistics to analyze the potential outcomes of decision alternatives.


A few examples (1)
AMUL (largest milk producer in India)
Must determine the product mix
Schedules must meet timely requirements for perishable items
Developed an optimization model (integrated production-distribution model) to determine the above
Benefits:
Increase in annual revenue
Better utilization of capacity
Pricing schedules

Role of statistics here


Formats for data collection
Categorization of data
Analysis of data
Interpretation
Uncertainties in the model coefficients
Sensitivity analysis

A few examples (2)
Samsung Electronics
Leading consumer electronics manufacturer
Semiconductor facilities cost $2-3 billion
High equipment utilization is key
Developed a comprehensive planning and scheduling system to control work-in-process (WIP)
Benefits: cut cycle times in half

Role of Statistics here..


Data formats
Formulation of hypotheses
Comparison of the existing vs. proposed scheduling system
Conclusions with a stated degree of confidence
Simulation of various scenarios

Quantitative Analysis
Quantitative analysis process: model development, data preparation, model solution, report generation

Transforming Model Inputs into Output


Controllable inputs (decision variables) and uncontrollable inputs (environmental factors) feed a mathematical model, which produces the output (projected results).
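To make the input/output picture concrete, here is a minimal sketch of a hypothetical single-product model: the production quantity is the controllable input, demand is the uncontrollable input, and projected profit is the output. The price, costs, demand scenarios, and the function name `projected_profit` are illustrative assumptions, not data from the AMUL or Samsung examples.

```python
# Hypothetical single-product model: all numbers are illustrative assumptions.
def projected_profit(production, demand, price=10.0, unit_cost=6.0, fixed_cost=100.0):
    """Output (projected result) for one decision variable and one environmental factor."""
    sales = min(production, demand)  # cannot sell more than is produced or demanded
    return price * sales - unit_cost * production - fixed_cost


demand_scenarios = [70, 100, 130]      # uncontrollable input (environmental factor)
candidate_levels = [60, 80, 100, 120]  # controllable input (decision variable)

# Evaluate every decision against every scenario and keep the best average outcome.
average_profit = {
    q: sum(projected_profit(q, d) for d in demand_scenarios) / len(demand_scenarios)
    for q in candidate_levels
}
best_level = max(average_profit, key=average_profit.get)
print(best_level, average_profit[best_level])  # 100 200.0
```

Averaging over scenarios is only one possible choice criterion; a risk-averse decision maker might instead maximize the worst-case profit.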

Real-world system vs. assumed real-world system (MODEL)

Definition of the problem
Construction of the model
Solution of the model
Validation of the model
Implementation of the final result

General Modeling Scheme
Figure: general modeling scheme linking reality and the model (labels: Reality, Model, Assumptions, Approximations, Algorithm, Heuristic, Solution Analysis, Interpretation, Implementation).

Characteristics of Models

A model is an abstraction of reality.
Models are usually simplified versions of the things they represent.
A valid model accurately represents the relevant characteristics of the object or decision being studied.

Benefits of Modeling

Economy: it is often less costly to analyze decision problems using models.
Timeliness: models often deliver needed information more quickly than their real-world counterparts.
Feasibility: models can be used to do things that would be impossible with the real system.
Models give us insight and understanding that improve decision making.

What Are Statisticians Supposed to Do?

Statisticians collect data according to a specific design, analyze them, and calculate results. They are able to draw conclusions and make decisions in the face of uncertainty.

What Statisticians Do

Statisticians look for patterns in data to help make decisions in business, industry, and the biological, physical, psychological, and social sciences. Statisticians help make important advances in scientific research and work in opinion polling, market research, survey management, data analysis, statistical experiments, and education. Statisticians use quantitative abilities, statistical knowledge, and computing and communication skills to collaborate with other scientists on challenging problems.

Statistics

The science of data to answer research questions:
Formulate a research question (hypothesis)
Collect data
Analyze and summarize data
Draw conclusions to answer the research question

Statistical inference in the presence of variation

Answers Questions from Everyday Life


Business: Will a new marketing strategy be profitable?
Industry: Will a product's life exceed the warranty period?
Medicine: Will this year's dengue vaccine reduce the chance of dengue?
Education: Will technology improve learning?
Government: Will a change in interest rates affect inflation?

Variation

What if everyone:
Looked the same
Thought the same
Believed the same

How many people would you have to interview to know everything about the population with regard to looks, thoughts, and beliefs?

Variation

Populations with variation:
Everyone looks different
Everyone thinks differently
Everyone believes different things

Interviews or observations are required on multiple members of the population to draw valid conclusions about population characteristics.

Variation

Variation is everywhere:
Individuals differ
Repeated measurements on the same individual differ
Almost everything varies over time

Because variation is everywhere, statistical conclusions are not certain. They are stated as:
A probability statement
A confidence statement
A margin of error

Can Statistics Be Trusted?


There are three kinds of lies: lies, damned lies, and statistics.
--Mark Twain (who attributed it to Benjamin Disraeli)

It is easy to lie with statistics. But it is easier to lie without them.
--Frederick Mosteller

Figures won't lie, but liars will figure.
--Charles Grosvenor

Where the Data Come From is Important


Good data: intelligent human effort
Bad data: laziness, lack of understanding, or a desire to mislead
Know where the data come from
Understand statistics
Example: Did you know that 45% of statistics are made up on the spot?

Manipulating the Facts


Data collection: sampling and measurement biases, ignoring influential variables
Data summarization: graphically misrepresenting data, choosing misleading statistics
Statistical inference: reporting invalid conclusions and interpretations

Manipulating Data Collection

Sampling biases:
One group in a population is overrepresented compared to another.
Example: "New Longitudinal Study Finds that Having a Working Mother Does No Significant Harm to Children." The sample was not representative of average or higher-income families.

Manipulating Data Collection

Sampling biases:
One group in a population is overrepresented compared to another.
Example: Ms Agony asked readers of her column in Reader's Digest whether they would have children again if they had it to do over. 70% of respondents said NO. Was the sample representative of all parents? No: her invitation attracted parents who regretted having children. Scientific studies based on random samples of parents find that most parents do not regret having children.

Manipulating Data Production

Ignoring influential variables:
Reporting results without considering important influential variables.

Example: differences in pay due to gender
As of 2004, full-time employed women earned on average only about 76 percent as much as full-time employed men. Does this difference show that women are discriminated against? Occupation has been ignored: more men have received training for higher-paying jobs.

Manipulating Data Summarization

Graphically misrepresenting data


Manipulating Statistical Inference


Reporting invalid conclusions and interpretations
Example: "New Jail Decreases Crime"
Did the new jail really cause the decrease in crime? Or did the decrease just happen to coincide with the opening of the new jail?

Understanding Data: Individuals & Variables

Individuals: objects described by a set of data. They may be people, animals, or things, and are also called subjects or units.

Variables: any characteristic of an individual. A variable can take different values for different individuals.

Variables
A variable can be:
Numerical/Quantitative:
age: 21 years, 12 weeks
length: 5 cm, 24.2 miles
pulse: 72 bpm

Categorical/Qualitative:
sex: male/female
color: red/blue/green/...
political party: Republican/Democrat/other

Variables

Can variables described by numbers ever be categorical? Yes, for example:
Ranges of numbers (e.g., age categories)
PAN number
PIN code

Variables

What variables would we be interested in? Are they categorical or numerical?


Who supports expanded bus schedules?
Does a new diet help weight loss?
Does taking aspirin prevent heart attacks?
Which rivers are polluted?

Statistical Concepts & Tools


Data representation
Various probability distributions
  Discrete (binomial, geometric, Poisson, uniform, etc.)
  Continuous (uniform, exponential, normal, etc.)
Central Limit Theorem
Moment generating functions
Distribution of sample means
Point estimates
Confidence intervals
Type I and Type II errors
Hypothesis testing
Regression
ANOVA
Design of experiments (DOE)
Non-parametric tests

Population Versus Sample

Population: the whole; a collection of persons, objects, or items under study
Census: gathering data from the entire population
Sample: a portion of the whole; a subset of the population

Parameter vs. Statistic

Parameter: a descriptive measure of the population, usually represented by Greek letters (e.g., $\mu$, $\sigma$)
Statistic: a descriptive measure of a sample, usually represented by Roman letters (e.g., $\bar{X}$, $S$)

Levels of Data Measurement


Nominal (lowest level of measurement)
Ordinal
Interval
Ratio (highest level of measurement)

Producing Data/Collecting Data


Sample surveys vs. experiments
Sample surveys: a snapshot of the population
Experiments: impose a treatment on subjects/units and observe the response to the imposed treatment

Common concern: bias. A biased method systematically favors certain outcomes.

Commonly used tables


Standard normal variate (Z)
t
Chi-square ($\chi^2$)
F
Non-parametric

Central Limit Theorem

Much of the theory about sample means assumes that the data come from a normal distribution. The Central Limit Theorem says that for any population with a finite standard deviation, if the sample size is large enough, the sample means will be approximately normally distributed, with mean equal to the population mean $\mu$ and standard deviation equal to the population standard deviation divided by the square root of $n$, i.e. $\sigma/\sqrt{n}$.
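A quick simulation can illustrate the theorem. This sketch assumes an exponential population with rate 1 (strongly right-skewed, with $\mu = 1$ and $\sigma = 1$), samples of size $n = 50$, and 2000 repetitions; all of these values are illustrative choices.

```python
import random
import statistics

random.seed(1)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 50             # sample size
repetitions = 2000

# Draw many samples and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

print(round(statistics.fmean(sample_means), 3))  # close to mu = 1
print(round(statistics.stdev(sample_means), 3))  # close to sigma / sqrt(n), about 0.141
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.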

Normal Distribution

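Mother of all!
Normal variate: $X \sim N(\mu, \sigma^2)$; standard normal variate: $Z = (X - \mu)/\sigma \sim N(0, 1)$
$\chi^2$ (chi-square): square of $Z$; a sum of $k$ squared independent standard normal variates has a $\chi^2$ distribution with $k$ degrees of freedom
t distribution: used for small sample sizes
F distribution: ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom
Standard approximation to discrete distributions: binomial, etc.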
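The normal approximation to a discrete distribution can be checked directly. This sketch, assuming illustrative values $n = 100$, $p = 0.5$ and $k = 55$, compares the exact binomial probability $P(X \le k)$ with the normal approximation using mean $np$, standard deviation $\sqrt{np(1-p)}$ and a continuity correction of 0.5.

```python
from math import comb
from statistics import NormalDist

n, p, k = 100, 0.5, 55  # illustrative values

# Exact binomial probability P(X <= k)
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation: mean np, sd sqrt(np(1-p)), continuity correction k + 0.5
approx = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5).cdf(k + 0.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two values are close here because $np$ and $n(1-p)$ are both large; the approximation degrades when $p$ is near 0 or 1 and $n$ is small.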

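Confidence Interval to Estimate $\mu$ when n is Large

Point estimate:
$$\bar{X} = \frac{\sum X}{n}$$

Interval estimate:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$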
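The interval estimate $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ is a one-line computation once $Z_{\alpha/2}$ is known. This sketch takes $Z_{\alpha/2}$ from Python's `statistics.NormalDist` instead of a printed table; the helper name `z_interval` and the summary numbers ($n = 100$, $\bar{X} = 50$, $\sigma = 10$) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Large-sample confidence interval for mu: xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary data: n = 100, xbar = 50, sigma = 10
low, high = z_interval(50, 10, 100)
print(f"{low:.2f} to {high:.2f}")  # 48.04 to 51.96
```

When $\sigma$ is unknown and $n$ is large, the sample standard deviation $s$ is commonly substituted for `sigma`.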

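Distribution of Sample Means for $(1-\alpha)100\%$ Confidence
Figure: sampling distribution of $\bar{X}$ around $\mu$, with central area $1-\alpha$ and area $\alpha/2$ in each tail beyond $\pm Z_{\alpha/2}$.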
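Probability Interpretation of the Level of Confidence

$$P\left[\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$$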

95% Confidence Intervals for $\mu$
Figure: intervals computed from repeated samples; about 95% of such intervals contain $\mu$.
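The repeated-sampling meaning of "95% confidence" can be checked by simulation. This sketch assumes a normal population with $\mu = 100$, $\sigma = 15$ (known), samples of size 40, and 2000 trials, all illustrative; it counts how often $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ contains $\mu$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)

mu, sigma, n = 100.0, 15.0, 40      # assumed population and sample size
z = NormalDist().inv_cdf(0.975)     # Z_{alpha/2} for 95% confidence
half_width = z * sigma / sqrt(n)

trials = 2000
covered = 0
for _ in range(trials):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1  # this interval contains the true mean

coverage = covered / trials
print(coverage)  # should be near 0.95
```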

Estimating the Population Variance


Population parameter: $\sigma^2$

Estimator of $\sigma^2$:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

$\chi^2$ formula for a single variance:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2}$$

degrees of freedom $= n - 1$
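The estimator and the $\chi^2$ statistic can be computed directly. This sketch uses a hypothetical sample and a hypothetical value $\sigma^2 = 2$ under test, and checks that the hand formula with $n - 1$ in the denominator matches `statistics.variance`, which also divides by $n - 1$.

```python
from statistics import fmean, variance

xs = [4, 8, 6, 5, 7]  # hypothetical sample
n = len(xs)
xbar = fmean(xs)

# Estimator of sigma^2 with n - 1 in the denominator
s2_manual = sum((x - xbar) ** 2 for x in xs) / (n - 1)
s2_library = variance(xs)  # statistics.variance also divides by n - 1

sigma2_0 = 2.0  # hypothetical value of sigma^2 under test
chi2 = (n - 1) * s2_manual / sigma2_0
print(s2_manual, s2_library, chi2)  # 2.5 2.5 5.0 (df = 4)
```

The resulting $\chi^2$ value would then be compared with a $\chi^2$ table value for $n - 1 = 4$ degrees of freedom.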

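Confidence Interval for $\sigma^2$

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$$

df $= n - 1$
$\alpha = 1 -$ level of confidence
Here $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the table values with right-tail areas $\alpha/2$ and $1-\alpha/2$.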
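Applying the interval needs only $S^2$, $n$, and two $\chi^2$ table values. In this sketch the hypothetical sample has $n = 10$ and $S^2 = 4$ at 95% confidence; the table values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are typed in as a printed table would supply them. The helper name `variance_interval` is hypothetical.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """(n-1)S^2 / chi2_upper <= sigma^2 <= (n-1)S^2 / chi2_lower.

    chi2_upper: table value with right-tail area alpha/2, df = n - 1
    chi2_lower: table value with right-tail area 1 - alpha/2, df = n - 1
    """
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower


# Hypothetical sample: n = 10, S^2 = 4, 95% confidence, df = 9
low, high = variance_interval(4.0, 10, 19.023, 2.700)
print(f"{low:.3f} <= sigma^2 <= {high:.3f}")
```

Because the $\chi^2$ distribution is skewed, the interval is not symmetric around $S^2$.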

Selected $\chi^2$ Distributions
Figure: $\chi^2$ density curves for df = 3, df = 5, and df = 10.

Statistical Significance

Significance is a statistical term that tells how sure you can be that a difference or relationship exists, i.e. that it is unlikely to be due to chance alone. To say that a significant difference or relationship exists only tells half the story. We might be very sure that a relationship exists, but is it strong, moderate, or weak? After finding a significant relationship, it is important to evaluate its strength. Significant relationships can be strong or weak, and significant differences can be large or small: with a large enough sample, even a very small difference can be statistically significant.
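The role of sample size can be seen with a one-sample z test. This sketch assumes the same small effect (a mean difference of 0.1 standard deviations) observed at two hypothetical sample sizes; the helper name `two_sided_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sigma, n):
    """Two-sided p-value of a one-sample z test for an observed mean difference."""
    z = diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same small effect (0.1 standard deviations) at two hypothetical sample sizes
p_small = two_sided_p(diff=0.1, sigma=1.0, n=25)    # z = 0.5
p_large = two_sided_p(diff=0.1, sigma=1.0, n=2500)  # z = 5.0
print(round(p_small, 3), p_large)  # p_small is about 0.617; p_large is far below 0.05
```

The effect size is identical in both cases; only the sample size changes whether it is "significant".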

One-Tailed and Two-Tailed Significance Tests

One important concept in significance testing is whether to use a one-tailed or a two-tailed test of significance. The answer depends on your hypothesis. When your research hypothesis states the direction of the difference or relationship, use a one-tailed probability.

For example, a one-tailed test would be used to test these null hypotheses:
Females will not score significantly higher than males on an IQ test.
Blue-collar workers will not buy significantly more product than white-collar workers.
Superman is not significantly stronger than the average person.
In each case, the null hypothesis (indirectly) predicts the direction of the difference.

A two-tailed test would be used to test these null hypotheses:
There will be no significant difference in IQ scores between males and females.
There will be no significant difference in the amount of product purchased between blue-collar and white-collar workers.
There is no significant difference in strength between Superman and the average person.

For a symmetric test statistic such as Z or t, and an observed result in the predicted direction, the one-tailed probability is exactly half the two-tailed probability.
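The halving relationship is easy to verify for a Z statistic. This sketch assumes a hypothetical observed value $z = 1.8$ in the predicted direction; at $\alpha = 0.05$ it is significant one-tailed but not two-tailed.

```python
from statistics import NormalDist

z = 1.8  # hypothetical observed test statistic, in the predicted direction

one_tailed = 1 - NormalDist().cdf(z)             # area beyond z in one tail
two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # area beyond |z| in both tails
print(round(one_tailed, 4), round(two_tailed, 4))  # 0.0359 0.0719
```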

Steps in a Test of Hypothesis

1. Define the problem: determine $H_0$ and $H_A$, and select $\alpha$.
2. Collect data.
3. Calculate $\bar{x}$ as an estimate of $\mu$ and $s$ as an estimate of $\sigma$.
4. Check assumptions: the sample size $n$ is reasonably large ($n \ge 30$), so the normal distribution can be used and $\sigma$ can be estimated by $s$. Check for outliers or strong skewness in the population distribution.
5. Calculate the standard score (test statistic).
6. Compare it with the tabulated (critical) value.
7. Make conclusions in the context of the problem.

If your statistic is higher than the critical value from the table (in absolute value, for a two-tailed Z or t test):
Your finding is significant.
You reject the null hypothesis. A difference or relationship this large would be unlikely to arise by chance if the null hypothesis were true, and p is less than the critical alpha level ($p < \alpha$).

If your statistic is lower than the critical value from the table (in absolute value, for a two-tailed Z or t test):
Your finding is not significant.
You fail to reject the null hypothesis. A difference or relationship of this size could plausibly have arisen by chance, and p is greater than the critical alpha level ($p > \alpha$).
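The seven steps above can be sketched for a large-sample, two-tailed test of $H_0: \mu = \mu_0$. The data summary ($n = 36$, $\bar{x} = 52$, $s = 6$, $\mu_0 = 50$), $\alpha = 0.05$, and the helper name `one_sample_z_test` are hypothetical; the critical value comes from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, s, n, mu0, alpha=0.05):
    """Large-sample (n >= 30) two-tailed test of H0: mu = mu0 vs HA: mu != mu0."""
    z = (xbar - mu0) / (s / sqrt(n))              # step 5: standard score
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # step 6: tabulated value
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > z_crit                      # equivalent to p < alpha
    return z, z_crit, p, reject

# Hypothetical data summary: n = 36, xbar = 52, s = 6; H0: mu = 50
z, z_crit, p, reject = one_sample_z_test(xbar=52, s=6, n=36, mu0=50)
print(round(z, 2), round(z_crit, 2), round(p, 4), reject)  # 2.0 1.96 0.0455 True
```

Here $|z| = 2.0$ exceeds 1.96, so the null hypothesis is rejected at $\alpha = 0.05$.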

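Partition of Total Sum of Squares in a Randomized Block Design (RBD)

$$SST = SSC + SSR + SSE$$

SST: total sum of squares
SSC: treatment (column) sum of squares
SSR: sum of squares for blocks (rows)
SSE: error sum of squares

In a completely randomized design, SST = SSC + SSE; in an RBD, the block sum of squares SSR is separated out of that error term.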
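The partition can be verified numerically. This sketch computes each sum of squares separately for a hypothetical 3 x 3 table (rows are blocks, columns are treatments), with SSE computed from residuals rather than by subtraction, so the identity check is meaningful. The helper name `rbd_sums_of_squares` is illustrative.

```python
def rbd_sums_of_squares(table):
    """Sums of squares for a randomized block design.

    table[i][j] is the observation for block i (row) and treatment j (column).
    """
    b, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (b * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / b for j in range(c)]

    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssc = b * sum((m - grand) ** 2 for m in col_means)  # treatments
    ssr = c * sum((m - grand) ** 2 for m in row_means)  # blocks
    sse = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(b) for j in range(c))      # residuals
    return sst, ssc, ssr, sse

# Hypothetical data: 3 blocks x 3 treatments
data = [[10, 12, 14],
        [9, 11, 16],
        [11, 13, 15]]
sst, ssc, ssr, sse = rbd_sums_of_squares(data)
print(round(sst, 3), round(ssc + ssr + sse, 3))  # the two totals agree
```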

Regression and Correlation

Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable. Correlation is a measure of the degree of relatedness of two variables.

Simple Regression Analysis

Bivariate (two variables) linear regression: the most elementary regression model
Dependent variable: the variable to be predicted, usually called Y
Independent variable: the predictor or explanatory variable, usually called X

Regression Models
Deterministic regression model: $Y = \beta_0 + \beta_1 X$

Probabilistic regression model: $Y = \beta_0 + \beta_1 X + \epsilon$
$\beta_0$ and $\beta_1$ are population parameters
$\beta_0$ and $\beta_1$ are estimated by sample statistics $b_0$ and $b_1$

Equation of the Simple Regression Line


$\hat{Y} = b_0 + b_1 X$, where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of $Y$

Least Squares Analysis


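$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}$$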

Least Squares Analysis


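$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}$$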
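The computational formulas translate directly into code. This sketch uses a small hypothetical data set and also computes the correlation coefficient $r = SS_{XY}/\sqrt{SS_{XX}\,SS_{YY}}$, where $SS_{YY}$ is defined analogously to $SS_{XX}$; the helper name `least_squares` is illustrative.

```python
from math import sqrt

def least_squares(xs, ys):
    """Slope b1, intercept b0 and correlation r via the SS computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    r = ss_xy / sqrt(ss_xx * ss_yy)  # correlation coefficient
    return b0, b1, r

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r = least_squares(xs, ys)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X, r = {r:.3f}")  # Y-hat = 2.20 + 0.60 X, r = 0.775
```

For these data, $SS_{XY} = 6$ and $SS_{XX} = 10$, so $b_1 = 0.6$ and $b_0 = 4 - 0.6 \times 3 = 2.2$.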

Parametric vs Nonparametric Statistics

Parametric statistics are statistical techniques based on assumptions about the population from which the sample data are collected:
They assume that the data being analyzed are randomly selected from a normally distributed population.
They require quantitative measurement that yields interval or ratio level data.

Nonparametric statistics are based on fewer assumptions about the population and the parameters:
They are sometimes called distribution-free statistics.
A variety of nonparametric statistics are available for use with nominal or ordinal data.

Examples: runs test, Mann-Whitney, Kolmogorov-Smirnov, chi-square, Kruskal-Wallis, etc.
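As one nonparametric example, the Mann-Whitney U statistic can be computed by pair counting: for each pair (x from sample A, y from sample B), score 1 if x > y and 1/2 for a tie. This sketch computes only the statistic for hypothetical samples; the p-value would still come from a U table or a normal approximation. The helper name `mann_whitney_u` is illustrative.

```python
def mann_whitney_u(xs, ys):
    """U statistic for sample xs: pairs with x > y, ties counted as 1/2."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical samples from two unpaired groups
group_a = [1, 3, 5]
group_b = [2, 4, 6, 8]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(u_a, u_b)  # 3.0 9.0; U_a + U_b always equals n_a * n_b = 12
```

Only the ordering of the values matters, which is why the test needs no normality assumption.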

Which Test to use?


Goal | Measurement (from Gaussian population) | Rank, score, or measurement (from non-Gaussian population)
Describe one group | Mean, SD | Median, interquartile range
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test
Compare two paired groups | Paired t test | Wilcoxon test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test

Web based Decision Tree to choose a Statistical test

http://www.edu.rcsed.ac.uk/statistics/A%20simple%20algorithm%20to%20help%20decide%20the%20statistical%20test%20to%20use.htm

Applications of statistics:
Statistical quality control
Simulation
Six Sigma

Checklist for a Statistical Project (1)

Statement of purpose/question of interest
Summary of data collection
  e.g., random sample, stratified sample, available data
  Identify possible sources of bias
  Why do you believe the sample was representative?
Summarize the data (concise, well-labeled, easy to read)
  Numerical (quantitative) data
    Graphs: e.g., histogram
    Measures of central tendency (e.g., mean or median)
    Measures of spread (e.g., range, SD, IQR)
    A check for outliers (e.g., z-scores)
    A check for normality (probability plot, 68-95-99.7 rule) if needed by your analysis
  Categorical (qualitative) data
    Graphs: pie chart or bar graph
    Proportion in each category

Checklist for a Statistical Project (2)

Statistical inference
  Quantitative data: e.g., confidence intervals for mean(s), hypothesis tests for mean(s), regression, ANOVA
  Qualitative data
  Include a discussion of why your method is appropriate
Diagnostics
  Verification of any assumptions made during statistical inference
Interpretation/explanation of results
  What does it all mean? Use the above summaries to justify your interpretation
  Suggest reasons for what you have observed
Overall conclusion, recommendations, future questions

Observation..
The objective of all experimental design, as well as of statistical methods in general, is to get the greatest amount of accurate information for the outlay of manpower, time, and money. Without a working knowledge of statistical methods no analyst can expect to reach that goal.


Statistics about the course MEL761


L-T-P structure: 3-0-2
Registered students: 27
Lab sessions: 12
Quizzes: 3
Minors: 2
Mini project: 1
Major: 1
