
AMITY

We nurture talent
Quantitative Techniques in
Management
ADL-07

Arun Sharma
Professor
Amity Institute of Information Technology
Amity University
---Uttar Pradesh---
arunsharma@aiit.amity.edu
Quantitative Techniques in
Management

• Statistics

• Optimization Techniques
Statistics

•Measures of Central Tendency

•Correlation Regression

•Probability

•Sampling Theory and Distribution

•Decision Theory
Quantitative Techniques and Decision
Making
Since the complexity of the business environment makes
the process of decision making difficult, the decision
maker cannot rely only on his observations, experience
or evaluation to make a decision.

Decisions have to be made upon data/quantities which
show relationships, indicate trends and show rates of
change in various relevant variables.
What is statistics?

“…the collection and analysis of numerical data
in large quantities.” – Oxford English Dictionary

“The mathematics of the collection,
organization, and interpretation of numerical
data, especially the analysis of population
characteristics by inference from sampling.” –
American Heritage Dictionary

“Statistics: the mathematical theory of
ignorance.” – Morris Kline
What is statistics? Contd…

• “Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write.”
– H.G. Wells

• “A set of procedures and rules for reducing large
masses of data into manageable proportions
allowing us to draw conclusions from those data.”
– McCarthy
Why statistics?
• It presents the data in a definite form.
• It simplifies complex mass of data.
• It classifies numerical facts.
• It furnishes techniques to compare.
• It helps to interpret conditions for better
decision making.

…Therefore we collect samples as proxies of
the greater population of individuals or items
that make up the phenomenon we are
interested in.
Use of Statistics
e.g. in business

Statistical information is needed from the time the business is
launched (financial/production plan of the proposed
unit) till the time of its exit (marketing plan).

All the factors that are likely to affect judgment on these
matters are quantitatively weighted and statistically
analyzed before taking any decision.
Limitation of Statistics

• Statistical results are only approximations and may sometimes
result in poor decisions.
• It deals only with aggregates of facts, and no importance is
attached to individual items. Therefore, it is well suited only
where group characteristics are to be studied.
• It deals only with those problems which are capable of being
quantitatively measured and numerically expressed.
Subjects like intelligence and health are not
directly measurable and hence are not suitable for statistical
analysis.
Two Types of Statistics

• Descriptive statistics of a POPULATION

• Inferential statistics of SAMPLES from a population.


Descriptive Statistics
• Definition: Quantitative methods of
organizing, summarizing, and presenting
numerical data in an informative way.
• Describe the overall characteristics of a
sample (and hence the population?)
• Transform raw data into more easily
understood forms
• Central tendency – “average” character of the
data.
• Relevant notation (Greek):
µ mean; N population size; ∑ sum
Inferential (Analytical)
Statistics
•Definition: The branch of statistics used to
make inferences or judgments about a larger
population based on the data collected from a
smaller sample drawn from the population.

•Assumptions are made that the sample


reflects the population in an unbiased form.
Roman Notation:
X̄ mean; n sample size; ∑ sum
Basic Terms

Measurement – assignment of a number to something
Data – collection of measurements
Sample – collected data
Population – all possible data
Variable – a property with respect to which data from a
sample differ in some measurable way.
Samples
• Definition: A subset of the target population
Random Samples:
– The individuals in the samples are randomly
selected
– Each member of the population has a known, but
possibly non-equal, chance of being included in the
sample
Independent Samples:
– A sample should have no effect on and is not
affected by other samples selected from the same
population, or from different populations
Types of Variables
Independent Variable – controlled or manipulated by the
researcher; causes a change in the dependent variable
(x-axis).

Dependent Variable – the variable being measured (y-axis).

Discrete Variable – can take only distinct, separate values
(e.g. counts).

Continuous Variable – can assume any value within a range.


Grouped Data

Large quantities of data are easier to handle if we
group them in a frequency table. Once the data are
grouped, the exact values of the mean, median
and mode can no longer be calculated, so alternative
methods of analyzing the data have to be used.
An estimate for the mean can be obtained by
assuming that each of the raw data values takes the
midpoint value of the interval in which it has been
placed.
Grouped Data contd..
Data is grouped into 8 class intervals of width 5.

Number of laps   Frequency (f)
1 – 5              2
6 – 10             9
11 – 15           15
16 – 20           20
21 – 25           17
26 – 30           25
31 – 35            2
36 – 40            1
Frequency Distribution
• The spread of data along its range
– Either a mathematical description
– And/or a visual description…
• To create a frequency histogram
– Define the categories, intervals or classes
– Count the number of measurements that fall
into each class
– Plot classes along x-axis
– Plot the counts (frequencies) on y-axis
Frequency Distribution

[Figure: histogram, "Grades for 1st Stats Practical (1991-2002)"; x-axis: Grade (in percent), Variable X, 25 to 85; y-axis: Frequency, 0 to 200]
Measures of Central Tendency

– Mean, Median, Mode, Range, Standard
Deviation, Variance, Min, Max, etc.
Measures of Central Tendency

These measures tap into the average distribution of a set
of scores or values in the data.
–Mean
–Median
–Mode
Mean
The “mean” of some data is the average score or value,
such as the average age of students or average weight of
professors who like to eat donuts.

Inferential mean of a sample: X̄ = (∑X)/n
Mean of a population: µ = (∑X)/N
Find the Mean
• Let’s try it! Find the mean for the following numbers
5, 3, 8, 7, 4, 3, 5

• Hint : To find the mean, add all the data and divide by
the number of data

35 ÷ 7 = 5

The mean is 5.
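As a minimal sketch, the same calculation can be written in Python. The helper name `mean` is chosen here for illustration and is not part of the slides.

```python
# Data from the "Find the Mean" slide
data = [5, 3, 8, 7, 4, 3, 5]

def mean(values):
    """Arithmetic mean: add all the data and divide by the number of data."""
    return sum(values) / len(values)

print(sum(data))   # 35
print(mean(data))  # 5.0
```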
Problem of being “mean”

• The main problem associated with the mean value
of some data is that it is sensitive to outliers.

• Example: the average weight of political science
professors might be affected if there was one in the
department who weighed 600 pounds.
Heavy Weight Professors

Professor        Weight   Weight (with outliers)
Schmuggles        165      165
Bopsey            213      213
Pallitto          189      410
Homer             187      610
Schnickerson      165      165
Levin             148      148
Honkey-Doorey     251      251
Zingers           308      308
Boehmer           151      151
Queenie           132      132
Googles-Boop      199      199
Calzone           227      227
Mean              194.6    248.3
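The effect of the two inflated weights on the mean can be checked with a short Python sketch. The list names are illustrative, and the weights are copied from the table in the order the professors are listed.

```python
from statistics import mean

# Original weights, in the order the professors are listed
original = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Same data with two values replaced by outliers (Pallitto, Homer)
with_outliers = list(original)
with_outliers[2] = 410
with_outliers[3] = 610

mean_original = mean(original)
mean_outliers = mean(with_outliers)

print(f"{mean_original:.2f}")  # 194.58
print(f"{mean_outliers:.2f}")  # 248.25
```

Changing only two of the twelve values moves the mean by more than 50 pounds.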
Grouped Data
First find the midpoint of each class. Then
multiply it by the frequency. Find the totals, and
then divide to find the estimate for the mean.

Number of laps   Frequency (f)   Midpoint (x)   f × x
1 – 5              2               3               6
6 – 10             9               8              72
11 – 15           15              13             195
16 – 20           20              18             360
21 – 25           17              23             391
26 – 30           25              28             700
31 – 35            2              33              66
36 – 40            1              38              38
Total           ∑f = 91                        ∑fx = 1828

Mean estimate = 1828/91 = 20.1 laps
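The midpoint method can be sketched in Python as follows. The tuple layout `(lower, upper, frequency)` and the variable names are assumptions made for this illustration.

```python
# Each class as (lower limit, upper limit, frequency), from the laps table
classes = [
    (1, 5, 2), (6, 10, 9), (11, 15, 15), (16, 20, 20),
    (21, 25, 17), (26, 30, 25), (31, 35, 2), (36, 40, 1),
]

total_f = sum(f for _, _, f in classes)
# Treat every value in a class as if it sat at the class midpoint
total_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
mean_estimate = total_fx / total_f

print(total_f)                  # 91
print(total_fx)                 # 1828.0
print(f"{mean_estimate:.1f}")   # 20.1
```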
The Median
• Because the mean can be sensitive to
extreme values, the median is sometimes a more
useful and more representative measure.

• The median is simply the middle value among
the ranked scores of a variable (there is no standard
formula for its computation).
Median

Rank order and choose the middle value.
If the number of values is even, then average the two in the middle.

Professor        Weight   Weights in rank order
Schmuggles        165      132
Bopsey            213      148
Pallitto          189      151
Homer             187      165
Schnickerson      165      165
Levin             148      187
Honkey-Doorey     251      189
Zingers           308      199
Boehmer           151      213
Queenie           132      227
Googles-Boop      199      251
Calzone           227      308
Mean              194.6
Median
• For an even number of values…

Raw data   Sorted data
 4           1
 2           2
 5           4  ← Median
 1           5  (4 + 5) / 2 = 4.5
 7           7
10          10
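A small Python sketch of the rank-and-pick-the-middle rule, covering both the odd and the even case. The function name `median` is illustrative; Python's standard library offers `statistics.median` with the same behaviour.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle
    values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

raw = [4, 2, 5, 1, 7, 10]
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

print(median(raw))      # 4.5
print(median(weights))  # 188.0 (average of 187 and 189)
```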
Grouped Data
Find the class which contains the median value:

∑f = 91, so the median is the (91 + 1)/2 = 46th value.

Number of laps   Frequency (f)   Cum. frequency
1 – 5              2               2
6 – 10             9              11
11 – 15           15              26
16 – 20           20              46
21 – 25           17              63
26 – 30           25              88
31 – 35            2              90
36 – 40            1              91

The 46th data value is in the 16 – 20 class.


Median for Grouped Data

\[ \text{Median} = L + \frac{N/2 - C}{f} \times i \]

L: Lower limit of the median class
N: Sum of all the frequencies
C: Cumulative frequency of the class preceding the
median class
f: Frequency of the median class
i: Class interval
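As a hedged sketch, the formula can be applied to the laps data in Python. The function name and the tuple layout are assumptions for illustration, and L is taken as the stated lower limit of the class (16); using the class boundary 15.5 instead would lower the result by 0.5.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency) in ascending
    order, with equal class widths. Returns L + (N/2 - C) / f * i."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]   # class interval i
    cumulative = 0                          # C, built up class by class
    for lower, _, f in classes:
        if cumulative + f >= n / 2:         # this is the median class
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

laps = [
    (1, 5, 2), (6, 10, 9), (11, 15, 15), (16, 20, 20),
    (21, 25, 17), (26, 30, 25), (31, 35, 2), (36, 40, 1),
]

# L = 16, N = 91, C = 26, f = 20, i = 5
print(f"{grouped_median(laps):.3f}")  # 20.875
```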
The Mode
• The most frequent response or value for a
variable.

[Figure: histogram of Variable X (25 to 85) against Frequency (0 to 200), with the tallest bar labelled "Modal Class"]
Figuring the Mode

Professor        Weight
Schmuggles        165
Bopsey            213
Pallitto          189
Homer             187
Schnickerson      165
Levin             148
Honkey-Doorey     251
Zingers           308
Boehmer           151
Queenie           132
Googles-Boop      199
Calzone           227

What is the mode?
Answer: 165

Important descriptive information that may help
inform your research and diagnose problems like lack
of variability.
Grouped Data
Find the modal class:

Number of laps   Frequency (f)
1 – 5              2
6 – 10             9
11 – 15           15
16 – 20           20
21 – 25           17
26 – 30           25   ← Modal class
31 – 35            2
36 – 40            1

Modal class: 26 – 30
Mode for Grouped Data

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2} \times i \]

L: Lower limit of the modal class
d1: Difference between the frequency of the modal class
and the frequency of the class before it
d2: Difference between the frequency of the modal class
and the frequency of the class after it
i: Class interval
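A minimal Python sketch of this formula applied to the laps data. The function name and tuple layout are illustrative assumptions, a missing neighbouring class is treated as having frequency 0, and L is taken as the stated lower limit of the modal class (26).

```python
def grouped_mode(classes):
    """classes: list of (lower limit, upper limit, frequency) in ascending
    order, with equal class widths. Returns L + d1 / (d1 + d2) * i."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))             # position of the modal class
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    width = classes[1][0] - classes[0][0]   # class interval i
    return classes[k][0] + d1 / (d1 + d2) * width

laps = [
    (1, 5, 2), (6, 10, 9), (11, 15, 15), (16, 20, 20),
    (21, 25, 17), (26, 30, 25), (31, 35, 2), (36, 40, 1),
]

# L = 26, d1 = 25 - 17 = 8, d2 = 25 - 2 = 23, i = 5
print(f"{grouped_mode(laps):.2f}")  # 27.29
```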
Percentiles
• If we know the median, then we can go up or
down and rank the data as being above or
below certain thresholds.

• You may be familiar with percentiles from standardized tests:
if you scored at the 90th percentile, your score was higher than 90%
of the rest of the sample.
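One simple way to express this idea in code is a percentile rank: the percentage of scores that fall below a given score. The sketch below uses made-up test scores, and the function name `percentile_rank` is hypothetical. Other percentile definitions exist and can give slightly different answers.

```python
def percentile_rank(scores, score):
    """Percentage of the scores that are strictly lower than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores, for illustration only
scores = [52, 60, 61, 67, 70, 74, 78, 81, 85, 93]

print(percentile_rank(scores, 93))  # 90.0
print(percentile_rank(scores, 74))  # 50.0
```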
Measures of Dispersion
• Measures of dispersion tell us about variability
in the data. Also univariate.

• Basic question: how much do the values of a
variable differ, from the min to the max, and how far
apart are the scores in between? We use:
– Range
– Standard Deviation
– Variance
Positive Skewness
Mean > Median > Mode

Negative Skewness
Mean < Median < Mode
Measures of dispersion give us information about how
much our variables vary from the mean. Dispersion is
also known as the spread or range of variability.
The Range

r = h – l
– Where h is the highest value and l is the lowest value

• In other words, the range gives us the distance between the
minimum and maximum values of a variable.

• Understanding this statistic is important in understanding your
data, especially for management and diagnostic purposes.
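For the professor weights used earlier, the range can be sketched in Python as below (variable names are illustrative).

```python
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

h = max(weights)  # highest value
l = min(weights)  # lowest value
r = h - l

print(h, l, r)  # 308 132 176
```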
The Standard Deviation

• A standardized measure of distance from the mean.

• Very useful and something you do read about when
making predictions or other statements about the data.
Formula for Standard Deviation

\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

√ = square root
∑ = sum (sigma)
X = score for each point in data
X̄ = mean of scores for the variable
n = sample size (number of observations or cases)
Professor        X      X − mean   (X − mean)²
Schmuggles      165      -29.6        875.2
Bopsey          213       18.4        339.2
Pallitto        189       -5.6         31.2
Homer           187       -7.6         57.5
Schnickerson    165      -29.6        875.2
Levin           148      -46.6       2170.0
Honkey-Doorey   251       56.4       3182.8
Zingers         308      113.4      12863.3
Boehmer         151      -43.6       1899.5
Queenie         132      -62.6       3916.7
Googles-Boop    199        4.4         19.5
Calzone         227       32.4       1050.8

Mean = 194.6;  ∑(X − mean)² / (n − 1) = 2480.1;  S = 49.8
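The table can be reproduced with a short Python sketch that follows the formula step by step. Variable names are illustrative; `statistics.stdev` from the standard library computes the same sample standard deviation.

```python
import math

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

n = len(weights)
mean = sum(weights) / n

# Squared distance of each score from the mean
squared_devs = [(x - mean) ** 2 for x in weights]

# Divide by n - 1 (sample formula), then take the square root
variance = sum(squared_devs) / (n - 1)
sd = math.sqrt(variance)

print(f"{mean:.1f}")      # 194.6
print(f"{variance:.1f}")  # 2480.1
print(f"{sd:.1f}")        # 49.8
```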
Variance

Variance = S², the square of the standard deviation
(written σ² for a population)
Organizing and Graphing Data
Goal of Graphing?

1. Presentation of Descriptive Statistics

2. Presentation of Evidence

3. Some people understand subject matter better
with visual aids

4. Provide a sense of the underlying data
generating process (scatter-plots)
What is the Distribution?
• Gives us a picture of
the variability and
central tendency.

• Can also show the
amount of skewness.
Graphing Data: Types
Creating Frequencies
• We create frequencies by sorting data by
value or category and then summing the
cases that fall into those values.
Ranking of Donut-eating Profs. (most to least)

Zingers          308
Honkey-Doorey    251
Calzone          227
Bopsey           213
Googles-Boop     199
Pallitto         189
Homer            187
Schnickerson     165
Schmuggles       165
Boehmer          151
Levin            148
Queenie          132
Here we have placed the professors into
weight classes and depicted them with a histogram in
columns.

[Figure: column histogram, "Weight Class Intervals of Donut-Munching Professors"; x-axis: weight classes 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+; y-axis: Number, 0 to 3.5]
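The counting behind such a histogram can be sketched in Python: sort each weight into its class and tally the cases. The class limits come from the chart axis, while the helper name `class_label` and the text-based bars are illustrative choices.

```python
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Weight classes as (lower, upper), taken from the chart axis
bins = [(130, 150), (151, 185), (186, 210), (211, 240), (241, 270), (271, 310)]

def class_label(weight):
    """Return the label of the weight class that contains `weight`."""
    for lower, upper in bins:
        if lower <= weight <= upper:
            return f"{lower}-{upper}"
    if weight >= 311:
        return "311+"
    raise ValueError(f"{weight} is below the lowest class")

# Start every class at zero so that empty classes still appear
counts = {f"{lower}-{upper}": 0 for lower, upper in bins}
counts["311+"] = 0
for w in weights:
    counts[class_label(w)] += 1

# Plain-text histogram: one '#' per professor
for label, count in counts.items():
    print(f"{label:>8} {'#' * count}")
```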
Here is the same histogram depicted
as a horizontal bar graph.

[Figure: horizontal bar graph, "Weight Class Intervals of Donut-Munching Professors"; y-axis: weight classes 130-150 to 311+; x-axis: Number, 0 to 3.5]

Pie Charts:
Proportions of Donut-Eating Professors by Weight Class

[Figure: pie chart with one slice per weight class: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]

See Excel for other options!


Line Graphs: A Time Series

[Figure: line graph of economic approval over time; x-axis: Month, 1981 to 2001; y-axis: Approval, 0 to 100]
Correlation and Regression
Correlation
• Correlation
– A measure of association between two
numerical variables.
• Example (positive correlation)
– Typically, in the summer as the temperature
increases people are thirstier.
Specific Example
For seven random summer days, a person recorded the
temperature and their water consumption during a
three-hour period spent outside.

Temperature (F)   Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48
How would you describe the graph?
Measuring the Relationship
Pearson’s Sample Correlation Coefficient, r,
measures the direction and the
strength of the linear association
between two numerical paired
variables.
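Calculation of r

\[ r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

sx = standard deviation of x’s
sy = standard deviation of y’s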
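As a hedged sketch, r can be computed for the temperature and water data in Python. It assumes the usual standardized-product form of Pearson's r, built from the sample standard deviations sx and sy; the function names are illustrative.

```python
import math

temperature = [75, 83, 85, 85, 92, 97, 99]   # degrees F
water = [16, 20, 25, 27, 32, 48, 48]          # ounces

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """Average product of standardized x and y values, dividing by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

r = pearson_r(temperature, water)
print(f"{r:.3f}")  # 0.963
```

A value this close to 1 indicates a strong positive linear association between temperature and water consumption.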


Strength of Linear Association

r value   Interpretation
 1        perfect positive linear relationship
 0        no linear relationship
-1        perfect negative linear relationship
Regression
• Regression
– Specific statistical methods for finding the
“line of best fit” for one response
(dependent) numerical variable based on one
or more explanatory (independent)
variables.
Curve Fitting vs. Regression
• Regression
– Includes using statistical methods to
assess the "goodness of fit" of the model
(e.g. the correlation coefficient).
Simple Linear Regression

• Statistical method for finding
– the “line of best fit”
– for one response (dependent) numerical variable
– based on one explanatory (independent) variable.


Least Squares Regression
• GOAL - minimize the sum of the squares of the
errors of the data points.

This minimizes the Mean Square Error.

y = ax + b
The values ‘a’ and ‘b’ of the linear equation
are obtained by solving the normal
equations.
x : x1 x2 x3 . . . xn
y : y1 y2 y3 . . . yn
• Solution:
• We fit a straight line of the form y = ax + b
according to the Least Squares Principle, i.e.
the sum of squares of the differences between
the observed values and the values given by the line is least.
Writing the equation of the line at each data point, and the same equation multiplied by xi:

\[ y_i = b + a x_i, \qquad x_i y_i = b x_i + a x_i^2, \qquad i = 1, 2, \dots, n \]

• Two equations (summing over all n points):

\[ \sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i + n b \]

\[ \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i \]
By solving these two equations, we obtain the
values of the parameters a and b, and hence the
best straight line.
An Illustrative example:
Fit a straight line of the form y = ax + b to the
data given below:
x: 6 2 10 4 8
y: 9 11 5 8 7
Solution: As we have to fit a line of the form
y = ax + b, the two normal equations are
\[ \sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i + n b \]

and

\[ \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i \]
 xi    yi    xi²    xi·yi
  6     9     36     54
  2    11      4     22
 10     5    100     50
  4     8     16     32
  8     7     64     56

∑xi = 30    ∑yi = 40    ∑xi² = 220    ∑xi·yi = 214
The two normal equations are:
30a + 5b = 40
220a + 30b = 214
Solving these two equations yields
a = -0.65 and b = 11.9. Therefore, the
required straight line fitting the given
data is y = -0.65x + 11.9.
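The fit can be checked with a short Python sketch that builds the four sums and solves the two normal equations in closed form. Variable names are illustrative, with a as the slope and b as the intercept of y = ax + b. Substituting the result back into both normal equations is a quick sanity check.

```python
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

n = len(x)
sum_x = sum(x)                                   # 30
sum_y = sum(y)                                   # 40
sum_xx = sum(xi * xi for xi in x)                # 220
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 214

# Normal equations:
#   sum_y  = a * sum_x  + n * b
#   sum_xy = a * sum_xx + b * sum_x
# Eliminating b gives the slope, then the first equation gives b.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(f"a = {a:.2f}, b = {b:.2f}")  # a = -0.65, b = 11.90

# Both normal equations should hold for the solution
assert abs(a * sum_x + n * b - sum_y) < 1e-9
assert abs(a * sum_xx + b * sum_x - sum_xy) < 1e-9
```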
Thanks
