
16/2016


MBF-122
Quantitative Methods in Finance
by
Dr. Maw Maw Khin
Professor/Head
Department of Statistics
Yangon University of Economics

Learning Objectives
Understand why we study statistics.
Explain what is meant by descriptive statistics and inferential statistics.
Distinguish between a quantitative variable and a qualitative variable.
Describe how a discrete variable differs from a continuous variable.
Distinguish among the nominal, ordinal, interval, and ratio levels of measurement.

Why study statistics?

Business: profits, hours worked, and wages
Psychologists: test scores
Engineers: how many units are manufactured on a particular machine?
Education: EMIS
Finance: price-earnings ratio

Why is statistics required in so many majors?

The first reason is that numerical information is everywhere: in newspapers, news magazines, business magazines, or general magazines (People), ...

e national average weekly income rose 2.5% from $598 in 2007 to $613 in
n increased just over 0.1 % during the same period.
s conducted by various organizations.

The second reason for taking a statistics course is that statistical techniques are used to make decisions that affect our daily lives. That is, they affect our personal welfare.

Insurance companies use statistical analysis to set rates for home, automobile, life, and health insurance.
A [...]-year-old female has 60.25 years of life remaining, an 87-year-old [...] 4.56 years remaining, ...
Life insurance premiums are established based on these estimates of life expectancy.

The third reason for taking a statistics course is that the knowledge of statistical methods will help you understand how decisions are made and give you a better understanding of how they affect you.
nalysis is helpful.

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting and presenting data.

A Statistic

A statistic is a single measure, reported as a number, used to summarize a sample data set. Many different measures can be used to summarize data sets.

Two Types of Statistics:

Descriptive statistics refers to the collection, organization, presentation, and summary of data (either using charts and graphs or using a numerical summary).
Inferential statistics refers to generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, and making decisions.

Figure 1: Overview of Statistics

Statistics
Branches: Collecting and Describing Data; Making Inferences from Samples
Topics: Sampling and Surveys, Visual Displays, Numerical Summaries, Probability Models, Estimating Parameters, Testing Hypotheses, Regression and Trend, Quality Control

Figure 2: Data Types

Types of Data

Categorical (qualitative)
  Verbal label: Vehicle type, X = car, truck, SUV; Gender (binary), X = male, female
  Coded: Vehicle type, X = 1, 2, 3; Gender (binary), X = 0, 1

Numerical (quantitative)
  Discrete: Broken eggs in a carton, X = 0, 1, 2, 3, ..., 12; Annual dental visits, X = 0, 1, 2, 3, ...
  Continuous: Patient waiting time, X = 14.27 minutes; Customer satisfaction, X = 85.2%

Categorical Data

Categorical data (also called qualitative) have values that are described by words rather than numbers.
For example, automobile style (X = full, midsize, compact, subcompact).

Using numbers to represent categories to facilitate statistical analysis is called coding. For example, a database might classify movies using numerical codes:
1 = Action, 2 = Classic, 3 = Comedy, 4 = Horror,
5 = Romance, 6 = Science Fiction, 7 = Western, 8 = Other.

Coding a category as a number does not make the data numerical. The codes are assigned arbitrarily, and the codes generally do not imply a ranking.
However, sometimes codes do imply a ranking:
1 = Bachelor's, 2 = Master's, 3 = Doctorate

Rankings may exist if we are measuring an underlying continuum, such as political orientation:
1 = Liberal, 2 = Moderate, 3 = Conservative

A binary variable has only two values, indicating the presence (1) or absence (0) of a characteristic of interest. For example, for an individual:
Employment
1 = employed
0 = not employed
Education
1 = college graduate
0 = not college graduate
Marital status
1 = currently married
0 = not currently married
The codes are arbitrary. A variable like gender could be coded in many ways:

Like this
1 = female
0 = male
Or like this
0 = female
1 = male
Or like this
1 = female
2 = male

The coding itself has no numerical meaning, so binary variables are categorical data.
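
The point that codes are arbitrary labels can be checked with a small sketch. The list of responses below is hypothetical (it is not from the slides), and the variable names are illustrative only. Counting gives the same answer under two of the coding schemes listed above, while the raw numbers themselves change.

```python
from collections import Counter

# Hypothetical responses coded with the first scheme: 1 = female, 0 = male.
gender_codes = [1, 0, 0, 1, 1, 0, 1]
labels = {1: "female", 0: "male"}

# Counting categories is the meaningful operation on coded data.
counts = Counter(labels[c] for c in gender_codes)

# Recode with the third scheme: 1 = female, 2 = male.
recode = {1: 1, 0: 2}
labels_alt = {1: "female", 2: "male"}
recoded = [recode[c] for c in gender_codes]
counts_alt = Counter(labels_alt[c] for c in recoded)

# The frequencies agree; the sums of the codes do not, which is why
# arithmetic on the codes carries no meaning.
print(counts, counts_alt)
print(sum(gender_codes), sum(recoded))
```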

Numerical Data

Numerical or quantitative data arise from counting, measuring something, or from some kind of mathematical operation. For example:
Sales for last quarter (e.g., X = $4,920)

Measurement Scales
1. Nominal Measurement

Nominal measurement is the weakest level of measurement and the easiest to recognize.
Nominal data (from Latin nomen meaning "name") merely identify a category. "Nominal" data are the same as "qualitative," "categorical," or "classification" data.
We usually code nominal data numerically. However, the codes are arbitrary placeholders with no numerical meaning, so it is improper to perform mathematical analysis on them.

For example, we would not calculate an average using the laptop data (1 through 12). With nominal data, the only permissible mathematical operations are counting (e.g., frequencies) and a few simple statistics such as the mode.
For example, the following survey question yields nominal data:
What kind of laptop do you own?
1. [...]r  2. Apple  3. Compaq  4. Dell  5. Gateway  6. HP
7. [...]  8. [...]ron  9. Sony  10. Toshiba  11. Other  12. None

2. Ordinal Measurement

Ordinal data codes imply a ranking of data values.
For example:
What size automobile do you usually drive?
1. [...]size  2. Compact  3. Subcompact

Thus, a 2 (Compact) implies a larger car than a 3 (Subcompact).
Like nominal data, these ordinal numerical codes lack the properties that are required to compute many statistics, such as the average.
Specifically, there is no clear meaning to the distance between 1 and 2, or between 2 and 3.
Ordinal data can be treated as nominal, but not vice versa.
Ordinal data are especially common in social sciences, marketing, and human resources research.
There are many useful statistical tests for ordinal data.

3. Interval Measurement

The next step up the measurement scale is interval data, which not only is a rank but also has meaningful intervals between scale points.
Examples are the Celsius or Fahrenheit scales of temperature.
Since intervals between numbers represent distances, we can do mathematical operations such as taking an average. But because the zero point of these scales is arbitrary, we can't say that 60°F is twice as warm as 30°F or that 30°F is 50 percent warmer than 20°F.
That is, ratios are not meaningful for interval data.
The absence of a meaningful zero is a key characteristic of interval data.
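
One way to see why ratios fail on an interval scale is to convert the same two temperatures to another scale with a different zero. The sketch below uses 60°F and 30°F as illustrative readings and the standard Fahrenheit-to-Celsius conversion; the function name is an assumption made for this example.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9

warm_f, cool_f = 60, 30
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio depends on where the arbitrary zero sits, so it is not meaningful.
ratio_f = warm_f / cool_f   # 2.0 on the Fahrenheit scale
ratio_c = warm_c / cool_c   # a completely different (negative) value in Celsius

# Differences survive the change of scale up to the constant factor 5/9.
diff_f = warm_f - cool_f
diff_c = warm_c - cool_c

print(ratio_f, ratio_c)
print(diff_f, diff_c)
```

The difference of 30 Fahrenheit degrees maps to the same physical gap in Celsius, which is why averages and differences are acceptable for interval data while ratios are not.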

4. Ratio Measurement

Ratio measurement is the strongest level of measurement.
Ratio data have all the properties of the other three data types, but in addition possess a meaningful zero that represents the absence of the quantity being measured.
Because of the zero point, ratios of data values are meaningful (e.g., $20 million in profit is twice as much as $10 million).

Balance sheet data, income statement data, financial ratios, physical counts, scientific measurements, and most engineering measurements are ratio data because zero has meaning (e.g., a company with zero sales sold nothing).
Having a zero point does not restrict us to positive data. For example, profit is a ratio variable (e.g., $4 million is twice $2 million) yet firms can have negative profit.

Figure 3: Measurement Level Illustrated

Measurement Level
Nominal: Vehicle type, X = SUV, car, truck; Binary data, X = 0, 1 (male, female)
Ordinal: Rate a new song, X = poor, OK, good, great
*Likert scale: Rate your dorm food, Very Poor 1 2 3 4 5 Very Good
Interval: Temperature (°F), X = 72.3°F
Ratio: Weekly pay, X = $457.14; Annual dental visits, X = 0, 1, 2, 3, ...

Likert Scales

Scales are used to collect information on attitudes, including degree of agreement or disagreement, frequency of use, importance of an issue, quality, and likelihood.
The Likert scale is a special case that is frequently used in survey research.
Usually, a statement is made and the respondent is asked to indicate his or her agreement/disagreement on a five-point or seven-point scale using verbal anchors.
The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

Statistics is a difficult subject.

Strongly agree
Slightly agree
Neither agree nor disagree
Slightly disagree
Strongly disagree

Describing Data
I. Pictures of Data

Histogram
Stem-and-leaf display
Pictures of one categorical variable: bar charts and pie chart
Graphs for numerical data: scatter plot
Sequential data: time series plots
Categorical data that come in pairs: contingency tables
Reporting percentages

Histogram

A histogram is a graphical representation of a frequency distribution.
A histogram is a bar chart whose Y-axis shows the number of data values (or a percentage) within each bin of a frequency distribution and whose X-axis ticks show the end points of each bin.
There should be no gaps between bars (except when there are no data in a particular bin) as shown in Figure [...].

A histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

1. Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is always the vertical axis.
2. Represent the frequency on the y axis and the class boundaries on the x axis.
3. Using the frequencies as the heights, draw vertical bars for each class.

Figure 1: Histogram (vertical axis: frequency; horizontal axis: class boundary)

Figure 2: Histogram (with an outlier marked)

Frequency Polygon

A frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
The frequencies are represented by the heights of the points.
It serves the same purpose as a histogram, but is attractive when you need to compare two data sets (since more than one frequency polygon can be plotted on the same scale).

1. Find the midpoint of each class.
2. Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a suitable scale on the y axis for the frequencies.
3. Using the midpoints for the x values and the frequencies as the y values, plot the points.
4. Connect adjacent points with line segments. Draw a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located.

The frequency polygon and histogram are two different ways to represent the same data set.

Example (3)

Construct a frequency distribution for these 32 observations on the number of customers to use a downtown CitiBank ATM during the noon hour on 32 consecutive workdays, using 6 classes.
Make a histogram and a frequency polygon.

25 39 18 33 37 32 26  9 23 21 34 16 26 26 18 32
30 19 31 35 40 27 35 42 25 32 21 15 26 25 33 24

For the ATM data, as the histogram shows, the class with the greatest number of data values (11) is 20.5-26.5, followed by 6 for 26.5-32.5. The graph also has one peak with the data clustering around it.
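
The frequency distribution behind this example can be sketched in a few lines. This is a minimal sketch, assuming the 32 values are read from the run-together digits as shown in the list below (with a single 9 as the minimum) and that the first class starts at the minimum with a class width rounded up to a whole number; the variable names are illustrative. A row of asterisks stands in for the histogram bars.

```python
import math

# ATM data as read from the example (assumed split of the digit strings).
atm = [25, 39, 18, 33, 37, 32, 26, 9, 23, 21, 34, 16, 26, 26, 18, 32,
       30, 19, 31, 35, 40, 27, 35, 42, 25, 32, 21, 15, 26, 25, 33, 24]

k = 6                                              # number of classes
low = min(atm)
width = math.ceil((max(atm) - low) / k)            # (42 - 9) / 6 = 5.5, rounded up

# Class limits such as 9-14, 15-20, ...; boundaries are limits -/+ 0.5.
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]
freq = [sum(lo <= x <= hi for x in atm) for lo, hi in limits]

for (lo, hi), f in zip(limits, freq):
    midpoint = (lo + hi) / 2                       # x value for a frequency polygon
    print(f"{lo:>2}-{hi:<2}  {lo - 0.5:>4}-{hi + 0.5:<4}  mid={midpoint:<5} {'*' * f} ({f})")
```

Under these assumptions the class 21-26 (boundaries 20.5-26.5) holds 11 values, in line with the commentary on the histogram.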

Exercise (1)
These data represent the record high temperatures in °F for each of the 50 states. (i) Construct a grouped frequency distribution for the data using 7 classes. (ii) Draw a histogram. (iii) Comment on your results.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Range = 134 - 100 = 34
Width = 34/7 = 4.9 ≈ 5

Class limits   Class boundaries   Frequency   Cumulative frequency
100-104        99.5-104.5          2           2
105-109        104.5-109.5         8          10
110-114        109.5-114.5        18          28
115-119        114.5-119.5        13          41
120-124        119.5-124.5         7          48
125-129        124.5-129.5         1          49
130-134        129.5-134.5         1          50

The frequency distribution shows that the class 109.5-114.5 contains the largest number of temperatures (18), followed by the class 114.5-119.5 with 13 temperatures.
Hence, most of the temperatures (18 + 13 = 31) fall between 109.5°F and 119.5°F.

Cumulative frequencies are used to show how many data values are accumulated up to and including a specific class.
28 of the total record high temperatures are less than or equal to 114°F. Forty-eight of the total record high temperatures are less than or equal to 124°F.
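
The frequencies and cumulative frequencies for Exercise (1) can be recomputed directly from the 50 temperatures. The sketch below assumes the seven classes 100-104 through 130-134 used in the solution; names such as `limits` and `cum` are illustrative.

```python
from itertools import accumulate

# Record high temperatures (°F) from Exercise (1).
temps = [112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
         110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
         107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
         116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
         120, 113, 120, 117, 105, 110, 118, 112, 114, 114]

# Seven classes of width 5: 100-104, 105-109, ..., 130-134.
limits = [(100 + 5 * i, 104 + 5 * i) for i in range(7)]
freq = [sum(lo <= t <= hi for t in temps) for lo, hi in limits]
cum = list(accumulate(freq))   # running total up to and including each class

for (lo, hi), f, c in zip(limits, freq, cum):
    print(f"{lo}-{hi}  {lo - 0.5}-{hi + 0.5}  f={f:>2}  cum={c:>2}")
```

The running totals 28 and 48 correspond to the statements about temperatures at or below 114°F and 124°F.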

Stem and leaf plot

The number of stories in two selected samples of tall buildings in Atlanta and Philadelphia are shown.
Construct a back-to-back stem and leaf plot, and compare the distributions.

Atlanta
55 70 44 36 40
63 40 44 34 38
60 47 52 32 32
50 53 32 28 31
52 32 34 32 50
26 29

Philadelphia
61 40 38 32 30
58 40 40 25 30
54 40 36 30 30
53 39 36 34 33
50 38 36 39 32

Solution

Step 1 Arrange the data for both data sets in order.
Step 2 Construct a stem and leaf plot using the same digits as stems. Place the digits for the leaves for Atlanta on the left side of the stem and the digits for the leaves for Philadelphia on the right side, as shown.

             Atlanta | Stem | Philadelphia
               9 8 6 |  2   | 5
 8 6 4 4 2 2 2 2 2 1 |  3   | 0 0 0 0 2 2 3 4 6 6 6 8 8 9 9
           7 4 4 0 0 |  4   | 0 0 0 0
         5 3 2 2 0 0 |  5   | 0 3 4 8
                 3 0 |  6   | 1
                   0 |  7   |

Step 3 Compare the distributions. The buildings in Atlanta have a large variation in the number of stories per building. Although both distributions are peaked in the 30- to 39-story class, Philadelphia has more buildings in this class. Atlanta has more buildings that have 40 or more stories than Philadelphia does.
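
The back-to-back display can be generated from the two samples. This is a minimal sketch, assuming the Atlanta sample has 27 values and the Philadelphia sample 25, as read from the data table; the helper `leaves_by_stem` is a hypothetical name introduced here.

```python
from collections import defaultdict

atlanta = [55, 70, 44, 36, 40, 63, 40, 44, 34, 38, 60, 47, 52, 32, 32,
           50, 53, 32, 28, 31, 52, 32, 34, 32, 50, 26, 29]
philadelphia = [61, 40, 38, 32, 30, 58, 40, 40, 25, 30, 54, 40, 36, 30, 30,
                53, 39, 36, 34, 33, 50, 38, 36, 39, 32]

def leaves_by_stem(values):
    """Group the units digit (leaf) of each value under its tens digit (stem)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups

left = leaves_by_stem(atlanta)
right = leaves_by_stem(philadelphia)

rows = []
for stem in range(2, 8):
    # Left-hand leaves are written outward from the stem, so reverse them.
    left_leaves = " ".join(str(d) for d in reversed(left.get(stem, [])))
    right_leaves = " ".join(str(d) for d in right.get(stem, []))
    rows.append(f"{left_leaves:>20} | {stem} | {right_leaves}")

print("\n".join(rows))
```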

Bar Chart

A bar chart is probably the most common type of data display in business.
Attribute data is typically displayed using a bar chart.
Each bar represents a category or attribute.
The length of each bar reflects the frequency of that category.
Each bar has a label showing a category or time period.
Figure 4 shows simple bar charts comparing market shares among tire manufacturers.
Each bar is separated from its neighbors by a slight gap to improve legibility.
Vertical bar charts are the most common, but horizontal bar charts can be useful when labels are long or when there are many categories.

Figure 4: Bar Charts

U.S./Canada Original Equipment (OE) Light Vehicle Tire Market Share (percent), by manufacturer: Goodyear, Firestone, Michelin, General, BFGoodrich, Bridgestone, Uniroyal, Continental, Dunlop.
(a) Vertical bars
(b) Horizontal bars

Pie Chart

A pie graph is a circle that is divided into sections or slices according to the percentage of frequencies in each category of the distribution.

Figure 3: Pie Chart

Where Did You Buy Your Statistics Textbook?
Campus Bookstore 54%
Retail Outlet 25%
Web (e.g., Amazon) 18%
Another Student 3%

Pie Graph

An oil company wants to open a new service station to serve the [...]ent population of a city. There are four possible sites, which are in the NW, NE, SW and SE quarters of the city respectively. In an [...] survey the company stops 30 motorists in the city centre and asks which site they would be most likely to use.
The results of the survey are:

Construct a pie graph showing the blood types of the nurses in a [...]. The frequency distribution is repeated below.

Class   Frequency   Percent
A        5           20
B        7           28
O        9           36
AB       4           16
Total   25          100

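Solution

Step 1 Find the number of degrees for each class, using the formula

$\text{Degrees} = \dfrac{f}{n} \cdot 360^\circ$

For each class, then, the following results are obtained.

$A = \frac{5}{25} \cdot 360^\circ = 72^\circ$
$B = \frac{7}{25} \cdot 360^\circ = 100.8^\circ$
$O = \frac{9}{25} \cdot 360^\circ = 129.6^\circ$
$AB = \frac{4}{25} \cdot 360^\circ = 57.6^\circ$

Step 2 Find the percentages.

$A = \frac{5}{25} \cdot 100\% = 20\%$
$B = \frac{7}{25} \cdot 100\% = 28\%$
$O = \frac{9}{25} \cdot 100\% = 36\%$
$AB = \frac{4}{25} \cdot 100\% = 16\%$

Step 3 Graph each section and write its name and corresponding percentage, as shown in the following figure.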
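
The degree and percentage calculations for the pie graph can be sketched as follows, assuming the usual rule that each slice gets the share f/n of the full 360 degrees. The frequencies are the blood-type counts from the table; the dictionary names are illustrative.

```python
# Blood-type frequencies from the table (n = 25).
freq = {"A": 5, "B": 7, "O": 9, "AB": 4}
n = sum(freq.values())

# Each slice gets the fraction f / n of the full circle and of 100 percent.
degrees = {blood_type: 360 * f / n for blood_type, f in freq.items()}
percent = {blood_type: 100 * f / n for blood_type, f in freq.items()}

for blood_type in freq:
    print(f"{blood_type:>2}: {degrees[blood_type]:6.1f} degrees, {percent[blood_type]:4.0f}%")
```

The slice angles add up to 360 degrees and the percentages to 100, which is a quick check on the arithmetic.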

Figure: Blood Types for Nurses (pie graph)
Type A 20%
Type B 28%
Type O 36%
Type AB 16%

Customer  Quarter     Customer  Quarter
 1        NW          16        NW
 2        NE          17        NE
 3        SE          18        NW
 4        NW          19        SE
 5        NW          20        SW
 6        SW          21        NW
 7        NE          22        SW
 8        NE          23        SE
 9        NW          24        SW
10        SW          25        NW
11        SW          26        SW
12        NW          27        SE
13        SE          28        SE
14        SW          29        NW
15        NW          30        NE

The frequencies and relative frequencies are:

Quarter   Frequency   Relative frequency
NE         5          0.167
NW        11          0.367
SE         6          0.200
SW         8          0.267
Total     30          1.00

Categorical data can also be reported as percentages as shown below.

Quarter   Frequency   Percentage
NE         5           16.7
NW        11           36.7
SE         6           20.0
SW         8           26.7
Total     30          100
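
The tallying behind these tables can be sketched with a counter over the 30 survey responses, listed here in customer order. The variable names are illustrative.

```python
from collections import Counter

# Preferred quarter for customers 1 to 30, in order.
responses = ("NW NE SE NW NW SW NE NE NW SW SW NW SE SW NW "
             "NW NE NW SE SW NW SW SE SW NW SW SE SE NW NE").split()

counts = Counter(responses)
n = len(responses)

# Relative frequency = frequency / n; percentage = 100 * frequency / n.
relative = {q: round(counts[q] / n, 3) for q in sorted(counts)}
percentage = {q: round(100 * counts[q] / n, 1) for q in sorted(counts)}

for q in sorted(counts):
    print(f"{q}  {counts[q]:>2}  {relative[q]:.3f}  {percentage[q]:.1f}")
```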

Sequential data: time series plots

The data given below is quarterly primary fuel consumption in the UK from 1965 to 19[...].
Use statistical software to obtain a time series plot of the data. What does it tell you about fuel consumption?
Solution
The time series plot shows that the data are clearly seasonal: more energy is consumed in the winter quarter and less in the summer.

Figure: Fuel Consumption (time series plot of Consumption against Year)

Categorical data that come in pairs

For example, the data below shows some data from a survey of graduates six months after leaving education. For each graduate there are two pieces of categorical data:
whether the graduate was male or female (gender); and
whether he or she is in a permanent job, temporary job, or unemployed (employment status).

Graduate no.   Gender   Employment status
1              MALE     UNEMP
2              MALE     TEMP
3              FEMALE   TEMP
4              FEMALE   PERM
5              FEMALE   TEMP
6              MALE     PERM
7              MALE     PERM
.              .        .

As gender can take two different values and employment status three, only six pairs of values are possible and we can summarize the data with a list of the frequencies of each pair. The frequencies for all the graduates are given below.

Gender   Employment status   Frequency
MALE     PERM                170
MALE     TEMP                 28
MALE     UNEMP                20
FEMALE   PERM                 57
FEMALE   TEMP                 27
FEMALE   UNEMP                 8

Whilst this list summarizes the data completely, it does not make it easy to draw conclusions about the relationship between gender and employment status or even each one individually.
It is therefore more usual to display the frequencies in a contingency table or cross tabulation as shown below.

                 Employment status
Gender     Permanent   Temporary   Unemployed
Male       170         28          20
Female      57         27           8

Notice that each row corresponds to a category of gender and each column to a category of employment status, and the numbers in the cells of the table are the frequencies of the corresponding combination of gender and employment status.

For example, of the 310 graduates, 170 are male and in permanent employment, whereas 27 are female and in temporary employment.

It is usual to include row and column totals as follows

           Permanent   Temporary   Unemployed   Total
Male       170         28          20           218
Female      57         27           8            92
Total      227         55          28           310

Notice that the row totals give the frequencies of each category of gender and the column totals the frequencies of each category of employment status.

The frequencies alone, however, don't tell us a great deal and we are usually more interested in the proportions or percentages of the data with various qualities or attributes.
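
Row and column totals, and one possible set of percentages, can be computed from the six cell frequencies. The sketch below reports row percentages (the share of each gender in each employment status); this choice is an assumption made for illustration, since column or overall percentages are equally valid summaries.

```python
# Cell frequencies from the contingency table.
table = {
    "Male":   {"Permanent": 170, "Temporary": 28, "Unemployed": 20},
    "Female": {"Permanent": 57,  "Temporary": 27, "Unemployed": 8},
}

row_totals = {gender: sum(row.values()) for gender, row in table.items()}
statuses = ["Permanent", "Temporary", "Unemployed"]
col_totals = {s: sum(table[g][s] for g in table) for s in statuses}
grand_total = sum(row_totals.values())

# Row percentages: within each gender, the share in each employment status.
row_pct = {
    gender: {s: round(100 * table[gender][s] / row_totals[gender], 1) for s in statuses}
    for gender in table
}

for gender in table:
    print(gender, row_totals[gender], row_pct[gender])
print("Column totals:", col_totals, "Grand total:", grand_total)
```

Under this summary, a larger share of the male graduates than of the female graduates is in permanent employment, something the raw frequencies alone make hard to see because the two groups differ in size.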

Visual Description
Frequency Distribution and Histograms

A frequency distribution is a table formed by classifying n data values into k classes called bins (we adopt this terminology from Excel).
The bin limits define the values to be included in each bin. Usually, all the bin widths are the same.
The table shows the frequency of data values within each bin.
Frequencies can also be expressed as relative frequencies or percentages of the total number of observations.

Constructing Frequency Distribution

- Choose the number of classes (k)
  (or)
  Sturges' rule: k = 1 + 3.3 log(n)
- Common class width = (max - min) / k
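
Sturges' rule and the class-width formula can be sketched as two small functions. The slides do not say how to round, so this sketch assumes a base-10 logarithm, rounding k to the nearest whole number, and rounding the width up; the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Sturges' rule, k = 1 + 3.3 log10(n), rounded to the nearest integer (assumed)."""
    return round(1 + 3.3 * math.log10(n))

def class_width(data_min, data_max, k):
    """Common class width (max - min) / k, rounded up to a whole number (assumed)."""
    return math.ceil((data_max - data_min) / k)

# n = 32 observations ranging from 9 to 42 (the ATM data).
k_atm = sturges_classes(32)
print(k_atm, class_width(9, 42, k_atm))

# n = 50 observations ranging from 100 to 134 (the temperature data).
k_temp = sturges_classes(50)
print(k_temp, class_width(100, 134, k_temp))
```

With these conventions the rule suggests 6 classes for 32 observations and 7 classes for 50, which agrees with the class counts used in Example (3) and Exercise (1).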

Categorical Frequency Distribution

The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
For example, data such as political affiliation, religious affiliation, or major field of study would use categorical frequency distributions.

Example (2)
Twenty-five students were given a blood test to determine their blood type. The data set is

A   B   B   AB  O
O   O   B   AB  B
B   B   O   A   O
A   O   O   O   AB
AB  A   O   B   A

Construct a frequency distribution for the data.

Class   Tally         Frequency   Percent
A       /////          5           20
B       ///// //       7           28
O       ///// ////     9           36
AB      ////           4           16
Total                 25          100

For the sample, more students have type O blood than any other type.

Grouped Frequency Distributions
Class boundaries
Lower boundary = lower limit - 0.5
Upper boundary = upper limit + 0.5

Class width
The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
Class width = upper class boundary - lower class boundary

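Class midpoint
The class midpoint $X_m$ is obtained by adding the lower and upper boundaries and dividing by 2, or adding the lower and upper limits and dividing by 2:

$X_m = \dfrac{\text{lower boundary} + \text{upper boundary}}{2}$ or $X_m = \dfrac{\text{lower limit} + \text{upper limit}}{2}$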

II. Summarizing data
1. Measures of Central Tendency

Mean
The mean is the sum of the values, divided by the total number of values.

Raw data

The population mean
$\mu = \dfrac{\sum X}{N}$

The sample mean
$\bar{X} = \dfrac{\sum X}{n}$

Grouped Data

$\bar{X} = \dfrac{\sum f \cdot x_m}{n}$

Where,
f = class frequency
x_m = class midpoint
n = total frequency
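
As a worked illustration of the grouped-data mean (sum of frequency times midpoint, divided by total frequency), the sketch below uses the seven-class temperature distribution from Exercise (1). The frequencies are taken as counted from that data set, and the variable names are illustrative.

```python
# Class limits and frequencies for the record high temperatures (Exercise 1).
limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]
freq = [2, 8, 18, 13, 7, 1, 1]

# Class midpoint x_m = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in limits]

n = sum(freq)                                            # total frequency
weighted_sum = sum(f * xm for f, xm in zip(freq, midpoints))
grouped_mean = weighted_sum / n

print(midpoints)
print(weighted_sum, n, grouped_mean)
```

The grouped mean is an approximation: it treats every value in a class as if it sat at the class midpoint.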

Median

The median is the halfway point in a data set. Before one can find this point, the data must be arranged in order.
When the data set is ordered, it is called a data array.
The median either will be a specific value in the data set or will fall between two values.
The median is the midpoint of the data array. The symbol for the median is MD.

The median is the 50th percentile or midpoint of the sorted sample data set.
Steps
Arrange the data in order.
Select the middle point.

The median separates the upper 50% and the lower 50% of the sorted data.
For example, when n is even:
11 12 15 17 21 32
MD = (15 + 17)/2 = 16

For example, when n is odd:
12 23 23 25 27 34 41
MD = 25
The position of the median in the sorted array is (n+1)/2.

For example, consider the three students' scores on five quizzes:
Tom's scores: 20, 40, 70, 75, 80
mean = 57, median = 70 (Tom's mean is pulled down by a few low scores)
Jake's scores: 60, 65, 70, 90, 95
mean = 76, median = 70 (Jake's mean is pulled up by a few high scores)
The third student's scores: 50, 65, 70, 75, 90
mean = 70, median = 70 (she has symmetric scores)
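
The quiz-score comparison and the even-n median rule can be checked with Python's standard `statistics` module. The label "Third student" is a placeholder, since the third name is not legible in the source.

```python
from statistics import mean, median

scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Third student": [50, 65, 70, 75, 90],   # placeholder label
}

# Same median for all three, but the mean shifts toward the extreme scores.
summary = {name: (mean(values), median(values)) for name, values in scores.items()}
for name, (m, md) in summary.items():
    print(f"{name}: mean = {m}, median = {md}")

# With an even number of values, the median is the average of the two middle values.
even_example = [11, 12, 15, 17, 21, 32]
print(median(even_example))
```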

Grouped Data

$MD = L_m + \dfrac{\frac{n}{2} - F_{m-1}}{f_m} \cdot C_m$

where
n = sum of frequencies
F_{m-1} = cumulative frequency of the class immediately preceding the median class
C_m = width of the median class
f_m = frequency of the median class
L_m = lower boundary of the median class
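
A minimal sketch of the grouped-data median, assuming the standard interpolation formula: lower boundary of the median class plus the fraction (n/2 minus the preceding cumulative frequency) over the median-class frequency, times the class width. It is applied to the temperature distribution from Exercise (1); the function name `grouped_median` is hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Interpolated median for grouped data.

    boundaries: list of (lower, upper) class boundaries, in increasing order
    freqs: class frequencies in the same order
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cum_before = 0                      # F_{m-1}: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= n / 2:     # first class whose cumulative frequency reaches n/2
            width = upper - lower       # C_m
            return lower + (n / 2 - cum_before) / f * width
        cum_before += f
    raise ValueError("median class not found")

# Class boundaries and frequencies for the record high temperatures.
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
freqs = [2, 8, 18, 13, 7, 1, 1]

md = grouped_median(boundaries, freqs)
print(round(md, 2))
```

Here n/2 = 25 falls in the class 109.5-114.5, whose preceding cumulative frequency is 10, so the estimate is 109.5 + (15/18) × 5.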
