7 views

Uploaded by EL Hafa Abdellah

sq

- Lecture 08 - Probability
- digital communication and probability of error.
- Exercise 1.1
- Syllabus
- Lecture 2
- Lecture - 2 Probablity (1)
- 592
- Hackerrank
- Data Analysing
- 7 QC Tools
- Statistics Assignment Sample 2 with Solutions
- Empirical and Tchebysheff's Theorem
- Tutorial BayesianRisk
- UT Dallas Syllabus for mech3341.0u1.10u taught by John Fonseka (kjp)
- Exam 1
- ab-001-abcanalysisexample.pdf
- Math6365-fall11-syll
- Probability
- Syllabus for Sigrun Andradottir
- histogram

You are on page 1of 4

Spring 2009

This handout is about the construction and motivation of scaled relative frequency histograms for purposes of comparison and graphical

overlay with theoretical density functions. This is one of the main ways that

histograms are used descriptively: to ask whether the shape of the distribution of observed data are sufficiently normal or sufficiently close to the

gamma or students t or whatever other distribution family is conjectured

to provide a good fit to the data.

First we define what is meant by histogram, successively modeified by the

adjectives frequency, relative frequency, and scaled relative frequency.

Imagine that a dataset X1 , X2 , . . . , Xn of observations has been drawn iid

(indepently and with identical distribution) from a density function f (x).

Let (a, b] be an interval on the measurement axis large enough to contain

all of the points Xi , i.e. such that

a < min Xi < max Xi < b

1in

1in

Then divide this interval into a number m of equal length class intervals,

each of width w = (b a)/m, where the kth such interval, Jk , is

(a+(k 1)w, a+kw]. (Note that each of these intervals is defined to contain

its right endpoint but not its left endpoint. Also note that by definition, the

rightmost endpoint of Jm is a + mw = b.) All of the different kinds of

histograms are pictorial representations of the frequency counts

nk = count in kth interval =

n

X

I[XiaJk ]

(1)

i=1

The counts n1 , n2 , . . . , nm of numbers of observations Xi falling in the

respective intervals Jk , could be arranged simply in a table,

Interval J1

Count

n1

J2

n2

...

Jk

nk

Jm

. . . nm

in) over each interval Jk (with base w), with height labelled equal to the

1

the vertical units: the height of the bar over Jk is now labelled as nk /n, which

means that the total of bar-heights which was n in the frequency histogram

is 1 in the relative-frequency histogram. However, since totals of bar-heights

is not visually meaningful, the scaled relative frequency histogram redefines the vertical units in such a way that the area of the bar over Jk is

the relative frequency nk /n, which has the consequence that the total area

within the bars of the scaled relative-frequency histogram is 1, making it just

like a probability density function. Thus the definition of the scaled relative

frequency histogram is:

bar-height over

Jk

nk

wn

Area = w

nk

nk

=

wn

n

(2)

and the important consequence of the definition is that the function equal

to the tops of bars, with value fhist (x) = nk /(nw) for x Jk (and is 0

for x 6 (a, b]), is a legitimate probability density function. We argue next

that with large-sample data, when the number of bars is chosen large but

much smaller than the number of data points, this density is actually very

close to the true but generally unknown density function of the Xi random

variables.

Remark 1 The histogram-based probability density function fhist (x)

assigns area and probability nk /n to the interval Jk = (a+(k1)w, a+kw] :

a random variable X with density fhist would fall in the interval Jk with

probability nk /n, and the precise value of X would be generated uniformly

between a + (k 1)w and a + kw.

2

Now for the argument that histograms should track densities closely. Suppose that the true density for the large number n of observed data values

X1 , . . . , Xn is f , and suppose that w = (b a)/m is narrow but is fixed

before we choose a large value of n. Consider the relation of the scaled

histogram bar to the density over Jk = (a + (k 1)w, a + kw]. First, if

nk denotes the number of observations Xi falling within the interval Jk

then the histogram bar has height nk /(nw) and area nk /n, while the area

under the true density function f over Jk is

Z

a+kw

f (x) dx w f (a + (k

a+(k1)w

1

)w)

2

(3)

The approximate equality in this equation holds first of all because the

interval has been assumed narrow, so that the continuous function f (x)

is nearly constant over the extent of Jk and the area is nearly that of the

rectangle with base w and height equal to the value of f at the midpoint of

the interval. A more precise argument, based on Taylor series (with meanvalue-theorem form of remainder), shows that if the second derivative of f

has absolute value bounded above by the constant C (everywhere on some

interval containing Jk ), then the difference between the right-hand and lefthand sides of (3) is 12 C w2. (This is a very small number if m has been

chosen large, making w small.)

1.0

0.0

0.5

density

1.5

2.0

0.0

0.5

1.0

1.5

xaxis

Figure 1: Area under the true density f (shown solid) versus the area of the

scaled relative histogram bar (with top shown dashed) over an interval Jk .

The areas over Jk under the scaled histogram and the true density are

necessarily very close with high probability (when w is small and n large).

This is a consequence of the Law of Large Numbers applied to sequences

3

of coin-toss outcomes. That theorem says that the relative frequency over

independent identically distributed trials indexed i = 1, 2, . . . , n of the

events [Xi Jk ] is close (with high probability, for all large n) to P (X1

Jk ). By definition (1), the relative frequency is nk /n, the area of the bar

in the scaled histogram. The true probability is given by the left-hand side

of (3). Again appealing to (3) (and see also Figure 1), we conclude that the

density height f (a + (k 1/2)w) at the midpoint of the interval must be

very close with very high probability to nk /(nw).

0.15

0.00

0.05

0.10

density

0.20

0.25

10

15

xaxis

Figure 2: Scaled relative frequency histogram with m=32 class intervals for

n = 1000 Gamma(4,1) variates generated in R, with true density function

overlaid as dashed curve.

Figure 2 (generated in the statistical package R, but we will later do the

same thing in SAS) shows the scaled relative histogram (based on m = 32)

generated from 1000 data points generated from the Gamma(4, 1) density,

overlaid with the actual theoretical density plotted as a dashed curve.

- Lecture 08 - ProbabilityUploaded byisurueranda
- digital communication and probability of error.Uploaded bychinmay2882
- Exercise 1.1Uploaded byRodziah Nasir
- SyllabusUploaded byNihar Nanyam
- Lecture 2Uploaded byEliza Chew Shi Yun
- Lecture - 2 Probablity (1)Uploaded byFarhan Muhammad
- 592Uploaded byAryan Smart
- HackerrankUploaded byMartin Bajzek
- Data AnalysingUploaded byOshani perera
- 7 QC ToolsUploaded byshashikantjoshi
- Statistics Assignment Sample 2 with SolutionsUploaded byHebrew Johnson
- Empirical and Tchebysheff's TheoremUploaded byLee Yee Run
- Tutorial BayesianRiskUploaded byPratik Chatse
- UT Dallas Syllabus for mech3341.0u1.10u taught by John Fonseka (kjp)Uploaded byUT Dallas Provost's Technology Group
- Exam 1Uploaded byChristopher Petys
- ab-001-abcanalysisexample.pdfUploaded byAkhilesh Jindal
- Math6365-fall11-syllUploaded byRhealyn Inguito
- ProbabilityUploaded byGirish
- Syllabus for Sigrun AndradottirUploaded byziyuzhang13
- histogramUploaded byBugatti Vishal
- STPM Trials 2009 Math T Paper 2 (Negeri Sembilan)Uploaded byJessica Jones
- quantitative technique analysis notesUploaded bymaana
- probability.pdfUploaded byNimit Mehta
- Tutorial List 1Uploaded byनेपाली नेवरि प्रसन्न
- expected_values_continuous.pdfUploaded byJibran Abbasi
- STATS TextbookUploaded byChristian Sicutad
- STAT151 NotesUploaded byfcvoid
- Thermal Physics Lecture 3Uploaded byOmegaUser
- Lec 2Uploaded bysuhailtambal
- UMA ABORDAGEM BASEADA EM FILTROS DE PARTÍCULAS PARA PROGNÓSTICO DE FALHASUploaded byMurilo Camargos

- Attakmili+Rachat.pdfUploaded byEL Hafa Abdellah
- Attakmili+Rachat.pdfUploaded byEL Hafa Abdellah
- Emplois 2°Sem.(1-7et 7-14)2013-2014 vf du 19-02-2014Uploaded byEL Hafa Abdellah
- PFE_2014Uploaded byEL Hafa Abdellah
- التحديات التي تواجه صناعة التأمين التكافلي الإسلامي بريش عبد القادر وحمدي معمرUploaded byEL Hafa Abdellah
- Capacite Professionnelle en AssuranceUploaded byEL Hafa Abdellah
- Capacite Professionnelle en AssuranceUploaded byEL Hafa Abdellah
- UsbFix ReportUploaded byEL Hafa Abdellah
- PFE_2013Uploaded byEL Hafa Abdellah
- Skype for Business Setup - English v3Uploaded byEL Hafa Abdellah
- All R PackagesUploaded byEL Hafa Abdellah
- Partie II- Emplois Du Temps 2016-2017(S1-S3-S5)Uploaded byEL Hafa Abdellah
- Attakmili+versement+exceptionnelUploaded byEL Hafa Abdellah
- 022Uploaded byEL Hafa Abdellah
- Link Sumatra.txtUploaded byEL Hafa Abdellah
- AssignmentezUploaded byEL Hafa Abdellah
- Capacite Professionnelle en AssuranceUploaded byEL Hafa Abdellah
- Risk Ed GuideUploaded byEL Hafa Abdellah
- projet 2Uploaded byEL Hafa Abdellah
- Dec09joUploaded byEL Hafa Abdellah
- 0190201894 GazUploaded byAnonymous 0qR2NHz9D
- Lecture 080Uploaded byEL Hafa Abdellah
- Web CapmktresponseUploaded byEL Hafa Abdellah
- Exercice CorrigéUploaded byNouali Med
- STATS308ch04Uploaded byEL Hafa Abdellah
- 6 - BodeaUploaded byEL Hafa Abdellah
- CoverUploaded byEL Hafa Abdellah
- Lecture 23Uploaded byEL Hafa Abdellah
- Tutorial LatexUploaded byRabeb BA
- Tutorial LatexUploaded byRabeb BA

- 93 Eng 4Uploaded byAmy Chia-Hui Chou
- RSA Algorithm.pptUploaded byEswin Angel
- A Brief History of the ComputerUploaded byAaqib Nazir
- Etienne Balibar Citizen Subject PDFUploaded byJennifer
- Chapter12PPNew (1)Uploaded byKatieBays
- Employee Satisfaction Level at Hindalco Industries Project ReportUploaded byBabasab Patil (Karrisatte)
- Beginners Python Cheat Sheet Pcc DjangoUploaded byghh
- professional growth plan - edited for privacyUploaded byapi-202006339
- Tornado Lobe Pump BrochureUploaded byAnonymous T7zEN6iLH
- kuat-geser-tanah-direct-shear.pptUploaded byZulmy Rhamaditya
- RichardsonUploaded bypja752
- research proposal uwrtUploaded byapi-385913029
- photogram unitUploaded byapi-242481603
- 237132498-MODUL5-KULIAH-Lateral-Thinking.pptUploaded byAhmad F. Hasan
- Version 6 1 Computers Final Program RequirementsUploaded byDanial Cool
- Caterpillar Flash Files [04.2014]Uploaded byasadiqbalansari
- Kuratko 8e Chapter 01Uploaded byMangoStarr Aibelle Vegas
- Risk Management: A Case Study of TheUploaded bywoodculther
- Thesis - A Unfied Framewok and Platform for Designing Cloud Based M-c Health and Mfg SystemsUploaded byajayvg
- AglaSem » DTU » Check Your Departmental RankUploaded by22613609asd
- UT Dallas Syllabus for chem4473.001.07s taught by Lynn Melton (melton)Uploaded byUT Dallas Provost's Technology Group
- Solid Waste ManagementUploaded byEditor IJTSRD
- The Big Promise of Big DataUploaded byLuis Jesus Malaver Gonzalez
- FYBMS Environmental Management Notes fullUploaded byrohan_jangid8
- Barry Lewis - Using Not Implementing COBIT 5Uploaded byAnonymous 9d1jFv
- Aquinas the NeoplatonistUploaded byScott Sullivan
- Storage_Display_Metalwork_2ndPP.pdfUploaded bygmj
- CREATING Tourism Awareness at.pptxUploaded byMonette de Guzman
- Abstract for BCI Oriented Object RecognitionUploaded bypr1f4
- Latour-Give Me a GunUploaded byUrsula68