
OUTLINE

# Day Date Topic/Activity

1 Thu 7 Feb Introduction & exercises

2 Fri 8 Feb Least squares fitting

3 Thu 14 Feb Histograms

4 Fri 15 Feb Distributions

5 Fri 22 Feb Class test


Les Kirkup (1994); John R. Taylor (1997)
The ‘GUM’
Guide to the Expression of Uncertainty in
Measurement

• Until recently, there has been a lack of consistency and clarity in the texts that set out the concepts and terminology of measurement.

• To address this, in the mid-1990s, international bodies responsible for measurement standards published the ‘GUM’.

www.BIPM.org
The Big Picture

Measurements made during an experiment generate ‘raw’ data which must be collected, presented, and interpreted.
Typical Thesis Structure
CHAPTER 1: Introduction

CHAPTER 2: Theory/Literature Review

CHAPTER 3: Methodology

CHAPTER 4: Data Analysis

CHAPTER 5: Findings
Lecture #1
Analyse
To examine (something) methodically and in detail,
typically in order to explain and interpret it.

• With the data in hand, a most important question is asked: ‘What do the data tell us?’

• An attempt to answer this question is the essence of data analysis.
Introduction

• Discussion is organised under the headings:

1. Data Collection
2. Data Presentation
3. Data Interpretation

• These can be thought of as the different stages in the process of data analysis in science.

• Different levels of analysis are involved in the different stages of data handling.

• ‘Data Presentation’ possibly involves the most analysis.
1. Data Collection

Overview

The better the record that has been made of what has
been done, the easier will be the task of presenting the
work.

1. Data Collection
Significant figures
• Significant: “having a particular meaning; indicative of
something.”

o “meaningful”

• If an experimental data value is recorded as 6.12, this implies that the actual value lies between 6.11 and 6.13.

• If the value is written as 6.124, then this implies that the actual value lies between 6.123 and 6.125.
1. Data Collection
Significant figures
• Writing a value as 6.12 is to give it to three significant
figures, and to write it as 6.124 is to give it to four
significant figures.

• Significant figures are the figures that lie between the first
non-zero figure and the last figure inclusive.

Check Your Neighbour
How many significant figures appear in the following numbers?

A. 1.654 ?

B. 0.00437 ?

C. 64 000 ?

D. 1.20 ?

E. 0.100 007 38 ?

Check Your Neighbour
How many significant figures appear in the following numbers?
Answer:
A. 1.654 Four

B. 0.00437 Three

C. 64 000 Two

D. 1.20 Three

E. 0.100 007 38 Eight

Check Your Neighbour
How many significant figures are implied by the way the
following numbers are written?

A. 3.24 ?

B. 0.0023 ?

C. 83 400 ?

D. 1.010 ?

E. 10.5 ?

Check Your Neighbour
How many significant figures are implied by the way the
following numbers are written?
Answer:

A. 3.24 Three

B. 0.0023 Two

C. 83 400 Three

D. 1.010 Four

E. 10.5 Three

1. Data Collection
Significant figures & scientific notation
• It is not always clear how many figures in a number are significant.

• Example: A time interval of 346 s can be written as 346 000 ms, 346 000 000 µs, etc. In all cases there are three significant figures. However, if presented with 346 000 ms only, how does the reader know that the above zeros are not significant?

o Note: It is possible to have a timing device with a resolution of 1 ms, thus making all six figures significant.

• The way to get around this confusion is to present the numbers in scientific notation.
1. Data Collection
Significant figures & scientific notation
• Example: numbers expressed in scientific notation.

Number     Scientific notation
12.65      1.265 × 10¹
0.00023    2.3 × 10⁻⁴
342.5      3.425 × 10²
34 001     3.4001 × 10⁴

• For numbers expressed in scientific notation, the number of significant figures is equal to the number of figures that appear to the left of the multiplication sign.
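Conversion to scientific notation at a chosen number of significant figures can be automated with Python's `e` format type. A minimal sketch (the helper name `to_sci` is my own, not from the slides):

```python
def to_sci(x, sig):
    """Format x in scientific notation with `sig` significant figures."""
    return f"{x:.{sig - 1}e}"  # one digit before the point, (sig - 1) after

print(to_sci(12.65, 4))     # 1.265e+01, i.e. 1.265 × 10¹
print(to_sci(0.00023, 2))   # 2.3e-04,  i.e. 2.3 × 10⁻⁴
print(to_sci(34001, 5))     # 3.4001e+04
```

Note that the formatter also rounds, so it applies the same rounding used in the exercises below.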
Check Your Neighbour
Give the following numbers in scientific notation to four
significant figures:

# Number Scientific notation


A. 0.005 654 2 ?
B. 125.04 ?
C. 93 842 773 ?
D. 3 400 042 ?
E. 0.000 000 100 092 ?

Check Your Neighbour
Give the following numbers in scientific notation to four
significant figures:
Answer:
# Number               Scientific notation
A. 0.005 654 2         5.654 × 10⁻³
B. 125.04              1.250 × 10²
C. 93 842 773          9.384 × 10⁷
D. 3 400 042           3.400 × 10⁶
E. 0.000 000 100 092   1.001 × 10⁻⁷
Check Your Neighbour
Give the following numbers in scientific notation to two
significant figures:

# Number Scientific notation


A. 0.005 654 2 ?
B. 125.04 ?
C. 93 842 773 ?
D. 3 400 042 ?
E. 0.000 000 100 092 ?

Check Your Neighbour
Give the following numbers in scientific notation to two
significant figures:
Answer:
# Number               Scientific notation
A. 0.005 654 2         5.7 × 10⁻³
B. 125.04              1.3 × 10²
C. 93 842 773          9.4 × 10⁷
D. 3 400 042           3.4 × 10⁶
E. 0.000 000 100 092   1.0 × 10⁻⁷
1. Data Collection
Significant figures & calculations
• If you are required to perform a calculation in which the
uncertainties in the quantities are not known, the following
rules are useful:

• Rule 1: When multiplying or dividing numbers: give the result of the calculation to the least number of significant figures as contained in the quantities involved.

• Example: 3.7 × 3.01 = 11.137.

• Quantity 3.7 has the least number of significant figures (two).
• Give the answer as 11.
1. Data Collection
Significant figures & calculations
• Rule 2: When adding or subtracting numbers: round the
result of the calculation to the least number of decimal
places as contained in the quantities involved.

• Example: 11.24 + 13.1 = 24.34.

• Quantity 13.1 has the least number of decimal places (one).
• Give the answer as 24.3.
Check Your Neighbour
Write down the results of the following calculations to an
appropriate number of significant figures:

# Calculation Answer
A. 1.2 × 8 ?
B. 13.0 × 43.23 ?
C. 0.0104 × 0.023 ?
D. 33 + 435.5 ?
E. 14.1 ÷ 76.3 ?
F. 105.55 – 34.2 ?

Check Your Neighbour
Write down the results of the following calculations to an
appropriate number of significant figures:
Answer:
# Calculation Answer
A. 1.2 × 8 10
B. 13.0 × 43.23 562
C. 0.0104 × 0.023 0.000 24
D. 33 + 435.5 469
E. 14.1 ÷ 76.3 0.185
F. 105.55 – 34.2 71.4

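Rule 1 (keep the least number of significant figures) can be checked numerically. A minimal sketch, assuming a helper `round_sig` that is not part of the lecture material:

```python
import math

def round_sig(x, n):
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, n - 1 - exponent)

# Quiz answers A to C: the factor with fewest significant figures sets n
print(round_sig(1.2 * 8, 1))         # 10.0     (8 has one significant figure)
print(round_sig(13.0 * 43.23, 3))    # 562.0    (13.0 has three)
print(round_sig(0.0104 * 0.023, 2))  # 0.00024  (0.023 has two)
```

For addition and subtraction (Rule 2), `round(x, places)` with the least number of decimal places is all that is needed.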
1. Data Collection
True value, accuracy and precision
• Measuring a quantity is an attempt to find an estimate of the ‘true value’ of that quantity.

• The true value can never be known with absolute precision, but by gathering more data we hope to get a better estimate of the true value.

• If our estimate is close to the true value, then we say that the measurements are accurate.

• A measurement is precise when the uncertainty in the value is small, but this does not imply that it is close to the true value.
1. Data Collection
Uncertainties in measurements
• Despite our best efforts or the quality of the equipment we use, there is going to be an amount of variability in quantities measured in an experiment.

• Estimates of the uncertainty in measurements should always accompany the measurement and need to be recorded in the laboratory notebook.

• For tabulation of data with uncertainties, it is best to write the uncertainty in the heading of the column in the table containing the data.
1. Data Collection
Uncertainties in measurements
• Example: variation of electrical resistance with temperature
of a copper wire.

Temperature (°C) ± 0.5 °C    Resistance (Ω) ± 0.5 Ω
8.0                          0.208
16.5                         0.213
23.5                         0.222
32.0                         0.229
40.5                         0.232
54.5                         0.243
1. Data Collection
Uncertainties in measurements
• If we were to make repeated measurements of a particular quantity, we are likely to find a variation in the observed values.

• Although it may be possible to reduce an uncertainty by improved experimental method or the careful use of statistical techniques, it can never be eliminated.

• We need to be able to identify and quantify the variation, otherwise the reliability of our experiment is likely to be questioned, and any conclusions drawn from the experiment may be of limited value.
1. Data Collection
Single measurement: resolution uncertainty
• No instrument exists that can measure a quantity to infinitely
fine resolution.

• All measurements are limited by the instrument you are using.

• If the quantity measured is stable or varies slowly with time, it is reasonable to quote the uncertainty as one half the smallest division on the scale.

• The resolution limit of an instrument represents the smallest uncertainty that can be quoted in a single measurement of a quantity.
1. Data Collection
Single measurement: reading uncertainty
• It is possible that the quantity under investigation varies by
much more than half the smallest division on the instrument.

• Example: Heating a beaker containing water using a thermometer with a resolution of ±1 °C. As the water is stirred, the thermometer indicates a wide temperature variation: 36 °C, then 33 °C, and then 35 °C.

• Quoting an uncertainty of ±1 °C would underestimate the experimental uncertainty.

• We estimate the uncertainty to be less than ±5 °C but greater than ±1 °C, and choose a compromise between these.
1. Data Collection
Single measurement: reading uncertainty
• In this situation, there are no ‘hard and fast’ rules about
quoting uncertainties, and we have to rely on our common
sense.

1. Data Collection
Single measurement: calibration uncertainty
• The instruments that you use should have been calibrated
at some time against a standard. For the calibration to
remain valid, the instrument must be checked regularly.

• If scientists around the world are trying to compare their measurements, they need to be sure that their instruments ‘agree’ on what is a metre, volt, second, etc.

• An uncalibrated, or poorly calibrated, instrument leads to systematic uncertainty in data and influences all measurements made with that instrument.
1. Data Collection
Repeat measurements
• To be able to get a real feel for the variability in
measurement, more than one measurement should be
made for each quantity.

• Where this is possible we can use statistical tools to allow us to quantify experimental uncertainties.
1. Data Collection
The mean
• Example: Times for an object to fall 25 m

Time of fall (s) 0.64 0.61 0.63 0.53 0.59 0.65 0.60 0.61 0.64 0.71

• We could expect the time that it really took for the object to fall to lie somewhere between the two extreme measured values, namely between 0.53 s and 0.71 s.

• If a single value for the time of fall is required, we can do no better than to calculate the average (or mean) of the ten measurements that were made.
1. Data Collection
The mean
• The mean, x̄, is calculated using the formula:

x̄ = (Σ xᵢ) / n

• Using the data given, the mean time is 0.621 s.

• We could quote the mean to one, two or three significant figures, that is 0.6 s, 0.62 s or 0.621 s. Which do we choose?

• We can answer this question only when we have an estimate for the uncertainty in the mean value.
1. Data Collection
Uncertainty in the mean
• A simple method of estimating the uncertainty in the mean
of a set of data involves first calculating the range of the
data:

range = largest value – smallest value

• The uncertainty in the mean is found by dividing the range by the number of measurements made, n:

uncertainty in mean = range / n
1. Data Collection
Uncertainty in the mean
• Example: Speed of sound in air at 20°C

Speed (m/s) 341.5 342.4 342.2 345.5 341.1 338.5 340.3 342.7

• The mean is 341.775 m/s; the range is 345.5 − 338.5 = 7.0 m/s. The uncertainty is 7.0 ÷ 8 = 0.875 m/s.

• We might be tempted to say that the speed of sound in air is 341.775 m/s with an uncertainty of 0.875 m/s.

• Uncertainties serve to quantify the probable range in which the value of that quantity lies.
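The speed-of-sound numbers above can be reproduced directly. A sketch of the range method (no function of this kind is defined in the slides):

```python
speeds = [341.5, 342.4, 342.2, 345.5, 341.1, 338.5, 340.3, 342.7]  # m/s

mean = sum(speeds) / len(speeds)
data_range = max(speeds) - min(speeds)   # largest value - smallest value
uncertainty = data_range / len(speeds)   # range / n

print(round(mean, 3))  # 341.775
print(uncertainty)     # 0.875
```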
1. Data Collection
Uncertainty in the mean
• There is no point, therefore, in quoting the uncertainty to more than one significant figure.

• If the first figure in the uncertainty is a ‘1’, it is usual to give the uncertainty to two significant figures.

• In the present example we would round 0.875 m/s up to 0.9 m/s.

• We now further round the mean to the same number of decimal places as the uncertainty, i.e. 341.775 m/s becomes 341.8 m/s.
1. Data Collection
Uncertainty in the mean
• To summarise, there are four steps in quoting the value of
the quantity:

1. Calculate the mean of the measured values.
2. Calculate the uncertainty in the quantity, making clear the method used. Round the uncertainty to one significant figure (or two if the first figure is a ‘1’).
3. Quote the mean and uncertainty to the appropriate number of figures.
4. State the units of the quantity.
1. Data Collection
Uncertainty in the mean
• When an uncertainty in an experimental value is quoted, we are not saying that the actual or true value of the quantity must lie between the limits given by (mean − uncertainty) to (mean + uncertainty).

• The probability is high that it will lie between these limits, and it is actually possible to quantify that probability.

• An uncertainty expressed in the same units as the quantity being measured is referred to as the absolute uncertainty in the quantity.
1. Data Collection
Fractional and percentage uncertainty
• In some cases you may be required to state the ratio

(uncertainty in quantity) / quantity

• This ratio is referred to as the fractional uncertainty in the quantity.

• The percentage uncertainty is found by multiplying the fractional uncertainty by 100%.

• Fractional or percentage uncertainties are normally quoted to no more than one significant figure.
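For example, the fractional and percentage uncertainty of the speed-of-sound result (341.8 ± 0.9 m/s) work out as follows (a sketch, not code from the slides):

```python
value, uncertainty = 341.8, 0.9  # m/s, from the earlier example

fractional = uncertainty / value
percentage = fractional * 100

print(round(fractional, 4))  # 0.0026
print(round(percentage, 1))  # 0.3, quoted as 0.3%
```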
1. Data Collection
Systematic and random uncertainties
• There are two broad categories of uncertainties that can occur in an experiment:

1. Systematic uncertainties
2. Random uncertainties

• There are two types of systematic uncertainty which can exist with measuring instruments:

1. Offset uncertainty
2. Gain uncertainty
1. Data Collection
Offset uncertainty
• Example: Melting point of water using a thermocouple

Temp. (°C)  −7.5  −7.3  −6.9  −7.4  −7.7  −7.6  −7.6  −7.3  −7.6

• The mean is −7.43 °C and the uncertainty 0.08 °C.

• Clearly there is something wrong here: the melting point of water should be very close to 0.0 °C.

• For whatever reason, all measurements are too low by about 7.4 °C.

• We have just exposed an offset uncertainty in our system.
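The offset in the thermocouple data is exposed by comparing the sample mean with the accepted value (0.0 °C). A sketch:

```python
temps = [-7.5, -7.3, -6.9, -7.4, -7.7, -7.6, -7.6, -7.3, -7.6]  # °C

mean = sum(temps) / len(temps)
offset = mean - 0.0  # accepted melting point of water is 0.0 °C

print(round(mean, 2))    # -7.43
print(round(offset, 1))  # -7.4: every reading is low by roughly 7.4 °C
```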
1. Data Collection
Gain uncertainty
• The offset uncertainty remains fixed irrespective of the magnitude of the quantity being measured.

• In contrast, the gain uncertainty is dependent on the magnitude of the quantity.

• Example: Five calibration mass pieces are placed on a balance and readings were taken.

Mass piece (g)   20.00  40.00  60.00  80.00  100.00
Reading (g)      20.26  40.65  60.98  81.20  101.52
Difference (g)    0.26   0.65   0.98   1.20   1.52
1. Data Collection
Gain uncertainty
• As the mass of the piece increases, so the difference between the measured and calibrated mass increases.

• The difference increases in direct proportion to the magnitude of the mass piece located on the balance.

• This establishes the relationship between the calibrated mass and the measured mass for this particular weighing balance.

• Future measurements of the mass using this balance can then be corrected for the gain uncertainty.
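The proportionality between calibration mass and reading can be estimated with a one-parameter least-squares fit through the origin; corrected readings then follow. This is a sketch of the idea, not a procedure given in the slides:

```python
masses   = [20.00, 40.00, 60.00, 80.00, 100.00]  # calibrated mass pieces (g)
readings = [20.26, 40.65, 60.98, 81.20, 101.52]  # balance readings (g)

# Least-squares slope through the origin: reading ≈ gain × mass
gain = sum(m * r for m, r in zip(masses, readings)) / sum(m * m for m in masses)
print(round(gain, 4))  # 1.0154

# A future reading can be corrected by dividing by the estimated gain
print(round(81.20 / gain, 2))  # 79.97 g
```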
1. Data Collection
Random uncertainties
• Random uncertainties produce scatter in observed values.

• The cause could be environmental factors such as:
  • Electrical interference affecting voltage- or current-sensitive measurements.
  • Vibrations affecting measurements with a sensitive electronic balance.
  • Power supply fluctuations affecting optical measurements.

• We can use statistical techniques to estimate random uncertainties and calculate the effect of combining uncertainties.
1. Data Collection
Random uncertainties
• Statistics is the science of assembling, organising and interpreting numerical data.

• The statistical approach is valid when we have made sufficient measurements (say in excess of five) to satisfactorily describe the spread in the data.
1. Data Collection
Standard deviation (SD)
• If xᵢ represents the ith data value in a set of n repeated measurements, and x̄ the mean of the data values, then the standard deviation, σ, is given by

σ = √[ Σ(xᵢ − x̄)² / n ]

• A pocket calculator, or computer software package, with built-in statistical functions can be very helpful, especially when there are many numbers to process.
1. Data Collection
Standard deviation (SD)
• Example: Time for a body to slide down a plane.

Time (s)  0.64  0.64  0.59  0.58  0.70  0.61  0.68  0.55  0.57  0.63

• For these 10 measurements, σ = 0.04571 s. When 50 measurements were made, σ = 0.04364 s.

• The SD of a set of repeat measurements of a quantity remains almost constant, regardless of how many measurements are made.

• By making repeat measurements we are trying to get the best estimate for the quantity and its uncertainty.
1. Data Collection
Standard deviation (SD)
• Should the SD be taken as the uncertainty in the mean? If so, is there any point in increasing the number of repeat measurements?

• The SD is characteristic of the spread of the whole data set and should not be taken as the uncertainty in the mean.

• The standard deviation of the mean (σ_x̄) is the proper estimate for the uncertainty in the mean.

• It can be proven mathematically that

σ_x̄ = σ / √n
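The σ and σ_x̄ formulas can be checked against the earlier sliding-time data. A sketch using the population form (divide by n), matching the slides:

```python
import math

times = [0.64, 0.64, 0.59, 0.58, 0.70, 0.61, 0.68, 0.55, 0.57, 0.63]  # s

n = len(times)
mean = sum(times) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in times) / n)  # population SD
sdom = sd / math.sqrt(n)                                 # SD of the mean

print(round(sd, 5))    # 0.04571, as quoted on the slide
print(round(sdom, 5))  # 0.01445
```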
1. Data Collection
Standard deviation of mean (SDOM)
• Example: Volume of water from a fluid-flow experiment

Vol. (mL)  33  45  43  42  45  42  41  44  40  42

• The mean is 41.7 mL, with an SD of 3.29 mL.

• The experiment was performed eight times, with each one consisting of 10 repeat measurements.

Mean (mL)  41.0  41.7  40.4  41.5  41.7  40.4  42.5  39.5
SD (mL)    3.13  3.29  3.07  3.11  3.20  2.94  2.73  3.20

• The mean of the means is 41.1 mL, with an SDOM of 0.893 mL.
1. Data Collection
Standard deviation of mean (SDOM)
• We see now that it is worthwhile to make many repeat measurements if we want to reduce the uncertainty in the mean.

• In summary:

• The best estimate is the mean of the repeat measurements.
• The SD is a measure of the spread of the measurements and is insensitive to how many measurements are made.
• The SDOM is the uncertainty in the mean, and this does decrease as the number of measurements increases.
1. Data Collection
Population and sample
• Although we want to reduce uncertainties in the data we collect during experiments, we are not able to make an infinite number of repeat measurements of a quantity.

• The totality of measurements that could be made is called the population.

• We are only able to make a few repeat measurements, which can be regarded as a sample of all possible measurements, and use these to estimate the population mean and SD.
1. Data Collection
Population and sample
• From this perspective, the SD is the estimate of the population SD and is calculated using

σ = √[ Σ(xᵢ − x̄)² / (n − 1) ]

• This version of the SD equation is preferred in calculations.

• So long as the number of repeat measurements is greater than 3, both versions of the SD equation will usually return the same number to one significant figure.
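Both SD conventions are available directly in Python's statistics module; for the sliding-time data they agree to one significant figure, as the slide states. A sketch:

```python
import statistics

times = [0.64, 0.64, 0.59, 0.58, 0.70, 0.61, 0.68, 0.55, 0.57, 0.63]

pop_sd = statistics.pstdev(times)   # divides by n (population form)
samp_sd = statistics.stdev(times)   # divides by n - 1 (sample form)

print(round(pop_sd, 3), round(samp_sd, 3))  # both are 0.05 to 1 s.f.
```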
1. Data Collection
Combining uncertainties
• An experiment may require the determination of several quantities which are later to be inserted into an equation.

• Example: calculate the density, ρ, of a body of mass, m, and volume, V. How do the uncertainties in m and V combine to give the uncertainty in ρ?

• We can apply differential calculus to determine this.

• The combination of uncertainties is called the propagation of uncertainties, or error propagation.
1. Data Collection
Combining uncertainties
• Example: Consider a function V = V(a, b), where a and b have uncertainties Δa and Δb, respectively.

• To find the uncertainty in V, we compute

ΔV = |∂V/∂a| Δa + |∂V/∂b| Δb

• The bars around the partial derivatives mean that we ignore any minus sign that may occur after differentiation.

• This avoids a cancellation of terms that could otherwise occur.
1. Data Collection
Combining uncertainties
• The previous method is satisfactory, but tends to overestimate the uncertainty in the calculated quantity.

• It is possible for the uncertainties Δa and Δb to partially cancel out in situations where they are independent of each other.

• Taking the SDOM as the uncertainty in the mean of the measured values of V = V(a, b), the propagation becomes

σ_V̄ = √[ (∂V/∂a)² σ_ā² + (∂V/∂b)² σ_b̄² ]
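For the density example ρ = m/V, the partial derivatives are ∂ρ/∂m = 1/V and ∂ρ/∂V = −m/V², so the quadrature formula gives σ_ρ̄ = √[(σ_m̄/V)² + (m σ_V̄/V²)²]. A sketch with illustrative made-up numbers (the values are not from the slides):

```python
import math

m, sigma_m = 25.0, 0.2  # mass (g) and uncertainty in its mean
V, sigma_V = 10.0, 0.1  # volume (cm³) and uncertainty in its mean

rho = m / V
# Quadrature combination using the partial derivatives 1/V and -m/V²
sigma_rho = math.sqrt((sigma_m / V) ** 2 + (m * sigma_V / V ** 2) ** 2)

print(rho)                  # 2.5 g/cm³
print(round(sigma_rho, 2))  # 0.03
```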
2. Data Presentation

Overview

When data are presented pictorially, trends can be detected that we would be unlikely to recognise if the data were given only in tabular form.
2. Data Presentation
x-y graphs
• Pictorial representation of data in a graph is a good way to summarise many important features of the experiment.

• A graph can indicate:
  • the range of measurements made
  • the uncertainty in each measurement
  • the existence or absence of a trend in the data gathered
  • which data points do not follow the general trend exhibited by the majority of data.

• An x-y graph possesses horizontal and vertical axes termed the x- and y-axis, respectively.
2. Data Presentation
x-y graphs
• INDEPENDENT VARIABLE: The quantity which is controlled or deliberately varied during an experiment; it is plotted as the x-coordinate.

• DEPENDENT VARIABLE: The quantity that varies in response to changes in the independent variable; it is plotted as the y-coordinate.

• TITLE: indicates the relationship being investigated. If it is stated that quantity ‘A’ is plotted versus or against quantity ‘B’, then quantity ‘A’ is plotted on the y-axis and quantity ‘B’ on the x-axis.
2. Data Presentation
x-y graphs
• LABELS & UNITS: indicate the names of the quantities under study and their units of measurement (usually in brackets).

• ORIGIN: There is no rule to say that we must include the origin on a graph. To do so may cause important information to be concealed.
2. Data Presentation
Linear x-y graphs
• Linear graphs have an important place in the analysis of experimental data for the following reasons:
  • The gradient and y-intercept can be calculated.
  • Departure from linearity can be observed.
  • Outliers can be identified.
  • The x- or y-quantity can be predicted for a chosen y- or x-quantity.

• If we are satisfied that a linear relationship exists between the x- and y-quantities, it is useful to be able to write down an equation that represents that relationship.
2. Data Presentation
Linear x-y graphs
• An equation representing the relationship between the x and y quantities can be found by first plotting the data on an x-y graph, followed by drawing the ‘best’ line through the points with a plastic ruler.

• The gradient and intercept of this line can then be calculated.

• Although positioning a line ‘by eye’ through the data points gives reasonable estimates of m and c, there are some difficulties with this method:
2. Data Presentation
Linear x-y graphs
i. No two people draw the same ‘best’ line through a given data set.
ii. If the uncertainty in each data point is different, how do we take this into account when drawing the best line?
iii. Drawing the best line is difficult for largely scattered data.
iv. Finding the uncertainties in m and c is cumbersome.

• In order to avoid the guesswork involved in finding the best line by eye, we use the method of ‘least squares’ (a.k.a. linear regression).
2. Data Presentation
Least squares method
i. We assume that any random uncertainty in data values is confined to measurements made of the y-quantity.

ii. We assume that the uncertainty in each measurement of the y-quantity is the same. This is the unweighted least squares fit. (The other would be the weighted fit.)

• The following diagram shows part of an x-y graph with a line passing close to the data points:
2. Data Presentation
Least squares method

[Figure: part of an x-y graph with a fitted line; each experimentally observed value yᵢᵒ is compared with the calculated value yᵢᶜ = mxᵢ + c on the line.]
2. Data Presentation
Least squares method
• For a particular value of x, labelled xᵢ, there are observed (yᵢᵒ) and calculated (yᵢᶜ) values of y.

• Δyᵢ is the difference between the observed and calculated y-value and is called the residual:

Δyᵢ = yᵢᵒ − yᵢᶜ

• The best position for the line, and therefore the best values for m and c, is found by minimising the sum of the squares of the residuals.
2. Data Presentation
Least squares method
• Writing SS for the sum of squares, we say:

SS = (Δy₁)² + (Δy₂)² + (Δy₃)² + ⋯ + (Δyₙ)² = Σ (Δyᵢ)²

• Replacing Δyᵢ by yᵢᵒ − yᵢᶜ, and yᵢᶜ by mxᵢ + c, we can write

SS = Σ [yᵢᵒ − (mxᵢ + c)]²
2. Data Presentation
Least squares method
• We seek the values of m and c that reduce SS to the smallest possible value. Those are the best values for the gradient and intercept.

• By partially differentiating the above equation with respect to m and c, and equating each result to zero, we get:

Σ xᵢ(yᵢᵒ − mxᵢ − c) = 0

and:

Σ (yᵢᵒ − mxᵢ − c) = 0
2. Data Presentation
Least squares method
• The above equations can be expanded and combined to give the following equations for m and c:

m = [n Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)] / [n Σ xᵢ² − (Σ xᵢ)²]

and:

c = [(Σ xᵢ²)(Σ yᵢ) − (Σ xᵢ)(Σ xᵢyᵢ)] / [n Σ xᵢ² − (Σ xᵢ)²]
2. Data Presentation
Least squares method
• The subscript ‘o’ has been omitted from the observed values of y that appear in the equations.

• It is not possible to decide how many figures m and c should be quoted to until the uncertainties in m and c (written as σₘ and σ_c) have been calculated.

• In order to calculate σₘ and σ_c we assume the following:
  i. For each value of x, the corresponding value of y has some uncertainty.
  ii. The uncertainty in each value of y contributes something to the uncertainties in m and c.
2. Data Presentation
Least squares method
• After going through a number of mathematical steps, the explicit equations for σₘ and σ_c are quoted as follows:

σₘ = σ √n / √[n Σ xᵢ² − (Σ xᵢ)²]

and:

σ_c = σ √(Σ xᵢ²) / √[n Σ xᵢ² − (Σ xᵢ)²]
2. Data Presentation
Least squares method
• Here σ is the uncertainty in each y-value of the data points.

• It is usual, when fitting a line to data in which the uncertainty in each point is constant, to take this uncertainty to be the standard deviation of the distribution of the y-values about the fitted line. This is given by:

σ = √[ Σ(yᵢ − mxᵢ − c)² / (n − 2) ]
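The unweighted formulas for m, c, σ, σₘ and σ_c translate directly into code. A minimal sketch (the function name is my own):

```python
import math

def least_squares(x, y):
    """Unweighted linear least-squares fit y = m*x + c, with uncertainties."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx ** 2  # common denominator n Σxᵢ² - (Σxᵢ)²
    m = (n * sxy - sx * sy) / d
    c = (sxx * sy - sx * sxy) / d
    # Scatter of y-values about the fitted line, with n - 2 degrees of freedom
    sigma = math.sqrt(sum((yi - m * xi - c) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    sigma_m = sigma * math.sqrt(n / d)
    sigma_c = sigma * math.sqrt(sxx / d)
    return m, c, sigma_m, sigma_c

# For points lying exactly on y = 2x + 1 the fit is exact:
print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 0.0, 0.0)
```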
2. Data Presentation
Weighting the fit
• ‘Weighted’ least squares fitting is used for situations in which the uncertainties in the y-values vary from point to point.

• The sum of squares is weighted so that, when fitting takes place, the calculated line lies closest to those points that are known to the greatest precision.

• Each value of uncertainty (written as σᵢ) must be used in the calculations of m, σₘ, c, and σ_c.
2. Data Presentation
Weighting the fit
• Let

Δ = Σ(1/σᵢ²) Σ(xᵢ²/σᵢ²) − [Σ(xᵢ/σᵢ²)]²

• The equations for m and c are:

m = [Σ(1/σᵢ²) Σ(xᵢyᵢ/σᵢ²) − Σ(xᵢ/σᵢ²) Σ(yᵢ/σᵢ²)] / Δ
2. Data Presentation
Weighting the fit

c = [Σ(xᵢ²/σᵢ²) Σ(yᵢ/σᵢ²) − Σ(xᵢ/σᵢ²) Σ(xᵢyᵢ/σᵢ²)] / Δ

• The equations for σₘ and σ_c are:

σₘ = √[ Σ(1/σᵢ²) / Δ ]
2. Data Presentation
Weighting the fit
σ_c = √[ Σ(xᵢ²/σᵢ²) / Δ ]

• Given the considerable amount of work required in applying the foregoing equations, it can be of great assistance to use a computer spreadsheet, e.g. Microsoft Excel.
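The weighted equations can equally be implemented rather than typed into a spreadsheet. A sketch under the same assumptions (each point carries its own σᵢ; the function name is my own):

```python
import math

def weighted_least_squares(x, y, sig):
    """Weighted linear fit y = m*x + c with per-point uncertainties sig[i]."""
    w = [1.0 / s ** 2 for s in sig]  # weights 1/σᵢ²
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s0 * sxx - sx ** 2
    m = (s0 * sxy - sx * sy) / delta
    c = (sxx * sy - sx * sxy) / delta
    sigma_m = math.sqrt(s0 / delta)
    sigma_c = math.sqrt(sxx / delta)
    return m, c, sigma_m, sigma_c

# With equal uncertainties the gradient and intercept match the unweighted fit:
m, c, sm, sc = weighted_least_squares([1, 2, 3, 4], [3, 5, 7, 9], [0.1] * 4)
print(round(m, 6), round(c, 6))  # 2.0 1.0
```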
2. Data Presentation
Histograms
• A histogram (or bar chart) is useful for displaying data from repeat measurements.

• The range of the data is divided into a number of equal intervals; the number of values that fall into each interval is plotted vertically, with the intervals plotted horizontally.

• Histogram intervals are sometimes referred to as bins (or channels), and the number of values that fall into each bin is referred to as the frequency (or counts).

• We can also plot frequency/N versus interval, where N is the total number of measurements.
2. Data Presentation
Histograms
• Step-by-step approach to plotting a histogram:

1. Find the range, R, of the data (max value − min value).
2. Count the total number of values in the data set, N.
3. Calculate √N and round to the nearest whole number. This determines the number of intervals.
4. Divide R by the number calculated in step 3 to give the width of each interval.
5. Draw up a table showing all the intervals covering the range and the number of data values in each interval.
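Steps 1 to 4 above can be sketched as follows (the function name is my own, not from the slides):

```python
import math

def bin_plan(data):
    """Range, interval count (√N rule) and interval width for a histogram."""
    r = max(data) - min(data)  # Step 1: range
    n = len(data)              # Step 2: number of values
    k = round(math.sqrt(n))    # Step 3: √N intervals, rounded
    width = r / k              # Step 4: interval width
    return r, n, k, width

# Illustrative check against the sliding-time example (R = 0.19, N = 50):
r, n, k, width = bin_plan([0.51, 0.70] + [0.60] * 48)
print(k, round(width, 2))  # 7 0.03
```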
2. Data Presentation
Histograms
• Example: 50 readings of time for body to slide down incline.
Time (s)
0.61 0.60 0.53 0.64 0.58
0.60 0.69 0.56 0.68 0.57
0.61 0.60 0.58 0.64 0.59
0.61 0.65 0.58 0.68 0.61
0.68 0.59 0.55 0.62 0.59
0.58 0.60 0.64 0.59 0.70
0.51 0.66 0.61 0.55 0.63
0.64 0.61 0.62 0.53 0.63
0.63 0.64 0.61 0.62 0.60
0.59 0.53 0.59 0.54 0.55
2. Data Presentation
Histograms
• Data re-arranged: from smallest to largest.
Time (s)
0.51 0.58 0.60 0.61 0.64
0.53 0.58 0.60 0.61 0.64
0.53 0.58 0.60 0.62 0.64
0.53 0.58 0.60 0.62 0.65
0.54 0.59 0.60 0.62 0.66
0.55 0.59 0.61 0.63 0.68
0.55 0.59 0.61 0.63 0.68
0.55 0.59 0.61 0.63 0.68
0.56 0.59 0.61 0.64 0.69
0.57 0.59 0.61 0.64 0.70
2. Data Presentation
Histograms
• Step 1: Find the range, R, of the data
  o Range = 0.70 − 0.51 = 0.19

• Step 2: Count the total number of values in the data set, N
  o N = 50

• Step 3: Calculate √N to determine the number of intervals
  o √N = √50 = 7.1 ≈ 7
2. Data Presentation
Histograms
• Step 4: Divide R by the number calculated in step 3 to give the width of each interval
  o Width = R/7 = 0.19/7 ≈ 0.03

• Step 5: Draw up a table showing all the intervals covering the range and the number of data values in each interval
2. Data Presentation
Histograms

Interval number Interval range (s) Frequency

1 0.51 – 0.53 4

2 0.54 – 0.56 5

3 0.57 – 0.59 11

4 0.60 – 0.62 15

5 0.63 – 0.65 9

6 0.66 – 0.68 4

7 0.69 – 0.71 2
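The five steps applied above can be sketched in a few lines of Python (a minimal sketch; the bin edges follow the table above, starting at 0.51 s with width 0.03 s):

```python
import math

# The 50 timing readings (s) from the example data set
times = [
    0.61, 0.60, 0.53, 0.64, 0.58,
    0.60, 0.69, 0.56, 0.68, 0.57,
    0.61, 0.60, 0.58, 0.64, 0.59,
    0.61, 0.65, 0.58, 0.68, 0.61,
    0.68, 0.59, 0.55, 0.62, 0.59,
    0.58, 0.60, 0.64, 0.59, 0.70,
    0.51, 0.66, 0.61, 0.55, 0.63,
    0.64, 0.61, 0.62, 0.53, 0.63,
    0.63, 0.64, 0.61, 0.62, 0.60,
    0.59, 0.53, 0.59, 0.54, 0.55,
]

N = len(times)                    # step 2: N = 50
n_bins = round(math.sqrt(N))      # step 3: sqrt(50) = 7.1 -> 7
R = max(times) - min(times)       # step 1: range = 0.70 - 0.51 = 0.19

# steps 4-5: bins of width 0.03 s starting at 0.51 (0.51-0.53, 0.54-0.56, ...)
freq = [0] * n_bins
for t in times:
    k = (round(t * 100) - 51) // 3        # bin index on the 0.01 s grid
    freq[min(k, n_bins - 1)] += 1

print(freq)   # -> [4, 5, 11, 15, 9, 4, 2], matching the table
```

The integer bin index works on the 0.01 s grid of the readings, which avoids floating-point edge cases at the interval boundaries.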
2. Data Presentation
Histograms

[Figure: histogram of the 50 readings of time for a body to slide down an incline; Frequency vs. intervals of time (1–7), with frequencies 4, 5, 11, 15, 9, 4, 2.]
2. Data Presentation
Histograms
• The histogram displays the following features:

1. ‘Interval #4’ (0.60 to 0.62) has the largest frequency. It


includes the mean (0.60 s). It indicates that a large
proportion of values lie close to the mean.

2. The distribution of values is approximately symmetrical


about Interval #4 containing the mean.

3. There are few values that lie far from the mean.

• These features show up in a wide variety of experimental


data. Such features correspond to Gaussian distribution.
102
2. Data Presentation
Histograms
• Example: ~78,000 readings of gamma-ray energies.

2. Data Presentation
Histograms
• Example of data presented as a histogram.

[Figure: 137Cs spectrum plotted as a histogram; Counts (0–800) vs. Energy (keV), with bins from ~46 to ~807 keV.]
2. Data Presentation
Distributions
• In most experiments, as one increases the number of
measurements, the histogram takes on some definite,
continuous curve.

• The curve is called the limiting distribution.

• The limiting distribution is a theoretical construct, which can


never itself be measured exactly, unless we make infinitely
many measurements and use infinitesimally narrow bins.

• There is evidence that almost all measurements have a


limiting distribution.
105
2. Data Presentation
Distributions
[Figure: histogram of repeated measurements of the time of motion of a body; Frequency vs. intervals of time (1–7).]
2. Data Presentation
Distributions
• A limiting distribution defines some function f(x).

• The fraction of measurements that fall in any small interval x to (x + dx) equals the area f(x) dx.

• More generally, the fraction of measurements that fall between x = a and x = b is the total area

  ∫_{a}^{b} f(x) dx

• This gives the probability that any measurement will fall between x = a and x = b.
2. Data Presentation
Distributions
• If we knew the distribution f(x), then we would know the
probability of obtaining an answer in any interval a ≤ x ≤ b.

• The total probability of obtaining a measurement between −∞ and +∞ is one. Therefore

  ∫_{−∞}^{+∞} f(x) dx = 1

• If we knew the limiting distribution f(x), we could also calculate the mean x̄ of the measurements.
2. Data Presentation
Distributions
• The mean of any number of measurements is the sum of all different values, xᵢ, each weighted by the fraction of times it is obtained,

  x̄ = Σᵢ xᵢ Fᵢ

• For the distribution f(x), Fᵢ = f(x) dx, thus

  x̄ = ∫_{−∞}^{+∞} x f(x) dx
2. Data Presentation
Distributions
• We can also calculate the standard deviation, σₓ, for the measurements

  σₓ² = ∫_{−∞}^{+∞} (x − x̄)² f(x) dx

• Not all limiting distributions (e.g. binomial and Poisson


distributions) have a symmetric bell shape characteristic of
the Gaussian (or normal) distribution.

• Nevertheless, many measurements have a symmetric bell-


shaped curve for their limiting distribution.
110
2. Data Presentation
The Normal Distribution
• If a measurement has many small sources of random error
and negligible systematic error, then the measured values
will be distributed on a bell-shaped curve.

• This curve will be centered on the true value of x (denoted


by X).

• The mathematical function that describes the bell-shaped


curve is called the normal (also Gaussian) distribution:

  f(x) = e^{−(x−X)²/2σ²}
2. Data Presentation
The Normal Distribution
• The σ is called the width parameter.

• To satisfy ∫_{−∞}^{+∞} f(x) dx = 1, this function becomes:

  f(x) = [1/(σ√(2π))] e^{−(x−X)²/2σ²}

• It follows that, if the limiting distribution is the Gaussian distribution centered on the true value X, then, after many, many trials:

  x̄ = X and σₓ² = σ²
2. Data Presentation
The Normal Distribution
• In other words, if we make a large (but finite) number of
trials, then our average, 𝑥,ҧ will be close to X.

• Also, the width parameter, 𝜎, of the Gaussian function is just


the standard deviation that we would obtain after making
many measurements.

• For normally distributed results, almost 70% of the total area under the whole curve lies within ±σ, that is, between x̄ − σ and x̄ + σ.

• About 95% of the data lie between x̄ − 2σ and x̄ + 2σ.
2. Data Presentation
The Normal Distribution
• Table summarises confidence limits and their associated probability.

Confidence limits          Probability that true value lies between these limits (%)
x̄ − σ_x̄  to x̄ + σ_x̄        68.3
x̄ − 2σ_x̄ to x̄ + 2σ_x̄       95.4
x̄ − 3σ_x̄ to x̄ + 3σ_x̄       99.7
x̄ − 4σ_x̄ to x̄ + 4σ_x̄       99.994
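The percentages in the table above can be reproduced with the error function, since the fraction of a normal distribution within ±k·σ of the mean is erf(k/√2) (a minimal sketch):

```python
import math

def coverage(k):
    """Fraction of normally distributed values within k standard deviations
    of the mean: P(|x - mean| <= k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {100 * coverage(k):.3f} %")
# -> 68.269 %, 95.450 %, 99.730 %, 99.994 % for k = 1, 2, 3, 4
```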
2. Data Presentation
Chi-squared Test
• Limiting distributions are functions that describe the
expected distribution of results if an experiment is repeated
many times.

• How can we decide whether our observed distribution of


results is consistent with the expected theoretical
distribution?

• The chi-squared (𝝌𝟐 ) test is the procedure that we use to


answer this question.

• In general, χ² is a sum of squares with the form:
2. Data Presentation
Chi-squared Test
  χ² = Σ_{1}^{n} [ (observed value − expected value) / standard deviation ]²

• 𝜒 2 is an indicator of the agreement between the observed


and expected value of some variable.

• If the agreement is good, 𝜒 2 will be of order n; and if it is


bad, 𝜒 2 will be much greater than n.

• We can only use 𝜒 2 to test this agreement if we know the


expected values and the standard deviation.
116
2. Data Presentation
Chi-squared Test
• Consider an experiment to measure a number 𝑥 with a
certain expected distribution of results.

• We repeat the measurement N times and, having divided


the range of possible results 𝑥 into 𝑛 bins, 𝑘 = 1,…, 𝑛, we
count the number 𝑂𝑘 of observations that fall in each bin 𝑘.

• The expected number Eₖ is determined by the assumed distribution, and the standard deviation is √Eₖ. Thus

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ
2. Data Presentation
Chi-squared Test
• If our hypothesis that our measurements conform to a
particular distribution is correct, then we would expect that
the deviations (𝑶𝒌 −𝑬𝒌 ) would be small.

• Conversely, if the deviations (𝑂𝑘 −𝐸𝑘 ) prove to be large,


then we would suspect that our hypothesis is incorrect.

• We need to decide how large we would expect (𝑂𝑘 −𝐸𝑘 ) to


be if the measurements really are distributed as expected.

• If 𝝌𝟐 = 0, then the agreement between the observed and the


expected distributions is perfect.
118
2. Data Presentation
Chi-squared Test
• In general, the individual terms in the 𝜒 2 equation are
expected to be of order 1, and there are 𝑛 terms in the sum.

• Thus if 𝜒 2 ≲ 𝑛, the observed and expected distributions


agree about as well as could be expected.

• But if 𝜒 2 ≫ 𝑛, we can suspect that our measurements were


not governed by the expected distribution.

• It is better to compare 𝜒 2 , not with the number of bins 𝑛, but


with the number of degrees of freedom (𝒅) instead.

119
2. Data Presentation
Reduced 𝝌𝟐
• The number 𝒅 in a statistical calculation is the number of
observed data minus the number of parameters computed
from the data and used in the calculation.

• Therefore,

𝑑 =𝑛−𝑐

where 𝑛 is the number of bins and 𝑐 is the number of


parameters that had to be calculated from the data.

• The number 𝑐 is often called the number of constraints.


120
2. Data Presentation
Reduced 𝝌𝟐
• We can now make our 𝜒 2 test more precise. It can be
shown that the expected value of 𝜒 2 is precisely 𝑑,

(expected average value of 𝜒 2 ) = 𝑑

• We can now use a reduced chi-squared, which we denote


by 𝜒෤ 2 and define as

𝜒෤ 2 = 𝜒 2 /𝑑

• And since the expected value of 𝜒 2 is 𝑑, we see that the

(expected average value of 𝜒෤ 2 ) = 1


121
2. Data Presentation
Reduced 𝝌𝟐
• If we obtain a value of 𝜒෤ 2 of order 1 or less, then we have
no reason to doubt our expected distribution.

• If we obtain a value of 𝜒෤ 2 much larger than 1, then it is


unlikely that our expected distribution is correct.

• We now need a quantitative measure of agreement.

• We need some guidance where to draw the boundary


between agreement and disagreement.

122
2. Data Presentation
Probabilities for 𝝌𝟐
• We can calculate the probability of obtaining a value of
𝜒෤ 2 as large as, or larger than, our observed value 𝜒෤𝑜 2 (where
the subscript o stands for “obtained”).

• We compute the probability

𝑃(𝜒෤ 2 ≥ 𝜒෤𝑜 2 )

of finding a value of 𝜒෤ 2 greater than or equal to the value of


𝜒෤𝑜 2 actually obtained.

123
2. Data Presentation
Probabilities for 𝝌𝟐
• If the probability is high, then our value 𝜒෤𝑜 2 is perfectly
acceptable, and there is no reason to reject our expected
distribution.

• If this probability is unreasonably low, then a value of 𝜒෤ 2 as


large as our observed 𝜒෤𝑜 2 is very unlikely, and it is unlikely
that our expected distribution is correct.

• We have to decide on the boundary between what is


“reasonably probable” and what is not.

124
2. Data Presentation
Probabilities for 𝝌𝟐
• With the boundary at 5 percent, we would say that our
observed value 𝜒෤𝑜 2 indicates a “significant disagreement” if

𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 < 5%

• We would then reject our expected distribution at the “5


percent significance level”.

• If at 1 percent, then we could say that the disagreement is


“highly significant” if 𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 < 1% and reject the
expected distribution at the “1 percent significance level.”

125
2. Data Presentation
Probabilities for 𝝌𝟐
• The probabilities P(χ̃² ≥ χ̃₀²) are calculated from the integral

  P_d(χ̃² ≥ χ̃₀²) = [2 / (2^{d/2} Γ(d/2))] ∫_{χₒ}^{∞} x^{d−1} e^{−x²/2} dx

and the results are tabulated.
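The tabulated probabilities can be checked numerically. The sketch below integrates the chi-squared density, an equivalent form of the tail integral above, with a simple trapezoid rule (a minimal sketch, not a library-grade routine):

```python
import math

def chi2_prob(x0, d, steps=100_000):
    """P(chi^2 >= x0) for d degrees of freedom, by numerically integrating
    the chi-squared density x^(d/2 - 1) e^(-x/2) / (2^(d/2) Gamma(d/2))."""
    norm = 1.0 / (2 ** (d / 2) * math.gamma(d / 2))
    upper = x0 + 80.0                     # tail beyond this is negligible here
    h = (upper - x0) / steps
    total = 0.0
    for i in range(steps + 1):
        x = x0 + i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoidal end weights
        total += w * x ** (d / 2 - 1) * math.exp(-x / 2)
    return norm * h * total

# For a reduced chi-squared, convert first: chi0^2 = d * (reduced chi0^2).
print(round(100 * chi2_prob(10 * 2.0, 10), 1))   # d = 10, reduced 2.0 -> 2.9 (%)
print(round(100 * chi2_prob(1 * 1.8, 1), 1))     # d = 1,  reduced 1.8 -> 18.0 (%)
```

The first value reproduces the table entry used in the example below (d = 10, χ̃₀² = 2 gives 2.9 percent).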
2. Data Presentation
Probabilities for 𝝌𝟐
The percentage probability P(χ̃² ≥ χ̃₀²) of obtaining a value χ̃² greater than or equal to χ̃₀².

d \ χ̃₀²    0    0.25   0.5   0.75   1.0   1.25   1.5   1.75   2.0   3.0   4.0   5.0   6.0
1         100    62    48     39    32     26    22     19    16    8.3   4.6   2.5   1.4
2         100    78    61     47    37     29    22     17    14    5.0   1.8   0.7   0.2
3         100    86    68     52    39     29    21     15    11    2.9   0.7   0.2    -
5         100    94    78     59    42     28    19     12    7.5   1.0   0.1    -     -
10        100    99    89     68    44     25    13     6     2.9   0.1    -     -     -
15        100   100    94     73    45     23    10     4     1.2    -     -     -     -
2. Data Presentation
Probabilities for 𝝌𝟐
• In the table, the numbers in the left column give choices of
𝑑, the number of degrees of freedom, and those at the other
column headings give possible values of 𝜒෤𝑜 2 .

• Each cell in the table shows the percentage probability


𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 as a function of 𝑑 and 𝜒෤𝑜 2 .

• Example: with 10 degrees of freedom (𝑑 = 10), the


probability of obtaining 𝜒෤ 2 ≥ 2 is 2.9 percent.

• Thus, if we obtained 𝜒෤𝑜 2 equal to 2 in an experiment with


𝑑 = 10, we could reject the distribution at the 5 % level.
128
2. Data Presentation
Probabilities for 𝝌𝟐
• The probabilities in the second column of the table are all
100 percent, since one is always certain to get 𝜒෤ 2 ≥ 0.

• As 𝜒෤ 2 increases, the probability of getting 𝜒෤ 2 ≥ 𝜒෤𝑜 2


diminishes, but it does so at a rate that depends on 𝑑.

• Thus, for 𝑑 = 2, the probability of obtaining 𝜒෤ 2 ≥ 1 is 37


percent; whereas for 𝑑 = 15, it is 45 percent.

• We are now able (using the table) to assign quantitative


significance to the value of 𝜒෤𝑜 2 obtained in any particular
experiment.
129
2. Data Presentation
Example
• 40 measurements of range 𝑥 of a projectile fired from gun.
𝒙 (cm)
731 771 722 653 733
739 709 760 672 766
678 689 805 764 709
698 754 725 738 787
772 681 688 757 742
780 676 748 687 645
748 810 778 753 675
770 830 710 638 712
2. Data Presentation
Example
• Rearranged: smallest to largest.
𝒙 (cm)
638 687 722 748 771
645 688 725 753 772
653 689 731 754 778
672 698 733 757 780
675 709 738 760 787
676 709 739 764 805
678 710 742 766 810
681 712 748 770 830
2. Data Presentation
Example
• Suppose we have reason to believe these measurements
are governed by a Gaussian distribution 𝑓𝑋,𝜎 𝑥 .

• We use our 40 measurements to compute best estimates for the center X and width σ of the expected distribution.

  best estimate of X = x̄ = Σ_{i=1}^{40} xᵢ / 40 = 730.1 cm

  best estimate of σ = √[ Σ (xᵢ − x̄)² / 39 ] = 46.8 cm
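These best estimates can be reproduced directly from the data table (a minimal sketch; note the N − 1 in the denominator of the sample standard deviation):

```python
import math

# The 40 measured ranges (cm) of the projectile, from the table
x = [731, 771, 722, 653, 733, 739, 709, 760, 672, 766,
     678, 689, 805, 764, 709, 698, 754, 725, 738, 787,
     772, 681, 688, 757, 742, 780, 676, 748, 687, 645,
     748, 810, 778, 753, 675, 770, 830, 710, 638, 712]

N = len(x)
mean = sum(x) / N                                              # best estimate of X
sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / (N - 1))   # best estimate of sigma

print(f"{mean:.1f} cm, {sigma:.1f} cm")   # -> 730.1 cm, 46.8 cm
```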
2. Data Presentation
Example
• We divide the range of possible 𝑥 values into bins, with bin
boundaries at 𝑋 – 𝜎, 𝑋, and 𝑋 + 𝜎.

Bin #   Interval          (cm)                 Observations Oₖ in bin
1       x < X − σ         x < 683.3            8
2       X − σ < x < X     683.3 < x < 730.1    10
3       X < x < X + σ     730.1 < x < 776.9    16
4       X + σ < x         776.9 < x            6

• Assuming that our measurements are distributed normally, we can calculate the expected number Eₖ of measurements in each bin k.
2. Data Presentation
Example
• The probability that any one measurement fall in an interval
𝑎 < 𝑥 < 𝑏 is the area under the Gaussian distribution
function between 𝑥 = 𝑎 and 𝑥 = 𝑏.

Bin #   Interval (cm)     Probability Pₖ   Expected Eₖ = N·Pₖ   Observed Oₖ
1       x < X − σ         16 %             6.4                  8
2       X − σ < x < X     34 %             13.6                 10
3       X < x < X + σ     34 %             13.6                 16
4       X + σ < x         16 %             6.4                  6
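The probabilities Pₖ and expected counts Eₖ in the table can be reproduced from the standard normal cumulative distribution (a minimal sketch; the table rounds each Pₖ to a whole percent before forming Eₖ, which the code mimics):

```python
import math

N = 40

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Bin probabilities for edges at X - sigma, X, X + sigma, in units of z = (x - X)/sigma
P = [Phi(-1), Phi(0) - Phi(-1), Phi(1) - Phi(0), 1 - Phi(1)]

print([round(100 * p) for p in P])              # -> [16, 34, 34, 16] (percent)
print([round(N * round(p, 2), 1) for p in P])   # E_k from rounded P_k -> [6.4, 13.6, 13.6, 6.4]
```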
2. Data Presentation
Example
• We now calculate

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ
     = (1.6)²/6.4 + (−3.6)²/13.6 + (2.4)²/13.6 + (−0.4)²/6.4
     = 1.80

• We further need to calculate the reduced chi-squared

  χ̃² = χ²/d
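The same arithmetic in code (a minimal sketch; d = n − c = 1, from the constraint count worked out on the next slide):

```python
observed = [8, 10, 16, 6]           # O_k from the bin table
expected = [6.4, 13.6, 13.6, 6.4]   # E_k = N * P_k

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
d = len(observed) - 3               # n = 4 bins minus c = 3 constraints (N, X, sigma)
chi2_red = chi2 / d

print(f"chi2 = {chi2:.2f}, d = {d}, reduced chi2 = {chi2_red:.2f}")
# -> chi2 = 1.80, d = 1, reduced chi2 = 1.80
```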
2. Data Presentation
Example
• Here there were three constraints and hence only one
degree of freedom,

𝑑 =𝑛−𝑐 =4−3=1

• The first constraint is the number of observations N, given by N = Σ_{k=1}^{n} Oₖ.

• The other two constraints were the parameters X and σ, estimated in order to calculate the expected numbers Eₖ.

• In the examples considered here, there will always be at least one constraint (N = Σ Oₖ).
2. Data Presentation
Example
• Therefore

  χ̃² = χ²/d = 1.80/1 = 1.80

• Question: is a value of χ̃² = 1.80 sufficiently larger than 1 to rule out our expected Gaussian distribution or not?

• The probability turns out to be P(χ̃² ≥ 1.80) ≈ 18%.

• We have no reason to reject our expected distribution.
2. Data Presentation
Summary: Chi-squared Test
• If we make 𝑛 measurements for which we know, or can
calculate, the expected values and the standard deviations,
then we define 𝜒 2 as
  χ² = Σ_{1}^{n} [ (observed value − expected value) / standard deviation ]²

• The 𝑛 measurements are the numbers, 𝑂1 ,…, 𝑂𝑛 , of times


that the value of some quantity 𝑥 was observed in each of 𝑛
bins.

139
2. Data Presentation
Summary: Chi-squared Test
• The expected number Eₖ is determined by the assumed distribution of x, and the standard deviation is √Eₖ. Thus

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ

• If the assumed distribution of 𝑥 is correct, the 𝜒 2 should be


of order 𝑛.

• If 𝜒 2 ≫ 𝑛, the assumed distribution is probably incorrect.

140
2. Data Presentation
Summary: Chi-squared Test
• If we were to repeat the whole experiment many times, the
mean value of 𝜒 2 should be equal to 𝑑, the number of
degrees of freedom, defined as

𝑑 =𝑛−𝑐

• Where 𝑐 is the number of parameters that had to be


calculated from the data to compute 𝜒 2 .

• The reduced 𝜒 2 is defined as

𝜒෤ 2 = 𝜒 2 /𝑑
141
2. Data Presentation
Summary: Chi-squared Test
• If assumed distribution is correct, 𝜒෤ 2 should be of order 1.

• If 𝜒෤ 2 ≫ 1, the data do not fit the assumed distribution


satisfactorily.

• Suppose we obtain the value 𝜒෤𝑜 2 in an experiment. If 𝜒෤𝑜 2 is


appreciably greater than one, we have reason to doubt our
assumed distribution.

• From the table, we can find the probability 𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 of


getting a value 𝜒෤ 2 as large as 𝜒෤𝑜 2 , assuming the expected
distribution is correct.
142
2. Data Presentation
Summary: Chi-squared Test
• If this probability is small, we have reason to reject the
expected distribution.

• If it is less than 5%, we would reject the assumed


distribution at the 5% (“significant”) level.

• If it is less than 1%, we would reject the assumed


distribution at the 1% (“highly significant”) level.

143
3. Data Interpretation

Overview

• These deal with the interpretation of the results that


have been presented. Moving from “chaos to concept.”

• The question is: what can be usefully said with the data
that were gathered?

144
3. Data Interpretation
• An experiment is likely to contain many details, both major
and minor.

• The discussion of results must focus on the important


points; spare the reader any mass of unnecessary detail.

• Where shortcomings have been identified in the


experimental method, these should be discussed.

• If the data from the experiment do not lend strong support to


the particular idea or hypothesis at the core of the
experiment, then this should be acknowledged.

145
3. Data Interpretation
• Even if the experimental method used could have been improved, we should not be too dismissive of data that were obtained in an experiment.

• The question is: what can be usefully said with the data
that were gathered, despite the shortcomings?

• Here we must also refer back to the purpose of the


experiment.

• What was the aim of the experiment, and how far did the
experiments performed go in achieving that aim?

146
3. Data Interpretation

• If others have undertaken a similar investigation, then it is


usual to include a comparison of findings, giving reference
to the other work.

• For a known value of a quantity, a comparison of the values


should be given along with a reference to the source of the
information.

147
