
OUTLINE

# Day Date Topic/Activity

1 Thu 7 Feb Introduction & exercises

2 Fri 8 Feb Least squares fitting

3 Thu 14 Feb Histograms

4 Fri 15 Feb Distributions

5 Fri 22 Feb Class test


Les Kirkup (1994); John R. Taylor (1997)
The ‘GUM’
Guide to the Expression of Uncertainty in
Measurement

• Until recently, there has been a lack of consistency and clarity in the texts that set out the concepts and terminology of measurement.

• To address this, in the mid-1990s, international bodies responsible for measurement standards published the ‘GUM’.

www.BIPM.org
The Big Picture

Measurements made during an experiment generate ‘raw’ data which must be collected, presented, and interpreted.
Typical Thesis Structure
CHAPTER 1: Introduction

CHAPTER 2: Theory/Literature Review

CHAPTER 3: Methodology

CHAPTER 4: Data Analysis

CHAPTER 5: Findings
Lecture #1
Analyse
To examine (something) methodically and in detail,
typically in order to explain and interpret it.

• With the data in hand, a most important question is asked: ‘What do the data tell us?’

• An attempt to answer this question is the essence of data analysis.
Introduction

• Discussion is organised under the headings:

1. Data Collection
2. Data Presentation
3. Data Interpretation

• These can be thought of as the different stages in the process of data analysis in science.

• Different levels of analysis are involved in the different stages of data handling.

• ‘Data Presentation’ possibly involves the most analysis.
1. Data Collection

Overview

The better the record that has been made of what has
been done, the easier will be the task of presenting the
work.

1. Data Collection
Significant figures
• Significant: “having a particular meaning; indicative of
something.”

o “meaningful”

• If an experimental data value is recorded as 6.12, this implies that the actual value lies between 6.11 and 6.13.

• If the value is written as 6.124, then this implies that the actual value lies between 6.123 and 6.125.
1. Data Collection
Significant figures
• Writing a value as 6.12 is to give it to three significant
figures, and to write it as 6.124 is to give it to four
significant figures.

• Significant figures are the figures that lie between the first
non-zero figure and the last figure inclusive.

Check Your Neighbour
How many significant figures appear in the following numbers?

A. 1.654 ?

B. 0.00437 ?

C. 64 000 ?

D. 1.20 ?

E. 0.100 007 38 ?

Check Your Neighbour
How many significant figures appear in the following numbers?
Answer:
A. 1.654 Four

B. 0.00437 Three

C. 64 000 Two

D. 1.20 Three

E. 0.100 007 38 Eight

Check Your Neighbour
How many significant figures are implied by the way the
following numbers are written?

A. 3.24 ?

B. 0.0023 ?

C. 83 400 ?

D. 1.010 ?

E. 10.5 ?

Check Your Neighbour
How many significant figures are implied by the way the
following numbers are written?
Answer:

A. 3.24 Three

B. 0.0023 Two

C. 83 400 Three

D. 1.010 Four

E. 10.5 Three

1. Data Collection
Significant figures & scientific notation
• It is not always clear how many figures in a number are significant.

• Example: A time interval of 346 s can be written as 346 000 ms, 346 000 000 µs, etc. In all cases there are three significant figures. However, if presented with 346 000 ms only, how does the reader know that the above zeros are not significant?

o Note: It is possible to have a timing device with a resolution of 1 ms, thus making all six figures significant.

• The way to get around this confusion is to present the numbers in scientific notation.
1. Data Collection
Significant figures & scientific notation
• Example: numbers expressed in scientific notation.

Number     Scientific notation
12.65      1.265 × 10¹
0.00023    2.3 × 10⁻⁴
342.5      3.425 × 10²
34 001     3.4001 × 10⁴

• For numbers expressed in scientific notation, the number of significant figures is equal to the number of figures that appear to the left of the multiplication sign.
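Conversion to scientific notation at a chosen number of significant figures can be automated with Python's `e` format type. A minimal sketch (the helper name `to_sci` is my own, not from the slides):

```python
def to_sci(x, sig):
    """Format x in scientific notation with `sig` significant figures."""
    return f"{x:.{sig - 1}e}"  # one digit before the point, (sig - 1) after

print(to_sci(12.65, 4))     # 1.265e+01, i.e. 1.265 × 10¹
print(to_sci(0.00023, 2))   # 2.3e-04,  i.e. 2.3 × 10⁻⁴
print(to_sci(34001, 5))     # 3.4001e+04
```

Note that the formatter also rounds, so it applies the same rounding used in the exercises below.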
Check Your Neighbour
Give the following numbers in scientific notation to four
significant figures:

# Number Scientific notation


A. 0.005 654 2 ?
B. 125.04 ?
C. 93 842 773 ?
D. 3 400 042 ?
E. 0.000 000 100 092 ?

Check Your Neighbour
Give the following numbers in scientific notation to four
significant figures:
Answer:
# Number               Scientific notation
A. 0.005 654 2         5.654 × 10⁻³
B. 125.04              1.250 × 10²
C. 93 842 773          9.384 × 10⁷
D. 3 400 042           3.400 × 10⁶
E. 0.000 000 100 092   1.001 × 10⁻⁷
Check Your Neighbour
Give the following numbers in scientific notation to two
significant figures:

# Number Scientific notation


A. 0.005 654 2 ?
B. 125.04 ?
C. 93 842 773 ?
D. 3 400 042 ?
E. 0.000 000 100 092 ?

Check Your Neighbour
Give the following numbers in scientific notation to two
significant figures:
Answer:
# Number               Scientific notation
A. 0.005 654 2         5.7 × 10⁻³
B. 125.04              1.3 × 10²
C. 93 842 773          9.4 × 10⁷
D. 3 400 042           3.4 × 10⁶
E. 0.000 000 100 092   1.0 × 10⁻⁷
1. Data Collection
Significant figures & calculations
• If you are required to perform a calculation in which the
uncertainties in the quantities are not known, the following
rules are useful:

• Rule 1: When multiplying or dividing numbers: give the result of the calculation to the least number of significant figures as contained in the quantities involved.

• Example: 3.7 × 3.01 = 11.137.

• Quantity 3.7 has the least number of significant figures (two).
• Give the answer as 11.
1. Data Collection
Significant figures & calculations
• Rule 2: When adding or subtracting numbers: round the
result of the calculation to the least number of decimal
places as contained in the quantities involved.

• Example: 11.24 + 13.1 = 24.34.

• Quantity 13.1 has the least number of decimal places (one).
• Give the answer as 24.3.
Check Your Neighbour
Write down the results of the following calculations to an
appropriate number of significant figures:

# Calculation Answer
A. 1.2 × 8 ?
B. 13.0 × 43.23 ?
C. 0.0104 × 0.023 ?
D. 33 + 435.5 ?
E. 14.1 ÷ 76.3 ?
F. 105.55 – 34.2 ?

Check Your Neighbour
Write down the results of the following calculations to an
appropriate number of significant figures:
Answer:
# Calculation Answer
A. 1.2 × 8 10
B. 13.0 × 43.23 562
C. 0.0104 × 0.023 0.000 24
D. 33 + 435.5 469
E. 14.1 ÷ 76.3 0.185
F. 105.55 – 34.2 71.4

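Rule 1 (keep the least number of significant figures) can be checked numerically. A minimal sketch, assuming a helper `round_sig` that is not part of the lecture material:

```python
import math

def round_sig(x, n):
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, n - 1 - exponent)

# Quiz answers A to C: the factor with fewest significant figures sets n
print(round_sig(1.2 * 8, 1))         # 10.0     (8 has one significant figure)
print(round_sig(13.0 * 43.23, 3))    # 562.0    (13.0 has three)
print(round_sig(0.0104 * 0.023, 2))  # 0.00024  (0.023 has two)
```

For addition and subtraction (Rule 2), `round(x, places)` with the least number of decimal places is all that is needed.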
1. Data Collection
True value, accuracy and precision
• Measuring a quantity is an attempt to find an estimate of the ‘true value’ of that quantity.

• The true value can never be known with absolute precision, but by gathering more data we hope to get a better estimate of the true value.

• If our estimate is close to the true value, then we say that the measurements are accurate.

• A measurement is precise when the uncertainty in the value is small, but this does not imply that it is close to the true value.
1. Data Collection
Uncertainties in measurements
• Despite our best efforts or the quality of the equipment we use, there is going to be an amount of variability in quantities measured in an experiment.

• Estimates of the uncertainty in measurements should always accompany the measurement and need to be recorded in the laboratory notebook.

• For tabulation of data with uncertainties, it is best to write the uncertainty in the heading of the column in the table containing the data.
1. Data Collection
Uncertainties in measurements
• Example: variation of electrical resistance with temperature
of a copper wire.

Temperature (°C) ± 0.5 °C    Resistance (Ω) ± 0.5 Ω
8.0                          0.208
16.5                         0.213
23.5                         0.222
32.0                         0.229
40.5                         0.232
54.5                         0.243
1. Data Collection
Uncertainties in measurements
• If we were to make repeated measurements of a particular quantity, we are likely to find a variation in the observed values.

• Although it may be possible to reduce an uncertainty by improved experimental method or the careful use of statistical techniques, it can never be eliminated.

• We need to be able to identify and quantify the variation, otherwise the reliability of our experiment is likely to be questioned, and any conclusions drawn from the experiment may be of limited value.
1. Data Collection
Single measurement: resolution uncertainty
• No instrument exists that can measure a quantity to infinitely
fine resolution.

• All measurements are limited by the instrument you are using.

• If the quantity measured is stable or varies slowly with time, it is reasonable to quote the uncertainty as one half the smallest division on the scale.

• The resolution limit of an instrument represents the smallest uncertainty that can be quoted in a single measurement of a quantity.
1. Data Collection
Single measurement: reading uncertainty
• It is possible that the quantity under investigation varies by
much more than half the smallest division on the instrument.

• Example: Heating a beaker containing water using a thermometer with a resolution of ±1 °C. As the water is stirred, the thermometer indicates a wide temperature variation: 36 °C, then 33 °C, and then 35 °C.

• Quoting an uncertainty of ±1 °C would underestimate the experimental uncertainty.

• We estimate the uncertainty to be less than ±5 °C but greater than ±1 °C, and choose a compromise between these.
1. Data Collection
Single measurement: reading uncertainty
• In this situation, there are no ‘hard and fast’ rules about
quoting uncertainties, and we have to rely on our common
sense.

1. Data Collection
Single measurement: calibration uncertainty
• The instruments that you use should have been calibrated
at some time against a standard. For the calibration to
remain valid, the instrument must be checked regularly.

• If scientists around the world are trying to compare their measurements, they need to be sure that their instruments ‘agree’ on what is a metre, volt, second, etc.

• An uncalibrated, or poorly calibrated, instrument leads to systematic uncertainty in data and influences all measurements made with that instrument.
1. Data Collection
Repeat measurements
• To be able to get a real feel for the variability in
measurement, more than one measurement should be
made for each quantity.

• Where this is possible we can use statistical tools to allow us to quantify experimental uncertainties.
1. Data Collection
The mean
• Example: Times for an object to fall 25 m

Time of fall (s) 0.64 0.61 0.63 0.53 0.59 0.65 0.60 0.61 0.64 0.71

• We could expect the time that it really took for the object to fall to lie somewhere between the two extreme measured values, namely between 0.53 s and 0.71 s.

• If a single value for the time of fall is required, we can do no better than to calculate the average (or mean) of the ten measurements that were made.
1. Data Collection
The mean
• The mean, x̄, is calculated using the formula:

x̄ = (Σ xᵢ) / n

• Using the data given, the mean time is 0.621 s.

• We could quote the mean to one, two or three significant figures, that is 0.6 s, 0.62 s or 0.621 s. Which do we choose?

• We can answer this question only when we have an estimate for the uncertainty in the mean value.
1. Data Collection
Uncertainty in the mean
• A simple method of estimating the uncertainty in the mean
of a set of data involves first calculating the range of the
data:

range = largest value – smallest value

• The uncertainty in the mean is found by dividing the range by the number of measurements made, n:

uncertainty in mean = range / n
1. Data Collection
Uncertainty in the mean
• Example: Speed of sound in air at 20°C

Speed (m/s) 341.5 342.4 342.2 345.5 341.1 338.5 340.3 342.7

• The mean is 341.775 m/s; the range is 345.5 − 338.5 = 7.0 m/s. The uncertainty is 7.0 ÷ 8 = 0.875 m/s.

• We might be tempted to say that the speed of sound in air is 341.775 m/s with an uncertainty of 0.875 m/s.

• Uncertainties serve to quantify the probable range in which the value of that quantity lies.
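The speed-of-sound numbers above can be reproduced directly. A sketch of the range method (no function of this kind is defined in the slides):

```python
speeds = [341.5, 342.4, 342.2, 345.5, 341.1, 338.5, 340.3, 342.7]  # m/s

mean = sum(speeds) / len(speeds)
data_range = max(speeds) - min(speeds)   # largest value - smallest value
uncertainty = data_range / len(speeds)   # range / n

print(round(mean, 3))  # 341.775
print(uncertainty)     # 0.875
```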
1. Data Collection
Uncertainty in the mean
• There is no point, therefore, in quoting the uncertainty to more than one significant figure.

• If the first figure in the uncertainty is a ‘1’, it is usual to give the uncertainty to two significant figures.

• In the present example we would round 0.875 m/s up to 0.9 m/s.

• We now further round the mean to the same number of decimal places as the uncertainty, i.e. 341.775 m/s becomes 341.8 m/s.
1. Data Collection
Uncertainty in the mean
• To summarise, there are four steps in quoting the value of
the quantity:

1. Calculate the mean of the measured values.
2. Calculate the uncertainty in the quantity, making clear the method used. Round the uncertainty to one significant figure (or two if the first figure is a ‘1’).
3. Quote the mean and uncertainty to the appropriate number of figures.
4. State the units of the quantity.
1. Data Collection
Uncertainty in the mean
• When an uncertainty in an experimental value is quoted, we are not saying that the actual or true value of the quantity must lie between the limits given by (mean − uncertainty) to (mean + uncertainty).

• The probability is high that it will lie between these limits, and it is actually possible to quantify that probability.

• An uncertainty expressed in the same units as the quantity being measured is referred to as the absolute uncertainty in the quantity.
1. Data Collection
Fractional and percentage uncertainty
• In some cases you may be required to state the ratio

(uncertainty in quantity) / quantity

• This ratio is referred to as the fractional uncertainty in the quantity.

• The percentage uncertainty is found by multiplying the fractional uncertainty by 100%.

• Fractional or percentage uncertainties are normally quoted to no more than one significant figure.
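For example, the fractional and percentage uncertainty of the speed-of-sound result (341.8 ± 0.9 m/s) work out as follows (a sketch, not code from the slides):

```python
value, uncertainty = 341.8, 0.9  # m/s, from the earlier example

fractional = uncertainty / value
percentage = fractional * 100

print(round(fractional, 4))  # 0.0026
print(round(percentage, 1))  # 0.3, quoted as 0.3%
```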
1. Data Collection
Systematic and random uncertainties
• There are two broad categories of uncertainties that can occur in an experiment:

1. Systematic uncertainties
2. Random uncertainties

• There are two types of systematic uncertainty which can exist with measuring instruments:

1. Offset uncertainty
2. Gain uncertainty
1. Data Collection
Offset uncertainty
• Example: Melting point of water using a thermocouple

Temp. (°C)  −7.5  −7.3  −6.9  −7.4  −7.7  −7.6  −7.6  −7.3  −7.6

• The mean is −7.43 °C and the uncertainty 0.08 °C.

• Clearly there is something wrong here: the melting point of water should be very close to 0.0 °C.

• For whatever reason, all measurements are too low by about 7.4 °C.

• We have just exposed an offset uncertainty in our system.
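The offset in the thermocouple data is exposed by comparing the sample mean with the accepted value (0.0 °C). A sketch:

```python
temps = [-7.5, -7.3, -6.9, -7.4, -7.7, -7.6, -7.6, -7.3, -7.6]  # °C

mean = sum(temps) / len(temps)
offset = mean - 0.0  # accepted melting point of water is 0.0 °C

print(round(mean, 2))    # -7.43
print(round(offset, 1))  # -7.4: every reading is low by roughly 7.4 °C
```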
1. Data Collection
Gain uncertainty
• The offset uncertainty remains fixed irrespective of the magnitude of the quantity being measured.

• In contrast, the gain uncertainty is dependent on the magnitude of the quantity.

• Example: Five calibration mass pieces are placed on a balance and readings were taken.

Mass piece (g)   20.00  40.00  60.00  80.00  100.00
Reading (g)      20.26  40.65  60.98  81.20  101.52
Difference (g)    0.26   0.65   0.98   1.20   1.52
1. Data Collection
Gain uncertainty
• As the mass of the piece increases, so the difference between the measured and calibrated mass increases.

• The difference increases in direct proportion to the magnitude of the mass piece located on the balance.

• This establishes the relationship between the calibrated mass and the measured mass for this particular weighing balance.

• Future measurements of the mass using this balance can then be corrected for the gain uncertainty.
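The proportionality between calibration mass and reading can be estimated with a one-parameter least-squares fit through the origin; corrected readings then follow. This is a sketch of the idea, not a procedure given in the slides:

```python
masses   = [20.00, 40.00, 60.00, 80.00, 100.00]  # calibrated mass pieces (g)
readings = [20.26, 40.65, 60.98, 81.20, 101.52]  # balance readings (g)

# Least-squares slope through the origin: reading ≈ gain × mass
gain = sum(m * r for m, r in zip(masses, readings)) / sum(m * m for m in masses)
print(round(gain, 4))  # 1.0154

# A future reading can be corrected by dividing by the estimated gain
print(round(81.20 / gain, 2))  # 79.97 g
```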
1. Data Collection
Random uncertainties
• Random uncertainties produce scatter in observed values.

• The cause could be environmental factors such as:
  • Electrical interference affecting voltage- or current-sensitive measurements.
  • Vibrations affecting measurements with a sensitive electronic balance.
  • Power supply fluctuations affecting optical measurements.

• We can use statistical techniques to estimate random uncertainties and calculate the effect of combining uncertainties.
1. Data Collection
Random uncertainties
• Statistics is the science of assembling, organising and interpreting numerical data.

• The statistical approach is valid when we have made sufficient measurements (say in excess of five) to satisfactorily describe the spread in the data.
1. Data Collection
Standard deviation (SD)
• If xᵢ represents the ith data value in a set of n repeated measurements, and x̄ the mean of the data values, then the standard deviation, σ, is given by

σ = √[ Σ(xᵢ − x̄)² / n ]

• A pocket calculator, or computer software package, with built-in statistical functions can be very helpful, especially when there are many numbers to process.
1. Data Collection
Standard deviation (SD)
• Example: Time for a body to slide down a plane.

Time (s)  0.64  0.64  0.59  0.58  0.70  0.61  0.68  0.55  0.57  0.63

• For these 10 measurements, σ = 0.04571 s. When 50 measurements were made, σ = 0.04364 s.

• The SD of a set of repeat measurements of a quantity remains almost constant, regardless of how many measurements are made.

• By making repeat measurements we are trying to get the best estimate for the quantity and its uncertainty.
1. Data Collection
Standard deviation (SD)
• Should the SD be taken as the uncertainty in the mean? If so, is there any point in increasing the number of repeat measurements?

• The SD is characteristic of the spread of the whole data set and should not be taken as the uncertainty in the mean.

• The standard deviation of the mean (σ_x̄) is the proper estimate for the uncertainty in the mean.

• It can be proven mathematically that

σ_x̄ = σ / √n
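The σ and σ_x̄ formulas can be checked against the earlier sliding-time data. A sketch using the population form (divide by n), matching the slides:

```python
import math

times = [0.64, 0.64, 0.59, 0.58, 0.70, 0.61, 0.68, 0.55, 0.57, 0.63]  # s

n = len(times)
mean = sum(times) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in times) / n)  # population SD
sdom = sd / math.sqrt(n)                                 # SD of the mean

print(round(sd, 5))    # 0.04571, as quoted on the slide
print(round(sdom, 5))  # 0.01445
```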
1. Data Collection
Standard deviation of mean (SDOM)
• Example: Volume of water from a fluid-flow experiment

Vol. (mL)  33  45  43  42  45  42  41  44  40  42

• The mean is 41.7 mL, with an SD of 3.29 mL.

• The experiment was performed eight times, with each one consisting of 10 repeat measurements.

Mean (mL)  41.0  41.7  40.4  41.5  41.7  40.4  42.5  39.5
SD (mL)    3.13  3.29  3.07  3.11  3.20  2.94  2.73  3.20

• The mean of the means is 41.1 mL, with an SDOM of 0.893 mL.
1. Data Collection
Standard deviation of mean (SDOM)
• We see now that it is worthwhile to make many repeat measurements if we want to reduce the uncertainty in the mean.

• In summary:

• The best estimate is the mean of the repeat measurements.
• The SD is a measure of the spread of the measurements and is insensitive to how many measurements are made.
• The SDOM is the uncertainty in the mean, and this does decrease as the number of measurements increases.
1. Data Collection
Population and sample
• Although we want to reduce uncertainties in the data we collect during experiments, we are not able to make an infinite number of repeat measurements of a quantity.

• The totality of measurements that could be made is called the population.

• We are only able to make a few repeat measurements, which can be regarded as a sample of all possible measurements, and use these to estimate the population mean and SD.
1. Data Collection
Population and sample
• From this perspective, the SD is the estimate of the population SD and is calculated using

σ = √[ Σ(xᵢ − x̄)² / (n − 1) ]

• This version of the SD equation is preferred in calculations.

• So long as the number of repeat measurements is greater than 3, both versions of the SD equation will usually return the same number to one significant figure.
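Both SD conventions are available directly in Python's statistics module; for the sliding-time data they agree to one significant figure, as the slide states. A sketch:

```python
import statistics

times = [0.64, 0.64, 0.59, 0.58, 0.70, 0.61, 0.68, 0.55, 0.57, 0.63]

pop_sd = statistics.pstdev(times)   # divides by n (population form)
samp_sd = statistics.stdev(times)   # divides by n - 1 (sample form)

print(round(pop_sd, 3), round(samp_sd, 3))  # both are 0.05 to 1 s.f.
```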
1. Data Collection
Combining uncertainties
• An experiment may require the determination of several quantities which are later to be inserted into an equation.

• Example: calculate the density, ρ, of a body of mass, m, and volume, V. How do the uncertainties in m and V combine to give the uncertainty in ρ?

• We can apply differential calculus to determine this.

• The combination of uncertainties is called the propagation of uncertainties, or error propagation.
1. Data Collection
Combining uncertainties
• Example: Consider a function V = V(a, b), where a and b have uncertainties Δa and Δb, respectively.

• To find the uncertainty in V, we compute

ΔV = |∂V/∂a| Δa + |∂V/∂b| Δb

• The bars around the partial derivatives mean that we ignore any minus sign that may occur after differentiation.

• This avoids a cancellation of terms that could otherwise occur.
1. Data Collection
Combining uncertainties
• The previous method is satisfactory, but tends to overestimate the uncertainty in the calculated quantity.

• It is possible for the uncertainties Δa and Δb to partially cancel out in situations where they are independent of each other.

• Taking the SDOM as the uncertainty in the mean of the measured values of V = V(a, b), the propagation becomes

σ_V̄ = √[ (∂V/∂a)² σ_ā² + (∂V/∂b)² σ_b̄² ]
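For the density example ρ = m/V, the partial derivatives are ∂ρ/∂m = 1/V and ∂ρ/∂V = −m/V², so the quadrature formula gives σ_ρ̄ = √[(σ_m̄/V)² + (m σ_V̄/V²)²]. A sketch with illustrative made-up numbers (the values are not from the slides):

```python
import math

m, sigma_m = 25.0, 0.2  # mass (g) and uncertainty in its mean
V, sigma_V = 10.0, 0.1  # volume (cm³) and uncertainty in its mean

rho = m / V
# Quadrature combination using the partial derivatives 1/V and -m/V²
sigma_rho = math.sqrt((sigma_m / V) ** 2 + (m * sigma_V / V ** 2) ** 2)

print(rho)                  # 2.5 g/cm³
print(round(sigma_rho, 2))  # 0.03
```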
2. Data Presentation

Overview

When data are presented pictorially, trends can be detected that we would be unlikely to recognise if the data were given only in tabular form.
2. Data Presentation
x-y graphs
• Pictorial representation of data in a graph is a good way to summarise many important features of the experiment.

• A graph can indicate:
  • the range of measurements made
  • the uncertainty in each measurement
  • the existence or absence of a trend in the data gathered
  • which data points do not follow the general trend exhibited by the majority of data.

• An x-y graph possesses horizontal and vertical axes termed the x- and y-axis, respectively.
2. Data Presentation
x-y graphs
• INDEPENDENT VARIABLE: The quantity which is controlled or deliberately varied during an experiment; it is plotted as the x-coordinate.

• DEPENDENT VARIABLE: The quantity that varies in response to changes in the independent variable; it is plotted as the y-coordinate.

• TITLE: indicates the relationship being investigated. If it is stated that quantity ‘A’ is plotted versus or against quantity ‘B’, then quantity ‘A’ is plotted on the y-axis and quantity ‘B’ on the x-axis.
2. Data Presentation
x-y graphs
• LABELS & UNITS: indicate the names of the quantities under study and their units of measurement (usually in brackets).

• ORIGIN: There is no rule to say that we must include the origin on a graph. To do so may cause important information to be concealed.
2. Data Presentation
Linear x-y graphs
• Linear graphs have an important place in the analysis of experimental data for the following reasons:
  • The gradient and y-intercept can be calculated.
  • Departure from linearity can be observed.
  • Outliers can be identified.
  • The x- or y-quantity can be predicted for a chosen y- or x-quantity.

• If we are satisfied that a linear relationship exists between the x- and y-quantities, it is useful to be able to write down an equation that represents that relationship.
2. Data Presentation
Linear x-y graphs
• An equation representing the relationship between the x and y quantities can be found by first plotting the data on an x-y graph, followed by drawing the ‘best’ line through the points with a plastic ruler.

• The gradient and intercept of this line can then be calculated.

• Although positioning a line ‘by eye’ through the data points gives reasonable estimates of m and c, there are some difficulties with this method:
2. Data Presentation
Linear x-y graphs
i. No two people draw the same ‘best’ line through a given data set.
ii. If the uncertainty in each data point is different, how do we take this into account when drawing the best line?
iii. Drawing the best line is difficult for largely scattered data.
iv. Finding the uncertainties in m and c is cumbersome.

• In order to avoid the guesswork involved in finding the best line by eye, we use the method of ‘least squares’ (a.k.a. linear regression).
2. Data Presentation
Least squares method
i. We assume that any random uncertainty in data values is confined to measurements made of the y-quantity.

ii. We assume that the uncertainty in each measurement of the y-quantity is the same. This is the unweighted least squares fit. (The other would be the weighted fit.)

• The following diagram shows part of an x-y graph with a line passing close to the data points:
2. Data Presentation
Least squares method

[Figure: part of an x-y graph with a fitted line; each experimentally observed value yᵢᵒ is compared with the calculated value yᵢᶜ = mxᵢ + c on the line.]
2. Data Presentation
Least squares method
• For a particular value of x, labelled xᵢ, there are observed (yᵢᵒ) and calculated (yᵢᶜ) values of y.

• Δyᵢ is the difference between the observed and calculated y-value and is called the residual:

Δyᵢ = yᵢᵒ − yᵢᶜ

• The best position for the line, and therefore the best values for m and c, is found by minimising the sum of the squares of the residuals.
2. Data Presentation
Least squares method
• Writing SS for the sum of squares, we say:

SS = (Δy₁)² + (Δy₂)² + (Δy₃)² + ⋯ + (Δyₙ)² = Σ (Δyᵢ)²

• Replacing Δyᵢ by yᵢᵒ − yᵢᶜ, and yᵢᶜ by mxᵢ + c, we can write

SS = Σ [yᵢᵒ − (mxᵢ + c)]²
2. Data Presentation
Least squares method
• We seek the values of m and c that reduce SS to the smallest possible value. Those are the best values for the gradient and intercept.

• By partially differentiating the above equation with respect to m and c, and equating each result to zero, we get:

Σ xᵢ(yᵢᵒ − mxᵢ − c) = 0

and:

Σ (yᵢᵒ − mxᵢ − c) = 0
2. Data Presentation
Least squares method
• The above equations can be expanded and combined to give the following equations for m and c:

m = [n Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)] / [n Σ xᵢ² − (Σ xᵢ)²]

and:

c = [(Σ xᵢ²)(Σ yᵢ) − (Σ xᵢ)(Σ xᵢyᵢ)] / [n Σ xᵢ² − (Σ xᵢ)²]
2. Data Presentation
Least squares method
• The subscript ‘o’ has been omitted from the observed values of y that appear in the equations.

• It is not possible to decide how many figures m and c should be quoted to until the uncertainties in m and c (written as σₘ and σ_c) have been calculated.

• In order to calculate σₘ and σ_c we assume the following:
  i. For each value of x, the corresponding value of y has some uncertainty.
  ii. The uncertainty in each value of y contributes something to the uncertainties in m and c.
2. Data Presentation
Least squares method
• After going through a number of mathematical steps, the explicit equations for σₘ and σ_c are quoted as follows:

σₘ = σ √n / √[n Σ xᵢ² − (Σ xᵢ)²]

and:

σ_c = σ √(Σ xᵢ²) / √[n Σ xᵢ² − (Σ xᵢ)²]
2. Data Presentation
Least squares method
• Here σ is the uncertainty in each y-value of the data points.

• It is usual, when fitting a line to data in which the uncertainty in each point is constant, to take this uncertainty to be the standard deviation of the distribution of the y-values about the fitted line. This is given by:

σ = √[ Σ(yᵢ − mxᵢ − c)² / (n − 2) ]
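The unweighted formulas for m, c, σ, σₘ and σ_c translate directly into code. A minimal sketch (the function name is my own):

```python
import math

def least_squares(x, y):
    """Unweighted linear least-squares fit y = m*x + c, with uncertainties."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx ** 2  # common denominator n Σxᵢ² - (Σxᵢ)²
    m = (n * sxy - sx * sy) / d
    c = (sxx * sy - sx * sxy) / d
    # Scatter of y-values about the fitted line, with n - 2 degrees of freedom
    sigma = math.sqrt(sum((yi - m * xi - c) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    sigma_m = sigma * math.sqrt(n / d)
    sigma_c = sigma * math.sqrt(sxx / d)
    return m, c, sigma_m, sigma_c

# For points lying exactly on y = 2x + 1 the fit is exact:
print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 0.0, 0.0)
```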
2. Data Presentation
Weighting the fit
• ‘Weighted’ least squares fitting is used for situations in which the uncertainties in the y-values vary from point to point.

• The sum of squares is weighted so that, when fitting takes place, the calculated line lies closest to those points that are known to the greatest precision.

• Each value of uncertainty (written as σᵢ) must be used in the calculations of m, σₘ, c, and σ_c.
2. Data Presentation
Weighting the fit
• Let

Δ = Σ(1/σᵢ²) Σ(xᵢ²/σᵢ²) − [Σ(xᵢ/σᵢ²)]²

• The equations for m and c are:

m = [Σ(1/σᵢ²) Σ(xᵢyᵢ/σᵢ²) − Σ(xᵢ/σᵢ²) Σ(yᵢ/σᵢ²)] / Δ
2. Data Presentation
Weighting the fit

c = [Σ(xᵢ²/σᵢ²) Σ(yᵢ/σᵢ²) − Σ(xᵢ/σᵢ²) Σ(xᵢyᵢ/σᵢ²)] / Δ

• The equations for σₘ and σ_c are:

σₘ = √[ Σ(1/σᵢ²) / Δ ]
2. Data Presentation
Weighting the fit
σ_c = √[ Σ(xᵢ²/σᵢ²) / Δ ]

• Given the considerable amount of work required in applying the foregoing equations, it can be of great assistance to use a computer spreadsheet, e.g. Microsoft Excel.
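The weighted equations can equally be implemented rather than typed into a spreadsheet. A sketch under the same assumptions (each point carries its own σᵢ; the function name is my own):

```python
import math

def weighted_least_squares(x, y, sig):
    """Weighted linear fit y = m*x + c with per-point uncertainties sig[i]."""
    w = [1.0 / s ** 2 for s in sig]  # weights 1/σᵢ²
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s0 * sxx - sx ** 2
    m = (s0 * sxy - sx * sy) / delta
    c = (sxx * sy - sx * sxy) / delta
    sigma_m = math.sqrt(s0 / delta)
    sigma_c = math.sqrt(sxx / delta)
    return m, c, sigma_m, sigma_c

# With equal uncertainties the gradient and intercept match the unweighted fit:
m, c, sm, sc = weighted_least_squares([1, 2, 3, 4], [3, 5, 7, 9], [0.1] * 4)
print(round(m, 6), round(c, 6))  # 2.0 1.0
```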
2. Data Presentation
Histograms
• A histogram (or bar chart) is useful for displaying data from repeat measurements.

• The range of the data is divided into a number of equal intervals; the number of values that fall into each interval is plotted vertically, with the intervals plotted horizontally.

• Histogram intervals are sometimes referred to as bins (or channels), and the number of values that fall into each bin is referred to as the frequency (or counts).

• We can also plot frequency/N versus interval, where N is the total number of measurements.
2. Data Presentation
Histograms
• Step-by-step approach to plotting a histogram:

1. Find the range, R, of the data (max value − min value).
2. Count the total number of values in the data set, N.
3. Calculate √N and round to the nearest whole number. This determines the number of intervals.
4. Divide R by the number calculated in step 3 to give the width of each interval.
5. Draw up a table showing all the intervals covering the range and the number of data values in each interval.
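Steps 1 to 4 above can be sketched as follows (the function name is my own, not from the slides):

```python
import math

def bin_plan(data):
    """Range, interval count (√N rule) and interval width for a histogram."""
    r = max(data) - min(data)  # Step 1: range
    n = len(data)              # Step 2: number of values
    k = round(math.sqrt(n))    # Step 3: √N intervals, rounded
    width = r / k              # Step 4: interval width
    return r, n, k, width

# Illustrative check against the sliding-time example (R = 0.19, N = 50):
r, n, k, width = bin_plan([0.51, 0.70] + [0.60] * 48)
print(k, round(width, 2))  # 7 0.03
```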
2. Data Presentation
Histograms
• Example: 50 readings of time for body to slide down incline.
Time (s)
0.61 0.60 0.53 0.64 0.58
0.60 0.69 0.56 0.68 0.57
0.61 0.60 0.58 0.64 0.59
0.61 0.65 0.58 0.68 0.61
0.68 0.59 0.55 0.62 0.59
0.58 0.60 0.64 0.59 0.70
0.51 0.66 0.61 0.55 0.63
0.64 0.61 0.62 0.53 0.63
0.63 0.64 0.61 0.62 0.60
0.59 0.53 0.59 0.54 0.55
2. Data Presentation
Histograms
• Data re-arranged: from smallest to largest.
Time (s)
0.51 0.58 0.60 0.61 0.64
0.53 0.58 0.60 0.61 0.64
0.53 0.58 0.60 0.62 0.64
0.53 0.58 0.60 0.62 0.65
0.54 0.59 0.60 0.62 0.66
0.55 0.59 0.61 0.63 0.68
0.55 0.59 0.61 0.63 0.68
0.55 0.59 0.61 0.63 0.68
0.56 0.59 0.61 0.64 0.69
0.57 0.59 0.61 0.64 0.70
2. Data Presentation
Histograms
• Step 1: Find the range, R, of the data
  o Range = 0.70 − 0.51 = 0.19

• Step 2: Count the total number of values in the data set, N
  o N = 50

• Step 3: Calculate √N to determine the number of intervals
  o √N = √50 = 7.1 ≈ 7
2. Data Presentation
Histograms
• Step 4: Divide R by the number calculated in step 3 to give the width of each interval
  o Width = R/7 = 0.19/7 ≈ 0.03

• Step 5: Draw up a table showing all the intervals covering the range and the number of data values in each interval
2. Data Presentation
Histograms

Interval number Interval range (s) Frequency

1 0.51 – 0.53 4

2 0.54 – 0.56 5

3 0.57 – 0.59 11

4 0.60 – 0.62 15

5 0.63 – 0.65 9

6 0.66 – 0.68 4

7 0.69 – 0.71 2
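The five steps applied above can be sketched in a few lines of Python (a minimal sketch; the bin edges follow the table above, starting at 0.51 s with width 0.03 s):

```python
import math

# The 50 timing readings (s) from the example data set
times = [
    0.61, 0.60, 0.53, 0.64, 0.58,
    0.60, 0.69, 0.56, 0.68, 0.57,
    0.61, 0.60, 0.58, 0.64, 0.59,
    0.61, 0.65, 0.58, 0.68, 0.61,
    0.68, 0.59, 0.55, 0.62, 0.59,
    0.58, 0.60, 0.64, 0.59, 0.70,
    0.51, 0.66, 0.61, 0.55, 0.63,
    0.64, 0.61, 0.62, 0.53, 0.63,
    0.63, 0.64, 0.61, 0.62, 0.60,
    0.59, 0.53, 0.59, 0.54, 0.55,
]

N = len(times)                    # step 2: N = 50
n_bins = round(math.sqrt(N))      # step 3: sqrt(50) = 7.1 -> 7
R = max(times) - min(times)       # step 1: range = 0.70 - 0.51 = 0.19

# steps 4-5: bins of width 0.03 s starting at 0.51 (0.51-0.53, 0.54-0.56, ...)
freq = [0] * n_bins
for t in times:
    k = (round(t * 100) - 51) // 3        # bin index on the 0.01 s grid
    freq[min(k, n_bins - 1)] += 1

print(freq)   # -> [4, 5, 11, 15, 9, 4, 2], matching the table
```

The integer bin index works on the 0.01 s grid of the readings, which avoids floating-point edge cases at the interval boundaries.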
2. Data Presentation
Histograms

[Figure: histogram of the 50 readings of time for a body to slide down an incline; Frequency vs. intervals of time (1–7), with frequencies 4, 5, 11, 15, 9, 4, 2.]
2. Data Presentation
Histograms
• The histogram displays the following features:

1. ‘Interval #4’ (0.60 to 0.62) has the largest frequency. It


includes the mean (0.60 s). It indicates that a large
proportion of values lie close to the mean.

2. The distribution of values is approximately symmetrical


about Interval #4 containing the mean.

3. There are few values that lie far from the mean.

• These features show up in a wide variety of experimental


data. Such features correspond to Gaussian distribution.
102
2. Data Presentation
Histograms
• Example: ~78,000 readings of gamma-ray energies.

2. Data Presentation
Histograms
• Example of data presented as a histogram.

[Figure: 137Cs spectrum plotted as a histogram; Counts (0–800) vs. Energy (keV), with bins from ~46 to ~807 keV.]
2. Data Presentation
Distributions
• In most experiments, as one increases the number of
measurements, the histogram takes on some definite,
continuous curve.

• The curve is called the limiting distribution.

• The limiting distribution is a theoretical construct, which can


never itself be measured exactly, unless we make infinitely
many measurements and use infinitesimally narrow bins.

• There is evidence that almost all measurements have a


limiting distribution.
105
2. Data Presentation
Distributions
[Figure: histogram of repeated measurements of the time of motion of a body; Frequency vs. intervals of time (1–7).]
2. Data Presentation
Distributions
• A limiting distribution defines some function f(x).

• The fraction of measurements that fall in any small interval x to (x + dx) equals the area f(x) dx.

• More generally, the fraction of measurements that fall between x = a and x = b is the total area

  ∫_{a}^{b} f(x) dx

• This gives the probability that any measurement will fall between x = a and x = b.
2. Data Presentation
Distributions
• If we knew the distribution f(x), then we would know the
probability of obtaining an answer in any interval a ≤ x ≤ b.

• The total probability of obtaining a measurement between −∞ and +∞ is one. Therefore

  ∫_{−∞}^{+∞} f(x) dx = 1

• If we knew the limiting distribution f(x), we could also calculate the mean x̄ of the measurements.
2. Data Presentation
Distributions
• The mean of any number of measurements is the sum of all different values, xᵢ, each weighted by the fraction of times it is obtained,

  x̄ = Σᵢ xᵢ Fᵢ

• For the distribution f(x), Fᵢ = f(x) dx, thus

  x̄ = ∫_{−∞}^{+∞} x f(x) dx
2. Data Presentation
Distributions
• We can also calculate the standard deviation, σₓ, for the measurements

  σₓ² = ∫_{−∞}^{+∞} (x − x̄)² f(x) dx

• Not all limiting distributions (e.g. binomial and Poisson


distributions) have a symmetric bell shape characteristic of
the Gaussian (or normal) distribution.

• Nevertheless, many measurements have a symmetric bell-


shaped curve for their limiting distribution.
110
2. Data Presentation
The Normal Distribution
• If a measurement has many small sources of random error
and negligible systematic error, then the measured values
will be distributed on a bell-shaped curve.

• This curve will be centered on the true value of x (denoted


by X).

• The mathematical function that describes the bell-shaped


curve is called the normal (also Gaussian) distribution:

  f(x) = e^{−(x−X)²/2σ²}
2. Data Presentation
The Normal Distribution
• The σ is called the width parameter.

• To satisfy ∫_{−∞}^{+∞} f(x) dx = 1, this function becomes:

  f(x) = [1/(σ√(2π))] e^{−(x−X)²/2σ²}

• It follows that, if the limiting distribution is the Gaussian distribution centered on the true value X, then, after many, many trials:

  x̄ = X and σₓ² = σ²
2. Data Presentation
The Normal Distribution
• In other words, if we make a large (but finite) number of
trials, then our average, 𝑥,ҧ will be close to X.

• Also, the width parameter, 𝜎, of the Gaussian function is just


the standard deviation that we would obtain after making
many measurements.

• For normally distributed results, almost 70% of the total area under the whole curve lies within ±σ, that is, between x̄ − σ and x̄ + σ.

• About 95% of the data lie between x̄ − 2σ and x̄ + 2σ.
2. Data Presentation
The Normal Distribution
• Table summarises confidence limits and their associated probability.

Confidence limits          Probability that true value lies between these limits (%)
x̄ − σ_x̄  to x̄ + σ_x̄        68.3
x̄ − 2σ_x̄ to x̄ + 2σ_x̄       95.4
x̄ − 3σ_x̄ to x̄ + 3σ_x̄       99.7
x̄ − 4σ_x̄ to x̄ + 4σ_x̄       99.994
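The percentages in the table above can be reproduced with the error function, since the fraction of a normal distribution within ±k·σ of the mean is erf(k/√2) (a minimal sketch):

```python
import math

def coverage(k):
    """Fraction of normally distributed values within k standard deviations
    of the mean: P(|x - mean| <= k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {100 * coverage(k):.3f} %")
# -> 68.269 %, 95.450 %, 99.730 %, 99.994 % for k = 1, 2, 3, 4
```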
2. Data Presentation
Chi-squared Test
• Limiting distributions are functions that describe the
expected distribution of results if an experiment is repeated
many times.

• How can we decide whether our observed distribution of


results is consistent with the expected theoretical
distribution?

• The chi-squared (𝝌𝟐 ) test is the procedure that we use to


answer this question.

• In general, χ² is a sum of squares with the form:
2. Data Presentation
Chi-squared Test
  χ² = Σ_{1}^{n} [ (observed value − expected value) / standard deviation ]²

• 𝜒 2 is an indicator of the agreement between the observed


and expected value of some variable.

• If the agreement is good, 𝜒 2 will be of order n; and if it is


bad, 𝜒 2 will be much greater than n.

• We can only use 𝜒 2 to test this agreement if we know the


expected values and the standard deviation.
116
2. Data Presentation
Chi-squared Test
• Consider an experiment to measure a number 𝑥 with a
certain expected distribution of results.

• We repeat the measurement N times and, having divided


the range of possible results 𝑥 into 𝑛 bins, 𝑘 = 1,…, 𝑛, we
count the number 𝑂𝑘 of observations that fall in each bin 𝑘.

• The expected number Eₖ is determined by the assumed distribution, and the standard deviation is √Eₖ. Thus

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ
2. Data Presentation
Chi-squared Test
• If our hypothesis that our measurements conform to a
particular distribution is correct, then we would expect that
the deviations (𝑶𝒌 −𝑬𝒌 ) would be small.

• Conversely, if the deviations (𝑂𝑘 −𝐸𝑘 ) prove to be large,


then we would suspect that our hypothesis is incorrect.

• We need to decide how large we would expect (𝑂𝑘 −𝐸𝑘 ) to


be if the measurements really are distributed as expected.

• If 𝝌𝟐 = 0, then the agreement between the observed and the


expected distributions is perfect.
118
2. Data Presentation
Chi-squared Test
• In general, the individual terms in the 𝜒 2 equation are
expected to be of order 1, and there are 𝑛 terms in the sum.

• Thus if 𝜒 2 ≲ 𝑛, the observed and expected distributions


agree about as well as could be expected.

• But if 𝜒 2 ≫ 𝑛, we can suspect that our measurements were


not governed by the expected distribution.

• It is better to compare 𝜒 2 , not with the number of bins 𝑛, but


with the number of degrees of freedom (𝒅) instead.

119
2. Data Presentation
Reduced 𝝌𝟐
• The number 𝒅 in a statistical calculation is the number of
observed data minus the number of parameters computed
from the data and used in the calculation.

• Therefore,

𝑑 =𝑛−𝑐

where 𝑛 is the number of bins and 𝑐 is the number of


parameters that had to be calculated from the data.

• The number 𝑐 is often called the number of constraints.


120
2. Data Presentation
Reduced 𝝌𝟐
• We can now make our 𝜒 2 test more precise. It can be
shown that the expected value of 𝜒 2 is precisely 𝑑,

(expected average value of 𝜒 2 ) = 𝑑

• We can now use a reduced chi-squared, which we denote


by 𝜒෤ 2 and define as

𝜒෤ 2 = 𝜒 2 /𝑑

• And since the expected value of 𝜒 2 is 𝑑, we see that the

(expected average value of 𝜒෤ 2 ) = 1


121
2. Data Presentation
Reduced 𝝌𝟐
• If we obtain a value of 𝜒෤ 2 of order 1 or less, then we have
no reason to doubt our expected distribution.

• If we obtain a value of 𝜒෤ 2 much larger than 1, then it is


unlikely that our expected distribution is correct.

• We now need a quantitative measure of agreement.

• We need some guidance where to draw the boundary


between agreement and disagreement.

122
2. Data Presentation
Probabilities for 𝝌𝟐
• We can calculate the probability of obtaining a value of
𝜒෤ 2 as large as, or larger than, our observed value 𝜒෤𝑜 2 (where
the subscript o stands for “obtained”).

• We compute the probability

𝑃(𝜒෤ 2 ≥ 𝜒෤𝑜 2 )

of finding a value of 𝜒෤ 2 greater than or equal to the value of


𝜒෤𝑜 2 actually obtained.

123
2. Data Presentation
Probabilities for 𝝌𝟐
• If the probability is high, then our value 𝜒෤𝑜 2 is perfectly
acceptable, and there is no reason to reject our expected
distribution.

• If this probability is unreasonably low, then a value of 𝜒෤ 2 as


large as our observed 𝜒෤𝑜 2 is very unlikely, and it is unlikely
that our expected distribution is correct.

• We have to decide on the boundary between what is


“reasonably probable” and what is not.

124
2. Data Presentation
Probabilities for 𝝌𝟐
• With the boundary at 5 percent, we would say that our
observed value 𝜒෤𝑜 2 indicates a “significant disagreement” if

𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 < 5%

• We would then reject our expected distribution at the “5


percent significance level”.

• If at 1 percent, then we could say that the disagreement is


“highly significant” if 𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 < 1% and reject the
expected distribution at the “1 percent significance level.”

125
2. Data Presentation
Probabilities for 𝝌𝟐
• The probabilities P(χ̃² ≥ χ̃₀²) are calculated from the integral

  P_d(χ̃² ≥ χ̃₀²) = [2 / (2^{d/2} Γ(d/2))] ∫_{χₒ}^{∞} x^{d−1} e^{−x²/2} dx

and the results are tabulated.
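The tabulated probabilities can be checked numerically. The sketch below integrates the chi-squared density, an equivalent form of the tail integral above, with a simple trapezoid rule (a minimal sketch, not a library-grade routine):

```python
import math

def chi2_prob(x0, d, steps=100_000):
    """P(chi^2 >= x0) for d degrees of freedom, by numerically integrating
    the chi-squared density x^(d/2 - 1) e^(-x/2) / (2^(d/2) Gamma(d/2))."""
    norm = 1.0 / (2 ** (d / 2) * math.gamma(d / 2))
    upper = x0 + 80.0                     # tail beyond this is negligible here
    h = (upper - x0) / steps
    total = 0.0
    for i in range(steps + 1):
        x = x0 + i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoidal end weights
        total += w * x ** (d / 2 - 1) * math.exp(-x / 2)
    return norm * h * total

# For a reduced chi-squared, convert first: chi0^2 = d * (reduced chi0^2).
print(round(100 * chi2_prob(10 * 2.0, 10), 1))   # d = 10, reduced 2.0 -> 2.9 (%)
print(round(100 * chi2_prob(1 * 1.8, 1), 1))     # d = 1,  reduced 1.8 -> 18.0 (%)
```

The first value reproduces the table entry used in the example below (d = 10, χ̃₀² = 2 gives 2.9 percent).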
2. Data Presentation
Probabilities for 𝝌𝟐
The percentage probability P(χ̃² ≥ χ̃₀²) of obtaining a value χ̃² greater than or equal to χ̃₀².

d \ χ̃₀²    0    0.25   0.5   0.75   1.0   1.25   1.5   1.75   2.0   3.0   4.0   5.0   6.0
1         100    62    48     39    32     26    22     19    16    8.3   4.6   2.5   1.4
2         100    78    61     47    37     29    22     17    14    5.0   1.8   0.7   0.2
3         100    86    68     52    39     29    21     15    11    2.9   0.7   0.2    -
5         100    94    78     59    42     28    19     12    7.5   1.0   0.1    -     -
10        100    99    89     68    44     25    13     6     2.9   0.1    -     -     -
15        100   100    94     73    45     23    10     4     1.2    -     -     -     -
2. Data Presentation
Probabilities for 𝝌𝟐
• In the table, the numbers in the left column give choices of
𝑑, the number of degrees of freedom, and those at the other
column headings give possible values of 𝜒෤𝑜 2 .

• Each cell in the table shows the percentage probability


𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 as a function of 𝑑 and 𝜒෤𝑜 2 .

• Example: with 10 degrees of freedom (𝑑 = 10), the


probability of obtaining 𝜒෤ 2 ≥ 2 is 2.9 percent.

• Thus, if we obtained 𝜒෤𝑜 2 equal to 2 in an experiment with


𝑑 = 10, we could reject the distribution at the 5 % level.
128
2. Data Presentation
Probabilities for 𝝌𝟐
• The probabilities in the second column of the table are all
100 percent, since one is always certain to get 𝜒෤ 2 ≥ 0.

• As 𝜒෤ 2 increases, the probability of getting 𝜒෤ 2 ≥ 𝜒෤𝑜 2


diminishes, but it does so at a rate that depends on 𝑑.

• Thus, for 𝑑 = 2, the probability of obtaining 𝜒෤ 2 ≥ 1 is 37


percent; whereas for 𝑑 = 15, it is 45 percent.

• We are now able (using the table) to assign quantitative


significance to the value of 𝜒෤𝑜 2 obtained in any particular
experiment.
129
2. Data Presentation
Example
• 40 measurements of range 𝑥 of a projectile fired from gun.
𝒙 (cm)
731 771 722 653 733
739 709 760 672 766
678 689 805 764 709
698 754 725 738 787
772 681 688 757 742
780 676 748 687 645
748 810 778 753 675
770 830 710 638 712
2. Data Presentation
Example
• Rearranged: smallest to largest.
𝒙 (cm)
638 687 722 748 771
645 688 725 753 772
653 689 731 754 778
672 698 733 757 780
675 709 738 760 787
676 709 739 764 805
678 710 742 766 810
681 712 748 770 830
2. Data Presentation
Example
• Suppose we have reason to believe these measurements
are governed by a Gaussian distribution 𝑓𝑋,𝜎 𝑥 .

• We use our 40 measurements to compute best estimates for the center X and width σ of the expected distribution.

  best estimate of X = x̄ = Σ_{i=1}^{40} xᵢ / 40 = 730.1 cm

  best estimate of σ = √[ Σ (xᵢ − x̄)² / 39 ] = 46.8 cm
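These best estimates can be reproduced directly from the data table (a minimal sketch; note the N − 1 in the denominator of the sample standard deviation):

```python
import math

# The 40 measured ranges (cm) of the projectile, from the table
x = [731, 771, 722, 653, 733, 739, 709, 760, 672, 766,
     678, 689, 805, 764, 709, 698, 754, 725, 738, 787,
     772, 681, 688, 757, 742, 780, 676, 748, 687, 645,
     748, 810, 778, 753, 675, 770, 830, 710, 638, 712]

N = len(x)
mean = sum(x) / N                                              # best estimate of X
sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / (N - 1))   # best estimate of sigma

print(f"{mean:.1f} cm, {sigma:.1f} cm")   # -> 730.1 cm, 46.8 cm
```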
2. Data Presentation
Example
• We divide the range of possible 𝑥 values into bins, with bin
boundaries at 𝑋 – 𝜎, 𝑋, and 𝑋 + 𝜎.

Bin #   Interval          (cm)                 Observations Oₖ in bin
1       x < X − σ         x < 683.3            8
2       X − σ < x < X     683.3 < x < 730.1    10
3       X < x < X + σ     730.1 < x < 776.9    16
4       X + σ < x         776.9 < x            6

• Assuming that our measurements are distributed normally, we can calculate the expected number Eₖ of measurements in each bin k.
2. Data Presentation
Example
• The probability that any one measurement fall in an interval
𝑎 < 𝑥 < 𝑏 is the area under the Gaussian distribution
function between 𝑥 = 𝑎 and 𝑥 = 𝑏.

Bin #   Interval (cm)     Probability Pₖ   Expected Eₖ = N·Pₖ   Observed Oₖ
1       x < X − σ         16 %             6.4                  8
2       X − σ < x < X     34 %             13.6                 10
3       X < x < X + σ     34 %             13.6                 16
4       X + σ < x         16 %             6.4                  6
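The probabilities Pₖ and expected counts Eₖ in the table can be reproduced from the standard normal cumulative distribution (a minimal sketch; the table rounds each Pₖ to a whole percent before forming Eₖ, which the code mimics):

```python
import math

N = 40

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Bin probabilities for edges at X - sigma, X, X + sigma, in units of z = (x - X)/sigma
P = [Phi(-1), Phi(0) - Phi(-1), Phi(1) - Phi(0), 1 - Phi(1)]

print([round(100 * p) for p in P])              # -> [16, 34, 34, 16] (percent)
print([round(N * round(p, 2), 1) for p in P])   # E_k from rounded P_k -> [6.4, 13.6, 13.6, 6.4]
```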
2. Data Presentation
Example
• We now calculate

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ
     = (1.6)²/6.4 + (−3.6)²/13.6 + (2.4)²/13.6 + (−0.4)²/6.4
     = 1.80

• We further need to calculate the reduced chi-squared

  χ̃² = χ²/d
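The same arithmetic in code (a minimal sketch; d = n − c = 1, from the constraint count worked out on the next slide):

```python
observed = [8, 10, 16, 6]           # O_k from the bin table
expected = [6.4, 13.6, 13.6, 6.4]   # E_k = N * P_k

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
d = len(observed) - 3               # n = 4 bins minus c = 3 constraints (N, X, sigma)
chi2_red = chi2 / d

print(f"chi2 = {chi2:.2f}, d = {d}, reduced chi2 = {chi2_red:.2f}")
# -> chi2 = 1.80, d = 1, reduced chi2 = 1.80
```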
2. Data Presentation
Example
• Here there were three constraints and hence only one
degree of freedom,

𝑑 =𝑛−𝑐 =4−3=1

• The first constraint is the number of observations N, given by N = Σ_{k=1}^{n} Oₖ.

• The other two constraints were the parameters X and σ, estimated in order to calculate the expected numbers Eₖ.

• In the examples considered here, there will always be at least one constraint (N = Σ Oₖ).
2. Data Presentation
Example
• Therefore

  χ̃² = χ²/d = 1.80/1 = 1.80

• Question: is a value of χ̃² = 1.80 sufficiently larger than 1 to rule out our expected Gaussian distribution or not?

• The probability turns out to be P(χ̃² ≥ 1.80) ≈ 18%.

• We have no reason to reject our expected distribution.
2. Data Presentation
Summary: Chi-squared Test
• If we make 𝑛 measurements for which we know, or can
calculate, the expected values and the standard deviations,
then we define 𝜒 2 as
  χ² = Σ_{1}^{n} [ (observed value − expected value) / standard deviation ]²

• The 𝑛 measurements are the numbers, 𝑂1 ,…, 𝑂𝑛 , of times


that the value of some quantity 𝑥 was observed in each of 𝑛
bins.

139
2. Data Presentation
Summary: Chi-squared Test
• The expected number Eₖ is determined by the assumed distribution of x, and the standard deviation is √Eₖ. Thus

  χ² = Σ_{k=1}^{n} (Oₖ − Eₖ)² / Eₖ

• If the assumed distribution of 𝑥 is correct, the 𝜒 2 should be


of order 𝑛.

• If 𝜒 2 ≫ 𝑛, the assumed distribution is probably incorrect.

140
2. Data Presentation
Summary: Chi-squared Test
• If we were to repeat the whole experiment many times, the
mean value of 𝜒 2 should be equal to 𝑑, the number of
degrees of freedom, defined as

𝑑 =𝑛−𝑐

• Where 𝑐 is the number of parameters that had to be


calculated from the data to compute 𝜒 2 .

• The reduced 𝜒 2 is defined as

𝜒෤ 2 = 𝜒 2 /𝑑
141
2. Data Presentation
Summary: Chi-squared Test
• If assumed distribution is correct, 𝜒෤ 2 should be of order 1.

• If 𝜒෤ 2 ≫ 1, the data do not fit the assumed distribution


satisfactorily.

• Suppose we obtain the value 𝜒෤𝑜 2 in an experiment. If 𝜒෤𝑜 2 is


appreciably greater than one, we have reason to doubt our
assumed distribution.

• From the table, we can find the probability 𝑃 𝜒෤ 2 ≥ 𝜒෤𝑜 2 of


getting a value 𝜒෤ 2 as large as 𝜒෤𝑜 2 , assuming the expected
distribution is correct.
142
2. Data Presentation
Summary: Chi-squared Test
• If this probability is small, we have reason to reject the
expected distribution.

• If it is less than 5%, we would reject the assumed


distribution at the 5% (“significant”) level.

• If it is less than 1%, we would reject the assumed


distribution at the 1% (“highly significant”) level.

143
3. Data Interpretation

Overview

• These deal with the interpretation of the results that


have been presented. Moving from “chaos to concept.”

• The question is: what can be usefully said with the data
that were gathered?

144
3. Data Interpretation
• An experiment is likely to contain many details, both major
and minor.

• The discussion of results must focus on the important


points; spare the reader any mass of unnecessary detail.

• Where shortcomings have been identified in the


experimental method, these should be discussed.

• If the data from the experiment do not lend strong support to


the particular idea or hypothesis at the core of the
experiment, then this should be acknowledged.

145
3. Data Interpretation
• Even if the experimental method used could have been improved, we should not be too dismissive of data that were obtained in an experiment.

• The question is: what can be usefully said with the data
that were gathered, despite the shortcomings?

• Here we must also refer back to the purpose of the


experiment.

• What was the aim of the experiment, and how far did the
experiments performed go in achieving that aim?

146
3. Data Interpretation

• If others have undertaken a similar investigation, then it is


usual to include a comparison of findings, giving reference
to the other work.

• For a known value of a quantity, a comparison of the values


should be given along with a reference to the source of the
information.

147
