
Student Ref: 20177530

Statistics Assignment
Question 1

a) Frequency Distribution Table

Class Interval    Midpoint    Frequency
100.1 – 103       101.55      15
103.1 – 106       104.55      11
106.1 – 109       107.55      14
109.1 – 112       110.55      19
112.1 – 115       113.55      12
115.1 – 118       116.55      14
118.1 – 121       119.55      15

Figure 1

Figure 2

The frequency distribution table was created by arranging the data into separate
classes and determining their ranges. The midpoint of each class was then
calculated by averaging the lower and upper limits of its range:

$$\text{Midpoint} = \frac{\text{Class}_h + \text{Class}_l}{2}$$

The frequency of each class, that is the number of resistors whose value falls
in it, is found either by manually counting the values in that range or by using
the following Excel formula, which counts the resistors at or above the lower
limit of the class and subtracts the number above its upper limit:

=COUNTIF(A3:A102,">=118.1")-COUNTIF(A3:A102,">121")
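The same counting rule can be sketched outside the spreadsheet. The short Python example below mirrors the subtraction of the two COUNTIF results. The list `readings` is a small hypothetical sample, since the 100 resistor values are not reproduced in this report, and the function name `class_frequency` is our own.

```python
# Hypothetical readings for illustration only; not the assignment data set.
readings = [100.4, 102.9, 103.1, 105.0, 118.1, 119.7, 121.0, 110.2]


def class_frequency(values, lower, upper):
    """Count values in [lower, upper] by subtracting two running counts."""
    at_or_above_lower = sum(1 for v in values if v >= lower)
    above_upper = sum(1 for v in values if v > upper)
    return at_or_above_lower - above_upper


# Frequency of the 118.1 to 121 class for the sample readings.
print(class_frequency(readings, 118.1, 121))
```

For the sample list, three readings (118.1, 119.7 and 121.0) fall in the last class.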

b) The data set was entered into a spreadsheet in order to produce the
histogram and calculate the mean.

The mean can be calculated by entering the following formula into a
spreadsheet cell:

=AVERAGE(A3:A102)

The resulting mean is 110.621.



The mean can also be estimated from the histogram by summing the products of
each bar's frequency and midpoint and then dividing by the sum of the
frequencies:

Midpoint    Frequency    Midpoint x Frequency
101.55      15           1523.25
104.55      11           1150.05
107.55      14           1505.7
110.55      19           2100.45
113.55      12           1362.6
116.55      14           1631.7
119.55      15           1793.25
Total       100          11067

Total / sum of frequencies = 110.67

Figure 3

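This method gives a good estimate of the mean (110.67) but it is not as accurate
as the value calculated from the raw data (110.621), because every value in a
class is treated as if it were equal to the class midpoint.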
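As a check on the table above, the grouped mean can be reproduced with a few lines of Python. This is only a sketch using the midpoints and frequencies from the frequency distribution table; the function name `grouped_mean` is our own.

```python
# Midpoints and frequencies taken from the frequency distribution table.
midpoints = [101.55, 104.55, 107.55, 110.55, 113.55, 116.55, 119.55]
frequencies = [15, 11, 14, 19, 12, 14, 15]


def grouped_mean(mids, freqs):
    """Sum of (midpoint x frequency) divided by the sum of the frequencies."""
    total = sum(m * f for m, f in zip(mids, freqs))
    return total / sum(freqs)


print(round(grouped_mean(midpoints, frequencies), 2))
```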

c) The first method of determining the mode of the data is to use the
spreadsheet function by entering the following into a cell:

=MODE(A3:A102)

This produces the result 118.2. The mode is the most frequently occurring value
in the data set, which can also be found by counting manually. We can also
estimate the mode from the histogram by drawing diagonal lines from the upper
corners of the highest block to the upper corners of the adjacent blocks on
opposite sides, as shown below:

A vertical line is then plotted through the point where the two diagonal lines
intersect. The value at the point where it crosses the horizontal axis
determines the mode. This is a crude method and not very accurate compared with
the calculated method using the raw data.

Figure 4
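The diagonal construction has a simple algebraic equivalent: the two diagonals cross at a fraction d1 / (d1 + d2) of the way across the highest block, where d1 and d2 are the differences between its frequency and those of the blocks before and after it. The sketch below applies this to the highest class (109.1 – 112). It assumes that the lower class limit 109.1 is used as the left edge of the block and that the class width is 3; the function name `histogram_mode` is our own.

```python
def histogram_mode(lower, width, f_prev, f_modal, f_next):
    """Horizontal position where the two diagonals of the highest block cross."""
    d1 = f_modal - f_prev  # height above the block on the left
    d2 = f_modal - f_next  # height above the block on the right
    return lower + width * d1 / (d1 + d2)


# Highest block: 109.1 to 112 (f = 19), neighbours f = 14 (left) and f = 12 (right).
print(round(histogram_mode(109.1, 3, 14, 19, 12), 2))
```

Under these assumptions the estimate is about 110.35, which shows how far the graphical estimate can differ from the mode of the raw data.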

A cumulative data plot represents all the resistors in ascending order. From this
we can determine the mode by identifying the longest horizontal section, where
the trend has no gradient, as shown in figure 5:

Figure 5

d) The median of the data set is determined by entering the following
formula into a cell in the spreadsheet:

=MEDIAN(B3:B102)

This returns the value of 110.95. This represents the middle value of the data
set once the values have been sorted. There are 100 resistor values in the data
set, so the median lies between the 50th and the 51st values, which are 110.9
and 111.0 respectively. To calculate this manually we average the two values as
follows:

$$\frac{110.9 + 111.0}{2} = 110.95$$

To determine the median from the histogram, a vertical line is drawn that divides
the total area of the histogram into two equal parts. The total area of the
histogram is 3 x 100 = 300 units (class width x total frequency). Halving this
gives 150, which is the area required on either side of the median. To achieve
this the largest rectangle, which has an area of 19 x 3 = 57 units, must be
split so that

150 – (45 + 33 + 42) = 30 units lie to the left and

150 – (36 + 42 + 45) = 27 units lie to the right.

This places the median at approximately 111. This estimate is slightly less
accurate than the first method.
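The area-splitting argument can be sketched in Python as follows. The example accumulates the area of each block until half of the total is reached and then interpolates inside that block. It assumes that the stated lower class limits are used as the left edges of the blocks and that each block is 3 units wide; the function name `histogram_median` is our own.

```python
# Lower class limits and frequencies from the frequency distribution table.
lower_limits = [100.1, 103.1, 106.1, 109.1, 112.1, 115.1, 118.1]
frequencies = [15, 11, 14, 19, 12, 14, 15]


def histogram_median(lowers, freqs, width):
    """Position of the vertical line that halves the total histogram area."""
    half_area = width * sum(freqs) / 2
    area_so_far = 0
    for lower, f in zip(lowers, freqs):
        bar_area = f * width
        if area_so_far + bar_area >= half_area:
            needed = half_area - area_so_far  # area still required in this block
            return lower + width * needed / bar_area
        area_so_far += bar_area
    raise ValueError("frequencies must contain at least one non-zero value")


print(round(histogram_median(lower_limits, frequencies, 3), 1))
```

With these assumptions the line falls at about 110.7, that is 30/57 of the way across the largest rectangle, which agrees with the approximate value of 111 read from the histogram.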

Question 2

a) Here we determine the standard deviation from the following data:

Maximum Load    Number of Cables
84 – 88         4
89 – 93         10
94 – 98         24
99 – 103        34
104 – 108       28
109 – 113       12
114 – 118       6
119 – 123       2

Figure 6

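To achieve this another table is formed which details the class midpoint x, the
coded value xc, the frequency f, xcf and xc²f. The coded value is
xc = (x – 101) / 5, where 101 is the assumed mean (the midpoint of the class
with the highest frequency) and 5 is the step size between midpoints.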

x         xc    f      xcf    xc²f
86        -3    4      -12    36
91        -2    10     -20    40
96        -1    24     -24    24
101       0     34     0      0
106       1     28     28     28
111       2     12     24     48
116       3     6      18     54
121       4     2      8      32
Totals          120    22     262

Figure 7

From figure 7 we can calculate the mean of xc using the following formula:

$$\bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{22}{120} = 0.183$$

$$\bar{x} = 101 + 0.183 \times 5 = 101.917$$

The following formula is used to determine the standard deviation:

$$\sigma_c = \sqrt{\frac{\sum x_c^2 f}{\sum f} - \bar{x}_c^2} = \sqrt{\frac{262}{120} - 0.183^2} = \sqrt{2.15} = 1.466$$

Then multiply this by the step increment value of 5 to find the standard
deviation:

$$\sigma = 1.466 \times 5 = 7.331$$

This value of 7.331 indicates how far, typically, the values in the data set lie
from the mean (strictly, it is the root mean square of the deviations). It is
about 7% of the mean of 101.917, which shows that this set has a noticeable
dispersion and variability of values.
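The coded calculation can be checked with a short Python sketch. It uses the midpoints and frequencies from figure 7, with the assumed mean of 101 and the step size of 5; the function name `coded_mean_and_std` is our own.

```python
from math import sqrt

# Midpoints and frequencies from figure 7.
midpoints = [86, 91, 96, 101, 106, 111, 116, 121]
frequencies = [4, 10, 24, 34, 28, 12, 6, 2]


def coded_mean_and_std(mids, freqs, origin, step):
    """Mean and standard deviation of grouped data using coded values xc."""
    coded = [(x - origin) / step for x in mids]
    n = sum(freqs)
    sum_xcf = sum(c * f for c, f in zip(coded, freqs))
    sum_xc2f = sum(c * c * f for c, f in zip(coded, freqs))
    mean_c = sum_xcf / n
    sigma_c = sqrt(sum_xc2f / n - mean_c ** 2)
    # Convert the coded results back to the original units.
    return origin + mean_c * step, sigma_c * step


mean, sigma = coded_mean_and_std(midpoints, frequencies, origin=101, step=5)
print(round(mean, 3), round(sigma, 3))
```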

b) To obtain the standard deviation using a spreadsheet, the following
formulae were entered:

Row   A     B     C (x)              D (xc)   E (f)         F (xcf)       G (xc²f)
      MAX LOAD
2     84    88    =((B2-A2)/2)+A2    -3       4             =D2*E2        =D2*D2*E2
3     89    93    =((B3-A3)/2)+A3    -2       10            =D3*E3        =D3*D3*E3
4     94    98    =((B4-A4)/2)+A4    -1       24            =D4*E4        =D4*D4*E4
5     99    103   =((B5-A5)/2)+A5    0        34            =D5*E5        =D5*D5*E5
6     104   108   =((B6-A6)/2)+A6    1        28            =D6*E6        =D6*D6*E6
7     109   113   =((B7-A7)/2)+A7    2        12            =D7*E7        =D7*D7*E7
8     114   118   =((B8-A8)/2)+A8    3        6             =D8*E8        =D8*D8*E8
9     119   123   =((B9-A9)/2)+A9    4        2             =D9*E9        =D9*D9*E9
11    Totals                                  =SUM(E2:E9)   =SUM(F2:F9)   =SUM(G2:G9)

x̄c (cell J1)             =F11/E11
Mean x̄                   =C5+(J1*(C3-C2))
σc (cell J5)              =SQRT((G11/E11)-(J1*J1))
Standard Deviation σ      =J5*(C3-C2)

The mean, x̄, was calculated as 101.917 and the standard deviation, σ, as 7.331,
which are the same as the previously calculated values.

Question 3

a) Using the Binomial distribution, we can determine the probabilities of
defective components with this formula:

$$(q+p)^n = q^n + nq^{n-1}p + \frac{n(n-1)}{2!}q^{n-2}p^2 + \frac{n(n-1)(n-2)}{3!}q^{n-3}p^3 + \dots$$

i. Probability that 0 will be defective:

3% of 50 equals 1.5, and 1.5/50 = 0.03 is the probability that a single
component is defective. Therefore p = 0.03, q = 0.97 and n = 50.

$$q^n = 0.97^{50} = 0.218$$

ii. Probability that 1 will be defective:

$$nq^{n-1}p = 50 \times 0.97^{49} \times 0.03 = 0.337$$

iii. Probability that 1 or less will be defective:

$$q^n + nq^{n-1}p = (0.97^{50}) + (50 \times 0.97^{49} \times 0.03) = 0.218 + 0.337 = 0.555$$

iv. Probability that 2 or more will be defective:

$$1 - (q^n + nq^{n-1}p) = 1 - [(0.97^{50}) + (50 \times 0.97^{49} \times 0.03)] = 1 - 0.555 = 0.445$$
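The four binomial results can be checked with a short Python sketch. Each term of the expansion is the probability of exactly k defective components, written here with the binomial coefficient from the standard library function `math.comb`; the function name `binomial_pmf` is our own.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k defective items in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 50, 0.03
p0 = binomial_pmf(0, n, p)      # none defective
p1 = binomial_pmf(1, n, p)      # exactly one defective
p_at_most_1 = p0 + p1           # one or less defective
p_at_least_2 = 1 - p_at_most_1  # two or more defective

print(round(p0, 3), round(p1, 3), round(p_at_most_1, 3), round(p_at_least_2, 3))
```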

b) Using the Poisson distribution, we can determine the probabilities of
defective fuses with this formula:

$$e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \dots\right)$$

i. Probability that 0 will be defective:

$$\lambda = np = 250 \times 0.01 = 2.5$$

$$e^{-\lambda} \times 1 = e^{-2.5} = 0.082$$

ii. Probability that 1 will be defective:

$$\lambda e^{-\lambda} = 2.5e^{-2.5} = 0.205$$

iii. Probability that 2 will be defective:

$$\frac{\lambda^2 e^{-\lambda}}{2!} = \frac{2.5^2 \times e^{-2.5}}{2 \times 1} = 0.257$$

iv. Probability that 3 or more will be defective:

$$1 - \left(e^{-\lambda} + \lambda e^{-\lambda} + \frac{\lambda^2 e^{-\lambda}}{2!}\right) = 1 - (0.082 + 0.205 + 0.257) = 0.456$$
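The Poisson results can be checked in the same way. In the sketch below each term of the series is the probability of exactly k defective fuses; the function name `poisson_pmf` is our own.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 250 * 0.01  # lambda = np
q0 = poisson_pmf(0, lam)
q1 = poisson_pmf(1, lam)
q2 = poisson_pmf(2, lam)
q_at_least_3 = 1 - (q0 + q1 + q2)

print(round(q0, 3), round(q1, 3), round(q2, 3), round(q_at_least_3, 3))
```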

c) To compare the results of the two distributions for a machine that
produces 10% defective items, with a sample size of 10, we can see how they
differ for 2 defective items.

10% = 0.1 = p, therefore q = 0.9 and n = 10.

Binomial:

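$$(q+p)^n = q^n + nq^{n-1}p + \frac{n(n-1)}{2!}q^{n-2}p^2 + \frac{n(n-1)(n-2)}{3!}q^{n-3}p^3 + \dots$$

$$\frac{n(n-1)}{2!}q^{n-2}p^2 = \frac{10(10-1)}{2 \times 1} \times 0.9^{10-2} \times 0.1^2 = 0.194$$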

Poisson:

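$$\lambda = np = 10 \times 0.1 = 1$$

$$e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \dots\right)$$

$$\frac{\lambda^2 e^{-\lambda}}{2!} = \frac{1^2 e^{-1}}{2 \times 1} = 0.184$$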

There is a difference of 0.010 between the two approaches. This can be
explained by the fact that the Poisson distribution describes the limiting
situation where the number of trials has no limit. The binomial distribution
involves a finite number of possibilities, where the number of trials is known
and is a key variable in the probability calculation. The Poisson distribution is
a good approximation of the binomial when n is large and p is small, but in this
instance, where n is only 10, it is more accurate to use the binomial method
despite the Poisson saving on computation.
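The comparison can be reproduced with the following Python sketch, which evaluates both distributions for exactly 2 defective items with n = 10 and p = 0.1. The function names are our own.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Binomial probability of exactly k defective items in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k defective items with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


n, p, k = 10, 0.1, 2
binomial_result = binomial_pmf(k, n, p)
poisson_result = poisson_pmf(k, n * p)  # lambda = np
difference = binomial_result - poisson_result

print(round(binomial_result, 3), round(poisson_result, 3), round(difference, 3))
```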

Question 4

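For the sample size N = 7, the mean is

$$\bar{x} = \frac{1.12+1.15+1.10+1.14+1.15+1.10+1.11}{7} = 1.1243\ \Omega\,\text{m}^{-1}$$

And standard deviation:

$$\sigma = \sqrt{\frac{(1.12-1.1243)^2 + (1.15-1.1243)^2 + (1.10-1.1243)^2 + \dots}{7}} = \sqrt{\frac{0.00297143}{7}} = 0.0206\ \Omega\,\text{m}^{-1}$$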
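The mean and standard deviation of the seven measurements can be checked with the Python standard library. The sketch uses `statistics.pstdev`, which divides by N as in the formula above (rather than by N – 1).

```python
from statistics import mean, pstdev

# The seven specific resistance measurements.
measurements = [1.12, 1.15, 1.10, 1.14, 1.15, 1.10, 1.11]

sample_mean = mean(measurements)
sample_std = pstdev(measurements)  # divides by N, matching the formula above

print(round(sample_mean, 4), round(sample_std, 4))
```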

With the mean and standard deviation calculated, we must find z, the standard
normal variate. For a 99% confidence level, the area under the standardised
normal curve between the mean and z is

$$\frac{99 \div 2}{100} = 0.495$$

Using the table of partial areas under the standardised normal curve in
figure 8, this area has been identified by the red square, which corresponds to
z = 2.57.

Using the formula:

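$$\bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

We can evaluate the confidence limits:

$$1.124 + \frac{2.57 \times 0.0206}{\sqrt{7}} = 1.144\ \Omega\,\text{m}^{-1}$$

$$1.124 - \frac{2.57 \times 0.0206}{\sqrt{7}} = 1.104\ \Omega\,\text{m}^{-1}$$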
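The two limits can be checked with a short Python sketch. The values of the mean, standard deviation and z are taken directly from the text rather than recomputed; the function name `z_confidence_limits` is our own.

```python
from math import sqrt


def z_confidence_limits(mean, sigma, n, z):
    """Lower and upper confidence limits: mean -/+ z * sigma / sqrt(n)."""
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Values quoted in the text: mean 1.124, sigma 0.0206, N = 7, z = 2.57.
lower, upper = z_confidence_limits(1.124, 0.0206, 7, 2.57)
print(round(lower, 3), round(upper, 3))
```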

Figure 8 – Partial areas under the standardised normal curve [1]

[1] John Bird, Higher Engineering Mathematics, 2006, p. 561



Another method is to employ the Student's t distribution, where the value for ν
has been identified by the red square:

Figure 9 – Percentile values (t_p) for Student's t distribution with ν degrees
of freedom (shaded area = p) [2]

So, for t_0.99 with ν = 7 – 1 = 6 degrees of freedom, the table gives t_c = 3.14.

[2] John Bird, Higher Engineering Mathematics, 2006, p. 587



From this we can carry out the following calculation:

$$\bar{x} \pm \frac{t_c s}{\sqrt{N-1}} = 1.1243 \pm \frac{3.14(0.0206)}{\sqrt{7-1}} = 1.1243 \pm 0.0264$$

Which gives the two confidence limits:

$$1.1243 + 0.0264 = 1.151\ \Omega\,\text{m}^{-1}$$

$$1.1243 - 0.0264 = 1.098\ \Omega\,\text{m}^{-1}$$

This indicates that there is a 99% chance that the true specific resistance of the
wire lies between 1.098 Ωm⁻¹ and 1.151 Ωm⁻¹.
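The Student's t limits can be checked in the same way. The sketch below takes the mean, standard deviation and t_c = 3.14 as quoted in the text. The critical value is left as a parameter because a two-sided 99% interval is often read from the t_0.995 column instead, in which case a different t_c would be substituted. The function name `t_confidence_limits` is our own.

```python
from math import sqrt


def t_confidence_limits(mean, s, n, t_c):
    """Lower and upper confidence limits: mean -/+ t_c * s / sqrt(n - 1)."""
    half_width = t_c * s / sqrt(n - 1)
    return mean - half_width, mean + half_width


# Values quoted in the text: mean 1.1243, s 0.0206, N = 7, t_c = 3.14.
t_lower, t_upper = t_confidence_limits(1.1243, 0.0206, 7, 3.14)
print(round(t_lower, 3), round(t_upper, 3))
```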