Uploaded by Lei Yin

Data analysis for sampling


DATA ANALYSIS

(Part 1)

HANIM AWAB Department of Chemistry Faculty of Science UTM

assessment on fewer data (generally >25) or data accumulated from the analysis of similar samples

The problem is examined with respect to the precision, accuracy and reliability required of the results

Analysis of the results obtained is resolved into two stages:
- examination of the reliability of the results
- assessment of the meaning of the results

TYPES OF ERROR

1. GROSS ERROR (eg contaminated reagents, faulty instrument)
- Serious, obvious errors that give outlier readings
- Detectable with sufficient replicate measurements
- Experiments with gross errors must be repeated

2. RANDOM/INDETERMINATE ERROR (eg inaccurate manipulation of procedure)
- Data scattered symmetrically about a mean value
- Deviations of measurements from the mean are shown using the Gaussian or normal error curve
- Cannot be eliminated, but can be minimized and approximated to an acceptable precision
- Error can be assessed by statistical tests, which also calculate the size of the errors

Some ways to overcome errors
- Carry out replicate measurements
- Analyse accurately using known standards or standard reference materials (SRM)
- Perform statistical tests on data

3. SYSTEMATIC/DETERMINATE ERROR (operator error, instrument error, method error)
- All data too high/too low, or the deviation increases with the magnitude of the measurement
- Causes bias in technique (either +ve or -ve)
- Affects accuracy
- May be detected by:
  - blank determinations
  - analysis of standard samples
  - independent analyses by alternative/dissimilar methods
- Can be avoided/eliminated by correcting instrument, method and personal errors*

*Ways to minimize/eliminate systematic errors

Instrument errors:
- Careful recalibration and good maintenance of apparatus (eg glassware) and instruments (eg AAS, GC)

Method errors:
- Analysis of standard reference materials (SRM)
- Use 2 or more independent methods
- Analysis of blanks

Personal errors:
- Training of operator, care and self-discipline

when a standard sample is analyzed (value estimated from results of varying precision depending on the method used)

Accuracy - nearness of a measurement or result to the true value (expressed in terms of error)

Precision - variability of a measurement (standard deviations are precision indicators)

Spread - difference between the highest and lowest results in a set (spread is a measure of precision)

Mean - average of a replicate set of results

Median - middle value of a replicate set of results

Degree of Freedom - number of independent results in a set (each time another quantity is derived from the set, the degrees of freedom are reduced by 1)

Range - difference between the highest and lowest value of the results

Standard Deviation (s or σ) - measure of the spread of individual results about the mean (the deviation of a single result is the difference, with respect to sign, between that result and the mean or median of the set)

Relative Standard Deviation (RSD) - also known as the coefficient of variation, often used in comparing precisions

Variance (V) - square of the value of standard deviation (σ² or s²)

Determinations/Formula

MEAN (AVERAGE)
Sum of the measurements divided by number of measurements, N

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

MEDIAN
Arrange the data in ascending order. If the number of data is odd, record the middle value as the median. If the number of data is even, average the two middle values.

STANDARD DEVIATION
Measure of spread about the mean. Estimates the variability of an individual measurement. (The standard deviation is better estimated by the pooling of results from more than one set.)

Sample standard deviation (N-1 = degree of freedom):

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Population standard deviation (µ = true mean, aka population mean; N = number of replicates):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

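RELATIVE STANDARD DEVIATION (RSD)/COEFFICIENT OF VARIATION (CV)
Standard deviation divided by mean (expressed as a fraction, in parts per thousand or in %, depending on the units used)

$$\mathrm{RSD} = \frac{s}{\bar{x}}, \qquad \mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$$

VARIANCE
The square of standard deviation
- Sample variance (N < 30): $V = s^2$
- Population variance (large N): $V = \sigma^2$

Example: Se content of nine samples

| Sample | Se (mg/g) | $(x_i - \bar{x})^2$ |
|---|---|---|
| 1 | 0.07 | 4.9×10⁻⁵ |
| 2 | 0.07 | 4.9×10⁻⁵ |
| 3 | 0.08 | 9.0×10⁻⁶ |
| 4 | 0.07 | 4.9×10⁻⁵ |
| 5 | 0.07 | 4.9×10⁻⁵ |
| 6 | 0.08 | 9.0×10⁻⁶ |
| 7 | 0.08 | 9.0×10⁻⁶ |
| 8 | 0.09 | 1.69×10⁻⁴ |
| 9 | 0.08 | 9.0×10⁻⁶ |

Mean = Σxi/N = 0.077
Σ(xi - mean)² = 4.01×10⁻⁴

$$\text{S.D.} = s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{4.01 \times 10^{-4}}{8}} = 0.007$$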
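The Se example can be checked numerically. The following is a minimal sketch in Python (a language chosen here for illustration, not one used in the slides); the variable names are hypothetical, and the data are the nine Se values listed in the example.

```python
import math

# Se concentrations (mg/g) for the nine samples in the worked example
se = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]

n = len(se)
mean = sum(se) / n

# Sum of squared deviations from the mean, then divide by N - 1 (degrees of freedom)
sum_sq_dev = sum((x - mean) ** 2 for x in se)
s = math.sqrt(sum_sq_dev / (n - 1))

print(f"mean = {mean:.3f} mg/g")
print(f"sum of squared deviations = {sum_sq_dev:.2e}")
print(f"s = {s:.3f} mg/g")
```

The sum of squared deviations comes out marginally below the tabulated 4.01×10⁻⁴ because the code uses the unrounded mean; s still rounds to 0.007.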

STD. DEV. FOR POOLED DATA (spooled)
To achieve a value of s that is a good approximation to σ for N < 30, it is sometimes necessary to pool data from a number of sets of measurements

Suppose there are t small sets of data, comprising N1, N2, ..., Nt measurements. The equation for the resultant sample standard deviation is:

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{i=1}^{N_2} (x_i - \bar{x}_2)^2 + \sum_{i=1}^{N_3} (x_i - \bar{x}_3)^2 + \dots}{N_1 + N_2 + N_3 + \dots - t}}$$

Analysis of 6 bottles for sugar

| Bottle | Sugar (%) | Obs | Deviations from mean | Sum of squares |
|---|---|---|---|---|
| 1 | 0.94 | 3 | 0.05, 0.10, 0.08 | 0.0189 |
| 2 | 1.08 | 4 | 0.06, 0.05, 0.09, 0.06 | 0.0178 |
| 3 | 1.20 | 5 | 0.05, 0.12, 0.07, 0.00, 0.08 | 0.0282 |
| 4 | 0.67 | 4 | 0.05, 0.10, 0.06, 0.09 | 0.0242 |
| 5 | 0.83 | 3 | 0.07, 0.09, 0.10 | 0.0230 |
| 6 | 0.76 | 4 | 0.06, 0.12, 0.04, 0.03 | 0.0205 |
| Total | | 23 | | 0.1326 |

For bottle 1:

$$\sum (x_i - \bar{x}_1)^2 = (0.05)^2 + (0.10)^2 + (0.08)^2 = 0.0189, \qquad s_1 = \sqrt{\frac{0.0189}{3 - 1}} = 0.097$$

For the pooled data:

$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\ \%$$
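The pooling arithmetic for the six sugar bottles can be sketched as follows. This is an illustrative Python sketch under the assumption that the tabulated deviations are used directly; `pooled_std` is a hypothetical helper name, not something defined in the slides.

```python
import math

# Deviations from each bottle's own mean, copied from the sugar table
deviations = {
    1: [0.05, 0.10, 0.08],
    2: [0.06, 0.05, 0.09, 0.06],
    3: [0.05, 0.12, 0.07, 0.00, 0.08],
    4: [0.05, 0.10, 0.06, 0.09],
    5: [0.07, 0.09, 0.10],
    6: [0.06, 0.12, 0.04, 0.03],
}

def pooled_std(dev_sets):
    """Return (s_pooled, total sum of squares, degrees of freedom).

    Degrees of freedom = total number of observations minus the number
    of sets, because one mean is derived from each set.
    """
    total_ss = sum(d ** 2 for devs in dev_sets for d in devs)
    total_n = sum(len(devs) for devs in dev_sets)
    dof = total_n - len(dev_sets)
    return math.sqrt(total_ss / dof), total_ss, dof

s_pooled, total_ss, dof = pooled_std(list(deviations.values()))
print(f"sum of squares = {total_ss:.4f}, dof = {dof}, s_pooled = {s_pooled:.3f} %")
```

Each set loses one degree of freedom, which is why the denominator is 23 - 6 = 17 rather than 23 - 1.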

Solve this Problem
Given a set of diameters of four cells in units of µm: 120, 135, 160, 150
(a) Use functions available in your calculator
(b) Use the Excel Spreadsheet (in your own time; submit the data and result printout)

Calculate the following:
- Mean
- Median
- Standard Deviation
- Relative Standard Deviation (RSD)
- Variance
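As an alternative to the calculator or spreadsheet, the five quantities can be obtained with Python's standard `statistics` module. This is only a sketch for checking answers; the variable names are hypothetical, and the sample (N - 1) forms of standard deviation and variance are assumed because there are only four values.

```python
import statistics

diameters = [120, 135, 160, 150]  # cell diameters from the problem statement

mean = statistics.mean(diameters)
median = statistics.median(diameters)      # average of the two middle values (even N)
s = statistics.stdev(diameters)            # sample standard deviation, N - 1 in the denominator
variance = statistics.variance(diameters)  # sample variance, s squared
rsd_percent = 100 * s / mean               # relative standard deviation in %

print(f"mean = {mean}, median = {median}")
print(f"s = {s:.1f}, variance = {variance:.2f}, RSD = {rsd_percent:.1f} %")
```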

PRECISION

- Reproducibility (repeatability) of repeated measurements, ie how similar are values obtained in exactly the same way?
- Useful for measuring deviation from the mean:

$$d_i = x_i - \bar{x}$$

ACCURACY
Nearness (proximity) to the true value, ie a measure of the agreement between the experimental mean and the true value (which may not be known!)

Measures of accuracy (µ = true value):

- Absolute error: $E = |x_i - \mu|$

- Relative error: $E_R = \frac{|x_i - \mu|}{\mu} \times 100\%$

Discussion Question 1
Four students analyzed Fe content in a sample. Each student performed 5 replicates and the results are illustrated below. Comment on the accuracy and precision of each set of results (Hint: Student C obtained the best results)

[Figure: replicate results of students A, B, C and D plotted on an axis from 9.80 to 10.20, with the true value marked. Means: A = 10.10, B = 9.90, C = 10.01, D = 10.01]

Discussion Question 2
- Comment on the accuracy and precision of the following results. Explain or show proof.
- Which set of data has to be thrown out (discarded)? Why?

| Student | Data values | Mean | s |
|---|---|---|---|
| A | 10.00, 10.00, 10.00, 10.00, 10.00 | 10.00 | 0.00 |
| B | 10.10, 10.08, 10.09, 10.07, 10.08 | 10.10 | 0.01 |
| C | 9.65, 9.75, 9.78, 10.07, 10.24 | 9.90 | 0.25 |
| D | 9.97, 9.98, 10.02, 10.03, 10.05 | 10.01 | 0.03 |
| E | 9.80, 9.89, 10.01, 10.13, 10.22 | 10.01 | 0.17 |
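One way to put numbers on accuracy and precision is to compute, for each replicate set, the error of the mean and the standard deviation. The sketch below does this for the data of students C, D and E from Discussion Question 2. It assumes a true value of 10.00, which is an assumption read off the axis of Discussion Question 1 rather than something stated outright; `summarize` is a hypothetical helper.

```python
import statistics

TRUE_VALUE = 10.00  # assumed true value (not stated explicitly in the slides)

results = {
    "C": [9.65, 9.75, 9.78, 10.07, 10.24],
    "D": [9.97, 9.98, 10.02, 10.03, 10.05],
    "E": [9.80, 9.89, 10.01, 10.13, 10.22],
}

def summarize(values, true_value):
    """Return (mean, s, absolute error of the mean, relative error in %)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)            # precision indicator
    abs_error = abs(mean - true_value)      # accuracy indicator
    rel_error_pct = 100 * abs_error / true_value
    return mean, s, abs_error, rel_error_pct

summary = {name: summarize(vals, TRUE_VALUE) for name, vals in results.items()}
for name, (mean, s, abs_err, rel_err) in summary.items():
    print(f"{name}: mean = {mean:.2f}, s = {s:.2f}, "
          f"abs error = {abs_err:.2f}, rel error = {rel_err:.1f} %")
```

Under that assumption, D and E have the same mean and therefore the same accuracy, but D has a much smaller s and is the more precise set.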

CONFIDENCE LIMIT & CONFIDENCE INTERVAL
- Confidence Interval (CI) is the range of values surrounding the mean, x̄, within which the population mean, µ, is expected to lie with a certain degree of probability
- The boundaries of the range are called the Confidence Limits
- Confidence Level (CL) is the probability that the true mean lies within a certain interval (expressed as %)

Example: It is 99% probable that µ for a set of measurements is 7.25 mg ± 0.15. Thus, the mean should lie in the interval from 7.10 mg to 7.40 mg with 99% probability

CI for large no. of data (>30) with known population std deviation, σ:

$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

CI for small no. of data (<30) without knowing σ (know s):

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

N = number of measurements
z = values from the normal distribution curve (read from the z-table)
t = values from the t distribution, which resembles the normal distribution curve but depends on the degrees of freedom (N-1) (read from the t-table)
t is also known as Student's t, generally used in hypothesis tests

Values of z for determining confidence limits

| Confidence level (%) | z |
|---|---|
| 50 | 0.67 |
| 68 | 1.0 |
| 80 | 1.29 |
| 90 | 1.64 |
| 95 | 1.96 |
| 96 | 2.00 |
| 99 | 2.58 |
| 99.7 | 3.00 |
| 99.9 | 3.29 |

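Values of t for determining confidence limits

| Degrees of Freedom (N-1) | 80% | 90% | 95% | 99% |
|---|---|---|---|---|
| 1 | 3.08 | 6.31 | 12.7 | 63.7 |
| 2 | 1.89 | 2.92 | 4.30 | 9.92 |
| 3 | 1.64 | 2.35 | 3.18 | 5.84 |
| 4 | 1.53 | 2.13 | 2.78 | 4.60 |
| 5 | 1.48 | 2.02 | 2.57 | 4.03 |
| 6 | 1.44 | 1.94 | 2.45 | 3.71 |
| 7 | 1.42 | 1.90 | 2.36 | 3.50 |
| 8 | 1.40 | 1.86 | 2.31 | 3.36 |
| 9 | 1.38 | 1.83 | 2.26 | 3.25 |
| 19 | 1.33 | 1.73 | 2.10 | 2.88 |
| 59 | 1.30 | 1.67 | 2.00 | 2.66 |
| ∞ | 1.29 | 1.64 | 1.96 | 2.58 |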

SAMPLE QUESTION (CONFIDENCE INTERVAL)
Calculate the confidence interval (CI) at 90%, 95% & 99% confidence level given the following data for the analysis of Ca in a rock sample: 14.35, 14.41, 14.40, 14.32, 14.37

Mean = 14.37, s = 0.037
From table: @ confidence level 95% & N-1 = 4, t = 2.78

$$CI = \mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 14.37 \pm \frac{2.78 \times 0.037}{\sqrt{5}} = 14.37 \pm 0.05$$

Confidence interval is 14.37 ± 0.05, or 14.32 < µ < 14.42

Summary of results (calculate the rest by yourselves):

| Confidence level | Confidence interval (CI) |
|---|---|
| 90% | µ = 14.37 ± 0.04 |
| 95% | µ = 14.37 ± 0.05 |
| 99% | µ = 14.37 ± 0.08 |

If the confidence level increases, the CI widens, and the probability of µ lying in the interval also increases
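The Ca calculation can be reproduced for all three confidence levels. The sketch below is illustrative only: it hard-codes the t values for 4 degrees of freedom from the t-table rather than calling a statistics library, and the names `T_TABLE_DF4` and `half_widths` are hypothetical.

```python
import math
import statistics

# t values for N - 1 = 4 degrees of freedom, copied from the t-table
T_TABLE_DF4 = {90: 2.13, 95: 2.78, 99: 4.60}

ca = [14.35, 14.41, 14.40, 14.32, 14.37]  # Ca results from the sample question

n = len(ca)
mean = statistics.mean(ca)
s = statistics.stdev(ca)  # sample standard deviation

# Half-width of the confidence interval: t * s / sqrt(N)
half_widths = {level: t * s / math.sqrt(n) for level, t in T_TABLE_DF4.items()}

for level, hw in half_widths.items():
    print(f"{level}%: {mean:.2f} +/- {hw:.3f}  ({mean - hw:.2f} to {mean + hw:.2f})")
```

With the unrounded s the 90% half-width is about 0.035, which corresponds to the 0.04 in the summary once s is rounded to 0.037.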

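AAS analysis of Cu in aircraft engine oil gave a mean value of 8.53 mg Cu/mL. Pooled results of many analyses showed that s = 0.32 mg Cu/mL. Calculate the confidence intervals (CI) at 90% & 99% confidence levels based on (a) 1, (b) 4, (c) 16 measurements

Because s comes from the pooled results of many analyses, it is a good approximation to σ, so z (equal to t for infinite degrees of freedom) is used:

$$\text{Confidence limit (CL)} = \bar{x} \pm \frac{zs}{\sqrt{N}}$$

(a) N = 1
@ 90%, CL = 8.53 ± (1.64)(0.32)/√1 = 8.53 ± 0.52
@ 99%, CL = 8.53 ± (2.58)(0.32)/√1 = 8.53 ± 0.83

(b) N = 4
@ 90%, CL = 8.53 ± (1.64)(0.32)/√4 = 8.53 ± 0.26
@ 99%, CL = 8.53 ± (2.58)(0.32)/√4 = 8.53 ± 0.41

(c) N = 16
@ 90%, CL = 8.53 ± (1.64)(0.32)/√16 = 8.53 ± 0.13
@ 99%, CL = 8.53 ± (2.58)(0.32)/√16 = 8.53 ± 0.21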
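How the limits narrow with the number of measurements can be tabulated for the Cu problem. This sketch assumes that the pooled s is a good estimate of σ, so that z values from the z-table apply; `confidence_half_width` is a hypothetical helper name.

```python
import math

Z = {90: 1.64, 99: 2.58}  # z values copied from the z-table
MEAN = 8.53               # mg Cu/mL
S = 0.32                  # mg Cu/mL, pooled from many analyses (taken as sigma)

def confidence_half_width(z, s, n):
    """Half-width of the confidence interval, z * s / sqrt(N)."""
    return z * s / math.sqrt(n)

table = {}
for n in (1, 4, 16):
    for level, z in Z.items():
        table[(n, level)] = confidence_half_width(z, S, n)
        print(f"N = {n:2d}, {level}%: {MEAN} +/- {table[(n, level)]:.2f} mg Cu/mL")
```

Quadrupling the number of measurements halves the interval, since the half-width scales with 1/√N.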

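Analysis of an insecticide gave the following values for % of Lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{0.1214}{2}} = 0.25$$

@ 90% with N-1 = 2, t = 2.92:

$$CL = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{2.92 \times 0.25}{\sqrt{3}} = 7.24 \pm 0.42$$

OTHER USAGE OF CONFIDENCE INTERVAL
- To determine # of replicates (N) needed for the mean to be within the confidence interval
- To determine systematic error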

Example 1: Calculate the number of replicates needed to narrow the confidence interval to ±1.5 µg/mL at 95% confidence level. Given, s = 2.4 µg/mL
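Example 1 can be approached by rearranging the confidence limit expression for N. The sketch below assumes the target is a half-width of 1.5 with s = 2.4 in the same units, and that s is a good estimate of σ so that z = 1.96 (95%) from the z-table applies; `replicates_needed` is a hypothetical helper.

```python
import math

def replicates_needed(z, s, half_width):
    """Smallest whole N for which z * s / sqrt(N) <= half_width.

    Rearranged from half_width = z * s / sqrt(N): N = (z * s / half_width)^2,
    rounded up because N must be an integer.
    """
    return math.ceil((z * s / half_width) ** 2)

n_required = replicates_needed(z=1.96, s=2.4, half_width=1.5)
print(f"replicates needed: {n_required}")
```

The unrounded value is about 9.8, so ten replicates are needed under these assumptions.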

Example 2: Ten measurements on a sample gave a mean of 0.461, with std dev of 0.003. A solution gave a reading of 0.470. Show whether systematic error exists at 95% confidence level

At 95% confidence level, (N-1) = 9, t = 2.26

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.461 \pm \frac{2.26\,(0.003)}{\sqrt{10}} = 0.461 \pm 0.002$$

This means 0.459 < µ < 0.463, ie 95% of the time the true value lies between 0.459 and 0.463

Therefore, the reading 0.470 is NOT in the range, and systematic error EXISTS

DISTRIBUTION OF ERRORS

NORMAL or GAUSSIAN distribution (bell shaped, symmetrical curve) gives limits within which the population mean (µ) is expected to lie with a given degree of probability (without any systematic error)

[Figure: normal error curves, dN/N plotted against deviation from the mean from -4s to +4s (the mean is at 0), with the areas enclosing 50% (-0.67s to +0.67s), 80% (-1.29s to +1.29s) and 95% (-1.96s to +1.96s) marked]

Based on the curve, percentages of area under the curve between certain limits of z are as follows:
- 50% of area lies between ±0.67s
- 80% of area lies between ±1.29s
- 90% of area lies between ±1.64s
- 95% of area lies between ±1.96s
- 99% of area lies between ±2.58s

When we say that at a confidence level of 80% the confidence limits are ±1.29s, we mean that:
- 80% of the time the true mean will lie within ±1.29s of the measurements made
- or in other words, 20% of the time the true mean will NOT lie within ±1.29s
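The link between the z limits and the area under the normal curve can be verified with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is an illustrative check only; `area_within` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def area_within(z):
    """Fraction of the area under the normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (0.67, 1.29, 1.64, 1.96, 2.58):
    print(f"+/-{z:.2f}s encloses {100 * area_within(z):.1f}% of the area")
```

The computed areas agree with the listed 50%, 80%, 90%, 95% and 99% to within rounding of the z values.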

SIGNIFICANCE TESTS

Tests whether the difference between two results is significant (or merely due to random variations)
- used to decide whether the difference between the measured and known values can be explained by random errors

The NULL HYPOTHESIS, Ho
- If Ho is accepted: there is NO significant difference between the observed and known values (other than that due to random variation)
- If Ho is rejected: the difference is significant

Has two uses:

(1) Comparison of the true value, µ, and the mean, x̄, to detect if the difference is significant
- Used to detect the existence of systematic error or bias
- Calculate t (generally for 95% confidence level) by rearranging µ = x̄ ± ts/√N:

$$t = \frac{|\bar{x} - \mu|\sqrt{N}}{s}$$

- If tcalculated < tcritical (ie tcalc < ttable), ACCEPT the null hypothesis, Ho: x̄ = µ
- Accepting Ho means that there is NO significant difference (or no systematic error) at the 95% confidence level, but there is a 5% probability that there is a significant difference

(2) Comparison of the means (x̄1 and x̄2) of two samples
- eg compare the mean of a new method with a reference (or standard) method
- Accept the null hypothesis (Ho) if there is NO significant difference between the methods, ie the results are the same, or x̄1 - x̄2 = 0
- Calculate t; if tcalc < ttable, accept Ho to show that there is NO significant difference in results
- Use the pooled estimate of std dev:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
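The first use (comparing a mean with a known value) can be sketched with the figures from Example 2 of the confidence interval section: ten measurements, mean 0.461, s = 0.003, reference reading 0.470. The t statistic below is the confidence limit expression rearranged for t; `t_statistic` is a hypothetical helper, and the critical value 2.26 is taken from the t-table (95%, 9 degrees of freedom).

```python
import math

def t_statistic(mean, true_value, s, n):
    """t = |mean - true value| * sqrt(N) / s, from mu = mean +/- t*s/sqrt(N)."""
    return abs(mean - true_value) * math.sqrt(n) / s

t_calc = t_statistic(mean=0.461, true_value=0.470, s=0.003, n=10)
t_crit = 2.26  # t-table: 95% confidence level, N - 1 = 9

significant = t_calc > t_crit
print(f"t_calc = {t_calc:.2f}, t_crit = {t_crit}")
print("reject Ho: systematic error" if significant else "accept Ho: no systematic error")
```

This reaches the same conclusion as the confidence interval approach in Example 2: tcalc is far larger than ttable, so the difference is significant.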

The F Test

- F is the ratio of two sample variances:

$$F = \frac{s_1^2}{s_2^2}$$

- One-tailed test: test whether method A is more precise than method B (assumes A is always precise)
- Two-tailed test: test whether methods A and B differ in their precision (ie any method can be precise)

Ho: Population variances are equal ($\sigma_1^2 = \sigma_2^2$, or $\sigma_1^2 / \sigma_2^2 = 1$) [F is always >1, thus the smaller variance, ie the more precise method, is always the denominator]

If Fcalc < Ftable, accept Ho, which means that there is NO significant difference in precision between the two methods

Example Question: ONE-TAILED F TEST
A proposed method for COD of wastewater was compared with a standardized method. The results are given as follows:
- Standardized method (8 determinations): mean = 72 mg/L, s = 3.31 mg/L
- Proposed method (9 determinations): mean = 72 mg/L, s = 1.51 mg/L (smaller s, set as denominator)

Is the proposed method significantly more precise than the standardized method?

$$F = \frac{s_{Std}^2}{s_{Prop}^2} = \frac{(3.31)^2}{(1.51)^2} = 4.8$$

Data values: 8 for Std & 9 for proposed, thus from the F-table with degrees of freedom (N-1) = 7 (numerator) and 8 (denominator), Fcrit = 3.50

Since Fcalc > Ftable, reject Ho. Thus there is a significant difference between the methods, and the proposed method is significantly more precise

Example: Determination of CO using a Standard Procedure gave an s value of 0.21 ppm. The method was modified twice, giving s1 of 0.15 and s2 of 0.12 (both 9 degrees of freedom). Are the modified methods significantly more precise than the std?

Ho: s1 = sstd
Ho: s2 = sstd

$$F_1 = \frac{s_{std}^2}{s_1^2} = \frac{(0.21)^2}{(0.15)^2} = 1.96 \qquad F_2 = \frac{s_{std}^2}{s_2^2} = \frac{(0.21)^2}{(0.12)^2} = 3.06$$

In standard methods the # of data is large, thus s → σ and the degrees of freedom become infinite. From the F-table, num = ∞, den = 9; Fcrit = 2.71

F1 < Ftable: accept Ho, but F2 > Ftable: reject Ho

Only the 2nd modified method is significantly more precise than the standard method
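Both F-test examples (COD and CO) follow the same arithmetic, which can be sketched as below. The critical values are the ones quoted in the examples (3.50 and 2.71), not values computed by the code, and `f_ratio` is a hypothetical helper.

```python
def f_ratio(s_a, s_b):
    """F = larger variance / smaller variance, so that F >= 1."""
    var_hi = max(s_a, s_b) ** 2
    var_lo = min(s_a, s_b) ** 2
    return var_hi / var_lo

# (label, s of reference method, s of new method, F critical quoted in the example)
cases = [
    ("COD: standardized vs proposed", 3.31, 1.51, 3.50),
    ("CO: standard vs modification 1", 0.21, 0.15, 2.71),
    ("CO: standard vs modification 2", 0.21, 0.12, 2.71),
]

verdicts = {}
for label, s_ref, s_new, f_crit in cases:
    f_calc = f_ratio(s_ref, s_new)
    verdicts[label] = "reject Ho" if f_calc > f_crit else "accept Ho"
    print(f"{label}: F = {f_calc:.2f}, Fcrit = {f_crit} -> {verdicts[label]}")
```

Rejecting Ho here means the method with the smaller s is significantly more precise at the tabulated confidence level.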

The Q TEST or DIXON'S TEST (Detection of gross errors)
The Q-test is used for detecting an outlier (suspected unreasonable data) which statistically does not belong to the set

Example: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10

The value 10.45 appears to lie outside the normal range (more easily observed when the numbers are arranged in decreasing or increasing order):
10.05, 10.05, 10.10, 10.10, 10.15, 10.45

Qcal is compared with Qtable and the null hypothesis, Ho, is checked

$$Q_{expt} = \frac{|\text{suspect value} - \text{nearest value}|}{\text{largest value} - \text{smallest value}} = \frac{|10.45 - 10.15|}{10.45 - 10.05} = 0.75$$

From the Q-table (@95% & N = 6), Q = 0.625
Qcal > Qtable, so the data (10.45) can be rejected

will change from the original value if changed!)


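Q TABLE

| No. of Observations | 90% | 95% | 99% |
|---|---|---|---|
| 3 | 0.941 | 0.970 | 0.994 |
| 4 | 0.765 | 0.829 | 0.926 |
| 5 | 0.642 | 0.710 | 0.821 |
| 6 | 0.560 | 0.625 | 0.740 |
| 7 | 0.507 | 0.568 | 0.680 |
| 8 | 0.468 | 0.526 | 0.634 |
| 9 | 0.437 | 0.493 | 0.599 |
| 10 | 0.412 | 0.466 | 0.568 |

(Columns give the confidence level)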

EXAMPLE QUESTION: Q-TEST
The following data was obtained for the determination of nitrite concentration (mg/L) in a sample of river water: 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411
Should the data 0.380 be retained?

$$Q = \frac{|0.380 - 0.400|}{|0.413 - 0.380|} = 0.606$$

From the Q-table: sample size 7, Qtable = 0.568 (95% confidence level)
Qcalc > Qtable, thus the suspect outlier is rejected
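The Q-test procedure can be sketched for both examples in this section. The critical values are the 95% column of the Q table; `q_test` and `Q_CRIT_95` are hypothetical names, and the suspect value is assumed to be the largest or smallest value in the set.

```python
# 95% critical values from the Q table, keyed by number of observations
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def q_test(values, suspect):
    """Return (Q calculated, True if the suspect value can be rejected at 95%)."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]  # largest minus smallest
    others = list(ordered)
    others.remove(suspect)             # drop one copy of the suspect value
    gap = min(abs(suspect - v) for v in others)  # distance to nearest neighbour
    q_calc = gap / spread
    return q_calc, q_calc > Q_CRIT_95[len(values)]

q1, reject1 = q_test([10.05, 10.10, 10.15, 10.05, 10.45, 10.10], suspect=10.45)
q2, reject2 = q_test([0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411], suspect=0.380)

print(f"first example:   Q = {q1:.2f}, reject = {reject1}")
print(f"nitrite example: Q = {q2:.3f}, reject = {reject2}")
```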
