
DATA ANALYSIS

An accident occurred in the factory, and there is concern that a toxic substance was released at 7 sites of the factory.
Are the measured PPM values worse than normal?

RAW DATA ANALYSIS

A first visual comparison of the histograms for each site and hour against the Before histogram suggests that Site 3, Site 4, Site 6 and Site 7 could be the more affected sites.

The distributions of values measured with Sensor 1 and Sensor 2 show no important differences that would justify performing the analysis by sensor.

[Figure: histograms for the Before dataset and for Site 1 to Site 7 at Hour 1 to Hour 4]

OUTLIERS ANALYSIS

The boxplot visualizations over the 29 datasets, before and after performing the outlier analysis with the Tukey and Hampel methods, mainly suggest that the Before dataset with a logarithmic transformation could follow a normal distribution.

Neither the Tukey nor the Hampel method could remove all outliers from the untransformed Before dataset.

[Figure: boxplots for PPM and PPM_LOG (logarithmic transformation)]
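The two outlier rules named above can be sketched as follows. This is a minimal illustration, not the procedure used for the poster: the fence multiplier `k=1.5`, the cutoff of 3 scaled MADs, the function names and the sample readings are all assumptions, since the poster does not state its settings.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def hampel_outliers(values, threshold=3.0):
    """Flag points more than `threshold` scaled MADs away from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scaled_mad = 1.4826 * mad  # scaling makes the MAD comparable to a standard deviation
    if scaled_mad == 0:
        return []  # degenerate case: more than half of the values are identical
    return [v for v in values if abs(v - med) > threshold * scaled_mad]


# Hypothetical PPM readings, invented for this example only.
ppm = [2.1, 2.4, 2.2, 2.8, 3.0, 2.5, 2.6, 2.3, 2.7, 9.5]
print(tukey_outliers(ppm))
print(hampel_outliers(ppm))
```

Both rules rely on robust summaries (quartiles, median), so a single extreme reading does not shift the thresholds much. Applying either rule once does not guarantee that a second pass finds no new outliers, which is consistent with the observation that neither method removed all of them.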
NORMALITY ANALYSIS

[Figure: panels for PPM and PPM_LOG (logarithmic transformation)]

The normality analysis results for the Before datasets suggest that the log-transformed data follow a normal distribution, both for the complete dataset and for the dataset resulting from Tukey's method. In these cases, the 4 tests performed (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling) returned p-values over 0.05, so normality was not rejected.

The normality analysis results for the After datasets (the other 28 datasets for the 7 sites and 4 hours) rejected normality in the vast majority of cases. Among the log-transformed datasets, only the complete Site 5 dataset (without removing outliers) was not rejected by any of the 4 tests.
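To make the "p-value over 0.05, normality not rejected" logic concrete, here is a small sketch of one kind of normality check. It is a substitute technique, not a reproduction of the four tests above: a Kolmogorov-Smirnov distance to a normal distribution fitted from the sample, calibrated by simulation (a Lilliefors-style test). The function names, the number of simulations and the data are assumptions made for this example.

```python
import math
import random
import statistics


def ks_normal_statistic(values):
    """KS distance between the sample and a normal fitted by its mean and stdev."""
    n = len(values)
    fitted = statistics.NormalDist(statistics.fmean(values), statistics.stdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values), start=1):
        cdf = fitted.cdf(v)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def monte_carlo_normality_pvalue(values, n_sim=500, seed=0):
    """Share of simulated normal samples whose KS distance is at least the observed one."""
    rng = random.Random(seed)
    observed = ks_normal_statistic(values)
    n = len(values)
    hits = sum(
        ks_normal_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


# Invented skewed data: exact lognormal quantiles, so log(ppm) is normal by construction.
std_normal = statistics.NormalDist()
ppm = [math.exp(std_normal.inv_cdf((i - 0.5) / 100)) for i in range(1, 101)]
ppm_log = [math.log(v) for v in ppm]

p_raw = monte_carlo_normality_pvalue(ppm)
p_log = monte_carlo_normality_pvalue(ppm_log)
print(p_raw < 0.05, p_log > 0.05)
```

The example mirrors the pattern reported for the Before data: the raw, right-skewed values are rejected as normal, while their logarithms are not.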

Datasets
As the main goal of obtaining normally distributed datasets by removing the outliers was not achieved, the 28 datasets considered (7 sites by 4 hours) are the original datasets (without removing outliers). No reason was found to remove these outlier points.
TESTS AND RESULTS

Hypothesis testing
To determine whether the measured PPM values are worse than normal, 3 one-sided tests for two independent samples were performed to check for rejections of the "mean values greater than normal" hypothesis, comparing each of the 28 After datasets with the Before dataset. The tests were performed for the untransformed data and for the log-transformed data. Hence, the results shown here correspond to 56 comparisons in total.
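A one-sided comparison of two independent samples can be sketched with a Monte Carlo permutation test on the difference in means. This is an illustrative stand-in that needs no normality assumption, not one of the three tests reported here. The direction of the alternative (After mean greater than Before mean), the function name and the data are assumptions.

```python
import random
import statistics


def one_sided_permutation_test(before, after, n_perm=2000, seed=0):
    """Monte Carlo p-value for the alternative mean(after) > mean(before)."""
    rng = random.Random(seed)
    observed = statistics.fmean(after) - statistics.fmean(before)
    pooled = list(before) + list(after)
    n_after = len(after)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled readings
        diff = statistics.fmean(pooled[:n_after]) - statistics.fmean(pooled[n_after:])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented readings for illustration only.
before = [2.1, 2.4, 2.2, 2.8, 3.0, 2.5, 2.6, 2.3, 2.7, 2.9]
after = [3.4, 3.9, 3.1, 4.2, 3.8, 3.6, 4.0, 3.3, 3.7, 4.1]

p_value = one_sided_permutation_test(before, after)
print(p_value < 0.05)
```

A small p-value supports the claim that the After values are higher than the Before values. Running the same function on each of the 28 site/hour datasets, for both PPM and PPM_LOG, would give the 56 comparisons described above.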

T-test
Safe datasets (PPM values): Site 1 (Hour 2), Site 2 (All hours), Site 5 (Hour 4)
Safe datasets (PPM_LOG values): Site 2 (All hours)
*However, the T-test requires the normality assumption, which was already rejected in the previous part. Therefore, the T-test is not the right choice for testing the equality of means for these datasets.

Kolmogorov-Smirnov Test
Safe datasets: Site 2 (All hours)
*This test relaxes most of the assumptions; therefore, the results from this test can be trusted.

Wilcoxon Rank-Sum Test
Safe datasets (PPM values): Site 1 (Hour 2)
Safe datasets (PPM_LOG values): None
*One of the required underlying assumptions for this test (variance homogeneity) is violated, so the results of the Wilcoxon rank-sum test cannot be trusted.
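For reference, the rank-sum statistic behind the last test can be computed in a few lines. The sketch below uses the Mann-Whitney U form with a normal approximation and a continuity correction. It applies no tie correction and does not address the variance-homogeneity concern noted above. The function name, the direction of the alternative and the data are assumptions.

```python
import math
import statistics


def rank_sum_one_sided(before, after):
    """Approximate p-value for the alternative that `after` values tend to exceed `before`."""
    # U counts the (after, before) pairs in which the after value is larger; ties count 0.5.
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in after
        for b in before
    )
    n1, n2 = len(after), len(before)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction in this sketch
    z = (u - 0.5 - mean_u) / sd_u  # continuity correction for the upper tail
    return 1.0 - statistics.NormalDist().cdf(z)


# Invented readings for illustration only.
before = [2.1, 2.4, 2.2, 2.8, 3.0, 2.5, 2.6, 2.3, 2.7, 2.9]
after = [3.4, 3.9, 3.1, 4.2, 3.8, 3.6, 4.0, 3.3, 3.7, 4.1]

print(rank_sum_one_sided(before, after) < 0.05)
```

The normal approximation is only reasonable for moderately large samples; for very small samples an exact or simulated null distribution would be the safer choice.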

Conclusion
All sites, except for Site 2, present PPM values worse than normal.

BY DOUGLAS GARCIA TORRES


D.M.GARCIA.TORRES@STUDENT.TUE.NL
STUDENT ID 1036847 2DMT00 APPLIED STATISTICS
Test results

* T-test requires the normality assumption which is already rejected in the previous
part. Therefore, T-test is not the right choice for testing the equality of means for
these data sets.

* These results use the Monte Carlo approach.



* One of the required underlying assumptions for this test (variance homogeneity)
is violated and the results of the Wilcoxon rank sum test cannot be trusted.
