ASSIGNMENT 3

PN SITI NUR DIYANA

Why clean your data?


Screening process:
   Detect errors: missing data, outliers
   Make sure data meets assumptions for analysis: normality

2 Types of Screening
1. Preliminary data screening: screen one variable at a time on the entire data set before any analysis
2. In conjunction with statistical analysis: dependent on the analysis being performed

Data Cleaning: Tips for making your data suitable for analysis

Steps
1. Check for missing data
2. Check for normality
3. Remove outliers
4. Check for normality again
5. Transform data

Compute into variable

Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis both have values between -1.0 and +1.0.
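The rule of thumb can be sketched in a few lines of Python. This is a minimal illustration, not what SPSS computes: it assumes simple moment-based (population) formulas, whereas SPSS reports sample-adjusted statistics, so values can differ slightly on small samples. The function names and the example data are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

def roughly_normal(values, limit=1.0):
    """Rule of thumb: skewness and kurtosis both lie between -limit and +limit."""
    return (-limit < skewness(values) < limit
            and -limit < excess_kurtosis(values) < limit)

# Made-up example data, for illustration only.
bell_shaped = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 1, 10]
print(roughly_normal(bell_shaped), roughly_normal(right_skewed))
```

The rule is only a quick screen; it complements the normality tests and plots rather than replacing them.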

What do we do when Data are Missing?


Listwise (casewise) deletion: uses only complete cases
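As a rough sketch of what listwise deletion does, the Python below keeps only the cases with no missing values. It assumes each case is a dictionary and that a missing value is stored as None; the variable names and data are invented for illustration.

```python
def listwise_delete(cases):
    """Return only the complete cases (no value is None)."""
    return [case for case in cases
            if all(value is not None for value in case.values())]

# Hypothetical survey data: the second case is missing its 'hours' value.
cases = [
    {"id": 1, "age": 21, "hours": 3.5},
    {"id": 2, "age": 24, "hours": None},
    {"id": 3, "age": 19, "hours": 4.0},
]
complete = listwise_delete(cases)
print([case["id"] for case in complete])
```

Note that a case is dropped entirely even if only one of its variables is missing, which can shrink the sample quickly when many variables are involved.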

Step 2 Check for normality


Still using the output from Explore in SPSS, look at:
1. Descriptives table
2. Tests of Normality table
3. Histogram
4. Box plot

Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead. The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001 instead of .000 to make clear that the probability is not really zero.)
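The decision rule described above can be written out as a small Python sketch. It does not run either test; it only encodes the choice of test by sample size, the comparison against the 0.01 significance level, and the reporting convention for very small probabilities. The function names are hypothetical, and the p-value would come from the SPSS Tests of Normality table.

```python
def choose_normality_test(n):
    """Pick the test by sample size, following the slide's rule."""
    return "Kolmogorov-Smirnov" if n > 50 else "Shapiro-Wilk"

def reject_normality(p_value, alpha=0.01):
    """Reject the null hypothesis (normality) when p <= alpha."""
    return p_value <= alpha

def format_p(p_value):
    """Report tiny probabilities as '<0.001' rather than '.000'."""
    return "<0.001" if p_value < 0.001 else f"{p_value:.3f}"

# Illustrative values only: a sample of 93 cases with a very small p-value.
print(choose_normality_test(93), reject_normality(0.0004), format_p(0.0004))
```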

Step 3 Remove outliers


Remove data points highlighted in the box plot (not the best method)
Schweinle Method: remove data that is more than 2.5 SD from the mean


Schweinle Method
1. Multiply the SD by 2.5: 0.61463 x 2.5 = 1.536575
2. Add that value to the mean: 1.536575 + 3.8026 = 5.339175
Remove any values above 5.339175
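The arithmetic can be checked with a short Python sketch, using the mean and SD from the slide. The function name and the multiplier parameter are hypothetical conveniences, not part of the method as stated.

```python
def upper_cutoff(mean, sd, multiplier=2.5):
    """Upper outlier cutoff: mean + multiplier * SD."""
    return mean + multiplier * sd

# Mean and SD taken from the slide.
cutoff = upper_cutoff(mean=3.8026, sd=0.61463)
print(round(cutoff, 6))  # 5.339175
```

As on the slide, only the upper cutoff is computed here; a lower cutoff would subtract the same amount from the mean.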


SPSS: Data > Select Cases > If condition is satisfied
Enter the condition: Variable <= 5.339175
SPSS will not analyze data that is over 5.339175
Click Continue and OK
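Outside SPSS, the same selection condition can be sketched in Python. This is an illustrative equivalent, not an SPSS feature; the function name and the scores are made up, and only the cutoff value comes from the slides.

```python
def select_cases(values, cutoff):
    """Keep values that satisfy the condition value <= cutoff."""
    return [v for v in values if v <= cutoff]

# Made-up scores; the last two exceed the cutoff and are filtered out.
scores = [3.2, 4.1, 3.9, 5.339175, 5.6, 6.8]
kept = select_cases(scores, cutoff=5.339175)
print(kept)
```

The original list is left untouched, which mirrors the fact that the excluded cases are skipped in analysis rather than erased from the data.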

Step 4 Check for normality again!

Step 5 Transform data


Transform data: square root
SPSS: Transform > Compute Variable
1. Target variable: enter a new name (e.g., sqrt)
2. Click on Arithmetic under Function group
3. Click on Sqrt under Functions and Special Variables
4. Click on the up arrow to bring SQRT(?) to the Numeric Expression box
5. Highlight the variable to be transformed and click the right arrow to replace the (?)
Explore the data again to check for normality
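For readers working outside SPSS, the square root transformation can be sketched in Python as follows. The function name and sample values are hypothetical; the sketch assumes all values are zero or positive, since math.sqrt raises ValueError for negative input.

```python
import math

def sqrt_transform(values):
    """Return the square root of each value (values must be >= 0)."""
    return [math.sqrt(v) for v in values]

# Made-up, right-skewed values: the transform pulls in the long upper tail.
hours = [0, 1, 4, 9, 100]
print(sqrt_transform(hours))
```

After transforming, the new variable should be screened for normality again, as the step above says.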

Reliability

Creating graphic illustration

Descriptive analysis

Inferential statistics

T-test

Correlation

Two-way ANOVA

Regression
