
SigmaPlot Statistics 1

Using the Advisor Wizard 3

Select what you need to do . . . . . . . . . . . . . . . . . . . . . . . 4 How are the data measured? . . . . . . . . . . . . . . . . . . . . . . 5 Did you apply more than one treatment per subject? . . . . . . . . . . 7 How many groups or treatments are there? . . . . . . . . . . . . . . . 8 What kind of data do you have? . . . . . . . . . . . . . . . . . . . .11 What kind of prediction do you want to make? . . . . . . . . . . . .12 What kind of curve do you want to use? . . . . . . . . . . . . . . . 13 How do you want to specify the independent variables? . . . . . . . .14 How do you want SigmaPlot to select the independent variable?. . . 15

Using Statistical Procedures 17

Using SigmaPlot Procedures . . . . . . . . . . . . . . . . . . . . . .17 Running SigmaPlot Procedures . . . . . . . . . . . . . . . . . . . . .17 Choosing the Procedure to Use . . . . . . . . . . . . . . . . . . . . .22 Describing Your Data with Basic Statistics . . . . . . . . . . . . . . 23 Choosing the Group Comparison Test to Use . . . . . . . . . . . . .29 Choosing the Repeated Measures Test to Use . . . . . . . . . . . . .34 When to Compare Effects on Individuals After Multiple Treatments 36 Choosing the Rate and Proportion Comparison to Use . . . . . . . . .38 Choosing the Prediction or Correlation Method . . . . . . . . . . . .39 Choosing the Survival Analysis to Use . . . . . . . . . . . . . . . . .41


Testing Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Determining Experimental Power and Sample Size . . . . . . . . . 47 How to Determine the Power of an Intended Test . . . . . . . . . . 48 How To Estimate the Sample Size Necessary to Achieve a Desired Power48

Comparing Two or More Groups 51

About Group Comparison Tests . . . . . . . . . . . . . . . . . . . . 51 Parametric and Nonparametric Tests . . . . . . . . . . . . . . . . . 51 Comparing Two Groups . . . . . . . . . . . . . . . . . . . . . . . 52 Comparing Many Groups . . . . . . . . . . . . . . . . . . . . . . . 52 Data Format for Group Comparison Tests . . . . . . . . . . . . . . 52 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . 53 Arranging Data for t-tests and ANOVAs . . . . . . . . . . . . . . . 53 Unpaired t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 About the Unpaired t-test . . . . . . . . . . . . . . . . . . . . . . . 57 Performing an Unpaired t-Test . . . . . . . . . . . . . . . . . . . . 58 Arranging t-Test Data . . . . . . . . . . . . . . . . . . . . . . . . . 58 Setting t-Test Options . . . . . . . . . . . . . . . . . . . . . . . . . 59 Running a t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Interpreting t-Test Results . . . . . . . . . . . . . . . . . . . . . . . 65 t-Test Report Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 68 Mann-Whitney Rank Sum Test . . . . . . . . . . . . . . . . . . . . 70 About the Mann-Whitney Rank Sum Test . . . . . . . . . . . . . . 71 Performing a Mann-Whitney Rank Sum Test . . . . . . . . . . . . 71 Arranging Rank Sum Data . . . . . . . . . . . . . . . . . . . . . . 72 Setting Mann-Whitney Rank Sum Test Options . . . . . . . . . . . 72 Running a Rank Sum Test . . . . . . . . . . . . . . . . . . . . . . 75 Interpreting Rank Sum Test Results . . . . . . . . . . . . . . . . . 76


Rank Sum Test Report Graphs . . . . . . . . . . . . . . . . . . . . .78 One Way Analysis of Variance (ANOVA) . . . . . . . . . . . . . . .80 Performing a One Way ANOVA . . . . . . . . . . . . . . . . . . . .81 Arranging One Way ANOVA Data . . . . . . . . . . . . . . . . . .82 Setting One Way ANOVA Options . . . . . . . . . . . . . . . . . .82 Running a One Way ANOVA . . . . . . . . . . . . . . . . . . . . .87 Multiple Comparison Options for a One Way ANOVA . . . . . . . .89 Interpreting One Way ANOVA Results . . . . . . . . . . . . . . . .90 One Way ANOVA Report Graphs . . . . . . . . . . . . . . . . . . .96 Two Way Analysis of Variance (ANOVA) . . . . . . . . . . . . . .98 About the Two Way ANOVA . . . . . . . . . . . . . . . . . . . . .99 Performing a Two Way ANOVA . . . . . . . . . . . . . . . . . . . .99 Arranging Two Way ANOVA Data . . . . . . . . . . . . . . . . . 100 Setting Two Way ANOVA Options . . . . . . . . . . . . . . . . . 105 Running a Two Way ANOVA . . . . . . . . . . . . . . . . . . . . 109 Multiple Comparison Options for a Two Way ANOVA . . . . . . . 111 Performing a One Way ANOVA on Two Way ANOVA Data . . . . 114 Interpreting Two Way ANOVA Results . . . . . . . . . . . . . . . 114 Two Way ANOVA Report Graphs . . . . . . . . . . . . . . . . . . 122 Three Way Analysis of Variance (ANOVA) . . . . . . . . . . . . . 123 About the Three Way ANOVA . . . . . . . . . . . . . . . . . . . . 124 Performing a Three Way ANOVA . . . . . . . . . . . . . . . . . . 124 Arranging Three Way ANOVA Data . . . . . . . . . . . . . . . . . 125 Setting Three Way ANOVA Options . . . . . . . . . . . . . . . . . 129 Running a Three Way ANOVA . . . . . . . . . . . . . . . . . . . 134 Multiple Comparison Options for a Three Way ANOVA . . . . . . 136 Interpreting Three Way ANOVA Results . . . . . . . . . . . . . . 139 Three Way ANOVA Report Graphs . . . . . . . . . . . . . . . . . 146 Kruskal-Wallis Analysis of Variance on Ranks . . . . . . . . . . . 
147 About the Kruskal-Wallis ANOVA on Ranks . . . . . . . . . . . . 148 Performing an ANOVA on Ranks . . . . . . . . . . . . . . . . . . 148


Arranging ANOVA on Ranks Data . . . . . . . . . . . . . . . . . . 149 Setting the ANOVA on Ranks Options . . . . . . . . . . . . . . . . 150 Running an ANOVA on Ranks . . . . . . . . . . . . . . . . . . . 154 Multiple Comparison Options for ANOVA on Ranks . . . . . . . . 156 Interpreting ANOVA on Ranks Results . . . . . . . . . . . . . . . 157 ANOVA on Ranks Report Graphs . . . . . . . . . . . . . . . . . . 161 Performing a Multiple Comparison . . . . . . . . . . . . . . . . . . 162 Holm-Sidak Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Tukey Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Student-Newman-Keuls (SNK) Test . . . . . . . . . . . . . . . . . 164 Bonferroni t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Fisher’s Least Significance Difference Test . . . . . . . . . . . . . 165 Dunnett’s Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Dunn’s test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Duncan’s Multiple Range . . . . . . . . . . . . . . . . . . . . . . . 165

One Sample t-Test 167

About the One Sample t-Test . . . . . . . . . . . . . . . . . . . . . 167 Performing a One Sample t-Test . . . . . . . . . . . . . . . . . . . 167 Arranging One Sample t-Test Data . . . . . . . . . . . . . . . . . . 168 Setting One Sample t-Test Data Options . . . . . . . . . . . . . . . 168 Running a One Sample t-Test . . . . . . . . . . . . . . . . . . . . . 171 Interpreting One Sample t-Test Results . . . . . . . . . . . . . . . . 172 One Sample t-Test Report Graphs . . . . . . . . . . . . . . . . . . 173


Comparing Repeated Measurements of the Same Individuals 175
About Repeated Measures Tests . . . . . . . . . . . . . . . . . . . 175 Parametric and Nonparametric Tests . . . . . . . . . . . . . . . . . 175 Comparing Individuals Before and After a Single Treatment . . . . 176 Comparing Individuals Before and After Multiple Treatments . . . 176 Data Format for Repeated Measures Tests . . . . . . . . . . . . . . 176 Raw Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Indexed Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Paired t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Performing a Paired t-test . . . . . . . . . . . . . . . . . . . . . . . 178 Arranging Paired t-Test Data . . . . . . . . . . . . . . . . . . . . . 179 Setting Paired t-Test Options . . . . . . . . . . . . . . . . . . . . . 179 Running a Paired t-Test . . . . . . . . . . . . . . . . . . . . . . . . 183 Interpreting Paired t-Test Results . . . . . . . . . . . . . . . . . . . 185 Paired t-Test Report Graphs . . . . . . . . . . . . . . . . . . . . . 188 Wilcoxon Signed Rank Test . . . . . . . . . . . . . . . . . . . . . 190 About the Signed Rank Test . . . . . . . . . . . . . . . . . . . . . 191 Performing a Signed Rank Test . . . . . . . . . . . . . . . . . . . . 191 Arranging Signed Rank Data . . . . . . . . . . . . . . . . . . . . . 192 Setting Signed Rank Test Options . . . . . . . . . . . . . . . . . . 192 Running a Signed Rank Test . . . . . . . . . . . . . . . . . . . . . 195 Interpreting Signed Rank Test Results . . . . . . . . . . . . . . . . 196 Signed Rank Test Report Graphs . . . . . . . . . . . . . . . . . . . 198 One Way Repeated Measures Analysis of Variance (ANOVA) . . . 200 About the One Way Repeated Measures ANOVA . . . . . . . . . . 201 Performing a One Way Repeated Measures ANOVA . . . . . . . . 201 Arranging One Way Repeated Measures ANOVA Data . . . . . . . 202 Setting One Way Repeated Measures ANOVA Options . . . . . . . 203


Running a One Way Repeated Measures ANOVA . . . . . . . . . 206 Multiple Comparison Options (One Way RM ANOVA) . . . . . . . 208 Interpreting One Way Repeated Measures ANOVA Results . . . . . 209 One Way Repeated Measures ANOVA Report Graphs . . . . . . . 216 Two Way Repeated Measures Analysis of Variance (ANOVA) . . . 218 About the Two Way Repeated Measures ANOVA . . . . . . . . . . 218 Performing a Two Way Repeated Measures ANOVA . . . . . . . . 219 Arranging Two Way Repeated Measures ANOVA Data . . . . . . . 219 Set Two Way Repeated Measures ANOVA Options . . . . . . . . . 224 Running a Two Way Repeated Measures ANOVA . . . . . . . . . . 227 Multiple Comparison Options (Two Way RM ANOVA) . . . . . . 229 Interpreting Two Way Repeated Measures ANOVA Results . . . . 230 Two way repeated measures ANOVA report graphs . . . . . . . . . 238 Friedman Repeated Measures Analysis of Variance on Ranks . . . . 239 About the Repeated Measures ANOVA on Ranks . . . . . . . . . . 239 Performing a Repeated Measures ANOVA on Ranks . . . . . . . . 239 Arranging Repeated Measures ANOVA on Ranks Data . . . . . . . 240 Setting the Repeated Measures ANOVA on Ranks Options . . . . . 240 Running a Repeated Measures ANOVA on Ranks . . . . . . . . . . 243 Multiple Comparison Options (RM ANOVA on ranks) . . . . . . . 244 Interpreting Repeated Measures ANOVA on Ranks Results . . . . . 245 Repeated Measures ANOVA on Ranks Report Graphs . . . . . . . . 249

Comparing Frequencies, Rates, and Proportions 251
About Rate and Proportion Tests . . . . . . . . . . . . . . . . . . 251 Contingency Tables . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Comparing the Proportions of Two Groups in One Category . . . . 252


Comparing Proportions of Multiple Groups in Multiple Categories . . . . 252
Comparing Proportions of the Same Group to Two Treatments . . . . 252
Yates Correction . . . . 252
Data Format for Rate and Proportion Tests . . . . 253
z-test . . . . 253
Chi-Squared Analysis of Contingency Tables . . . . 253
Fisher Exact Test . . . . 255
McNemar’s Test . . . . 256
Comparing Proportions Using the z-Test . . . . 258
About the z-test . . . . 258
Performing a z-test . . . . 258
Arranging z-test Data . . . . 259
Setting z-test Options . . . . 259
Running a z-Test . . . . 261
Interpreting Proportion Comparison Results . . . . 262
Chi-square Analysis of Contingency Tables . . . . 265
About the Chi-Square Test . . . . 266
Performing a Chi-Square Test . . . . 266
Arranging Chi-Square Data . . . . 267
Setting Chi-Square Options . . . . 268
Running a Chi-Square Test . . . . 270
Interpreting Results of a Chi-Squared Analysis of Contingency tables . . . . 272
The Fisher Exact Test . . . . 275
About the Fisher Exact Test . . . . 275
Performing a Fisher Exact Test . . . . 276
Running a Fisher Exact Test . . . . 277
Interpreting Results of a Fisher Exact Test . . . . 279
McNemar’s Test . . . . 281
About McNemar’s Test . . . . 281
Performing McNemar’s Test . . . . 282
Arranging McNemar Test Data . . . . 282


Setting McNemar’s Options . . . . . . . . . . . . . . . . . . . . . 284

Running McNemar’s Test . . . . . . . . . . . . . . . . . . . . . . . 285 Interpreting Results of McNemar’s Test . . . . . . . . . . . . . . . 286 Relative Risk Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 About the Relative Risk Test . . . . . . . . . . . . . . . . . . . . . 288 Performing the Relative Risk Test . . . . . . . . . . . . . . . . . . 289 Arranging Relative Risk Test Data . . . . . . . . . . . . . . . . . . 289 Setting Relative Risk Test Options . . . . . . . . . . . . . . . . . . 290 Running the Relative Risk Test . . . . . . . . . . . . . . . . . . . . 291 Interpreting Results of the Relative Risk Test . . . . . . . . . . . . 293 Odds Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 About the Odds Ratio Test . . . . . . . . . . . . . . . . . . . . . . 295 Performing the Odds Ratio Test . . . . . . . . . . . . . . . . . . . 296 Arranging Odds Ratio Test Data . . . . . . . . . . . . . . . . . . . 296 Setting Odds Ratio Test Options . . . . . . . . . . . . . . . . . . . 297 Running the Odds Ratio Test . . . . . . . . . . . . . . . . . . . . . 298 Interpreting Results of the Odds Ratio Test . . . . . . . . . . . . . . 300

Prediction and Correlation 303

About Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 304 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 Data Format for Regression and Correlation . . . . . . . . . . . . . 305 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . 306 About the Simple Linear Regression . . . . . . . . . . . . . . . . . 306 Performing a Linear Regression . . . . . . . . . . . . . . . . . . . 307 Arranging Linear Regression data . . . . . . . . . . . . . . . . . . 307 Setting Linear Regression Options . . . . . . . . . . . . . . . . . . 307 Running a Linear Regression . . . . . . . . . . . . . . . . . . . . . 315


Interpreting Simple Linear Regression Results . . . . . . . . . . . . 316 Simple Linear Regression Report Graphs . . . . . . . . . . . . . . 325 Multiple Linear Regression . . . . . . . . . . . . . . . . . . . . . . 325 About the Multiple Linear Regression . . . . . . . . . . . . . . . . 326 Performing a Multiple Linear Regression . . . . . . . . . . . . . . 327 Setting Multiple Linear Regression Options . . . . . . . . . . . . . 327 Running a Multiple Linear Regression . . . . . . . . . . . . . . . . 336 Interpreting Multiple Linear Regression Results . . . . . . . . . . . 338 Multiple Linear Regression Report Graphs . . . . . . . . . . . . . . 347 Multiple Logistic Regression . . . . . . . . . . . . . . . . . . . . . 348 About the Multiple Logistic Regression . . . . . . . . . . . . . . . 349 Performing a Multiple Logistic Regression . . . . . . . . . . . . . 349 Arranging Multiple Logistic Regression Data . . . . . . . . . . . . 350 Setting Multiple Logistic Regression Options . . . . . . . . . . . . 350 Running a Multiple Logistic Regression . . . . . . . . . . . . . . . 359 Interpreting Multiple Logistic Regression Results . . . . . . . . . . 360 Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . 369 About the Polynomial Regression . . . . . . . . . . . . . . . . . . 369 Performing a Polynomial Regression . . . . . . . . . . . . . . . . . 370 Arranging Polynomial Regression Data . . . . . . . . . . . . . . . 370 Setting Polynomial Regression Options . . . . . . . . . . . . . . . 371 Running a Polynomial Regression . . . . . . . . . . . . . . . . . . 378 Interpreting Incremental Polynomial Regression Results . . . . . . 379 Interpreting Order Only Polynomial Regression Results . . . . . . . 382 Polynomial Regression Report Graphs . . . . . . . . . . . . . . . . 387 Stepwise Linear Regression . . . . . . . . . . . . . . . . . . . . . 387 About Stepwise Linear Regression . . . . . . . . . . . . . . . . . . 
388 Performing a Stepwise Linear Regression . . . . . . . . . . . . . . 389 Arranging Stepwise Regression Data . . . . . . . . . . . . . . . . . 390 Setting Forward Stepwise Regression Options . . . . . . . . . . . . 390 Setting Backward Stepwise Regression Options . . . . . . . . . . . 401


Running a Stepwise Regression . . . . . . . . . . . . . . . . . . . . 412 Interpreting Stepwise Regression Results . . . . . . . . . . . . . . . 413 Stepwise Regression Report Graphs . . . . . . . . . . . . . . . . . 423 Best Subsets Regression . . . . . . . . . . . . . . . . . . . . . . . 424 About Best Subset Regression . . . . . . . . . . . . . . . . . . . . 424 "Best" Subsets Criteria . . . . . . . . . . . . . . . . . . . . . . . . 425 Performing a Best Subset Regression . . . . . . . . . . . . . . . . . 425 Arranging Best Subset Regression Data . . . . . . . . . . . . . . . 426 Setting Best Subset Regression Options . . . . . . . . . . . . . . . 426 Running a Best Subset Regression . . . . . . . . . . . . . . . . . . 429 Interpreting Best Subset Regression Results . . . . . . . . . . . . . 430 Pearson Product Moment Correlation . . . . . . . . . . . . . . . . . 433 About the Pearson Product Moment Correlation Coefficient . . . . . 434 Computing the Pearson Product Moment Correlation Coefficient . . 434 Running a Pearson Product Moment Correlation . . . . . . . . . . . 435 Interpreting Pearson Product Moment Correlation Results . . . . . . 436 Pearson Product Moment Correlation Report Graph . . . . . . . . . 437 Spearman Rank Order Correlation . . . . . . . . . . . . . . . . . . 438 About the Spearman Rank Order Correlation Coefficient . . . . . . 438 Computing the Spearman Rank Order Correlation Coefficient . . . . 438 Arranging Spearman Rank Order Correlation Coefficient Data . . . 439 Running a Spearman Rank Order Correlation . . . . . . . . . . . . 439 Interpreting Spearman Rank Correlation Results . . . . . . . . . . . 440 Spearman Rank Order Correlation Report Graph . . . . . . . . . . . 441

Survival Analysis 443

Three Survival Tests . . . . . . . . . . . . . . . . . . . . . . . . . 443 Two Multiple Comparison Tests . . . . . . . . . . . . . . . . . . . 444


Data Format for Survival Analysis . . . . . . . . . . . . . . . . . . 444 Raw Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Indexed Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Single Group Survival Analysis . . . . . . . . . . . . . . . . . . . 446 Performing a Single Group Survival Analysis . . . . . . . . . . . . 446 Arranging Single Group Survival Analysis Data . . . . . . . . . . . 447 Setting Single Group Test Options . . . . . . . . . . . . . . . . . . 448 Running a Single Group Survival Analysis . . . . . . . . . . . . . . 451 Interpreting Single Group Survival Results . . . . . . . . . . . . . 453 Single Group Survival Graph . . . . . . . . . . . . . . . . . . . . . 455 LogRank Survival Analysis . . . . . . . . . . . . . . . . . . . . . . 455 Performing a LogRank Analysis . . . . . . . . . . . . . . . . . . . 456 Arranging LogRank Survival Analysis Data . . . . . . . . . . . . . 457 Setting LogRank Survival Options . . . . . . . . . . . . . . . . . . 457 Running a LogRank Survival Analysis . . . . . . . . . . . . . . . . 461 Interpreting LogRank Survival Results . . . . . . . . . . . . . . . . 467 LogRank Survival Graph . . . . . . . . . . . . . . . . . . . . . . . 469 Gehan-Breslow Survival Analysis . . . . . . . . . . . . . . . . . . 470 Performing a Gehan-Breslow Analysis . . . . . . . . . . . . . . . . 471 Arrange Gehan-Breslow Survival Analysis Data . . . . . . . . . . . 472 Setting Gehan-Breslow Survival Options . . . . . . . . . . . . . . 472 Running a Gehan-Breslow Survival Analysis . . . . . . . . . . . . 476 Interpreting Gehan-Breslow Survival Results . . . . . . . . . . . . 482 Gehan-Breslow Survival Graph . . . . . . . . . . . . . . . . . . . 484 Survival Curve Graph Examples . . . . . . . . . . . . . . . . . . . 485 Using Test Options to Modify Graphs . . . . . . . . . . . . . . . . 486 Editing Survival Graphs Using Graph Properties . . . . . . . . . . 488 Failures, Censored Values, and Ties . . . 
. . . . . . . . . . . . . . 489 Cox Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490 About Cox Regression . . . . . . . . . . . . . . . . . . . . . . . . 491 Performing a Cox Regression Proportional Hazards Model . . . . . 493


Performing a Cox Regression Stratified Model . . . . 494
Arranging Cox Regression Data . . . . 495
Setting Cox Regression PH Options . . . . 495
Setting Cox Regression Stratified Options . . . . 498
Running a Cox Regression . . . . 500
Interpreting Cox Regression Results . . . . 502
Cox Regression Graph . . . . 504

Computing Power and Sample Size 507
About Power . . . . 507
About Sample Size . . . . 508
Determining the Power of a t-Test . . . . 508
Determining the Power of a Paired t-Test . . . . 511
Determining the Power of a z-Test Proportions Comparison . . . . 513
Determining the Power of a One Way ANOVA . . . . 515
Determining the Power of a Chi-Square Test . . . . 518
Determining the Power to Detect a Specified Correlation . . . . 521
Determining the Minimum Sample Size for a t-Test . . . . 523
Determining the Minimum Sample Size for a Paired t-Test . . . . 525
Determining the Minimum Sample Size for a Proportions Comparison . . . . 527
Determining the Minimum Sample Size for a One Way ANOVA . . . . 530
Determining the Minimum Sample Size for a Chi-Square Test . . . . 533
Determining the Minimum Sample Size to Detect a Specified Correlation . . . . 536

Generating Report Graphs 539
Bar Charts of the Column Means . . . . 540
Scatter Plot . . . . 541
Point Plot . . . . 542
Point Plot and Column Means . . . . 543
Box Plot . . . . 544
Scatter Plot of the Residuals . . . . 545
Bar Chart of the Standardized Residuals . . . . 546
Histogram of Residuals . . . . 547
Normal Probability Plot . . . . 549
2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals . . . . 550
3D Residual Scatter Plot . . . . 551
Grouped Bar Chart with Error Bars . . . . 552
3D Category Scatter Graph . . . . 553
Before and After Line Plots . . . . 554
Multiple Comparison Graphs . . . . 555
Scatter Matrix . . . . 556
Profile Plots . . . . 557
Profile Plots - Main Effects . . . . 558
Profile Plots - 2Way Effects . . . . 558
Profile Plots - 3Way Effects . . . . 558


Chapter 1
SigmaPlot Statistics

SigmaPlot Statistics, imported directly from SigmaStat, provide a wide range of powerful yet easy to use statistical analyses specifically designed to meet the needs of researchers, without requiring in-depth knowledge of the math behind the procedures performed.

The tests and features described in this user’s manual include:

Using the Advisor Wizard. See page 3.
Using SigmaPlot procedures. See “Using Statistical Procedures” on page 17.
Comparing two or more groups. See page 51.
Comparing repeated measurements of the same individuals. See page 175.
Comparing frequencies, rates, and proportions. See page 251.
Prediction and correlation. See page 303.
Survival analysis. See page 443.
Computing power and sample size. See page 507.
Generating report graphs. See page 539.


Chapter 2
Using the Advisor Wizard

Use the Advisor Wizard to help you determine the appropriate test to use to analyze your data.

1. First you need to start the Advisor Wizard. There are two ways:

You can select from the menus:
Help
Statistics
Advisor Wizard

Figure 2-1 Starting the Advisor Wizard from the Help menu.

Or you can click the Advise button on the Statistics Toolbar.

Figure 2-2 Selecting the Advisor Wizard on the Statistics Toolbar.

2. When the Advisor Wizard appears, answer the questions about what you want to do and the format of your data. Click Next to go to the next panel, Back to go to the preceding panel, Finish to view the suggested test, or Cancel to close the Advisor Wizard.

3. After the Advisor Wizard suggests a test, click Run to perform the test. The Pick Columns dialog box for the suggested test appears prompting you to select the worksheet columns with the data you want to test. For more information, see “Picking Data to Test” on page 19.

The remainder of this section describes the answers for each dialog box.

Select what you need to do

The first step in assigning a test appropriate to your data is defining what you want to accomplish. The Advisor Wizard begins by asking you if you need to:

Describe your data with basic statistics. Select this option if you want to view a list of descriptive statistics for one or more columns of data. For more information, see “Describing Your Data with Basic Statistics” on page 23.

Compare groups or treatments for significant differences. Select this option if you want to compare data for significant differences; for example, if you want to compare the mean blood pressure of people who are receiving different drug treatments. The data to be compared can be the data collected from different groups, the data for different treatments on the same subjects, or the distributions or proportions of different groups. Click Next. You are asked to describe how your data is measured. For more information, see “How are the data measured?” on page 5.

Predict a trend, find a correlation, or fit a curve. Select this option if you want to use regression to predict a dependent variable from one or more independent variables, or describe the strength of association between two variables with a correlation coefficient. For example, select this option if you want to see if you can predict the average caloric intake of an animal from its weight. Click Next. You are asked to describe how your data is measured. For more information, see “How are the data measured?” on page 5.

Determine the sample size for an experimental design. Select this option if you want to determine the desired sample size for an experiment you intend to perform. Click Next. You are asked to describe how your data is measured. For more information, see “How are the data measured?” on page 5.

Determine the sensitivity of an experimental design. Select this option to determine the power or ability of a test to detect an effect for an experiment you want to perform. Click Next. You are asked to describe how your data is measured. For more information, see “How are the data measured?” on page 5.

Figure 2-3 The Advisor Wizard

How are the data measured?

You need to define how your data are measured to determine which test to perform for most procedures. Data can be measured in four ways:

By numeric values. Select By numeric values if your data are measured on a continuous scale using numbers. Examples of numeric values include height, weight, concentrations, ages, or any measurement where there is an arithmetic relationship between values.

If you are comparing groups or treatments for differences, you are asked if you have repeated observations on the same individuals. For more information, see “Did you apply more than one treatment per subject?” on page 7.

If you are predicting a trend, you are prompted to select the type of prediction you want to perform. For more information, see “What kind of prediction do you want to make?” on page 12.

If you are determining the sample size of or the sensitivity of an experimental design, you are asked how many groups or treatments you have. For more information, see “How many groups or treatments are there?” on page 8.

By order or rank. Select By order or rank if your data are measured on a rank scale that has an ordering relationship, but no arithmetic relationship, between values. For example, clinical status is often measured on an ordinal scale, such as: Healthy = 1, Feeling ill = 2, Sick = 3, Hospitalized = 4, and Dead = 5. These ratings show that being dead is worse than being healthy, but they do not indicate that being dead is five times worse than being healthy.

If you are comparing groups or treatments for differences, you are asked if you have repeated observations on the same individuals. For more information, see “Did you apply more than one treatment per subject?” on page 7.

If you are predicting a trend, click Finish.

By proportion or number of observations (i.e., male vs. female). Select By proportion or number of observations in categories if your data is measured on a nominal scale, which counts the number or proportions that fall into categories, and where there is no relationship between the categories (such as Democrat versus Republican).

If you are comparing groups or treatments for differences, you are asked if you have repeated observations on the same individuals. For more information, see “Did you apply more than one treatment per subject?” on page 7.

If you are predicting a trend, SigmaPlot suggests running a Multiple Logistic Regression. Click Finish. Click Run to perform the test, Cancel to exit the Advisor Wizard and return to the worksheet, or Help for information on the test. For more information, see “Multiple Logistic Regression” on page 348.
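To make the difference between these three scales concrete, here is a minimal Python sketch. It is an illustration only, not SigmaPlot code: the function name `summarize`, the scale labels, and the sample data are all hypothetical, and pairing each scale with a mean, a median, or category counts is a common statistical convention assumed here rather than something this manual states.

```python
from collections import Counter
from statistics import mean, median

# Ordinal coding from the clinical status example: order matters, distances do not.
STATUS_CODES = {"Healthy": 1, "Feeling ill": 2, "Sick": 3, "Hospitalized": 4, "Dead": 5}


def summarize(values, scale):
    """Return a summary suited to the measurement scale (hypothetical helper)."""
    if scale == "numeric":
        # Arithmetic is meaningful on a numeric scale, so an average makes sense.
        return mean(values)
    if scale == "rank":
        # Only the ordering is meaningful, so report the middle rank code.
        return median(STATUS_CODES[v] for v in values)
    if scale == "proportion":
        # Nominal categories have no ordering; only counts per category are meaningful.
        return Counter(values)
    raise ValueError(f"unknown scale: {scale}")


weights_kg = [60.0, 72.5, 81.0, 66.5]                           # made-up numeric data
statuses = ["Sick", "Healthy", "Dead", "Feeling ill", "Sick"]   # made-up ordinal data
parties = ["Democrat", "Republican", "Democrat"]                # made-up nominal data

print(summarize(weights_kg, "numeric"))    # mean weight
print(summarize(statuses, "rank"))         # median status code
print(summarize(parties, "proportion"))    # counts per category
```

The rank branch never averages the status codes, which mirrors the point that Dead = 5 does not mean five times worse than Healthy = 1.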

If you are determining a sample size or the sensitivity of an experimental group, you are asked how your data is formatted. For more information, see “What kind of data do you have?” on page 11.

By survival time. Select By survival time if you have measurements that correspond to the time to an event. This event is typically a death, but other events like the time to motor failure or the time to vascular graft closure are equally valid.

If you wish to describe your survival data’s statistics, then the Advisor Wizard selects the Single Group survival analysis. Click Finish. For more information, see “Single Group Survival Analysis” on page 446.

If you are comparing survival groups for significant differences, then you are asked whether later survival times are less accurate. Select Yes if you think the later survival times are less accurate than the early times. This might occur, for example, when there are many more late censored values. In this case the Advisor Wizard will suggest use of the Gehan-Breslow test. For more information, see “Gehan-Breslow Survival Analysis” on page 470. Select No if all data is considered equally accurate and the Advisor Wizard will suggest use of the LogRank test. For more information, see “LogRank Survival Analysis” on page 455.

Did you apply more than one treatment per subject?

If you are comparing groups or treatments, or determining sample size or power, and your data is measured on a continuous numeric scale, you must specify whether the observations were, or are to be made, on the same or different subjects. Select Yes or No, then click Next.

Yes. Answer Yes if the observations are different treatments made on the same subjects. Select Yes when you are comparing the same individuals before and after one or more different treatments or changes in condition. For example, you would select Yes if you were testing the effect of changing diet on the cholesterol level of experimental subjects, or if you were taking an opinion poll of the same voters before and after a political debate.

If you are comparing groups on an arithmetic or rank scale, you are asked to specify the number of groups or treatments. For more information, see “How many groups or treatments are there?” on page 8.

If you are comparing group proportions or distribution in categories, SigmaPlot suggests performing McNemar’s Test. For more information, see “McNemar’s Test” on page 281. There are also descriptions available of the results

for this procedure. For more information, see “Interpreting Results of McNemar’s Test” on page 286.

No. Answer No if each observation was obtained from a different subject. If you are seeing if there is a difference between different groups, such as comparing the weights of three different populations of elephants, you are not repeating observations. You should only select Yes if you are comparing the same individuals before and after one or more treatments.

If you are comparing groups on an arithmetic or rank scale, you are asked to specify the number of groups or treatments. For more information, see “How many groups or treatments are there?” on page 8.

If you are comparing group proportions or distribution in categories, you are asked what kind of data you have. For more information, see “What kind of data do you have?” on page 11.

Tip: Click Finish to view the suggested test, then Run to perform it. You can also click Back to return to the previous dialog box, Cancel to return to the worksheet, or click Help for information on using the Advisor Wizard.

How many groups or treatments are there?

When comparing groups or treatments or determining sample size or power and your data is measured on a continuous numeric or rank scale, SigmaPlot asks you how many treatments or conditions are involved. After specifying the number of groups, you are asked more questions, or a test is suggested.

Select one of the following:

One. Select this option if you have one experimental group. For more information, see “About the One Sample t-Test” on page 167.

Two. Select this option if you have two different experimental groups or if your subjects underwent two different treatments. For example, if you are comparing differences in hormone levels between men and women, or if you are measuring the change in individuals before and after a drug treatment, there are two groups.

If you are comparing two different groups on an arithmetic scale, SigmaPlot suggests the independent t-test. For more information, see "Unpaired t-Test" in Chapter 4. You can read descriptions of the results for this procedure. For more information, see "Interpreting t-Test Results" in Chapter 4.

If you are determining sample size or power for a comparison of two groups on an arithmetic scale, SigmaPlot suggests that you perform t-test sample size or power computations. For more information, see “Determining the Minimum Sample Size for a t-Test” on page 523. You can also determine the power. For more information, see “Determining the Power of a t-Test” on page 508.

If you are comparing two different groups on a rank scale, SigmaPlot suggests performing the Mann-Whitney Rank Sum Test. For more information, see “Mann-Whitney Rank Sum Test” on page 70. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Rank Sum Test Results” on page 76.

If you are comparing the same subjects undergoing two different treatments on an arithmetic scale, SigmaPlot suggests performing the Paired t-test. For more information, see “Paired t-Test” on page 177. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Paired t-Test Results” on page 185.

If you are determining sample size or power for a comparison of the same subjects undergoing two treatments on an arithmetic scale, SigmaPlot suggests performing Paired t-test sample size or power computations. For more information, see “Determining the Minimum Sample Size for a Paired t-Test” on page 525. You can also read directions on determining power. For more information, see “Determining the Power of a Paired t-Test” on page 511.

If you are comparing the same subjects undergoing two different treatments on a rank scale, SigmaPlot suggests performing the Wilcoxon Signed Rank Test. For more information, see “Wilcoxon Signed Rank Test” on page 190. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Signed Rank Test Results” on page 196.

Three or more. Select this option if you have three or more different groups to compare, or are comparing the response of the same subjects to three or more different treatments. For example, if you collected ethnic diversity data from five different cities, or subjected individuals to a series of four dietary changes and measured change in serum cholesterol, you are analyzing three or more groups.

If you are comparing three or more different groups on an arithmetic scale, SigmaPlot suggests performing One Way ANOVA. For more information, see “One Way Analysis of Variance (ANOVA)” on page 80.

If you are determining sample size or power for a comparison of three or more different groups on an arithmetic scale, SigmaPlot suggests performing One Way ANOVA sample size or power computations. For more information, see “Determining the

Minimum Sample Size for a One Way ANOVA” on page 530. You can also perform power computations. For more information, see “Determining the Power of a One Way ANOVA” on page 515.

If you are comparing three or more different groups on a rank scale, SigmaPlot suggests the Kruskal-Wallis ANOVA on Ranks. For more information, see “Kruskal-Wallis Analysis of Variance on Ranks” on page 147. You can also read descriptions of the results for this procedure. For more information, see “Interpreting ANOVA on Ranks Results” on page 157.

If you are comparing the same subjects undergoing three or more different treatments on an arithmetic scale, SigmaPlot suggests performing One Way Repeated Measures ANOVA. For more information, see “One Way Repeated Measures Analysis of Variance (ANOVA)” on page 200. You can also read descriptions of the results for this procedure. For more information, see “Interpreting One Way Repeated Measures ANOVA Results” on page 209.

If you are comparing the same subjects undergoing three or more different treatments on a rank scale, SigmaPlot suggests the Friedman Repeated Measures ANOVA on Ranks. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Repeated Measures ANOVA on Ranks Results” on page 245.

There are two combinations of groups or treatments to consider (i.e., males and females from different cities). Select this option if each experimental subject is affected by two different experimental factors or underwent two different treatments simultaneously. For example, if you compared males and females from different countries, there would be two factors, gender and nationality. If you were comparing only males and females, however, you would have only one factor. Note that different levels of a factor, such as male and female for gender, are not considered to be different factors.

If you are comparing three or more different groups on an arithmetic scale, SigmaPlot suggests performing Two Way ANOVA. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Two Way ANOVA Results” on page 114.

If you are comparing the same subjects undergoing three or more repeated treatments on an arithmetic scale, SigmaPlot suggests Two Way Repeated Measures ANOVA. Note that either one or both factors can be repeated treatments. For more information, see “Two Way Repeated Measures Analysis of Variance (ANOVA)” on page 218.
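The single-factor suggestions in this section depend on three answers: how many groups there are, whether the same subjects were measured repeatedly, and whether the data is on an arithmetic or rank scale. The following Python sketch collects those pairings in a lookup table. It only illustrates the decision logic as described in the text; the names `Design`, `SUGGESTED_TEST`, and `suggest_test` are hypothetical and are not part of SigmaPlot.

```python
from typing import NamedTuple


class Design(NamedTuple):
    groups: str     # "two" or "three_or_more"
    repeated: bool  # True if the same subjects received every treatment
    scale: str      # "arithmetic" or "rank"


# Pairings taken from the text of this section (one-factor designs only).
SUGGESTED_TEST = {
    Design("two", False, "arithmetic"): "Unpaired t-test",
    Design("two", False, "rank"): "Mann-Whitney Rank Sum Test",
    Design("two", True, "arithmetic"): "Paired t-test",
    Design("two", True, "rank"): "Wilcoxon Signed Rank Test",
    Design("three_or_more", False, "arithmetic"): "One Way ANOVA",
    Design("three_or_more", False, "rank"): "Kruskal-Wallis ANOVA on Ranks",
    Design("three_or_more", True, "arithmetic"): "One Way Repeated Measures ANOVA",
    Design("three_or_more", True, "rank"): "Friedman Repeated Measures ANOVA on Ranks",
}


def suggest_test(groups: str, repeated: bool, scale: str) -> str:
    """Look up the test suggested for a one-factor comparison."""
    return SUGGESTED_TEST[Design(groups, repeated, scale)]


# Same subjects, two treatments, rank scale.
print(suggest_test("two", True, "rank"))
```

Designs with two or three factors, and sample size or power computations, are deliberately left out of this table.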

You can also read descriptions of the results for Two Way Repeated Measures ANOVA. For more information, see “Interpreting Two Way Repeated Measures ANOVA Results” on page 230.

There are three combinations of groups to consider. Select this option if each experimental subject is affected by three different experimental factors or underwent three different treatments simultaneously. For example, if you are comparing males and females from different countries, with different diets, there are three factors: gender, nationality, and diet. However, if you are comparing only males and females, from Italy and Germany, you have only two factors. Note that different levels of a factor, such as male and female for gender, and Italian and German for nationalities, are not considered to be different factors.

If you select this option, SigmaPlot suggests you run a Three Way ANOVA. For more information, see “Three Way Analysis of Variance (ANOVA)” on page 123.

If you are determining power or sample size, this option also appears. This is a measure of the association between two variables. SigmaPlot suggests performing power or sample size computations for a correlation coefficient.

Click Run to perform the test, Back to return to the previous panel, or Cancel to quit the Advisor Wizard and return to the worksheet.

What kind of data do you have?

You can have two kinds of data that are arranged by proportions in categories.

Tip: After specifying the kind of data you have, click Finish to view the suggested test, Cancel to return to the worksheet, or Help for information on the test.

Select one of the following:

A contingency table. Select this option if you have data in the form of a contingency table. A contingency table is a method of displaying the observed numbers of different groups that fall into different categories, for example, the number of men and women that voted for a Republican or Democratic candidate. These tables are used to see if there is a difference between the expected and observed distributions of the groups in the categories. A contingency table uses the groups and categories as the rows and columns, and places the number of observations for each combination in the cells.
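To make the comparison of expected and observed distributions concrete, here is a minimal Python sketch. It stores a contingency table as rows (groups) by columns (categories), computes the counts that would be expected if group and category were unrelated, and sums the standard Pearson chi-square statistic. The vote counts are invented for illustration, the function names are hypothetical, and any correction or refinement that SigmaPlot itself applies is not shown.

```python
def expected_counts(table):
    """Expected cell counts if group and category were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows are men and women,
# columns are votes for candidate A and candidate B.
votes = [
    [30, 20],
    [20, 30],
]

print(expected_counts(votes))
print(chi_square_statistic(votes))
```

The larger the statistic, the further the observed counts are from the counts expected under no association.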

If you select a contingency table, SigmaPlot suggests performing a Chi-Square Analysis of Contingency Tables. For more information, see “Chi-square Analysis of Contingency Tables” on page 265. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Results of a Chi-Squared Analysis of Contingency tables” on page 272.

Observed proportions. Select this option when you have data for the sample sizes of two groups and the proportion of each group that falls into a single category. This data is used to see if there is a difference between the proportion of two different groups that fall into the category. For more information, see “Arranging z-test Data” on page 259.

If you select this option, SigmaPlot suggests that you Compare Proportions. For more information, see “Comparing Proportions Using the z-Test” on page 258. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Proportion Comparison Results” on page 262.

What kind of prediction do you want to make?

If you are predicting a trend, finding a correlation, or fitting a curve and your data is measured on a continuous numeric scale, you are asked what kind of prediction you want to make. There are three different goals available when you are trying to predict one dependent variable from one or more independent variables. After specifying the kind of prediction you want to make, SigmaPlot asks more questions or suggests the kind of test to use. Click Finish to view the suggested test.

Select one of the following:

Fit a straight line through the data. Select this answer to find the slope and the intercept of the line that most closely describes the relationship of your data, $y = b_0 + b_1 x$, where y is the dependent variable and x is the independent variable.

If you select this option, SigmaPlot suggests performing a Linear Regression. For more information, see “Simple Linear Regression” on page 306. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Simple Linear Regression Results” on page 316.
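As a rough illustration of what fitting a straight line involves, the following Python sketch computes the least squares intercept and slope for a line written as y = b0 + b1·x. That notation, the function name `fit_line`, and the sample data are assumptions made for this example; the sketch is not a description of how SigmaPlot performs the computation.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

b0, b1 = fit_line(x, y)
print(f"y = {b0:.3f} + {b1:.3f} x")
```

The slope is the change in y per unit change in x, and the intercept is the predicted y when x is zero.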

Fit a curved line through the data. Select this answer to find an equation that predicts the dependent variable from an independent variable without assuming a straight line relationship.

If you select to fit a curved line through your data, you are asked what kind of curve you want to use. For more information, see “What kind of curve do you want to use?” on page 13 below.

Predict a dependent variable from several independent variables. Select this option if you want to predict a dependent variable from more than one independent variable using the linear relationship

$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$

where $y$ is the dependent variable, $x_1, x_2, x_3, \dots, x_k$ are the $k$ independent variables, and $b_0, b_1, b_2, \dots, b_k$ are the regression coefficients. As the values for $x_i$ vary, the corresponding value for $y$ either increases or decreases proportionately.

If you select this option, SigmaPlot asks how you want to specify the independent variables. For more information, see “How do you want to specify the independent variables?” on page 14.

Measure the strength of association between pairs of variables. Select this option to find how closely the value of one variable predicts the value of another (i.e., the likelihood that a variable increases or decreases when the other variable increases or decreases), without specifying which is the dependent and independent variable.

If you select this option, SigmaPlot suggests computing the Pearson Product Moment Correlation. Click Finish.

What kind of curve do you want to use?

If you are trying to predict one variable from one or more other variables using a curved line, SigmaPlot asks you what kind of curve you want to use.

Select one of the following:

A polynomial curve with one independent variable. Select this option if you want to use a kth order polynomial curve of the form

$y = b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k$

to predict the dependent variable $y$ from the independent variable $x$, where $b_0, b_1, b_2, \dots, b_k$ are the regression coefficients.

If you select this option, SigmaPlot suggests performing Polynomial Regression. Click Finish. For more information, see “Polynomial Regression” on page 369. You can also read descriptions of the results for this procedure. For more information, see

“Interpreting Incremental Polynomial Regression Results” on page 379. You can also read about interpreting Order Only Polynomial Results. For more information, see “Interpreting Order Only Polynomial Regression Results” on page 382.

A general nonlinear equation. Select this option if you want to describe your data with a nonlinear function. Common nonlinear functions include rising and falling exponential and log curves, logistic sigmoid curves, and hyperbolic curves that approach a maximum or minimum. Nonlinear Regression uses a dialog box to specify any general nonlinear equation with up to ten independent variables, then uses an iterative least squares algorithm to estimate the parameters in the regression model.

If you select this option, SigmaPlot suggests using Nonlinear Regression. Click Finish. For more information, see “Nonlinear Regression” in Chapter 8. You can also read descriptions of the results for this procedure. For more information, see “Interpreting the Nonlinear Regression Results dialog Box” in Chapter 8.

How do you want to specify the independent variables?

If you chose to predict a dependent variable from several independent variables, you can select the independent variables using two methods. The dependent variable and independent variables are selected as columns from the worksheet when the regression procedure is performed.

Select one of the following:

Include all selected independent variables in the equation. Select this option if you want to compute a single equation using all independent variables you select for the equation, regardless of whether they contribute significantly to predicting the dependent variable.

If you select this option, SigmaPlot suggests performing a Multiple Linear Regression. Click Finish. For more information, see “Multiple Linear Regression” on page 325. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Multiple Linear Regression Results” on page 338.

Let SigmaPlot select the "best" variables to include in the equation. Select this option if you want SigmaPlot to screen the potential independent variables you select and only include ones that significantly contribute to predicting the dependent variable. You are then asked how you want to select the independent variables. For more information, see “How do you want SigmaPlot to select the independent variable?” on page 15.
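For the first of these two options, where every selected variable stays in the equation, the following Python sketch estimates the coefficients of y = b0 + b1·x1 + … + bk·xk by least squares. It solves the normal equations with a small Gauss-Jordan routine so that it needs no external libraries. The function names and data are hypothetical, and the numerical method is an assumption chosen for brevity, not a statement about SigmaPlot's own algorithm.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination.

    Uses partial pivoting. Fails with ZeroDivisionError if the system
    is singular, for example when two columns are exactly collinear.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(columns, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # One row per observation, with a leading 1 for the intercept b0.
    rows = [[1.0] + list(xs) for xs in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical worksheet columns: two independent variables and a response.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [9.0, 8.0, 19.0, 18.0]

print(fit_multiple([x1, x2], y))
```

Each entry of `columns` plays the role of one worksheet column picked as an independent variable.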

How do you want SigmaPlot to select the independent variable?

If you are predicting the value of one variable from other variables, and you want SigmaPlot to screen potential variables for their contribution to the predictive value of the regression equation, you can select three different methods.

Sequentially add new independent variables to the equation. Select this option to select the independent variables for the equation by starting with no independent variables, then adding variables until the ability to predict the dependent variable is no longer improved. The variables are added in order of the amount of predictive ability they add to the model. The predictive ability of models produced with forward stepwise regression is measured by their ability to reduce the residual sum of squares in the regression equation.

If you select this option, SigmaPlot suggests Forward Stepwise Regression. Click Finish. For more information, see “Stepwise Linear Regression” on page 387. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Stepwise Regression Results” on page 413.

Sequentially remove independent variables from the equation. Select this option to select the independent variables for the equation by starting with all independent variables in the equation, then deleting variables one at a time. The variable that contributes the least to the prediction of the dependent variable is deleted from the equation first. This elimination process continues until the ability of the model to predict the dependent variable is reduced below a specified level. The predictive ability of models produced with backwards stepwise regression is measured by their ability to reduce the residual sum of squares in the regression equation.

If you select this option, SigmaPlot suggests the Backward Stepwise Regression. Click Finish. For more information, see “Stepwise Linear Regression” on page 387. You can also read descriptions of the results for this procedure. For more information, see “Interpreting Stepwise Regression Results” on page 413.

Consider all possible combinations of the independent variable and select the best subset. Select this option if you want SigmaPlot to evaluate all possible regression models, and isolate the models that "best" predict the dependent variable.

If you select this option, SigmaPlot suggests the Best Subset Regression. Click Finish. For more information, see “Best Subsets Regression” on page 424. You

can also read descriptions of the results for this procedure. For more information, see “Interpreting Best Subset Regression Results” on page 430.

SigmaPlot selects the sets of independent variables that "best" predict the dependent variable using criteria specified in the Best Subsets Regression Options dialog box.
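The forward selection idea described earlier in this section (start with no independent variables, then add the one that most reduces the residual sum of squares until nothing improves) can be sketched in Python as follows. This is only an illustration under stated assumptions: the stopping rule is a hypothetical minimum drop in the residual sum of squares (`min_drop`), which stands in for whatever entry criterion SigmaPlot actually uses, and all function names and data are invented for the example.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Fails with ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(columns, y):
    """Residual sum of squares of the least squares fit with an intercept."""
    if columns:
        rows = [[1.0] + list(xs) for xs in zip(*columns)]
    else:
        rows = [[1.0] for _ in y]  # intercept-only model
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_stepwise(candidates, y, min_drop=1e-6):
    """Add variables one at a time; return their names in the order added.

    candidates maps a variable name to its column of values.
    min_drop is a hypothetical stopping threshold for this sketch.
    """
    chosen = []
    best_rss = residual_ss([], y)
    while len(chosen) < len(candidates):
        trials = {
            name: residual_ss([candidates[n] for n in chosen + [name]], y)
            for name in candidates
            if name not in chosen
        }
        name = min(trials, key=trials.get)
        if best_rss - trials[name] <= min_drop:
            break  # no remaining variable improves the fit enough
        chosen.append(name)
        best_rss = trials[name]
    return chosen


# Hypothetical data: y depends on x1 and x2 but not on x3.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

print(forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y))
```

Backward elimination would run the same loop in reverse, starting from all candidates and dropping the variable whose removal raises the residual sum of squares the least.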

Chapter 3

Using Statistical Procedures

Using SigmaPlot Procedures

The statistical procedure you use to analyze a given data set depends on the goals of your analysis and the nature of your data. The Advisor Wizard asks you questions about your goals and your data, then selects the appropriate test. For more information, see “Using the Advisor Wizard” on page 3. Alternately, you can perform statistical procedures directly by choosing the appropriate Statistics menu command.

Running SigmaPlot Procedures

In general, the steps to run a test or procedure are:

1. Entering or importing and arranging your data appropriately in the worksheet. For more information, see “Arranging Worksheet Data” on page 18.
2. Determining and choosing the test you want to perform. For more information, see “Selecting a Test” on page 18.
3. If desired, setting the test options using the selected test’s Options dialog box. For more information, see “Setting Test Options” on page 18.
4. Running the test by picking the worksheet columns with the data you want to test using the Pick Columns dialog box. For more information, see “Picking Data to Test” on page 19.

5. Viewing, generating, and interpreting the test reports and graphs. For more information, see “Reports and Result Graphs” on page 20.

Arranging Worksheet Data

The method you use to enter or arrange data in the worksheet depends on the type of test you are running. Some data formats include:

Data format for group comparison tests. For more information, see “Data Format for Group Comparison Tests” on page 52.
Data format for repeated measures tests. For more information, see “Data Format for Repeated Measures Tests” on page 176.
Data format for rate and proportion tests. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

Selecting a Test

There are two ways you can select a test, either from the Standard toolbar or from the Statistics menu. You can:

Select a test from the drop-down list in the Standard toolbar.
Select a test from the Statistics menu.

Setting Test Options

You can configure almost all statistics procedures with a set of options. Use these settings to perform additional tests and procedures. You may wish to enable or disable some of these options or change assumption checking parameters. All changes are saved between sessions.

To change option settings before you run a test:

1. Select the test. For more information, see “Selecting a Test” on page 18.

2. From the menus select:
Statistics
Current Test Options
The Options dialog box for the test appears.
3. Click the tab of the options you want to view.
4. Select a check box to include an option in the test. Clear a check box if you do not want to use that test option.
5. To accept the current settings without continuing the test, click Apply. To close the dialog box without changing any settings or running the test, click Cancel.
6. Click Run Test to continue the test. The Pick Columns dialog box for the test appears. For more information, see “Picking Data to Test” on page 19.

Picking Data to Test

When you run a test, use the Pick Columns dialog box to select the worksheet columns with the data you want to test and, if you can arrange your data in more than one format, to specify how your data is arranged in the worksheet.

1. Select the appropriate format from the Data Format drop-down list, then click Next. If the test you are running uses only one type of data format, the Pick Columns dialog box appears prompting you to select the columns with the data you want to test (see the following step).
2. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data drop-down list. The first selected column is assigned to the first entry in the Selected Columns list, and all successively selected columns are assigned to successive entries in the list. The number or title of selected columns appear in each entry. The dialog box indicates the type of data you are selecting. The number of columns you can select depends on the test you are running and the format of your data.

3. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
4. If you are running a Forward or Backward Stepwise Regression, click Next. The Pick Columns dialog box appears.
5. Click Finish to perform the test on the data in the selected columns. After the computations are completed, the report appears.

Reports and Result Graphs

Test reports automatically appear after a test has been performed.

To generate a result graph:

1. Make sure the report is the active window.
2. From the menus select:
Graph
Create Result Graph

SigmaPlot does not create graphs for rates and proportion tests, best subset and incremental polynomial regression reports, and normality reports. The toolbar Create Graph button and the Graph menu Create Graph command are dimmed for these tests.

Note: If you close a report without generating or saving a graph, the graph is not recoverable. For more information, see “Repeating Tests” on page 21.

Editing, Saving, and Opening Reports and Graphs

You can edit reports and graphs using the Format menu commands and the Graph Properties dialog box. You can also export reports as non-notebook files and edit them in other applications.

Repeating Tests

Repeating a test involves running the last test you performed, using the same worksheet columns.

To repeat a test using the same worksheet columns:

1. If desired, edit the data in the columns used by the test. You can add data and change values and column titles.
2. Make sure the last test you performed is displayed in the toolbar drop-down list. If you haven’t performed the test displayed in the drop-down list, the Statistics menu Rerun Current Test command is dimmed. To find the last performed test, you can scroll through the drop-down list until the button and command are active.
3. To change the option settings before you rerun the test, select the toolbar Current Test Options button, change the desired options, then click OK to accept the changes and close the dialog box.
4. From the menus select:
Statistics
Rerun Current Test
The Pick Columns dialog box appears with the columns used in the last procedure selected.
5. Click Finish to repeat the procedure using these columns. After the computations are complete, a new report appears.

To repeat a test using new data columns:

1. From the menus select:
Statistics
Run Current Test
For more information, see “Running SigmaPlot Procedures” on page 17.

Choosing the Procedure to Use

You can use SigmaPlot to perform a wide range of statistical procedures. The type of procedure to choose depends on the kind of analysis you want to perform. The Advisor can suggest which test to use. For more information, see “Using the Advisor Wizard” on page 3. You can also determine the appropriate test yourself.

Use descriptive statistics to compute a number of commonly used statistical values for the selected data. For more information, see “Describing Your Data with Basic Statistics” on page 23.
Use group comparison tests to analyze two or more different sample groups for statistically significant differences. For more information, see “About Group Comparison Tests” on page 51.
Use repeated measures comparisons to test the differences in the same individuals before and after one or more treatments or changes in condition. For more information, see “About Repeated Measures Tests” on page 175.
Use rate and proportion analysis to compare the distribution of groups that are divided or fall into different categories or classes (for example, male versus female, or reaction versus no reaction). For more information, see “About Rate and Proportion Tests” on page 251.
Use power and sample size determination to calculate the sensitivity, or power, of an experimental test, or to compute the experimental sample size required to achieve a desired sensitivity. For more information, see “Computing Power and Sample Size” on page 507.
Use survival analysis to determine statistics about the time to an event and to compare two or more time-to-event data sets.

Figure 3-1 Procedures to Use for Statistical Tests. All statistical procedure commands are found under the Statistics menu.

Describing Your Data with Basic Statistics

You can use SigmaPlot to describe your data by computing basic statistics, such as the mean, median, standard deviation, percentiles, etc., that summarize the observed data.

Describing your data involves:

Arranging your data in the appropriate format. For more information, see “Arranging Descriptive Statistics Data” on page 24.
Setting descriptive statistic options. For more information, see “Setting Descriptive Statistics Options” on page 24.
Selecting the columns you want to compute the statistics for. For more information, see “Running the Descriptive Statistics Test” on page 26.
Viewing the descriptive statistics results. For more information, see “Descriptive Statistics Results” on page 27.

Arranging Descriptive Statistics Data

Descriptive Statistics are performed on columns of data, so you should arrange the data for each group or variable you want to analyze in separate columns. You can select a minimum of one column and a maximum of 32 columns when describing data.

Selecting Data Columns

You can calculate statistics for entire columns or only a portion of columns. When running the descriptive statistics procedure, you can:

Select the columns or block of data before you run the test, or
Select the columns while running the test.

For more information, see “Picking Data to Test” on page 19.

Note: To calculate statistics for only a range of data, select the data before you run the test.

Setting Descriptive Statistics Options

You select the statistics that you would like to calculate in the Descriptive Statistics Options dialog box.

To change descriptive statistics test options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. To open the Options for Descriptive Statistics dialog box, select Descriptive Statistics from the toolbar drop-down list.
3. From the menus select:
Statistics
Current Test Options
The Options for Descriptive Statistics dialog box appears.

Figure 3-2 The Options for Descriptive Statistics dialog box

4. Clear any of the selected statistics settings you do not want to include in the report. The specific summary statistics that are appropriate for a given data set depend on the nature of the data. If the observations are normally distributed, then the mean and standard deviation provide a good description of the data. If not, then the median and percentiles often provide a better description of the data. For more information, see “Descriptive Statistics Results” on page 27.
5. To select all statistics options, click Select All. To clear all selections, click Clear.
6. To change the percentile or confidence intervals computed, edit the values in the Percentile box.
7. To change the confidence interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals) into the Confidence Interval Mean box.
8. Click Run Test to perform the test with the selected options settings.

Note: To set the number of decimal places displayed, from the menus select:
Tools
Options
Click the Report tab, and select Number of significant digits.

Running the Descriptive Statistics Test

To describe your data:

1. If you want to select your data before you run the procedure, drag the pointer over your data.
2. From the menus select:
Statistics
Describe Data
The Pick Columns for Descriptive Statistics dialog box appears prompting you to specify a data format.

Figure 3-3 The Pick Columns for Descriptive Statistics Dialog Box

Note: If you selected columns before you chose the test, the selected columns automatically appear in the Selected Columns list.

To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of selected columns appear in each row. You can select up to 64 columns of data for the Descriptive Statistics Test.

3. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

4. Click Finish to describe the data in the selected columns. After the computations are completed, the report appears.

Descriptive Statistics Results

The following statistics can be calculated and displayed in the results report. These values are calculated for each column selected. Select the specific statistics to compute in the Options for Descriptive Statistics dialog box. For more information, see “Setting Descriptive Statistics Options” on page 24.

Size. This is the number of non-missing observations in a worksheet column.

Missing. This is the number of missing observations in a worksheet column.

Mean. The mean is the average value for a column. If the observations are normally distributed, the mean is the center of the distribution.

Standard Deviation. Standard deviation is a measure of data variability about the mean.

Standard Error of the Mean. The standard error of the mean is a measure of how closely the sample mean approximates the true population mean.

Range. The range is the minimum value subtracted from the maximum value.

Maximum. Maximum is the largest observation.

Minimum. Minimum is the smallest observation.

Median. The median is the "middle" observation, computed by ordering all observations from smallest to largest, then selecting the largest value of the smaller half of the observations.

Percentiles. The two percentile points which define the upper and lower ends (tails) of the data, as specified by the Descriptive Statistics options.

Sum. The sum is the sum of all observations. The mean equals the sum divided by the sample size.

Sum of Squares. The sum of squares is the sum of the squared observations.

. K-S Distance. For more information. Normality. Point plot of the column data with error bars plotting the column means. For more information. Scatter plot with error bars of the column means. Kurtosis. The Descriptive Statistics point plot graphs all values in each column as a point on the graph. They include a: Bar chart of the column means. For more information. A normal distribution has skewness equal to zero. Box plot of the percentiles and median of column data. see “Bar Charts of the Column Means” on page 540. For more information. The Descriptive Statistics test box plot graphs the percentiles and the median of column data. Descriptive Statistics Result Graphs You can generate up to five graphs using the results from a descriptive statistics graph. Point plot of the column data. see “Box Plot” on page 544.28 Chapter 3 Confidence Interval for the Mean. Skewness is a measure of how symmetrically the observed values are distributed about the mean. compared to a normal distribution. The Kolmogorov-Smirnov distance is the maximum cumulative distance between the histogram of your data and the gaussian distribution curve of your data. A normal distribution has Kurtosis equal to zero. see “Point Plot” on page 542. Normality tests the observations for normality using the KolmogorovSmirnov test. The Descriptive Statistics bar chart plots the group means as vertical bars with error bars indicating the standard deviation. The confidence interval for the mean is the range in which the true population mean will fall for a percentage of all possible samples drawn from the population. For more information. The Descriptive Statistics scatter plot graphs the column means as single points with error bars indicating the standard deviation. Skewness. see “Scatter Plot” on page 541. The Descriptive Statistics point and column means plot graphs all values in each column as a point on the graph with error bars indicating the column means and standard deviations of each column. 
Kurtosis is a measure of how peaked or flat the distribution of observed values is. see “Point Plot and Column Means” on page 543.
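As a rough illustration of several of the summary statistics defined under Descriptive Statistics Results, the following Python sketch computes them for one column of values. It is not SigmaPlot's implementation. The function name describe, the use of None to mark a missing observation, the choice of the sample standard deviation (n - 1 denominator), and the use of the lower of the two middle values as the median are all assumptions made for this example.

```python
import math
import statistics


def describe(column):
    """Summarize one worksheet column; None marks a missing observation (assumption)."""
    values = [v for v in column if v is not None]
    n = len(values)
    total = sum(values)
    # Sample standard deviation (n - 1 denominator); an assumption for this sketch.
    sd = statistics.stdev(values)
    return {
        "size": n,                                   # non-missing observations
        "missing": len(column) - n,                  # missing observations
        "mean": total / n,                           # sum divided by sample size
        "std_dev": sd,
        "std_err": sd / math.sqrt(n),                # standard error of the mean
        "range": max(values) - min(values),
        "max": max(values),
        "min": min(values),
        "median": statistics.median_low(values),     # largest value of the smaller half
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
    }


summary = describe([2, 4, 4, None, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name}: {value}")
```

For normally distributed observations the mean and standard deviation entries are the natural summary; otherwise the median (and percentiles, which this sketch omits) are usually more informative.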

Creating a Descriptive Statistics Result Graph

To generate a graph of Descriptive Statistics report data:

1. Make sure that the Descriptive Statistics report is in view.

2. From the menus select:
Graph
  Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Descriptive Statistics report.

Figure 3-4 The Create Result Graph Dialog Box

3. Select the type of graph you want to create from the Graph Type list and click OK.

Tip: You can also double-click the desired graph in the list.

The specified graph appears in a graph window or in the report.

Choosing the Group Comparison Test to Use

Use the various group comparison procedures to test sample means or medians for differences.

The Advisor Wizard prompts you to answer questions about your data and goals, then selects the appropriate test. However, if you are already familiar with the comparison requirements, you can go directly to the appropriate test.

The criteria used to select the appropriate procedure include:

The number of groups to compare. Are you comparing two different groups or many different groups?

The distribution of the sample data. Is the source population for your sample distributed along a normal "bell" (Gaussian) curve, or not?

Comparisons of samples from normal populations use parametric tests, which are based on the mean and standard deviation parameters of a normally distributed population. If the populations are not normal, a nonparametric, or distribution-free, test must be used, which ranks the values along a new ordinal scale before performing the test.

Note: SigmaPlot can automatically test for assumptions of normality and equal variance.

SigmaPlot lists the specific tests in the Statistics menu and the toolbar drop-down list. For more information, see “Comparing Two or More Groups” on page 51.

When to Compare Two Groups

If you collected data from two different groups of subjects (for example, two different species of fish or voters from two different parts of the country), use a two group comparison to test for a significant difference beyond what can be attributed to random sampling variation.

When to Use a t-test versus a Mann-Whitney Rank Sum Test

You can perform two kinds of two group comparison tests: an unpaired t-test and the Mann-Whitney Rank Sum Test.

Choose the unpaired t-test if your samples were taken from normally distributed populations and the variances of the two populations are equal. If your samples were taken from populations with non-normal distribution and/or unequal variances, choose the Mann-Whitney Rank Sum Test.

The unpaired t-test is a parametric test which directly compares the sample data. For more information, see "Unpaired t-Test" in Chapter 4.

The Mann-Whitney Rank Sum Test arranges the data into sets of rankings, then performs an unpaired

t-test on the sum of these ranks, rather than directly on the data. For more information, see "Mann-Whitney Rank Sum Test" in Chapter 4.

The advantage of the t-test is that, assuming normality and equal variance, it is slightly more sensitive (that is, it has greater power) than the Mann-Whitney Rank Sum Test. If assumptions of normality and equal variance are violated, the Mann-Whitney Rank Sum Test is more reliable. If your samples are already ordered according to qualitative ranks, such as poor, fair, good, and very good, use the Mann-Whitney Rank Sum Test.

Note: You can tell SigmaPlot to analyze your data and test for normal distribution and equal variance. SigmaPlot tests for normality using the Kolmogorov-Smirnov test, and for equal variance using the Levene Median test. When these assumptions are not met, the alternative parametric or nonparametric test is suggested. Activate and configure assumption tests in the t-test and Mann-Whitney Rank Sum Test Options dialog boxes.

When to Compare Many Groups

If you collected data from three or more different groups of subjects, use one of the ANOVA (analysis of variance) procedures to test if there is a difference among the groups beyond what can be attributed to random sampling variation. There are four procedures available:

The single factor or One Way ANOVA. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.

The Two Way ANOVA. For more information, see "Two Way Analysis of Variance (ANOVA)" in Chapter 4.

The Three Way ANOVA. For more information, see "Three Way Analysis of Variance (ANOVA)" in Chapter 4.

The Kruskal-Wallis Analysis of Variance on Ranks. For more information, see "Kruskal-Wallis Analysis of Variance on Ranks" in Chapter 4.
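The ranking step that the Mann-Whitney Rank Sum Test relies on can be sketched in a few lines of Python. This is only an illustration of pooling two groups, ranking the pooled values (tied values share the mean of their ranks), and summing the ranks per group. It does not reproduce SigmaPlot's test statistic or P value, and the function names average_ranks and rank_sums are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(group_a, group_b):
    """Pool both groups, rank the pooled data, and return each group's rank sum."""
    ranks = average_ranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


sum_a, sum_b = rank_sums([1.1, 2.2, 3.3], [2.2, 4.4, 5.5, 6.6])
print(sum_a, sum_b)
```

Because only the ordering of the observations enters the calculation, the result does not depend on the populations being normally distributed.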

Choose One, Two, or Three Way ANOVA if the samples were taken from normally distributed populations and the variances of the populations are equal. If your samples were taken from populations with non-normal distribution and/or unequal variance, choose the Kruskal-Wallis ANOVA on ranks, which is the nonparametric analog of the one way ANOVA.

Note: SigmaPlot does not have a two factor analysis of variance based on ranks.

The One, Two, and Three Way ANOVAs are parametric tests which directly compare the samples arithmetically. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.

The Kruskal-Wallis ANOVA on ranks arranges the data into sets of rankings, then performs an analysis of variance based on these ranks, rather than directly on the data, so it does not require assuming normality and equal variance.

The advantage of parametric ANOVAs is that, when the normality and equal variance assumptions are met, they are slightly more sensitive (that is, have greater power) than the analysis based on ranks. If assumptions of normality and equal variance are violated, the Kruskal-Wallis ANOVA on ranks is more reliable.

Note that you can also tell SigmaPlot to analyze your data and test for normal distribution and equal variance. SigmaPlot tests for normality using the Kolmogorov-Smirnov test, and for equal variance using the Levene Median test. When the assumptions are not met, the alternative parametric or nonparametric test is suggested. These tests are specified in the Options dialog boxes. To open the dialog box for the current test, click the Current Test Options button, or from the menus select:
Statistics
  Current Test Options

When to Use One, Two, and Three Way ANOVAs

The difference between a One, Two, and Three Way ANOVA lies in the design of the experiment that produced the data.

Use a One Way ANOVA if there are several different experimental groups that received a set of related but different treatments (that is, one factor). This design is essentially the same as an unpaired t-test (a one way ANOVA of two

groups obtains exactly the same P value as an unpaired t-test). For more information, see "Unpaired t-Test" in Chapter 4.

An example of when to use a One Way ANOVA would be when comparing biology teachers from three different states for their knowledge of evolution. The factor varied is state.

Use a Two Way ANOVA if there were two experimental factors that are varied for each experimental group. An example of when to use Two Way ANOVA would be when comparing teachers from the three states and with different education levels for their knowledge of evolution; the two different factors are state and years of education.

The two factor design can test three hypotheses about the state and education levels:

There is no difference in opinion of the teachers among states.

There is no difference in knowledge among education levels.

There is no interaction between state and education in terms of knowledge; any differences between differing levels of education are the same in all states.

For more information, see "Two Way Analysis of Variance (ANOVA)" in Chapter 4.

Use a Three Way ANOVA if there are three experimental factors which are varied for each experimental group. An example of when to use a Three Way ANOVA would be when comparing male and female teachers from three different states, with different levels of education for their knowledge of evolution; the three different factors are gender, state, and years of education.

The three factor design can test that:

There is no difference in opinion of the teachers among gender.

There is no difference in opinion of the teachers among states.

There is no difference in knowledge among education levels.

There is no interaction between gender, state, and education in terms of knowledge; any differences between differing levels of education are the same for all genders in all states.

For more information, see "Three Way Analysis of Variance (ANOVA)" in Chapter 4.

How to Determine Which Groups are Different

Analysis of variance techniques (both parametric and nonparametric) test the hypothesis of no differences between the groups, but do not indicate what the

differences are. You can use the multiple comparison procedures (post-hoc tests) provided by SigmaPlot to isolate these differences.

The specific multiple comparisons procedures to use for each ANOVA are selected in the Multiple Comparison Options dialog box. To always test for differences among the groups, select Always Perform on the Post Hoc Tests tab in the ANOVA options dialog boxes. You can also specify to use multiple comparisons to test for a difference only when the ANOVA P value is significant by selecting the Only When ANOVA P Value is Significant option, then select the desired P value. For more information, see "Setting One Way ANOVA Options" in Chapter 4.

To open:

1. From the menus select:
Statistics
  Current Test Options

Choosing the Repeated Measures Test to Use

Use repeated measures tests to determine the effect a treatment or condition has on the same individuals by observing the individuals before and after the treatments or conditions. By concentrating on the changes produced by the treatment instead of the values observed before and after the treatment, repeated measures tests eliminate the differences due to individual reactions, which gives a more sensitive (or more powerful) test for finding an effect.

The Advisor Wizard prompts you to answer questions about your data and goals, then selects the appropriate test. For more information, see “Using the Advisor Wizard” in Chapter 2. However, if you are already familiar with the comparison requirements, you can go directly to the appropriate test.

The criteria used to select the appropriate procedure include:

The number of treatments to compare. Are you comparing the effect before and after a single treatment, or after two or more different treatments?

The distribution of the treatment effects. Are the individual effects distributed along a normal "bell" (Gaussian) curve, or not?

Comparisons of treatment effects with normal distributions use parametric tests, which are based on the mean and standard deviation parameters of a normally distributed population. If the effect

distributions are not normal, a nonparametric, or distribution-free, test must be used, which ranks the values along a new ordinal scale before performing the test.

Note: SigmaPlot can automatically test for assumptions of normality and variance.

SigmaPlot lists the specific tests in the Statistics menu and the toolbar drop-down list. For more information, see “Comparing Repeated Measurements of the Same Individuals” on page 175.

When to Compare Effects on Individuals Before and After a Single Treatment

If data was collected from the same group of individuals (for example, patients before and after a surgical treatment, or rats before and after training), use Before and After comparison to test for a significant difference beyond what can be attributed to random individual variation.

When to use a Paired t-test versus a Wilcoxon Signed Rank Test

You can use two different tests to compare observations before and after an intervention in the same individuals: the Paired t-test and the Wilcoxon Signed Rank Test.

Choose the Paired t-test if your samples were taken from a population in which the changes to each subject are normally distributed. If your sample effects are not normally distributed, choose the Wilcoxon Signed Rank Test.

The Paired t-test is a parametric test which directly compares the sample data. For more information, see “Paired t-Test” on page 177.

The Wilcoxon Signed Rank Test arranges the data into sets of rankings, then performs a Paired t-test on the sum of these ranks, rather than directly on the data. For more information, see “Wilcoxon Signed Rank Test” on page 190.

The advantage of the paired t-test is that, assuming normality and equal variance, it is slightly more sensitive (i.e., has greater power) than the Wilcoxon Signed Rank Test. If the assumption of normality is violated, the Wilcoxon Signed Rank Test is more reliable. If your samples are already ordered according to qualitative ranks, such as poor, fair, good, and very good, use the Wilcoxon Signed Rank Test.

Note: You can tell SigmaPlot to analyze your data and test for normality. When these assumptions are not met, the alternative parametric or nonparametric test is

suggested. SigmaPlot tests for normality using the Kolmogorov-Smirnov test. Assumption tests are activated and configured in the Paired t-test and Wilcoxon Options dialogs.

When to Compare Effects on Individuals After Multiple Treatments

If you collected data on the same individuals undergoing three or more different treatments or conditions, use one of the Repeated Measures ANOVA (analysis of variance) procedures to test if there is a difference among the effects of the treatments beyond what can be attributed to random individual variation. There are three procedures available: the single factor or One Way Repeated Measures ANOVA (analysis of variance), the Two Way Repeated Measures ANOVA, and the Friedman Repeated Measures ANOVA on Ranks.

Choose One or Two Way ANOVA if the treatment effects are normally distributed with equal variances. If the treatment effects are not normally distributed and/or have unequal variances, choose the Friedman Repeated Measures ANOVA on Ranks, which is the nonparametric analog of the One Way ANOVA.

Note: SigmaPlot does not have a two factor analysis of variance based on ranks.

The one and two way ANOVAs are parametric tests which directly compare the two samples arithmetically. For more information, see “One Way Repeated Measures Analysis of Variance (ANOVA)” on page 200. For more information, see “Two Way Repeated Measures Analysis of Variance (ANOVA)” on page 218.

The Friedman Repeated Measures ANOVA on Ranks arranges the data into sets of rankings, then performs an analysis of variance based on these ranks, rather than directly on the data, so it does not require assuming normality and equal variances. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239.

The advantage of parametric Repeated Measures ANOVAs is that, when the normality and equal variance assumptions are met, they are slightly more sensitive (i.e., have greater power) than the analysis based on ranks. If assumptions of normality and equal variance are violated, the Repeated Measures Friedman ANOVA on ranks is more reliable.

Note that you can tell SigmaPlot to analyze your data and test for normal distribution and equal variance. When the assumptions are not met, the

alternative parametric or nonparametric test is suggested. SigmaPlot tests for normality using the Kolmogorov-Smirnov test, and for equal variance using the Levene Median test. These tests are specified in the repeated measures one and two way and Friedman options dialog boxes.

When to Use One and Two Way RM ANOVA

The difference between a one factor and two factor repeated measures ANOVA lies in the design of the experiment that produced the data.

Use a One Way RM ANOVA if the individuals received a set of related but different treatments (that is, one factor). This design is essentially the same as a paired t-test (a one way repeated measures ANOVA of two groups obtains exactly the same P value as a paired t-test). An example of when to use One Way Repeated Measures ANOVA would be when comparing the reading skills of the same students after grade school, high school, and college. The repeated factor is education.

Use a Two Way RM ANOVA if there were two experimental factors that are varied for the individuals. One or both of the factors can be repeated on the individuals. An example of when to use Two Way Repeated Measures ANOVA would be when comparing reading skills at different education levels, but the students attended different schools. This example has repeated measures on education level only, with school as the unrepeated second factor. If you changed the schools so that all students attended all schools as well, then the school factor is also repeated.

Note: SigmaPlot automatically determines if one or both factors have repeated observations in a two way repeated measures ANOVA.

The two factor design can test three hypotheses about the education levels and schools: (1) there is no difference in reading skill at different education levels, (2) there is no difference in reading skill at different schools or after changing schools, and (3) there is no interaction between education level and school in terms of reading skill; any effect of levels of education is the same in all schools.

For more information, see “Two Way Repeated Measures Analysis of Variance (ANOVA)” on page 218.
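The statement that a one way repeated measures ANOVA of two groups obtains exactly the same P value as a paired t-test follows from the fact that, with two treatments, the ANOVA F statistic equals the square of the paired t statistic. The Python sketch below checks this numerically on made-up data. The function names and the sample values are assumptions for the illustration, and the code uses the textbook sums-of-squares decomposition rather than anything specific to SigmaPlot.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: mean change divided by the standard error of the changes."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def one_way_rm_f(treatments):
    """F statistic for a one way repeated measures ANOVA.

    treatments: one list per treatment, all the same length, subjects in the same order.
    """
    k, n = len(treatments), len(treatments[0])
    grand = sum(map(sum, treatments)) / (k * n)
    ss_total = sum((x - grand) ** 2 for t in treatments for x in t)
    ss_treat = n * sum((sum(t) / n - grand) ** 2 for t in treatments)
    ss_subj = k * sum((sum(t[i] for t in treatments) / k - grand) ** 2 for i in range(n))
    ss_error = ss_total - ss_treat - ss_subj      # what is left after removing subject effects
    return (ss_treat / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))


before = [10, 12, 9, 11, 13]   # hypothetical scores before treatment
after = [12, 14, 10, 14, 15]   # the same five subjects after treatment

t_stat = paired_t(before, after)
f_stat = one_way_rm_f([before, after])
print(t_stat ** 2, f_stat)
```

Removing the between-subject sum of squares from the error term is what makes the repeated measures design more sensitive than a comparison of independent groups.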

How to Determine Which Treatments Have an Effect

Repeated measures analysis of variance techniques (both parametric and nonparametric) test the hypothesis of no effect among treatments, but do not indicate which treatments have an effect. You can use the multiple comparison procedures provided by SigmaPlot to isolate the differences in effect.

Select the specific multiple comparisons procedures to use for each ANOVA under Multiple Comparisons on the Post Hoc Tests tab in the Options for ANOVA dialog box. To always test for differences among the groups, select Always Perform on the Post Hoc Tests tab in the ANOVA options dialog boxes. You can also specify to use multiple comparisons to test for a difference only when the ANOVA P value is significant by selecting Only When ANOVA P Value is Significant, then select the desired P value.

To open:

1. Select the appropriate test from the Standard toolbar.

2. From the menus select:
Statistics
  Current Test Options

Choosing the Rate and Proportion Comparison to Use

Use rate and proportion comparisons to determine if there is a significant difference in the distribution of a group among different categories or classes beyond what can be attributed to random sampling variation.

Frequency, rate, and proportion tests compare percentages and occurrences of observations, such as the proportion of males and females found in different countries. The data can be random observations of a population, or a group before and after a treatment or change in condition.

You can compare distribution in categories using a z-test to Compare Proportions, Chi-Square analysis of contingency tables, Fisher Exact Test, and McNemar’s Test.

Use the z-test to determine if proportions of a single group divided into two categories are significantly different. Compare Proportions compares two groups according to the percentage of each group in the two categories. For more information, see “Comparing Proportions Using the z-Test” on page 258.
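To make the idea of comparing two proportions concrete, here is a minimal Python sketch of the textbook pooled two-proportion z statistic with a two-sided P value. It is an assumption that this matches what the Compare Proportions procedure computes; in particular, no continuity (Yates) correction is applied here, and the function name and counts are invented for the example.

```python
import math


def two_proportion_z(hits_1, n_1, hits_2, n_2):
    """Pooled z statistic and two-sided P value for the difference of two proportions."""
    p1, p2 = hits_1 / n_1, hits_2 / n_2
    pooled = (hits_1 + hits_2) / (n_1 + n_2)           # common proportion under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))         # two-sided tail area of the normal curve
    return z, p_value


# Hypothetical counts: 30 of 100 in the first group, 45 of 100 in the second.
z, p_value = two_proportion_z(30, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```

A small P value indicates that the difference between the two observed proportions is larger than random sampling variation alone would usually produce.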

Use Chi-Square (χ²) analysis of contingency tables to compare the numbers of individuals of two or more groups that fall into two or more different categories. For more information, see “Chi-square Analysis of Contingency Tables” on page 265.

Use the Fisher Exact Test if you have two groups falling into two categories (a 2 x 2 contingency table) with a small number of expected observations in any category. For more information, see “The Fisher Exact Test” on page 275.

Note: SigmaPlot automatically analyzes your data for its suitability for Chi-Square or Fisher Exact Test, and suggests the appropriate test.

Use McNemar’s Test to compare the number of individuals that fall into different categories before and after a single treatment or change in condition. For more information, see “McNemar’s Test” on page 281.

Choosing the Prediction or Correlation Method

When you want to predict the value of one variable from one or more other variables, you can use regression methods to estimate the predictive equation, and compute a correlation coefficient to describe how strongly the value of one variable is associated with another.

When to Use Regression to Predict a Variable

Regression methods are used to predict the value of one variable (the dependent variable) from one or more independent variables by estimating the coefficients in a mathematical model. Regression assumes that the value of the dependent variable is always determined by the value of independent variables. Regression is also known as fitting a line or curve to the data.

Regression is a parametric statistical method that assumes that the residuals (differences between the predicted and observed values of the dependent variables) are normally distributed with constant variance.

The type of regression procedure to use depends on the number of independent variables and the shape of the relationship between the dependent and independent variables. You can perform regression using Simple Linear Regression, Multiple Linear Regression, Multiple Logistic Regression, Polynomial Regression, and Nonlinear Regression.
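The phrase "estimating the coefficients in a mathematical model" can be illustrated with the simplest case, a straight line fitted by ordinary least squares to one independent variable. The Python sketch below is a generic textbook calculation, not SigmaPlot's routine. The function name fit_line and the data are hypothetical. It also computes the residuals, the quantities that regression assumes are normally distributed with constant variance.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1.0, 2.0, 3.0, 4.0]   # independent variable (hypothetical data)
y = [2.9, 5.1, 7.0, 9.0]   # dependent variable (hypothetical data)

intercept, slope = fit_line(x, y)
# Residuals: observed minus predicted values of the dependent variable.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print(intercept, slope, residuals)
```

With an intercept in the model, the least squares residuals always sum to zero, so checking their spread and shape, rather than their mean, is what matters for the assumptions.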

Use a Simple Linear Regression procedure if there is a single independent variable, and the dependent variable changes in proportion to changes in the independent variable (that is, linearly). For more information, see "Simple Linear Regression" in Chapter 8.

Use Multiple Linear Regression when there are several independent variables, and the dependent variable changes in proportion to changes in each independent variable (that is, linearly). For more information, see "Multiple Linear Regression" in Chapter 8.

Use Multiple Logistic Regression when you want to predict a qualitative dependent variable, such as the presence or absence of a disease, from observations of one or more independent variables, by fitting a logistic function to the data. For more information, see "Multiple Logistic Regression" in Chapter 8.

Choose Polynomial or Nonlinear Regression for curved data sets. Use Polynomial Regression for curved relationships that include powers of the independent variable in the regression equation. For more information, see "Polynomial Regression" in Chapter 8.

Use Nonlinear Regression to fit any general equation to the observations. For more information, see “Nonlinear Regression” in Chapter 8.

You can determine whether or not a possible independent variable contributes to a multiple linear regression model using Forward and Backward Stepwise Regression or Best Subset Regression. Use these procedures if you are unsure of the contribution of a variable to the value of the dependent variable in a Multiple Linear Regression.

Note: You can use these procedures to find Multiple Linear Regression models.

Use Forward Stepwise Regression to start with zero independent variables, then add variables that contribute to the prediction of the dependent variable, until (ideally) all variables that contribute have been added to the model. For more information, see "Stepwise Linear Regression" in Chapter 8.

Use Backwards Stepwise Regression to begin with all selected independent variables, and delete the variables that least contribute to predicting the dependent variable, until only variables with real predictive value remain in the model. For more information, see "Stepwise Linear Regression" in Chapter 8.

Use Best Subset Regression to evaluate all possible models of the regression equation, and identify those with the best predictive ability (according to a specified criterion). For more information, see "Best Subsets Regression" in Chapter 8.

When to Use Correlation

Compute the correlation coefficient if you want to quantify the relationship between two variables without specifying which variable is the dependent variable and which is the independent variable. Correlation does not predict the value of one variable from another; it only quantifies the strength of association between the value of one variable with another.

You can compute two kinds of correlation coefficients: the Pearson Product Moment Correlation coefficient, and the Spearman Rank Order Correlation coefficient.

Choose Pearson Product Moment Correlation if the residuals are normally distributed and the variances are constant. If the residuals are not normally distributed and/or have non-constant variances, choose Spearman Rank Order Correlation. If your samples are already ordered according to qualitative ranks, such as poor, fair, good, and very good, choose Spearman Rank Order Correlation.

The Pearson Product Moment Correlation is a parametric test which assumes that data were drawn from a normal population. For more information, see "Pearson Product Moment Correlation" in Chapter 8.

The Spearman Rank Order Correlation is a nonparametric test that constructs a measure of association based on ranks rather than on arithmetic values. For more information, see "Spearman Rank Order Correlation" in Chapter 8.

The advantage of the Pearson Product Moment Correlation is that, assuming normality and constant variance, it is slightly more sensitive (i.e., has greater power) than the Spearman Rank Order Correlation.

Choosing the Survival Analysis to Use

Use survival analysis to generate the probability of the time to an event and the associated statistics such as the median survival time. For more information, see “Survival Analysis” on page 443.

Use Single Group to determine the survival time statistics and graph for a single data set (group). This may also be used to generate a single survival curve graph and statistics for all data sets combined in a multi-group data set provided that the data is in Indexed format. To do this, select the survival time and status columns and ignore the group column.

Use LogRank to determine the survival time statistics and graph for multi-group data sets. The LogRank statistic and one of two multiple comparison procedures will be used to determine which groups are significantly different. The LogRank statistic assumes that all survival times are equally accurate. For more information, see “LogRank Survival Analysis” on page 455.

Use Gehan-Breslow for exactly the same situation as the LogRank case except that the later survival times are assumed to be less accurate and are given less weight. Many censored values with large survival times provide an example of this situation. For more information, see “Gehan-Breslow Survival Analysis” on page 470.

Testing Normality

A normal population follows a standard, "bell" shaped Gaussian distribution. Parametric tests assume normality of the underlying population or residuals of the dependent variable, and can become unreliable if this assumption is violated. SigmaPlot uses the Kolmogorov-Smirnov test (with Lilliefors’ correction) to test data for normality of the estimated underlying population.

When to Test for Normality

Normality is assumed for all parametric tests and regression procedures. SigmaPlot can automatically perform a normality test when running a statistical procedure that makes assumptions about the population parameters. This assumption testing is enabled in the Options dialog for each test. If the data fails the assumptions required for a particular test, SigmaPlot will suggest the appropriate test that can be used instead.

If you want to perform a parametric test and your data fails the normality test, you can transform your data using Transforms menu commands so that it meets the normality requirements. To make sure transformed data now follows a normal distribution pattern, you can run a normality test on the data before performing the parametric procedure again.

Performing a Normality Test

To run a normality test:

1. Enter, transform, or import the data to be tested for normality into data worksheet columns.
2. If desired, set the P value used to pass or fail the data. For more information, see “Setting the P Value for the Normality Test” on page 43.
3. From the menus select:
   Statistics > Normality
   The Pick Columns for Normality dialog box appears.
4. Select the worksheet columns with the data you want to test.
5. Click Finish.
6. View and interpret the Normality test report, and generate the report graphs.

Setting the P Value for the Normality Test

The Kolmogorov-Smirnov test uses a P value to determine whether the data passes or fails. Set this P value on the Report tab of the Options dialog box.

To set the P value for the Normality test:

1. From the menus select:
   Tools > Options
   The Options dialog box appears.

2. Click the Report tab.

Figure 3-5 The Reports tab of the Options dialog box

3. To change the P value for the normality test, enter a value in the P Value for Significance box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed. If the P computed by the test is greater than the P set here, the test passes.
4. To require a stricter adherence to normality, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude the data is not normal.
5. To relax the requirement of normality, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal.

6. Click OK when finished.

Arranging Normality Test Data

Normality test data must be in raw data format, with the individual observations for each group, treatment or level in separate columns. You can test up to 64 columns of data for normality.

Figure 3-6 Valid Data Format for Normality Testing

Running a Normality Test

To run a Normality test, you need to select the data to test. Use the Pick Columns dialog to select the worksheet columns with the data you want to test.

To run a Normality test:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. From the menus select:
   Statistics > Normality

   The Pick Columns for Normality dialog box appears. If you selected columns before you chose the test, the selected columns automatically appear in the Selected Columns list.
3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of each selected column appears in each row. You can select up to 64 columns of data for the Normality test.
4. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish to run the Normality test on the data in the selected columns. After the computations are completed, the report appears. To edit the report, use the Format menu commands.

Interpreting Normality Test Results

The results of a Normality test display the K-S distances and P values computed for each column, and whether or not each column selected passed or failed the test.

Note: The number of decimal places displayed is also controlled in the Reports tab of the Options dialog box.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this explanatory text in the Reports tab of the Options dialog box.

K-S Distance

The Kolmogorov-Smirnov distance is the maximum cumulative distance between the histogram of your data and the Gaussian distribution curve of your data.
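The K-S distance can be illustrated with a minimal Python sketch. It assumes the usual textbook definition: the largest gap between the empirical cumulative distribution of a column and a normal cumulative distribution whose mean and standard deviation are estimated from that same column. The names `normal_cdf` and `ks_distance` are hypothetical, and the sketch does not reproduce SigmaPlot's own computation or the Lilliefors-corrected P value.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance(values):
    """Largest gap between the empirical CDF of `values` and a normal CDF
    fitted with the sample mean and sample standard deviation (assumption:
    this is the textbook K-S distance, not SigmaPlot's exact routine)."""
    data = sorted(values)
    n = len(data)
    mu, sigma = mean(data), stdev(data)
    distance = 0.0
    for i, x in enumerate(data):
        fitted = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from i/n up to (i + 1)/n at x,
        # so the gap is checked on both sides of the step.
        distance = max(distance, abs((i + 1) / n - fitted), abs(fitted - i / n))
    return distance


# Hypothetical worksheet column used only for illustration.
column = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.4]
print(round(ks_distance(column), 3))
```

A column that departs strongly from the bell shape (for example, one extreme value among many identical ones) gives a larger distance than a roughly symmetric column such as the one above.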

P Values

The P values are the results of testing the observations for normality using the Kolmogorov-Smirnov test. If the P computed by the test is greater than the P set in the appropriate Report Options dialog, your data can be considered normal.

Normality Report Graphs

You can generate two graphs using the results from a Normality report. They include a:

Histogram of the residuals. The Normality histogram plots the raw residuals in a specified range, using a defined interval set. For more information, see “Histogram of Residuals” on page 547.

Normal probability plot of the residuals. The Normality probability plot graphs the frequency of the raw residuals. For more information, see “Normal Probability Plot” on page 549.

Creating a Normality Report Graph

To generate a graph of Normality report data:

1. From the menus select:
   Graph > Create Result Graph
   The Create Graph dialog box appears displaying the types of graphs available for the Normality report.
2. Select the type of graph you want to create from the Graph Type list, then click OK. The specified graph appears in a graph window or in the report. For more information, see “Generating Report Graphs” on page 539.

Determining Experimental Power and Sample Size

The power, or sensitivity, of a statistical hypothesis test depends on the alpha ( α ) level, or risk of a false positive conclusion, the size of the effect or difference you wish to detect, the underlying population variability, and the sample size.

The sample size for an intended experiment is determined by the power, alpha ( α ), the size of the difference, and the population variability.

You can determine power or sample size for:

Paired and Unpaired t-tests.
z-test comparison of proportions.
One Way ANOVA.
Chi-Square analysis of contingency tables.
Correlation Coefficients.

For more information, see “Computing Power and Sample Size” on page 507.

When to Compute Power and Sample Size

Use power and sample size computations to determine the parameters for an intended experiment, before the experiment is carried out. Use these procedures to help improve the ability of your experiments to test the desired hypotheses.

How to Determine the Power of an Intended Test

1. From the menus select:
   Statistics > Power
2. Then choose the test.
3. When the Power dialog box appears, specify the remaining parameters of the data. For more information, see “Computing Power and Sample Size” on page 507.

How To Estimate the Sample Size Necessary to Achieve a Desired Power

1. From the menus select:
   Statistics > Sample Size

2. Then choose the test.
3. When the Sample Size dialog box appears, specify the power and the remaining parameters of the data. For more information, see “Computing Power and Sample Size” on page 507.
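The relationship between power, alpha, the size of the difference, the population variability, and the sample size can be sketched for an unpaired comparison of two equal-sized groups. The Python sketch below uses a normal approximation, which is an assumption made here for brevity. SigmaPlot's Power and Sample Size dialogs may use a different (for example, t-based) calculation, so the numbers should be read as rough guides only. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means
    (normal approximation; assumes equal group sizes and a common SD)."""
    z = NormalDist()
    se_diff = sd * math.sqrt(2.0 / n_per_group)
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(abs(difference) / se_diff - z_crit)


def approx_sample_size(power, difference, sd, alpha=0.05):
    """Approximate number of observations per group needed to reach the
    desired power, under the same normal approximation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return math.ceil(2.0 * (z_sum * sd / abs(difference)) ** 2)


# Hypothetical planning values: detect a difference of 1.0 when SD is 1.0.
n = approx_sample_size(power=0.80, difference=1.0, sd=1.0)
print(n, round(approx_power(n, 1.0, 1.0), 3))
```

The sketch shows the trade-offs described above: a smaller alpha, a smaller difference to detect, or a larger standard deviation all raise the required sample size for the same power.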


Chapter 4

Comparing Two or More Groups

Use group comparison tests to compare random samples from two or more different groups for differences in the mean or median values that cannot be attributed to random sampling variation. If you are comparing the effects of different treatments on the same individuals, use repeated measures procedures. For more information, see “Choosing the Procedure to Use” on page 22.

About Group Comparison Tests

Group comparisons test two or more different groups for a significant difference in the mean or median values beyond what can be attributed to random sampling variation. For more information, see “Choosing the Group Comparison Test to Use” on page 29.

Parametric and Nonparametric Tests

Parametric tests assume samples were drawn from normally distributed populations with the same variances (or standard deviations). Parametric tests are based on estimates of the population means and standard deviations, the parameters of a normal distribution.

Nonparametric tests do not assume that the samples were drawn from a normal population. Instead, they perform a comparison on ranks of the observations. Rank Sum Tests automatically rank numeric data, then compare the ranks rather than the original values.
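The ranking step used by nonparametric tests can be sketched in a few lines of Python. The sketch pools two groups, ranks the pooled values from smallest to largest, and sums the ranks per group. Giving tied values the average of the ranks they occupy is a common convention and is an assumption here, since the text above does not say how ties are handled. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share
    the average of the ranks they would occupy (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sums(group_a, group_b):
    """Rank the pooled observations, then sum the ranks of each group."""
    ranks = average_ranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


# Hypothetical groups: every value in the first group is below the second.
print(rank_sums([1, 2, 3], [4, 5, 6]))
```

When the low ranks collect in one group and the high ranks in the other, as in this example, the two rank sums are far apart. When the groups overlap, the sums are close.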

Comparing Two Groups

You can compare two groups using:

An Unpaired t-test (a parametric test). For more information, see “Unpaired t-Test” on page 57.

A Mann-Whitney Rank Sum Test (a nonparametric test). For more information, see “Mann-Whitney Rank Sum Test” on page 70.

Comparing Many Groups

You can compare three or more groups using the:

One Way ANOVA (analysis of variance). A parametric test that compares the effect of a single factor on the mean of two or more groups. For more information, see “One Way Analysis of Variance (ANOVA)” on page 80.

Two Way ANOVA. A parametric test that compares the effect of two different factors on the means of two or more groups. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98.

Three Way ANOVA. A parametric test that compares the effect of three different factors on the means of two or more groups. For more information, see “Three Way Analysis of Variance (ANOVA)” on page 123.

Kruskal-Wallis Analysis of Variance on Ranks. This is the nonparametric analog of One Way ANOVA. For more information, see “Kruskal-Wallis Analysis of Variance on Ranks” on page 147.

If you are using one of these procedures to compare multiple groups, and you find a statistically significant difference, you can use several multiple comparison procedures (also known as post-hoc tests) to determine exactly which groups are different and the size of the difference. These procedures are described for each test.

Data Format for Group Comparison Tests

You can arrange data in the worksheet as:

Columns for each group (raw data).
Data indexed to other column(s).

For t-tests and One Way ANOVAs, you can also use:

The sample size, mean, and standard deviation for each group.
The sample size, mean, and standard error of the mean (SEM) for each group.

Figure 4-1 Valid Data Formats for an Unpaired t-test

Descriptive Statistics

If your data is in the form of statistical values (sample size, mean, standard deviation, or standard error of the mean), the sample sizes (N) must be in one worksheet column, the means in another column, and the standard deviations (or standard errors of the mean) in a third column, with the data for each group in the same row. When comparing two groups, there should be exactly two rows of data.

Arranging Data for t-tests and ANOVAs

There are several formats of data that can be analyzed by t-tests, analysis of variances (ANOVAs), repeated measures ANOVAs, and their nonparametric analogs, including:

Raw data, which places the data for each group in separate columns. For more information, see “Raw Data” on page 54.

Indexed data, which places the group names in one column, and the corresponding data for each group in another column; this is the format used by SigmaPlot. For more information, see “Indexed Data” on page 55.

Statistical summary data, which can be used by unpaired t-tests and One Way ANOVAs. For more information, see “Statistical Summary Data” on page 56.

Set data format in the Pick Columns dialog box that appears after choosing the Statistics menu Run Current Test... command or clicking the toolbar Run icon.

Messy and unbalanced data. SigmaPlot automatically handles missing data points (indicated with an "--") for all situations.

Note: SigmaPlot tests accept messy and unbalanced data and do not require equal sample sizes in the groups being compared. There are no problems associated with missing data or uneven columns. However, missing values must be indicated by double dashes ("--"), not empty cells. If a two-factor ANOVA is missing entire cells, the appropriate steps are suggested, and the desired procedure is performed.

Raw Data

The raw data format is the most common format, where your data have not yet been analyzed or transformed. It places the data for each group to be compared or analyzed in separate columns. Use column titles to identify the groups, as the titles will also be used in the analysis report. You can use raw data for all tests except Two and Three Way ANOVAs.

t-tests and rank tests. The groups to be compared are always placed in two columns. Paired t-tests and signed rank tests (both repeated measures tests) assume that the data for each subject is in the same row.

One way ANOVA and one way ANOVA on ranks. Data for each group is placed in separate columns, with as many columns as there are groups. One way repeated measures ANOVA and one way repeated measures ANOVA on ranks assume that the data for each subject is in the same row. For more information on arranging data for one way ANOVAs, see “Arranging One Way ANOVA Data” on page 82.

Raw data for two and three way ANOVAs. The Two way ANOVA and Two Way repeated measures ANOVA, and Three Way ANOVA cannot analyze raw data, and require indexed data. For a description of indexed data, see . For more information on arranging data for Two Way ANOVAs, see “Arranging Two Way ANOVA Data” on page 100. For more information on using the Index command, see .
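The difference between the raw layout (one column per group) and the indexed layout (a factor column plus a data column) can be made concrete with a small Python sketch. This only illustrates the two layouts and is not SigmaPlot's Index command. The function name `raw_to_indexed` is hypothetical, and the use of `None` to stand in for a missing "--" cell is an assumption for the example.

```python
def raw_to_indexed(columns):
    """Convert raw data (group title -> list of observations) into two
    parallel lists: a factor column of group names and a data column.
    Missing cells, represented here by None, are skipped."""
    factor, data = [], []
    for group, values in columns.items():
        for value in values:
            if value is None:
                continue
            factor.append(group)
            data.append(value)
    return factor, data


# Hypothetical worksheet with uneven columns and one missing value.
raw = {"Control": [1.0, 2.0], "Drug": [3.0, None, 4.0]}
factor_column, data_column = raw_to_indexed(raw)
for name, value in zip(factor_column, data_column):
    print(name, value)
```

Each row of the indexed result pairs one observation with the name of its group, which is why row order does not matter in the indexed format.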

Indexed Data

Indexed data consists of a factor column, which contains the names of the groups or levels, and a data column containing the data points in corresponding rows. Two way ANOVAs require two factor columns and one data column. Three Way ANOVAs require three factor columns and one data column, and Repeated measures ANOVAs require an additional subject column to identify the subject of the measurement.

Figure 4-2 Data Format for a Two Way ANOVA with Two Factor Indexed Data

Note: Data for a Two Way ANOVA is always assumed to be indexed.

The data does not have to be organized in any particular order. The order of the rows containing the index and data does not matter, i.e., they do not have to be grouped or sorted by factor level or subject.

Note: If you are analyzing entire columns of data, the location in the worksheet of the factor, subject, and data columns does not matter.

Independent t-test and Mann-Whitney rank sum test. The group index is in a factor column, and the corresponding data points to be compared are in a second column. For more information on arranging data for the t-test, see Arranging t-test Data. For more information on arranging data for the Rank Sum Test, see “Arranging Rank Sum Data” on page 72.

Paired t-test and Wilcoxon signed rank test. These tests require an additional subject column, which identifies the data points for each subject. For more information on arranging data for the Paired t-test, see Arranging Paired t-test Data. For more information on arranging data for the Signed Rank Sum Test, see Arranging Signed Rank Data.

One way ANOVA and Kruskal-Wallis ANOVA on ranks. Indexed data for one way ANOVA contains only two columns. The factor column contains the group index, and the data column contains the corresponding data points. For more information on arranging data for the One Way ANOVA, see “Arranging One Way ANOVA Data” on page 82. For more information on arranging data for the ANOVA on Ranks, see “Arranging ANOVA on Ranks Data” on page 149.

Two way ANOVA. Two factor columns are required for Two Way ANOVAs, one for each level of the observation, as well as a data column. Each data point should be represented by different combinations of the factors. For example, the factors in a drug treatment test are Gender and Drug, and the levels are Male/Female and Drug A/Drug B. For more information on arranging data for the Two Way ANOVA, see “Arranging Two Way ANOVA Data” on page 100.

Note: If you do not want to bother entering indexed data for a Two Way ANOVA, you can enter the data for each cell of the Two Way ANOVA table into separate columns, then use the Edit menu Index command to create the indexed columns.

Three way ANOVA. Three factors are required for Three Way ANOVAs, one for each level of observation. Each data point should be represented by different combinations of the factors. For more information on arranging data for the Three Way ANOVA, see “Arranging Three Way ANOVA Data” on page 125.

Repeated measures ANOVA. Repeated measures comparisons require an additional subject index column, which indicates the subject for each level and data point. A Two Way Repeated Measures ANOVA requires both a subject column and two factor columns.

Statistical Summary Data

Unpaired t-tests and one way ANOVAs can be performed on summary statistics of the data. These statistics can be in the form of:

The sample size, mean, and standard deviation for each group, or
The sample size, mean, and standard error of the mean (SEM) for each group

The sample sizes (N) must be in one worksheet column, the means in another column, and the standard deviations (or standard errors of the mean) in a third column, with the data for each group in the same row. If you plan to compare only a portion of the data by selecting a block, put the sample sizes in the left column, the means in the middle column, and the standard deviations or SEMs in the right column.

Unpaired t-Test

Use an Unpaired t-test when:

You want to see if the means of two different samples are significantly different.
Your samples are drawn from normally distributed populations with the same variances.

When there are more than two groups to compare, do a One Way Analysis of Variance. For more information, see “One Way Analysis of Variance (ANOVA)” on page 80.

Note: Depending on your t-test options settings, if you attempt to perform a t-test on non-normal populations or populations with unequal variances, SigmaPlot will inform you that the data is unsuitable for a t-test, and suggest the Mann-Whitney Rank Sum Test instead. For more information, see “Setting t-Test Options” on page 59.

About the Unpaired t-test

The Unpaired t-test is a parametric test based on estimates of the mean and standard deviation parameters of the normally distributed populations from which the samples were drawn. It tests for a difference between two groups that is greater than what can be attributed to random sampling variation.

The null hypothesis of an unpaired t-test is that the means of the populations that you drew the samples from are the same. If you can confidently reject this hypothesis, you can conclude that the means are different.
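The summary statistics described above (sample size, mean, and standard deviation or SEM for each group) carry enough information to compute an unpaired t statistic. The Python sketch below assumes the classic equal-variance (pooled) form, which matches the stated assumption that both populations have the same variance. It is not taken from SigmaPlot itself, and the function names are hypothetical. Converting the statistic to a P value requires the t distribution and is left out.

```python
import math


def sd_from_sem(sem, n):
    """Recover the standard deviation from the standard error of the mean."""
    return sem * math.sqrt(n)


def unpaired_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, degrees of freedom) for two independent groups, using
    the pooled-variance form (assumes equal population variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


# Hypothetical summary rows: N, mean, standard deviation for each group.
t, df = unpaired_t(10, 5.0, 1.0, 10, 4.0, 1.0)
print(round(t, 3), df)
```

If a worksheet stores SEMs instead of standard deviations, `sd_from_sem` can be applied to each row before calling `unpaired_t`.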

Performing an Unpaired t-Test

To perform an Unpaired t-test:

1. Enter or arrange your data appropriately in the worksheet. For more information, see “Arranging t-Test Data” on page 58.
2. If desired, set the t-test options. For more information, see “Setting t-Test Options” on page 59.
3. From the menus select:
   Statistics > Compare Two Groups > t-test
4. Run the test. For more information, see “Running a t-Test” on page 63.
5. View and interpret the t-test report. For more information, see “Interpreting t-Test Results” on page 65.
6. Generate report graphs. For more information, see “t-Test Report Graphs” on page 68.

Arranging t-Test Data

The format of the data to be tested can be raw, indexed, or summary statistics. For raw and indexed data, the data is placed in two worksheet columns. Statistical summary data is placed in three worksheet columns.

Figure 4-3 Valid Data Formats for an Unpaired t-test

Setting t-Test Options

Use the t-test options to:

Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance.
Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column.
Compute the power or sensitivity of the test.

To set t-test options:

1. Select t-test from the Standard toolbar.
2. From the menus click:
   Statistics > Current Test Options
   The Options for t-test dialog box appears with three tabs:

Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for t-Test: Assumption Checking” on page 60.

Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for t-Test: Results” on page 61.

Post Hoc Tests. Compute the power or sensitivity of the test. For more information, see “Options for t-Test: Post Hoc Tests” on page 62.

Options settings are saved between SigmaPlot sessions.

Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see “Running a t-Test” on page 63.
4. To accept the current settings and close the options dialog box, click OK.

Options for t-Test: Assumption Checking

The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Figure 4-4 The Options for t-test Dialog Box Displaying the Assumption Checking Options

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and equal variance, increase P. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and equal variance, decrease the P value. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for t-Test: Results

Summary Table. Displays the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Intervals. Displays the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Residuals in Column. Displays residuals in the report and saves the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.

Figure 4-5 The Options for t-test Dialog Box Displaying the Summary Table, Confidence Intervals, and Residuals Options

Options for t-Test: Post Hoc Tests

Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.

Use Alpha Value. Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Figure 4-6 The Options for t-test Dialog Box Displaying the Power Option

Running a t-Test

If you want to select your data before you run the test, drag the pointer over your data.

1. From the menus select:
   Statistics > Compare Two Groups > t-test
   The Pick Columns for t-test dialog box appears prompting you to specify a data format.

Figure 4-7 The Pick Columns for t-test Dialog Box Prompting You to Specify a Data Format

2. Select the appropriate data format (Raw or Indexed) from the Data Format drop-down list. For more information, see “Data Format for Group Comparison Tests” on page 52.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 4-8 The Pick Columns for t-test Dialog Box Prompting You to Select Data Columns

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns. For statistical summary data you are prompted to select three columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the t-test on the selected columns. After the computations are completed, the report appears.
7. To edit the report, use the Format menu commands. For information on editing reports, see .

Interpreting t-Test Results

The t-test calculates the t statistic, degrees of freedom, and P value of the specified data. These results are displayed in the t-test report which automatically appears after the t-test is performed. The other results displayed in the report are enabled and disabled in the Options for t-test dialog box.

For descriptions of the derivations for t-test results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the up and down arrow buttons in the formatting toolbar to move one page up and down in the report.

Figure 4-9 The t-test Report

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can enable or disable this explanatory text in the Options dialog box.

Normality Test. Normality test results show whether the data passed or failed the test of the assumption that the samples were drawn from normal populations and the P value calculated by the test. All parametric tests require normally distributed source populations. This result is set in the Options for t-test dialog box.

Equal Variance Test. Equal Variance test results display whether or not the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance and the P value calculated by the test. Equal variance of the source population is assumed for all parametric tests.

Summary Table. SigmaPlot can generate a summary table listing the sizes N for the two samples, number of missing values, means, standard deviations, and the standard error of the means (SEM). This result is displayed unless you disable Summary Table in the Options for t-test dialog box.

N (Size). The number of non-missing observations for that column or group.

Missing. The number of missing values for that column or group.

Mean. The average value for the column. If the observations are normally distributed the mean is the center of the distribution.

Standard Deviation. A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.

Standard Error of the Mean. A measure of how closely the mean computed from the sample approximates the true population mean.

t Statistic. The t-test statistic is the ratio:

$$t = \frac{\text{difference of the sample means}}{\text{standard error of the difference of the sample means}}$$

The standard error of the difference is a measure of the precision with which this difference can be estimated.

You can conclude from "large" absolute values of t that the samples were drawn from different populations. A large t indicates that the difference between the treatment group means is larger than what would be expected from sampling variability alone (i.e., that the differences between the two groups are statistically significant). A small t (near 0) indicates that there is no significant difference between the samples.

Degrees of Freedom. Degrees of freedom represents the sample sizes, which affect the ability of the t-test to detect differences in the means. As degrees of freedom (sample sizes) increase, the ability to detect a difference with a smaller t increases.

P Value. The P value is the probability of being wrong in concluding that there is a true difference in the two groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there is a significant difference when P < 0.05.

Confidence Interval for the Difference of the Means. If the confidence interval does not include zero, you can conclude that there is a significant difference between the means with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that there is a difference. The level of confidence is adjusted in the Options for t-test dialog box; this is typically 100(1-α), or 95%. Larger values of confidence result in wider intervals and smaller values in smaller intervals. For a further explanation of α, see Power below. This result is set in the Options for t-test dialog box.

Power. The power, or sensitivity, of a t-test is the probability that the test will detect a difference between the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. t-test power is affected by the sample size of both groups, the chance of erroneously reporting a difference, α (alpha), the difference of the means, and the standard deviation. This result is set in the Options for t-test dialog box.

Alpha ( α ). Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). The α value is set in the Options for t-test dialog box; a value of α = 0.05 indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).

t-Test Report Graphs

You can generate up to five graphs using the results from a t-test. They include a:

Bar chart of the column means. The t-test bar chart plots the group means as vertical bars with error bars indicating the standard deviation. For more information, see “Bar Charts of the Column Means” on page 540.

Scatter plot with error bars of the column means. The t-test scatter plot graphs the group means as single points with error bars indicating the standard deviation. For more information, see “Scatter Plot” on page 541.

Point plot of the column means. The t-test point plot graphs all values in each column as a point on the graph. For more information, see “Point Plot” on page 542.

Histogram of the residuals. The t-test histogram plots the raw residuals in a specified range, using a defined interval set. For more information, see “Histogram of Residuals” on page 547.

Normal probability plot of the residuals. The t-test probability plot graphs the frequency of the raw residuals. For more information, see “Normal Probability Plot” on page 549.

How to Create a Graph of the t-test Data

1. Select the t-test report.
2. On the menus choose:
   Graph > Create Graph
   The Create Graph dialog box appears displaying the types of graphs available for the t-test results.

Figure 4-10 The Create Graph Dialog Box for the t-test Report

3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.

For more information, see “Generating Report Graphs” on page 539.

Figure 4-11 A Point Plot of the Result Data for a t-test

Mann-Whitney Rank Sum Test

Use the Rank Sum Test when:

You want to see if the medians of two different samples are significantly different.

The samples are not drawn from normally distributed populations with the same variances, or you do not want to assume that they were drawn from normal populations.

If you know your data was drawn from a normally distributed population, use the Unpaired t-test. For more information, see “Unpaired t-Test” on page 57.

When there are more than two groups to compare, run a Kruskal-Wallis ANOVA on Ranks test. For more information, see “Kruskal-Wallis Analysis of Variance on Ranks” on page 147.

Note: Depending on your Rank Sum Test options settings, if you attempt to perform a rank sum test on normal populations with equal variances, SigmaPlot informs you that the data can be analyzed with the more powerful Unpaired t-test instead. For more information, see “Setting Mann-Whitney Rank Sum Test Options” on page 72.

About the Mann-Whitney Rank Sum Test

Use the Mann-Whitney Rank Sum Test to test for a difference between two groups that is greater than what can be attributed to random sampling variation. The null hypothesis is that the two samples were not drawn from populations with different medians.

The Rank Sum Test is a nonparametric procedure, which does not require assuming normality or equal variance. It ranks all the observations from smallest to largest without regard to which group each observation comes from. The ranks for each group are summed and the rank sums compared.

If there is no difference between the two groups, the mean ranks should be approximately the same. If they differ by a large amount, you can assume that the low ranks tend to be in one group and the high ranks are in the other, and conclude that the samples were drawn from different populations (i.e., that there is a statistically significant difference).

Performing a Mann-Whitney Rank Sum Test

To perform a Mann-Whitney Rank Sum Test:

1. Enter or arrange your data appropriately in the worksheet. For more information, see “Arranging Rank Sum Data” on page 72.
2. If desired, set the Rank Sum options. For more information, see “Setting Mann-Whitney Rank Sum Test Options” on page 72.
3. On the menus click:
Statistics
Compare Two Groups
Rank Sum Test
4. Run the test. For more information, see “Running a Rank Sum Test” on page 75.
5. View and interpret the Rank Sum report. For more information, see “Interpreting Rank Sum Test Results” on page 76.

6. Generate report graphs. For more information, see “Rank Sum Test Report Graphs” on page 78.

Arranging Rank Sum Data

The format of the data to be tested can be raw data or indexed data; in either case, the data is found in two worksheet columns.

Figure 4-12 Valid Data Formats for a Mann-Whitney Rank Sum Test

Setting Mann-Whitney Rank Sum Test Options

1. Select Rank Sum Test from the toolbar drop-down list.
2. From the menus select:
Statistics
Current Test Options
The Options for Rank Sum Test dialog box appears with three tabs:

Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for Rank Sum Test: Assumption Checking” on page 73.

Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for Rank Sum Test: Results” on page 74.

Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

Options for Rank Sum Test: Assumption Checking

Figure 4-13

The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for Rank Sum Test: Results

Summary Table. Displays the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Intervals. Displays the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Figure 4-14 The Options for Rank Sum Test Dialog Box Displaying the Summary Table Options

Running a Rank Sum Test

If you want to select your data before you run the test, drag the pointer over your data.

1. On the menus click:
Statistics
Compare Two Groups
Rank Sum Test
The Pick Columns for Rank Sum Test dialog box appears prompting you to specify a data format.

Figure 4-15 The Pick Columns for Rank Sum Test Dialog Box Prompting You to Specify a Data Format

2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Group Comparison Tests” on page 52.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 4-16 The Pick Columns for Rank Sum Test Dialog Box Prompting You to Select Data Columns

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the Rank Sum Test on the selected columns. If you elected to test for normality and equal variance, SigmaPlot performs the test for normality (Kolmogorov-Smirnov) and the test for equal variance (Levene Median). If your data pass both tests, SigmaPlot informs you and suggests continuing your analysis using a parametric t-test. For more information, see “Paired t-Test” on page 177.
7. After the computations are completed, the report appears.

Interpreting Rank Sum Test Results

The Rank Sum Test computes the Mann-Whitney T statistic and the P value for T. These results are displayed in the rank sum report which appears after the rank sum test

is performed. The other results displayed in the report are enabled and disabled in the Options for Rank Sum Test dialog box. For more information, see “Setting Mann-Whitney Rank Sum Test Options” on page 72.

For descriptions of the derivations for Rank Sum Test results, you can reference any appropriate statistics reference.

Figure 4-17 The Rank Sum Test Report

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can enable or disable this explanatory text in the Options dialog box.

Normality Test. Normality test results display whether the data passed or failed the test of the assumption that they were drawn from a normal population and the P value calculated by the test. For nonparametric procedures, this test can have failed, as nonparametric tests do not assume normally distributed source populations. This result is set in the Options for Rank Sum Test dialog box.

Equal Variance Test. Equal Variance test results display whether or not the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance and the P value calculated by the test. Nonparametric tests do not

assume equal variance of the source populations. This result is set in the Options for Rank Sum Test dialog box.

Summary Table. SigmaPlot generates a summary table listing the sample sizes N, number of missing values, medians, and percentiles unless you disable the Display Summary Table option in the Options for Rank Sum Test dialog box.

N (Size). The number of non-missing observations for that column or group.

Missing. The number of missing values for that column or group.

Medians. The "middle" observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.

Percentiles. The two percentile points that define the upper and lower tails of the observed values.

T Statistic. The T statistic is the sum of the ranks in the smaller sample group or from the first selected group, if both groups are the same size. This value is compared to the population of all possible rankings to determine the possibility of this T occurring.

P Value. The P value is the probability of being wrong in concluding that there is a true difference in the two groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on T). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there is a significant difference when P < 0.05.

Rank Sum Test Report Graphs

You can generate up to two graphs using the results from a Rank Sum Test. They include a:

Box plot of the percentiles and median of column data. The Rank Sum Test box plot graphs the percentiles and the median of column data. The ends of the boxes define the 25th and 75th percentiles, with a line at the median and error bars defining the 10th and 90th percentiles. For more information, see “Box Plot” on page 544.

Point plot of the column data. The Rank Sum Test point plot graphs all values in each column as a point on the graph. For more information, see “Point Plot” on page 542.
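The T statistic and its comparison against all possible rankings can be illustrated with a short Python sketch. This is only an illustration of the idea described under T Statistic, not SigmaPlot's implementation: the function names, the sample values, the use of average ranks for ties, and the two-sided definition of "as extreme" are all assumptions made for the example.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from smallest to largest (1-based); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_t(group_a, group_b):
    """T = rank sum of the smaller group, or of the first group when sizes are equal."""
    ranks = average_ranks(list(group_a) + list(group_b))
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    t = sum_b if len(group_b) < len(group_a) else sum_a
    return t, ranks, min(len(group_a), len(group_b))

def exact_two_sided_p(t, ranks, n_small):
    """Share of all possible rank assignments whose T lies at least as far from the expected T."""
    expected = n_small * sum(ranks) / len(ranks)
    observed = abs(t - expected)
    sums = [sum(c) for c in combinations(ranks, n_small)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-12)
    return extreme / len(sums)

# Hypothetical data: every value in group_a is below every value in group_b.
group_a = [1.1, 2.3, 2.9, 3.4]
group_b = [4.0, 4.8, 5.5, 6.1, 7.2]
t, ranks, n_small = rank_sum_t(group_a, group_b)
p = exact_two_sided_p(t, ranks, n_small)
print(t, round(p, 4))  # 10.0 0.0159
```

With four and five observations there are 126 ways to assign ranks to the smaller group, and only two of them are as extreme as the observed one, which gives the small P value. Full enumeration is practical only for small samples.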

How to Create a Rank Sum Test Report Graph

1. Select the Rank Sum Test report.
2. On the menus choose:
Graph
Create Graph
The Create Graph dialog box appears displaying the types of graphs available for the Rank Sum Test results.

Figure 4-18 The Create Graph Dialog Box for the Rank Sum Test Report

3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.

For more information, see “Generating Report Graphs” on page 539.

Figure 4-19 A Box Plot of the Result Data for a Rank Sum Test

One Way Analysis of Variance (ANOVA)

One Way Analysis of Variance is a parametric test that assumes that all the samples are drawn from normally distributed populations with the same standard deviations (variances).

Use a One Way or One Factor ANOVA when:

You want to see if the means of two or more different experimental groups are affected by a single factor.

Your samples are drawn from normally distributed populations with equal variance.

If you know that your data was drawn from non-normal populations, use the Kruskal-Wallis ANOVA on Ranks test. For more information, see “Kruskal-Wallis Analysis of Variance on Ranks” on page 147.

If you want to consider the effects of two factors on your experimental groups, use Two Way ANOVA. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98.

When there are only two groups to compare, you can do a t-test (depending on the type of results you want). Performing an ANOVA for two groups yields exactly the same P value as an unpaired t-test. For more information, see “Unpaired t-Test” on page 57.

Note: Depending on your ANOVA options settings, if you attempt to perform an ANOVA on non-normal populations or populations with unequal variances, SigmaPlot informs you that the data is unsuitable for a parametric test, and suggests the Kruskal-Wallis ANOVA on Ranks. For more information, see “Setting One Way ANOVA Options” on page 82.

About One Way ANOVA

The design for a One Way ANOVA is the same as an unpaired t-test except that there can be more than two experimental groups. The null hypothesis is that there is no difference among the populations from which the samples were drawn.

Performing a One Way ANOVA

To perform a One Way ANOVA:

1. Enter or arrange your data appropriately in the worksheet. For more information, see “Arranging One Way ANOVA Data” on page 82.
2. If desired, set One Way ANOVA options. For more information, see “Setting One Way ANOVA Options” on page 82.
3. On the menus click:
Statistics
Compare Many Groups
One Way ANOVA
4. Run the test. For more information, see “Running a One Way ANOVA” on page 87.
5. Specify the multiple comparisons you want to perform on your test. For more information, see “Multiple Comparison Options for a One Way ANOVA” on page 89.
6. View and interpret the One Way ANOVA report. For more information, see “Interpreting One Way ANOVA Results” on page 90.
7. Generate report graphs. For more information, see “One Way ANOVA Report Graphs” on page 96.

Arranging One Way ANOVA Data

Arrange data as raw data, indexed data, or summary statistics.

Place raw data in as many columns as there are groups, up to 32; each column contains the data for one group.

Place indexed data in two worksheet columns.

Place statistical summary data in three columns.

Figure 4-20 Valid Data Formats for a One Way ANOVA

Setting One Way ANOVA Options

1. Select One Way ANOVA from the toolbar drop-down list.
2. From the menus select:
Statistics
Current Test Options
The Options for One Way ANOVA dialog box appears with three tabs:

Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for One Way ANOVA: Assumption Checking” on page 83.

Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for One Way ANOVA: Results” on page 84.

Post Hoc Test. Compute the power or sensitivity of the test and enable multiple comparisons. For more information, see “Options for One Way ANOVA: Post Hoc Tests” on page 85.

Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

3. To continue the test, click Run Test.
4. To accept the current settings and close the options dialog box, click OK.

Options for One Way ANOVA: Assumption Checking

Figure 4-21 The Options for One Way ANOVA Dialog Box Displaying the Assumption Checking Options

The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for One Way ANOVA: Results

Summary Table. Select to display the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Intervals. Select to display the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Residuals in Column. Select to display residuals in the report and to save the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.
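The pass rule stated under P Values for Normality and Equal Variance (the test passes when the computed P is greater than the P set in the options) can be sketched in a few lines of Python. The function name and the computed P of 0.08 are hypothetical and chosen only for illustration; this is not SigmaPlot code.

```python
def assumption_test_passes(p_computed, p_threshold=0.050):
    """Pass when the P computed by the assumption test is greater than the P set in the options."""
    return p_computed > p_threshold

# Hypothetical P value reported by a normality or equal variance test.
p_computed = 0.08

for threshold in (0.050, 0.100):
    print(threshold, assumption_test_passes(p_computed, threshold))
# 0.05 True
# 0.1 False
```

The same data pass at the suggested setting of 0.050 but fail at 0.100, which is why a larger setting needs less evidence to flag data as non-normal.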

Figure 4-22 The Options for One Way ANOVA Dialog Box Displaying the Summary Table, Confidence Intervals, and Residuals Options

Options for One Way ANOVA: Post Hoc Tests

Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.

Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Figure 4-23 The Options for One Way ANOVA Dialog Box Displaying the Power and Multiple Comparison Options

Multiple Comparisons

One-Way ANOVAs test the hypothesis of no differences between the several treatment groups, but do not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.

The P value used to determine if the ANOVA detects a difference is set on the Report tab of the Options dialog box. If the P value produced by the One Way ANOVA is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

You can choose to always perform multiple comparisons or to only perform multiple comparisons if a One Way ANOVA detects a difference.

Always Perform. Select to perform multiple comparisons whether or not the ANOVA detects a difference.

Only When ANOVA P Value is Significant. Select to perform multiple comparisons only if the ANOVA detects a difference.

Significance Value for Multiple Comparisons. Select either .05 or .01 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments.

A value of .05 indicates that the multiple comparisons will detect a difference if there is less than 5% chance that the multiple comparison is incorrect in detecting a difference.

Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method.

Running a One Way ANOVA

If you want to select your data before you run the test, drag the pointer over your data.

1. From the menus select:
Statistics
Compare Many Groups
One Way ANOVA
The Pick Columns for One Way ANOVA dialog box appears prompting you to specify a data format.

Figure 4-24 The Pick Columns for One Way ANOVA Dialog Box Prompting You to Specify a Data Format

2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Group Comparison Tests” on page 52.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 4-25 The Pick Columns for One Way ANOVA Dialog Box Prompting You to Select Data Columns

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the One Way ANOVA on the selected columns. If you elected to test for normality and equal variance, SigmaPlot performs the test for normality (Kolmogorov-Smirnov) and the test for equal variance (Levene Median). If your data pass both tests, SigmaPlot informs you and suggests continuing your analysis using a parametric t-test.
7. Click Finish to perform the One Way ANOVA. If you elected to test for normality and equal variance, and your data fails either test, SigmaPlot warns you and suggests continuing your analysis using the nonparametric Kruskal-Wallis ANOVA on Ranks. For more information, see “Kruskal-Wallis Analysis of Variance on Ranks” on page 147.
8. After the computations are completed, the report appears.

If you selected to run multiple comparisons only when the P value is significant, and the P value is not significant, the One Way ANOVA report appears after the test is complete.

If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Multiple Comparison Options for a One Way ANOVA” on page 89.

Multiple Comparison Options for a One Way ANOVA

The One Way ANOVA tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for One Way ANOVA dialog box, the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test. The P value produced by the ANOVA is displayed in the upper left corner of the dialog box.

There are seven multiple comparison tests to choose from for the One Way ANOVA:

Holm-Sidak test. For more information, see “Holm-Sidak Test” on page 163.
Tukey Test. For more information, see “Tukey Test” on page 164.
Student-Newman-Keuls Test. For more information, see “Student-Newman-Keuls (SNK) Test” on page 164.
Bonferroni t-test. For more information, see “Bonferroni t-Test” on page 164.
Fisher’s LSD. For more information, see “Fisher’s Least Significance Difference Test” on page 165.
Dunnett’s Test. For more information, see “Dunnett’s Test” on page 165.
Duncan’s Multiple Range Test. For more information, see “Duncan’s Multiple Range” on page 165.

There are two types of multiple comparisons available for the One Way ANOVA. The type of comparison you can make depends on the selected multiple comparison test. For more information, see “Performing a Multiple Comparison” on page 162.

All pairwise comparisons compare all possible pairs of treatments.

Multiple comparisons versus a control compare all experimental treatments to a single control group.

The Tukey and Student-Newman-Keuls tests are recommended for determining the difference among all treatments. If you have only a few treatments, you may want to select the simpler Bonferroni t-test.

The Dunnett’s test is recommended for determining the differences between the experimental treatments and a control group. If you have only a few treatments or observations, you can select the simpler Bonferroni t-test.

Note: In both cases the Bonferroni t-test is most sensitive with a small number of groups. Dunnett’s test is not available if you have fewer than six observations.

Interpreting One Way ANOVA Results

The One Way ANOVA report displays an ANOVA table describing the source of the variation in the groups. This table displays the sum of squares, degrees of freedom, and mean squares of the groups, as well as the F statistic and the corresponding P value. The statistical summary table of the data and other results displayed in the report are enabled and disabled in the Options for One Way ANOVA dialog box. For more information, see “Setting One Way ANOVA Options” on page 82.

You can also generate tables of multiple comparisons. Multiple Comparison results are also specified in the Options for One Way ANOVA dialog box. The test used to perform the multiple comparison is selected in the Multiple Comparison Options dialog box.

For descriptions of the derivations for One Way ANOVA results, you can reference any appropriate statistics reference.
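The two types of multiple comparison differ in which group pairs they examine. The Python sketch below only enumerates the pairs; it does not perform any of the named tests. The group labels and function names are hypothetical.

```python
from itertools import combinations

def all_pairwise(groups):
    """Every possible pair of treatments, each pair listed once."""
    return list(combinations(groups, 2))

def versus_control(groups, control):
    """Each experimental treatment paired with the single control group."""
    return [(group, control) for group in groups if group != control]

# Hypothetical treatment labels.
groups = ["Control", "Drug A", "Drug B", "Drug C"]

print(len(all_pairwise(groups)))               # 6
print(versus_control(groups, "Control"))
# [('Drug A', 'Control'), ('Drug B', 'Control'), ('Drug C', 'Control')]
```

With k groups, all pairwise comparisons produce k(k − 1)/2 pairs, while comparisons versus a control produce only k − 1, so the number of comparisons grows much faster in the pairwise case.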

Figure 4-26

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box.

Normality Test. Normality test results display whether the data passed or failed the test of the assumption that they were drawn from a normal population and the P value calculated by the test. Normally distributed source populations are required for all parametric tests. Set this result in the Options for One Way ANOVA dialog box.

Equal Variance Test. Equal Variance test results display whether the data passed or failed the test of the assumption that the samples were drawn from populations with the

same variance, and the P value calculated by the test. Equal variance of the source populations is assumed for all parametric tests.

Summary Table. If you enabled this option in the Options for One Way ANOVA dialog box, SigmaPlot generates a summary table listing the sample sizes N, number of missing values, mean, standard deviation, differences of the means and standard deviations, and standard error of the means.

N (Size). The number of non-missing observations for that column or group.

Missing. The number of missing values for that column or group.

Mean. The average value for the column. If the observations are normally distributed, the mean is the center of the distribution.

Standard Deviation. A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.

Standard Error of the Mean. A measure of the approximation with which the mean computed from the sample approximates the true population mean.

Confidence Interval for the Difference of the Means. If the confidence interval does not include zero, you can conclude that there is a significant difference between the means with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that there is a difference. The level of confidence is adjusted in the options dialog box; this is typically 100(1 − α), or 95%. Larger values of confidence result in wider intervals and smaller values in smaller intervals. This result is set in the Options for One Way ANOVA dialog box.

Power. The power of the performed test is displayed unless you disable this option in the Options for One Way ANOVA dialog box. The power, or sensitivity, of a One Way ANOVA is the probability that the test will detect a difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.
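The summary table entries can be reproduced for a single column with a short Python sketch. It assumes the usual textbook formulas (sample standard deviation with N − 1 in the denominator, standard error of the mean as the standard deviation divided by the square root of N), which the text above does not spell out, and it represents missing values as None; the function name and data are hypothetical.

```python
import math

def summary_row(column):
    """Summary statistics for one worksheet column; None marks a missing value (an assumption)."""
    present = [x for x in column if x is not None]
    n = len(present)
    mean = sum(present) / n
    # Sample standard deviation (N - 1 denominator), assumed here as the conventional choice.
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in present) / (n - 1))
    # Standard error of the mean: standard deviation scaled by the square root of N.
    sem = std_dev / math.sqrt(n)
    return {"N": n, "Missing": len(column) - n, "Mean": mean, "Std Dev": std_dev, "SEM": sem}

# Hypothetical column with one missing observation.
row = summary_row([4.0, 6.0, None, 8.0, 10.0])
print(row["N"], row["Missing"], row["Mean"])  # 4 1 7.0
print(round(row["Std Dev"], 3), round(row["SEM"], 3))  # 2.582 1.291
```

Because the standard error divides by the square root of N, it shrinks as the sample grows even when the standard deviation stays the same.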

Alpha ( α ). Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference but also increase the risk of seeing a false difference (a Type I error). An α error is also called a Type I error. A Type I error is when you reject the hypothesis of no effect when this hypothesis is true. Set this value in the Options for One Way ANOVA dialog box.

ANOVA Table. The ANOVA table lists the results of the one way ANOVA.

DF (Degrees of Freedom). Degrees of freedom represent the number of groups and the sample size, which affect the sensitivity of the ANOVA.
The degrees of freedom between groups is a measure of the number of groups.
The degrees of freedom within groups (sometimes called the error or residual degrees of freedom) is a measure of the total sample size, adjusted for the number of groups.
The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares). The sum of squares is a measure of variability associated with each element in the ANOVA data table.
The sum of squares between the groups measures the variability of the average differences of the sample groups.
The sum of squares within the groups (also called error or residual sum of squares) measures the underlying variability of all individual samples.
The total sum of squares measures the total variability of the observations about the grand mean (mean of all observations).

MS (Mean Squares). The mean squares provide two estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.
The mean square between groups is:
\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{DF_{\text{between}}} \]
The mean square within groups (also called the residual or error mean square) is:
\[ MS_{\text{within}} = \frac{SS_{\text{within}}}{DF_{\text{within}}} \]
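The DF, SS, and MS entries described above can be reproduced by hand. The following is a minimal sketch in Python using the standard textbook formulas for a one way design; the function name `one_way_anova_table` and the sample data are illustrative assumptions, not part of SigmaPlot. The last entry, F, is the ratio of the two mean squares.

```python
from statistics import mean

def one_way_anova_table(groups):
    """Return DF, SS, MS and F for a list of groups (each a list of numbers).

    Hypothetical helper for illustration; it uses the textbook one way
    ANOVA formulas and assumes there are no missing values.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)          # mean of all observations
    k = len(groups)                        # number of groups
    n_total = len(all_values)              # total sample size

    # Variability of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations about their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Variability of all observations about the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df = {"between": k - 1, "within": n_total - k, "total": n_total - 1}
    ss = {"between": ss_between, "within": ss_within, "total": ss_total}
    ms = {"between": ss_between / df["between"],
          "within": ss_within / df["within"]}
    return {"DF": df, "SS": ss, "MS": ms, "F": ms["between"] / ms["within"]}

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
table = one_way_anova_table(groups)
print(table)
```

For complete data the between and within sums of squares add up to the total sum of squares, which is a quick way to check a hand calculation.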

F Statistic. The F test statistic is the ratio:
\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} \]
If the F ratio is around 1, you can conclude that there are no significant differences between groups (for example, the data groups are consistent with the null hypothesis that all the samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples was drawn from a different population (i.e., the variability is larger than what is expected from random variability in the population). To determine exactly which groups are different, examine the multiple comparison results.

P Value. The P value is the probability of being wrong in concluding that there is a true difference between the groups (for example, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude that there are significant differences when P < 0.05.

Multiple Comparisons. If you selected to perform multiple comparisons, a table of the comparisons between group pairs is displayed. The multiple comparison procedure is activated in the Options for One Way ANOVA dialog box. The tests used in the multiple comparison procedure are selected in the Multiple Comparison Options dialog box.
Multiple comparison results are used to determine exactly which treatments are different, since the ANOVA results only inform you that two or more of the groups are different. The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.
All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s test and the Bonferroni t-test.
Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are the Bonferroni t-test and the Dunnett’s, Fisher LSD, and Duncan’s tests.
For descriptions of the derivations of parametric multiple comparison procedure results, you can reference any appropriate statistics reference.

Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for that comparison. The Bonferroni t-test can be used to compare all groups or to compare versus a control.
You can conclude from "large" values of t that the difference of the two groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.
The Difference of the Means is a gauge of the size of the difference between the two groups.

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results. The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison.
Dunnett’s test only compares a control group to all other groups.
All tests compute the q test statistic, and display whether or not P < 0.05 or < 0.01 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.
The difference of the means is a gauge of the size of the difference between the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, if you are comparing four means, when comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2. For the Tukey test, the p is always equal to the total number of groups.
If a group is found to be not significantly different than another group, all groups with p ranks in between the p ranks of the two groups that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.
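The span parameter p described above can be illustrated with a short sketch. The Python below ranks group means from largest to smallest and counts the number of means spanned by each pair; the function name `spans` and the group means are hypothetical, and ties between means are not handled.

```python
def spans(group_means):
    """Map each pair of group names to p, the number of means spanned.

    Illustrative helper only: group_means is a dict of name -> mean.
    Means are ranked from largest (rank 1) to smallest.
    """
    ranked = sorted(group_means, key=group_means.get, reverse=True)
    rank = {name: i + 1 for i, name in enumerate(ranked)}
    result = {}
    for i, a in enumerate(ranked):
        for b in ranked[i + 1:]:
            # Both end points are counted, so adjacent means span p = 2
            result[(a, b)] = rank[b] - rank[a] + 1
    return result

means = {"A": 10.0, "B": 7.5, "C": 6.0, "D": 2.0}   # made-up group means
p = spans(means)
print(p[("A", "D")])   # largest vs. smallest of four means
print(p[("C", "D")])   # second smallest vs. smallest
```

With four means, the largest-to-smallest comparison spans all four (p = 4) and the second smallest to smallest comparison spans two (p = 2), matching the example in the text.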

For more information. see “Scatter Plot” on page 541. For more information. Multiple comparison graphs. Select the One Way ANOVA test report. How to Create a One Way ANOVA Report Graph 1. see “Normal Probability Plot” on page 549. see “Bar Charts of the Column Means” on page 540. The One Way ANOVA histogram plots the raw residuals in a specified range. . For more information.96 Chapter 4 One Way ANOVA Report Graphs You can generate up to five graphs using the results from a One Way ANOVA. From the menus select: Graph Create Result Graph The Create Result Graph dialog box appears displaying the types of graphs available for the One Way ANOVA results. The One Way ANOVA scatter plot graphs the group means as single points with error bars indicating the standard deviation. For more information. Histogram of the residuals. Normal probability plot of the residuals. using a defined interval set. see “Multiple Comparison Graphs” on page 555. 2. see “Histogram of Residuals” on page 547. Scatter plot with error bars of the column means. The One Way ANOVA probability plot graphs the frequency of the raw residuals. The One Way ANOVA bar chart plots the group means as vertical bars with error bars indicating the standard deviation. The One Way ANOVA multiple comparison graphs plot significant differences between levels of a significant factor. For more information. They include a: Bar chart of the column means.

or double-click the desired graph in the list. For more information. Select the type of graph you want to create from the Graph Type list. The selected graph appears in a graph window. see “Generating Report Graphs” on page 539. . then click OK.97 Comparing Two or More Groups Figure 4-27 3.

Figure 4-28

Two Way Analysis of Variance (ANOVA)

Use a Two Way or Two Factor ANOVA (analysis of variance) when:
You want to see if two or more different experimental groups are affected by two different factors which may or may not interact.
Samples are drawn from normally distributed populations with equal variances.

If you want to consider the effects of only one factor on your experimental groups, use the One Way ANOVA. For more information, see “One Way Analysis of Variance (ANOVA)” on page 80. If you are considering the effects of three factors on your experimental groups, use the Three Way ANOVA. For more information, see “Three Way Analysis of Variance (ANOVA)” on page 123.

SigmaPlot has no equivalent nonparametric two or three factor comparison for samples drawn from a non-normal population. If your data is non-normal, you can transform the data to make them comply better with the assumptions of analysis of variance using Transform menu commands. If the sample size is large, and you want to do a nonparametric test, use the Transforms menu Rank command to convert the observations to ranks, then run a Two or Three Way ANOVA on the ranks.
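As an illustration of what converting observations to ranks means, here is a minimal Python sketch. It ranks all observations together and assigns tied values the average of their ranks; that tie rule is a common convention and an assumption here, since the text does not say how the Rank command treats ties. The function name `rank_transform` is hypothetical.

```python
def rank_transform(values):
    """Replace each observation by its rank (1 = smallest).

    Tied observations receive the average of the ranks they occupy
    (an assumed convention, not confirmed for SigmaPlot).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1     # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

ranks = rank_transform([3.2, 1.5, 3.2, 9.0])
print(ranks)
```

The ranks, rather than the original observations, would then be used as the data column for the ANOVA.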

About the Two Way ANOVA

In a two way or two factor analysis of variance, there are two experimental factors which are varied for each experimental group. A two factor design is used to test for differences between samples grouped according to the levels of each factor and for interactions between the factors.

A two factor analysis of variance tests three hypotheses:
There is no difference among the levels of the first factor.
There is no difference among the levels of the second factor.
There is no interaction between the factors; for example, if there is any difference among groups within one factor, the differences are the same regardless of the second factor level.

Two Way ANOVA is a parametric test that assumes that all the samples were drawn from normally distributed populations with the same variances.

Performing a Two Way ANOVA

To perform a Two Way ANOVA:
1. Enter or arrange your data appropriately in the worksheet. For more information, see “Arranging Two Way ANOVA Data” on page 100.
2. If desired, set Two Way ANOVA options. For more information, see “Setting Two Way ANOVA Options” on page 105.
3. On the menus click:
Statistics
Compare Many Groups
Two Way ANOVA
4. Run the test. For more information, see “Running a Two Way ANOVA” on page 109.
5. Specify the multiple comparisons you want to perform on your test. For more information, see “Multiple Comparison Options for a Two Way ANOVA” on page 111.
6. View and interpret the Two Way ANOVA report. For more information, see “Interpreting Two Way ANOVA Results” on page 114.
7. Generate report graphs. For more information, see “Two Way ANOVA Report Graphs” on page 122.

Arranging Two Way ANOVA Data

The Two Way ANOVA tests for differences between samples grouped according to the levels of each factor and the interactions between the factors. For example, in an analysis of the effect of gender on the action of two different drugs, gender and drug are the factors, male and female are the levels of the gender factor, drug types are the levels for the drug factor, and the different combinations of the levels (gender and drug) are the groups, or cells.

If your data is missing data points or even whole cells, SigmaPlot detects this and provides the correct solutions. For more information, see “Missing Data and Empty Cells Data” on page 101.

Figure 4-29 How to Arrange Two Way ANOVA Data

Indexing Raw Data for a Two-Way ANOVA

The Two-Way ANOVA test requires that the data be entered as indexed data. If your data is in a raw format, you can use a transform to convert it into an indexed format and then run the ANOVA.

In any Two-Way ANOVA, there are two factors, each divided into a number of levels. For example, Gender could be one factor with two levels: male and female. Drug Treatment could be another factor with three levels: Drug A, Drug B, Drug C.

Each combination of two levels, one from each factor, is called a cell. For example, all of the data measured for males receiving Drug A would be a cell. When the data for each cell is written into a column of the worksheet, this is known as a "raw data format" for Two-Way ANOVA. Since each column gives the data for combining two factor levels, the title of each column uses the names of the two levels.

The example above is a worksheet containing raw data for a Two-Way ANOVA. There are six columns, and so there are six cells in the ANOVA. The number of columns equals the number of cells. Note that the title of each column is composed of two names separated by a hyphen. The names refer to levels from different factors.

To convert this data to Indexed format:
1. From the menus select:
Transforms
Indexed
Two-Way
The Pick Columns for Two Way Index Columns dialog box appears.
2. Select the first six columns for the input groups (this appears as Group: in the Selected Columns list).
Tip: You can either select the columns from the worksheet, or you can select each column individually from the Data for Group drop-down list.
3. Select column 7 (or First Empty from the Data for Output drop-down list) as the Output: column.
4. Click Finish. The data appears as indexed data in columns 7 through 9.

Missing Data and Empty Cells Data

Ideally, the data for a Two Way ANOVA should be completely balanced: each group or cell in the experiment has the same number of observations and there are no missing data. However, SigmaPlot properly handles all occurrences of missing and unbalanced data automatically.
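The raw-to-indexed conversion described above can be pictured with a small Python sketch. It assumes each raw column is titled with two level names separated by a hyphen, as in the worksheet example, and produces the three indexed columns (first factor, second factor, data). The function name `index_two_way`, the use of `None` for a missing value, and the sample numbers are illustrative assumptions, not SigmaPlot behavior.

```python
def index_two_way(raw_columns):
    """Convert raw two way data to indexed form.

    raw_columns maps a column title such as "Male-Drug A" to the list of
    observations in that cell. Returns three parallel lists: first factor
    level, second factor level, and the data value. None marks a missing
    value (an assumption of this sketch) and is skipped.
    """
    factor_1, factor_2, data = [], [], []
    for title, values in raw_columns.items():
        level_1, level_2 = title.split("-", 1)   # two names joined by a hyphen
        for value in values:
            if value is None:
                continue
            factor_1.append(level_1)
            factor_2.append(level_2)
            data.append(value)
    return factor_1, factor_2, data

raw = {
    "Male-Drug A": [4.1, 3.9],
    "Male-Drug B": [5.0, None],
    "Female-Drug A": [4.4, 4.6],
    "Female-Drug B": [6.2, 6.0],
}
gender, drug, response = index_two_way(raw)
for row in zip(gender, drug, response):
    print(row)
```

Each printed row corresponds to one worksheet row of the indexed format: one observation together with the two factor levels that identify its cell.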

Missing Data Points

If there are missing values, SigmaPlot automatically handles the missing data by using a general linear model approach. This approach constructs hypothesis tests using the marginal sums of squares (also commonly called the Type III or adjusted sums of squares).

Figure 4-30 Data for a Two Way ANOVA with a Missing Value in the Male/Drug A Cell

Empty Cells

When there is an empty cell, for example, there are no observations for a combination of two factor levels, SigmaPlot stops and suggests either analysis of the data using a two way design with the added assumption of no interaction between the factors, or a One Way ANOVA.

Figure 4-31 Data for a Two Way ANOVA with a Missing Data Cell (Male/Drug A)

Assumption of no interaction analyzes the main effects of each treatment separately.

Note: It can be dangerous to assume there is no interaction between the two factors in a Two Way ANOVA. Under some circumstances, this assumption can lead to a meaningless analysis, particularly if you are interested in studying the interaction effect.

If you treat the problem as a One Way ANOVA, each cell in the table is treated as a different level of a single experimental factor. This approach is the most conservative analysis because it requires no additional assumptions about the nature of the data or experimental design.

Connected versus Disconnected Data

The no interaction assumption does not always permit a two factor analysis when there is more than one empty cell. The non-empty cells must be geometrically connected in order to do the computation. You cannot perform Two Way ANOVAs on disconnected data.

Data arranged in a two-dimensional grid, where you can draw a series of straight vertical and horizontal lines connecting all occupied cells, without changing direction in an empty cell, is guaranteed to be connected.

Figure 4-32 Example of Drawing Straight Horizontal and Vertical Lines Through Connected Data

It is important to note that failure to meet the above requirement does not imply that the data is disconnected. The data in the table below, for example, is connected.

Figure 4-33 Example of Connected Data that You Can’t Draw a Series of Straight Vertical and Horizontal Lines Through

SigmaPlot automatically checks for this condition. If disconnected data is encountered during a Two Way ANOVA, SigmaPlot suggests treatment of the problem as a One Way ANOVA.

For descriptions of the concept of data connectivity, you can reference any appropriate statistics reference.
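One way to picture connectivity is as a graph problem. The Python sketch below treats two occupied cells as neighbors when they share a level of either factor, and calls the data connected when every occupied cell can be reached from any other through a chain of neighbors. This is a common textbook definition and an assumption here; the text does not state how SigmaPlot performs its check, and the function name `is_connected` is hypothetical.

```python
from collections import deque

def is_connected(occupied_cells):
    """Return True if all occupied cells form one connected set.

    occupied_cells is an iterable of (row_level, column_level) pairs that
    contain at least one observation. Cells are neighbors when they share
    a row level or a column level (assumed definition for this sketch).
    """
    cells = set(occupied_cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:                           # breadth-first search over cells
        row, col = queue.popleft()
        for other in cells:
            if other not in seen and (other[0] == row or other[1] == col):
                seen.add(other)
                queue.append(other)
    return seen == cells

# Occupied cells chained through shared levels: connected
chain = {("Male", "Drug A"), ("Male", "Drug B"),
         ("Female", "Drug B"), ("Female", "Drug C")}
# Occupied cells that share no factor levels: disconnected
split = {("Male", "Drug A"), ("Female", "Drug B")}
print(is_connected(chain), is_connected(split))
```

The second data set has no factor level in common between its two cells, so no chain of neighbors links them.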

Figure 4-34 Disconnected Data

Because this data is not geometrically connected (the data shares no factor levels in common), a Two Way ANOVA cannot be performed, even assuming no interaction.

Entering Worksheet Data

A Two Way ANOVA can only be performed on two factor indexed data. Two factor indexed data is placed in three columns; a data point indexed two ways consists of the first factor in one column, the second factor in a second column, and the data point in a third column.

Figure 4-35 Valid Data Formats for a Two Way ANOVA

Setting Two Way ANOVA Options

Use the Two Way ANOVA options to:
Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance.
Display the statistics summary table and confidence interval for the data.
Compute the power, or sensitivity, of the test.
Enable multiple comparison testing.

To change Two Way ANOVA options:
1. If you are going to run the test after changing test options and want to select your data before you run the test, drag the pointer over the data.
2. Select Two Way ANOVA from the toolbar drop-down list.
3. From the menus select:
Statistics
Current Test Options
The Options for Two Way ANOVA dialog box appears with three tabs:
Assumption Checking. Adjust the parameters of the test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options Two Way ANOVA: Assumption Checking” on page 106.
Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options Two Way ANOVA: Results” on page 107.
Post Hoc Tests. Compute the power or sensitivity of the test. For more information, see “Options Two Way ANOVA: Post Hoc Tests” on page 108.
Options settings are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see “Running a Two Way ANOVA” on page 109.

5. To accept the current settings and close the options dialog box, click OK.

Options Two Way ANOVA: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the options for Normality and Equal Variance. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Figure 4-36 The Options for Two Way ANOVA Dialog Box Displaying the Assumption Checking Options

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. Enter the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options Two Way ANOVA: Results

Figure 4-37 The Options for Two Way ANOVA Dialog Box Displaying the Summary Table, Confidence Intervals, and Residuals Options

Summary Table. Select Summary Table under Report to display the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Intervals. Select Confidence Intervals under Report to display the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).
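The quantities listed for the Summary Table can be computed for a single column or group as in the following Python sketch. It assumes the sample standard deviation (n - 1 in the denominator), defines the standard error of the mean as the standard deviation divided by the square root of the number of observations, and uses `None` to mark a missing value; the function name `summary_row` and these conventions are illustrative assumptions rather than documented SigmaPlot behavior.

```python
from math import sqrt
from statistics import mean, stdev

def summary_row(column):
    """Summary statistics for one column or group (illustrative helper).

    None marks a missing value. Requires at least two non-missing
    observations so that the sample standard deviation is defined.
    """
    present = [x for x in column if x is not None]
    n = len(present)
    sd = stdev(present)                    # sample standard deviation (n - 1)
    return {
        "N": n,                            # number of non-missing observations
        "Missing": len(column) - n,        # number of missing values
        "Mean": mean(present),
        "Std Dev": sd,
        "SEM": sd / sqrt(n),               # standard error of the mean
    }

row = summary_row([4.0, 6.0, None, 8.0])
print(row)
```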

Residuals in Column. The Residuals in Column drop-down list displays residuals in the report. To save the residuals of the test to the specified worksheet column, edit the number or select a number from the drop-down list.

Options Two Way ANOVA: Post Hoc Tests

Figure 4-38

Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.

Use Alpha Value. Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Change the alpha value by editing the number in the Alpha Value box.
Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons

Two Way ANOVAs test the hypothesis of no differences between the several treatment groups, but do not determine which groups are different, or the sizes of these differences. Use multiple comparisons to isolate these differences whenever a Two Way ANOVA detects a difference.

The P value used to determine if the ANOVA detects a difference is set in the Report tab of the Options dialog box. If the P value produced by the Two Way ANOVA is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

Always Perform. Select to perform multiple comparisons whether or not the Two Way ANOVA detects a difference.

Only When ANOVA P Value is Significant. Perform multiple comparisons only if the ANOVA detects a difference.

Significance Value for Multiple Comparisons. Select either .05 or .01. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is less than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of .01 indicates that the multiple comparisons will detect a difference if there is less than 1% chance that the multiple comparison is incorrect in detecting a difference.

Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method. For more information, see “Performing a Multiple Comparison” on page 162.

Running a Two Way ANOVA

If you want to select your data before you run the test, drag the pointer over your data.
1. From the menus click:
Statistics
Compare Many Groups
Two Way ANOVA
The Pick Columns dialog box appears.

Figure 4-39 The Pick Columns for Two Way ANOVA Dialog Box Prompting You to Select Data Columns

2. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of selected columns appear in each row. You are prompted to pick a minimum of three worksheet columns.
3. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
4. Click Finish to perform the Two Way ANOVA. The Two Way ANOVA report appears if you:
Elected to test for normality and equal variance, and your data passes both tests.
Your data has no missing data points, cells, or is not otherwise unbalanced.
Selected to not perform multiple comparisons, or if you selected to run multiple comparisons only when the P value is significant, and the P value is not significant.
5. If you elected to test for normality and equal variance, and your data fails either test, either continue or transform your data, then perform the Two Way ANOVA on the transformed data.
6. If your data is missing data points, missing cells, or is otherwise unbalanced, you are prompted to perform the appropriate procedure.
If you are missing data points, but still have at least one observation in each cell, SigmaPlot automatically proceeds with the Two Way ANOVA using a general linear model.
If you are missing a cell, but the data is connected, you can proceed by either performing a two way analysis assuming no interaction between the factors, or converting the problem into a one way design with each non-empty cell a different level of a single factor.
If your data is not geometrically connected, you cannot perform a Two Way ANOVA.
7. If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Multiple Comparison Options for a Two Way ANOVA” on page 111.

Multiple Comparison Options for a Two Way ANOVA

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value, for either of the two factors or the interaction between the two factors, equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for Two Way ANOVA dialog box, the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test.

Figure 4-40 The Multiple Comparison Options Dialog Box for a Two Way ANOVA

This dialog box displays the P values for each of the two experimental factors and of the interaction between the two factors. Only the options with P values less than or equal to the value set in the Options dialog box are selected. You can disable multiple comparison testing for a factor by clicking the selected option. If no factor is selected, multiple comparison results are not reported.

There are seven multiple comparison tests to choose from for the Two Way ANOVA. You can choose to perform the:
Holm-Sidak test. For more information, see “Holm-Sidak Test” on page 163.
Tukey Test. For more information, see “Tukey Test” on page 164.
Student-Newman-Keuls Test. For more information, see “Student-Newman-Keuls (SNK) Test” on page 164.
Bonferroni t-test. For more information, see “Bonferroni t-Test” on page 164.
Fisher’s LSD. For more information, see “Fisher’s Least Significance Difference Test” on page 165.
Dunnett’s Test. For more information, see “Dunnett’s Test” on page 165.
Duncan’s Multiple Range Test. For more information, see “Duncan’s Multiple Range” on page 165.

The Tukey and Student-Newman-Keuls tests are recommended for determining the difference among all treatments. If you have only a few treatments, you may want to select the simpler Bonferroni t-test.

The Dunnett’s test is recommended for determining the differences between the experimental treatments and a control group. If you have only a few treatments or observations, you can select the simpler Bonferroni t-test.

Note: In both cases the Bonferroni t-test is most sensitive with a small number of groups. Dunnett’s test is not available if you have fewer than six observations.

There are two types of multiple comparisons available for the Two Way ANOVA. The types of comparison you can make depends on the selected multiple comparison test.
All pairwise comparisons test the difference between each treatment or level within the two factors separately (for example, among the different rows and columns of the data table).
Multiple comparisons versus a control test the difference between all the different combinations of each factor (for example, all the cells in the data table).

When comparing the two factors separately, the levels within one factor are compared among themselves without regard to the second factor, and vice versa. These results should be used when the interaction is not statistically significant.

When the interaction is statistically significant, interpreting multiple comparisons among different levels of each experimental factor may not be meaningful. SigmaPlot also suggests performing a multiple comparison between all the cells. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98.

Figure 4-41 The Multiple Comparison Options Dialog Box Prompting You to Select Control Groups

The result of all comparisons is a listing of the similar and different group pairs, i.e., those groups that are and are not detectably different from each other. Because no statistical test eliminates uncertainty, multiple comparison procedures sometimes produce ambiguous groupings.

Performing a One Way ANOVA on Two Way ANOVA Data

When your data is missing too many observations to perform a valid Two Way ANOVA, you can still analyze your data using a One Way ANOVA.

To perform a One Way ANOVA:
1. From the menus select:
Transforms
Unindex
Two Way
2. Select your two way indexed data columns as the input columns.
3. Select the output columns. Select an empty column as your first output column.
4. Click Finish.
5. Run a One Way ANOVA. For more information, see “Running a One Way ANOVA” on page 87.

Interpreting Two Way ANOVA Results

A full Two Way ANOVA report displays an ANOVA table describing the variation associated with each factor and their interactions. This table displays the degrees of freedom, sum of squares, and mean squares for each of the elements in the data table, as well as the F statistics and the corresponding P values.

Figure 4-42 Two Way ANOVA Report

Summary tables of least square means for each factor and for both factors together can also be generated. This result and additional results are enabled in the Options for Two Way ANOVA dialog box. Click a selected check box to enable or disable a test option. All options are saved between SigmaPlot sessions. For more information, see “Setting Two Way ANOVA Options” on page 105.

You can also generate tables of multiple comparisons. Multiple Comparison results are also specified in the Options for Two Way ANOVA dialog box. The tests used in the multiple comparisons are selected in the Multiple Comparisons Options dialog box. For more information, see “Multiple Comparison Options for a Two Way ANOVA” on page 111.

For descriptions of the derivations for Two Way ANOVA results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

If There Were Missing Data Cells. If your data contained missing values but no empty cells, the report indicates the results were computed using a general linear model.
If your data contained empty cells, you either analyzed the problem assuming no interaction or treated the problem as a One Way ANOVA.
If you chose no interactions, no statistics for factor interaction are calculated.
If you performed a One Way ANOVA, the results shown are identical to One Way ANOVA results. For more information, see “Interpreting One Way ANOVA Results” on page 90.

Dependent Variable. This is the data column title of the indexed worksheet data you are analyzing with the Two Way ANOVA. Determining if the values in this column are affected by the different factor levels is the objective of the Two Way ANOVA.

Normality Test. Normality test results display whether the data passed or failed the test of the assumption that they were drawn from a normal population and the P value calculated by the test. Normally distributed source populations are required for all parametric tests.
This result appears if you enabled normality testing in the Two Way ANOVA Options dialog box.

Equal Variance Test. Equal Variance test results display whether or not the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance and the P value calculated by the test. Equal variance of the source population is assumed for all parametric tests.
This result appears if you enabled equal variance testing in the Two Way ANOVA Options dialog box.

ANOVA Table. The ANOVA table lists the results of the Two Way ANOVA.
Note: When there are missing data, the best estimate of these values is automatically calculated using a general linear model.

DF (Degrees of Freedom). Degrees of freedom represent the number of groups in each factor and the sample size, which affects the sensitivity of the ANOVA.

The degrees of freedom for each factor is a measure of the number of levels in each factor. The interaction degrees of freedom is a measure of the total number of cells. The error degrees of freedom (sometimes called the residual or within groups degrees of freedom) is a measure of the sample size after accounting for the factors and interaction. The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares). The sum of squares is a measure of variability associated with each element in the ANOVA data table. The factor sums of squares measure the variability between the rows or columns of the table considered separately. The interaction sum of squares measures the variability of the average differences between the cells in addition to the variation between the rows and columns, considered separately; this is a gauge of the interaction between the factors. The error sum of squares (also called residual or within group sum of squares) is a measure of the underlying random variation in the data, i.e., the variability not associated with the factors or their interaction. The total sum of squares is a measure of the total variability in the data; if there are no missing data, the total sum of squares equals the sum of the other table sums of squares.

MS (Mean Squares). The mean squares provide different estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance. The mean square for each factor is an estimate of the variance of the underlying population computed from the variability between levels of the factor. The interaction mean square is an estimate of the variance of the underlying population computed from the variability associated with the interactions of the factors.
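To make the relationship between the DF, SS, and MS entries concrete, here is a minimal sketch for a small balanced two factor data set with no missing data. The data values and the function name `two_way_table` are invented for illustration, and the code uses the usual textbook formulas for a balanced design; it is not SigmaPlot's general linear model implementation.

```python
from statistics import mean

def two_way_table(cells):
    """cells[i][j] holds the observations for level i of factor A and level j
    of factor B. Assumes a balanced design with no missing data."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    all_obs = [x for row in cells for cell in row for x in cell]
    grand = mean(all_obs)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        # Variability between the rows and between the columns, separately
        "A": b * n * sum((m - grand) ** 2 for m in row_means),
        "B": a * n * sum((m - grand) ** 2 for m in col_means),
        # Variability of the cell means beyond the row and column effects
        "AxB": n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                       for i in range(a) for j in range(b)),
        # Random variation within each cell
        "Error": sum((x - cell_means[i][j]) ** 2
                     for i in range(a) for j in range(b) for x in cells[i][j]),
        "Total": sum((x - grand) ** 2 for x in all_obs),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1),
          "Error": a * b * (n - 1), "Total": a * b * n - 1}
    # Each mean square is the sum of squares divided by its degrees of freedom
    ms = {k: ss[k] / df[k] for k in ("A", "B", "AxB", "Error")}
    return df, ss, ms

# Hypothetical 2 x 2 design with two observations per cell
cells = [[[4.0, 6.0], [8.0, 10.0]],
         [[5.0, 7.0], [12.0, 14.0]]]
df, ss, ms = two_way_table(cells)
```

With no missing data, `ss["Total"]` equals the sum of the other four sums of squares, which is the identity stated above.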

The error mean square (residual, or within groups) is an estimate of the variability in the underlying population, computed from the random component of the observations.

F Statistic. The F test statistic is provided for comparisons within each factor and between the factors. The F ratio to test each factor is

$F = MS_{factor} / MS_{error}$

The F ratio to test the interaction is

$F = MS_{interaction} / MS_{error}$

If the F ratio is around 1, you can conclude that there are no significant differences between factor levels or that there is no interaction between factors (i.e., the data groups are consistent with the null hypothesis that all the samples were drawn from the same population). If F is a large number, you can conclude that at least one of the samples for that factor or combination of factors was drawn from a different population (i.e., the variability is larger than what is expected from random variability in the population). To determine exactly which groups are different, examine the multiple comparison results.

P Value. The P value is the probability of being wrong in concluding that there is a true difference between the groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there are significant differences if P < 0.05.

Power. The power, or sensitivity, of a Two Way ANOVA is the probability that the test will detect the observed difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. The power for the comparison of

the groups within the two factors and the power for the comparison of the interactions are all displayed. These results are set in the Options for Two Way ANOVA dialog box. ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error also is called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). The α value is set in the Options for Two Way ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Summary Table. The least square means and standard error of the means are displayed for each factor separately (summary table row and column), and for each combination of factors (summary table cells). If there are missing values, the least square means are estimated using a general linear model.

Mean. The average value for the column. If the observations are normally distributed the mean is the center of the distribution. When there are no missing data, the least square means equal the cell and marginal (row and column) means. When there are missing data, the least square means provide the best estimate of these values, using a general linear model. These means and standard errors are used when performing multiple comparisons (see following section).

Standard Error of the Mean. A measure of the approximation with which the mean computed from the sample approximates the true population mean.

Multiple Comparisons. If a difference is found among the groups, multiple comparison tables can be computed. Multiple comparison procedures are activated in the Options for Two Way ANOVA dialog box. The tests used in the multiple comparisons are set in the Multiple Comparisons Options dialog box. Multiple comparison results are used to determine exactly which groups are different, since the ANOVA results only inform you that two or more of the groups are different. Two factor multiple comparison for a full Two Way ANOVA also compares:

- Groups within each factor without regard to the other factor (this is a marginal comparison, i.e., only the columns or rows in the table are compared).
- All combinations of factors (all cells in the table are compared with each other).

The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.

All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Holm-Sidak, Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Bonferroni t-test.

Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are Holm-Sidak, Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, Dunnett’s and Bonferroni t-test.

For descriptions of the derivations of two way multiple comparison procedure results, you can reference any appropriate statistics reference.

Holm-Sidak Test Results. The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a control group. It is more powerful than the Tukey and Bonferroni tests and, consequently, it is able to detect differences that these other tests do not. It is recommended as the first-line procedure for pairwise comparison testing. When performing the test, the P values of all comparisons are computed and ordered from smallest to largest. Each P value is then compared to a critical level that depends upon the significance level of the test (set in the test options), the rank of the P value, and the total number of comparisons made. A P value less than the critical level indicates there is a significant difference between the corresponding two groups.

Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for that comparison. The Bonferroni t-test can be used to compare all groups or to compare versus a control. You can conclude from "large" values of t that the difference of the two groups being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference. The Difference of Means is a gauge of the size of the difference between the levels or cells being compared.
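The Holm-Sidak ordering step described above can be sketched in a few lines. This is an illustration only: the function name `holm_sidak` and the example P values are invented, the critical level uses the standard Sidak step-down formula 1 - (1 - α)^(1/(m - rank + 1)), and stopping at the first non-significant comparison is the usual step-down convention, assumed here rather than taken from the SigmaPlot documentation.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return (p, critical_level, significant) tuples ordered from the
    smallest P value to the largest."""
    m = len(p_values)              # total number of comparisons made
    results = []
    still_rejecting = True         # step-down: stop after the first failure
    for rank, p in enumerate(sorted(p_values), start=1):
        # Critical level depends on alpha, the rank, and the number of comparisons
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (m - rank + 1))
        significant = still_rejecting and p < critical
        still_rejecting = significant
        results.append((p, critical, significant))
    return results

# Hypothetical P values from four pairwise comparisons
results = holm_sidak([0.001, 0.04, 0.012, 0.2])
for p, critical, significant in results:
    print(f"P = {p:.3f}  critical level = {critical:.4f}  significant: {significant}")
```

The smallest P value faces the strictest critical level, and the largest P value is compared against α itself.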

The degrees of freedom (DF) for the marginal comparisons are a measure of the number of groups (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction (this is the same as the error or residual degrees of freedom).

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results. The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s can be used to compare a control group to other groups, they are not recommended for this type of comparison. Dunnett’s test only compares a control group to all other groups.

All tests compute the q test statistic, the number of means spanned in the comparison p, and display whether or not P < 0.05 for that pair comparison. You can conclude from "large" values of q that the difference of the two groups being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference. The Difference of Means is a gauge of the size of the difference between the groups or cells being compared.

p is the parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.

If a group is found to be not significantly different than another group, all groups with p ranks in between the p ranks of the two groups that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.

The degrees of freedom (DF) for the marginal comparisons are a measure of the number of groups (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction. This is the same as the error or residual degrees of freedom.
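The ranking rule for p can be illustrated with a short sketch. The group names, the means, and the helper `span_table` are hypothetical; the code only counts how many ranked means each comparison spans and does not compute q or its critical values.

```python
def span_table(group_means):
    """Map each (larger group, smaller group) pair to p, the number of
    ranked means spanned by the comparison."""
    # Rank the group means from largest to smallest
    ranked = sorted(group_means, key=group_means.get, reverse=True)
    spans = {}
    for i, larger in enumerate(ranked):
        for j in range(i + 1, len(ranked)):
            # Adjacent ranks span 2 means; the extremes span all of them
            spans[(larger, ranked[j])] = j - i + 1
    return spans

# Hypothetical means for four groups
means = {"A": 12.1, "B": 9.4, "C": 15.0, "D": 7.8}
spans = span_table(means)
print(spans[("C", "D")])  # largest versus smallest
print(spans[("B", "D")])  # second smallest versus smallest
```

For these four means the two printed values are 4 and 2, matching the example in the text.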

Two Way ANOVA Report Graphs

You can generate up to seven graphs using the results from a Two Way ANOVA. They include a:

- Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.
- Normal probability plot of the residuals. For more information, see “Normal Probability Plot” on page 549.
- 3D plot of the residuals. For more information, see “3D Residual Scatter Plot” on page 551.
- Grouped bar chart of the column means. For more information, see “Grouped Bar Chart with Error Bars” on page 552.
- 3D category scatter plot. For more information, see “3D Category Scatter Graph” on page 553.
- Multiple comparison graphs. For more information, see “Multiple Comparison Graphs” on page 555.

How to Create a Two Way ANOVA Report Graph

1. Select the Two Way ANOVA test report.
2. From the menus select:
   Graph
   Create Result Graph
   The Create Result Graph dialog box appears displaying the types of graphs available for the Two Way ANOVA results.
3. Select the type of graph you want to create from the Graph Type list.
4. Click OK, or double-click the desired graph in the list. The selected graph appears in a graph window. For more information, see “Generating Report Graphs” on page 539.

Figure 4-43 A Multiple Comparison for the Two Way ANOVA

Three Way Analysis of Variance (ANOVA)

Use a Three Way or three factor ANOVA (analysis of variance) when:

- You want to see if two or more different experimental groups are affected by three different factors which may or may not interact.
- Samples are drawn from normally distributed populations with equal variances.

SigmaPlot has no equivalent nonparametric three factor comparison for samples drawn from a non-normal population. If your data is non-normal, you can transform the data to make them comply better with the assumptions of analysis of variance using Transforms menu commands. If the sample size is large, and you want to do a nonparametric test, use the Transforms menu Rank command to convert the observations to ranks, then run a Three Way ANOVA on the ranks.

About the Three Way ANOVA

In a three way or three factor analysis of variance, there are three experimental factors which are varied for each experimental group. A three factor design is used to test for differences between samples grouped according to the levels of each factor and for interactions between the factors.

A three factor analysis of variance tests four hypotheses:

- There is no difference among the levels of the first factor.
- There is no difference among the levels of the second factor.
- There is no difference among the levels of the third factor.
- There is no interaction between the factors; for example, if there is any difference among groups within one factor, the differences are the same regardless of the second and third factor levels.

Three Way ANOVA is a parametric test that assumes that all the samples were drawn from normally distributed populations with the same variances.

Performing a Three Way ANOVA

To perform a Three Way ANOVA:

1. Enter or arrange your data appropriately in the worksheet. For more information, see “Arranging Three Way ANOVA Data” on page 125.
2. If desired, set the Three Way ANOVA options. For more information, see “Setting Three Way ANOVA Options” on page 129.
3. From the menus select:
   Statistics
   Compare Many Groups
   Three Way ANOVA
4. Run the test. For more information, see “Running a Three Way ANOVA” on page 134.
5. Specify the multiple comparisons you want to perform on your test. For more information, see “Multiple Comparison Options for a Three Way ANOVA” on page 136.

6. View and interpret the Three Way ANOVA report. For more information, see “Interpreting Three Way ANOVA Results” on page 139.
7. Generate report graphs. For more information, see “Three Way ANOVA Report Graphs” on page 146.

Arranging Three Way ANOVA Data

The Three Way ANOVA tests for differences between samples grouped according to the levels of each factor and the interactions between the factors. For example, in an analysis of the effect of gender on the action of two different drugs over different periods of time, gender, drug, and time period are the factors; male and female are the levels of the gender factor, drug types are the levels for the drug factor, days are the levels of the time period factor, and the different combinations of the levels (gender, drugs, and time period) are the groups, or cells.

Figure 4-44 Data for a Three Way ANOVA
The factors are gender, drug, and time period. The levels are Male/Female, Drug A/Drug B, and Day 1, 2, and 3.

If your data is missing data points or even whole cells, SigmaPlot detects this and provides the correct solutions. For more information, see “Missing Data and Empty Cells Data” on page 101.

Figure 4-45 Valid Data Formats for a Three Way ANOVA
Column 1 is the first factor index, column 2 is the second factor index, column 3 is the third factor index, and column 4 is the data.

Missing Data and Empty Cells Data

Ideally, the data for a Three Way ANOVA should be completely balanced; for example, each group or cell in the experiment has the same number of observations and there are no missing data. SigmaPlot properly handles all occurrences of missing and unbalanced data automatically, however.

Missing Data Points. If there are missing values, SigmaPlot automatically handles the missing data by using a general linear model approach. This approach constructs hypothesis tests using the marginal sums of squares (also commonly called the Type III or adjusted sums of squares).
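As an illustration of the four column indexed layout and of what "balanced" means, the sketch below groups indexed rows into cells and reports whether the design is balanced and which cells are empty. The data values, the use of `None` for a missing value, and the helper name `summarize_cells` are assumptions made for this example; SigmaPlot performs its own checks internally.

```python
from collections import defaultdict
from itertools import product

def summarize_cells(rows):
    """rows are (factor1, factor2, factor3, value) tuples; value is None
    when the observation is missing. Returns (balanced, empty_cells)."""
    levels = [sorted({row[k] for row in rows}) for k in range(3)]
    counts = defaultdict(int)
    for f1, f2, f3, value in rows:
        if value is not None:
            counts[(f1, f2, f3)] += 1
    all_cells = list(product(*levels))       # every combination of levels
    empty = [cell for cell in all_cells if counts[cell] == 0]
    sizes = {counts[cell] for cell in all_cells}
    balanced = len(sizes) == 1 and not empty
    return balanced, empty

# Hypothetical indexed data: one observation per cell
genders, drugs, days = ["Male", "Female"], ["Drug A", "Drug B"], ["Day 1", "Day 2"]
rows = [(g, d, t, 1.0) for g, d, t in product(genders, drugs, days)]
balanced_full, empty_full = summarize_cells(rows)

# Same data with the Male / Drug A / Day 1 observation missing
rows_missing = [(g, d, t, None if (g, d, t) == ("Male", "Drug A", "Day 1") else v)
                for g, d, t, v in rows]
balanced_missing, empty_missing = summarize_cells(rows_missing)
```

In the second case the only observation in that cell is missing, so the cell is empty rather than merely short of one data point.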

Figure 4-46 Data for a Three Way ANOVA with a Missing Value in the Male, Drug A, Day 1 Cell
Use a general linear model approach in these situations.

Empty Cells. When there is an empty cell, i.e., there are no observations for a combination of three factor levels, a dialog box appears asking you if you want to analyze the data using a two way or a one way design. If you select a two way design, SigmaPlot attempts to analyze your data using two interactions.

Figure 4-47 Data for a Three Way ANOVA with a Missing Cell (Male/Drug A, Day 1)
You can use either a two factor analysis or assume no interaction between factors.

If you treat the problem as a Two Way ANOVA, a dialog box appears prompting you to remove one of the factors. Select the factor you want to remove, then click OK. The Two Way ANOVA is performed. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98.

If you treat the problem as a One Way ANOVA, each cell in the table is treated as a single experimental factor. This approach is the most conservative analysis because it requires no additional assumptions about the nature of the data or experimental design.

Assumption of no interaction analyzes the main effects of each treatment separately.

Note: It can be dangerous to assume there is no interaction between the three factors in a Three Way ANOVA. Under some circumstances, this assumption can lead to a

meaningless analysis, particularly if you are interested in studying the interaction effect.

Connected versus Disconnected Data

The no interaction assumption does not always permit a two factor analysis when there is more than one empty cell. The non-empty cells must be geometrically connected in order to do the computation. You cannot perform Three Way ANOVAs on disconnected data.

Data arranged in a two-dimensional grid, where you can draw a series of straight vertical and horizontal lines connecting all occupied cells, without changing direction in an empty cell, is guaranteed to be connected.

Figure 4-48 Example of Drawing Straight Horizontal and Vertical Lines through Connected Data

It is important to note that failure to meet the above requirement does not imply that the data is disconnected. The data in the table below, for example, is connected.

Figure 4-49 Example of Connected Data that You Can’t Draw a Series of Straight Vertical and Horizontal Lines Through

SigmaPlot automatically checks for this condition. If disconnected data is encountered during a Three Way ANOVA, SigmaPlot suggests treatment of the problem as a Two Way ANOVA. If the disconnected data is still encountered during a Two Way ANOVA, a One Way ANOVA is performed.

For descriptions of the concept of data connectivity, you can reference any appropriate statistics reference.
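The line-drawing rule for a two-dimensional grid can be approximated with a small graph search. This sketch is one possible reading of that rule, not SigmaPlot's own check: it treats two occupied cells as linked when they share a row or a column, and the function name `passes_line_check` and the example grids are invented. As the text notes, a grid that fails this kind of check is not necessarily disconnected, so a `False` result is inconclusive.

```python
from collections import defaultdict, deque

def passes_line_check(occupied):
    """occupied is an iterable of (row, column) positions of non-empty cells.
    Returns True when every occupied cell can be reached from any other by
    moving along rows and columns and turning only at occupied cells."""
    cells = set(occupied)
    if not cells:
        return False
    by_row, by_col = defaultdict(set), defaultdict(set)
    for r, c in cells:
        by_row[r].add((r, c))
        by_col[c].add((r, c))
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        # A straight line can reach any occupied cell in the same row or column
        for neighbor in by_row[r] | by_col[c]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == cells

# Hypothetical grids: an L-shaped layout and two cells sharing no row or column
l_shaped = passes_line_check({(0, 0), (0, 1), (1, 1)})
diagonal = passes_line_check({(0, 0), (1, 1)})
```

The diagonal layout has no factor level in common between its two cells, which is the situation the check is meant to flag.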

Figure 4-50 Disconnected Data
Because this data is not geometrically connected (they share no factor levels in common), a Three Way ANOVA cannot be performed, even assuming no interaction.

Entering Worksheet Data

A Three Way ANOVA can only be performed on three factor indexed data. Three factor indexed data is placed in four columns; a data point indexed three ways consists of the first factor in one column, the second factor in a second column, the third factor in a third column, and the data in a fourth column.

Setting Three Way ANOVA Options

Use the Three Way ANOVA options to:

- Adjust the parameters of the test to relax or restrict the testing of your data for normality and equal variance.
- Include the statistics summary table and confidence interval for the data in the report, and save residuals to the worksheet.
- Compute the power, or sensitivity, of the test.
- Enable multiple comparison testing.

To set Three Way ANOVA options:

1. If you are going to run the test after changing test options and want to select your data before you run the test, drag the pointer over the data.
2. Select Three Way ANOVA from the Standard toolbar drop-down list.

3. From the menus select:
   Statistics
   Current Test Options
   The Options for Three Way ANOVA dialog box appears with three tabs:
   - Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options Two Way ANOVA: Assumption Checking” on page 106.
   - Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options Two Way ANOVA: Results” on page 107.
   - Post Hoc Tests. Compute the power or sensitivity of the test. For more information, see “Options Two Way ANOVA: Post Hoc Tests” on page 108.
   Options settings are saved between SigmaPlot sessions.
   Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
4. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see “Running a Two Way ANOVA” on page 109.
5. To accept the current settings and close the options dialog box, click OK.

Options for Three Way ANOVA: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the Normality and Equal Variance options. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Figure 4-51 The Options for Three Way ANOVA Dialog Box Displaying the Assumption Checking Options

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. Type the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of

several orders of magnitude. However, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for Three Way ANOVA: Results

Figure 4-52 The Options for Three Way ANOVA Dialog Box Displaying the Summary Table, Confidence Intervals, and Residual Options

Summary Table. Select Summary Table under Report to display the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Intervals. Select Confidence Intervals under Report to display the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Residuals in Column. The Residuals in Column drop-down list displays residuals in the report and saves the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.
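The quantities listed for the summary table can be reproduced for a single column with the Python standard library. This is a sketch under stated assumptions: missing values are represented as `None`, the standard deviation is the sample standard deviation (n - 1 in the denominator), and the function name `summary_row` and the data are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def summary_row(values):
    """Summarize one column or group; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    sd = stdev(present)                 # sample standard deviation
    return {
        "N": n,                         # number of observations
        "Missing": len(values) - n,     # number of missing values
        "Mean": mean(present),          # average value
        "Std Dev": sd,
        "SEM": sd / sqrt(n),            # standard error of the mean
    }

# Hypothetical column with one missing value
row = summary_row([4.0, 6.0, None, 8.0, 10.0])
for name, value in row.items():
    print(f"{name}: {value}")
```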

Options for Three Way ANOVA: Post Hoc Tests

Figure 4-53 The Options for Three Way ANOVA Dialog Box Displaying the Power and Multiple Comparisons Options

Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.

Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Change the alpha value by editing the number in the Alpha Value box. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons. Three Way ANOVAs test the hypothesis of no differences between the several treatment groups, but do not determine which groups are different, or the sizes of these differences. Multiple comparisons isolate these differences whenever a Three Way ANOVA detects a difference. The P value used to determine if the ANOVA detects a difference is set in the Report tab of the Options dialog box. If the P value produced by the Three Way ANOVA is

less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

Always Perform. Select to perform multiple comparisons whether or not the Three Way ANOVA detects a difference.

Only When ANOVA P Value is Significant. Perform multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value. Select either .05 or .10 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is less than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of .10 indicates that the multiple comparisons will detect a difference if there is less than 10% chance that the multiple comparison is incorrect in detecting a difference.

Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison test.

Running a Three Way ANOVA

If you want to select your data before you run the test, drag the pointer over your data.

1. From the menus click:
   Statistics
   Compare Many Groups
   Three Way ANOVA
   The Pick Columns dialog box appears.

Figure 4-54 The Pick Columns for Three Way ANOVA Dialog Box

2. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of selected columns appear in each row. You are prompted to pick a minimum of three worksheet columns.
3. To change your selections, select the assignment in the list, then select new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
4. Click Finish to perform the Three Way ANOVA. If you elected to test for normality and equal variance, and your data fails either test, either continue or transform your data, then perform the Three Way ANOVA on the transformed data.
5. The Three Way ANOVA report appears if you:
   - Elected to test for normality and equal variance, and your data passes both tests.
   - Your data has no missing data points, cells, or is not otherwise unbalanced.
   - Selected not to perform multiple comparisons, or if you selected to run multiple comparisons only when the P value is significant, and the P value is not significant.
   To edit the report, use the Format menu commands.

If your data is missing data points, missing cells, or is otherwise unbalanced, you are prompted to perform the appropriate procedure.

- If you are missing data points, but still have at least one observation in each cell, SigmaPlot automatically proceeds with the Three Way ANOVA using a general linear model.
- If you are missing a cell, but the data is connected, you can proceed by either performing a three way analysis assuming no interaction between the factors, or converting the problem into a two way design with each non-empty cell a different level of two factors.
- If your data is not geometrically connected, you cannot perform a Three Way ANOVA. Either treat the problem as a Two Way ANOVA, or cancel the test. For more information, see “Two Way Analysis of Variance (ANOVA)” on page 98.

For more information, see “Arranging Three Way ANOVA Data” on page 125.

If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Multiple Comparison Options for a Three Way ANOVA” on page 136.

Multiple Comparison Options for a Three Way ANOVA

If you enabled multiple comparisons in the Three Way ANOVA Options dialog box, and the ANOVA produces a P value, for any of the three factors or the interaction between the three factors, equal to or less than the trigger P value, the Multiple Comparison Options dialog box appears.
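The trigger rule just described can be written out as a short sketch. The function names, the dictionary of P values, and the effect labels are hypothetical and only mirror the rule as stated: multiple comparisons are offered when they are always performed, or when at least one ANOVA P value is equal to or less than the trigger P value.

```python
def should_offer_multiple_comparisons(anova_p, trigger_p=0.05, always_perform=False):
    """Return True when the multiple comparison step would be triggered."""
    if always_perform:
        return True
    return any(p <= trigger_p for p in anova_p.values())

def selected_effects(anova_p, trigger_p=0.05):
    """Effects whose P values are at or below the trigger value."""
    return [name for name, p in anova_p.items() if p <= trigger_p]

# Hypothetical P values from a Three Way ANOVA table
anova_p = {"Gender": 0.30, "Drug": 0.01, "Day": 0.20, "Gender x Drug x Day": 0.06}
offer = should_offer_multiple_comparisons(anova_p)
effects = selected_effects(anova_p)
```

Here only the Drug effect meets the 0.05 trigger, so it alone would be preselected for comparison.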

Figure 4-55 The Multiple Comparison Options Dialog Box for a Three Way ANOVA

This dialog box displays the P values for each of the experimental factors and of the interaction between the factors. Only the options with P values less than or equal to the value set in the Options dialog box are selected. You can disable multiple comparison testing for a factor by clicking the selected option. If no factor is selected, multiple comparison results are not reported.

There are seven multiple comparison tests to choose from for the Three Way ANOVA. You can choose to perform the:

- Holm-Sidak test. For more information, see “Holm-Sidak Test” on page 163.
- Tukey Test. For more information, see “Tukey Test” on page 164.
- Student-Newman-Keuls Test. For more information, see “Student-Newman-Keuls (SNK) Test” on page 164.
- Bonferroni t-test. For more information, see “Bonferroni t-Test” on page 164.
- Fisher’s LSD. For more information, see “Fisher’s Least Significance Difference Test” on page 165.
- Dunnett’s Test. For more information, see “Dunnett’s Test” on page 165.
- Duncan’s Multiple Range Test. For more information, see “Duncan’s Multiple Range” on page 165.

Figure 4-56 The Multiple Comparison Options Dialog Box Prompting You to Select a Control Group

There are two types of multiple comparison available for the Three Way ANOVA. The types of comparison you can make depend on the selected multiple comparison test.

All pairwise comparisons test the difference between each treatment or level within each factor separately (for example, among the different rows and columns of the data table).

Multiple comparisons versus a control test the difference between all the different combinations of each factor (for example, all the cells in the data table).

When comparing the factors separately, the levels within one factor are compared among themselves without regard to the other factors, and vice versa. These results should be used when the interaction is not statistically significant.

When the interaction is statistically significant, interpreting multiple comparisons among different levels of each experimental factor may not be meaningful. SigmaPlot also suggests performing a multiple comparison between all the cells.

The result of both comparisons is a listing of the similar and different group pairs, that is, those groups that are and are not detectably different from each other. Because no statistical test eliminates uncertainty, multiple comparison procedures sometimes produce ambiguous groupings.

Interpreting Three Way ANOVA Results

A full Three Way ANOVA report displays an ANOVA table describing the variation associated with each factor and their interactions. This table displays the degrees of freedom, sum of squares, and mean squares for each of the elements in the data table, as well as the F statistics and the corresponding P values.

Summary tables of least square means for each factor and for all three factors together can also be generated. This result and additional results are enabled in the Options for Three Way ANOVA dialog box. Click a check box to enable or disable a test option. All options are saved between SigmaPlot sessions. For more information, see “Setting Three Way ANOVA Options” on page 129.

You can also generate tables of multiple comparisons. Multiple Comparison results are also specified in the Options for Three Way ANOVA dialog box. The tests used in the multiple comparisons are selected in the Multiple Comparisons Options dialog box.

For descriptions of the derivations for Three Way ANOVA results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text in the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Figure 4-57 Three Way ANOVA Report

If your data contained missing values but no empty cells, the report indicates the results were computed using a general linear model.

If your data contained empty cells, you either analyzed the problem assuming no interaction or treated the problem as a Two or One Way ANOVA.

If you chose no interactions, no statistics for factor interaction are calculated.

If you performed a Two or One Way ANOVA, the results shown are identical to Two and One Way ANOVA results. For more information, see “Interpreting One Way ANOVA Results” on page 90.

Dependent Variable. This is the data column title of the indexed worksheet data you are analyzing with the Three Way ANOVA. Determining if the values in this column are affected by the different factor levels is the objective of the Three Way ANOVA.

Normality Test

Normality test results display whether the data passed or failed the test of the assumption that they were drawn from a normal population and the P value calculated by the test. Normally distributed source populations are required for all parametric tests. This result appears if you enabled normality testing in the Options for Three Way ANOVA dialog box.

Equal Variance Test

Equal Variance test results display whether the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance and the P value calculated by the test. Equal variance of the source population is assumed for all parametric tests. This result appears if you enabled equal variance testing in the Options for Three Way ANOVA dialog box.

ANOVA Table

The ANOVA table lists the results of the Three Way ANOVA.

Note: When there are missing data, the best estimate of these values is automatically calculated using a general linear model.

DF (Degrees of Freedom). Degrees of freedom represent the number of groups in each factor and the sample size, which affects the sensitivity of the ANOVA.

The degrees of freedom for each factor is a measure of the number of levels in each factor.

The interaction degrees of freedom is a measure of the total number of cells.

The error degrees of freedom (sometimes called the residual or within groups degrees of freedom) is a measure of the sample size after accounting for the factors and interaction.

The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares). The sum of squares is a measure of variability associated with each element in the ANOVA data table.

The factor sums of squares measure the variability between the rows or columns of the table considered separately.

The interaction sum of squares measures the variability of the average differences between the cells in addition to the variation between the rows and columns, considered separately; this is a gauge of the interaction between the factors.

The error sum of squares (also called residual or within group sum of squares) is a measure of the underlying random variation in the data, i.e., the variability not associated with the factors or their interaction.

The total sum of squares is a measure of the total variability in the data; if there are no missing data, the total sum of squares equals the sum of the other table sums of squares.

MS (Mean Squares). The mean squares provide different estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.

The mean square for each factor:

\[ MS_{\text{factor}} = \frac{SS_{\text{factor}}}{DF_{\text{factor}}} \]

is an estimate of the variance of the underlying population computed from the variability between levels of the factor.

The interaction mean square:

\[ MS_{\text{inter}} = \frac{SS_{\text{inter}}}{DF_{\text{inter}}} \]

is an estimate of the variance of the underlying population computed from the variability associated with the interactions of the factors.

The error mean square (residual, or within groups):

\[ MS_{\text{error}} = \frac{SS_{\text{error}}}{DF_{\text{error}}} \]

is an estimate of the variability in the underlying population, computed from the random component of the observations.

F Statistic. The F test statistic is provided for comparisons within each factor and between the factors. The F ratio to test each factor is:

\[ F = \frac{MS_{\text{factor}}}{MS_{\text{error}}} \]
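The mean square and F ratio definitions above reduce to simple arithmetic on the rows of the ANOVA table. The following is a minimal sketch in Python, not SigmaPlot code: the `AnovaRow` class, the `f_ratio` helper, and all numbers are hypothetical and chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class AnovaRow:
    """One row of an ANOVA table (hypothetical helper, not a SigmaPlot type)."""
    source: str   # factor name, interaction, or "Residual"
    df: int       # degrees of freedom for this source
    ss: float     # sum of squares for this source

    @property
    def ms(self) -> float:
        # Mean square = sum of squares / degrees of freedom
        return self.ss / self.df


def f_ratio(row: AnovaRow, error: AnovaRow) -> float:
    # F = mean square of the factor (or interaction) / error mean square
    return row.ms / error.ms


# Illustrative numbers only; they do not come from any real data set.
factor_a = AnovaRow("Factor A", df=2, ss=40.0)
interaction = AnovaRow("A x B x C", df=4, ss=12.0)
residual = AnovaRow("Residual", df=10, ss=50.0)

for row in (factor_a, interaction):
    print(f"{row.source}: MS = {row.ms:.3f}, F = {f_ratio(row, residual):.3f}")
```

An F ratio near 1 means the factor mean square is about the same size as the error mean square; the P value for a given F depends on both degrees of freedom and is not computed in this sketch.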

The F ratio to test the interaction is:

\[ F = \frac{MS_{\text{inter}}}{MS_{\text{error}}} \]

If the F ratio is around 1, you can conclude that there are no significant differences between factor levels or that there is no interaction between factors (i.e., the data groups are consistent with the null hypothesis that all the samples were drawn from the same population).

If F is a large number, you can conclude that at least one of the samples for that factor or combination of factors was drawn from a different population (i.e., the variability is larger than what is expected from random variability in the population). To determine exactly which groups are different, examine the multiple comparison results.

P Value. The P value is the probability of being wrong in concluding that there is a true difference between the groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there are significant differences if P < 0.05.

Power

The power, or sensitivity, of a Three Way ANOVA is the probability that the test will detect the observed difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. The power for the comparison of the groups within the factors and the power for the comparison of the interactions are all displayed. These results are set in the Options for Three Way ANOVA dialog box.

ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

Set the value in the Options for Three Way ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Summary Table

The least square means and standard error of the means are displayed for each factor separately (summary table row and column), and for each combination of factors (summary table cells). If there are missing values, the least square means are estimated using a general linear model.

Mean. The average value for the column. If the observations are normally distributed, the mean is the center of the distribution.

When there are no missing data, the least square means equal the cell and marginal (row and column) means. When there are missing data, the least square means provide the best estimate of these values, using a general linear model. These means and standard errors are used when performing multiple comparisons. For more information, see “Multiple Comparisons” below.

Standard Error of the Mean. A measure of how closely the mean computed from the sample approximates the true population mean.

Multiple Comparisons

If a difference is found among the groups, multiple comparison tables can be computed. Multiple comparison procedures are activated in the Options for Three Way ANOVA dialog box. The tests used in the multiple comparisons are set in the Multiple Comparisons Options dialog box.

Use multiple comparison results to determine exactly which groups are different, since the ANOVA results only inform you that two or more of the groups are different.

Three factor multiple comparison for a full Three Way ANOVA also compares:

Groups within each factor without regard to the other factors (this is a marginal comparison; for example, only the columns or rows in the table are compared).
All combinations of factors (all cells in the table are compared with each other).
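The difference between marginal comparisons and all-cell comparisons can be made concrete by enumerating the pairs each one involves. The sketch below is an illustration only: the factor names, level names, and helper functions are hypothetical, and it lists the pairs that would be compared without performing any statistical test.

```python
from itertools import combinations, product


def marginal_comparisons(levels):
    """Pairs of levels within each factor, ignoring the other factors."""
    return {factor: list(combinations(lv, 2)) for factor, lv in levels.items()}


def cell_comparisons(levels):
    """Pairs of cells, where a cell is one combination of levels of all factors."""
    cells = list(product(*levels.values()))
    return list(combinations(cells, 2))


# Hypothetical three-factor design with 2 x 3 x 2 levels.
levels = {
    "Drug": ["placebo", "active"],
    "Dose": ["low", "medium", "high"],
    "Sex": ["female", "male"],
}

for factor, pairs in marginal_comparisons(levels).items():
    print(factor, "marginal pairs:", len(pairs))
print("all-cell pairs:", len(cell_comparisons(levels)))
```

The number of all-cell pairs grows much faster than the number of marginal pairs, which is one reason the choice between the two kinds of comparison matters when reading the results.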

The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.

All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Bonferroni t-test.

Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are Bonferroni t-test and Dunnett’s test.

For descriptions of the derivations of three way multiple comparison procedure results, you can reference any appropriate statistics reference.

Bonferroni t-test Results

The Bonferroni t-test lists the differences of the means for each pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for that comparison. The Bonferroni t-test can be used to compare all groups or to compare versus a control.

You can conclude from "large" values of t that the difference of the two groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

The Difference of Means is a gauge of the size of the difference between the levels or cells being compared.

The degrees of freedom (DF) for the marginal comparisons are a measure of the number of groups (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction. This is the same as the error or residual degrees of freedom.

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results

The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison.

Dunnett’s test only compares a control group to all other groups. All tests compute the q test statistic and display whether or not P < 0.05 for that pair comparison.

You can conclude from "large" values of q that the difference of the two groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

The Difference of Means is a gauge of the size of the difference between the groups or cells being compared.

The degrees of freedom (DF) for the marginal comparisons are a measure of the number of groups (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction (this is the same as the error or residual degrees of freedom).

p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.

If a group is found to be not significantly different from another group, all groups with p ranks in between the p ranks of the two groups that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.

Three Way ANOVA Report Graphs

You can generate up to four graphs using the results from a Three Way ANOVA. They include a:

Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.
Normal probability plot of the residuals. For more information, see “Normal Probability Plot” on page 549.
Multiple comparison graphs. For more information, see “Multiple Comparison Graphs” on page 555.

How to Create a Three Way ANOVA Report Graph

1. Select the Three Way ANOVA test report.

2. From the menus select:
Graph
Create Result Graph

The Create Result Graph dialog box appears displaying the types of graphs available for the Three Way ANOVA results.

Figure 4-58

3. Select the type of graph you want to create from the Graph Type list, then click OK. The selected graph appears in a graph window.

Kruskal-Wallis Analysis of Variance on Ranks

Use a Kruskal-Wallis ANOVA (analysis of variance) on Ranks when:

You want to see if three or more different experimental groups are affected by a single factor.
Your samples are drawn from non-normal populations or do not have equal variances.

If you know that your data were drawn from normal populations with equal variances, use One Way ANOVA. For more information, see “One Way Analysis of Variance (ANOVA)” on page 80. When there are only two groups to compare, do a Mann-Whitney Rank Sum Test. For more information, see “Mann-Whitney Rank Sum Test” on page 70.

There is no two or three factor test for non-normal populations; however, you can transform your data using Transform menu commands so that it fits the assumptions of a parametric test.

Note: If you selected normality testing in the Options for ANOVA on Ranks dialog box and you perform an ANOVA on Ranks on data from a normal population, SigmaPlot informs you that the data is suitable for a parametric test, and suggests a One Way ANOVA instead.

About the Kruskal-Wallis ANOVA on Ranks

The Kruskal-Wallis Analysis of Variance on Ranks compares several different experimental groups that receive different treatments. This design is essentially the same as a Mann-Whitney Rank Sum Test, except that there are more than two experimental groups. If you try to perform an ANOVA on Ranks on two groups, SigmaPlot tells you to perform a Rank Sum Test instead. For more information, see “Mann-Whitney Rank Sum Test” on page 70.

The Kruskal-Wallis ANOVA on Ranks is a nonparametric test that does not require assuming all the samples were drawn from normally distributed populations with equal variances. The null hypothesis you test is that there is no difference in the distribution of values between the different groups.

Performing an ANOVA on Ranks

To perform an ANOVA on Ranks:

1. Enter or arrange your data appropriately in the worksheet.
2. If desired, set the ANOVA on Ranks options.
3. From the menus select:
Statistics
Compare Many Groups
ANOVA on Ranks
4. Run the test.
5. Specify the multiple comparisons you want to perform on your test.

6. View and interpret the ANOVA on Ranks report.
7. Generate report graphs.

Arranging ANOVA on Ranks Data

The format of the data to be tested can be raw data or indexed data. Raw data is placed in as many columns as there are groups, up to 64; each column contains the data for one group. Indexed data is placed in two worksheet columns with at least three treatments. If you have fewer than three treatments you should use the Rank Sum Test. For more information, see “Mann-Whitney Rank Sum Test” on page 70.

Figure 4-59 Valid Data Formats for an ANOVA on Ranks

Columns 1 through 3 are arranged as raw data. Columns 4 and 5 are arranged as indexed data, with column 4 as the factor column and column 5 as the data column.
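The two layouts hold the same information, so one can be converted to the other mechanically. The sketch below is a plain Python illustration of that idea and does not use any SigmaPlot interface; the function names, group names, and values are hypothetical.

```python
def raw_to_indexed(raw):
    """Convert raw data (one list per group) to indexed data.

    raw maps a group name to its list of observations; None marks a
    missing value and is skipped. Returns (factor_column, data_column).
    """
    factor_column, data_column = [], []
    for group, values in raw.items():
        for value in values:
            if value is None:
                continue
            factor_column.append(group)
            data_column.append(value)
    return factor_column, data_column


def indexed_to_raw(factor_column, data_column):
    """Convert indexed data back to one list of observations per group."""
    raw = {}
    for group, value in zip(factor_column, data_column):
        raw.setdefault(group, []).append(value)
    return raw


# Hypothetical worksheet with three treatment groups (raw format).
raw = {
    "Control": [4.1, 3.8, None, 4.4],
    "Drug A": [5.0, 5.6, 5.2],
    "Drug B": [6.3, 5.9, 6.8, 6.1],
}

factor_column, data_column = raw_to_indexed(raw)
for group, value in zip(factor_column, data_column):
    print(group, value)
```

In the indexed layout the factor column plays the role of column 4 in the figure and the data column plays the role of column 5.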

Setting the ANOVA on Ranks Options

Use the ANOVA on Ranks options to:

Adjust the parameters of the test to relax or restrict the testing of your data for normality and equal variance.
Display the summary table.
Enable multiple comparison testing.

Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

To change the ANOVA on Ranks options:

1. Select ANOVA on Ranks from the Standard toolbar drop-down list.
2. From the menus select:
Statistics
Current Test Options

The Options for ANOVA on Ranks dialog box appears with three tabs:

Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for ANOVA on Ranks: Assumption Checking” on page 151.
Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for ANOVA on Ranks: Results” on page 152.
Post Hoc Test. Compute the power or sensitivity of the test and enable multiple comparisons. For more information, see “Options for ANOVA on Ranks: Post Hoc Tests” on page 152.

3. To continue the test, click Run Test.
4. To accept the current settings, click OK.

Options for ANOVA on Ranks: Assumption Checking

Click the Assumption Checking tab from the options dialog box to view the Normality and Equal Variance options. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Figure 4-60

Normality. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. Enter the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude. However, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for ANOVA on Ranks: Results

The Summary Table for an ANOVA on Ranks lists the medians, percentiles, and sample sizes N in the ANOVA on Ranks report. If desired, change the percentile values by editing the boxes. The 25th and the 75th percentiles are the suggested percentiles.

Figure 4-61 The Options for ANOVA on Ranks Dialog Box Displaying the Summary Table Option

Options for ANOVA on Ranks: Post Hoc Tests

Select the Post Hoc Test tab in the Options dialog box to view the multiple comparisons options. An ANOVA on Ranks tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the size of these differences. Multiple comparisons isolate these differences.

The P value used to determine if the ANOVA detects a difference is set in the Report Options dialog box. If the P value produced by the ANOVA on Ranks is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

Figure 4-62

Multiple Comparisons. You can choose to always perform multiple comparisons or to only perform multiple comparisons if the ANOVA on Ranks detects a difference.

Always Perform. Select to perform multiple comparisons whether or not the ANOVA detects a difference.

Only When ANOVA P Value is Significant. Select to perform multiple comparisons only if the ANOVA detects a difference.

Significance Value for Multiple Comparisons. Select a value from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is less than a 5% chance that the multiple comparison is incorrect in detecting a difference.

Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method. For more information, see “Multiple Comparison Options for ANOVA on Ranks” on page 156.

Note: Because no statistical test eliminates uncertainty, multiple comparison tests sometimes produce ambiguous groupings.
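The two Post Hoc Test settings amount to a simple decision rule. The following Python sketch illustrates that rule under stated assumptions; it is not SigmaPlot's implementation. The function and parameter names are hypothetical, and the comparison uses "less than or equal to" because the multiple comparison sections of this chapter describe the trigger as "equal to or less than the trigger P value".

```python
def multiple_comparisons_triggered(anova_p, always_perform, trigger_p=0.05):
    """Return True if multiple comparisons would be run.

    anova_p        -- P value produced by the ANOVA on Ranks
    always_perform -- True for "Always Perform", False for
                      "Only When ANOVA P Value is Significant"
    trigger_p      -- P value the ANOVA result is compared against
                      (0.05 is used here only as an example)
    """
    if always_perform:
        # Run the comparisons whether or not the ANOVA detects a difference.
        return True
    # Assumption: a P value equal to the trigger also counts as significant.
    return anova_p <= trigger_p


for p in (0.01, 0.05, 0.20):
    print(p, multiple_comparisons_triggered(p, always_perform=False))
```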

Running an ANOVA on Ranks

If you want to select your data before you run the test, drag the pointer over your data.

To run an ANOVA on Ranks:

1. From the menus select:
Statistics
Compare Many Groups
ANOVA on Ranks

The Pick Columns for ANOVA on Ranks dialog box appears prompting you to specify a data format.

Figure 4-63 The Pick Columns for ANOVA on Ranks Dialog Box Prompting You to Specify A Data Format

2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Group Comparison Tests” on page 52.

3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list.

Figure 4-64 The Pick Columns for ANOVA on Ranks Dialog Box Prompting You to Select Data Columns

The number or title of selected columns appears in each row. You are prompted to pick a minimum of two and a maximum of 64 columns for raw data, and two columns with at least three treatments for indexed data. If you have fewer than three treatments, a message appears telling you to use the Rank Sum Test. For more information, see “Mann-Whitney Rank Sum Test” on page 70.

5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

6. Click Finish to perform the ANOVA on Ranks.

7. If you elected to test for normality and equal variance, and your data passes both tests, either continue or transform your data, then perform the Two Way ANOVA on the transformed data.

The ANOVA on Ranks report appears if you:

Elected to test for normality and equal variance, and your data fails either test.
Selected not to perform multiple comparisons, or if you selected to run multiple comparisons only when the P value is significant, and the P value is not significant.

For more information, see “Interpreting ANOVA on Ranks Results” on page 157.

8. To edit the report, use the Format menu commands.

9. If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Multiple Comparison Options for ANOVA on Ranks” on page 156.

Multiple Comparison Options for ANOVA on Ranks

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value, for either of the two factors or the interaction between the two factors, equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for ANOVA on Ranks dialog box, the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test.

This dialog box displays the P values for each of the two experimental factors and of the interaction between the two factors. Only the options with P values less than or equal to the value set in the Options dialog box are selected. You can disable multiple comparison testing for a factor by clicking the selected option. If no factor is selected, multiple comparison results are not reported.

There are four multiple comparison tests to choose from for the ANOVA on Ranks. You can choose to perform the:

Dunn’s Test
Dunnett’s Test
Tukey Test
Student-Newman-Keuls Test

There are two types of multiple comparison available for the ANOVA on Ranks. The types of comparison you can make depend on the selected multiple comparison test.

All pairwise comparisons test the difference between each treatment or level within the two factors separately (i.e., among the different rows and columns of the data table).

Multiple comparisons versus a control test the difference between all the different combinations of each factor (i.e., all the cells in the data table).

Interpreting ANOVA on Ranks Results

The ANOVA on Ranks report displays the H statistic (corrected for ties) and the corresponding P value for H. The other results displayed in the report are enabled and disabled in the Options for ANOVA on Ranks dialog box.

For descriptions of the derivations for ANOVA on Ranks results, you can reference any appropriate statistics reference.

Figure 4-65 The ANOVA on Ranks Results Report

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text in the Options dialog box.

You can also set the number of decimal places to display in the Options dialog box.

Normality Test

Normality test results display whether the data passed or failed the test of the assumption that it was drawn from a normal population and the P value calculated by the test. For nonparametric procedures, this test can fail, since nonparametric tests do not assume normally distributed source populations. These results appear unless you disabled normality testing in the Options for ANOVA on Ranks dialog box.

Equal Variance Test

Equal Variance test results display whether the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance and the P value calculated by the test. Nonparametric tests do not assume equal variances of the source populations. These results appear unless you disabled equal variance testing in the Options for ANOVA on Ranks dialog box.

Summary Table

If you selected this option in the Options for ANOVA on Ranks dialog box, SigmaPlot generates a summary table listing the medians, the percentiles defined in the Options dialog box, and sample sizes N.

N (Size). The number of non-missing observations for that column or group.

Missing. The number of missing values for that column or group.

Median. The "middle" observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.

Percentiles. The two percentile points that define the upper and lower tails of the observed values.

H Statistic

The test used in the multiple comparison procedure is selected in the Multiple Comparison Options dialog box. Student-Newman-Keuls test and Dunn’s test.e. the variability among the average ranks is larger than expected from random variability in the population. The comparison versus a control tests are Dunnett’s test and Dunn’s test. since the ANOVA results only inform you that two or more of the groups are different. The smaller the P value. the differences between the groups are statistically significant).. the actual distribution of H is used. All pairwise comparison results list comparisons of all possible combinations of group pairs: the all pairwise tests are the Tukey. The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control. The P value is the probability of being wrong in concluding that there is a true difference in the groups (i. Multiple Comparisons If a difference is found among the groups. The control group is selected during the actual multiple comparison procedure. this value is compared to the chi-square distribution (the estimate of all possible distributions of H) to determine the possibility of this H occurring. The average value of the ranks for each treatment group are computed and compared. a table of the comparisons between group pairs is displayed. If H is small. you can conclude there are significant differences when P < 0.e. The multiple comparison procedure is activated in the Options for ANOVA on Ranks dialog box.05. no treatment effect)..159 Comparing Two or More Groups The ANOVA on Ranks test statistic H is computed by ranking all observations from smallest to largest without regard for treatment group. You can conclude that the data is consistent with the null hypothesis that all the samples were drawn from the same population (i. the average ranks observed in each treatment group are approximately the same. If H is a large number. 
Comparisons versus a single control list only comparisons with the selected control group. . based on H). or committing a Type I error. Multiple comparison results are used to determine exactly which groups are different. and you can conclude that the samples were drawn from different populations (i.e. P Value. and you requested and elected to perform multiple comparisons. the greater the probability that the samples are significantly different. Traditionally. For small sample sizes.. For large sample sizes. the probability of falsely rejecting the null hypothesis.
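The H statistic described in this section can be sketched in a few lines of Python. This is an illustrative sketch of the standard Kruskal-Wallis formula with the usual tie correction, not SigmaPlot's own code; the function names `average_ranks` and `h_statistic` are hypothetical, and the small-sample exact distribution of H is not covered.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def h_statistic(groups):
    """Return H corrected for ties for a list of groups (lists of numbers)."""
    pooled = [x for g in groups for x in g]  # rank without regard for group
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    total = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N), where t is the
    # size of each set of tied values. Undefined if every value is identical.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))
```

For three fully separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` the average ranks differ as much as possible and H is large; for groups with identical values H is zero.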

For descriptions of the derivations of nonparametric multiple comparison results, you can reference any appropriate statistics reference.

Tukey, Student-Newman-Keuls, and Dunnett’s Test Results
The Tukey and Student-Newman-Keuls (SNK) tests are all pairwise comparisons of every combination of group pairs. Dunnett’s test only compares a control group to all other groups. All tests compute the q test statistic. They also display the number of rank sums spanned in the comparison p, and display whether or not P < 0.05 or < 0.01 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference. The Difference of Ranks is a gauge of the size of the real difference between the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group rank sums are ranked in order from largest to smallest in an SNK or Dunnett’s test, so p is the number of rank sums spanned in the comparison. For example, when comparing four rank sums, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.
If a group is found to be not significantly different than another group, all groups with ranks in between the rank sums of the two groups that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.

Dunn’s Test Results
Dunn’s test is used to compare all groups or to compare versus a control. Dunn’s test lists the difference of rank means, computes the Q test statistic, and displays whether or not P < 0.05, for each group pair.
You can conclude from "large" values of Q that the difference of the two groups being compared is statistically significant. If the P value for the comparison is less than 0.05, the probability of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

The Difference of Rank Means is a gauge of the size of the difference between the two groups.

ANOVA on Ranks Report Graphs
You can generate up to three graphs using the results from an ANOVA on Ranks. They include a:
Point plot of the column data.
Box plot.
Multiple comparison graphs.

How to Create an ANOVA on Ranks Graph
1. Select the ANOVA on Ranks test report.
2. From the menus select:
Graph Create Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the ANOVA on Ranks results.
3. Select the type of graph you want to create from the Graph Type list, then click OK. The selected graph appears in a graph window. For more information, see “Generating Report Graphs” on page 539.

Performing a Multiple Comparison
The multiple comparison test you choose depends on the treatments you are testing. Click Cancel if you do not want to perform a multiple comparison test.

Figure 4-66 The Multiple Comparison Options Dialog Box

To perform a multiple comparison test:
1. Select which factors you wish to compare under Select Factors to Compare. This option is automatically selected if the P value produced by the ANOVA (displayed in the upper left corner of the dialog box) is less than or equal to the P value set in the Options dialog box, and multiple comparisons are performed. If the P value displayed in the dialog box is greater than the P value set in the Options dialog box, multiple comparisons are not performed. For more information, see “Interpreting One Way ANOVA Results” on page 90.
2. Select the desired multiple comparison test from the Suggested Test drop-down list.
3. Select a Comparison Type. The types of comparisons available depend on the selected test. All Pairwise compares all possible pairs of treatments and is available for the Tukey, Student-Newman-Keuls, Bonferroni, Fisher LSD, and Duncan’s tests. Versus Control compares all experimental treatments to a single control group and is available for the Tukey, Bonferroni, Fisher LSD, Dunnett’s, and Duncan’s tests. It is not recommended for the Tukey, Fisher LSD, or Duncan’s test. If you select Versus Control, you must also select the control group from the list of groups.
4. If you selected an all pairwise comparison test, click Finish to continue with the test and view the report.
5. If you selected a multiple comparisons versus a control test, click Next. The Multiple Comparisons Options dialog box prompts you to select a control group.
6. Select the desired control group from the list, then click Finish to continue the test and view the report.

Holm-Sidak Test
Use the Holm-Sidak Test for both pairwise comparisons and comparisons versus a control group. It is more powerful than the Tukey and Bonferroni tests and, consequently, is able to detect differences that these other tests do not. It is recommended as the first-line procedure for pairwise comparison testing.

When performing the Holm-Sidak test, the P values of all comparisons are computed and ordered from smallest to largest. Each P value is then compared to a critical level that depends upon the significance level of the test (set in the test options), the rank of the P value, and the total number of comparisons made. A P value less than the critical level indicates there is a significant difference between the corresponding two groups.

Tukey Test
The Tukey Test and the Student-Newman-Keuls test are conducted similarly to the Bonferroni t-test, except that they use a table of critical values that is computed based on a better mathematical model of the probability structure of the multiple comparisons. The Tukey Test is more conservative than the Student-Newman-Keuls test, because it controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is more conservative, it is less likely to determine that a given difference is statistically significant and it is the recommended test for all pairwise comparisons.

Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted similarly to the Bonferroni t-test, except that they use a table of critical values that is computed based on a better mathematical model of the probability structure of the multiple comparisons. The Student-Newman-Keuls Test is less conservative than the Tukey Test because it controls errors among tests of k means, while the Tukey Test controls the errors of all comparisons simultaneously. Because it is less conservative, it is more likely to determine that a given difference is statistically significant. The Student-Newman-Keuls Test is usually more sensitive than the Bonferroni t-test, and is only available for all pairwise comparisons.

Bonferroni t-Test
The Bonferroni t-test performs pairwise comparisons with paired t-tests. The P values are then multiplied by the number of comparisons that were made. It can perform both all pairwise comparisons and multiple comparisons versus a control, and is the most conservative test for both comparison types. For less conservative all pairwise comparison tests, see the Tukey and the Student-Newman-Keuls tests, and for the less conservative multiple comparison versus a control tests, see the Dunnett’s Test.

Fisher’s Least Significant Difference Test
Fisher’s Least Significant Difference (LSD) Test is the least conservative of the all pairwise comparison tests. Unlike the Tukey and Student-Newman-Keuls tests, it controls the error rate of individual comparisons and does not control the family error rate, where the "family" is the whole set of comparisons. Because of this it is not recommended.

Dunnett’s Test
Dunnett’s test is the analog of the Student-Newman-Keuls Test for the case of multiple comparisons against a single control group. It is conducted similarly to the Bonferroni t-test, but with a more sophisticated mathematical model of the way the error accumulates in order to derive the associated table of critical values for hypothesis testing. This test is less conservative than the Bonferroni Test, and is only available for multiple comparisons versus a control.

Dunn’s Test
Dunn’s test must be used for ANOVA on Ranks when the sample sizes in the different treatment groups are different. You can perform both all pairwise comparisons and multiple comparisons versus a control with the Dunn’s test. The all pairwise Dunn’s test is the default for data with missing values.

Duncan’s Multiple Range
The Duncan’s Test is conducted the same way as the Tukey and the Student-Newman-Keuls tests, except that it is less conservative in determining whether the difference between groups is significant, by allowing a wider range for error rates. Although it has a greater power to detect differences than the Tukey and the Student-Newman-Keuls tests, it has less control over the Type 1 error rate, and is, therefore, not recommended.
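The Bonferroni and Holm-Sidak adjustments described above can be sketched as follows. This is a minimal illustration that starts from P values already computed for each comparison. It assumes the commonly published Holm-Sidak critical level, 1 - (1 - alpha)^(1/(m - rank + 1)), and the usual step-down rule of stopping at the first comparison that fails; the manual does not spell out either detail, and the function names are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm_sidak(p_values, alpha=0.05):
    """Return one boolean per comparison: True where the difference is significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P first
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        # Critical level depends on alpha, the rank of the P value,
        # and the total number of comparisons (assumed formula).
        critical = 1 - (1 - alpha) ** (1 / (m - rank + 1))
        if p_values[idx] < critical:
            significant[idx] = True
        else:
            break  # assumed step-down rule: all larger P values also fail
    return significant
```

With P values of 0.001, 0.04, and 0.03 at alpha = 0.05, only the first comparison is flagged: the second-smallest P value (0.03) exceeds its critical level of about 0.0253, so testing stops there.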


For more information. see “Arranging One Sample t-Test Data” on page 168. Enter or arrange your data appropriately in the worksheet. For more information. 167 . 5. For more information. see “Interpreting One Sample t-Test Results” on page 172. For more information. set the t-test options. Run the test. 3. see “Running a One Sample t-Test” on page 171. see “Setting One Sample t-Test Data Options” on page 168. From the menus select: Statistics Single Group One Sample t-test 4. 2.Chapter 5 One Sample t-Test About the One Sample t-Test Use the One-Sample t-Test when you want to test the hypothesis that the mean of a sampled normally-distributed population equals a specified value. Performing a One Sample t-Test To perform an a One Sample t-test: 1. If desired. View and interpret the t-test report.

6. Generate report graphs. For more information, see "t-Test Report Graphs" in Chapter 4.

Arranging One Sample t-Test Data
The format of the data to be tested can be:
Raw. The raw data format uses separate worksheet columns for the data in each group.
Mean, size, standard deviation. This data format places the mean, sample size, and standard deviation in separate worksheet columns.
Mean, size, standard error. This data format places the mean, sample size, and standard error in separate worksheet columns.

Setting One Sample t-Test Data Options

Options for One Sample t-Test: Criterion
Test Mean. Enter the test, or hypothesized, population mean.

Options for One Sample t-Test: Assumption Checking
The normality assumption test checks for a normally distributed population.
Normality Testing. SigmaPlot uses the Shapiro-Wilk or Kolmogorov-Smirnov test to test for a normally distributed population.
P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes. The default setting is 0.050.
To require a stricter adherence to normality and equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.
Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for One Sample t-Test: Results
Summary Table. Displays the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.
Confidence Intervals. Displays the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).
Residuals in Column. Displays residuals in the report and saves the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.

Use Alpha Value. or that you are willing to conclude there is a significant difference when P < 0. . but a greater possibility of concluding there is no difference when one exists.05. Confidence Intervals.05. but also increase the risk of reporting a false positive.170 Chapter 5 Figure 5-1 The Options for One Sample t-Test Dialog Box Displaying the Summary Table. Larger values of a make it easier to conclude that there is a difference. The suggested value is 0. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference. Smaller values of a result in stricter requirements before concluding there is a significant difference. Alpha is the acceptable probability of incorrectly concluding that there is a difference. and Residuals Options Options for One Sample t-Test: Post Hoc Tests Power. This indicates that a one in twenty chance of error is acceptable.

Figure 5-2 The Options for One Sample t-Test Dialog Box Displaying the Power Option

Running a One Sample t-Test
If you want to select your data before you run the test, drag the pointer over your data.
1. From the menus select:
Statistics Compare Two Groups One Sample t-test
The Pick Columns for t-test dialog box appears prompting you to specify a data format.
2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Arranging One Sample t-Test Data” on page 168.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns. For statistical summary data you are prompted to select three columns.
5. To change your selections, select the assignment in the list, then select new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the t-test on the selected columns. After the computations are completed, the report appears.

Interpreting One Sample t-Test Results
The One Sample t-test calculates the t statistic, degrees of freedom, and P value of the specified data. These results are displayed in the One Sample t-Test report which automatically appears after the One Sample t-Test is performed. The other results displayed in the report are enabled and disabled in the Options for t-Test dialog box. For descriptions of the derivations for t-test results, you can reference any appropriate statistics reference.
Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the up and down arrow buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can enable or disable this explanatory text in the Options dialog box.

Normality Test. Normality test results show whether the data passed or failed the test of the assumption that the samples were drawn from normal populations and the P value calculated by the test. All parametric tests require normally distributed source populations.

Summary Table. SigmaPlot can generate a summary table listing the sizes N for the two samples, number of missing values, means, standard deviations, and the standard error of the means (SEM). This result is displayed unless you disable Summary Table in the Options for t-test dialog box.
N (Size). The number of non-missing observations for that column or group.


Missing. The number of missing values for that column or group.
Mean. The average value for the column. If the observations are normally distributed, the mean is the center of the distribution.
Standard Deviation. A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.
Standard Error of the Mean. A measure of the approximation with which the mean computed from the sample approximates the true population mean.
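The summary table entries and the t statistic can be illustrated with a short Python sketch. It is not SigmaPlot's implementation: it assumes the sample standard deviation uses n - 1 in the denominator, represents missing worksheet cells as `None`, and uses the hypothetical function names `summary_table` and `one_sample_t`. The P value is omitted because it requires the t distribution.

```python
import math


def summary_table(column):
    """Summarize one worksheet column; None marks a missing value."""
    data = [x for x in column if x is not None]
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (assumes n - 1 in the denominator).
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return {
        "n": n,                        # N (Size): non-missing observations
        "missing": len(column) - n,    # Missing
        "mean": mean,                  # Mean
        "std_dev": std_dev,            # Standard Deviation
        "sem": std_dev / math.sqrt(n), # Standard Error of the Mean
    }


def one_sample_t(column, test_mean):
    """Return (t, degrees_of_freedom) for the hypothesis mean == test_mean."""
    s = summary_table(column)
    t = (s["mean"] - test_mean) / s["sem"]
    return t, s["n"] - 1
```

For the column `[2, 4, None, 6, 8]` tested against a hypothesized mean of 3, the sketch reports N = 4, one missing value, a mean of 5, and 3 degrees of freedom.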

One Sample t-Test Report Graphs
You can generate up to three graphs using the results from a t-test. They include a:
Scatter plot with error bars of the column means. The one sample t-test scatter plot graphs the group means as single points with error bars indicating the standard deviation. For more information, see "Scatter Plot" in Chapter 11.
Histogram of the residuals. The one sample t-test histogram plots the raw residuals in a specified range, using a defined interval set. For more information, see “Histogram of Residuals” on page 547.
Normal probability plot of the residuals. The one sample t-test probability plot graphs the frequency of the raw residuals. For more information, see “Normal Probability Plot” on page 549.

How to Create a Graph of the One Sample t-Test Data
1. Select the One Sample t-Test report. 2. On the menus choose:
Graph Create Graph

The Create Graph dialog box appears displaying the types of graphs available for the One Sample t-Test results.


Figure 5-3 The Create Graph Dialog Box for the One Sample t-test Report

3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window. For more information, see “Generating Report Graphs” on page 539.

Chapter 6

Comparing Repeated Measurements of the Same Individuals

Use repeated measures procedures to test for differences in the same individuals before and after one or more different treatments or changes in condition. When comparing random samples from two or more groups consisting of different individuals, use group comparison tests. For more information, see “Choosing the Procedure to Use” on page 22.

About Repeated Measures Tests
Repeated measures tests are used to detect significant differences in the mean or median effect of treatment(s) within individuals beyond what can be attributed to random variation of the repeated treatments. Variation among individuals is taken into account, allowing you to concentrate on the effect of the treatments rather than the differences between individuals. For more information, see “Choosing the Repeated Measures Test to Use” on page 34.

Parametric and Nonparametric Tests
Parametric tests assume treatment effects are normally distributed with the same variances (or standard deviations). Parametric tests are based on estimates of the population means and standard deviations, the parameters of a normal distribution. Nonparametric tests do not assume that the treatment effects are normally distributed. Instead, they perform a comparison on ranks of the observed effects.
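To make "a comparison on ranks of the observed effects" concrete, the sketch below ranks the before-and-after changes and sums the signed ranks, which is the textbook basis of a signed rank test. It is an illustration only: the manual does not give SigmaPlot's exact computation, the function names are hypothetical, and dropping zero changes and averaging tied ranks are common conventions assumed here.

```python
def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1   # 1-based position of the first tie
        count = sorted_vals.count(v)       # number of values tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def signed_rank_sum(before, after):
    """Sum of the ranks of |change|, each carrying the sign of the change."""
    # Assumed convention: individuals with no change are dropped.
    changes = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(c) for c in changes])
    return sum(r if c > 0 else -r for c, r in zip(changes, ranks))
```

Because only the ordering of the changes enters the result, one extreme observation moves the statistic no more than any other largest change would, which is why no normality assumption is needed.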


Comparing Individuals Before and After a Single Treatment
Use before and after comparisons to test the effect of a single experimental treatment on the same individuals. There are two tests available:
The Paired t-test. This is a parametric test. For more information, see “Paired t-Test” on page 177.
Wilcoxon Signed Rank Test. This is a nonparametric test. For more information, see “Wilcoxon Signed Rank Test” on page 190.

Comparing Individuals Before and After Multiple Treatments
Use repeated measures procedures to test the effect of more than one experimental treatment on the same individuals. There are three tests available:
One Way Repeated Measures ANOVA. A parametric test comparing the effect of a single series of treatments or conditions. For more information, see “One Way Repeated Measures Analysis of Variance (ANOVA)” on page 200.
Two Way Repeated Measures ANOVA. A parametric test comparing the effect of two factors, where one or both factors are a series of treatments or conditions. For more information, see “Two Way Repeated Measures Analysis of Variance (ANOVA)” on page 218.
Friedman One Way Repeated Measures ANOVA on Ranks. The nonparametric analog of One Way Repeated Measures ANOVA. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239.
If you use one of these procedures to compare multiple treatments and find a statistically significant difference, you can use several multiple comparison procedures to determine exactly which treatments had an effect, and the size of the effect. These procedures are described for each test.

Data Format for Repeated Measures Tests
You can arrange repeated measures test data in the worksheet as: Columns for each treatment (raw data). For more information, see “Raw Data” on page 177.


Data indexed to other column(s). For more information, see “Indexed Data” on page 177. You cannot use the summary statistics for repeated measures tests. Note: You can perform repeated measures tests on a portion of the data by selecting a block on the worksheet before choosing the test. If you plan to do this, make sure that all data columns are adjacent to each other.

Raw Data
To enter data in raw data format, enter the data for each treatment in separate worksheet columns. You can use raw data for all tests except Two Way ANOVAs. Note: The worksheet columns for raw data must be the same length. If a missing value is encountered, that individual is either ignored or, for parametric ANOVAs, a general linear model is used to take advantage of all available data.

Indexed Data
Indexed data contains the treatments in one column and the corresponding data points in another column. A One Way Repeated Measures ANOVA requires a subject index in a third column. Two Way Repeated Measures ANOVA requires an additional factor column, for a total of four columns. If you plan to compare only a portion of the data, put the treatment index in the left column, followed by the second factor index (for Two Way ANOVA only), then the subject index (for Repeated Measures ANOVA), and finally the data in the right-most column. Note: You can index raw data or convert indexed data to raw data.
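The relationship between the raw and indexed layouts can be shown with a small Python sketch. The dictionary and tuple representations are assumptions made for illustration (SigmaPlot works on worksheet columns), and the function names are hypothetical. The indexed rows follow the column order recommended above for a one way design: treatment index, then subject index, then data.

```python
def raw_to_indexed(raw):
    """raw maps each treatment name to a list of values, one per subject."""
    rows = []
    for treatment, values in raw.items():
        for subject, value in enumerate(values, start=1):
            rows.append((treatment, subject, value))  # treatment, subject, data
    return rows


def indexed_to_raw(rows):
    """Rebuild one column per treatment, ordered by subject index."""
    raw = {}
    for treatment, subject, value in sorted(rows, key=lambda r: r[1]):
        raw.setdefault(treatment, []).append(value)
    return raw
```

Two raw columns of equal length, for example `{"Before": [5.1, 4.8], "After": [5.9, 5.0]}`, become four indexed rows, and converting back restores the original columns.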

Paired t-Test
The Paired t-test is a parametric statistical method that assumes the observed treatment effects are normally distributed. It examines the changes which occur before and after a single experimental intervention on the same individuals to determine whether or not the treatment had a significant effect. Examining the changes rather than the values observed before and after the intervention removes the differences due to individual responses, producing a more sensitive, or powerful, test.
Use Paired t-test when:
You want to see if the effect of a single treatment on the same individual is significant.
The treatment effects (i.e., the changes in the individuals before and after the treatment) are normally distributed.
If you know that the distribution of the observed effects is non-normal, use the Wilcoxon Signed Rank Test. For more information, see “Wilcoxon Signed Rank Test” on page 190. If you are comparing the effect of multiple treatments on the same individuals, do a Repeated Measures Analysis of Variance. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239.
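The idea of testing the changes rather than the raw values can be sketched in Python. This is an illustrative calculation of the standard paired t statistic, not SigmaPlot's code; the function name `paired_t` is hypothetical, the two lists are assumed to have the same length with no missing values, and the P value is omitted.

```python
import math


def paired_t(before, after):
    """Return (t, degrees_of_freedom) computed from the per-individual changes."""
    changes = [a - b for b, a in zip(before, after)]
    n = len(changes)
    mean_change = sum(changes) / n
    # Standard deviation of the changes (n - 1 in the denominator).
    sd = math.sqrt(sum((c - mean_change) ** 2 for c in changes) / (n - 1))
    t = mean_change / (sd / math.sqrt(n))
    return t, n - 1
```

Because each individual serves as its own baseline, differences between individuals cancel out of `changes`, which is what makes the test more sensitive than comparing two independent groups.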

Performing a Paired t-test
To perform a Paired t-test: 1. Enter or arrange your data in the worksheet. For more information, see “Arranging Paired t-Test Data” on page 179. 2. If desired, set the Paired t-test options. For more information, see “Setting Paired t-Test Options” on page 179. 3. From the menus select:
Statistics Before and After Paired t-test

4. Run the test. For more information, see “Running a Paired t-Test” on page 183. 5. View and interpret the Paired t-test report. For more information, see “Interpreting Paired t-Test Results” on page 185. 6. Generate report graphs. For more information, see “Paired t-Test Report Graphs” on page 188.


Arranging Paired t-Test Data
The format of the data to be tested can be raw data or indexed data. The data is placed in two worksheet columns for raw data and three columns (a subject, factor, and data column) for indexed data. The columns for raw data must be the same length. If a missing value is encountered, that individual is ignored. You cannot use statistical summary data for repeated measures tests.

Setting Paired t-Test Options
Use the Paired t-test options to: Adjust the parameters of a test to relax or restrict the testing of your data for normality. Display the statistics summary and the confidence interval for the data. Compute the power, or sensitivity, of the test. Options settings are saved between SigmaPlot sessions.
To change the Paired t-test options:

1. Select Paired t-test from the toolbar drop-down list. 2. From the menus click:
Statistics Current Test Options

The Options for Paired t-test dialog box appears with three tabs:
Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for Paired t-test: Assumption Checking” on page 180.
Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for Paired t-Test: Results” on page 181.
Post Hoc Tests. Compute the power or sensitivity of the test. For more information, see “Options for Paired t-Test: Post Hoc Tests” on page 182.


Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data. Options settings are saved between SigmaPlot sessions.
3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see “Running a Paired t-Test” on page 183. To accept the current settings and close the options dialog box, click OK.

Options for Paired t-test: Assumption Checking
The normality assumption test checks for a normally distributed population. Note: Equal Variance is not available for the Paired t-test because Paired t-tests are based on changes in each individual rather than on different individuals in the selected population, making equal variance testing unnecessary.
Figure 3-1 The Options for Paired t-test Dialog Box Displaying the Assumption Checking Options

Normality. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population. P Value to Reject. Enter the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.


To require a stricter adherence to normality, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal. To relax the requirement of normality, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100. Note: Although the normality test is robust in detecting data from populations that are non-normal, there are extreme conditions of data distribution that this test cannot take into account; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption test.
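The pass or fail rule for the normality test reduces to a single comparison, sketched below. The function name `normality_passes` is hypothetical, and the P value computed by the normality test is taken as an input rather than calculated.

```python
def normality_passes(p_computed, p_to_reject=0.050):
    """The test passes when the computed P is greater than the P Value to Reject."""
    return p_computed > p_to_reject
```

For example, a computed P of 0.08 passes at the suggested setting of 0.050 but fails at 0.100, which shows why increasing the P Value to Reject demands stricter adherence to normality.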

Options for Paired t-Test: Results
Summary Table. Displays the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group. Confidence Intervals. Displays the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals). Residuals in Column. Displays residuals in the report and saves the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.

182 Chapter 6

Figure 3-2

Options for Paired t-Test: Post Hoc Tests
Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference. Use Alpha Value. Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.


Figure 3-3 The Options for Paired t-test Dialog Box Displaying the Power Option

Running a Paired t-Test
If you want to select your data before you run the test, drag the pointer over your data.
1. On the menus click:
Statistics Before and After Paired t-test

The Pick Columns for Paired t-test dialog box appears prompting you to specify a data format.
Figure 1-1 The Pick Columns for Paired t-test Dialog Box Prompting You to Specify a Data Format


2. Select the appropriate data format (Raw or Indexed) from the Data Format drop-down list. For more information, see “Data Format for Repeated Measures Tests” on page 176.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
Figure 3-1 The Pick Columns for Paired t-test Dialog Box Prompting You to Select Data Columns

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The titles of selected columns appear in each row. For raw and indexed data, you are prompted to select two worksheet columns. For statistical summary data you are prompted to select three columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the t-test on the selected columns. After the computations are completed, the report appears. For more information, see “Interpreting Paired t-Test Results” on page 185.


Interpreting Paired t-Test Results
The Paired t-test report displays the t statistic, degrees of freedom, and P value for the test. The other results displayed in the report are selected in the Options for Paired t-test dialog box. For more information, see “Setting Paired t-Test Options” on page 179. For descriptions of the derivations for paired t-test results, consult an appropriate statistics reference.
Figure 6-1 The Paired t-Test Report

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.


Normality Test
Normality test results display whether the data passed or failed the test of the assumption that the changes observed in each subject are consistent with a normally distributed population, and the P value calculated by the test. A normally distributed source is required for all parametric tests. This result appears unless you disabled normality testing in the Paired t-test Options dialog box. For more information, see “Setting Paired t-Test Options” on page 179.

Summary Table
SigmaPlot can generate a summary table listing the sample size N, number of missing values (if any), mean, standard deviation, and standard error of the mean (SEM). This result is displayed unless you disabled it in the Paired t-test Options dialog box. For more information, see “Setting Paired t-Test Options” on page 179.
N (Size). The number of non-missing observations for that column or group.
Missing. The number of missing values for that column or group.
Mean. The average value for the column. If the observations are normally distributed, the mean is the center of the distribution.
Standard Deviation. A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.
Standard Error of the Mean. A measure of how closely the mean computed from the sample approximates the true population mean.
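As an illustration of the summary table quantities, the following Python sketch computes N, Missing, Mean, Standard Deviation, and SEM for one column. It is not SigmaPlot's code: the function name, the use of None to mark a missing value, and the sample data are assumptions made for the example, and the standard deviation is the usual sample (n - 1) form.

```python
import math
import statistics


def summarize(column):
    """Summarize one worksheet column; None marks a missing value (assumption)."""
    values = [v for v in column if v is not None]
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)   # sample standard deviation (n - 1)
    sem = std_dev / math.sqrt(n)         # standard error of the mean
    return {
        "N": n,
        "Missing": len(column) - n,
        "Mean": mean,
        "Std Dev": std_dev,
        "SEM": sem,
    }


# Hypothetical "before" column with one missing observation.
before = [10.0, 12.0, None, 9.0, 11.0]
print(summarize(before))
```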

Difference
The difference of the group before and after the treatment is described in terms of the mean of the differences (changes) in the subjects before and after the treatment, and the standard deviation and standard error of the mean difference. The standard error of the mean difference is a measure of the precision with which the mean difference estimates the true difference in the underlying population.


t Statistic
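The t-test statistic is computed by subtracting the value observed before the intervention from the value observed after the intervention in each experimental subject. The remaining analysis is conducted on these differences. The t-test statistic is the ratio:

$$t = \frac{\text{mean of the differences before and after the treatment}}{\text{standard error of the mean difference}}$$
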
You can conclude from large (bigger than ~2) absolute values of t that the treatment affected the variable of interest (you reject the null hypothesis of no difference). A large t indicates that the difference between the values observed after and before the treatment is larger than would be expected from effect variability alone (that is, the effect is statistically significant). A small t (near 0) indicates that there is no significant difference between the samples (little difference in the means before and after the treatment).
Degrees of Freedom. The degrees of freedom is a measure of the sample size, which affects the ability of t to detect differences in the mean effects. As degrees of freedom increase, the ability to detect a difference with a smaller t increases.
P Value. The P value is the probability of being wrong in concluding that there is a true effect (that is, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the treatment effect is significant. Traditionally, you can conclude there is a significant difference when P < 0.05.
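The t statistic and degrees of freedom described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data and a hypothetical function name, not SigmaPlot's implementation. It assumes the standard paired form: the mean of the per-subject differences divided by the standard error of that mean, with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired before/after observations."""
    # Per-subject change: value after the treatment minus value before it.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Hypothetical measurements on four subjects.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 14.0, 10.0, 15.0]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Converting t to a P value requires the t distribution, which the Python standard library does not provide, so that step is left out of the sketch.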

Confidence Interval for the Difference of the Means
If the confidence interval does not include a value of zero, you can conclude that there is a significant difference with that level of confidence. Confidence can also be described as P < α, where α is the acceptable probability of incorrectly concluding that there is an effect. The level of confidence is adjusted in the Options for Paired t-test dialog box; this is typically 100(1 - α), or 95%. Larger values of confidence result in wider intervals. This result is displayed unless you disabled it in the Options for Paired t-test dialog box. For more information, see “Setting Paired t-Test Options” on page 179.
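A confidence interval for the mean difference can be sketched as the mean difference plus or minus a critical value of t times the standard error of the mean difference. The Python below is an illustration only. The data and names are hypothetical, and the critical value is passed in from a t table (3.182 is the two-sided 95% value for 3 degrees of freedom) because the standard library has no t distribution.

```python
import math
import statistics


def ci_mean_difference(before, after, t_critical):
    """Return (low, high) bounds of the confidence interval for the mean difference."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = t_critical * se_diff
    return mean_diff - half_width, mean_diff + half_width


# Hypothetical measurements on four subjects (3 degrees of freedom).
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 14.0, 10.0, 15.0]
T_CRIT_95_DF3 = 3.182  # taken from a t table, not computed here

low, high = ci_mean_difference(before, after, T_CRIT_95_DF3)
print(f"95% CI: {low:.3f} to {high:.3f}")
print("Interval excludes zero:", low > 0 or high < 0)
```

Passing a larger critical value (for example, the 99% value) widens the interval, as the text notes.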


Power
The power, or sensitivity, of a Paired t-test is the probability that the test will detect a difference between treatments if there really is a difference. The closer the power is to 1, the more sensitive the test. Paired t-test power is affected by the sample sizes, the chance of erroneously reporting a difference α (alpha), the observed differences of the subject means, and the observed standard deviations of the samples. This result is displayed unless you disabled it in the Options for Paired t-test dialog box. For more information, see “Setting Paired t-Test Options” on page 179.
Alpha ( α ). Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error. A Type I error is when you reject the hypothesis of no effect when this hypothesis is true. Set the value in the Options for Paired t-test dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Paired t-Test Report Graphs
You can generate up to three graphs using the results from a paired t-test. They include a:
Before and after line graph. The Paired t-test graph uses lines to plot a subject’s change after each treatment. For more information, see “Before and After Line Plots” on page 554.
Normal probability plot of the residuals. The Paired t-test probability plot graphs the frequency of the raw residuals. For more information, see “Normal Probability Plot” on page 549.
Histogram of the residuals. The Paired t-test histogram plots the raw residuals in a specified range, using a defined interval set. For more information, see “Histogram of Residuals” on page 547.


How to Create a Graph of the Paired t-test Data
1. Select the Paired t-test report. 2. On the menus choose:
Graph Create Graph

The Create Graph dialog box appears displaying the types of graphs available for the Paired t-test results.
Figure 2-1 The Create Graph Dialog Box for Paired t-test Report Graphs

3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.


Figure 3-1 A Normal Probability Plot of the Report Data

Wilcoxon Signed Rank Test
The Signed Rank Test is a nonparametric procedure which does not require assuming normality or equal variance. Use a Signed Rank Test when:
You want to see if the effect of a single treatment on the same individual is significant.
The treatment effects are not normally distributed with the same variances.
If you know that the effects are normally distributed, use the Paired t-test. For more information, see “Paired t-Test” on page 177. When there are multiple treatments to compare, do a Friedman Repeated Measures ANOVA on Ranks. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239.
Note: Depending on your Signed Rank Test option settings, if you attempt to perform a Signed Rank Test on a normal population, SigmaPlot suggests that the data can be analyzed with the more powerful Paired t-test instead. For more information, see “Setting Signed Rank Test Options” on page 192.

About the Signed Rank Test
A Signed Rank Test ranks all the observed treatment differences from smallest to largest without regard to sign (based on their absolute value), then attaches the sign of each difference to the ranks. The signed ranks are summed and compared. This procedure uses the size of the treatment effects and the sign. If there is no treatment effect, the positive ranks should be similar to the negative ranks. If the ranks tend to have the same sign, you can conclude that there was a treatment effect (for example, that there is a statistically significant difference before and after the treatment). The Wilcoxon Signed Rank Test tests the null hypothesis that a treatment has no effect on the subject.

Performing a Signed Rank Test
To perform a Signed Rank Test:
1. Enter or arrange your data in the data worksheet. For more information, see “Arranging Signed Rank Data” on page 192.
2. If desired, set the Signed Rank Test options. For more information, see “Setting Signed Rank Test Options” on page 192.
3. From the menus select:
Statistics Before and After Signed Rank Test
4. Run the test. For more information, see “Running a Signed Rank Test” on page 195.
5. View and interpret the Signed Rank Test report. For more information, see “Interpreting Signed Rank Test Results” on page 196.
6. Generate report graphs. For more information, see “Signed Rank Test Report Graphs” on page 198.
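The ranking procedure described under About the Signed Rank Test can be sketched in Python as follows. This is an illustration under stated assumptions, not SigmaPlot's implementation: zero differences are dropped, tied absolute differences share the average of their ranks, and W is reported as the sum of the positive ranks minus the sum of the negative ranks. The function name and data are hypothetical.

```python
def signed_rank_w(before, after):
    """Return (W, sum of positive ranks, sum of negative ranks)."""
    # Per-subject differences; zero differences are dropped (assumption).
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Order the differences by absolute value, ignoring sign.
    ordered = sorted(diffs, key=abs)

    # Assign 1-based ranks, giving tied absolute values their average rank.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Attach the sign of each difference to its rank and sum by sign.
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus - w_minus, w_plus, w_minus


# Hypothetical measurements on five subjects; the last subject did not change.
before = [10, 12, 9, 11, 13]
after = [12, 11, 12, 15, 13]
print(signed_rank_w(before, after))
```

When the positive and negative rank sums are similar, W is near zero, which corresponds to the no-treatment-effect case described above.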

Arranging Signed Rank Data
The format of the data to be tested can be raw data or indexed data; in either case, the data is found in two worksheet columns.
Figure 6-1 Valid Data Formats for a Wilcoxon Signed Rank Test

Setting Signed Rank Test Options
Use the Signed Rank Test options to:
Adjust the parameters of the test to relax or restrict the testing of your data for normality.
Display the summary table.
Options settings are saved between SigmaPlot sessions.
To change the Signed Rank Test options:
1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

2. From the menus select:
Statistics Current Test Options
The Options for Signed Rank Test dialog box appears with two tabs:
Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality. For more information, see “Options for Signed Rank Test: Assumption Checking” on page 193.
Results. Display the statistics summary and the confidence interval for the data in the report. For more information, see “Options for Signed Rank Test: Results” on page 194.
3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see “Running a Signed Rank Test” on page 195.
4. To accept the current settings and close the options dialog box, click OK.

Options for Signed Rank Test: Assumption Checking
Click the Assumption Checking tab on the Options for Signed Rank Test dialog box to set Normality. The normality assumption test checks for a normally distributed population.
Figure 4-1 Options for Signed Rank Test dialog box

Note: Equal Variance is not available for the Signed Rank Test because Signed Rank Tests are based on changes in each individual rather than on different individuals in the selected population, making equal variance testing unnecessary.
Normality. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.
P Value to Reject. Enter the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.
To require a stricter adherence to normality, increase the P value. Because parametric statistical methods are relatively robust to violations of their assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that the data is not normal.
To relax the requirement of normality, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.
Note: Although this assumption test is robust in detecting data from populations that are non-normal, there are extreme conditions of data distribution that this test cannot take into account; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption test.

Options for Signed Rank Test: Results
Summary Table. The summary table for a Signed Rank Test lists the medians, percentiles, and sample sizes N in the Signed Rank Test report. If desired, change the percentile values by editing the boxes. The 25th and the 75th percentiles are the suggested percentiles.
Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small, when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the distribution of the χ² test statistic is discrete. Use the Yates Correction Factor to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative; that is, it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.

Running a Signed Rank Test
To run a test, you need to select the data to test by dragging the pointer over your data. Then use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.
To run a Signed Rank Test:
1. From the menus select:
Statistics Before and After Signed Rank Test
The Pick Columns dialog box appears prompting you to specify a data format.
Figure 1-1 The Pick Columns for Signed Rank Test Dialog Box Prompting You to Specify a Data Format
2. Select the appropriate data format from the Data Format drop-down list.

If your data is grouped in columns, select Raw. If your data is in the form of a group index column(s) paired with a data column(s), select Indexed.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of each selected column appears in each row. You are prompted to pick two columns for raw data and three columns for indexed data.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to perform the test. If you elected to test for normality, SigmaPlot performs the test for normality (Kolmogorov-Smirnov). If your data pass the test, SigmaPlot informs you and suggests continuing your analysis using a Paired t-test. When the test is complete, the report appears displaying the results of the Signed Rank Test.

Interpreting Signed Rank Test Results
The Signed Rank Test computes the Wilcoxon W statistic and the P value for W. Additional results to be displayed are selected in the Options for Signed Rank Test dialog box. For more information, see “Setting Signed Rank Test Options” on page 192. For descriptions of the derivations for Wilcoxon Signed Rank Test results, you can reference an appropriate statistics reference.

Figure 6-1 The Wilcoxon Signed Rank Test Results Report

Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Normality Test
Normality test results display whether the data passed or failed the test of the assumption that the difference of the treatment originates from a normal distribution, and the P value calculated by the test. For nonparametric procedures this test can fail, since nonparametric tests do not require normally distributed source populations. This result appears unless you disabled normality testing in the Options for Signed Rank Test dialog box. For more information, see “Setting Signed Rank Test Options” on page 192.

Summary Tables
SigmaPlot generates a summary table listing the sample sizes N, number of missing values (if any), medians, and percentiles. All of these results are displayed in the report unless you disable them in the Signed Rank Test Options dialog box. For more information, see “Setting Signed Rank Test Options” on page 192.
N (Size). The number of non-missing observations for that column or group.
Missing. The number of missing values for that column or group.
Medians. The "middle" observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.
Percentiles. The two percentile points that define the upper and lower tails of the observed values.

W Statistic
The Wilcoxon test statistic W is computed by ranking all the differences before and after the treatment based on their absolute value, then attaching the signs of the differences to the corresponding ranks. The signed ranks are summed and compared. If the absolute value of W is "large", you can conclude that there was a treatment effect (that is, the ranks tend to have the same sign, so there is a statistically significant difference before and after the treatment). If W is small, the positive ranks are similar to the negative ranks, and you can conclude that there is no treatment effect.
P Value. The P value is the probability of being wrong in concluding that there is a true effect (that is, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on W). The smaller the P value, the greater the probability that there is a treatment effect. Traditionally, you can conclude there is a significant difference when P < 0.05.

Signed Rank Test Report Graphs
You can generate a line scatter graph of the changes after treatment for a Signed Rank Test report.

Before and After Line Graph. The Signed Rank Test graph uses lines to plot a subject’s change after each treatment. For more information, see “Before and After Line Plots” on page 554.

How to Create a Graph of the Signed Rank Test Data
1. Select the Signed Rank Test report.
2. From the menus select:
Graph Create Graph
The Create Graph dialog box appears displaying the types of graphs available for the Signed Rank Test results.
Figure 2-1 The Create Graph Dialog Box for the Signed Rank Test Report
3. Select the type of graph you want to create from the Graph Type list.
4. Click OK, or double-click the desired graph in the list. The specified graph appears in a graph window or in the report. For more information, see “Generating Report Graphs” on page 539.

Figure 4-1 A Before & After Scatter Graph

One Way Repeated Measures Analysis of Variance (ANOVA)
Use a one way or one factor repeated measures ANOVA (analysis of variance) when:
You want to see if a single group of individuals was affected by a series of experimental treatments or conditions.
Only one factor or one type of intervention is considered in each treatment or condition.
The treatment effects are normally distributed with the same variances.
If you know that the treatment effects are not normally distributed, use the Friedman Repeated Measures ANOVA on Ranks. If you want to consider the effects of an additional factor on your experimental treatments, use Two Way Repeated Measures ANOVA. When there is only a single treatment, you can do a Paired t-test (depending on the type of results you want).
Note: Depending on your One Way Repeated Measures ANOVA options settings, if you attempt to perform an ANOVA on a non-normal population, SigmaPlot informs you that the data is unsuitable for a parametric test, and suggests the Friedman ANOVA on Ranks instead.

About the One Way Repeated Measures ANOVA
A One Way or One Factor Repeated Measures ANOVA tests for differences in the effect of a series of experimental interventions on the same group of subjects by examining the changes in each individual. Examining the changes rather than the values observed before and after interventions removes the differences due to individual responses, producing a more sensitive (or more powerful) test. The design for a One Way Repeated Measures ANOVA is essentially the same as a Paired t-test, except that there can be multiple treatments on the same group. The null hypothesis is that there are no differences among all the treatments. One Way Analysis of Variance is a parametric test that assumes that all treatment effects are normally distributed with the same standard deviations (variances).

Performing a One Way Repeated Measures ANOVA
To perform a One Way Repeated Measures ANOVA:
1. Enter or arrange your data in the worksheet. For more information, see “Arranging One Way Repeated Measures ANOVA Data” on page 202.
2. If desired, set One Way Repeated Measures ANOVA options.
3. On the menus click:
Statistics Repeated Measures One Way Repeated Measures ANOVA
4. Run the test. For more information, see “Running a One Way Repeated Measures ANOVA” on page 206.
5. Specify the multiple comparisons you want to perform on your test.
6. View and interpret the One Way ANOVA report. For more information, see “Interpreting One Way Repeated Measures ANOVA Results” on page 209.

7. Generate report graphs. For more information, see “One Way Repeated Measures ANOVA Report Graphs” below.

Arranging One Way Repeated Measures ANOVA Data
The format of the data to be tested can be raw data or indexed data. Place raw data in as many columns as there are treatments, up to 64; each column contains the data for one treatment. The columns for raw data must be the same length. Place indexed data in three worksheet columns: a treatment index column, a subject index column, and a data column. You cannot use statistical summary data for repeated measures tests.
Figure 7-1 Valid Data Formats for a One Way Repeated Measures ANOVA
Columns 1 through 3 in the worksheet above are arranged as raw data. Columns 4, 5, and 6 are arranged as indexed data, with column 4 as the treatment index column and column 5 as the subject index column.

Missing Data Points
If there are missing values, SigmaPlot automatically handles the missing data by using a general linear model. This approach constructs hypothesis tests using the marginal sums of squares (also commonly called the Type III or adjusted sums of squares); however, the columns must still be equal in length.
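The relationship between the raw and indexed layouts described under Arranging One Way Repeated Measures ANOVA Data can be sketched in Python. This is only an illustration of the two arrangements, not a SigmaPlot feature: the function name, treatment labels, and values are hypothetical, and subjects are assumed to be numbered by their row position in the raw columns.

```python
def raw_to_indexed(raw_columns):
    """Convert raw data (one column per treatment) to indexed rows.

    raw_columns maps a treatment label to its list of observations, one per
    subject. Each returned row is (treatment index, subject index, value).
    """
    lengths = {len(values) for values in raw_columns.values()}
    if len(lengths) != 1:
        raise ValueError("raw data columns must be the same length")

    rows = []
    for treatment, values in raw_columns.items():
        for subject, value in enumerate(values, start=1):
            rows.append((treatment, subject, value))
    return rows


# Hypothetical raw worksheet: three treatments measured on two subjects.
raw = {
    "Treatment A": [4.1, 3.8],
    "Treatment B": [5.0, 4.6],
    "Treatment C": [5.9, 5.2],
}
for row in raw_to_indexed(raw):
    print(row)
```

The length check mirrors the requirement that raw data columns be equal in length.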

Setting One Way Repeated Measures ANOVA Options
Use the One Way Repeated Measures ANOVA options to:
Adjust the parameters of the test to relax or restrict the testing of your data for normality and equal variance.
Display the statistics summary table and the confidence interval for the data, and assign residuals to a worksheet column.
Compute the power, or sensitivity, of the test.
Enable multiple comparisons.
To change the One Way Repeated Measures ANOVA options:
Note: If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
1. From the Standard Toolbar select One Way RM ANOVA.
2. On the menus click:
Statistics Current Test Options
The Options for One Way RM ANOVA dialog box appears with three tabs:
Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. For more information, see “Options for One Way RM ANOVA: Assumption Checking” on page 204.
Results. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. For more information, see “Options for One Way RM ANOVA: Results” on page 205.
Post Hoc Test. Compute the power or sensitivity of the test and enable multiple comparisons. For more information, see “Options for One Way RM ANOVA: Post Hoc Tests” on page 205.
3. To continue the test, click Run Test. For more information, see “About the One Way Repeated Measures ANOVA” on page 201.
4. To accept the current settings and close the options dialog box, click OK.

Options for One Way RM ANOVA: Assumption Checking
The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.
Figure 4-1 The Options for One Way RM ANOVA Dialog Box Displaying the Assumption Checking Options
Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.
Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.
P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.
To require a stricter adherence to normality and/or equal variance, increase P. Because parametric statistical methods are relatively robust to violations of their assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that the data is not normal.
To relax the requirement of normality and/or equal variance, decrease the P value. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: There are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude; however, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for One Way RM ANOVA: Results
Summary Table. Select to display the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.
Confidence Intervals. Select to display the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals).
Residuals in Column. Select to display residuals in the report and to save the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.

Options for One Way RM ANOVA: Post Hoc Tests
Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.
Use Alpha Value. Alpha ( α ) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons
A One Way Repeated Measures ANOVA tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.
The P value used to determine if the ANOVA detects a difference is set on the Report tab of the Options dialog box. If the P value produced by the One Way ANOVA is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.
Always Perform. Select to perform multiple comparisons whether or not the ANOVA detects a difference.
Only When ANOVA P Value is Significant. Select to perform multiple comparisons only if the ANOVA detects a difference.
Significance Value for Multiple Comparisons. Select either .05 or .01 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is a less than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of .10 indicates that the multiple comparisons will detect a difference if there is less than 10% chance that the multiple comparison is incorrect in detecting a difference.
Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method.

Running a One Way Repeated Measures ANOVA
If you want to select your data before you run the test, drag the pointer over your data.
1. From the menus select:
Statistics Repeated Measures One Way Repeated Measures ANOVA
The Pick Columns for One Way RM ANOVA dialog box appears prompting you to specify a data format.

Figure 1-1 The Pick Columns for One Way RM ANOVA Dialog Box Prompting You to Specify a Data Format

2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Repeated Measures Tests” on page 176.

3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 3-1 The Pick Columns for One Way RM ANOVA Dialog Box Prompting You to Select Data Columns

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.

5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

6. Click Finish to run the One Way RM ANOVA on the selected columns. If you elected to test for normality and equal variance, and your data fails either test, SigmaPlot warns you and suggests continuing your analysis using the nonparametric Friedman Repeated Measures ANOVA on Ranks. For more information, see “Friedman Repeated Measures Analysis of Variance on Ranks” on page 239.

If you selected to run multiple comparisons only when the P value is significant, and the P value is not significant, the One Way ANOVA report appears after the test is complete. For more information, see “Interpreting One Way Repeated Measures ANOVA Results” on page 209.

If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test. For more information, see “Multiple Comparison Options (One Way RM ANOVA)” on page 208.

Multiple Comparison Options (One Way RM ANOVA)

The One Way Repeated Measures ANOVA tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for One Way RM ANOVA dialog box, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Setting One Way Repeated Measures ANOVA Options” on page 203. The P value produced by the ANOVA is displayed in the upper left corner of the dialog box.

There are seven kinds of multiple comparison tests available for the One Way Repeated Measures ANOVA, including:

Tukey Test. For more information, see “Tukey Test” on page 164.

Student-Newman-Keuls Test. For more information, see “Student-Newman-Keuls (SNK) Test” on page 164.
Bonferroni t-test. For more information, see “Bonferroni t-Test” on page 164.
Fisher’s LSD. For more information, see “Fisher’s Least Significance Difference Test” on page 165.
Dunnett’s Test. For more information, see “Dunnett’s Test” on page 165.
Duncan’s Multiple Range Test. For more information, see “Duncan’s Multiple Range” on page 165.

There are two types of multiple comparisons available for the One Way Repeated Measures ANOVA. The types of comparison you can make depend on the selected multiple comparison test. The types are:

All pairwise comparisons compare all possible pairs of treatments.
Multiple comparisons versus a control compare all experimental treatments to a single control group.

Interpreting One Way Repeated Measures ANOVA Results

The One Way Repeated Measures ANOVA report generates an ANOVA table describing the source of the variation in the treatments. This table displays the degrees of freedom, sum of squares, and mean squares of the treatments, as well as the F statistic and the corresponding P value. The other results displayed are specified in the Options for One Way RM ANOVA dialog box. For more information, see “Setting One Way Repeated Measures ANOVA Options” on page 203.

You can also generate tables of multiple comparisons. Multiple Comparison results are also specified in the Options for One Way RM ANOVA dialog box. The test used to perform the multiple comparison is selected in the Multiple Comparison Options dialog box.

For descriptions of the derivations for One Way RM ANOVA results, you can reference any appropriate statistics reference.

Figure 6-1 Example of the One Way Repeated Measures ANOVA Report

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

For descriptions of the derivations for One Way Repeated Measures ANOVA results, you can reference an appropriate statistics reference.

If There Were Missing Data Cells

If your data contained missing values, the report indicates the results were computed using a general linear model. The ANOVA table includes the degrees of freedom used to compute F, the estimated mean square equations are listed, and the summary table displays the estimated least square means.

Normality Test

Normality test results display whether the data passed or failed the test of the assumption that the differences of the changes originate from a normal distribution, and the P value calculated by the test. Normally distributed source populations are required for all parametric tests.

This result appears unless you disabled normality testing in the Options for One Way RM ANOVA dialog box. For more information, see “Setting One Way Repeated Measures ANOVA Options” on page 203.

Equal Variance Test

Equal Variance test results display whether or not the data passed or failed the test of the assumption that the differences of the changes originate from a population with the same variance, and the P value calculated by the test. Equal variances of the source populations are assumed for all parametric tests.

This result appears unless you disabled equal variance testing in the Options for One Way RM ANOVA dialog box. For more information, see “Setting One Way Repeated Measures ANOVA Options” on page 203.

Summary Table

If you enabled this option in the Options for One Way RM ANOVA dialog box, SigmaPlot generates a summary table listing the sample sizes N, number of missing values, mean, standard deviation, differences of the means and standard deviations, and standard error of the means.

N (Size). The number of non-missing observations for that column or group.

Missing. The number of missing values for that column or group.

Mean. The average value for the column. If the observations are normally distributed, the mean is the center of the distribution.

Standard Deviation. A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.

Standard Error of the Mean. A measure of how closely the mean computed from the sample approximates the true population mean.
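The summary table entries above can be illustrated with a minimal Python sketch. It assumes that missing values are marked with `None`, that the standard deviation is the sample standard deviation (n - 1 in the denominator), and that the standard error is the standard deviation divided by the square root of N. The function name `summary_row` and the dictionary keys are hypothetical, and SigmaPlot's internal calculations are not shown in this guide.

```python
import math
import statistics

def summary_row(values):
    """Summarize one worksheet column; None marks a missing observation."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.fmean(present)
    sd = statistics.stdev(present)  # sample SD (assumed), n - 1 in the denominator
    return {
        "N": n,                          # non-missing observations
        "Missing": len(values) - n,      # missing observations
        "Mean": mean,
        "Std Dev": sd,
        "SEM": sd / math.sqrt(n),        # standard error of the mean
    }

row = summary_row([4.0, 6.0, None, 8.0, 6.0])
print(row)
```

For this column the sketch reports four observations, one missing value, and a mean of 6.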

Power

The power of the performed test is displayed unless you disable this option in the Options for One Way RM ANOVA dialog box.

The power, or sensitivity, of a One Way Repeated Measures ANOVA is the probability that the test will detect a difference among the treatments if there really is a difference. The closer the power is to 1, the more sensitive the test.

Repeated measures ANOVA power is affected by the sample sizes, the number of treatments being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error. A Type I error is when you reject the hypothesis of no effect when this hypothesis is true.

Set this value in the Options for One Way RM ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference but also increase the risk of seeing a false difference (a Type I error).

ANOVA Table

The ANOVA table lists the results of the One Way Repeated Measures ANOVA.

DF (Degrees of Freedom). Degrees of freedom represent the number of groups and sample size, which affect the sensitivity of the ANOVA.

The degrees of freedom between subjects is a measure of the number of subjects.
The degrees of freedom for the treatments is a measure of the number of treatments.
The degrees of freedom within subjects is a measure of the total number of observations, adjusted for the number of treatments.
The residual degrees of freedom is a measure of the difference between the number of observations, adjusted for the number of subjects and treatments.
The total degrees of freedom is a measure of both number of subjects and treatments.

SS (Sum of Squares). The sum of squares is a measure of variability associated with each element in the ANOVA data table.

The sum of squares between the subjects measures the variability of the average responses of each subject.
The sum of squares within the subjects measures the underlying total variability within each subject.
The sum of squares of the treatments measures the variability of the mean treatment responses within the subjects.
The residual sum of squares measures the underlying variability among all observations after accounting for differences between subjects.
The total sum of squares measures the total variability.

MS (Mean Squares). The mean squares provide two estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.

The mean square of the treatments is:

\[ MS_{treat} = \frac{SS_{treat}}{DF_{treat}} \]

The residual mean square is:

\[ MS_{res} = \frac{SS_{res}}{DF_{res}} \]

F Statistic

The F test statistic is a ratio used to gauge the differences of the effects. If there are no missing data, F is calculated as:

\[ F = \frac{MS_{treat}}{MS_{res}} \]

If the F ratio is around 1, you can conclude that there are no differences among treatments (the data is consistent with the null hypothesis that there are no treatment effects).

If F is a large number, the variability among the effect means is larger than expected from random variability in the treatments, and you can conclude that the treatments have different effects (the differences among the treatments are statistically significant).
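As an illustration of how the entries of the ANOVA table relate to one another, the following Python sketch computes the sums of squares, degrees of freedom, mean squares, and F for a small balanced data set with no missing values. It uses the standard textbook partition for a one way repeated measures design, which is an assumption here because the guide does not list SigmaPlot's own formulas. The function name `rm_anova_table` and the sample data are hypothetical.

```python
def rm_anova_table(data):
    """data[i][j] = response of subject i under treatment j (no missing values)."""
    n = len(data)        # number of subjects
    k = len(data[0])     # number of treatments
    grand = sum(map(sum, data)) / (n * k)
    subj_means = [sum(row) / k for row in data]
    treat_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_within = ss_total - ss_subjects            # variability within subjects
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_resid = ss_within - ss_treat               # what treatments do not explain

    df_treat = k - 1
    df_resid = (n - 1) * (k - 1)
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid
    return {
        "SS": {"subjects": ss_subjects, "within": ss_within,
               "treatments": ss_treat, "residual": ss_resid, "total": ss_total},
        "DF": {"subjects": n - 1, "within": n * (k - 1),
               "treatments": df_treat, "residual": df_resid, "total": n * k - 1},
        "MS": {"treatments": ms_treat, "residual": ms_resid},
        "F": ms_treat / ms_resid,
    }

# Three subjects (rows), each measured under three treatments (columns).
table = rm_anova_table([[1, 2, 3], [2, 4, 6], [3, 3, 3]])
print(table["F"])
```

Converting F to a P value requires the F distribution with the treatment and residual degrees of freedom, which this sketch leaves out.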

P Value. The P value is the probability of being wrong in concluding that there is a true difference between the groups (for example, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude that there are significant differences when P < 0.05.

Expected Mean Squares

If there was missing data and a general linear model was used, the linear equations for the expected mean squares computed by the model are displayed. These equations are displayed only if a general linear model was used.

Multiple Comparisons

If you selected to perform multiple comparisons, a table of the comparisons between group pairs is displayed. The multiple comparison procedure is activated in the Options for One Way RM ANOVA dialog box. For more information, see “Setting One Way Repeated Measures ANOVA Options” on page 203. The test used in the multiple comparison procedure is selected in the Multiple Comparison Options dialog box. For more information, see “Multiple Comparison Options (One Way RM ANOVA)” on page 208.

Multiple comparison results are used to determine exactly which treatments are different, since the ANOVA results only inform you that two or more of the groups are different. The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.

All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Holm-Sidak, Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s test and the Bonferroni t-test.

Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are the Bonferroni t-test and the Dunnett’s, Fisher’s LSD, and Duncan’s tests.

For descriptions of the derivation of parametric multiple comparison procedure results, you can reference an appropriate statistics reference.

Holm-Sidak Test Results. The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a control group. It is more powerful than the Tukey and Bonferroni tests and, consequently, it is able to detect differences that these other tests do not. It is recommended as the first-line procedure for pairwise comparison testing.

When performing the test, the P values of all comparisons are computed and ordered from smallest to largest. Each P value is then compared to a critical level that depends upon the significance level of the test (set in the test options), the rank of the P value, and the total number of comparisons made. A P value less than the critical level indicates there is a significant difference between the corresponding two groups.

Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 or < 0.01 for that pair comparison. The Bonferroni t-test can be used to compare all groups or to compare versus a control.

You can conclude from "large" values of t that the difference of the two treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

The difference of the means is a gauge of the size of the difference between the two treatments.

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results. The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison.

Dunnett’s test only compares a control group to all other groups. All tests compute the q test statistic, and display whether or not P < 0.05 for that comparison.

You can conclude from "large" values of q that the difference of the two groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

p is the parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest in an SNK test, so p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.

If a treatment is found to be not significantly different than another treatment, all treatments with p ranks in between the p ranks of the two treatments that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.

The Difference of the Means is a gauge of the size of the difference between the two groups.

One Way Repeated Measures ANOVA Report Graphs

You can generate up to three graphs using the results from a One Way RM ANOVA. They include a:

Before and after line graph. The One Way Repeated Measures ANOVA uses lines to plot a subject’s change after each treatment. For more information, see “Before and After Line Plots” on page 554.
Histogram of the residuals. The One Way Repeated Measures ANOVA histogram plots the raw residuals in a specified range, using a defined interval set. For more information, see “Histogram of Residuals” on page 547.
Normal probability plot of the residuals. The One Way Repeated Measures ANOVA probability plot graphs the frequency of the raw residuals. For more information, see “Normal Probability Plot” on page 549.
Multiple comparison graphs. The One Way Repeated Measures ANOVA multiple comparison graphs plot significant differences between levels of a significant factor. For more information, see “Multiple Comparison Graphs” on page 555.

How to Create a One Way Repeated Measures ANOVA Report Graph

1. Select the One Way Repeated Measures ANOVA test report.

2. From the menus select:
   Graph
   Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the One Way Repeated Measure ANOVA results.

Figure 2-1 The Create Graph Dialog Box for a One Way RM ANOVA Report

3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.

Figure 3-1 A Normal Probability Plot for a One Way RM ANOVA
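The Holm-Sidak procedure described under Multiple Comparisons orders the P values and compares each to a critical level that depends on the significance level, the rank of the P value, and the number of comparisons. The Python sketch below shows one common textbook form of that rule. It assumes the critical level for the i-th smallest of m P values is 1 - (1 - α)^(1/(m - i + 1)) and that testing stops at the first non-significant comparison. These details are not stated in this guide, so treat them as assumptions rather than SigmaPlot's exact implementation. The function name `holm_sidak` is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return a list of booleans, True where the comparison is significant."""
    m = len(p_values)
    # Indices of the comparisons, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):  # rank 0 is the smallest P value
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (m - rank))
        if p_values[idx] < critical:
            significant[idx] = True
        else:
            break  # step-down rule (assumed): larger P values are not tested
    return significant

all_pass = holm_sidak([0.001, 0.02, 0.04])
stops_early = holm_sidak([0.03, 0.001, 0.04])
print(all_pass, stops_early)
```

In the second call the comparison with P = 0.04 is not flagged even though it is below 0.05, because the procedure already stopped at P = 0.03, whose critical level is about 0.025.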

Two Way Repeated Measures Analysis of Variance (ANOVA)

Use Two Way or two factor Repeated Measures ANOVA (analysis of variance) when:

You want to see if the same group of individuals is affected by a series of experimental treatments or conditions.
You want to consider the effect of an additional factor which may or may not interact, and may or may not be another series of treatments or conditions.
The treatment effects are normally distributed with equal variances.

If you want to consider the effects of only one factor on your experimental groups, use One Way Repeated Measures ANOVA.

Note: SigmaPlot performs Two Way Repeated Measures ANOVAs for one factor repeated or both factors repeated. SigmaPlot automatically determines if one or both factors are repeated from the data, and uses the appropriate procedures. For more information, see “Arranging Two Way Repeated Measures ANOVA Data” on page 219.

There is no equivalent in SigmaPlot for a two factor repeated measure comparison for samples drawn from non-normal populations. If your data is non-normal, you can transform the data to make it comply better with the assumptions of analysis of variance using Transform Menu commands. If the sample size is large, and you want to do a nonparametric test, use the Transform menu Rank command to convert the observations to ranks, then do a Two Way ANOVA on the ranks.

About the Two Way Repeated Measures ANOVA

In a two way or two factor repeated measures analysis of variance, there are two experimental factors which may affect each experimental treatment. Either or both of these factors are repeated treatments on the same group of individuals. A two factor design tests for differences between the different levels of each treatment and for interactions between the treatments.

A two factor analysis of variance tests three hypotheses: (1) There is no difference among the levels or treatments of the first factor; (2) There is no difference among the levels or treatments of the second factor; and (3) There is no interaction between the factors, i.e., if there is any difference among treatments within one factor, the differences are the same regardless of the second factor.

Two Way Repeated Measures ANOVA is a parametric test that assumes that all the treatment effects are normally distributed with the same variance. SigmaPlot does not have an automatic nonparametric test if these assumptions are violated.

Performing a Two Way Repeated Measures ANOVA

To perform a Two Way Repeated Measures ANOVA:

1. Enter or arrange your data in the data worksheet. For more information, see “Arranging Two Way Repeated Measures ANOVA Data” on page 219.

2. Set the Two Way Repeated Measures ANOVA options. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.

3. From the menus select:
   Statistics
   Repeated Measures

4. Run the test. For more information, see “Running a Two Way Repeated Measures ANOVA” on page 227.

5. View and interpret the Two Way Repeated Measures ANOVA report. For more information, see “Interpreting Two Way Repeated Measures ANOVA Results” on page 230.

6. Generate report graphs. For more information, see “Two way repeated measures ANOVA report graphs” on page 238.

Arranging Two Way Repeated Measures ANOVA Data

Either or both of the two factors used in the Two Way Repeated Measures ANOVA can be repeated on the same group of individuals. For example, if you analyze the effect of changing salinity on the activity of two different species of shrimp, you have a two factor experiment with a single repeated treatment (salinity). Different salinity treatment and shrimp type are the levels.

Figure 6-1 Data for a Two Way Repeated Factor ANOVA with one repeated factor (salinity).

If you wanted to test the effect of different salinities and temperatures on the activity of a single species of shrimp, you have a two factor experiment with two repeated treatments, salinity and temperature.

Figure 6-2 Data for a Two Way Repeated Factor ANOVA with two repeated factors (temperature and salinity).

In both cases, the different combinations of treatments/factors levels are the cells of the comparison. SigmaPlot automatically handles both one and two repeated treatment factors.

Missing Data and Empty Cells

Ideally, the data for a Two Way ANOVA should be completely balanced, i.e., each group or cell in the experiment has the same number of observations and there are no missing data. However, SigmaPlot properly handles all occurrences of missing and unbalanced data automatically.

Missing Data Point(s). If there are missing values, SigmaPlot automatically handles the missing data by using a general linear model. This approach constructs hypothesis tests using the marginal sums of squares (also commonly called the Type III or adjusted sums of squares).

Figure 6-3 Data for a Two Way Repeated Factor ANOVA with one repeated factor (salinity) and a missing data point. SigmaPlot uses a general linear model to handle missing data points.

Empty Cell(s). When there is an empty cell, i.e., there are no observations for a combination of two factor levels, but there is still at least one repeated factor for every subject, SigmaPlot stops and suggests either analysis of the data assuming no interaction between the factors, or using One Way ANOVA.

Note: Assuming there is no interaction between the two factors in Two Way ANOVA can be dangerous. Under some circumstances, this assumption can lead to a meaningless analysis, particularly if you are interested in studying the interaction effect.

Assumption of no interaction analyzes the effects of each treatment separately.

If you treat the problem as One Way ANOVA, each cell in the table is treated as a different level of a single experimental factor. This approach is the most conservative analysis because it requires no additional assumptions about the nature of the data or experimental design.

Figure 6-4 Data for a Two Way Repeated Factor ANOVA with two repeated factors (temperature and salinity) and a missing cell. Data with missing cells that still have repeated factor data for every subject can be analyzed either by assuming no interaction or a One Way ANOVA.

Connected versus Disconnected Data

The no interaction assumption requires that the non-empty cells must be geometrically connected in order to do the computation of a two factor no interaction model. You cannot perform Two Way Repeated Measures ANOVA on data disconnected by empty cells.

When the data is geometrically connected, you can draw a series of straight vertical and horizontal lines connecting all cells containing data without changing direction in any empty cells.

Figure 6-5 Data for a Two Way Repeated Factor ANOVA with geometrically disconnected data. This data cannot be analyzed with a Two Way Repeated Measures ANOVA.

SigmaPlot automatically checks for this condition. If disconnected data is encountered during Two Way Repeated Measures ANOVA, SigmaPlot suggests treatment of the problem as a One Way Repeated Measures ANOVA.

For descriptions of the concept of connectivity, you can reference an appropriate statistics reference.

Missing Factor Data for One Subject

Another case of an empty cell can occur when both factors are repeated, and there are no data for one level for one of the subjects. SigmaPlot automatically handles this situation by converting the problem to a One Way Repeated Measures ANOVA.

Figure 6-6 Data for a Two Way Repeated Factor ANOVA with two factors repeated and no data for one level for a subject. This data cannot be analyzed as a Two Way Repeated Measures ANOVA problem.

Entering Worksheet Data

You can only perform a Two Way Repeated Measures ANOVA on data indexed by both subject and two factors. The data is placed in four columns: the first factor is in one column, the second factor is in a second column, the subject index is in a third column, and the actual data is in a fourth column.
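The four-column indexed layout and the connectivity condition for cells can be sketched in Python as follows. The sketch reads "geometrically connected" to mean that two non-empty cells are linked whenever they share a level of either factor (the same row or column of the cell table), so that straight lines may cross empty cells but may only turn in cells with data. That reading, the function name `cells_connected`, and the sample rows are assumptions for illustration, not SigmaPlot's documented algorithm.

```python
def cells_connected(cells):
    """cells: iterable of (factor_a_level, factor_b_level) pairs that contain data."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # A non-empty cell links its factor A level to its factor B level.
    for a, b in cells:
        root_a, root_b = find(("A", a)), find(("B", b))
        parent[root_a] = root_b

    roots = {find(node) for node in list(parent)}
    return len(roots) <= 1

# Indexed rows: (factor A, factor B, subject, observation), hypothetical values.
rows = [
    ("low salinity", "cold", "shrimp 1", 4.1),
    ("low salinity", "warm", "shrimp 1", 5.0),
    ("high salinity", "warm", "shrimp 1", 6.3),
]
cells = {(a, b) for a, b, _subject, _value in rows}
connected = cells_connected(cells)
disconnected = cells_connected([("low salinity", "cold"), ("high salinity", "warm")])
print(connected, disconnected)
```

In the first data set the empty cell (high salinity, cold) does not break connectivity because every cell with data shares a row or column with another. In the second data set the two cells share neither.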

see “Options for Two Way RM ANOVA: Results” on page 226. . Set Two Way Repeated Measures ANOVA Options Use the Two Way Repeated Measures ANOVA to: Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. Compute the power. drag the pointer over your data.224 Chapter 6 Note: SigmaPlot performs two way repeated measures for one factor repeated or both factors repeated. Results. or sensitivity. Compute the power or sensitivity of the test and enable multiple comparisons. For more information. 2. For more information. of the test. 3. Adjust the parameters of a test to relax or restrict the testing of your data for normality and equal variance. and want to select your data before you run the test. Post Hoc Test. For more information. Select Two Way RM ANOVA from the Standard toolbar drop-down list. SigmaPlot automatically determines if one or both factors are repeated from the data. To change the Two Way Repeated Measures ANOVA options: 1. From the menus select: Statistics Current Test Options The Options for Two Way RM ANOVA dialog box appears with three tabs: Assumption Checking. see “Options for Two Way RM ANOVA: Assumption Checking” on page 225. Display the statistics summary table and the confidence interval for the data and assign residuals to the worksheet. Display the statistics summary and the confidence interval for the data in the report and save residuals to a worksheet column. see “Options for Two Way RM ANOVA: Post Hoc Tests” on page 226. If you are going to run the test after changing test options. Enable multiple comparison testing. and uses the appropriate procedures.

225 Comparing Repeated Measurements of the Same Individuals

4. To continue the test, click Run Test. For more information, see “Running a Two Way Repeated Measures ANOVA” on page 227.
5. To accept the current settings and close the options dialog box, click OK.

Options for Two Way RM ANOVA: Assumption Checking
Click the Assumption Checking tab to view options for normality and equal variance. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that the data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with unequal variances, there are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude. However, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.
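The pass/fail rule for the assumption tests can be illustrated with a short Python sketch. This is not SigmaPlot code; the function name `assumption_passes` and its parameters are hypothetical, and the P value from the Kolmogorov-Smirnov or Levene Median test is assumed to have been computed elsewhere.

```python
def assumption_passes(p_computed: float, p_to_reject: float = 0.050) -> bool:
    """Return True when an assumption test passes.

    Hypothetical helper: the test passes when the P value computed by the
    assumption test is greater than the P value set in the options.
    """
    if not 0.0 <= p_computed <= 1.0:
        raise ValueError("computed P value must lie between 0 and 1")
    if not 0.0 < p_to_reject < 1.0:
        raise ValueError("P value to reject must lie strictly between 0 and 1")
    return p_computed > p_to_reject


# The same computed P value passes at the suggested setting of 0.050
# but fails at the stricter setting of 0.100.
print(assumption_passes(0.08, 0.050))  # True
print(assumption_passes(0.08, 0.100))  # False
```

Raising the setting from 0.050 to 0.100 makes the check stricter, because less evidence of non-normality is needed to fail the test.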

Options for Two Way RM ANOVA: Results
Click the Results tab to view options for the summary table, confidence intervals, and residuals.

Summary Table. Select Summary Table to display the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

Confidence Interval. Select Confidence Intervals to display the confidence interval for the difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly used intervals). Click the selected check box if you do not want to include the confidence interval in the report.

Select Residuals to display residuals in the report and to save the residuals of the test to the specified worksheet column. To change the column the residuals are saved to, edit the number or select a number from the drop-down list.

Options for Two Way RM ANOVA: Post Hoc Tests
Click the Post Hoc Tests tab to view options for power and multiple comparisons.

Power. The power or sensitivity of a test is the probability that the test will detect a difference between the groups if there is really a difference.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons
The Two Way Repeated Measures ANOVA tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences. The P value used to determine if the ANOVA detects a difference is set in the Report Options dialog box. If the P value produced by the Two Way RM ANOVA is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

Performing Multiple Comparisons. You can choose to always perform multiple comparisons or to only perform multiple comparisons if a Two Way Repeated Measures ANOVA detects a difference. Select Always Perform to perform multiple comparisons whether or not the ANOVA detects a difference. Select Only When ANOVA P Value is Significant to perform multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value. Select either .05 or .10 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is a less than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of .10 indicates that the multiple comparisons will detect a difference if there is a less than 10% chance that the multiple comparison is incorrect in detecting a difference.

Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method.

Running a Two Way Repeated Measures ANOVA
To run a test, you need to select the data to test. If you want to select your data before you run the test, drag the pointer over your data.
1. From the menus select:
Statistics Repeated Measures Two Way Repeated Measures ANOVA

The Pick Columns for Two Way RM ANOVA dialog box appears prompting you to specify a data format.
2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Repeated Measures Tests” on page 176.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the Two Way RM ANOVA on the selected columns.
7. If you elected to test for normality and equal variance, SigmaPlot performs the test for normality (Kolmogorov-Smirnov) and the test for equal variance (Levene Median). If your data fail either test, SigmaPlot informs you. You can either continue, or transform your data, then perform a Two Way Repeated Measures ANOVA on the transformed data. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.
8. If your data have empty cells, you are prompted to perform the appropriate procedure. If you are missing a cell, but the data is still connected, you may have to proceed by either assuming no interaction between the factors, or by performing a one factor analysis on each cell. If your data is not geometrically connected, or if a subject is missing data for one level, you cannot perform a Two Way Repeated Measures ANOVA. Continue using a One Way ANOVA, or cancel the test. If you are missing a few data points, but there is still at least one observation in each cell, SigmaPlot automatically proceeds. For more information, see “Arranging Two Way Repeated Measures ANOVA Data” on page 219.
9. If you selected to run multiple comparisons only when the P value is significant, and the P value is not significant, the Two Way Repeated Measures ANOVA report appears after the test is complete. To edit the report, use the Format Menu commands. If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.

Multiple Comparison Options (Two Way RM ANOVA)
The Two Way Repeated Measures ANOVA tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for Two Way RM ANOVA dialog box, the Multiple Comparison Options dialog appears prompting you to specify a multiple comparison test. The P value produced by the ANOVA is displayed in the upper left corner of the dialog box. For more information on the P value and how it affects multiple comparison testing, see the section in Setting Two Way Repeated Measures ANOVA Options.

There are six multiple comparison tests to choose from for the Two Way Repeated Measures ANOVA. You can choose to perform the:
Holm-Sidak Test. For more information, see “Holm-Sidak Test” on page 163.
Tukey Test. For more information, see “Tukey Test” on page 164.
Student-Newman-Keuls Test. For more information, see “Student-Newman-Keuls (SNK) Test” on page 164.
Bonferroni t-test. For more information, see “Bonferroni t-Test” on page 164.
Fisher’s LSD. For more information, see “Fisher’s Least Significance Difference Test” on page 165.

Dunnett’s Test. For more information, see “Dunnett’s Test” on page 165.
Duncan’s Multiple Range Test. For more information, see “Duncan’s Multiple Range” on page 165.

There are two types of multiple comparisons available for the Two Way Repeated Measures ANOVA. The types of comparison you can make depends on the selected multiple comparison test.
All pairwise comparisons compare all possible pairs of treatments.
Multiple comparisons versus a control compare all experimental treatments to a single control group.

When comparing the two factors separately, the treatments within one factor are compared among themselves without regard to the second factor, and vice versa. These results should be used when the interaction is not statistically significant. When the interaction is statistically significant, interpreting multiple comparisons among different levels of each experimental factor may not be meaningful. SigmaPlot also performs a multiple comparison between all the cells.

The result of both comparisons is a listing of the similar and different treatment pairs, i.e., those treatments that are and are not different from each other. Because no statistical test eliminates uncertainty, multiple comparison procedures sometimes produce ambiguous groupings.

Interpreting Two Way Repeated Measures ANOVA Results
A Two Way Repeated Measures ANOVA of one repeated factor generates an ANOVA table describing the source of the variation among the treatments. This table displays the sum of squares, degrees of freedom, and mean squares for the subjects, for each factor, for both factors together, and for the subject and the repeated factor. The corresponding F statistics and the corresponding P values are also displayed.

A Two Way Repeated Measures ANOVA of two repeated factors includes the sum of squares, degrees of freedom, and mean squares for the subjects with both factors, since both factors are repeated. Corresponding F statistics and the corresponding P values are also displayed.

Tables of least square means for each of the levels of factor and for the levels of both factors together are also generated for both one and two factor two way repeated measures ANOVA.

Additional results for both forms of Two Way Repeated Measures ANOVA can be disabled and enabled in the Options for Two Way RM ANOVA dialog box. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224. Multiple comparisons are enabled in the Options for Two Way RM ANOVA dialog box. You can also set the number of decimal places to display in the Options dialog box.

For descriptions of the derivations for two way repeated measures ANOVA results, you can reference an appropriate statistics reference.

Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box.

If There Were Missing Data or Empty Cells
If your data contained missing values but no empty cells, the report indicates the results were computed using a general linear model. The ANOVA table includes the approximate degrees of freedom used to compute F, the estimated mean square equations are listed, and the summary table displays the estimated least square means.

If your data contained empty cells, you either analyzed the problem assuming no interaction, or treated the problem as a One Way ANOVA. If you choose no interactions, no statistics for factor interaction are calculated. If you performed a One Way ANOVA, the results shown are identical to one way ANOVA results. For more information, see “Interpreting One Way Repeated Measures ANOVA Results” on page 209.

Dependent Variable
This is the column title of the indexed worksheet data you are analyzing with the Two Way Repeated Measures ANOVA. Determining if the values in this column are affected by the different factor levels is the objective of the Two Way Repeated Measures ANOVA.

Normality Test
Normality test results display whether the data passed or failed the test of the assumption that the differences of the changes originate from a normal distribution, and the P value calculated by the test. A normally distributed source is required for all parametric tests. This result appears if you enabled normality testing in the Options for Two Way RM ANOVA dialog box. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.

Equal Variance Test
Equal Variance test results display whether or not the data passed or failed the test of the assumption that the differences of the changes originate from a population with the same variance, and the P value calculated by the test. Equal variance of the source is assumed for all parametric tests. This result appears if you enabled equal variance testing in the Options for Two Way RM ANOVA dialog box. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.

ANOVA Table
The ANOVA table lists the results of the two way repeated measures ANOVA. The results are calculated for each factor, and then between the factors.

DF (Degrees of Freedom). The degrees of freedom are a measure of the numbers of subjects and treatments, which affects the sensitivity of the ANOVA.
Factor degrees of freedom are measures of the number of treatments in each factor (columns in the table).
The factor x factor interaction degrees of freedom is a measure of the total number of cells.
The subjects degrees of freedom is a measure of the number of subjects (rows in the table).
The subject x factor degrees of freedom is a measure of the number of subjects and treatments for the factor.

The residual degrees of freedom is a measure of difference between the number of subjects and the number of treatments after accounting for factor and interaction.

SS (Sum of Squares). The sum of squares is a measure of variability associated with each element in the ANOVA table.
Factor sum of squares measures variability of treatments in each factor (between the rows and columns of the table, considered separately).
The factor x factor interaction sum of squares measures the variability of the treatments for both factors; this is the variability of the average differences between the cell in addition to the variation between the rows and columns, considered separately.
The subjects sum of squares measures the variability of all subjects.
The subject x factor sum of squares is a measure of the variability of the subjects within each factor.
The residual sum of squares is a measure of the underlying variability of all observations.

MS (Mean Squares). The mean squares provide estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.
The mean square for each factor is an estimate of the variance of the underlying population computed from the variability between levels of the factor.
The interaction mean square is an estimate of the variance of the underlying population computed from the variability associated with the interactions of the factors.
The error mean square (residual, or within groups) is an estimate of the variability in the underlying population, computed from the random component of the observations.

F Test Statistic. The F test statistic is provided for comparisons within each factor and between the factors. If there are no missing data, the F statistic within the factors is:
[equation not preserved in this copy]
and the F ratio between the factors is:
[equation not preserved in this copy]
If the F ratio is around 1, the data is consistent with the null hypothesis that there is no effect (i.e., no differences among treatments). If F is a large number, the variability among the means is larger than expected from random variability in the population, and you can conclude that the samples were drawn from different populations (i.e., the differences between the treatments are statistically significant).

Note: If there are missing data or empty cells, SigmaPlot automatically adjusts the F computations to account for the offsets of the expected mean squares.

P value. The P value is the probability of being wrong in concluding that there is a true difference between the treatments (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there are significant differences if P < 0.05.

Approximate DF (Degrees of Freedom). If a general linear model was used, the ANOVA table also includes the approximate degrees of freedom that allow for the missing value(s). See DF (Degrees of Freedom) above for an explanation of the degrees of freedom for each variable.

Power
The power of the performed test is displayed unless you disable this option in the Options for Two Way RM ANOVA dialog box.

The power, or sensitivity, of a Two Way Repeated Measures ANOVA is the probability that the test will detect a difference among the treatments if there really is a difference. The closer the power is to 1, the more sensitive the test. Repeated Measures ANOVA power is affected by the sample sizes, the number of treatments being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error. A Type I error is when you reject the hypothesis of no effect when this hypothesis is true. Set the value in the Options for Two Way RM ANOVA dialog box; the suggested value is α = 0.05 which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Expected Mean Squares
If there were missing data and a general linear model was used, the linear equations for the expected mean squares computed by the model are displayed. These equations are displayed only if a general linear model was used.

Summary Table
The least square means and standard error of the means are displayed for each factor separately (summary table row and column), and for each combination of factors (summary table cells). If there are missing values, the least square means are estimated using a general linear model.

Mean. The average value for the condition or group.

Standard Error of the Mean. A measure of uncertainty in the mean.

The Least Squares Mean and associated Standard Error are computed based on all the data. These values can differ from the values computed from the data in the individual cells. In particular, if the design is balanced, all the least square errors will be equal for all cells. (If the sample sizes in different cells are different, the least squares standard errors will be different, depending on the sample sizes, with larger standard errors associated with smaller sample sizes.) These standard errors will be different than the standard errors computed from each cell separately. This table is generated if you select to display summary table in the Options for Two Way RM ANOVA dialog box. For more information, see “Set Two Way Repeated Measures ANOVA Options” on page 224.

Multiple Comparisons
If SigmaPlot finds a difference among the treatments, then you can compute a multiple comparison table. Multiple comparisons are enabled in the Options for Two Way RM ANOVA dialog box. Use the multiple comparison results to determine exactly which treatments are different, since the ANOVA results only inform you that two or more of the treatments are different.

Two factor multiple comparison for a full Two Way ANOVA also compares:
Treatments within each factor without regard to the other factor (this is a marginal comparison, i.e., only the columns or rows in the table are compared).
All combinations of factors (all cells in the table are compared).

The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.
All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, and Duncan’s tests and the Bonferroni t-test.
Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are a Bonferroni t-test and Dunnett’s test.

Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each pair of treatments, computes the t values for each pair, and displays whether or not P < 0.05 for that comparison. The Bonferroni t-test can be used to compare all treatments or to compare versus a control.

You can conclude from "large" values of t that the difference of the two treatments being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

The Difference of Means is a gauge of the size of the difference between the treatments or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure of the number of treatments (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction (this is the same as the error or residual degrees of freedom).

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results. The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison. Dunnett’s test only compares a control group to all other groups.

All tests compute the q test statistic and the number of means spanned in the comparison p, and display whether or not P < 0.05 for that pair comparison. You can conclude from "large" values of q that the difference of the two treatments being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference.

p is the parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest in an SNK test, so p is the number of means spanned in the comparison. For example, when comparing four means, p = 4 when comparing the largest to the smallest, and p = 2 when comparing the second smallest to the smallest.

If a treatment is found to be not significantly different from another treatment, all treatments with p ranks in between the p ranks of the two treatments that are not different are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.

Note: SigmaPlot does not apply the DNT logic to all pairwise comparisons because of differences in the degrees of freedom between different cell pairs.

The Difference of Means is a gauge of the size of the difference between the treatments or cells being compared.


The degrees of freedom DF for the marginal comparisons are a measure of the number of treatments (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction (this is the same as the error or residual degrees of freedom).
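The parameter p (the number of means spanned) can be illustrated with a minimal Python sketch. This is not SigmaPlot code; the function name `means_spanned` and the example group means are hypothetical, and the sketch only reproduces the ranking rule described for the SNK test.

```python
def means_spanned(group_means: dict, a: str, b: str) -> int:
    """Return p, the number of means spanned when comparing groups a and b.

    Group means are ranked from largest to smallest; p counts both
    endpoints and every mean ranked between them.
    """
    ordered = sorted(group_means, key=group_means.get, reverse=True)
    return abs(ordered.index(a) - ordered.index(b)) + 1


# Hypothetical means for four treatments.
means = {"A": 12.0, "B": 9.5, "C": 7.1, "D": 4.3}
print(means_spanned(means, "A", "D"))  # largest vs. smallest: 4
print(means_spanned(means, "C", "D"))  # second smallest vs. smallest: 2
```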

Two Way Repeated Measures ANOVA Report Graphs
You can generate up to five graphs using the results from a Two Way Repeated Measures ANOVA. They include a:
Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.
Normal probability plot of the residuals. For more information, see “Normal Probability Plot” on page 549.
3D scatter plot of the residuals. For more information, see “3D Residual Scatter Plot” on page 551.
3D category scatter plot. For more information, see “3D Category Scatter Graph” on page 553.
Multiple comparison graphs. For more information, see “Multiple Comparison Graphs” on page 555.

How to Create a Two Way Repeated Measures ANOVA Report Graph
1. Select the Two Way Repeated Measures ANOVA test report.
2. From the menus select:
Graph Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Two Way Repeated Measures ANOVA results.
3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.


Friedman Repeated Measures Analysis of Variance on Ranks
Use a Repeated Measures ANOVA (analysis of variance) on Ranks when:
You want to see if a single group of individuals was affected by a series of three or more different experimental treatments, where each individual received each treatment.
The treatment effects are not normally distributed.

If you know the treatment effects are normally distributed, use One Way Repeated Measures ANOVA. If there are only two treatments to compare, do a Wilcoxon Signed Rank Test. There is no two factor test for non-normally distributed treatment effects; however, you can transform your data using Transform Menu commands so that it fits the assumptions of a parametric test.

Note: Depending on your Repeated Measures ANOVA on Ranks option settings, if you attempt to perform a Repeated Measures ANOVA on Ranks on a normal population, SigmaPlot informs you that the data is suitable for a parametric test, and suggests One Way Repeated Measures ANOVA instead. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.

About the Repeated Measures ANOVA on Ranks
The Friedman Repeated Measures Analysis of Variance on Ranks compares effects of a series of different experimental treatments on a single group. Each subject’s responses are ranked from smallest to largest without regard to other subjects, then the rank sums for the treatments are compared. The Friedman Repeated Measures ANOVA on Ranks is a nonparametric test that does not require assuming all the differences in treatments are from a normally distributed source with equal variance.
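The ranking procedure described above can be sketched in Python. This is a minimal illustration of the textbook Friedman chi-square statistic, not SigmaPlot's implementation: the function names and the data are hypothetical, tied values receive their average rank, and no tie correction is applied to the statistic.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """rows holds one list per subject with one value per treatment.

    Returns the rank sum of each treatment and the Friedman chi-square
    statistic (uncorrected for ties).
    """
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of treatments
    rank_sums = [0.0] * k
    for row in rows:
        # Each subject is ranked without regard to the other subjects.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi_sq = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return rank_sums, chi_sq


# Hypothetical data: four subjects, three treatments.
data = [
    [5.1, 6.2, 7.3],
    [4.8, 6.0, 6.9],
    [5.5, 5.2, 7.0],
    [4.9, 6.1, 6.8],
]
rank_sums, chi_sq = friedman_statistic(data)
print(rank_sums)  # [5.0, 7.0, 12.0]
print(chi_sq)     # 6.5
```

The statistic is referred to a chi-square distribution with k - 1 degrees of freedom when the number of subjects is large enough; for small samples, exact tables are normally used instead.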

Performing a Repeated Measures ANOVA on Ranks
To perform a Repeated Measures ANOVA on Ranks:
1. Enter or arrange your data in the worksheet. For more information, see “Arranging Repeated Measures ANOVA on Ranks Data” on page 240.
2. Set the Repeated Measures ANOVA on Ranks options. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.
3. From the menus select:
Statistics Repeated Measures Repeated Measures ANOVA on Ranks
4. Run the test. For more information, see “Running a Repeated Measures ANOVA on Ranks” on page 243.
5. Specify the multiple comparisons you want to perform on your data. For more information, see “Multiple Comparison Options (RM ANOVA on ranks)” on page 244.
6. View and interpret the Repeated Measures ANOVA on Ranks report. For more information, see “Interpreting Repeated Measures ANOVA on Ranks Results” on page 245.
7. Generate report graphs. For more information, see “Repeated Measures ANOVA on Ranks Report Graphs” on page 249.

Arranging Repeated Measures ANOVA on Ranks Data
The data to be tested can be in raw or indexed format. Raw data is placed in as many columns as there are treatments, up to 64; each column contains the data for one treatment and each row contains the data for one subject. Indexed data is placed in three worksheet columns: a factor column, a subject index column, and a data column. The columns for raw data must be the same length. If a missing value is encountered, that individual is ignored.
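The relationship between the two formats can be sketched in Python. This is an illustration only, not a SigmaPlot feature: the function name `raw_to_indexed`, the use of `None` for a missing value, and the treatment names are all assumptions made for the example.

```python
def raw_to_indexed(raw_columns):
    """Convert raw data to indexed rows of (factor, subject, value).

    raw_columns maps a treatment name to its column of values, with one
    row per subject. A subject with any missing value (None) is ignored.
    """
    treatments = list(raw_columns)
    lengths = {len(col) for col in raw_columns.values()}
    if len(lengths) != 1:
        raise ValueError("raw data columns must be the same length")
    n_subjects = lengths.pop()

    indexed = []
    for s in range(n_subjects):
        row = [raw_columns[t][s] for t in treatments]
        if any(v is None for v in row):
            continue  # skip the whole subject when a value is missing
        for t, v in zip(treatments, row):
            indexed.append((t, s + 1, v))
    return indexed


# Hypothetical worksheet: two treatments, three subjects, one missing value.
raw = {"Drug A": [1.2, 2.3, None], "Drug B": [1.5, 2.1, 3.3]}
for factor, subject, value in raw_to_indexed(raw):
    print(factor, subject, value)
```

Subject 3 is dropped from both treatments because its Drug A value is missing, which mirrors the rule that an individual with a missing value is ignored.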

Setting the Repeated Measures ANOVA on Ranks Options
Use the Repeated Measures ANOVA on Ranks options to:
Adjust the parameters of the test to relax or restrict the testing of your data for normality and equal variance.
Display the summary table.
Enable and disable multiple comparison testing.

To change the Repeated Measures ANOVA on Ranks options:
1. Select RM ANOVA on Ranks from the Standard toolbar.
2. On the menus click:
Statistics Current Test Options
The Options for RM ANOVA on Ranks dialog box appears with three tabs:
Assumption Checking. Select the Assumption Checking tab to view the Normality and Equal Variance options.
Results. Select the Results tab to view the Summary Table option.
Post Hoc Tests. Select the Post Hoc Tests tab to view the multiple comparisons options.
3. To continue the test, click Run Test. For more information, see “Running a Repeated Measures ANOVA on Ranks” on page 243.
4. To accept the current settings and close the options dialog box, click OK.

Options for RM ANOVA on Ranks: Assumption Checking
The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability about the group means.

P Values for Normality and Equal Variance. Enter the corresponding P value in the P Value to Reject box. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P value computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with unequal variances, there are extreme conditions of data distribution that these tests cannot take into account. For example, the Levene Median test fails to detect differences in variance of several orders of magnitude. However, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption tests.

Options for RM ANOVA on Ranks: Results
The Summary Table for ANOVA on Ranks lists the medians, percentiles, and sample sizes N in the ANOVA on Ranks report. If desired, change the percentile values by editing the boxes. The 25th and the 75th percentiles are the suggested percentiles.

Options for RM ANOVA on Ranks: Post Hoc Tests
Select the Post Hoc Test tab in the Options dialog box to view the multiple comparisons options. The Repeated Measures ANOVA on Ranks tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences. The P value used to determine if the ANOVA detects a difference is set in the Report Options dialog box. If the P value produced by the ANOVA on Ranks is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed.

Performing Multiple Comparisons. You can choose to always perform multiple comparisons or to only perform multiple comparisons if the ANOVA detects a difference. Select the Always Perform option to perform multiple comparisons whether or not the ANOVA detects a difference. Select the Only When ANOVA P Value is Significant option to perform multiple comparisons only if the ANOVA detects a difference.
Significant Multiple Comparison Value. Select either .05 or .10 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments. A value of .05 indicates that the multiple comparisons will detect a difference if there is less than a 5% chance that the multiple comparison is incorrect in detecting a difference. A value of .10 indicates that the multiple comparisons will detect a difference if there is less than a 10% chance that the multiple comparison is incorrect in detecting a difference.
Note: If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method. For more information, see “Multiple Comparison Options (RM ANOVA on ranks)” on page 244.

Running a Repeated Measures ANOVA on Ranks
To run a Repeated Measures ANOVA on Ranks, you need to select the data to test. If you want to select your data before you run the test, drag the pointer over your data.
1. From the menus select:
Statistics Repeated Measures Repeated Measures ANOVA on Ranks
The Pick Columns for RM ANOVA on Ranks dialog box appears prompting you to specify a data format.
2. Select the appropriate data format from the Data Format drop-down list. For more information, see “Data Format for Repeated Measures Tests” on page 176.
3. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list. The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the RM ANOVA on Ranks test on the selected columns.
If you elected to test for normality and equal variance, SigmaPlot performs the test for normality (Kolmogorov-Smirnov) and the test for equal variance (Levene Median). If your data passes both tests, SigmaPlot informs you and suggests continuing your analysis using One Way Repeated Measures ANOVA.
If you did not enable multiple comparison testing in the Options for RM ANOVA on Ranks dialog box, the Repeated Measures ANOVA on Ranks report appears after the test is complete. If you did enable the Multiple Comparisons option in the options dialog box, the Multiple Comparison Options dialog box appears prompting you to select a multiple comparison method. For more information, see “Multiple Comparison Options (RM ANOVA on ranks)” on page 244.

Multiple Comparison Options (RM ANOVA on ranks)
The Multiple Comparison Options dialog box appears, prompting you to specify a multiple comparison test, in either of two cases: you selected to always run multiple comparisons in the Options for RM ANOVA on Ranks dialog box, or you selected to run multiple comparisons only when the P value is significant and the ANOVA produces a P value, for either of the two factors or the interaction between the two factors, equal to or less than the trigger P value. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.

This dialog box displays the P values for each of the two experimental factors and of the interaction between the two factors. Only the options with P values less than or equal to the value set in the Options dialog box are selected. You can disable multiple comparison testing for a factor by clicking the selected option. If no factor is selected, multiple comparison results are not reported.
There are four multiple comparison tests to choose from for the ANOVA on Ranks. You can choose to perform the:
Dunn’s Test.
Dunnett’s Test.
Tukey Test.
Student-Newman-Keuls Test.
There are two kinds of multiple comparison procedures available for the Repeated Measures ANOVA on Ranks.
All pairwise comparisons test the difference between each treatment or level within the two factors separately (i.e., among the different rows and columns of the data table).
Multiple comparisons versus a control test the difference between all the different combinations of each factor (i.e., all the cells in the data table).

Interpreting Repeated Measures ANOVA on Ranks Results
The Friedman Repeated Measures ANOVA on Ranks report displays the results for χ²r. For descriptions of the derivations for ANOVA on Ranks results, you can reference an appropriate statistics reference.
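As a rough illustration of the Friedman statistic described in this section (rank each subject's observations, sum the ranks for each treatment, and compute the statistic from the squared rank sums), here is a minimal sketch. It uses the standard textbook formula without a correction for ties, which is an assumption and may not match SigmaPlot's internal computation; the function names and sample data are hypothetical.

```python
def within_subject_ranks(values):
    """Rank one subject's observations from smallest to largest (1-based).

    Tied values share the average of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def friedman_statistic(data):
    """data holds one row per subject and one column per treatment.

    Returns (chi_square_r, degrees_of_freedom, rank_sums).
    Textbook formula, no tie correction (an assumption).
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of treatments
    rank_sums = [0.0] * k
    for row in data:
        for j, rank in enumerate(within_subject_ranks(row)):
            rank_sums[j] += rank
    sum_of_squares = sum(r * r for r in rank_sums)
    chi_square_r = 12.0 / (n * k * (k + 1)) * sum_of_squares - 3.0 * n * (k + 1)
    return chi_square_r, k - 1, rank_sums


# Hypothetical data: 4 subjects, 3 treatments, identical ordering in every subject.
consistent = [[1.2, 2.5, 3.1], [0.9, 1.8, 2.2], [1.5, 2.0, 2.9], [1.1, 1.9, 3.5]]
print(friedman_statistic(consistent))

# Hypothetical data in which the rankings cancel out across subjects.
balanced = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
print(friedman_statistic(balanced))
```

In the first data set every subject ranks the treatments the same way, so the rank sums differ as much as possible and the statistic is large. In the second the rank sums are equal and the statistic is zero.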

Result Explanations. In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text in the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Normality Test
Normality test results display whether the data passed or failed the test of the assumption that the differences of the treatments originate from a normal distribution, and the P value calculated by the test. For nonparametric procedures this test can fail, as nonparametric tests do not require normally distributed source populations. This result appears unless you disabled normality testing in the Options for RM ANOVA on Ranks dialog box. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.

Equal Variance Test
Equal Variance test results display whether the data passed or failed the test of the assumption that the differences of the treatments originate from a population with the same variance, and the P value calculated by the test. Nonparametric tests do not assume equal variance of the source. This result appears unless you disabled equal variance testing in the Options for RM ANOVA on Ranks dialog box. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.

Summary Table
SigmaPlot can generate a summary table listing the sample sizes N, number of missing values, medians, and percentiles defined in the Options for RM ANOVA on Ranks dialog box.
N (Size). The number of non-missing observations for that column or group.
Missing. The number of missing values for that column or group.
Medians. The "middle" observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.
Percentiles. The two percentile points that define the upper and lower tails of the observed values.
These results appear in the report unless you disable them in the Options for RM ANOVA on Ranks dialog box. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240.

Chi-Square Statistic
The Friedman test statistic χ²r is used to evaluate the null hypothesis that all the rank sums are equal. χ²r is computed by ranking all observations for each subject from smallest to largest without regard for other subjects. The ranks are summed for each treatment and χ²r is computed from the sum of squares.
Values of χ²r near zero indicate that there is no significant difference in treatments; the ranks within each subject are random. If the value of χ²r is large, you can conclude that the treatment effects are different (i.e., that the differences in the rank sums are greater than would be expected by chance).
Degrees of Freedom. The degrees of freedom is an indication of the sensitivity of χ²r. It is a measure of the number of treatments.
P value. The P value is the probability of being wrong in concluding that there is a true difference in the treatments (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on χ²r). The smaller the P value, the greater the probability that the samples are significantly different. Traditionally, you can conclude there are significant differences when P < 0.05.

Multiple Comparisons
If a difference is found among the groups, and you requested and elected to perform multiple comparisons, a table of the comparisons between group pairs is displayed. The multiple comparison procedure is activated in the Options for ANOVA on Ranks dialog box. For more information, see “Setting the Repeated Measures ANOVA on Ranks Options” on page 240. The test used in the multiple comparison procedure is selected in the Multiple Comparison Options dialog box.
Multiple comparison results are used to determine exactly which groups are different, since the ANOVA results only inform you that two or more of the groups are different. The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control.
All pairwise comparison results list comparisons of all possible combinations of group pairs: the all pairwise tests are the Tukey, Student-Newman-Keuls test and Dunn’s test.

Comparisons versus a single control list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are Dunnett’s test and Dunn’s test.
Tukey, Student-Newman-Keuls, and Dunnett’s Test Results. The Tukey and Student-Newman-Keuls (SNK) tests are all pairwise comparisons of every combination of group pairs. Dunnett’s test only compares a control group to all other groups. All tests compute the q test statistic, the number of rank sums spanned in the comparison p, and display whether or not P < 0.05 for that pair comparison.
You can conclude from "large" values of q that the difference of the two treatments being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference. The difference of the rank sums is a gauge of the size of the difference between the two treatments.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the rank sums being compared. Group rank sums are ranked in order from largest to smallest in an SNK test, so p is the number of ranks spanned in the comparison. For example, when comparing four rank means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.
If a treatment is found to be not significantly different than another treatment, all treatments with p ranks in between the p ranks of the two treatments that are not different are also assumed not to be significantly different, and a result of Do Not Test appears for those comparisons.
Note: SigmaPlot does not apply the DNT logic to all pairwise comparisons because of differences in the degrees of freedom between different cell pairs.
Dunn’s Test Results. Dunn’s test is used to compare all treatments or to compare versus a control when the group sizes are unequal. Dunn’s test lists the difference of ranks, computes the Q test statistic, and displays whether or not P < 0.05, for each treatment pair.
You can conclude from "large" values of Q that the difference of the two treatments being compared is statistically significant. If the P value for the comparison is less than 0.05, the likelihood of being incorrect in concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot confidently conclude that there is a difference. The difference of the rank sums is a gauge of the size of the difference between the two treatments.

A result of DNT (do not test) appears for those comparison pairs whose difference of rank means is less than the differences of the first comparison pair that is found to be not significantly different.

Repeated Measures ANOVA on Ranks Report Graphs
You can generate up to three graphs using the results from a Repeated Measures ANOVA on Ranks. For more information, see “Repeated Measures ANOVA on Ranks Report Graphs” on page 249.

How to Create a Repeated Measures ANOVA on Ranks Report Graph
1. Select the Repeated Measures ANOVA on Ranks test report.
2. From the menus select:
Graph Create Report Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Repeated Measures ANOVA on Ranks results.
3. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.

Chapter 7
Comparing Frequencies, Rates, and Proportions
Use rate and proportion tests to compare two or more sets of data for differences in the number of individuals that fall into different classes or categories.
If you are comparing groups where the data is measured on a numeric scale, use the appropriate group comparison or repeated measures tests. For more information, see "Choosing the Procedure to Use" in Chapter 3.

About Rate and Proportion Tests
Rate and proportion tests are used when the data is measured on a nominal scale. Rate and proportion comparisons test for significant differences in the categorical distribution of the data beyond what can be attributed to random variation. For more information, see "Choosing the Rate and Proportion Comparison to Use" in Chapter 3.
You can find all of these tests in the menus.

Contingency Tables
Many rate and proportion tests utilize a contingency table which lists the groups and/or categories to be compared as the table column and row titles, and the number of observations for each combination of category or group as the table cells. A contingency table is used to determine whether or not the distribution of a group is contingent on the categories it falls in.
A 2 x 2 contingency table has two groups and two categories (for example, two rows and two columns). A 2 x 3 table has two groups and three categories or three groups and two categories, etc.

Comparing the Proportions of Two Groups in One Category
Use a z-test to compare the proportions of two groups found within a single category for a significant difference. To perform a z-Test, from the menus select:
Statistics Rates and Proportions z-Test

Comparing Proportions of Multiple Groups in Multiple Categories
You can use analysis of contingency tables to test if the distributions of two or more groups within two or more categories are significantly different.
Use Chi-Square (χ²) analysis of contingency tables if there are more than two groups or categories, or if the expected number of observations per cell in a 2 x 2 contingency table is at least five.
Use the Fisher Exact Test when the expected number of observations is less than five in any cell of a 2 x 2 contingency table.
Note that you can perform the Fisher Exact Test on any 2 x 2 contingency table. SigmaPlot automatically checks your data during a Chi-Square analysis and suggests the Fisher Exact Test when applicable.
Note: SigmaPlot computes a two-tailed Fisher Exact Test.

Comparing Proportions of the Same Group to Two Treatments
You can test for differences in the proportions of the responses in the same individuals to a series of two different treatments using McNemar’s Test for changes.

Yates Correction
The Yates Correction for continuity can be automatically applied to the z-test and for all tests using 2 x 2 tables or comparisons with the χ² distribution with one degree of freedom. It is generally accepted that the Yates Correction yields a more accurately computed P value in these cases.

For descriptions of the Yates Correction Factor, you can reference any appropriate statistics reference. Application of the Yates Correction Factor is selected in the Options dialog box for each test.

Data Format for Rate and Proportion Tests
The exact format for each rate and proportion test varies from test to test.
Note: Whenever numbers of observations are listed, they must always be integers.

z-test
The data for a z-test is always placed in two worksheet rows by two columns. The size (total number of observations) of each group is in one column, and the corresponding proportion p of the observations within the category is in a second column. The number of observations must always be an integer, and the proportions p must be between 0 and 1.

Chi-Square Analysis of Contingency Tables
The data can be arranged in the worksheet as either the contingency table data or as indexed raw data.
Tabulated Data. Tabulated data is arranged in a contingency table showing the number of observations for each cell. The worksheet rows and columns correspond to the groups and categories. The number of observations must always be an integer.
Note that the order and location of the rows or columns corresponding to the groups and categories is unimportant. You can use the rows for category and the columns for group, or vice versa.

Figure 0-1 A Contingency Table describing the number of Lowland and Alpine species found at different locations.
Raw Data. You can report the group and category of each individual observation by placing the group in one worksheet column and the corresponding category in another column. Each row corresponds to a single observation, so there should be as many rows of data as there are total numbers of observations. SigmaPlot automatically cross tabulates these data and performs the χ² analysis on the resulting contingency table. For more information, see “Arranging Chi-Square Data” on page 267.
Figure 0-2 Worksheet Data Arrangement for Contingency Table Data from the Table above
Columns 1 through 3 in the worksheet above are in tabular format, and columns 4 and 5 are raw data.
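The cross tabulation of raw data described above can be sketched as follows. This is a minimal illustration of turning one (group, category) pair per observation into a table of counts, not SigmaPlot's own routine; the function name, the location labels, and the observations are hypothetical.

```python
from collections import Counter


def cross_tabulate(groups, categories):
    """Build a contingency table from raw data.

    groups and categories are parallel lists with one entry per observation.
    Returns (row_labels, column_labels, table) where table[i][j] counts the
    observations in group row_labels[i] and category column_labels[j].
    """
    if len(groups) != len(categories):
        raise ValueError("each observation needs both a group and a category")
    counts = Counter(zip(groups, categories))
    row_labels = sorted(set(groups))
    column_labels = sorted(set(categories))
    table = [[counts[(g, c)] for c in column_labels] for g in row_labels]
    return row_labels, column_labels, table


# Hypothetical raw data: one row per observed species.
groups = ["Lowland", "Lowland", "Alpine", "Alpine", "Alpine", "Lowland"]
categories = ["Valley", "Ridge", "Ridge", "Ridge", "Valley", "Valley"]

rows, cols, table = cross_tabulate(groups, categories)
print(rows, cols)
for label, counts_row in zip(rows, table):
    print(label, counts_row)
```

The cell counts always sum to the number of raw data rows, which is a quick check that no observation was dropped.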

Fisher Exact Test
The data must form a 2 x 2 contingency table, with the number of observations in each cell. You can test tabulated data or raw data observations.
Tabulated Data. Tabulated data is arranged in a contingency table showing the number of observations for each cell. The worksheet rows and columns correspond to the groups and categories. The number of observations must always be an integer.
Figure 0-3 A 2 x 2 Contingency Table describing the number of harbor seals and sea lions found on two different islands.
Raw Data. A group identifier is placed in one worksheet column and the corresponding category in another column. Each row corresponds to a single observation, so there should be as many rows of data as there are total numbers of observations. There must be exactly two kinds of groups and two types of categories. SigmaPlot automatically cross-tabulates this data and performs the Fisher Exact Test on the resulting contingency table. For more information, see “Arranging Fisher Exact Test Data”.

Figure 0-4 Data Formats for a Fisher Exact Test
Columns 1 and 2 in the worksheet above are in tabular format and columns 3 and 4 are raw data observations. A Fisher Exact Test requires data for a 2 x 2 table.

McNemar’s Test
The data must form a table with the same number of rows and columns, since both the treatments must have the same number of categories. You can test tabulated data or raw data observations.
Tabulated Data. Tabulated data is arranged in a contingency table showing the number of observations for each cell. The worksheet rows and columns correspond to the two groups of categories. The number of category types must be the same for both groups, so that the contingency table is square. The number of observations must always be an integer.

Figure 0-5 A 3 x 3 Contingency Table describing the effect of a report on the opinion of surveyed people.
Raw Data. A category identifier is placed in one worksheet column and the corresponding category in another column. Each row corresponds to a single observation, so there should be as many rows of data as there are total numbers of observations. There must be the same number of the types of categories. SigmaPlot automatically cross tabulates this data and performs McNemar’s Test on the resulting contingency table. For more information, see “Arranging McNemar Test Data” on page 282.
Figure 0-6 Data Formats for McNemar’s Test
Columns 1 through 3 in the worksheet above are in tabular format, and columns 4 through 6 are raw data observations. McNemar’s Test requires data for tables with equal numbers of columns and rows; here, a 3 x 3 table.
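For McNemar's Test, the raw data format described above pairs each individual's category under one treatment with the category under the other, and the resulting table must be square. The sketch below shows one way to guarantee that, by using the same category labels for rows and columns. The function name, the opinion labels, and the responses are hypothetical, and this is not SigmaPlot's implementation.

```python
from collections import Counter


def square_table(first, second):
    """Cross tabulate paired categories into a square table.

    first[i] and second[i] are the categories recorded for individual i
    under the first and second treatment. Rows and columns share one label
    list, so the table is square even if a category appears in only one column.
    """
    if len(first) != len(second):
        raise ValueError("each individual needs a category in both columns")
    labels = sorted(set(first) | set(second))
    counts = Counter(zip(first, second))
    table = [[counts[(a, b)] for b in labels] for a in labels]
    return labels, table


# Hypothetical survey: opinion before and after reading a report.
before = ["Approve", "Approve", "Disapprove", "No Opinion", "Disapprove"]
after = ["Approve", "Disapprove", "Disapprove", "Approve", "Disapprove"]

labels, table = square_table(before, after)
unchanged = sum(table[i][i] for i in range(len(labels)))
print(labels)
print(table)
print("changed opinions:", len(before) - unchanged)
```

The diagonal cells hold individuals whose category did not change, and the off-diagonal cells hold the changes that a test for changes is concerned with.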

Comparing Proportions Using the z-Test
Compare proportions with a z-test when:
You have two groups to compare.
You know the total sample size (number of observations) for each group.
You have the proportions p for each group that falls within a single category.
If you have data for the numbers of observations for each group that fall in two categories, perform χ² analysis of contingency tables instead. This will produce the same P value as the z-test. You can also run the χ² analysis of contingency tables if you have more than two groups or categories.

About the z-test
The z-test comparison of proportions is used to determine if the proportions of two groups within one category or class are significantly different.
The z-test assumes that:
Each observation falls into one of two mutually exclusive categories.
All observations are independent.

Performing a z-test
To perform a z-test:
1. Enter or arrange your data in the data worksheet. For more information, see “Arranging z-test Data” on page 259.
2. If desired, set the z-test options. For more information, see “Setting z-test Options” on page 259.
3. From the menus select:
Statistics Rates and Proportions z-test
4. Run the test. For more information, see “Running a z-Test” on page 261.

5. View and interpret the z-test report. For more information, see “Interpreting Proportion Comparison Results” on page 262.

Arranging z-test Data
To compare two proportions, enter the two sample sizes in one column and the corresponding observed proportions p in a second column. There must be exactly two rows and two columns. The sample sizes must be whole numbers and the observed proportions must be between 0 and 1. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

Setting z-test Options
Use the Compare Proportion options to:
Display the power of a performed test for Compare Proportion tests in the reports.
Enable the Yates Correction Factor.
Display the confidence interval for the data in Compare Proportion test reports.
To change z-test options:
1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. Select z-test from the Standard toolbar drop-down list.
3. From the menus select:
Statistics Current Test Options
The Options for z-test dialog box appears. For more information, see “Options for z-test” on page 260.

Figure 3-1 The Options for z-test Dialog Box
4. Click a check box to enable or disable a test option. All options are saved between SigmaPlot sessions.
5. To continue the test, click Run Test. For more information, see “Running a z-Test” on page 261.
6. To accept the current settings and close the options dialog box, click OK.

Options for z-test
Power. The power or sensitivity of a test is the probability that the test will detect a difference between the proportions of two groups if there is really a difference. Select this option to report the sensitivity of the test.
Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. Change the alpha value by editing the number in the Alpha Value box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

The Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small, when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the distribution of the χ² test statistic is discrete.
Use the Yates Correction Factor to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom.
Click the selected check box to turn the Yates Correction Factor on or off. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.
Confidence Interval. This is the confidence interval for the difference of proportions. To change the specified interval, select the box and type any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Running a z-Test
To run a test, you need to select the data to test. The Pick Columns dialog box is used to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.
To run a z-test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. From the menus select:
Statistics Rates and Proportions z-test
The Pick Columns dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog box prompts you to pick your data.

Figure 2-1 The Pick Columns for z-test Dialog Box Prompting You to Select Data Columns
3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Size or Proportion drop-down list. The first selected column is assigned to the Size row in the Selected Columns list, and the second column is assigned to the Proportion row in the list. The titles of selected columns appear in each row. You can only select one Size and one Proportion data column.
4. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish to perform the test. The report appears displaying the results of the z-test. For more information, see “Interpreting Proportion Comparison Results” on page 262.

Interpreting Proportion Comparison Results
The z-test report displays a table of the statistical values used, the z statistic, and the P for the test. You can also display a confidence interval for the difference of the proportions using the Options for z-test dialog box. For more information, see “Setting z-test Options” on page 259.
For descriptions of the derivation for z-test results, you can reference any appropriate statistics reference.
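For orientation, the quantities named in the z-test report can be sketched with the usual textbook formulas for comparing two proportions. This is an assumption about the general method, not a statement of how SigmaPlot computes its report: the pooled standard error for z, the continuity correction of half the sum of the reciprocal sample sizes, and the unpooled standard error for the confidence interval are all standard choices that may differ from the product. The function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_proportions(n1, p1, n2, p2, yates=False, confidence=95.0):
    """Two-group comparison of proportions (textbook formulas, assumed).

    n1, n2 are the group sizes; p1, p2 are the observed proportions.
    """
    diff = p1 - p2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)        # pooled estimate for p
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance <= 0:
        raise ValueError("pooled proportion of 0 or 1 gives no usable standard error")
    se_diff = math.sqrt(variance)                   # standard error of the difference

    numerator = abs(diff)
    if yates:
        # Continuity correction shrinks the difference, so z gets smaller.
        numerator = max(0.0, numerator - 0.5 * (1 / n1 + 1 / n2))
    z = math.copysign(numerator, diff) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-tailed normal P value

    # Confidence interval for the difference, using the unpooled standard error.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 200)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {"difference": diff, "pooled": pooled, "se": se_diff,
            "z": z, "p_value": p_value, "ci": ci}


# Hypothetical data: 100 observations per group, proportions 0.5 and 0.3.
plain = z_test_proportions(100, 0.5, 100, 0.3)
corrected = z_test_proportions(100, 0.5, 100, 0.3, yates=True)
print(plain)
print(corrected["z"], corrected["p_value"])
```

With the correction enabled, z is smaller and the P value larger than without it, which matches the description of the Yates correction as making the test more conservative.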

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the Up and Down buttons in the formatting toolbar to move one page up and down in the report.
Figure 1-1 The z-test Comparison of Proportions Results Report

Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text in the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Statistical Summary
The summary table for a z-test lists the sizes of the groups n and the proportion of each group in the category p. These values are taken directly from the data.
Difference of Proportions. This is the difference between the p proportions for the two groups.

Pooled Estimate for P. This is the estimate of the population proportion p based on pooling the two samples to test the hypothesis that they were drawn from the same population. It depends on both the nature of the underlying population and the specific samples drawn.
Standard Error of the Difference. The standard error of the difference is a measure of the precision with which this difference can be estimated.

z statistic
The z statistic is the difference of the proportions divided by the standard error of the difference.
You can conclude from "large" absolute values of z that the proportions of the populations are different. A large z indicates that the difference between the proportions is larger than what would be expected from sampling variability alone (i.e., that the difference between the proportions of the two groups is statistically significant). A small z (near 0) indicates that there is no significant difference between the proportions of the two groups.
If you enabled the Yates correction in the Options for z-test dialog box, the calculation of z is slightly smaller to account for the difference between the theoretical and calculated values of z. For more information, see “Setting z-test Options” on page 259.
P Value. The P value is the probability of being wrong in concluding that there is a difference in the proportions of the two groups (that is, the probability of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P value, the greater the probability that the samples are drawn from populations with different proportions.
Traditionally, you conclude that there are significant differences when P < 0.05.

Confidence Interval for the Difference
If the confidence interval does not include zero, you can conclude that there is a significant difference between the proportions with the level of confidence specified. This can also be described as P < α, where α is the acceptable probability of incorrectly concluding that there is a difference.

Adjust the level of confidence in the Options dialog box; this is typically 100(1 − α), or 95%. Larger values of confidence result in wider intervals, and smaller values in smaller intervals. For a further explanation of α, see Power below. This result is displayed unless you disable it in the Options for z-test dialog box. For more information, see “Setting z-test Options” on page 259.

Power
The power, or sensitivity, of a z-test is the probability that the test will detect a difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. z-test power is affected by the sample size and the observed proportions of the samples. This result is displayed unless you disabled it in the Options for z-test dialog box. For more information, see “Setting z-test Options” on page 259.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). The α value is set in the z-test Power dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Chi-square Analysis of Contingency Tables
Use χ² analysis of contingency tables when:
- You want to compare the distributions of two or more groups whose individuals fall into two or more different classes or categories.
- There are five or more observations expected in each cell of a 2 x 2 contingency table.

If you have fewer than five expected observations in any cell of a 2 x 2 contingency table, use the Fisher Exact Test. For more information, see “The Fisher Exact Test” on page 275.

The χ² test is computed based on the assumption that the rows and columns are independent; if the rows and columns are dependent, i.e., the same group undergoes

two consecutive treatments, use McNemar’s Test. For more information, see “McNemar’s Test” on page 281.

About the Chi-Square Test
The Chi-Square Test analyzes data in a contingency table. A contingency table is a table of the number of individuals in each group that fall in each category. The different characteristics or categories are the columns of the table, and the groups are the rows of the table (or vice versa). Each cell in the table lists the number of individuals for that combination of category and group.

Figure 1-2 A Contingency Table describing the number of Lowland and Alpine species found at different locations.

A 2 x 2 contingency table has two groups and two categories (for example, two rows and two columns), a 2 x 3 table has two groups and three categories or three groups and two categories, etc. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

The χ² test uses the percentages of the row and column totals for each cell to compute the expected number of observations per cell if the treatment had no effect. The χ² statistic summarizes the difference between the expected and the observed frequencies.

Performing a Chi-Square Test
To perform a Chi-Square Test:
1. Enter or arrange your data appropriately in the data worksheet. For more information, see “Arranging Chi-Square Data” on page 267.
2. If desired, set the Chi-Square options. For more information, see “Setting Chi-Square Options” on page 268.

3. From the menus select:
   Statistics
   Rates and Proportions
   Chi-Square
4. Run the test. For more information, see “Running a Chi-Square Test” on page 270.
5. View and interpret the Chi-Square report. For more information, see “Interpreting Results of a Chi-Squared Analysis of Contingency Tables” below.

Arranging Chi-Square Data
Analysis of contingency tables can be done directly from a contingency table entered in the worksheet or from two columns of raw data observations. Specify the data format to use in the test in the Pick Columns dialog box. For more information, see “Running a Chi-Square Test” on page 270.

Figure 5-1 Valid Data Formats for a Chi-Square Test

Columns 1 through 3 in the worksheet above are arranged as a contingency table. Columns 4 and 5 are raw data for the observations. Each row corresponds to a single

observation. Note that not all the raw data points are shown, as the columns are longer than fifteen rows.

Tabulated Data. Tabulated data is arranged in a contingency table using the worksheet rows and columns as the groups and categories. The number of observations for each combination of group and category is entered into the appropriate cell.

Raw Data. Raw data uses a row for each individual observation, and places the corresponding groups for the observations in one column and the categories in a second column. SigmaPlot automatically determines the number of groups and categories used. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

Setting Chi-Square Options
Use the Chi-Square options to:
- Display the power of a performed test for Compare Proportion tests in the reports.
- Enable the Yates Correction Factor.

To change Chi-Square options:
1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. From the menus select:
   Statistics
   Current Test Options
   The Options for Chi-Square dialog box appears.

Figure 2-1 The Options for Chi-Square Dialog Box

3. Click a check box to enable or disable a test option. All options are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see “Running a Chi-Square Test” on page 270.
5. To accept the current settings and close the options dialog box, click OK.

Options for Chi Square
Power. The power or sensitivity of a test is the probability that the test will detect a difference between the proportions of two groups if there is really a difference.

Use Alpha Value. Select to detect the sensitivity of the test. Change the alpha value by editing the value in the Alpha Value box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

The Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small, when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete.

You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom.

Click the check box to turn the Yates Correction Factor on or off. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.

Running a Chi-Square Test
To run a test, you need to select the data to test. Use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.

To run a Chi-Square Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. From the menus select:
   Statistics
   Rates and Proportions
   Chi-Square
   The Pick Columns dialog box appears prompting you to specify a data format.

Figure 2-1 The Pick Columns for Chi-Square Test Dialog Box Prompting You to Specify a Data Format

3. Select the appropriate data format from the Data Format drop-down list. If you are testing contingency table data, select Tabulated. If your data is arranged in raw format, select Raw. For more information, see “Arranging Chi-Square Data” on page 267.
4. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
5. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Observations or Category drop-down list. The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The titles of selected columns appear in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 columns.
6. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

Figure 6-1 The Pick Columns for Chi-Square Dialog Box Prompting You to Select Data Columns

7. Click Finish to run the test. If there are too many cells in a contingency table with expected values below 5, SigmaPlot either:
- Suggests that you redefine the groups or categories in the contingency table to reduce the number of cells and increase the number of observations per cell.
- Suggests the Fisher Exact Test if the table is a 2 x 2 contingency table.

When there are many cells with expected observations of 5 or less, the theoretical χ² distribution does not accurately describe the actual distribution of the χ² test statistic, and the resulting P values may not be accurate. The Fisher Exact Test computes the exact two-tailed probability of observing a specific 2 x 2 contingency table, and does not require that the expected frequencies in all cells exceed 5.

When the test is complete, the χ² test report appears. For more information, see “Interpreting Results of a Chi-Squared Analysis of Contingency tables” on page 272.

Interpreting Results of a Chi-Squared Analysis of Contingency tables
The report for a χ² test lists a summary of the contingency table data, the χ² statistic calculated from the distributions, and the P value for χ². For descriptions of the derivations for χ² test results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the Up and Down buttons in the Formatting toolbar to move one page up and down in the report.

Figure 7-1 A Chi-Square Test Results Report

Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Contingency Table Summary
Each cell in the table is described with a set of statistics.

Observed Counts. These are the number of observations per cell, obtained from the contingency table data.

Expected Frequencies. The expected frequencies for each cell in the contingency table, as predicted using the row and column percentages.

Row Percentage. The percentage of observations in each row of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that row.

Column Percentage. The percentage of observations in each column of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that column.

Total Cell Percentage. The percentage of total number of observations in the contingency table, obtained by dividing the observed frequency in the cells by the total number of observations in the table.

Chi-Square
χ² is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies, each divided by the expected frequency, or

   χ² = Σ (observed − expected)² / expected

This computation assumes that the rows and columns are independent.

If the value of χ² is large, you can conclude that the distributions are different (for example, that there are large differences between the expected and observed frequencies, indicating that the rows and columns are not independent). Values of χ² near zero indicate that the pattern in the contingency table is no different from what one would expect if the counts were distributed at random.

Yates Correction. The Yates correction is used to adjust the χ² and therefore the P value for 2 x 2 tables to more accurately reflect the true distribution of χ². The Yates correction is enabled in the Options for Chi-Square dialog box, and is only applied to 2 x 2 tables.

P Value. The P value is the probability of being wrong in concluding that there is a true difference in the distribution of the numbers of observations (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on χ²). The smaller the P value, the greater the probability that the samples are drawn from populations with different distributions among the categories. Traditionally, you conclude that there are significant differences when P < 0.05.
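The report values described above can be checked with a short calculation. The sketch below is an illustration in Python under stated assumptions, not SigmaPlot's code: the function names are hypothetical, the Yates correction is taken in its usual textbook form (each |observed − expected| reduced by 0.5, for 2 x 2 tables only), and the P value helper covers only the one-degree-of-freedom case.

```python
import math

def chi_square(table, yates=False):
    """Expected frequencies and chi-square for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count per cell if rows and columns are independent.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    is_2x2 = len(row_totals) == 2 and len(col_totals) == 2
    chi2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            d = abs(o - e)
            if yates and is_2x2:
                d = max(0.0, d - 0.5)   # continuity correction (assumed form)
            chi2 += d * d / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

def p_value_one_df(chi2):
    """Upper-tail P value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

expected, chi2, df = chi_square([[10, 20], [30, 40]])
print(expected, round(chi2, 4), df, round(p_value_one_df(chi2), 3))
```

Tables larger than 2 x 2 have more than one degree of freedom, so their P values need a general χ² distribution function rather than the one-line helper used here.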

Power
The power, or sensitivity, of a Chi-Square test is the probability that the test will detect a difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. Chi-Square power is affected by the sample size and the observed proportions of the samples. This result is displayed if you selected this option in the Options for Chi-Square dialog box.

Alpha (α)
Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). Set the α value in the Power Option dialog box. The suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

The Fisher Exact Test
Use the Fisher Exact Test to compare the distributions in a 2 x 2 contingency table that has 5 or fewer expected observations in one or more cells. If no cells have fewer than five expected observations, you can use a χ² test.

About the Fisher Exact Test
The Fisher Exact Test determines the exact probability of observing a specific 2 x 2 contingency table (or a more extreme pattern). Use the Fisher Exact Test instead of χ² analysis of a 2 x 2 contingency table when the expected frequency of one or more cells is less than 5. SigmaPlot automatically suggests the Fisher Exact Test when a χ² analysis of a 2 x 2 contingency table is performed and fewer than 5 expected observations are encountered in any cell.
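To make "the exact probability of observing a specific 2 x 2 contingency table (or a more extreme pattern)" concrete, here is a small Python sketch. It is an illustration only: the function name is hypothetical, and the two-tailed rule used (sum the probabilities of every table with the same margins that is no more likely than the observed one) is one common convention that SigmaPlot may or may not follow exactly.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell is x,
        # with the row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

print(round(fisher_exact_two_tailed(8, 2, 1, 5), 4))
```

Because every possible table is enumerated, the cost grows with the sample size, which is consistent with the test taking longer to compute than χ².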

Performing a Fisher Exact Test
To perform a Fisher Exact Test:
1. Enter or arrange your data in the data worksheet. For more information, see “Arranging Fisher Exact Test Data”.
2. From the menus select:
   Statistics
   Rates and Proportions
   Fisher Exact Test
   Run the test. For more information, see “Running a Fisher Exact Test” on page 277.
3. View and interpret the Fisher Exact Test report. For more information, see “Interpreting Results of a Fisher Exact Test” on page 279.

Arranging Fisher Exact Test Data
The data of a Fisher Exact Test must form a 2 x 2 contingency table, that is, exactly two rows by two columns. The data can be tabulated data in a 2 x 2 table entered in the worksheet or from two columns of raw data.

Figure 3-1 Valid Data Formats for a Fisher Exact Test

Columns 1 and 2 in the worksheet above are arranged as a 2 x 2 contingency table, and columns 3 and 4 are the raw observation data.

Tabulated Data. Tabulated or contingency table data uses the rows to represent the two groups, and the columns to represent the two categories, or vice versa. The number of individuals that fall into each combination of groups and categories is entered into each cell. There should be no more than two rows and two columns.

Raw Data. Raw data uses a row for each individual observation, and places the corresponding groups for the observations in one column and the categories in a second column. There should be no more than two different groups and two types of categories.

Running a Fisher Exact Test
To run a test, you need to select the data to test. Use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.

To run a Fisher Exact Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. From the menus select:
   Statistics
   Rates and Proportions
   Fisher Exact Test
   The Pick Columns dialog box appears prompting you to specify a data format.

Figure 2-1 The Pick Columns for Fisher Exact Test Dialog Box Prompting You to Specify a Data Format

3. Select the appropriate data format from the Data Format drop-down list. If you are testing contingency table data, select Tabulated. If your data is arranged in raw format, select Raw. For more information, see “Arranging Fisher Exact Test Data”.
4. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
5. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Observations or Category drop-down list. The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw data, you are prompted to select up to two worksheet columns. For tabulated data you are prompted to select up to 64 columns.
6. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

Figure 6-1 The Pick Columns for Fisher Exact Test Dialog Box Prompting You to Select Data Columns

7. Click Finish to run the test. The Fisher Exact Test is performed. If there are no cells in the table with expected values below 5, SigmaPlot suggests the χ² test instead. (You can use the Fisher Exact Test, but it takes longer to compute.)

Note: The Fisher Exact Test computes the exact two-tailed probabilities of observing a specific 2 x 2 contingency table, and does not require that the expected frequencies in all cells exceed 5.

8. When the test is complete, the Fisher Exact Test report appears (see Interpreting Results of a Fisher Exact Test).

Interpreting Results of a Fisher Exact Test
The Fisher Exact Test computes the two-tailed P value corresponding to the exact probability distribution of the table. For descriptions of the derivations for Fisher Exact Test results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the Up and Down buttons in the formatting toolbar to move one page up and down in the report.

Figure 8-1 A Fisher Exact Test Results Report

Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

P Value
The P value is the two-tailed probability of being wrong in concluding that there is a true difference in the distribution of the numbers of observations (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P value, the greater the probability that the samples are drawn from populations with different distributions among the two categories. Traditionally, you conclude that there are significant differences when P < 0.05.

Note: The Fisher Exact Test computes P directly using a two-tailed probability.

Contingency Table Summary
Each cell in the table is described with a set of statistics.

Observed Counts. These are the number of observations per cell, obtained from the contingency table data.

Row Percentage. The percentage of observations in each row of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that row.

Column Percentage. The percentage of observations in each column of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that column.

Total Cell Percentage. The percentage of total number of observations in the contingency table, obtained by dividing the observed frequency in the cells by the total number of observations in the table.

McNemar’s Test
Use McNemar’s Test when you are:
- Making observations on the same individuals.
- Counting the distributions in the same categories after two different treatments or changes in condition.

About McNemar’s Test
McNemar’s Test is an analysis of contingency tables that have repeated observations of the same individuals. These table designs are used when:
- Determining whether or not an individual responded to a treatment or change in condition, which uses observations before and after the treatment.
- Comparing the results of two different treatments or conditions that result in the same type of responses, for example, surveying the opinion (approve, disapprove, or don’t know) of the same people before and after a report.

McNemar’s Test is similar to a regular analysis of a contingency table. However, it ignores individuals who responded the same way to the same treatments, and

calculates the expected frequencies using the remaining cells as the average number of individuals who responded differently to the treatments.

Performing McNemar’s Test
To perform McNemar’s Test:
1. Enter or arrange your data appropriately in the data worksheet. For more information, see “Arranging McNemar Test Data” on page 282.
2. If desired, set the McNemar’s Test options.
3. Select McNemar Test from the Standard toolbar.
4. From the menus select:
   Statistics
   Run Current Test
5. Run the test.
6. View and interpret the McNemar Test report. For more information, see “Interpreting Results of McNemar’s Test” on page 286.

Arranging McNemar Test Data
The data for McNemar’s Test must form a contingency table that has exactly the same number of rows and columns. You can tabulate the data from a table that you enter in the worksheet or from two columns of raw data.

Figure 6-1 A 3 x 3 Contingency Table describing the effect of a report on the opinion of surveyed people.

Tabulated Data. For tabulated or contingency table data, the worksheet rows correspond to one set of treatment categories and the columns to the other set of treatment categories. The number of individuals that fall into each combination of the categories is entered into each cell. Because the same set of categories is used for the two different treatments, the number of rows and columns in the table are always the same. The categories assigned to the rows are assumed to be in the same order of occurrence as the columns.

Raw Data. Raw data uses a row for each individual observation, and places the corresponding groups for the first treatment category in one column and the second treatment category in a second column. There should be the same number of categories in each column.

Specify the data format to use when running a test in the Pick Columns dialog box.

Figure 6-2 Valid Data Formats for McNemar Test

Columns 1 through 3 in the worksheet above are arranged as a 3 x 3 contingency table, and columns 4 and 5 are raw observation data.

Setting McNemar’s Options
Use the McNemar Test options to enable the Yates Correction Factor.

To change McNemar Test options:
1. If you are going to run the test after changing test options and want to select your data before you run the test, drag the pointer over your data.
2. Select McNemar Test from the Standard toolbar drop-down list.
3. From the menus select:
   Statistics
   Current Test Options
   The Options for McNemar’s dialog box appears.

Figure 3-1 Options for McNemar’s dialog box

4. Select Yates Correction Factor to include the Yates Correction Factor in the test report. For more information, see “Options for McNemar’s” on page 285.

5. To continue the test, click Run Test.
6. To close the options dialog box and accept the current settings without continuing the test, click OK.

Options for McNemar’s
Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete.

You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative; that is, it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.

Running McNemar’s Test
To run the McNemar Test, you need to select the data to test. Use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.

To run McNemar’s Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select McNemar’s Test from the Standard toolbar drop-down list.
3. From the menus select:
   Statistics
   Rates and Proportions
   McNemar’s Test

The Pick Columns dialog box appears prompting you to specify a data format.

4. Select the appropriate data format from the Data Format drop-down list. If you are testing contingency table data, select Tabulated. If your data is arranged in raw format, select Raw. For more information, see “Arranging McNemar Test Data” on page 282.
5. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Observations or Category drop-down list. The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 worksheet columns.
7. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
8. Click Finish to run the test. The McNemar’s test report appears. For more information, see “Interpreting Results of McNemar’s Test” on page 286.

Interpreting Results of McNemar’s Test
The report for McNemar’s Test lists a summary of the contingency table data, the χ² statistic calculated from the distributions, and the P value. For descriptions of the derivations of McNemar’s Test results, you can reference any appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the Up and Down buttons in the formatting toolbar to move one page up and down in the report.

Figure 8-1 A McNemar Test Results Report

Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Chi-Square
χ² is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies, ignoring observations on the diagonal cells of the table where the individuals responded identically to the treatments.
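For the simplest case, a 2 x 2 table, the McNemar statistic can be sketched in a few lines of Python. This is an illustrative assumption rather than SigmaPlot's implementation: the function name is hypothetical, b and c stand for the two off-diagonal counts (individuals who responded differently to the two treatments), and the Yates correction is taken in its usual textbook form. The expected frequency for each off-diagonal cell is their average, as described above.

```python
import math

def mcnemar_2x2(b, c, yates=False):
    """McNemar chi-square and P value from the two off-diagonal counts b and c."""
    if b + c == 0:
        raise ValueError("no individuals responded differently to the treatments")
    expected = (b + c) / 2            # average of the two discordant cells
    d = abs(b - expected)             # identical to abs(c - expected)
    if yates:
        d = max(0.0, d - 0.5)         # continuity correction (assumed form)
    chi2 = 2 * d * d / expected       # both off-diagonal cells contribute equally
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 degree of freedom
    return chi2, p_value

chi2, p = mcnemar_2x2(15, 5)
print(round(chi2, 2), round(p, 4))
```

Without the correction this reduces to the familiar (b − c)² / (b + c); the diagonal cells never enter the calculation.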

Large values of the χ² test statistic indicate that individuals responded differently to the different treatments (for example, that there are differences between the expected and observed frequencies). Values of χ² near zero indicate that the pattern in the contingency table is no different from what one would expect if the counts were distributed at random.

P Value. The P value is the probability of being wrong in concluding that there is a true difference in the distribution of the numbers of observations (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on χ²). The smaller the P value, the greater the probability that the samples are drawn from populations with different distributions among the categories. Traditionally, you conclude that there are significant differences when P < 0.05.

Contingency Table Summary
Each cell in the table is described with a set of statistics for that cell.

Observed Counts. These are the number of observations per cell, obtained from the contingency table data.

Expected Frequencies. The expected frequencies for each cell in the contingency table, as predicted using the row and column percentages.

Relative Risk Test
Use the Relative Risk Test to determine if a treatment or risk factor has a significant effect on the occurrence of some event.

About the Relative Risk Test
A relative risk RR is the probability of the event in the treatment group divided by the probability of the event in the control group, where each probability is estimated as the relative frequency of the event in the group. It is usually computed for prospective studies in which the investigator selects two groups of subjects according to who did or did not receive the treatment. At the end of the study period, the number of subjects from each group who experienced the event is counted.
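The definition of relative risk translates directly into a short calculation. The Python sketch below is only an illustration with hypothetical names and made-up counts; it estimates each probability as the relative frequency of the event in its group.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """Relative risk: event rate in the treatment group over the control group."""
    risk_treated = events_treated / n_treated     # relative frequency, treatment
    risk_control = events_control / n_control     # relative frequency, control
    if risk_control == 0:
        raise ValueError("no events in the control group; RR is undefined")
    return risk_treated / risk_control

# Hypothetical study: 20 of 100 treated subjects and 10 of 100 controls
# experienced the event.
print(relative_risk(20, 100, 10, 100))
```

A value of 1 means the event is equally likely in both groups; values above 1 mean it is more likely in the treatment group.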

For more information, see “About the Odds Ratio Test” on page 295.

Performing the Relative Risk Test

To perform a Relative Risk Test:

1. Enter or arrange your data appropriately in the data worksheet. For more information, see “Arranging Relative Risk Test Data” on page 289.
2. If desired, set the Relative Risk options. For more information, see “Setting Relative Risk Test Options” on page 290.
3. Select Relative Risk from the Standard toolbar.
4. Run the test. From the menus select: Statistics > Run Current Test
5. Specify the data format to use in the test in the Pick Columns dialog box. For more information, see “Running the Relative Risk Test” on page 291.
6. View and interpret the Relative Risk Test report. For more information, see “Interpreting Results of the Relative Risk Test” on page 293.

Arranging Relative Risk Test Data

You can run a relative risk test using data from a contingency table entered in the worksheet or from two columns of raw data observations.

Tabulated Data. Tabulated data is arranged in a contingency table using the worksheet rows and columns as the groups and categories. The first column selected always represents the observations that experienced the event of interest.

Raw Data. The first column contains the two levels for the event (for example, event versus no event, or cases versus controls). The second column represents the two levels of treatment (treatment versus control, or risk versus no risk). The number of rows is the total number of observations in the study. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

Setting Relative Risk Test Options

Use the Relative Risk options to:

  Display the power of a performed test for Compare Proportion tests in the reports.
  Enable the Yates Correction Factor.
  Display the confidence interval for the data in Compare Proportion test reports.
  Use the first row of the selected data as the control group.

To change Relative Risk options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. From the menus select: Statistics > Current Test Options
   The Options for Relative Risk dialog box appears. For more information, see “Options for Relative Risk” on page 291.
3. Click a check box to enable or disable a test option. All options are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see “Running the Relative Risk Test” on page 291.
5. To accept the current settings and close the options dialog box, click OK.

Options for Relative Risk

Power. Select to detect the sensitivity of the test. The power or sensitivity of a test is the probability that the test will detect a difference between the proportions of two groups if there is really a difference.

Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. Change the alpha value by editing the value in the Alpha Value box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete. You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative; that is, it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.

Confidence Interval. This is the confidence interval for the difference of proportions. To change the specified interval, select the box and type any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Use the first row of the selected data as the control group. Select this option to treat the first row of the selected data as the control group.

Running the Relative Risk Test

To run the Relative Risk Test, you need to select the data to test. Use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.
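The effect of the Yates Correction Factor option can be seen on a small 2 x 2 table. The sketch below is an illustration only and is not SigmaPlot's code. It assumes the standard textbook form of the continuity correction, in which n/2 is subtracted from |ad - bc| before squaring. The function name chi_square_2x2 and the cell counts are hypothetical. The P value for one degree of freedom is obtained from the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic and P value (1 degree of freedom) for the
    2 x 2 table [[a, b], [c, d]]. Hypothetical helper for illustration."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink the difference, never below zero.
        diff = max(0.0, diff - n / 2.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows are treatment and control, columns are event and no event.
plain = chi_square_2x2(12, 8, 5, 15)
corrected = chi_square_2x2(12, 8, 5, 15, yates=True)
print("uncorrected: chi2 = %.3f, P = %.4f" % plain)
print("Yates:       chi2 = %.3f, P = %.4f" % corrected)
```

For these counts the corrected statistic is smaller and its P value larger, which is the more conservative behavior described under Yates Correction Factor.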

To run a Relative Risk Test:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select Relative Risk Test from the Standard toolbar drop-down list.
3. From the menus select: Statistics > Rates and Proportions > Relative Risk Test
   The Pick Columns dialog box appears prompting you to specify a data format.

Figure 3-1 The Pick Columns for Rates and Proportions Test Dialog Box Prompting You to Specify a Data Format

4. Select the appropriate data format from the Data Format drop-down list. If you are testing contingency table data, select Tabulated. If your data is arranged in raw format, select Raw. For more information, see “Arranging Relative Risk Test Data” on page 289.
5. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Observations or Category drop-down list.

   The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 worksheet columns.
7. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

Figure 7-1 The Pick Columns for Rates and Proportions Test Dialog Box Prompting You to Select Data Columns

8. Click Finish to run the test. The Rates and Proportions test report appears. For more information, see “Interpreting Results of the Relative Risk Test” on page 293.

Interpreting Results of the Relative Risk Test

Results Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Contingency Table Summary

Each cell in the table is described with a set of statistics.

Row Percentage. Expected Frequencies. obtained by dividing the observed frequency in the cells by the total number of observations in the table. The percentage of observations in each column of the contingency table. or committing a Type I error. These are the number of observations per cell. Chi-Square χ is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies.. obtained by dividing the observed frequency counts in the cells by the total number of observations in that column. obtained from the contingency table data. based on χ ). indicating that the rows and columns are independent).e. Yates Correction. The Yates correction is enabled in the Options for Chi-Square dialog box.294 Chapter 7 Observed Counts. 2 P Value. The Yates correction is used to adjust the χ and therefore the P 2 value for 2 x 2 tables to more accurately reflect the true distribution of χ . 2 If the value of χ is large. the probability of 2 falsely rejecting the null hypothesis. The percentage of observations in each row of the contingency table. Column Percentage. obtained by dividing the observed frequency counts in the cells by the total number of observations in that row. that there is a large differences between the expected and observed frequencies. you can conclude that the distributions are different (for example. or 2 This computation assumes that the rows and columns are independent. The . The P value is the probability of being wrong in concluding that there is a true difference in the distribution of the numbers of observations (i. The expected frequencies for each cell in the contingency table. 2 Values of χ near zero indicate that the pattern in the contingency table is no different from what one would expect if the counts were distributed at random. The percentage of total number of observations in the contingency table. Total Cell Percentage. as predicted using the row and columns percentages. 
and is only applied to 2 x 2 tables.

smaller the P value, the greater the probability that the samples are drawn from populations with different distributions among the categories. Traditionally, you conclude that there are significant differences when P < 0.05.

Power

The power, or sensitivity, of a Chi-Square test is the probability that the test will detect a difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. Chi-Square power is affected by the sample size and the observed proportions of the samples. This result is displayed if you selected this option in the Options for Chi-Square dialog box.

Alpha (α)

Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). The α value is set in the Power Option dialog box. The suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Odds Ratio Test

Use the Odds Ratio Test to determine if a treatment or risk factor has a significant effect on the occurrence of some event.

About the Odds Ratio Test

Unlike the Relative Risk test, the Odds Ratio test is done retrospectively. A study that uses the odds ratio can also be called a case-control study. It is usually computed for retrospective studies in which the investigator selects two groups of subjects according to who did or did not experience the event. The number of subjects from each group who were exposed to the risk factor is then counted.

You identify two groups of subjects, or Cases and Controls, that are sampled from the population and either did or did not experience an event. The Odds Ratio test determines how many from each

group were exposed to the risk factor. The odds of the event occurring among those individuals exposed to the risk factor is measured to give a value Odds1. The odds of the event occurring among the individuals not exposed to the risk factor is also measured to give a value Odds2. The odds ratio OR is the ratio of these values:

OR = Odds1 / Odds2

Performing the Odds Ratio Test

To perform an Odds Ratio Test:

1. Enter or arrange your data appropriately in the data worksheet. For more information, see “Arranging Odds Ratio Test Data” on page 296.
2. If desired, set the Odds Ratio Test options. For more information, see “Setting Odds Ratio Test Options” on page 297.
3. Select Odds Ratio Test from the Standard toolbar.
4. Run the test. From the menus select: Statistics > Run Current Test
5. Specify the data format to use in the test in the Pick Columns dialog box. For more information, see “Running the Odds Ratio Test” on page 298.
6. View and interpret the Odds Ratio Test report. For more information, see “Interpreting Results of the Odds Ratio Test” on page 300.

Arranging Odds Ratio Test Data

You can run an Odds Ratio test using data from a contingency table entered in the worksheet or from two columns of raw data observations.
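For tabulated 2 x 2 data, the odds ratio defined earlier in this section can be computed directly from the four cell counts. The following is a minimal sketch, not SigmaPlot's implementation. It assumes the usual definition in which Odds1 is the odds of the event among exposed subjects and Odds2 is the odds of the event among unexposed subjects. The function name odds_ratio and the counts are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from the four cells of a case-control table
    (hypothetical helper; cases experienced the event, controls did not)."""
    if exposed_controls == 0 or unexposed_controls == 0 or unexposed_cases == 0:
        raise ValueError("the odds ratio is undefined for these counts")
    # Odds1: odds of the event among subjects exposed to the risk factor.
    odds1 = exposed_cases / exposed_controls
    # Odds2: odds of the event among subjects not exposed to the risk factor.
    odds2 = unexposed_cases / unexposed_controls
    return odds1 / odds2


# Hypothetical case-control study: 40 of 100 cases and 20 of 100 controls
# were exposed to the risk factor.
ratio = odds_ratio(exposed_cases=40, exposed_controls=20,
                   unexposed_cases=60, unexposed_controls=80)
print(f"OR = {ratio:.2f}")
```

An odds ratio near 1 suggests that exposure is not associated with the event. Values well above or below 1 suggest an association, subject to the significance reported by the test.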

Tabulated Data. Tabulated data is arranged in a contingency table using the worksheet rows and columns as the groups and categories. The first column selected always represents the observations that experienced the event of interest.

Raw Data. The first column contains the two levels for the event (for example, event versus no event, or cases versus controls). The second column represents the two levels of treatment (treatment versus control, or risk versus no risk). The number of rows is the total number of observations in the study. For more information, see “Data Format for Rate and Proportion Tests” on page 253.

Setting Odds Ratio Test Options

Use the Odds Ratio options to:

  Display the power of a performed test for Compare Proportion tests in the reports.
  Enable the Yates Correction Factor.
  Display the confidence interval for the data in Compare Proportion test reports.
  Use the first row of the selected data as the control group.

To change Odds Ratio options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. From the menus select: Statistics > Current Test Options
   The Options for Odds Ratio dialog box appears. For more information, see “Options for Odds Ratio” on page 298.
3. Click a check box to enable or disable a test option. All options are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see “Running the Odds Ratio Test” on page 298.
5. To accept the current settings and close the options dialog box, click OK.

Options for Odds Ratio

Power. Select to detect the sensitivity of the test. The power or sensitivity of a test is the probability that the test will detect a difference between the proportions of two groups if there is really a difference.

Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. Change the alpha value by editing the value in the Alpha Value box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the χ² calculated tends to produce P values which are too small when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete. You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative; that is, it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom. For descriptions of the derivation of the Yates correction, you can reference any appropriate statistics reference.

Confidence Interval. This is the confidence interval for the difference of proportions. To change the specified interval, select the box and type any number from 1 to 99 (95 and 99 are the most commonly used intervals).

Use the first row of the selected data as the control group. Select this option to treat the first row of the selected data as the control group.

Running the Odds Ratio Test

To run the Odds Ratio Test, you need to select the data to test. Use the Pick Columns dialog box to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.

To run an Odds Ratio Test:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select Odds Ratio Test from the Standard toolbar drop-down list.
3. From the menus select: Statistics > Rates and Proportions > Odds Ratio Test
   The Pick Columns dialog box appears prompting you to specify a data format.
4. Select the appropriate data format from the Data Format drop-down list. If you are testing contingency table data, select Tabulated. If your data is arranged in raw format, select Raw. For more information, see “Arranging Odds Ratio Test Data” on page 296.
5. Click Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Observations or Category drop-down list.
   The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The title of selected columns appears in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 worksheet columns.
7. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
8. Click Finish to run the test. The Odds Ratio test report appears. For more information, see “Interpreting Results of the Odds Ratio Test” on page 300.

Interpreting Results of the Odds Ratio Test

Results Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Contingency Table Summary

Each cell in the table is described with a set of statistics.

Observed Counts. These are the number of observations per cell, obtained from the contingency table data.

Expected Frequencies. The expected frequencies for each cell in the contingency table, as predicted using the row and column percentages.

Row Percentage. The percentage of observations in each row of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that row.

Column Percentage. The percentage of observations in each column of the contingency table, obtained by dividing the observed frequency counts in the cells by the total number of observations in that column.

Total Cell Percentage. The percentage of the total number of observations in the contingency table, obtained by dividing the observed frequency in the cells by the total number of observations in the table.

Chi-Square

χ² is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies. This computation assumes that the rows and columns are independent.

If the value of χ² is large, you can conclude that the distributions are different (for example, that there is a large difference between the expected and observed frequencies, indicating that the rows and columns are not independent). Values of χ² near zero indicate that the pattern in the contingency table is no different from what one would expect if the counts were distributed at random.

Yates Correction. The Yates correction is used to adjust the χ², and therefore the P value, for 2 x 2 tables to more accurately reflect the true distribution of χ². The Yates correction is enabled in the Options for Chi-Square dialog box, and is only applied to 2 x 2 tables.

P Value. The P value is the probability of being wrong in concluding that there is a true difference in the distribution of the numbers of observations (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on χ²). The smaller the P value, the greater the probability that the samples are drawn from populations with different distributions among the categories. Traditionally, you conclude that there are significant differences when P < 0.05.

Power

The power, or sensitivity, of a Chi-Square test is the probability that the test will detect a difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. Chi-Square power is affected by the sample size and the observed proportions of the samples. This result is displayed if you selected this option in the Options for Chi-Square dialog box.

Alpha (α)

Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true). The α value is set in the Power Option dialog box. The suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).
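The contingency table statistics described in this chapter can be reproduced by hand for a small table. The sketch below is illustrative only and is not SigmaPlot's code. It computes expected frequencies from the row and column totals under the assumption of independence, then the χ² statistic in the standard Pearson form, in which each squared difference is divided by the expected frequency. That scaling is an assumption here, because the text does not spell it out. The function names and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected count for each cell, assuming rows and columns are independent:
    (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical observed counts: rows are groups, columns are categories.
observed = [[12, 8],
            [5, 15]]
print(expected_frequencies(observed))
print(round(pearson_chi_square(observed), 3))
```

The same functions work for tables larger than 2 x 2, as long as every row and column total is positive.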


Chapter 8

Prediction and Correlation

Prediction uses regression and correlation techniques to describe the relationship between two or more variables. For more information, see “Choosing the Prediction or Correlation Method” on page 39.

  About Regression. For more information, see “About Regression” on page 304.
  Correlation. For more information, see “Correlation” on page 305.
  Data Format for Regression and Correlation. For more information, see “Data Format for Regression and Correlation” on page 305.
  Simple Linear Regression. For more information, see “Simple Linear Regression” on page 306.
  Multiple Linear Regression. For more information, see “Multiple Linear Regression” on page 325.
  Multiple Logistic Regression. For more information, see “Multiple Logistic Regression” on page 348.
  Polynomial Regression. For more information, see “Polynomial Regression” on page 369.
  Stepwise Linear Regression. For more information, see “Stepwise Linear Regression” on page 387.
  Best Subsets Regression. For more information, see “Best Subsets Regression” on page 424.
  Pearson Product Moment Correlation. For more information, see “Pearson Product Moment Correlation” on page 433.
  Spearman Rank Order Correlation. For more information, see “Spearman Rank Order Correlation” on page 438.

About Regression

Regression procedures use the values of one or more independent variables to predict the value of a dependent variable. The independent variables are the known, or predictor, variables. When the independent variables are varied, they result in a corresponding value for the dependent, or response, variable.

You can perform regressions using seven different methods.

  Simple Linear Regression. For more information, see “Simple Linear Regression” on page 306.
  Multiple Linear Regression. For more information, see “Multiple Linear Regression” on page 325.
  Multiple Logistic Regression. For more information, see “Multiple Logistic Regression” on page 348.
  Polynomial Regression. For more information, see “Polynomial Regression” on page 369.
  Stepwise Regression, both forwards and backwards. For more information, see “Stepwise Linear Regression” on page 387.
  Best Subset Regression. For more information, see “Best Subsets Regression” on page 424.

Regression assumes an association between the independent and dependent variables that, when graphed on a Cartesian coordinate system, produces a straight line, plane, or curve. Regression finds the equation that most closely describes the actual data.

For example, Simple Linear Regression uses the equation for a straight line

y = b0 + b1x

where y is the dependent variable, x is the independent variable, b0 is the intercept, or constant term (the value of the dependent variable when x = 0, the point where the regression line intersects the Y axis), and b1 is the slope, or regression coefficient (increase in the value of Y per unit increase in X). As the values for X increase by 1, the corresponding values for Y either increase or decrease by b1, depending on the sign of b1.

Multiple Linear Regression is similar to simple linear regression, but uses multiple independent variables to fit the general equation for a multidimensional plane

y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk

where y is the dependent variable, x1, x2, x3, ..., xk are the k independent variables, and b0, b1, b2, ..., bk are the regression coefficients. As the values for xk increase by 1, the corresponding value for y either increases or decreases by bk, depending on the sign of

bk.

Regression is a parametric statistical method that assumes that the residuals (differences between the predicted and observed values of the dependent variables) are normally distributed with constant variance.

Because the regression coefficients are computed by minimizing the sum of squared residuals, this technique is often called least squares regression.

Correlation

Correlation procedures measure the strength of association between two variables, which can be used as a gauge of the certainty of prediction. Unlike regression, it is not necessary to define one variable as the independent variable and one as the dependent variable.

The correlation coefficient r is a number that varies between -1 and +1. A correlation of -1 indicates there is a perfect negative relationship between the two variables, with one always decreasing as the other increases. A correlation of +1 indicates there is a perfect positive relationship between the two variables, with both always increasing together. A correlation of 0 indicates no relationship between the two variables.

There are two types of correlation coefficients.

  The Pearson Product Moment Correlation, a parametric statistic which assumes a normal distribution and constant variance of the residuals. For more information, see “Pearson Product Moment Correlation” on page 433.
  The Spearman Rank Order Correlation, a nonparametric association test that does not require assuming normality or constant variance of the residuals. For more information, see “Spearman Rank Order Correlation” on page 438.

Data Format for Regression and Correlation

Data for all regression and correlation procedures consists of the dependent variables (usually the "y" data) in one column, and the independent variables (usually the "x" data) in one or more additional columns, one column for each independent variable. All the columns must be of equal length, including missing values, or you will receive an error message. Regression ignores rows containing missing data points within columns of data (indicated with a double dash "--").
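The column layout and missing-value rule described in this section can be mimicked outside the worksheet. The sketch below is a rough illustration, not SigmaPlot's behavior in detail. It assumes columns are Python lists in which "--" marks a missing value, drops every row that contains a missing value, and then computes the Pearson correlation coefficient r on the remaining rows using the standard product moment formula. The names MISSING, complete_rows, and pearson_r, and the data, are hypothetical.

```python
import math

MISSING = "--"  # marker used here for a missing worksheet value


def complete_rows(*columns):
    """Return the rows in which no column holds the missing-value marker."""
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all columns must be of equal length")
    return [tuple(float(v) for v in row)
            for row in zip(*columns) if MISSING not in row]


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical worksheet: dependent variable first, then one independent variable.
y_col = [2.0, 4.1, "--", 8.2, 9.9]
x_col = [1, 2, 3, "--", 5]
rows = complete_rows(y_col, x_col)
ys = [row[0] for row in rows]
xs = [row[1] for row in rows]
print(len(rows), "complete rows, r =", round(pearson_r(xs, ys), 4))
```

Two of the five rows are dropped because each contains a missing value in one of the columns, so the coefficient is computed from the three complete rows.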

If you plan to test blocks of data instead of picking columns, the columns must be adjacent, and the left-most column is assumed to be the dependent variable. See the Selecting Data Columns sections under each test for information on selecting blocks of data instead of entire columns.

Simple Linear Regression

Use Linear Regression when:

  You want to predict a trend in data, or predict the value of a variable from the value of another variable, by fitting a straight line through the data.
  You know there is exactly one independent variable.

The independent variable is the known, or predictor, variable, such as time or temperature. When the independent variable is varied, it produces a corresponding value for the dependent, or response, variable. If you know there is more than one independent variable, use multiple linear regression.

About the Simple Linear Regression

Linear Regression assumes an association between the independent and dependent variable that, when graphed on a Cartesian coordinate system, produces a straight line. Linear Regression finds the straight line that most closely describes, or predicts, the value of the dependent variable, given the observed value of the independent variable.

The equation used for a Simple Linear Regression is the equation for a straight line, or

y = b0 + b1x

where y is the dependent variable, x is the independent variable, b0 is the intercept, or constant term (value of the dependent variable when x = 0, the point where the regression line intersects the y axis), and b1 is the slope, or regression coefficient (increase in the value of y per unit increase in x). As the values for x increase by 1, the corresponding value for y either increases or decreases by b1, depending on the sign of b1.

Linear Regression is a parametric test; that is, for a given independent variable value, the possible values for the dependent variable are assumed to be normally distributed with constant variance around the regression line.
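The straight-line fit described above can be sketched in a few lines. This is a minimal least squares example under stated assumptions, not SigmaPlot's implementation. It uses the standard closed-form estimates, in which b1 is the sum of cross-deviations divided by the sum of squared x deviations and b0 is the mean of y minus b1 times the mean of x. The function names fit_line and residuals, and the data values, are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope, or regression coefficient
    b0 = mean_y - b1 * mean_x    # intercept, or constant term
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed minus predicted values of the dependent variable."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data: x is the independent variable, y the dependent variable.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
b0, b1 = fit_line(x, y)
print(f"y = {b0:.2f} + {b1:.2f}x")
print([round(r, 2) for r in residuals(x, y, b0, b1)])
```

The residuals returned here are the quantities that the normality and constant variance assumptions refer to.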

Performing a Linear Regression

To perform a Simple Linear Regression:

1. Enter or arrange your data in the worksheet. For more information, see “Arranging Linear Regression data” on page 307.
2. If desired, set the Linear Regression options. For more information, see “Setting Linear Regression Options” on page 307.
3. Select Linear Regression from the Standard toolbar or from the menus select: Statistics > Regression > Linear
4. Run the test. For more information, see “Running a Linear Regression” on page 315.
5. View and interpret the Linear Regression report. For more information, see “Interpreting Simple Linear Regression Results” on page 316.
6. Generate report graphs. For more information, see “Simple Linear Regression Report Graphs” on page 325.

Arranging Linear Regression data

Place the data for the observed dependent variable in one column and the data for the corresponding independent variable in a second column. Observations containing missing values are ignored, and both columns must be equal in length.

Setting Linear Regression Options

Use the Linear Regression options to:

  Set assumption checking options.
  Specify the residuals to display and save them to the worksheet.
  Display confidence intervals and save them to the worksheet.

  Display the PRESS Prediction Error and standardized regression coefficients.
  Specify tests to identify outlying or influential data points.
  Display power.

To change Linear Regression options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. Select Linear Regression from the drop-down list on the Standard toolbar.
3. From the menus select: Statistics > Current Test Options
   The Options for Linear Regression dialog box appears with four tabs:
   Assumption Checking. Click the Assumption Checking tab to return to the Normality, Constant Variance, and Durbin-Watson options. For more information, see “Options for Nonlinear Regression: Assumption Checking” on page 309.
   Residuals. Click the Residuals tab to view the residual options. For more information, see “Options for Nonlinear Regression: Residuals” on page 310.
   More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS Prediction Error, and Standardized Coefficients options. For more information, see “Options for Nonlinear Regression: More Statistics” on page 312.
   Other Diagnostics. Click the Other Diagnostics tab to view the Influence and Power options. For more information, see “Options for Nonlinear Regression: Other Diagnostics” on page 313.
   Options settings are saved between SigmaPlot sessions.
4. Select a check box to enable or disable a test option.
5. To continue the test, click Run Test. For more information, see “Interpreting Simple Linear Regression Results” on page 316.
6. To accept the current settings and close the options dialog box, click OK.

Options for Nonlinear Regression: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the Normality, Constant Variance, and Durbin-Watson options. These options test your data for its suitability for regression analysis by checking three assumptions that a linear regression makes about the data. A linear regression assumes:
That the source population is normally distributed about the regression.
That the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s).
That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable these options if you are certain that the data was sampled from normal populations with constant variance and that the residuals are independent of each other.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Constant Variance Testing. SigmaPlot tests for constant variance by computing the Spearman rank correlation between the absolute values of the residuals and the observed value of the dependent variable. When this correlation is significant, the constant variance assumption may be violated, and you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming one or more of the independent variables to stabilize the variance.

P Values for Normality and Constant Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.05. Larger values of P (for example, 0.10) require less evidence to conclude that the residuals are not normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.01 for the normality test requires greater deviations from normality to flag the data as non-normal than a value of 0.05.

Tip: Although the assumption tests are robust in detecting data from populations that are non-normal or with non-constant variances, there are extreme conditions of data distribution that these tests cannot detect. However, these conditions should be easily detected by visually examining the data without resorting to the automatic assumption tests.

Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals for their independence of each other. The Durbin-Watson statistic is a measure of serial correlation between the residuals. The residuals are often correlated when the independent variable is time, and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 more than the entered value, SigmaPlot warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable difference from 2.0.

To relax the requirement of independence, increase the acceptable difference from 2.0.

Options for Nonlinear Regression: Residuals

Select the Residuals tab in the options dialog box to view the Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.

Predicted Values. Use this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the worksheet. Click the selected check box if you do not want to include predicted values in the worksheet. To assign predicted values to a worksheet column, select the worksheet column you want to save the predicted values to from the corresponding drop-down list. If you select none and the Predicted Values check box is selected, the values appear in the report but are not assigned to the worksheet.

Raw Residuals. The raw residuals are the differences between the predicted and observed values of the dependent variables. To include raw residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include raw residuals in the worksheet. To assign the raw residuals to a worksheet column, select the number of the desired column from the corresponding drop-down list. If you select none from the drop-down list and the Raw check box is selected, the values appear in the report but are not assigned to the worksheet.

Standardized Residuals. The standardized residual is the residual divided by the standard error of the estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and is a measure of variability around the regression line. To include standardized residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include standardized residuals in the worksheet.

SigmaPlot automatically flags data points lying outside of the confidence interval specified in the corresponding box. These data points are considered to have "large" standardized residuals, i.e., outlying data points. You can change which data points are flagged by editing the value in the Flag Values > edit box. The suggested residual value is 2.5.

Studentized Residuals. Studentized residuals scale the standardized residuals by taking into account the greater precision of the regression line near the middle of the data versus the extremes. The Studentized residuals tend to be distributed according to the Student t distribution, so the t distribution can be used to define "large" values of the Studentized residuals. SigmaPlot automatically flags data points with "large" values of the Studentized residuals, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population. To include Studentized residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include Studentized residuals in the worksheet.

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residual, except that the residual values are obtained by computing the regression equation without using the data point in question.
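The four residual types described above can be sketched for a simple linear regression as follows. This is a minimal illustration using textbook formulas, not SigmaPlot's own implementation; the function names `fit_line` and `residual_table` are hypothetical, and the sign convention (observed minus predicted) is an assumption.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def residual_table(x, y):
    """Raw, standardized, Studentized and Studentized deleted residuals.

    Assumes at least 4 points and a fit that is not perfect.
    """
    n = len(x)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    raw = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e * e for e in raw)
    s = math.sqrt(ss_res / (n - 2))  # standard error of the estimate
    rows = []
    for xi, e in zip(x, raw):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this point
        # Standard error of the estimate with this point left out.
        s_del = math.sqrt((ss_res - e * e / (1.0 - h)) / (n - 3))
        rows.append({
            "raw": e,
            "standardized": e / s,
            "studentized": e / (s * math.sqrt(1.0 - h)),
            "studentized_deleted": e / (s_del * math.sqrt(1.0 - h)),
        })
    return rows

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [1.1, 1.9, 3.2, 3.8, 5.3, 9.0]
    for row in residual_table(xs, ys):
        print({k: round(v, 3) for k, v in row.items()})
```

The deleted residual uses a closed-form shortcut rather than refitting the line once per point; both approaches give the same value.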

To include Studentized deleted residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include Studentized deleted residuals in the worksheet.

SigmaPlot can automatically flag data points with "large" values of the Studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized Deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To include only the flagged standardized and Studentized deleted residuals in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all standardized and Studentized residuals in the report.

Options for Nonlinear Regression: More Statistics

Select the More Statistics tab in the Options for Nonlinear Regression dialog box to view the confidence interval options. You can set the confidence interval for the population, regression, or both and then save them to the worksheet.

Confidence Interval for the Population. The confidence interval for the population gives the range of values that define the region that contains the population from which the observations were drawn. To include confidence intervals for the population in the report, make sure the Population check box is selected. Click the selected check box if you do not want to include the confidence intervals for the population in the report.

Confidence Interval for the Regression. The confidence interval for the regression line gives the range of values that defines the region containing the true mean relationship between the dependent and independent variables, with the specified level of confidence. To include confidence intervals for the regression in the report, make sure the Regression check box is selected, then specify a confidence level by entering a value in the percentage box. The confidence level can be any value from 1 to 99. The suggested confidence level for all intervals is 95%. Click the selected check box if you do not want to include the confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the worksheet, select the column number of the first column you want to save the intervals to from the Starting in Column drop-down list. The selected intervals are saved to the worksheet starting with the specified column and continuing with successive columns in the worksheet.

PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the regression equation fits the data. Leave this check box selected to evaluate the fit of the equation using the PRESS statistic. Clear the selected check box if you do not want to include the PRESS statistic in the report.

Options for Nonlinear Regression: Other Diagnostics

Select the Other Diagnostics tab in the Options for Nonlinear Regression dialog box to view the Influence options.

Influence. Influence options automatically detect instances of influential data points. Most influential points are data points which are outliers, that is, they do not "line up" with the rest of the data points. These points can have a potentially disproportionately strong influence on the calculation of the regression line. You can use several influence tests to identify and quantify influential points.

DFFITS. DFFITSi is the number of estimated standard errors that the predicted value changes for the ith data point when it is removed from the data set. It is another measure of the influence of a data point on the prediction used to compute the regression coefficients. Predicted values that change by more than two standard errors when the data point is removed are considered to be influential.

Select DFFITS to compute this value for all points and flag influential points, i.e., those with DFFITS greater than the value specified in the Flag Values > edit box. The suggested value is 2.0 standard errors, which indicates that the point has a strong influence on the data. To flag only more influential points, increase this value; to flag less influential points, decrease this value.

Leverage. Leverage is used to identify the potential influence of a point on the results of the regression equation. Leverage depends only on the value of the independent variable(s). Observations with high leverage tend to be at the extremes of the independent variables, where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points. Observations with leverages much higher than the expected leverages are potentially influential points.

Select Leverage to compute the leverage for each point and automatically flag potentially influential points, i.e., those points that could have leverages greater than the specified value times the expected leverage. The suggested value is 2.0 times the expected leverage for the regression. To flag only points with more potential influence, increase this value; to flag points with less potential influence, lower this value.

Cook’s Distance. Cook’s Distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. Cook’s distance assesses how much the values of the regression coefficients change if a point is deleted from the analysis. Cook’s distance depends on both the values of the independent and dependent variables.

Select Cook’s Distance to compute this value for all points and flag influential points, i.e., those with a Cook’s distance greater than the specified value. The suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To flag only more influential points, increase this value; to flag less influential points, lower this value.

Report Flagged Values Only. To include only the influential points flagged by the influential point tests in the report, make sure you’ve selected Report Flagged Values Only. Clear this option to include all influential points in the report.

What to Do About Influential Points: Influential points have two possible causes:
There is something wrong with the data point, caused by an error in observation or data entry.
The model is incorrect.

If you made a mistake in data collection or entry, correct the value. If you do not know the correct value, you may be able to justify deleting the data point. If the model appears to be incorrect, try regression with different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference an appropriate statistics reference.
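The influence measures described above (leverage, Cook’s distance, and DFFITS) can be sketched for a simple linear regression as follows. This is an illustration using standard textbook formulas, not SigmaPlot's own code. The function name `influence_diagnostics` and its parameter names are hypothetical; the default thresholds mirror the suggested values in the text. Comparing the absolute value of DFFITS against the limit is an assumption.

```python
import math

def influence_diagnostics(x, y, leverage_factor=2.0, cooks_limit=4.0,
                          dffits_limit=2.0):
    """Leverage, Cook's distance and DFFITS for y = b0 + b1*x.

    Assumes at least 4 points and a fit that is not perfect.
    """
    n = len(x)
    p = 2  # number of fitted parameters: k + 1 with k = 1 predictor
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e * e for e in resid)
    s2 = ss_res / (n - p)
    expected_leverage = p / n  # (k + 1) / n
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx
        studentized = e / math.sqrt(s2 * (1.0 - h))
        # Residual variance estimated without this point.
        s2_del = (ss_res - e * e / (1.0 - h)) / (n - p - 1)
        deleted = e / math.sqrt(s2_del * (1.0 - h))
        cooks = studentized ** 2 * h / (p * (1.0 - h))
        dffits = deleted * math.sqrt(h / (1.0 - h))
        rows.append({
            "leverage": h,
            "cooks": cooks,
            "dffits": dffits,
            "flag_leverage": h > leverage_factor * expected_leverage,
            "flag_cooks": cooks > cooks_limit,
            "flag_dffits": abs(dffits) > dffits_limit,
        })
    return rows

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
    ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.2, 35.0]
    for row in influence_diagnostics(xs, ys):
        print(row)
```

In the sample data the last point lies far from the other x values, so it is the only one whose leverage exceeds 2.0 times the expected leverage.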

Power. Select Power to compute the power for the linear regression data. The power of a regression is the power to detect the observed relationship in the data. The alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.

Change the alpha value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.

Running a Linear Regression

To run a Simple Linear Regression, you need to select the data to test. You use the Pick Columns dialog box to select the worksheet columns with the data you want to test.

To run a Linear Regression:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select Linear Regression from the toolbar drop-down list.
3. From the menus select:
Statistics
Regression
Linear
The Pick Columns for Linear Regression dialog box appears. If you selected columns before you chose the test, the columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Dependent or Data for Independent drop-down list.
The first selected column is assigned to the dependent row in the Selected Columns list, and the second column is assigned to the independent row in the list. The titles of the selected columns appear in each row. You can only select one dependent and one independent data column.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance, and independent residuals, SigmaPlot performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail any of these tests, SigmaPlot warns you. When the test is complete, the Simple Linear Regression report appears.

If you selected to place predicted values and residuals in the worksheet, they are placed in the specified column and are labeled by content and source column.

Interpreting Simple Linear Regression Results

The report for a Linear Regression displays the equation with the computed coefficients for the line, R, and R2. The other results displayed in the report are enabled and disabled in the Options for Linear Regression dialog box. For more information, see “Setting Linear Regression Options” on page 307.

For descriptions of the computations of these results, you can reference an appropriate statistics reference.

Tip: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Regression Equation

This is the equation for a line with the values of the coefficients, the intercept (constant) and the slope, in place. This equation takes the form:

$y = b_0 + b_1 x$

where y is the dependent variable, x is the independent variable, $b_0$ is the constant, or intercept (value of the dependent variable when x = 0, the point where the regression line intersects the y axis), and $b_1$ is the slope (increase in the value of y per unit increase in x).

The number of observations N, and the number of observations containing missing values (if any) that were omitted from the regression, are also displayed.

R, R Squared, and Adj R Squared

R is the correlation coefficient and $R^2$ is the coefficient of determination. R equals 0 when the values of the independent variable do not allow any prediction of the dependent variables, and equals 1 when you can perfectly predict the dependent variable from the independent variable.

Adjusted R Squared. The adjusted $R^2$ takes into account the number of independent variables in the regression.

Standard Error of the Estimate

The standard error of the estimate, $s_{y|x}$, is a measure of the variability about the regression line.

Statistical Summary Table

Coefficients. The value for the constant (intercept) and coefficient of the independent variable (slope) for the regression model are listed.

Standard Error. The standard errors of the intercept and slope are measures of the precision of the estimates of the regression coefficients (analogous to the standard error of the mean). The true regression coefficients of the underlying population generally fall within about two standard errors of the observed sample coefficients. These values are used to compute t and confidence intervals for the regression.

t Statistic. The t statistic tests the null hypothesis that the coefficient of the independent variable is zero, that is, the independent variable does not contribute to predicting the dependent variable. t is the ratio of the regression coefficient to its standard error, or

$t = \dfrac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$

You can conclude from "large" t values that the independent variable can be used to predict the dependent variable (for example, that the coefficient is not zero).

P Value. P is the P value calculated for t. The P value is the probability of being wrong in concluding that there is a true association between the variables (for example, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the independent variable can be used to predict the dependent variable.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Beta (Standardized Coefficient β). This is the coefficient of the independent variable standardized to dimensionless values,

$\beta = b_1 \dfrac{s_x}{s_y}$

where $b_1$ = regression coefficient, $s_x$ = standard deviation of the independent variable x, and $s_y$ = standard deviation of dependent variable y.

This result is displayed unless the Standardized Coefficients option is disabled in the Options for Linear Regression dialog box.

Analysis of Variance (ANOVA) Table

The ANOVA (analysis of variance) table lists the ANOVA statistics for the regression and the corresponding F value.

DF (Degrees of Freedom). Degrees of freedom represent the number of observations and variables in the regression equation.

The regression degrees of freedom is a measure of the number of independent variables in the regression equation (always 1 for simple linear regression).

The residual degrees of freedom is a measure of the number of observations less the number of terms in the equation.

The total degrees of freedom is a measure of total observations.

SS (Sum of Squares). The sums of squares are measures of variability of the dependent variable.

The sum of squares due to regression ($SS_{reg}$) measures the difference of the regression line from the mean of the dependent variable.

The residual sum of squares ($SS_{res}$) is a measure of the size of the residuals, which are the differences between the observed values of the dependent variable and the values predicted by the regression model.

The total sum of squares ($SS_{tot}$) is a measure of the overall variability of the dependent variable about its mean value.

MS (Mean Square). The mean square provides two estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.

The mean square regression is a measure of the variation of the regression from the mean of the dependent variable, or

$MS_{reg} = \dfrac{SS_{reg}}{DF_{reg}}$

The residual mean square is a measure of the variation of the residuals about the regression line, or

$MS_{res} = \dfrac{SS_{res}}{DF_{res}}$

The residual mean square is also equal to $s_{y|x}^2$.

F Statistic. The F test statistic gauges the contribution of the independent variable in predicting the dependent variable. It is the ratio

$F = \dfrac{MS_{reg}}{MS_{res}}$

If F is a large number, you can conclude that the independent variable contributes to the prediction of the dependent variable (for example, the slope of the line is different from zero, and the "unexplained variability" is smaller than what is expected from random sampling variability). If the F ratio is around 1, you can conclude that there is no association between the variables (i.e., the data is consistent with the null hypothesis that all the samples are just randomly distributed about the population mean, regardless of the value of the independent variable).

P Value. The P value is the probability of being wrong in concluding that there is an association between the dependent and independent variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Note: In simple linear regression, the P value for the ANOVA is identical to the P value associated with the t of the slope coefficient, and $F = t^2$, where t is the t value associated with the slope.

PRESS Statistic

PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how well a regression model predicts new data. The smaller the PRESS statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the prediction errors (the differences between predicted and observed values) for each observation, with that point deleted from the computation of the regression equation.

Durbin-Watson Statistic

The Durbin-Watson statistic is a measure of correlation between the residuals. If the residuals are not correlated, the Durbin-Watson statistic will be 2; the more this value differs from 2, the greater the likelihood that the residuals are correlated. This result appears if it was selected in the Regression Options dialog box.

Regression assumes that the residuals are independent of each other; the Durbin-Watson test is used to check this assumption. If the Durbin-Watson value deviates from 2 by more than the value set in the Options for Linear Regression dialog box, a warning appears in the report. The suggested trigger value is a difference of more than 0.50 (for example, if the Durbin-Watson statistic is below 1.5 or over 2.5).
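The ANOVA quantities, the PRESS statistic, and the Durbin-Watson statistic described above can be sketched for a simple linear regression as follows. This is a minimal illustration with standard formulas, not SigmaPlot's implementation. The function name `regression_summary` and the `dw_tolerance` parameter are hypothetical; the default of 0.50 follows the suggested trigger value in the text. P values are omitted because they need an F or t distribution function.

```python
import math

def regression_summary(x, y, dw_tolerance=0.50):
    """ANOVA table entries, slope t, PRESS and Durbin-Watson for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]

    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    df_reg, df_res = 1, n - 2
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res  # equals the squared standard error of the estimate
    se_slope = math.sqrt(ms_res / sxx)

    # PRESS: each squared residual is inflated by its leverage, which is
    # equivalent to refitting the line with that point left out.
    press = 0.0
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx
        press += (e / (1.0 - h)) ** 2

    # Durbin-Watson: squared differences of successive residuals over SSres.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / ss_res

    return {
        "b0": b0, "b1": b1,
        "ss_reg": ss_reg, "ss_res": ss_res, "ss_tot": ss_tot,
        "ms_reg": ms_reg, "ms_res": ms_res,
        "F": ms_reg / ms_res,
        "t_slope": b1 / se_slope,
        "r_squared": ss_reg / ss_tot,
        "press": press,
        "durbin_watson": dw,
        "dw_warning": abs(dw - 2.0) > dw_tolerance,
    }

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.3, 3.9, 6.4, 7.7, 10.5, 11.6, 14.4, 15.8]
    for name, value in regression_summary(xs, ys).items():
        print(name, value)
```

The result also lets you check the identity noted above: the reported F equals the square of the slope's t value.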

Normality Test

Normality test result displays whether the data passed or failed the test of the assumption that the source population is normally distributed around the regression line, and the P value calculated by the test. All regressions assume a source population to be normally distributed about the regression line. When this assumption may be violated, a warning appears in the report. This result appears unless you disabled normality testing in the Options for Linear Regression dialog box.

Failure of the normality test can indicate the presence of outlying influential points or an incorrect regression model.

Constant Variance Test

The constant variance test result displays whether or not the data passed or failed the test of the assumption that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable, and the P value calculated by the test. When the constant variance assumption may be violated, a warning appears in the report.

If you receive this warning, you should consider trying a different model (for example, one that more closely follows the pattern of the data), or transforming the independent variable to stabilize the variance and obtain more accurate estimates of the parameters in the regression equation.

Power

This result is displayed if you selected this option in the options dialog box.

The power, or sensitivity, of a performed regression is the probability that the model correctly describes the relationship of the variables, if there is a relationship.

Regression power is affected by the number of observations, the chance of erroneously reporting a difference α (alpha), and the correlation coefficient r associated with the regression.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that the model is correct. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no association when this hypothesis is true).

Set the value in the Power Options dialog box; the suggested value is α = 0.05 which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding the model is correct, but a greater possibility of concluding the model is bad when it is really correct (a Type II error). Larger values of α make it easier to conclude that the model is correct, but also increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics

The regression diagnostic results display only the values for the predicted values, residual results, and other diagnostics selected in the Options for Regression dialog box. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals as outliers are set in the Options for Linear Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations that have one or more residuals flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Predicted Values. This is the value for the dependent variable predicted by the regression model for each observation.

Residuals. These are the raw residuals, the difference between the predicted and observed values for the dependent variables.

Standardized Residuals. The standardized residual is the raw residual divided by the standard error of the estimate $s_{y|x}$.

If the residuals are normally distributed about the regression line, about 68% of the standardized residuals have values between -1 and +1, and about 95% of the standardized residuals have values between -2 and +2. A larger standardized residual indicates that the point is far from the regression line; the suggested value flagged as an outlier is 2.5.

Studentized Residuals. The Studentized residual is a standardized residual that also takes into account the greater confidence of the predicted values of the dependent variable in the "middle" of the data set. By weighting the values of the residuals of the extreme data points (those with the lowest and highest independent variable values), the Studentized residual is more sensitive than the standardized residual in detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual because the standard error of the estimate is computed using all data.

Studentized Deleted Residuals. The Studentized deleted residual, or externally Studentized residual, is a Studentized residual which uses the standard error of the estimate $s_{y|x(-i)}$, computed with the data point in question removed. The Studentized deleted residual is more sensitive than the Studentized residual in detecting outliers, since the Studentized deleted residual results in much larger values for outliers than the Studentized residual.

Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

Influence Diagnostics

The influence diagnostic results display only the values for the results selected in the Options dialog box under the Other Diagnostics tab. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag data points as outliers are also set in the Options dialog box under the Other Diagnostics tab.

If you selected Report Cases with Outliers Only, only observations that have one or more results flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. It is a measure of how much the values of the regression equation would change if that point is deleted from the analysis. Values above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. Points with Cook’s distances greater than the specified value are flagged as influential; the suggested value is 4.

Leverage. Leverage values identify potentially influential points. Observations with leverages a specified factor greater than the expected leverages are flagged as potentially influential points; the suggested value is 2.0 times the expected leverage. The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points. Because leverage is calculated using only the independent variables, high leverage points tend to be at the extremes of the independent variables (large and small values), where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

DFFITS. The DFFITS statistic is a measure of the influence of a data point on regression prediction. It is the number of estimated standard errors the predicted value for a data point changes when the observed value is removed from the data set before computing the regression coefficients. Predicted values that change by more than the specified number of standard errors when the data point is removed are flagged as influential; the suggested value is 2.0 standard errors.

Confidence Intervals

These results are displayed if you selected them in the Regression Options dialog box. If the confidence interval does not include zero, you can conclude that the coefficient is different than zero with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is different than zero, and the confidence interval is 100(1 - α). The specified confidence level can be any value from 1 to 99; the suggested confidence level for both intervals is 95%.

Row. This is the row number of the observation.

Predicted. This is the value for the dependent variable predicted by the regression model for each observation.

Regression. The confidence interval for the regression line gives the range of variable values computed for the region containing the true relationship between the dependent and independent variables, for the specified level of confidence.

Population. The confidence interval for the population gives the range of variable values computed for the region containing the population from which the observations were drawn, for the specified level of confidence.
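The influence diagnostics described above can be illustrated with a short sketch. The formulas below are the usual textbook definitions of leverage, Cook’s distance, and DFFITS for a straight-line fit; they are an assumption for illustration and are not confirmed to match SigmaPlot’s internal calculations. The function name and the sample data are hypothetical.

```python
import math

def influence_diagnostics(x, y):
    """Leverage, Cook's distance and DFFITS for the fit y = b0 + b1*x.

    Textbook formulas, assumed for illustration only.
    """
    n = len(x)
    p = 2  # parameters estimated: constant and slope (k + 1 with k = 1)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)

    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx            # leverage
        cook = (e * e / (p * mse)) * h / (1.0 - h) ** 2    # Cook's distance
        # Residual variance re-estimated with this point left out
        mse_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t_deleted = e / math.sqrt(mse_deleted * (1.0 - h))  # Studentized deleted residual
        dffits = t_deleted * math.sqrt(h / (1.0 - h))
        rows.append({"leverage": h, "cook": cook, "dffits": dffits})
    return rows

# Hypothetical data: seven points near y = 2x + 1 and one extreme point at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 25.0]
rows = influence_diagnostics(x, y)

expected_leverage = (1 + 1) / len(x)  # (k + 1) / n with k = 1
for i, r in enumerate(rows):
    flags = []
    if r["leverage"] > 2.0 * expected_leverage:
        flags.append("leverage")
    if r["cook"] > 4.0:
        flags.append("Cook's distance")
    if abs(r["dffits"]) > 2.0:
        flags.append("DFFITS")
    print(i + 1, flags)
```

The thresholds used for flagging (2.0 times the expected leverage, a Cook’s distance of 4, and 2.0 standard errors for DFFITS) are the suggested values quoted in the text.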

Simple Linear Regression Report Graphs

You can generate up to five graphs using the results from a Simple Linear Regression. They include a:

- Histogram of the residuals.
- Scatter plot of the residuals.
- Bar chart of the standardized residuals.
- Normal probability plot of residuals.
- Line/scatter plot of the regression with confidence and prediction intervals.

For more information, see “Generating Report Graphs” on page 539.

Creating a Linear Regression Report Graph

To generate a graph of Linear Regression report data:

1. With the report in view, from the menus select:
   Graph > Create Result Graph
   The Create Result Graph dialog box appears displaying the types of graphs available for the Linear Regression results.
2. Select the type of graph you want to create from the Graph Type list, then click OK. The specified graph appears in a graph window or in the report.

Multiple Linear Regression

Use a Multiple Linear Regression when:

- You want to predict the value of one variable from the values of two or more other variables, by fitting a plane (or hyperplane) through the data, and
- You know there are two or more independent variables and want to find a model with these independent variables.

The independent variables are the known, or predictor, variables. When the independent variables are varied, they produce a corresponding value for the dependent, or response, variable.

If you know there is only one independent variable, use Simple Linear Regression. If you are not sure if all independent variables should be used in the model, use Stepwise or Best Subsets Regression to identify the important independent variables from the selected possible independent variables. If the relationship is not a straight line or plane, use Polynomial or Nonlinear Regression, or use a variable transformation.

About the Multiple Linear Regression

Multiple Linear Regression assumes an association between the dependent and k independent variables that fits the general equation for a multidimensional plane:

$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$

where y is the dependent variable, $x_1, x_2, x_3, \ldots, x_k$ are the k independent variables, $b_0$ is the constant, and $b_1, b_2, b_3, \ldots, b_k$ are the k coefficients.

As the values $x_i$ vary, the corresponding value for y either increases or decreases, depending on the sign of the associated regression coefficient $b_i$.

Multiple Linear Regression finds the plane in k+1 dimensions that most closely describes the actual data, using all the independent variables selected.

Multiple Linear Regression is a parametric test, that is, for a given set of independent variable values, the possible values for the dependent variable are assumed to be normally distributed and have constant variance about the regression plane.
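To make the plane-fitting idea concrete, here is a minimal sketch of a least-squares fit of the general equation above. It solves the normal equations with Gaussian elimination in plain Python; this is an illustrative assumption, not a description of how SigmaPlot computes the fit, and the function name and data are hypothetical. Production code would normally use a numerically more stable method such as a QR decomposition.

```python
def fit_multiple_linear(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    xs is a list of rows; each row holds the k independent values of one observation.
    """
    rows = [[1.0] + list(r) for r in xs]  # the leading 1 carries the constant b0
    m = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (a[i][m] - sum(a[i][j] * b[j] for j in range(i + 1, m))) / a[i][i]
    return b

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 with no noise.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
print(fit_multiple_linear(xs, y))
```

Because the sample data lie exactly on a plane, the recovered coefficients should match the constant and the two slopes used to generate them, up to floating-point rounding.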

Performing a Multiple Linear Regression

To perform a Multiple Linear Regression:

1. Enter or arrange your data appropriately in the worksheet.
2. If desired, set the Linear Regression options.
3. Select Multiple Linear Regression from the Standard toolbar.
4. From the menus select:
   Statistics > Regression > Multiple Linear
5. Run the test by selecting the worksheet columns with the data you want to test using the Pick Columns for Multiple Linear Regression dialog box.
6. View and interpret the Multiple Linear Regression report.
7. Generate report graphs.

Arranging Multiple Linear Regression Data

Place the data for the observed dependent variable in one column and the data for the corresponding independent variables in two or more columns.

Setting Multiple Linear Regression Options

Use the Multiple Linear Regression options to:

- Set assumption checking options.
- Specify the residuals to display and save them to the worksheet.
- Display confidence intervals and save them to the worksheet.
- Display the PRESS Prediction Error and standardized regression coefficients.
- Specify tests to identify outlying or influential data points.
- Set the variance inflation factor.
- Display power.

To change Multiple Linear Regression options:

1. If you are going to run the test after changing test options and want to select your data before you run the test, drag the pointer over the data.
2. Select Multiple Linear Regression from the drop-down list in the toolbar.
3. From the menus select:
   Statistics > Current Test Options
   The Options for Multiple Linear Regression dialog box appears with four tabs: Assumption Checking, Residuals, More Statistics, and Other Diagnostics.
   - Click the Assumption Checking tab to view the Normality, Constant Variance, and Durbin-Watson options. For more information, see “Options for Multiple Linear Regression: Assumption Checking” on page 328.
   - Click the Residuals tab to view the residual options. For more information, see “Options for Multiple Linear Regression: Residuals” on page 330.
   - Click the More Statistics tab to view the confidence intervals, PRESS Prediction Error, and Standardized Coefficients options. For more information, see “Options for Multiple Linear Regression: More Statistics” on page 332.
   - Click Other Diagnostics to view the Influence, Variance Inflation Factor, and Power options. For more information, see “Options for Multiple Linear Regression: Other Diagnostics” on page 333.
   Options settings are saved between SigmaPlot sessions.
4. Select or clear a check box to enable or disable a test option.
5. To continue the test, click Run Test. For more information, see “Interpreting Multiple Logistic Regression Results” on page 360.
6. To accept the current settings and close the options dialog box, click OK.

Options for Multiple Linear Regression: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the Normality, Constant Variance, and Durbin-Watson options. These options test your data for its suitability for regression analysis by checking three assumptions that a multiple linear regression makes about the data.

A Multiple Linear Regression assumes:

- That the source population is normally distributed about the regression.
- That the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s).
- That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable these options if you are certain that the data was sampled from normal populations with constant variance and that the residuals are independent of each other.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Constant Variance Testing. SigmaPlot tests for constant variance by computing the Spearman rank correlation between the absolute values of the residuals and the observed value of the dependent variable. When this correlation is significant, the constant variance assumption may be violated, and you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming one or more of the independent variables to stabilize the variance.

P Values for Normality and Constant Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.05. Larger values of P (for example, 0.10) require less evidence to conclude that the residuals are not normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.01 for the normality test requires greater deviations from normality to flag the data as non-normal than a value of 0.05.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with non-constant variances, there are extreme conditions of data distribution that these tests cannot detect. However, these conditions should be easily detected by visually examining the data without resorting to the automatic assumption tests.
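The constant variance check described above rests on a Spearman rank correlation between the absolute residuals and the observed dependent values. The sketch below computes only that correlation coefficient (not its P value), using average ranks for ties. It is a hedged illustration; the helper names are hypothetical and SigmaPlot’s exact procedure is not documented here.

```python
def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def constant_variance_correlation(residuals, observed_y):
    """Spearman correlation between |residual| and the observed dependent value."""
    return pearson(ranks([abs(e) for e in residuals]), ranks(observed_y))

# Hypothetical residuals whose spread grows with y, a pattern that suggests
# non-constant variance.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5]
observed_y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(constant_variance_correlation(residuals, observed_y))
```

A coefficient near zero is consistent with constant variance; a coefficient near 1 or -1 means the size of the residuals tracks the dependent variable. Deciding whether the correlation is significant requires the P value, which this sketch leaves out.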

Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals for their independence of each other. The Durbin-Watson statistic is a measure of serial correlation between the residuals. The residuals are often correlated when the independent variable is time, and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 more than the entered value, SigmaPlot warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable difference from 2.0. To relax the requirement of independence, increase the acceptable difference from 2.0.

Options for Multiple Linear Regression: Residuals

Click the Residuals tab in the options dialog box to view the Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.

Predicted Values. Use this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the data worksheet. To assign predicted values to a worksheet column, select the worksheet column you want to save the predicted values to from the corresponding drop-down list. If you select none and the Predicted Values check box is selected, the values appear in the report but are not assigned to the worksheet.

Raw Residuals. The raw residuals are the differences between the predicted and observed values of the dependent variables. To include raw residuals in the report, make sure this check box is selected. To assign the raw residuals to a worksheet column, select the number of the desired column from the corresponding drop-down list. If you select none from the drop-down list and the Raw check box is selected, the values appear in the report but are not assigned to the worksheet.
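As a rough illustration, the raw residuals described above and the Durbin-Watson check from the Assumption Checking options can be sketched together. The Durbin-Watson formula used is the standard one (sum of squared successive differences divided by the sum of squared residuals). The sign convention for residuals (observed minus predicted) and the function names are assumptions made for this example.

```python
def raw_residuals(observed, predicted):
    # Assumed sign convention: observed minus predicted.
    return [o - p for o, p in zip(observed, predicted)]

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def may_be_correlated(residuals, difference_from_2=0.50):
    """True when the statistic deviates from 2.0 by more than the allowed difference."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2

# Hypothetical observations in time order, with residuals that alternate in sign.
observed = [2.0, 1.0, 4.0, 3.0]
predicted = [1.0, 2.0, 3.0, 4.0]
residuals = raw_residuals(observed, predicted)
print(residuals, durbin_watson(residuals), may_be_correlated(residuals))
```

The default of 0.50 for `difference_from_2` mirrors the suggested deviation value; residuals that alternate in sign push the statistic above 2, while runs of same-signed residuals push it below 2.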

Standardized Residuals. The standardized residual is the residual divided by the standard error of the estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and is a measure of variability around the regression line. To include standardized residuals in the report, make sure this check box is selected.

SigmaPlot automatically flags data points lying outside of the confidence interval specified in the corresponding box. These data points are considered to have "large" standardized residuals, i.e., outlying data points. You can change which data points are flagged by editing the value in the Flag Values > edit box.

Studentized Residuals. Studentized residuals scale the standardized residuals by taking into account the greater precision of the regression line near the middle of the data versus the extremes. The Studentized residuals tend to be distributed according to the Student t distribution, so the t distribution can be used to define "large" values of the Studentized residuals. SigmaPlot automatically flags data points with "large" values of the Studentized residuals, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

To include Studentized residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include Studentized residuals in the worksheet.

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residual, except that the residual values are obtained by computing the regression equation without using the data point in question.

To include Studentized deleted residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include Studentized deleted residuals in the worksheet.

SigmaPlot can automatically flag data points with "large" values of the Studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To include only the flagged standardized and Studentized deleted residuals in the report, select Report Flagged Values Only. Clear this option to include all standardized and Studentized residuals in the report.

Options for Multiple Linear Regression: More Statistics

Click the More Statistics tab in the options dialog box to view the confidence interval options. You can set the confidence interval for the population, regression, or both and then save them to the data worksheet.

Confidence Interval for the Population. The confidence interval for the population gives the range of values that define the region that contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make sure the Population check box is selected. Click the selected check box if you do not want to include the confidence intervals for the population in the report.

Confidence Interval for the Regression. The confidence interval for the regression line gives the range of values that defines the region containing the true mean relationship between the dependent and independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make sure the Regression check box is selected, then specify a confidence level by entering a value in the percentage box. The confidence level can be any value from 1 to 99. The suggested confidence level for all intervals is 95%. Click the selected check box if you do not want to include the confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the worksheet, select the column number of the first column you want to save the intervals to from the Starting in Column drop-down list. The selected intervals are saved to the worksheet starting with the specified column and continuing with successive columns in the worksheet.

PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the regression equation fits the data. Leave this check box selected to evaluate the fit of the equation using the PRESS statistic. Click the selected check box if you do not want to include the PRESS statistic in the report.

Standardized Coefficients. These are the coefficients of the regression equation standardized to dimensionless values,

$\beta_i = b_i \dfrac{s_{x_i}}{s_y}$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and $s_y$ = standard deviation of dependent variable y.

To include the standardized coefficients in the report, select Standardized Coefficients. Clear this option if you do not want to include the standardized coefficients in the worksheet.

Options for Multiple Linear Regression: Other Diagnostics

Select the Other Diagnostics tab in the Options for Multiple Linear Regression dialog box to view the Influence options. Influence options automatically detect instances of influential data points. Most influential points are data points which are outliers, that is, they do not "line up" with the rest of the data points. These points can have a potentially disproportionately strong influence on the calculation of the regression line. You can use several influence tests to identify and quantify influential points. For more information, see “What to Do About Influential Points” below.

DFFITS. DFFITSi is the number of estimated standard errors that the predicted value changes for the ith data point when it is removed from the data set. It is another measure of the influence of a data point on the prediction used to compute the regression coefficients. Predicted values that change by more than two standard errors when the data point is removed are considered to be influential.

Select DFFITS to compute this value for all points and flag influential points, i.e., those with DFFITS greater than the value specified in the Flag Values > edit box. The suggested value is 2.0 standard errors, which indicates that the point has a strong influence on the data. To avoid flagging more influential points, increase this value; to flag less influential points, decrease this value.

Leverage. Select Leverage to identify the potential influence of a point on the results of the regression equation. Leverage depends only on the value of the independent variable(s). Observations with high leverage tend to be at the extremes of the independent variables, where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points. Observations with leverages much higher than the expected leverages are potentially influential points.

Select Leverage to compute the leverage for each point and automatically flag potentially influential points, i.e., those points that could have leverages greater than the specified value times the expected leverage. The suggested value is 2.0 times the expected leverage for the regression (i.e., $2(k+1)/n$). To avoid flagging more potentially influential points, increase this value; to flag points with less potential influence, lower this value.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. Cook’s distance assesses how much the values of the regression coefficients change if a point is deleted from the analysis. Cook’s distance depends on both the values of the independent and dependent variables.

Select Cook’s Distance to compute this value for all points and flag influential points, i.e., those with a Cook’s distance greater than the specified value. The suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To avoid flagging more influential points, increase this value; to flag less influential points, lower this value.

Report Flagged Values Only. To include only the influential points flagged by the influential point tests in the report, select Report Flagged Values Only. Clear this option to include all influential points in the report. For more information, see “What to Do About Influential Points” on page 336.

Power. Select Power to compute the power for the multiple linear regression data. The power of a regression is the power to detect the observed relationship in the data. The alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.

Change the alpha value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.

Variance Inflation Factor. Select Variance Inflation Factor to measure the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit.

Regression procedures assume that the independent variables are statistically independent of each other, i.e., that the value of one independent variable does not affect the value of another. However, this ideal situation rarely occurs in the real world. When the independent variables are correlated, or contain redundant information, the estimates of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique contribution of each independent variable to predicting the dependent variable. When the independent variables are correlated, they contain some common information and "contaminate" the estimates of the parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.

There are two types of multicollinearity:

Structural Multicollinearity. Structural multicollinearity occurs when the regression equation contains several independent variables which are functions of each other. The most common form of structural multicollinearity occurs when a polynomial regression equation contains several powers of the independent variable. Because these powers (e.g., $x$, $x^2$) are correlated with each other, structural multicollinearity occurs. Including interaction terms in a regression equation can also result in structural multicollinearity.

Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample observations are collected in such a way that the independent variables are correlated (for example, if age, height, and weight are collected on children of varying ages, each variable has a correlation with the others).

SigmaPlot can automatically detect multicollinear independent variables using the variance inflation factor.

Flagging Multicollinear Data. Use the value in the Flag Values > edit box as a threshold for multicollinear variables. The default threshold value is 4.0, meaning that any value greater than 4.0 will be flagged as multicollinear. To make this test more sensitive to possible multicollinearity, decrease this value. To allow greater correlation of the independent variables before flagging the data as multicollinear, increase this value.

When the variance inflation factor is large, there are redundant variables in the regression model, and the parameter estimates may not be reliable. Variance inflation factor values above 4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity. For more information, see “What to Do About Multicollinearity” on page 336.
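A small sketch can show why powers of the same variable trip the variance inflation threshold. It assumes the usual definition, VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on the others; with only two independent variables, R² is simply their squared correlation. The function names and data are hypothetical, and this is not presented as SigmaPlot’s own computation. The second half applies centering, the remedy the manual mentions under “What to Do About Multicollinearity”.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both variables when the model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

FLAG_THRESHOLD = 4.0  # suggested Flag Values > setting quoted in the text

# Structural multicollinearity: x and x squared in the same equation.
x = [float(v) for v in range(1, 11)]
raw_vif = vif_two_predictors(x, [v ** 2 for v in x])

# Centering x before forming the power term removes most of the correlation.
x_mean = sum(x) / len(x)
xc = [v - x_mean for v in x]
centered_vif = vif_two_predictors(xc, [v ** 2 for v in xc])

print(raw_vif, raw_vif > FLAG_THRESHOLD)
print(centered_vif, centered_vif > FLAG_THRESHOLD)
```

With these evenly spaced values the centered variable is symmetric about zero, so its correlation with its own square vanishes and the variance inflation factor drops to about 1. Real data are rarely this symmetric, so centering usually reduces the factor without eliminating the correlation entirely.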

Report Flagged Values Only. To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, select Report Flagged Values. Clear this option to include all influential points in the report.

What to Do About Influential Points

Influential points have two possible causes:

- There is something wrong with the data point, caused by an error in observation or data entry.
- The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If you do not know the correct value, you may be able to justify deleting the data point. If the model appears to be incorrect, try regression with different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference an appropriate statistics reference.

What to Do About Multicollinearity

Sample-based multicollinearity can sometimes be resolved by collecting more data under other conditions to break up the correlation among the independent variables. If this is not possible, the regression equation is over parameterized and one or more of the independent variables must be dropped to eliminate the multicollinearity.

You can resolve structural multicollinearities by centering the independent variable before forming the power or interaction terms.

For descriptions of how to handle multicollinearity, you can reference an appropriate statistics reference.

Running a Multiple Linear Regression

To run a Multiple Linear Regression, you need to select the data to test. The Pick Columns dialog box is used to select the worksheet columns with the data you want to test.

To run a Multiple Linear Regression:

1. If you want to select your data before you run the regression, drag the pointer over your data.
2. Select Multiple Linear Regression from the Standard toolbar drop-down list.
3. From the menus select:
   Statistics > Regression > Multiple Linear
   The Pick Columns for Multiple Linear Regression dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet or from the Data for Dependent or Independent drop-down list. The first selected column is assigned to the Dependent row in the Selected Columns list, and all successively selected columns are assigned to the Independent rows in the list. The title of each selected column appears in its row. You can select up to 64 independent columns.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to perform the regression. If you elected to test for normality, constant variance, and/or independent residuals, SigmaPlot performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fails any of these tests, SigmaPlot warns you.

When the test is complete, the report appears displaying the results of the Multiple Linear Regression. If you selected to place residuals and other test results in the worksheet, they are placed in the specified column and are labeled by content and source column.

Interpreting Multiple Linear Regression Results

The report for a Multiple Linear Regression displays the equation with the computed coefficients, R, R2, and the adjusted R2, a table of statistical values for the estimate of the dependent variable, and the P value for the regression equation and for the individual coefficients.

The other results displayed in the report are enabled or disabled in the Options for Multiple Linear Regression dialog box.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Regression Equation

This is the equation with the values of the coefficients in place. This equation takes the form:

$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$

where y is the dependent variable, $x_1, x_2, x_3, \ldots, x_k$ are the independent variables, and $b_0, b_1, b_2, \ldots, b_k$ are the regression coefficients.

The number of observations N, and the number of observations containing missing values (if any) that were omitted from the regression, are also displayed.

R, R Squared, and Adjusted R Squared

R and R2. R, the correlation coefficient, and R2, the coefficient of determination for multiple regression, are both measures of how well the regression model describes the data. R values near 1 indicate that the equation is a good description of the relation between the independent and dependent variables.

R equals 0 when the values of the independent variable do not allow any prediction of the dependent variables, and equals 1 when you can perfectly predict the dependent variables from the independent variables.

Adjusted R2. The adjusted R2, $R^2_{adj}$, is also a measure of how well the regression model describes the data, but takes into account the number of independent variables, which reflects the degrees of freedom. Larger $R^2_{adj}$ values (nearer to 1) indicate that the equation is a good description of the relation between the independent and dependent variables.

Standard Error of the Estimate ($s_{y|x}$)

The standard error of the estimate $s_{y|x}$ is a measure of the actual variability about the regression plane of the underlying population. The underlying population generally falls within about two standard errors of the estimate of the observed sample.

Statistical Summary Table

Coefficients. The value for the constant and coefficients of the independent variables for the regression model are listed.

Standard Error. The standard errors of the regression coefficients (analogous to the standard error of the mean). The true regression coefficients of the underlying population generally fall within about two standard errors of the observed sample coefficients. Large standard errors may indicate multicollinearity.

These values are used to compute t and confidence intervals for the regression.

Beta (Standardized Coefficient $\beta_i$). These are the coefficients of the regression equation standardized to dimensionless values,

$\beta_i = b_i \dfrac{s_{x_i}}{s_y}$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and $s_y$ = standard deviation of dependent variable y.

These results are displayed if the Standardized Coefficients option was selected in the Regression Options dialog box.
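The standardized coefficient formula above, and the adjustment that R2 receives for the number of independent variables, can be sketched in a few lines. The adjusted R2 expression used here, 1 - (1 - R²)(n - 1)/(n - k - 1), is the usual textbook definition and is an assumption, since the manual text does not print the formula. Function names and data are hypothetical.

```python
import statistics

def adjusted_r_squared(r_squared, n, k):
    """Adjust R squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

def standardized_coefficient(b_i, x_i, y):
    """beta_i = b_i * s_xi / s_y, using sample standard deviations."""
    return b_i * statistics.stdev(x_i) / statistics.stdev(y)

# Hypothetical example: R squared of 0.90 from 21 observations and 2 variables.
print(adjusted_r_squared(0.90, n=21, k=2))

# Hypothetical example: y = 2x exactly, so the raw coefficient is 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(standardized_coefficient(2.0, x, y))
```

Because the standardized coefficient is dimensionless, it allows coefficients of variables measured in different units to be compared on a common scale.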

t Statistic. The t statistic tests the null hypothesis that the coefficient of the independent variable is zero, that is, the independent variable does not contribute to predicting the dependent variable. t is the ratio of the regression coefficient to its standard error, or:

$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$

You can conclude from "large" t values that the independent variable can be used to predict the dependent variable (i.e., that the coefficient is not zero).

P value. P is the P value calculated for t. The P value is the probability of being wrong in concluding that there is a true association between the variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the variables are correlated.

Traditionally, you can conclude that the independent variable contributes to predicting the dependent variable when P < 0.05.

VIF (Variance Inflation Factor). The variance inflation factor is a measure of multicollinearity. It measures the "inflation" of the standard error of each regression parameter (coefficient) for an independent variable due to redundant information in other independent variables.

If the variance inflation factor is 1.0, there is no redundant information in the other independent variables. If the variance inflation factor is much larger, there are redundant variables in the regression model, and the parameter estimates may not be reliable.

Variance inflation factor values for independent variables above the specified value are flagged with a > symbol, indicating multicollinearity with other independent variables. The suggested value is 4.0.

Analysis of Variance (ANOVA) Table

The ANOVA (analysis of variance) table lists the ANOVA statistics for the regression and the corresponding F value.

SS (Sum of Squares). The sums of squares are measures of variability of the dependent variable.

The sum of squares due to regression measures the difference of the regression plane from the mean of the dependent variable.

The residual sum of squares is a measure of the size of the residuals, which are the differences between the observed values of the dependent variable and the values predicted by the regression model.

The total sum of squares is a measure of the overall variability of the dependent variable about its mean value.

DF (Degrees of Freedom). Degrees of freedom represent the number of observations and variables in the regression equation.

The regression degrees of freedom is a measure of the number of independent variables.

The residual degrees of freedom is a measure of the number of observations less the number of terms in the equation.

The total degrees of freedom is a measure of total observations.

MS (Mean Square). The mean square provides two estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.

The mean square regression is a measure of the variation of the regression from the mean of the dependent variable, or:

$$MS_{reg} = \frac{\text{sum of squares due to regression}}{\text{regression degrees of freedom}} = \frac{SS_{reg}}{DF_{reg}}$$

The residual mean square is a measure of the variation of the residuals about the regression plane, or:

$$MS_{res} = \frac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \frac{SS_{res}}{DF_{res}}$$

The residual mean square is also equal to $s_{y|x}^2$.

F Statistic. The F test statistic gauges the ability of the regression equation, containing all independent variables, to predict the dependent variable. It is the ratio

$$F = \frac{MS_{reg}}{MS_{res}}$$

If F is a large number, you can conclude that the independent variables contribute to the prediction of the dependent variable (i.e., at least one of the coefficients is different from zero, and the "unexplained variability" is smaller than what is expected from random sampling variability about the mean value of the dependent variable). If the F

ratio is around 1, you can conclude that there is no association between the variables (i.e., the data is consistent with the null hypothesis that all the samples are just randomly distributed).

P Value. The P value is the probability of being wrong in concluding that there is an association between the dependent and independent variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Incremental Sum of Squares

SSincr. SSincr, the incremental or Type I sum of squares, is a measure of the new predictive information contained in an independent variable, as it is added to the equation.

The incremental sum of squares measures the increase in the regression sum of squares (and reduction in the sum of squared residuals) obtained when that independent variable is added to the regression equation, after all independent variables above it have been entered. You can gauge the additional contribution of each independent variable by comparing these values.

SSmarg. SSmarg, the marginal or Type III sum of squares, is a measure of the unique predictive information contained in an independent variable, after taking into account all other independent variables.

The marginal sum of squares measures the reduction in the sum of squared residuals obtained by entering the independent variable last, after all other variables in the equation have been entered. You can gauge the independent contribution of each independent variable by comparing these values.

PRESS Statistic

PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how well a regression model predicts new data. The smaller the PRESS statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the prediction errors (the differences between predicted and observed values) for each observation, with that point deleted from the computation of the regression equation.
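The PRESS computation described above can be sketched directly as a leave-one-out loop. This is a minimal illustration, not SigmaPlot's implementation; the function name `press_statistic` and the sample data are hypothetical.

```python
import numpy as np

def press_statistic(X, y):
    """Sum of squared prediction errors, each from a fit that omits that observation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])             # design matrix with a constant column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                     # delete observation i from the fit
        b, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ b) ** 2)       # squared error predicting the deleted point
    return press

# Hypothetical data: one independent variable with a noisy linear response.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(20, 1))
y = 3.0 + 0.8 * X[:, 0] + rng.normal(scale=1.0, size=20)
press = press_statistic(X, y)
print(press)
```

PRESS is never smaller than the ordinary residual sum of squares, because each point is predicted by a fit that has not seen it. Comparing PRESS across candidate models is therefore a guard against overfitting.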

Durbin-Watson Statistic

The Durbin-Watson statistic is a measure of correlation between the residuals. If the residuals are not correlated, the Durbin-Watson statistic will be 2; the more this value differs from 2, the greater the likelihood that the residuals are correlated. This result appears if it was selected in the Regression Options dialog box.

Regression assumes that the residuals are independent of each other; the Durbin-Watson test is used to check this assumption. If the Durbin-Watson value deviates from 2 by more than the value set in the Regression Options dialog box, a warning appears in the report. The suggested trigger value is a difference of more than 0.50, i.e., the Durbin-Watson statistic is below 1.50 or above 2.50.

Normality Test

The normality test result displays whether the data passed or failed the test of the assumption that the source population is normally distributed around the regression, and the P value calculated by the test. All regressions require a source population to be normally distributed about the regression line. When this assumption may be violated, a warning appears in the report. This result appears unless you disabled normality testing in the Regression Options dialog box.

Failure of the normality test can indicate the presence of outlying influential points or an incorrect regression model.

Constant Variance Test

The constant variance test result displays whether or not the data passed or failed the test of the assumption that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable, and the P value calculated by the test. When the constant variance assumption may be violated, a warning appears in the report.

If you receive this warning, you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming the independent variable to stabilize the variance and obtain more accurate estimates of the parameters in the regression equation.
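As an illustration of the Durbin-Watson check described earlier in this section, the sketch below uses the standard definition of the statistic (sum of squared differences between successive residuals, divided by the sum of squared residuals) and the suggested trigger of 0.50. The function names are hypothetical, and the manual does not state which formula SigmaPlot uses, so treat the definition as an assumption.

```python
import numpy as np

def durbin_watson(residuals):
    """Standard Durbin-Watson statistic for residuals in observation order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def dw_warning(d, trigger=0.50):
    """True when d deviates from 2 by more than the trigger value."""
    return abs(d - 2.0) > trigger

d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # residuals flip sign every step
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # residuals never change
print(d_alternating, dw_warning(d_alternating))
print(d_constant, dw_warning(d_constant))
```

Values well below 2 point to positively correlated residuals (neighbors resemble each other), and values well above 2 point to negatively correlated residuals (neighbors alternate).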

Power

This result is displayed if you selected this option in the Options for Multiple Linear Regression dialog box.

The power, or sensitivity, of a regression is the probability that the regression model can detect the observed relationship among the variables, if there is a relationship in the underlying population.

Regression power is affected by the number of observations, the chance of erroneously reporting a difference α (alpha), and the slope of the regression.

Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that the model is correct. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no association when this hypothesis is true).

Set the value in the Power Options dialog box; the suggested value is α = 0.05 which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding the model is correct, but a greater possibility of concluding the model is bad when it is really correct (a Type II error). Larger values of α make it easier to conclude that the model is correct, but also increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics

The regression diagnostic results display only the values for the predicted values, residuals, and other diagnostic results selected in the Options for Multiple Linear Regression dialog box. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals as outliers are set in the Options for Multiple Linear Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations that have one or more residuals flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Predicted Values. This is the value for the dependent variable predicted by the regression model for each observation.

Residuals. These are the raw residuals, the difference between the predicted and observed values for the dependent variables.

Standardized Residuals. The standardized residual is the raw residual divided by the standard error of the estimate Sy|x.

If the residuals are normally distributed about the regression, about 68% of the standardized residuals have values between -1 and +1, and about 95% of the standardized residuals have values between -2 and +2. A larger standardized residual indicates that the point is far from the regression; the suggested value flagged as an outlier is 2.5.

Studentized Residuals. The Studentized residual is a standardized residual that also takes into account the greater confidence of the predicted values of the dependent variable in the "middle" of the data set. By weighting the values of the residuals of the extreme data points (those with the lowest and highest independent variable values), the Studentized residual is more sensitive than the standardized residual in detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual, because the standard error of the estimate is computed using all data.

Studentized Deleted Residual. The Studentized deleted residual, or externally Studentized residual, is a Studentized residual which uses the standard error of the estimate, computed after deleting the data point associated with the residual. This reflects the greater effect of outlying points by deleting the data point from the variance computation.

Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized residual in detecting outliers, since the Studentized deleted residual results in much larger values for outliers than the Studentized residual.

Influence Diagnostics

The influence diagnostic results display only the values for the results selected in the Options dialog box under the Other Diagnostics tab. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag data points as outliers are also set in the Options dialog box under the Other Diagnostics tab.

If you selected Report Cases with Outliers Only, only observations that have one or more results flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Cook's Distance. Cook's distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. It is a measure of how much the values of the regression coefficients would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. Points with Cook's distances greater than the specified value are flagged as influential; the suggested value is 4.

Leverage. Leverage values identify potentially influential points. Observations with leverages a specified factor greater than the expected leverages are flagged as potentially influential points; the suggested value is 2.0 times the expected leverage.

The expected leverage of a data point is

$$\frac{k + 1}{n}$$

where there are k independent variables and n data points.

Because leverage is calculated using only the independent variables, high leverage points tend to be at the extremes of the independent variables (large and small values), where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

DFFITS. The DFFITS statistic is a measure of the influence of a data point on regression prediction. It is the number of estimated standard errors the predicted value for a data point changes when the observed value is removed from the data set before computing the regression coefficients.

Predicted values that change by more than the specified number of standard errors when the data point is removed are flagged as influential; the suggested value is 2.0 standard errors.
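The residual and influence diagnostics described above can all be derived from the hat matrix of the fit. The sketch below uses the standard textbook formulas; the function name `influence_diagnostics`, the dictionary keys, and the sample data are hypothetical, and SigmaPlot's exact computations are an assumption here rather than something the manual confirms.

```python
import numpy as np

def influence_diagnostics(X, y):
    """Leverage, Studentized residuals, Cook's distance, and DFFITS for a linear fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    p = k + 1                                        # number of terms, including the constant
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)                        # hat matrix: predicted = H @ y
    h = np.diag(H)                                   # leverage of each observation
    e = y - H @ y                                    # raw residuals
    s2 = float(e @ e) / (n - p)                      # squared standard error of the estimate
    standardized = e / np.sqrt(s2)
    studentized = e / np.sqrt(s2 * (1.0 - h))        # internally Studentized
    s2_deleted = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    studentized_deleted = e / np.sqrt(s2_deleted * (1.0 - h))   # externally Studentized
    cooks = studentized ** 2 * h / (p * (1.0 - h))
    dffits = studentized_deleted * np.sqrt(h / (1.0 - h))
    return {"leverage": h, "expected_leverage": p / n,
            "standardized": standardized, "studentized": studentized,
            "studentized_deleted": studentized_deleted,
            "cooks": cooks, "dffits": dffits}

# Hypothetical data: ten evenly spaced points plus one extreme, off-trend point.
x = np.append(np.arange(10.0), 30.0)
y = 1.0 + 2.0 * x + 0.3 * np.sin(np.arange(11.0))
y[-1] += 15.0
diag = influence_diagnostics(x[:, None], y)

# Flag leverages more than 2.0 times the expected leverage (the suggested value).
flagged = diag["leverage"] > 2.0 * diag["expected_leverage"]
print(np.flatnonzero(flagged))
```

Note that the leverages depend only on the independent variable values, so the last point is flagged for leverage regardless of its observed response.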

Confidence Intervals

These results are displayed if you selected them in the Options for Multiple Linear Regression dialog box. If the confidence interval does not include zero, you can conclude that the coefficient is different than zero with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is different than zero, and the confidence interval is 100(1 - α)%.

The specified confidence level can be any value from 1 to 99; the suggested confidence level for both intervals is 95%.

Row. This is the row number of the observation.

Predicted. This is the value for the dependent variable predicted by the regression model for each observation.

Regression. The confidence interval for the regression gives the range of variable values computed for the region containing the true relationship between the dependent and independent variables, for the specified level of confidence.

Population. The confidence interval for the population gives the range of variable values computed for the region containing the population from which the observations were drawn, for the specified level of confidence.

Multiple Linear Regression Report Graphs

You can generate up to six graphs using the results from a Multiple Linear Regression. They include a:

Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.

Scatter plot of the residuals.

Bar chart of the standardized residuals. For more information, see “Bar Chart of the Standardized Residuals” on page 546.

Normal probability plot of the residuals. For more information, see “Normal Probability Plot” on page 549.

Line/scatter plot of the regression with one independent variable and confidence and prediction intervals. For more information, see “2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals” on page 550.

3D scatter plot of the residuals. For more information, see “3D Residual Scatter Plot” on page 551.

Creating Multiple Linear Regression Report Graphs

To generate a report graph of Multiple Linear Regression data:

1. With the Multiple Linear Regression report in view, from the menus select:
Graph > Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Multiple Linear Regression results.

2. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. For more information, see “Generating Report Graphs” on page 539.

3. If you select Scatter Plot Residuals, Bar Chart Std Residuals, or Regression, Conf. & Pred., a dialog box appears prompting you to select the column with independent variables you want to use in the graph.

4. If you select 3D Scatter & Mesh, or 3D Residual Scatter, and you have more than two columns of independent variables, a dialog box appears prompting you to select the two columns with the independent variables you want to plot.

5. Select the columns with the independent variables you want to use in the graph, then click OK. The graph appears using the specified independent variables.

Multiple Logistic Regression

Use a Multiple Logistic Regression when you want to predict a qualitative dependent variable, such as the presence or absence of a disease, from observations of one or more independent variables, by fitting a logistic function to the data.

The independent variables are the known, or predictor, variables. When the independent variables are varied, they produce a corresponding value for the dependent, or response, variable.

SigmaPlot’s Logistic Regression requires that the

dependent variable be dichotomous or take two possible responses (dead or alive, black or white) represented by values of 0 and 1.

If your dependent variable data does not use dichotomous values, use a Simple Linear Regression if you have one independent variable and a Multiple Linear Regression if you have more than one independent variable.

About the Multiple Logistic Regression

Multiple Logistic Regression assumes an association between the dependent and k independent variables that fits the logistic equation:

$$P(y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k)}}$$

where y is the dependent variable, P(y = 1) is the predicted probability that the dependent variable has a positive response or has a value of 1, b0 through bk are the regression coefficients, and x1 through xk are the independent variables.

As the values xi vary, the corresponding estimated probability that y = 1 increases or decreases, depending on the sign of the associated regression coefficient bi.

Multiple Logistic Regression finds the set of values of the regression coefficients most likely to predict the observed values of the dependent variable, given the observed values of the independent variables.

Performing a Multiple Logistic Regression

To perform a Multiple Logistic Regression:

1. Enter or arrange your data appropriately in the worksheet.

2. Set the Logistic Regression options.

3. Select Multiple Logistic Regression from the Standard toolbar.

4. From the menus select:
Statistics > Regression > Multiple Logistic

5. Run the test.

6. View and interpret the Multiple Logistic Regression report.

Arranging Multiple Logistic Regression Data

Logistic Regression data can be entered into the worksheet in raw or grouped data format. For both formats you must have one column of dependent variable data and one or more columns of independent variable data.

You must enter dependent variable data as dichotomous data, and independent variable data must be entered in numerical format. If you have continuous numerical data or text as your dependent variable data, or if you are using categorical independent variables, you must convert them into an equivalent set of dummy variables using reference coding.

Raw Data. To enter data in raw format, place the data for the observed dependent variable in one column and the data for the corresponding independent variables in one or more columns. Observations containing missing values are ignored, and all columns must be equal in length.

Grouped Data. The grouped data format enables you to specify the number of instances a combination of dependent and independent variables appears in a data set. This data format is useful if you have several instances of the same variable combination, and you don’t want to enter every instance in the worksheet.

To enter data in grouped format, place the data for the observed dependent variable in one column and the data for the corresponding independent variables in one or more columns, then specify the number of times the combination appears in the data set in the corresponding row of another worksheet column. Only enter one instance of each different combination of dependent and independent variables.

For example, if there are three instances of the dependent variable 0 with corresponding independent variables of 26 and 142, place 0 in the dependent variable column, 26 and 142 in the corresponding rows of the independent variable columns, and 3 in the corresponding row of the count worksheet column.

Setting Multiple Logistic Regression Options

Use the Multiple Logistic Regression options to:

Set options used to determine how well the logistic regression equation fits the data.

Estimate the variance inflation factors for the regression coefficients.

Specify the residuals to display and save them to the worksheet.

Calculate the standard error coefficient, Wald statistic, odds ratio, odds ratio confidence, and coefficients P value.

Specify tests to identify outlying or influential data points.

To change Multiple Logistic Regression options:

1. If you are going to run the test after changing test options and want to select your data before you run the test, drag the pointer over the data.

2. Select Multiple Logistic Regression from the Standard toolbar drop-down list.

3. From the menus select:
Statistics > Current Test Options
The Options for Multiple Logistic Regression dialog box appears with three tabs: Criterion, More Statistics, and Residuals.

Criterion. Click the Criterion tab to view the criterion options. For more information, see “Options for Multiple Logistic Regression: Criterion” on page 352.

More Statistics. Click the More Statistics tab to view the Standard Error Coefficients, Wald Statistic, Odds Ratio, Odds Ratio Confidence, Coefficients P Values, Predicted Values, and Variance Inflation Factor options. For more information, see “Options for Multiple Logistic Regression: Statistics” on page 353.

Residuals. Click the Residuals tab to view the residual and influence options. For more information, see “Options for Multiple Logistic Regression: Residuals” on page 356.

Option settings are saved between SigmaPlot sessions.

4. To continue the test, click Run Test. For more information, see “Running a Multiple Logistic Regression” on page 359.

5. To accept the current settings and close the options dialog box, click OK.

Options for Multiple Logistic Regression: Criterion

Select the Criterion tab in the Options for Multiple Logistic Regression dialog box to set the criterion options. Use these options to specify the criterion you want to use to test how well your data fits the logistic regression equation.

Hosmer-Lemeshow Test Statistic. The Hosmer-Lemeshow statistic tests the null hypothesis that the logistic equation fits the data by comparing the number of individuals with each outcome with the number expected based on the logistic equation. Small P values indicate that you can reject the null hypothesis that the logistic equation fits the data and should try an equation with different independent variables. Large P values indicate a good fit between the logistic equation and the data.

Threshold probability for goodness of fit. To change the P value, type a new value in the edit box. The default value is 0.2. Setting the P value to larger values requires smaller deviations between the values predicted by the logistic equation and the observed values of the dependent variable to accept the equation as a good fit to the data.

Pearson Chi-Square Statistic. The Pearson Chi-Square statistic tests how well the logistic regression equation fits your data by summing the squares of the Pearson residuals. Small values of the Pearson Chi-Square statistic indicate a good agreement between the logistic regression equation and the data. Large values of the Pearson Chi-Square indicate a poor agreement.

Likelihood Ratio Test Statistic. The Likelihood Ratio Test statistic tests how well the logistic regression equation fits your data by summing the squares of the deviance residuals. It compares your full model against a model that uses nothing but the mean of the dependent variable. Small P values indicate a good fit between the logistic regression equation and your data.

Classification Table. The classification table tests the null hypothesis that the data follow the logistic equation by comparing the number of individuals with each outcome with the number expected based on the logistic equation. It summarizes the results of whether the data fits the logistic equation by cross-classifying the actual dependent response variables with predicted responses and identifying the number of different combinations of the independent variables.

Threshold probability for positive classification. The predicted responses are assigned dichotomous variables derived by comparing estimated logistic probabilities to the probability value specified in the Threshold probability for positive classification edit box. If the estimated probability exceeds the specified probability value, the predicted variable is assigned a positive response (value of 1); probabilities less than or equal to the specified value are assigned a value of 0 or a reference value. The default threshold is 0.5.

The resulting contingency table can be analyzed with a Chi-Square test. As with the Hosmer-Lemeshow statistic, a large P value indicates a good fit between the logistic regression equation and the data.

Number of Independent Variable Combinations. If the number of unique combinations of the independent variables is not large compared to the number of independent variables, your logistic regression results may be unreliable.

To calculate the number of independent variable combinations and warn if there are not enough combinations as compared to the independent variables, select the Number of Independent Variable Combinations check box. If the calculated number of combinations is less than the value in the corresponding edit box, a dialog box appears warning you that the number of independent variable combinations is too small, and asks if you want to continue. If you select Yes, the warning message appears in the report. For more information, see “Interpreting Multiple Logistic Regression Results” on page 360.

Options for Multiple Logistic Regression: Statistics

Select the More Statistics tab in the Options dialog box to view the statistics options. These options help determine how well your data fits the logistic regression equation using maximum likelihood as the estimation criterion.

Standard Error Coefficients. The Standard Error Coefficients are measures of the precision of the estimates of the regression coefficients. The true regression coefficients of the underlying population generally fall within two standard errors of the observed sample coefficients.

Wald Statistic. The Wald Statistic compares the observed value of the estimated coefficient with its associated standard error. It is computed as the ratio:

where z is the Wald Statistic, bi is the observed value of the estimated coefficient, and sbi is the standard error of the coefficient.

The Wald statistic can also be used to determine how significant the independent variables are in predicting the dependent variable.

Use the Wald Statistic to test whether the coefficients associated with the independent variables are significantly different from zero. Select Wald Statistic to include the ratio of the observed coefficient with the associated standard error in the report.

Odds Ratio. The odds of any event occurring can be defined by

$$\text{Odds} = \Omega = \frac{P}{1 - P}$$

where P is the probability of the event happening. The odds ratio for an independent variable is computed as

$$e^{b_i}$$

where bi is the regression coefficient. The odds ratio is an estimate of the increase (or decrease) in the odds for an outcome if the independent variable value is increased by 1.

Odds Ratio Confidence. The odds ratio confidence intervals are defined as

$$e^{\left( b_i \pm Z_{1 - \frac{\alpha}{2}} \, s_{b_i} \right)}$$

where bi is the coefficient, sbi is the standard error of the coefficient, and Z(1 - α/2) is the point on the axis of the standard normal distribution that corresponds to the desired confidence interval.

The default confidence used is 95%. To change the confidence used, change the percentage in the corresponding edit box.

Coefficients P Value. The Coefficients P Value determines the probability of being incorrect in concluding that each independent variable has a significant effect on determining the dependent variable. The smaller the P value, the more likely the independent variables actually predict the dependent variables.

The significance of independent variables is tested by comparing the observed value of the coefficients

with the associated standard error of the coefficient. If the observed value of the coefficient is large compared to the standard error, you can conclude that the coefficients are significantly different from zero and that the independent variables contribute significantly to predicting the dependent variables.

For more information on computing the Wald statistic and on including it in your report, see “Interpreting Multiple Logistic Regression Results” on page 360.

Predicted Values. Use this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the data worksheet. For logistic regression, the predicted values indicate the probability of a positive response.

To assign predicted values to a worksheet column, select the worksheet column you want to save the predicted values to from the corresponding drop-down list. If you select none and the Predicted Values check box is selected, the values appear in the report but are not assigned to the worksheet. For more information, see “Interpreting Multiple Logistic Regression Results” on page 360.

Variance Inflation Factor. Use this option to measure the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit.

Regression procedures assume that the independent variables are statistically independent of each other, i.e., that the value of one independent variable does not affect the value of another. However, this ideal situation rarely occurs in the real world. When the independent variables are correlated, or contain redundant information, the estimates of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique contribution of each independent variable to predicting the dependent variable. When the independent variables are correlated, they contain some common information and "contaminate" the estimates of the parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.

There are two types of multicollinearity.

Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample observations are collected in such a way that the independent variables are correlated (for example, if age, height, and weight are collected on children of varying ages, each variable has a correlation with the others). This is the most common form of multicollinearity.

Structural Multicollinearity. Structural multicollinearity occurs when the regression equation contains several independent variables which are functions of each other. An example of this is when a regression equation contains several

x. To allow greater correlation of the independent variables before flagging the data as multicollinear. increase this value.g. Use the value in the Flag Values > edit box as a threshold for multicollinear variables. Variance inflation factor values above 4 suggest possible multicollinearity. and Report Flagged Values Only options. meaning that any value greater than 4. Studentized. you can reference an appropriate statistics reference. values above 10 indicate serious multicollinearity. Standardized. Options for Multiple Logistic Regression: Residuals Select the Residuals tab in the options dialog box to view the Residual Type. Clear this option to include all influential points in the report. If this is not possible. For more information. Studentized Deleted. To make this test more sensitive to possible multicollinearity. structural multicollinearity occurs. decrease this value. x2 ) are correlated with each other. . Because these powers (e.0. Report Flagged Values Only. Raw.0 will be flagged as multicollinear.356 Chapter 8 powers of the independent variable. You can resolve structural multicollinearities by centering the independent variable before forming the power or interaction terms. Including interaction terms in a regression equation can also result in structural multicollinearity. Flag values >. and the parameter estimates may not be reliable. For descriptions of how to handle multicollinearity. The default threshold value is 4. there are redundant variables in the regression model.. What to Do About Multicollinearity You can sometimes resolve sample-based multicollinearity by collecting more data under other conditions to break up the correlation among the independent variables. the regression equation is over parameterized and one or more of the independent variables must be dropped to eliminate the multicollinearity. select Report Flagged Values Only. When the variance inflation factor is large. 
To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report. see“What to Do About Multicollinearity” on page 356.
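The variance inflation factor and the centering remedy discussed under multicollinearity can be illustrated with a minimal sketch. It assumes a model with exactly two independent variables, in which case the variance inflation factor of either one reduces to 1 / (1 - r^2), where r is their correlation. The function names and the data are hypothetical and are not part of SigmaPlot.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical data: one independent variable and its square (structural case).
x = [float(v) for v in range(1, 11)]
vif_raw = vif_two_predictors(x, [v ** 2 for v in x])

# Center x before forming the power term, as the text suggests.
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]
vif_centered = vif_two_predictors(xc, [v ** 2 for v in xc])

FLAG_THRESHOLD = 4.0  # default "Flag Values >" setting quoted in the text
print(f"raw VIF = {vif_raw:.2f}, flagged: {vif_raw > FLAG_THRESHOLD}")
print(f"centered VIF = {vif_centered:.2f}, flagged: {vif_centered > FLAG_THRESHOLD}")
```

With more than two independent variables, each variance inflation factor would instead come from regressing that variable on all of the others; the two-variable shortcut is used here only to keep the sketch short.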

Residual Type. Residuals are not reported by default. To include residuals in the report, select either Pearson or Deviance from the Residual Type drop-down list. Select None from the drop-down list if you don’t want to include residuals in the report.

Deviance residuals are used to calculate the likelihood ratio test statistic to assess the overall goodness of fit of the logistic regression equation to the data. The likelihood ratio test statistic is the sum of squared deviance residuals. The deviance residual for each point is a measure of how much that point contributes to the likelihood ratio test statistic. Larger values of the deviance residual indicate a larger difference between the observed and predicted values of the dependent variable.

Pearson residuals are calculated by dividing the raw residual by the standard error. The standard error is defined as the observed value of the dependent variable (0 or 1) divided by the probability of a positive response (i.e., y = 1) outcome that is estimated from the Logistic Regression equation. Pearson residuals are the default residual type used to calculate the goodness of fit for the logistic regression equation because the Chi-Square goodness of fit statistic is the sum of squared Pearson residuals.

Raw Residuals. The raw residuals are the differences between the predicted and observed values of the dependent variables. To include raw residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include raw residuals in the worksheet. To assign the raw residuals to a worksheet column, select the number of the desired column from the corresponding drop-down list. If you select none from the drop-down list and the Raw check box is selected, the values appear in the report but are not assigned to the worksheet.

Studentized Residuals. Studentized residuals take into account the greater precision of the regression estimates near the middle of the data versus the extremes. The Studentized residuals tend to be distributed according to the Student t distribution, so the t distribution can be used to define "large" values of the Studentized residuals. SigmaPlot automatically flags data points with "large" values of the Studentized residuals, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population. To include studentized residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include studentized residuals in the worksheet.

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residual, except that the residual values are obtained by computing the regression equation without using the data point in question.
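The relationship between the two residual types and their goodness of fit statistics can be sketched as follows. This is an illustration under an assumption, not SigmaPlot's implementation: it uses the common textbook forms, with the Pearson residual scaled by the binomial standard error sqrt(p(1 - p)), which may not match the wording above exactly. The data and function names are hypothetical.

```python
import math

def pearson_residual(y, p):
    """Raw residual divided by the binomial standard error (assumed form)."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_residual(y, p):
    """Signed square root of the point's contribution to -2 log likelihood."""
    contribution = -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return math.copysign(math.sqrt(contribution), y - p)

# Hypothetical observed responses (0 or 1) and fitted probabilities.
observed = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.3, 0.5]

pearson = [pearson_residual(y, p) for y, p in zip(observed, predicted)]
deviance = [deviance_residual(y, p) for y, p in zip(observed, predicted)]

# Sum of squared Pearson residuals: the Pearson Chi-Square statistic.
pearson_chi_square = sum(r * r for r in pearson)
# Sum of squared deviance residuals: the -2 log likelihood.
neg2_log_likelihood = sum(d * d for d in deviance)

print(pearson_chi_square, neg2_log_likelihood)
```

A point that is predicted poorly, such as the fourth case here (observed 1, predicted 0.3), has the largest residual of either type.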

To include Studentized deleted residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include studentized deleted residuals in the worksheet.

SigmaPlot can automatically flag data points with "large" values of the studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To only include the flagged standardized and studentized deleted residuals in the report, select Report Flagged Values Only. Clear this option to include all standardized and studentized residuals in the report.

Influence

Influence options automatically detect instances of influential data points. Most influential points are data points which are outliers, that is, they do not "line up" with the rest of the data points. These points can have a potentially disproportionately strong influence on the calculation of the regression line. You can use several influence tests to identify and quantify influential points.

Leverage. Leverage is used to identify the potential influence of a point on the results of the regression equation. Leverage depends only on the value of the independent variable(s). Observations with high leverage tend to be at the extremes of the independent variables, where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points. Observations with leverages much higher than the expected leverages are potentially influential points.

Select Leverage to compute the leverage for each point and automatically flag potentially influential points, i.e., those points that could have leverages greater than the specified value times the expected leverage. The suggested value is 2.0 times the expected leverage for the regression (i.e., $2(k+1)/n$). To avoid flagging more potentially influential points, increase this value; to flag points with less potential influence, lower this value.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. Cook’s distance assesses how much the values of the regression coefficients change if a point is deleted from the analysis. Cook’s distance depends on both the values of the independent and dependent variables.

Select Cook’s Distance to compute this value for all points and flag influential points, i.e., those with a Cook’s distance greater than the specified value. The suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To avoid flagging more influential points, increase this value; to flag less influential points, lower this value.

For more information, see “What to Do About Influential Points” on page 359.

What to Do About Influential Points

Influential points have two possible causes:

There is something wrong with the data point, caused by an error in observation or data entry.
The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If you do not know the correct value, you may be able to justify deleting the data point. If the model appears to be incorrect, try regression with different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference an appropriate statistics reference.

Running a Multiple Logistic Regression

To run a Multiple Logistic Regression, you need to select the data to test. Use the Pick Columns for Multiple Logistic Regression dialog box to select the worksheet columns with the data you want to test.

To run a Multiple Logistic Regression:

1. If you want to select your data before you run the regression, drag the pointer over your data.
2. Select Multiple Logistic Regression from the drop-down list on the Standard toolbar.
3. From the menus select: Statistics, Regression, Multiple Logistic. The Pick Columns for Multiple Logistic Regression dialog box appears. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Dependent, Independent, or Count drop-down list. Select the worksheet column with the dependent variable data as the Dependent column, and select the column indicating the number of times a dependent and independent combination repeats as the Count column. The title of selected columns appears in each row.
5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance, and/or independent residuals, SigmaPlot performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fails any of these tests, SigmaPlot warns you. When the test is complete, the report appears displaying the results of the Multiple Logistic Regression. If you selected to place residuals and other test results in the worksheet, they are placed in the specified column and are labeled by content and source column.

Interpreting Multiple Logistic Regression Results

The report for a Multiple Logistic Regression displays the logistic equation with the computed coefficients, their standard errors, the number of observations in the test, the estimation criterion used to fit the logistic equation to your data, the values representing the positive and reference responses, and the Hosmer-Lemeshow and Chi-Square goodness of fit statistics. The other results displayed in the report are enabled or disabled in the Options for Multiple Logistic Regression dialog box.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Regression Equation

The logistic regression equation is:

$$P = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k)}}$$

where P is the probability of a “positive” response (i.e., value of the dependent variable equal to 1), $x_1, x_2, x_3, \ldots, x_k$ are the independent variables, and $b_0, b_1, b_2, b_3, \ldots, b_k$ are the regression coefficients.

The equation can be rewritten by applying the logit transformation to both sides of this equation:

$$\operatorname{Logit} P = \ln\left(\frac{P}{1 - P}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$$

Number of Observations

The number of observations N, and the number of observations containing missing values (if any) that were omitted from the regression, are also displayed.

Estimation Criterion

Logistic regression uses the maximum likelihood approach to find the values of the coefficients in the Logistic Regression Equation that were most likely to fit the observed data.

Note: The regression coefficients computed by minimizing the sum of squared residuals in Multiple Linear Regression are also the maximum likelihood estimates.

Dependent Variable

This section of the report indicates which values in the dependent variable column represent the positive response (1) and which value represents the reference response (0).

Number of Unique Independent Variable Combinations

This value represents the number of unique combinations of the independent variables and appears if you have the Number of Independent Variable Combinations option in the Options for Logistic Regression dialog box selected.

The number of unique independent variable combinations is compared to the actual number of independent variables. If this value is less than the value specified for the Number of Independent Variable Combinations option, a warning message appears in the report that your results may be unreliable.

All of the P values are based on a chi-square probability distribution, which is not recommended for use with small numbers of observations. When the dataset is small, goodness of fit measures for the logistic regression should be interpreted with great caution.

Hosmer-Lemeshow P Value

The Hosmer-Lemeshow P value indicates how well the logistic regression equation fits your data by comparing the number of individuals with each outcome with the number expected based on the logistic equation. It tests the null hypothesis that the logistic equation describes the data. Thus, small P values indicate a poor fit of the equation to your data (i.e., you reject the null hypothesis of agreement). Large P values indicate a good fit between the logistic equation and the data.

The critical Hosmer-Lemeshow P value option is set in the Options for Multiple Logistic Regression dialog box.
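As a small illustration of the regression equation and the logit transformation given under Regression Equation, the sketch below evaluates the probability of a positive response for one case and confirms that the logit of that probability returns the linear combination of the independent variables. The coefficients, the data values, and the function names are hypothetical.

```python
import math

def logistic_probability(coefficients, values):
    """Probability of a positive response.

    coefficients = [b0, b1, ..., bk], values = [x1, ..., xk].
    """
    linear = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], values))
    return 1.0 / (1.0 + math.exp(-linear))

def logit(p):
    """Logit transformation ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical fitted coefficients b0, b1, b2 and one observation x1, x2.
b = [-1.5, 0.8, 0.05]
x = [2.0, 10.0]

p = logistic_probability(b, x)
print(f"P(positive response) = {p:.4f}")
print(f"logit(P) = {logit(p):.4f}")
```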

Pearson Chi-Square Statistic

The Pearson Chi-Square statistic is the sum of the squared Pearson residuals. It is a measure of the agreement between the observed and predicted values of the dependent variable using a Chi-Square test statistic. The Chi-Square test statistic is analogous to the residual sum of squares in ordinary linear regression. Small values of the Chi-Square (and corresponding large values of the associated P value) indicate a good agreement between the logistic regression equation and the data, and large values of Chi-Square (and small values of P) indicate a poor agreement.

The Pearson Chi-Square option is set in the Options for Multiple Logistic Regression dialog box.

Likelihood Ratio Test Statistic

The Likelihood Ratio Test statistic is derived from the sum of the squared deviance residuals. It indicates how well the logistic regression equation fits your data by comparing the likelihood of obtaining observations if the independent variables had no effect on the dependent variable with the likelihood of obtaining the observations if the independent variables had an effect on the dependent variable.

This comparison is computed by running the logistic regression with and without the independent variables and comparing the results. If the pattern of observed outcomes is more likely to have occurred when independent variables affect the outcome than when they do not, a small P value is reported, indicating a good fit between the logistic regression equation and your data.

Log Likelihood Statistic

The -2 log likelihood statistic is a measure of the goodness of fit between the actual observations and the predicted probabilities. It is the summation:

$$-2\sum_{i=1}^{n}\left[\,y_i \ln \mu_i + (1 - y_i)\ln(1 - \mu_i)\,\right]$$

where the $y_i$ and $\mu_i$ are respectively the observed and predicted values of the dependent variable, and n is the number of observations. Note that ln(1) is zero and the observed values must be 0 or 1. Thus the closer the predicted values are to the observed, the closer this sum will be to zero.

The -2 log likelihood is also equal to the sum of the squared deviance residuals.
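The comparison of a fitted model with a constant-only model can be sketched as follows. This is an illustration under stated assumptions: the fitted probabilities are hypothetical numbers rather than the output of an actual fit, the constant-only model predicts the overall proportion of positive responses for every case, and the statistic is formed as the constant-only -2 log likelihood minus the fitted-model value so that a better fit gives a larger statistic.

```python
import math

def neg2_log_likelihood(observed, predicted):
    """-2 log likelihood for 0/1 responses and predicted probabilities."""
    return -2.0 * sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(observed, predicted)
    )

# Hypothetical observed responses and fitted probabilities.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
fitted = [0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.9, 0.1]

# Model with the independent variables.
ll = neg2_log_likelihood(observed, fitted)

# Constant-only model: every case gets the overall proportion of positives.
base_rate = sum(observed) / len(observed)
ll0 = neg2_log_likelihood(observed, [base_rate] * len(observed))

likelihood_ratio = ll0 - ll
print(f"-2LL (fitted) = {ll:.3f}")
print(f"-2LL (constant only) = {ll0:.3f}")
print(f"likelihood ratio = {likelihood_ratio:.3f}")
```

Both -2 log likelihood values are positive, and the fitted value moves toward zero as the predicted probabilities approach the observed responses.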

The -2 log likelihood (LL) statistic is related to the likelihood ratio (LR) as follows:

$$LR = LL_0 - LL$$

where $LL_0$ is the -2 log likelihood of a regression model having none of the independent variables, just a constant term. In viewing this relationship note that both $LL_0$ and LL are positive, and LL must be closer to zero reflecting a better fit. (At the extremes, LL will be zero when there is a perfect fit, and LL will equal $LL_0$ when there is no fit whatsoever.) Thus the larger the LR, the larger the implied explanatory power of the independent variables for the given dependent variable.

Threshold Probability for Positive Classification

The threshold probability value determines whether the response predicted by the logistic model in the classification and probability tables (see following sections) is a positive or a reference response. If the estimated probability in the probability table exceeds the specified threshold probability value, the predicted variable is assigned a positive response (value of 1); probabilities less than or equal to the specified value are assigned a value of 0 or a reference value.

The threshold probability value is set in the options dialog box.

Classification Table

The classification table summarizes the results by cross-classifying the observed dependent response variables with the predicted responses, and identifying the number of correctly and incorrectly classified cases. The responses classified by the logistic model are derived by comparing estimated logistic probabilities in the Probability Table to the specified threshold probability value (see preceding section).

This table appears in the report if the Classification Table option is selected in the Options dialog box.

Probability Table

The Probability Table lists the actual responses of the dependent variable, the estimated logistic probability of a positive response (a value of 1), and the predicted response of the dependent variable. The predicted responses are assigned values of 1 (positive response) or 0 (reference response) derived by comparing estimated logistic probabilities to the specified threshold probability value (see preceding section).

This table appears in the report if the Predicted Values option is selected in the Options dialog.

Statistical Summary Table

The summary table lists the coefficient, standard error, Wald Statistic, Odds Ratio, Odds Ratio Confidence, P value, and VIF for the independent variables.

Coefficients. The value for the constant and coefficients of the independent variables for the regression model are listed.

Standard Error. The standard errors of the regression coefficients (analogous to the standard error of the mean). The true regression coefficients of the underlying population generally fall within about two standard errors of the observed sample coefficients. Large standard errors may indicate multicollinearity. Use these values to compute the Wald statistic and confidence intervals for the regression coefficients.

Wald Statistic. The Wald statistic is the regression coefficient divided by the standard error. It is computed as the ratio:

$$z = \frac{b_i}{s_{b_i}}$$

where z is the Wald statistic, $b_i$ is the observed value of the estimated coefficient, and $s_{b_i}$ is the standard error of the coefficient.

P value. P is the P value calculated for the Wald statistic. The P value is the probability of being wrong in concluding that there is a true association between the variables. The P value is based on the chi-square distribution with one degree of freedom. The smaller the P value, the greater the probability that the independent variables affect the dependent variable. Traditionally, you can conclude that the independent variable contributes to predicting the dependent variable when P < 0.05.

Odds Ratio. The odds ratio for an independent variable is computed as

$$e^{\beta_i}$$

where $\beta_i$ is the regression coefficient. The odds ratio is an estimate of the increase (or decrease) in the odds for an outcome if the independent variable value is increased by 1.

Odds Ratio Confidence. These two values represent the lower and upper ends of the confidence interval in which the true odds ratio lies. The level of confidence (95%) is specified in the options dialog.

VIF (Variance Inflation Factor). The variance inflation factor is a measure of multicollinearity. It measures the "inflation" of the standard error of each regression parameter (coefficient) for an independent variable due to redundant information in other independent variables.

If the variance inflation factor is 1.0, there is no redundant information in the other independent variables. If the variance inflation factor is much larger, there are redundant variables in the regression model, and the parameter estimates may not be reliable.

Variance inflation factor values for independent variables above the specified value are flagged with a > symbol, indicating multicollinearity with other independent variables. The cutoff value for flagging multicollinearity is set in the Options dialog box. The suggested value is 4.0.

The presence of serious multicollinearity indicates that you have too many redundant independent variables in your regression equation. To improve the quality of the regression equation, you should delete the redundant variables.

Residual Calculation Method

The residual calculation method indicates how the residuals for the logistic regression are calculated. You can choose Pearson or Deviance residuals from the Options for Logistic Regression dialog. This choice does not affect the logistic regression itself, which minimizes the deviance residuals squared, but does affect how the Studentized residuals are calculated.

The Pearson residual is defined as:

$$\frac{y_i - \mu_i}{\sqrt{\mu_i (1 - \mu_i)}}$$

where $y_i$ and $\mu_i$ are respectively the observed and predicted values for the ith case.
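The Wald statistic, its P value, and the odds ratio described for the Statistical Summary Table can be sketched for a single coefficient as follows. The coefficient and standard error are hypothetical. The P value uses the fact that the upper tail of a chi-square distribution with one degree of freedom at z squared equals the two-sided normal tail at |z|. The confidence limits are an assumption of this sketch (the usual Wald interval, exponentiated); the text above does not state how SigmaPlot computes them.

```python
import math

def wald_summary(coefficient, standard_error, z_critical=1.959964):
    """Wald statistic, P value, odds ratio, and an assumed 95% Wald interval."""
    z = coefficient / standard_error
    # Chi-square (1 df) upper tail at z**2 equals the two-sided normal tail at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(coefficient)
    # Assumed interval: exponentiate coefficient +/- z_critical * standard error.
    lower = math.exp(coefficient - z_critical * standard_error)
    upper = math.exp(coefficient + z_critical * standard_error)
    return {
        "wald": z,
        "p_value": p_value,
        "odds_ratio": odds_ratio,
        "odds_ratio_confidence": (lower, upper),
    }

# Hypothetical coefficient and standard error for one independent variable.
summary = wald_summary(coefficient=0.8, standard_error=0.3)
for name, value in summary.items():
    print(name, value)
```

An odds ratio above 1 with a confidence interval that excludes 1 corresponds to a Wald P value below 0.05 under these assumptions.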

Residuals Table

The residuals table displays the raw, Pearson or Deviance, studentized, and studentized deleted residuals if the associated options are selected in the options dialog. All residuals that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals as outliers are also set in the Options for Multiple Logistic Regression dialog. If you selected Report Flagged Values Only, only those observations that have one or more residuals flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation. Note that if your data has a case with a value missing, the corresponding row is entirely omitted from the table of residuals.

Pearson/Deviance Residuals. The Residual table displays either Pearson or Deviance residuals, depending on the Residual Type option setting in the Options for Logistic Regression dialog box. Both Pearson and Deviance residuals indicate goodness of fit between the logistic equation and the data, with smaller values indicating a better fit. These two residual types are calculated differently and affect the way the studentized residuals in the table are calculated.

Pearson residuals, also known as standardized residuals, are the raw residuals divided by the standard error.

Deviance residuals are a measure of how much each point contributes to the likelihood function being minimized as part of the maximum likelihood procedure.

Raw Residuals. Raw residuals are the difference between the predicted and observed values for each of the subjects or cases.

Studentized Residuals. The Studentized residual is a standardized residual that also takes into account the greater confidence of the predicted values of the dependent variable in the "middle" of the data set. This residual is also known as the internally Studentized residual, because the standard error of the estimate is computed using all data. The way the residuals are calculated depends on whether Pearson or Deviance is selected as the residual type in the Options dialog box.

Studentized Deleted Residual. The Studentized deleted residual, or externally Studentized residual, is a Studentized residual which uses the standard error computed after deleting the data point associated with the residual.

Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized residual in detecting outliers, since the Studentized deleted residual results in much larger values for outliers than the Studentized residual.

Influence Diagnostics

The influence diagnostic results display only the values for the results selected in the Options dialog under the More Statistics tab. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag data points as outliers are also set in the Options dialog under the More Statistics tab.

If you selected Report Cases with Outliers Only, only observations that have one or more observations flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. It is a measure of how much the values of the regression coefficients would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. Points with Cook’s distances greater than the specified value are flagged as influential; the suggested value is 4. The Cook’s Distance value used to flag "large" values is set in the Options dialog box.

Leverage. Leverage values identify potentially influential points. Observations with leverages a specified factor greater than the expected leverages are flagged as potentially influential points; the suggested value is 2.0 times the expected leverage.

The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points.

Because leverage is calculated using only the independent variable(s), high leverage points tend to be at the extremes of the independent variables (large and small values), where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

Polynomial Regression

Use Polynomial Regression when you:

Want to predict a trend in the data, or predict the value of one variable from the value of another variable, by fitting a curve through the data that does not follow a straight line, and
Know there is only one independent variable.

The independent variable is the known, or predictor, variable. When the independent variable is varied, a corresponding value for the dependent, or response, variable is produced.

If the relationship is not a linear polynomial (e.g., a log or exponential function), use Nonlinear Regression. If the relationship between the independent variables and the dependent variable is first order (a straight line), use Multiple Linear Regression.

About the Polynomial Regression

Polynomial Regression assumes an association between the independent and dependent variables that fits the general equation for a polynomial of order k:

$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$$

where y is the dependent variable, x is the independent variable, and $b_0, b_1, b_2, b_3, \ldots, b_k$ are the regression coefficients. As the value for x varies, the corresponding value varies according to a polynomial function.

The order of the polynomial k is the highest exponent of the independent variable; a first order polynomial is a straight line, a second order (quadratic) polynomial is a parabola, etc.

Polynomial Regression is a parametric test, that is, for a given independent variable value, the possible values for the dependent variable are assumed to be normally distributed and have equal variance.
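Fitting a polynomial of order k to one independent variable can be sketched with ordinary least squares. The sketch below builds the normal equations for the powers of x and solves them by Gaussian elimination. It is a minimal illustration with hypothetical data and function names, not SigmaPlot's procedure, which the text describes as more reliable than a plain multiple regression on powers of x.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * solution = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_polynomial(x, y, order):
    """Least-squares coefficients [b0, b1, ..., bk] for a polynomial of the given order."""
    size = order + 1
    normal = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve_linear_system(normal, rhs)

def predict(coefficients, x_value):
    """Evaluate b0 + b1*x + ... + bk*x**k."""
    return sum(b * x_value ** power for power, b in enumerate(coefficients))

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * v + 3.0 * v ** 2 for v in x]

coefficients = fit_polynomial(x, y, order=2)
print(coefficients)
print(predict(coefficients, 6.0))
```

Normal equations become poorly conditioned as the order grows, which is one reason to prefer a dedicated polynomial routine for anything beyond low orders.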

Note: If you are fitting a polynomial to data, the polynomial regression procedure yields more reliable results than simply performing a Multiple Linear Regression using $x$, $x^2$, etc., as the independent variables.

Performing a Polynomial Regression

To perform a Polynomial Regression:

1. Enter or arrange your data in the worksheet. For more information, see “Arranging Polynomial Regression Data” on page 370.
2. Set the polynomial regression options. For more information, see “Setting Polynomial Regression Options” on page 371.
3. Select Polynomial Regression from the Standard toolbar or from the menus select: Statistics, Regression, Polynomial.
4. Run the test.
5. View and interpret the incremental polynomial regression reports. For more information, see “Interpreting Incremental Polynomial Regression Results” on page 379.
6. View and interpret the order only polynomial regression reports. For more information, see “Interpreting Order Only Polynomial Regression Results” on page 382.
7. Generate report graphs. For more information, see “Polynomial Regression Report Graphs” below.

Arranging Polynomial Regression Data

Place the data for the dependent variable in one column and the corresponding data for the observed independent variable in another column.

Observations containing missing values are ignored, and all columns must be equal in length.

Setting Polynomial Regression Options

Use the Polynomial Regression options to:

Set the polynomial order.
Specify the type of polynomial regression you want to perform (incremental evaluation or order only).
Set the assumption checking options.
Specify the residuals to display and save them to the worksheet.
Display confidence intervals and save them to the worksheet.
Display the PRESS prediction error and the standardized coefficients.
Display the power.

To change Polynomial Regression options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. Select Polynomial Regression from the drop-down list in the Standard toolbar.
3. From the menus select: Statistics, Current Test Options. The Options for Polynomial Regression dialog box opens. If you select Incremental Order as the regression type, only the Criterion options are available. If you select Order Only, then the following tabs appear:

Criterion. Click the Criterion tab to return to the Polynomial Order and Regression options. For more information, see “Options for Polynomial Regression: Criterion” on page 372.
Assumption Checking. Click the Assumption Checking tab to view the Normality, Constant Variance, and Durbin-Watson options. For more information, see “Options for Polynomial Regression: Assumption Checking” on page 373.
Residuals. Click the Residuals tab to view the residual options. For more information, see “Options for Polynomial Regression: Residuals” on page 375.
More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS Prediction Error, and Standardized Coefficients options. For more information, see “Options for Polynomial Regression: More Statistics” on page 376.
Post Hoc. Click the Post Hoc Tests tab to view the Power options. For more information, see “Options for Polynomial Regression: Post Hoc Tests” on page 377.

4. To continue the test, click Run Test. For more information, see “Running a Polynomial Regression” on page 378.
5. To accept the current settings and close the dialog box, click OK.

Options settings are saved between SigmaPlot sessions.

Options for Polynomial Regression: Criterion

Select the Criterion tab from the options dialog to view the Polynomial Order and Regression options. Use these options to specify the polynomial order to use and the type of polynomial to use to evaluate your data.

Polynomial Order. Select the desired polynomial order from the Polynomial Order drop-down list. You can also type the desired value in the drop-down box. This value is used either as the maximum order to evaluate or the specific order to compute.

Order Only. Select Order Only from the Regression drop-down list to fit only the order specified in the Polynomial Order edit box to the data.

Incremental Evaluation. Select Incremental Evaluation if you need to find the order of polynomial to use. This option evaluates each polynomial order equation starting at zero and increasing to the value specified in the Polynomial Order box. Note that this option does not display all regression results; instead, it is used to evaluate the order for the best model to use. Once the order is determined, run an order only polynomial regression to obtain complete regression results.

Options for Polynomial Regression: Assumption Checking

Select the Assumption Checking tab from the options dialog to view the Normality, Constant Variance, and Durbin-Watson options. These options test your data for its suitability for regression analysis by checking three assumptions that a polynomial regression makes about the data. A polynomial regression assumes:
- That the source population is normally distributed about the regression.
- That the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s).
- That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable these options if you are certain that the data was sampled from normal populations with constant variance and that the residuals are independent of each other.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Constant Variance Testing. SigmaPlot tests for constant variance by computing the Spearman rank correlation between the absolute values of the residuals and the observed value of the dependent variable. When this correlation is significant, the constant variance assumption may be violated, and you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming one or more of the independent variables to stabilize the variance.

P Values for Normality and Constant Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.
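As a rough illustration of the constant variance check described above, the following sketch computes the Spearman rank correlation between the absolute residuals and the observed values of the dependent variable. It uses only NumPy. The function names and sample data are hypothetical, tied values are not handled, and the significance test that turns the correlation into a P value is omitted, so this is not SigmaPlot's implementation.

```python
import numpy as np

def ranks(values):
    """Return ranks 1..n of the values (assumes there are no tied values)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman_abs_residuals(residuals, observed_y):
    """Spearman correlation between |residual| and the observed dependent value."""
    rank_abs_res = ranks(np.abs(np.asarray(residuals, dtype=float)))
    rank_y = ranks(observed_y)
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return float(np.corrcoef(rank_abs_res, rank_y)[0, 1])

# Hypothetical data: the spread of the residuals grows with y.
observed_y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fanning_residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
rho = spearman_abs_residuals(fanning_residuals, observed_y)
```

A correlation far from zero suggests that the size of the residuals changes with the dependent variable, which is the pattern the constant variance test is meant to catch.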

To require a stricter adherence to normality and/or constant variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.05. Larger values of P (for example, 0.10) require less evidence to conclude that the residuals are not normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.01 for the normality test requires greater deviations from normality to flag the data as non-normal than a value of 0.05.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with non-constant variances, there are extreme conditions of data distribution that these tests cannot detect. However, these conditions should be easily detected by visually examining the data without resorting to the automatic assumption tests.

Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals for their independence of each other. The Durbin-Watson statistic is a measure of serial correlation between the residuals. The residuals are often correlated when the independent variable is time, and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 more than the entered value, SigmaPlot warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable difference from 2.0. To relax the requirement of independence, increase the acceptable difference from 2.0.
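The Durbin-Watson check described above can be sketched as follows. The statistic uses the standard textbook definition (sum of squared successive differences of the residuals divided by the sum of squared residuals); the function names and sample residuals are hypothetical, and the default tolerance of 0.50 is the suggested value quoted in the text.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def residuals_flagged(residuals, difference_from_2=0.50):
    """True when the statistic deviates from 2.0 by more than the tolerance."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2

# Hypothetical residuals listed in time order.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # negative serial correlation
drifting = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]      # positive serial correlation

dw_alternating = durbin_watson(alternating)
dw_drifting = durbin_watson(drifting)
```

Values well above 2 point to residuals that alternate in sign, and values well below 2 point to residuals that drift slowly, as is common when the independent variable is time.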

Options for Polynomial Regression: Residuals

Select the Residuals tab in the Options for Polynomial Regression dialog to view the Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.

Predicted Values. Use this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the worksheet. To assign predicted values to a worksheet column, select the worksheet column you want to save the predicted values to from the corresponding drop-down list. If you select none and the Predicted Values check box is selected, the values appear in the report but are not assigned to the worksheet. Click the selected check box if you do not want to include predicted values in the worksheet.

Raw Residuals. The raw residuals are the differences between the predicted and observed values of the dependent variables. To include raw residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include raw residuals in the worksheet. To assign the raw residuals to a worksheet column, select the number of the desired column from the corresponding drop-down list. If you select none from the drop-down list and the Raw check box is selected, the values appear in the report but are not assigned to the worksheet.

Standardized Residuals. The standardized residual is the residual divided by the standard error of the estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and is a measure of variability around the regression line. Select Standardized Residuals to include them in the report. SigmaPlot automatically flags data points lying outside of the confidence interval specified in the corresponding box. These data points are considered to have "large" standardized residuals, i.e., outlying data points. You can change which data points are flagged by editing the value in the Flag Values > edit box. The suggested residual value is 2.5.

Studentized Residuals. Studentized residuals scale the standardized residuals by taking into account the greater precision of the regression line near the middle of the data versus the extremes. The Studentized residuals tend to be distributed according to the Student t distribution, so the t distribution can be used to define "large" values of the Studentized residuals. Select Studentized Residuals to include them in the report. SigmaPlot automatically flags data points with "large" values of the Studentized residuals, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residual, except that the residual values are obtained by computing the regression equation without using the data point in question. SigmaPlot can automatically flag data points with "large" values of the Studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To include only the flagged standardized and Studentized deleted residuals in the report, select Report Flagged Values Only.

Options for Polynomial Regression: More Statistics

Select the More Statistics tab in the options dialog to view the confidence interval options. You can set the confidence interval for the population, regression, or both and then save them to the worksheet.

Confidence Interval for the Population. The confidence interval for the population gives the range of values that define the region that contains the population from which the observations were drawn. To include confidence intervals for the population in the report, select Population. Clear the selected check box if you do not want to include the confidence intervals for the population in the report.

Confidence Interval for the Regression. The confidence interval for the regression line gives the range of values that defines the region containing the true mean relationship between the dependent and independent variables, with the specified level of confidence. To include confidence intervals for the regression in the report, select Regression and then specify a confidence level by entering a value in the percentage box. The confidence level can be any value from 1 to 99. The suggested confidence level for all intervals is 95%.

Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the worksheet, select the column number of the first column you want to save the intervals to from the Starting in Column drop-down list. The selected intervals are saved to the worksheet starting with the specified column and continuing with successive columns in the worksheet.

PRESS Prediction Error. Select PRESS Prediction Error to measure how well the regression equation fits the data. Leave this check box selected to evaluate the fit of the equation using the PRESS statistic.

Standardized Coefficients. These are the coefficients of the regression equation standardized to dimensionless values,

$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and $s_y$ = standard deviation of dependent variable y. To include the standardized coefficients in the report, make sure the Standardized Coefficients check box is selected. Click the selected check box if you do not want to include the standardized coefficients in the worksheet.

Options for Polynomial Regression: Post Hoc Tests

Click the Post Hoc Tests tab on the Options for Polynomial Regression dialog box to view the Power options. The power of a regression is the power to detect the observed relationship in the data. The alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.

Select Power to compute the power for the polynomial regression data. Change the alpha value by editing the number in the Use Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.
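To make the standardized coefficients described earlier in this section concrete, here is a minimal sketch that fits a second order polynomial by least squares and rescales each coefficient by the ratio of the standard deviation of its term to the standard deviation of y. The function name and the data are hypothetical, and treating x and x² as separate "independent variables" is an assumption made for illustration.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Scale each fitted coefficient b_i by s_xi / s_y (the constant term is skipped)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    s_x = X.std(axis=0, ddof=1)   # sample standard deviation of each column
    s_y = y.std(ddof=1)           # sample standard deviation of y
    return b[1:] * s_x / s_y

# Hypothetical data for a second order polynomial: columns are x and x squared.
x = np.arange(1.0, 9.0)
y = np.array([2.3, 4.1, 7.8, 13.2, 20.1, 28.7, 39.5, 51.8])
beta = standardized_coefficients(np.column_stack([x, x ** 2]), y)
```

Because the values are dimensionless, they do not change when a column is expressed in different units. For a regression with a single independent variable, the standardized coefficient equals the correlation coefficient between x and y.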

Running a Polynomial Regression

To run a Polynomial Regression you need to select the data to test. You use the Pick Columns dialog box to select the worksheet columns with the data you want to test.

To run a Polynomial Regression:

1. If you want to select your data before you run the regression, drag the pointer over your data.

2. Select Polynomial Regression from the drop-down list on the Standard toolbar.

3. From the menus select:
Statistics
Regression
Polynomial
The Pick Columns for Polynomial Regression dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog prompts you to pick your data.

4. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Dependent and Independent drop-down list. The first selected column is assigned to the Dependent Variable row in the Selected Columns list, and the second column is assigned to the Independent Variable row. The title of selected columns appears in each row. You are only prompted for one dependent and one independent variable column.

5. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

6. Click Finish to run the regression. If you elected to test for normality, constant variance, and/or independent residuals, SigmaPlot performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail any of these tests, SigmaPlot warns you. When the test is complete, the report appears displaying the results of the Polynomial Regression.

If you are performing a regression using one order only, and selected to place predicted values, residuals, and/or other test results in the worksheet, they are placed in the specified data columns and are labeled by content and source column.

Note: Worksheet results can only be obtained using order only polynomial regression.

Interpreting Incremental Polynomial Regression Results

Incremental Order Polynomial Regression results display the regression equations for each order polynomial, starting with zero order and increasing to the specified order. The residual and incremental mean square, incremental and overall R², F value, and P value for each order equation are listed.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Regression Equation

These are the regression equations for each order, with the values of the coefficients in place. For incremental polynomial regression, all equations from zero order up to the maximum order specified in the Options for Polynomial Regression dialog box are listed. The equations take the form:

$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_k x^k$$

where y is the dependent variable, x is the independent variable, and b0, b1, b2, b3, ..., bk are the regression coefficients. The order k of the polynomial is the largest exponent of the independent variable.

Incremental Results

MSres (Residual Mean Square). The residual mean square is a measure of the variation of the residuals about the regression line.

MSincr (Incremental Mean Square). The incremental mean square is a measure of the reduction in variation of the residuals about the regression equation gained with this order polynomial.

The sums of squares are measures of variability of the dependent variable. The residual sum of squares is a measure of the size of the residuals, which are the differences between the observed values of the dependent variable and the values predicted by the regression model.

The incremental, or Type I, sum of squares is a measure of the new predictive information contained in the added power of the independent variable, as it is added to the equation. It is a measure of the increase in the regression sum of squares (and reduction in the sum of squared residuals) obtained when the highest order term of the independent variable is added to the regression equation, after all lower order terms have been entered. Since one order is added in each step, DFincr = 1.

R². R², the coefficient of determination, is a measure of how well the regression model describes the data. The incremental R² is the gain in R² obtained with this order polynomial over the previous order polynomial. The overall R² is the actual R² of this order polynomial. Overall R² values nearer to 1 indicate that the curve is a good description of the relation between the independent and dependent variables. R² is near 0 when the values of the independent variable poorly predict the dependent variable.

F Value. The F test statistic gauges the ability of the independent variable in predicting the dependent variable.

The incremental F value gauges the increase in contribution of each added order of the independent variable in predicting the dependent variable. It is the ratio of the incremental mean square to the residual mean square:

$$F_{incr} = \frac{MS_{incr}}{MS_{res}}$$

If the incremental F is large and the overall F jumps to a large number, you can conclude that adding the order of the independent variable predicts the dependent variable significantly better than the previous model.

The overall F value gauges the contribution of all orders of the independent variable in predicting the dependent variable. It is the ratio of the mean square due to the regression to the residual mean square. When the overall F ratio is around 1, you can conclude that there is no association between the dependent and independent variables (i.e., the data is consistent with the null hypothesis that all the samples are just randomly distributed).

P Value. P is the P value calculated for F. The P value is the probability of being wrong in concluding that there is a true association between the dependent and independent variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that there is an association. Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

The incremental P value is the change in probability of being wrong that the added independent variable order improves the prediction of the dependent variable. The overall P value is the probability of being wrong that the order of the polynomial correctly predicts the dependent variable.

The "best" order polynomial to use is generally the highest order polynomial that produces a marked improvement in predictive ability.
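The incremental evaluation discussed above can be sketched as a loop over polynomial orders. This is a minimal illustration that assumes the conventional sequential F statistic, F_incr = MS_incr / MS_res with one incremental degree of freedom per added order. The function name, the dictionary keys, and the sample data are hypothetical, and P values are omitted because they require an F distribution function.

```python
import numpy as np

def incremental_table(x, y, max_order):
    """Fit orders 0..max_order and report MS_res, R^2 and the incremental F."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ss_total = float(np.sum((y - y.mean()) ** 2))
    rows = []
    ss_res_prev = None
    for k in range(max_order + 1):
        coeffs = np.polyfit(x, y, k)
        ss_res = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        ms_res = ss_res / (n - k - 1)          # residual degrees of freedom
        row = {"order": k, "ms_res": ms_res, "r2": 1.0 - ss_res / ss_total}
        if ss_res_prev is not None:
            ms_incr = (ss_res_prev - ss_res) / 1.0   # DF_incr = 1
            row["ms_incr"] = ms_incr
            row["f_incr"] = ms_incr / ms_res
        rows.append(row)
        ss_res_prev = ss_res
    return rows

# Hypothetical data: a quadratic trend plus a small deterministic disturbance.
x = np.arange(10.0)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1])
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + noise
table = incremental_table(x, y, max_order=3)
```

With data like these, the incremental F for order 2 is very large, so the second order term clearly earns its place; whether higher orders do is judged from their own incremental F values.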

Assumption Testing

Normality. The normality test result displays whether the polynomial model passed or failed the test of the assumption that the source population is normally distributed around the regression curve, and the P value calculated by the test. All regression techniques require the source population, and therefore the residuals, to be normally distributed about the regression curve. When this assumption may be violated, a warning appears in the report. Failure of the normality test can indicate the presence of outlying influential points or an incorrect regression model.

Constant Variance. The constant variance test results list whether or not that polynomial model passed the test for constant variance of the residuals about the regression, and the P value computed for that order polynomial.

Choosing the Best Model

The smaller the residual sum of squares and mean square, the closer the curve matches the data at those values of the independent variable. Because the R² value increases as the order increases, you also want to use the simplest model that adequately describes the data. The first model that has a significant increase in the incremental F value is generally the best model to use.

Interpreting Order Only Polynomial Regression Results

The report for an order only Polynomial Regression displays the equation with the computed coefficients for the curve, R and R², mean squares, F, and the P value for the regression equation. The other results displayed in the report are selected in the Options for Polynomial Regression dialog.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Regression Equation

This is the equation with the values of the coefficients in place. This equation takes the form:

$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_k x^k$$

where y is the dependent variable, x is the independent variable, and b0, b1, b2, b3, ..., bk are the regression coefficients. The order of the polynomial is the largest exponent of the independent variable.

The number of observations N is also displayed, with the missing values, if any.

R². The coefficient of determination, or R², is a measure of how well the regression model describes the data. R² values near 1 indicate that the curve is a good description of the relation between the independent and dependent variables. R² values near 0 indicate that the values of the independent variable do not predict the dependent variable.

Analysis of Variance (ANOVA)

MSres (Residual Mean Square). The mean square provides an estimate of the population variance. The residual mean square is a measure of the variation of the residuals about the regression curve.

F Statistic. The F test statistic gauges the contribution of the regression equation to predict the dependent variable. It is the ratio of the mean square due to the regression to the residual mean square.

If F is a large number, you can conclude that the independent variable contributes to the prediction of the dependent variable (i.e., the "unexplained variability" is smaller than what is expected from random sampling variability of the dependent variable about its mean). If the F ratio is around 1, you can conclude that there is no association between the variables (i.e., the data is consistent with the null hypothesis that all the samples are just randomly distributed).

P Value. P is the P value calculated for F. The P value is the probability of being wrong in concluding that there is a true association between the variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the variables are correlated.

Standard Error of the Estimate

The standard error of the estimate $s_{y|x}$ is a measure of the actual variability about the regression line of the underlying population. The underlying population generally falls within about two standard errors of the observed sample.

PRESS Statistic

PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how well a regression model predicts new data. The smaller the PRESS statistic, the better the predictive ability of the model. The PRESS statistic is computed by summing the squares of the prediction errors (the differences between predicted and observed values) for each observation, with that point deleted from the computation of the regression equation.

Durbin-Watson Statistic

The Durbin-Watson statistic is a measure of correlation between the residuals. If the residuals are not correlated, the Durbin-Watson statistic will be 2; the more this value differs from 2, the greater the likelihood that the residuals are correlated. This result appears if it was selected in the Options for Polynomial Regression dialog.
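The PRESS statistic described above follows directly from its definition: leave each observation out in turn, refit the polynomial, and sum the squared prediction errors for the omitted points. The sketch below does exactly that with NumPy; the function name and data are hypothetical, and no claim is made about how SigmaPlot computes the value internally.

```python
import numpy as np

def press_statistic(x, y, order):
    """Sum of squared leave-one-out prediction errors for a polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    indices = np.arange(len(x))
    press = 0.0
    for i in indices:
        keep = indices != i                                # drop observation i
        coeffs = np.polyfit(x[keep], y[keep], order)       # refit without it
        press += (y[i] - np.polyval(coeffs, x[i])) ** 2    # error at the dropped point
    return float(press)

# Hypothetical data: a straight line with a small disturbance.
x = np.arange(8.0)
y = 3.0 + 2.0 * x + np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2])
press_linear = press_statistic(x, y, order=1)
```

PRESS is never smaller than the ordinary residual sum of squares of the full fit, because each point is predicted by a model that did not see it. Comparing PRESS across candidate orders is one way to check that a higher order is not merely chasing noise.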

Normality Test

The normality test results display whether the polynomial model passed or failed the test of the assumption that the source population is normally distributed around the regression curve, and the P value calculated by the test. All regression requires a source population to be normally distributed about the regression curve. When this assumption may be violated, a warning appears in the report. This result appears unless you disabled normality testing in the Options for Polynomial Regression dialog box.

Failure of the normality test can indicate the presence of outlying influential points or an incorrect regression model.

Constant Variance Test

The constant variance test result displays whether the polynomial model passed or failed the test of the assumption that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable, and the P value calculated by the test. When the constant variance assumption may be violated, a warning appears in the report.

If you receive this warning, you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming the independent variable to stabilize the variance and obtain more accurate estimates of the parameters in the regression equation. This result appears unless you disabled constant variance testing in the Options for Polynomial Regression dialog box.

Regression Diagnostics

The regression diagnostic results display only the values for the predicted values, residual results, and other diagnostics selected in the Options for Polynomial Regression dialog. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals as outliers are set in the Options for Polynomial Regression dialog.

If you selected Report Cases with Outliers Only, only those observations that have one or more residuals flagged as outliers are reported; however, all other results for that observation are also displayed.

Row. This is the row number of the observation.

Residuals. These are the raw residuals, the difference between the predicted and observed values for the dependent variables.

Standardized Residuals. The standardized residual is the raw residual divided by the standard error of the estimate $s_{y|x}$. If the residuals are normally distributed about the regression line, about 68% of the standardized residuals have values between -1 and +1, and about 95% of the standardized residuals have values between -2 and +2. A larger standardized residual indicates that the point is far from the regression line; the suggested value flagged as an outlier is 2.5.

Confidence Intervals

These results are displayed if you selected them in the Options for Polynomial Regression dialog box. If the confidence interval does not include zero, you can conclude that the coefficient is different than zero with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is different than zero, and the confidence interval is 100(1 - α)%. The specified confidence level can be any value from 1 to 99; the suggested confidence level for both intervals is 95%.

Row. This is the row number of the observation.

Predicted. This is the value for the dependent variable predicted by the regression model for each observation.

Regression. These are the values that define the region containing the true relationship between the dependent and independent variables, for the specified level of confidence, centered at the predicted value. This result is displayed if you selected it in the Options for Polynomial Regression dialog box. The specified confidence level can be any value from 1 to 99; the suggested confidence level is 95%.

Population Confidence Interval. These are the values that define the region containing the population from which the observations were drawn, for the specified level of confidence, centered at the predicted value. This result is displayed if you selected it in the Options for Polynomial Regression dialog box. The specified confidence level can be any value from 1 to 99; the suggested confidence level is 95%.
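As an illustration of the standardized residuals and the outlier flag described earlier on this page, the sketch below divides each raw residual by the standard error of the estimate and reports the rows whose absolute value exceeds the suggested trigger of 2.5. The function names and data are hypothetical. The sign convention (observed minus predicted) and the degrees of freedom (number of observations minus number of fitted coefficients) are assumptions of this sketch.

```python
import numpy as np

def standardized_residuals(x, y, order):
    """Raw residuals divided by the standard error of the estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, order)
    raw = y - np.polyval(coeffs, x)            # observed minus predicted (assumed sign)
    dof = len(y) - (order + 1)                 # observations minus fitted coefficients
    s_yx = np.sqrt(np.sum(raw ** 2) / dof)     # standard error of the estimate
    return raw / s_yx

def flag_large(std_res, threshold=2.5):
    """Zero-based row indices whose standardized residual exceeds the threshold."""
    return np.flatnonzero(np.abs(std_res) > threshold)

# Hypothetical data: points on a straight line with one displaced observation.
x = np.arange(12.0)
y = 2.0 * x
y[6] += 10.0
flagged_rows = flag_large(standardized_residuals(x, y, order=1))
```

Only the displaced observation is flagged here. Lowering the threshold flags more points, and raising it flags fewer.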

Polynomial Regression Report Graphs

You can generate up to five graphs using the results from a Polynomial Regression. They include a:
- Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.
- Scatter plot of the residuals. For more information, see “Scatter Plot of the Residuals” on page 545.
- Bar chart of the standardized residuals. For more information, see “Bar Chart of the Standardized Residuals” on page 546.
- Normal probability plot of the residuals. For more information, see “Normal Probability Plot” on page 549.
- Line/scatter plot of the regression with one independent variable and confidence and prediction intervals. For more information, see “2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals” on page 550.

Creating Polynomial Regression Report Graphs

To generate a report graph of Polynomial Regression report data:

1. With the Polynomial Regression report in view, from the menus select:
Graph
Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Polynomial Regression report.

2. Select the type of graph you want to create from the Graph Type list, then click OK, or double-click the desired graph in the list. The selected graph appears in a graph window.

Stepwise Linear Regression

Use Stepwise Linear Regression when you:
- Want to predict a trend in the data, or predict the value of one variable from the values of one or more other variables, by fitting a line or plane (or hyperplane) through the data.
- Do not know which independent variables contribute to predicting the dependent variable, and you want to find the model with suitable independent variables by adding or removing independent variables from the equation.

If you already know the independent variables you want to include, use Multiple Linear Regression. If you want to find the few best equations from all possible models, use Best Subsets Regression. If the relationship is not a straight line or plane, use Polynomial or Nonlinear Regression.

About Stepwise Linear Regression

Stepwise Regression is a technique for selecting independent variables for a Multiple Linear Regression equation from a list of candidate variables. Using Stepwise Regression instead of regular Multiple Linear Regression avoids using extraneous variables, or under specifying or over specifying the model.

Stepwise Regression assumes an association between the one or more independent variables and a dependent variable that fits the general equation for a multidimensional plane:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$$

where y is the dependent variable, x1, x2, x3, ..., xk are the independent variables, and b0, b1, b2, ..., bk are the regression coefficients. As the values for xi vary, the corresponding value for y either increases or decreases, depending on the sign of bi. The independent variable is the known, or predictor, variable.

Stepwise Regression determines which independent variables to use by adding or removing selected independent variables from the equation. There are two approaches to Stepwise Regression:

Forward Stepwise Regression. In Forward Stepwise Regression, the independent variable that produces the best prediction of the dependent variable (and has an F value higher than a specified F-to-Enter) is entered into the equation first, the independent variable that adds the next largest amount of information is entered second, and so on. After each variable is entered, the F value of each variable already entered into the equation is checked, and any variables with small F values (below a specified F-to-Remove value) are removed.

389 Prediction and Correlation This process is repeated until adding or removing variables does not significantly improve the prediction of the dependent variable. Performing a Stepwise Linear Regression To perform a Stepwise Linear Regression: 1. Enter or arrange your data in the worksheet. . the F value of each variable removed from the equation is checked. 4. Run the test. the next least important independent variable is removed. Select Stepwise Linear Regression from the Standard toolbar or from the menus select: Statistics Regression Stepwise Forward or Statistics Regression Stepwise Backward 3. Note: Forward and Backward Stepwise Regression using the same potential variables do not necessarily yield the same final regression model when there is multicollinearity among the possible independent variables. and any variables with large F values (above a specified F-to-Enter value) are re-entered into the equation. For more information. all variables are entered into the equation. The independent variable that contributes the least to the prediction (and has an F value lower than a specified F-to-Remove) is removed from the equation. 2. For more information. Backward Stepwise Regression. see “Interpreting Stepwise Regression Results” on page 413. In Backward Stepwise Regression. see “Running a Stepwise Regression” on page 412. and so on. For more information. see “Arranging Stepwise Regression Data” on page 390. This process is repeated until removing or adding variables does not significantly improve the prediction of the dependent variable. View and interpret the Stepwise Linear Regression report. After each variable is removed.

5. Generate report graphs. For more information, see “Stepwise Regression Report Graphs” on page 423.

Arranging Stepwise Regression Data

The data format for a Stepwise Linear Regression consists of the data for the independent variables in one or more columns and the corresponding data for the observed dependent variable in a single column. Any observations containing missing values are ignored, and the columns must be equal in length.

Setting Forward Stepwise Regression Options

Use the Stepwise Regression options to:
   Specify which independent variables are entered into, replaced in, or removed from the regression equation during forward or backward stepwise regression.
   Set the number of steps permitted before the stepwise algorithm stops.
   Set assumption checking options.
   Specify the residuals to display and save them to the worksheet.
   Set confidence interval options.
   Display the PRESS statistic error.
   Display standardized regression coefficients.
   Display the power of the regression.

To change the Forward Stepwise Regression options:
1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. Select Forward Stepwise Regression from the drop-down list in the Standard toolbar.
3. From the menus select:
   Statistics > Current Test Options

For more information. For more information. To accept the current settings and close the dialog box. PRESS Prediction Error. Click the Residuals tab to view the residual options. 5. Click the More Statistics tab to view the confidence intervals. Click the Assumption Checking tab to view the Normality. see “Running a Stepwise Regression” on page 412. and Number of Steps options. For more information. and Durbin-Watson options. 4. Click the Post Hoc Tests tab to view the Power options. The F-to-Enter value controls which independent variables are entered into the regression equation during forward stepwise regression or replaced after each step during backwards stepwise regression. For more information. For more information. Options for Forward Stepwise Regression: Criterion Select the Criterion tab from the options dialog box to view the F-to-Enter. Other Diagnostics. The F-to-Enter value is the minimum incremental F value associated with an independent variable before it can be entered into the regression equation. More Statistics. Assumption Checking. and to specify when the stepwise algorithm stops. and Number of Stepsoptions. F-to-Enter Value. Use these options to specify the independent variables that are entered into. Options settings are saved between SigmaPlot sessions. . Click the Criterion tab to return to the F-to-Enter. replaced. F-toRemove. or removed from the regression equation during the stepwise regression. Standardized Coefficients options. click OK. For more information. see “Options for Forward Stepwise Regression: Assumption Checking” on page 393. click Run Test. see “Options for Forward Stepwise Regression: More Statistics” on page 396 . see “Options for Forward Stepwise Regression: Criterion” on page 391. F-to-Remove. see “Options for Forward Stepwise Regression: Residuals” on page 394. 
see “Options for Forward Stepwise Regression: Other Diagnostics” on page 397 .391 Prediction and Correlation The Options for Forward Stepwise Regression dialog box appears with five tabs: Criterion. Residuals. To continue the test. Constant Variance. All independent variables producing incremental F values above the F-to-Enter value are added to the model.
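To make the role of the incremental F value more concrete, here is a minimal Python sketch of the entry half of forward stepwise selection. It is an illustration under stated assumptions, not SigmaPlot's implementation: the function names (`solve`, `sse`, `incremental_f`, `forward_select`) and the sample data are hypothetical, the incremental F is taken to be the usual partial F statistic (the drop in the residual sum of squares divided by the residual mean square of the larger model), and the F-to-Remove check after each step is left out for brevity.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(columns, y):
    """Residual sum of squares of y regressed on a constant plus the given columns."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    p = len(design)
    xtx = [[sum(design[i][k] * design[j][k] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[i][k] * y[k] for k in range(n)) for i in range(p)]
    coef = solve(xtx, xty)
    return sum((y[k] - sum(coef[i] * design[i][k] for i in range(p))) ** 2
               for k in range(n))


def incremental_f(current, candidate, y):
    """Partial F for adding `candidate` to a model that already holds `current`."""
    sse_without = sse(current, y)
    sse_with = sse(current + [candidate], y)
    df_resid = len(y) - (len(current) + 2)  # constant + current columns + candidate
    return (sse_without - sse_with) / (sse_with / df_resid)


def forward_select(candidates, y, f_to_enter=4.0, max_steps=20):
    """Enter the best remaining variable while its incremental F exceeds f_to_enter."""
    entered = []
    for _ in range(max_steps):
        remaining = [name for name in candidates if name not in entered]
        if not remaining:
            break
        current = [candidates[name] for name in entered]
        scores = {name: incremental_f(current, candidates[name], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= f_to_enter:
            break
        entered.append(best)
    return entered


# Hypothetical data: y depends strongly on x1; x2 is an unrelated column.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [5, 3, 8, 1, 9, 2, 7, 4, 6, 10],
}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
selected = forward_select(candidates, y)
print(selected)
```

Raising `f_to_enter` makes the loop stop earlier, which mirrors the trade-off the manual describes: fewer redundant variables, at the risk of leaving out important ones.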

The suggested F-to-Enter value is 4.0. Increasing F-to-Enter requires a potential independent variable to have a greater effect on the ability of the regression equation to predict the dependent variable before it is accepted, but may stop too soon and exclude important variables. Reducing the F-to-Enter value makes it easier to add a variable, because it relaxes the importance of a variable required before it is accepted, but may produce redundant variables and result in multicollinearity.

Note: If you are performing backward stepwise regression and you want any variable that has been removed to remain deleted, increase the F-to-Enter value to a large number, e.g., 100000.

Note: The F-to-Enter value should always be greater than or equal to the F-to-Remove value, to avoid cycling variables in and out of the regression model.

F-to-Remove Value. The F-to-Remove value controls which independent variables are deleted from the regression equation during backward stepwise regression, or removed after each step in forward stepwise regression. The F-to-Remove is the maximum incremental F value associated with an independent variable before it can be removed from the regression equation. All independent variables producing incremental F values below the F-to-Remove value are deleted from the model.

The suggested F-to-Remove value is 3.9. Reducing the F-to-Remove value makes it easier to retain a variable in the regression equation because variables that have smaller effects on the ability of the regression equation to predict the dependent variable are still accepted. However, the regression may still contain redundant variables, resulting in multicollinearity. Increasing the F-to-Remove value makes it easier to delete variables from the equation, as variables that contain more predictive value can be removed. Important variables may also be deleted, however.

Note: If you are performing forward stepwise regression and you want any variable that has been entered to remain in the equation, set the F-to-Remove value to zero.

Note: The F-to-Remove value should always be less than or equal to the F-to-Enter value, to avoid cycling variables in and out of the regression model.

Number of Steps. Use this option to set the maximum number of steps permitted before the stepwise algorithm stops. Note that if the algorithm stops because it ran out of steps,

the results are probably not reliable. The suggested number of steps is 20 added or deleted independent variables.

Options for Forward Stepwise Regression: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the Normality, Constant Variance, and Durbin-Watson options. These options test your data for its suitability for regression analysis by checking three assumptions that a Stepwise Linear Regression makes about the data. A Stepwise Linear Regression assumes:
   That the source population is normally distributed about the regression.
   That the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s).
   That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable these options if you are certain that the data was sampled from normal populations with constant variance and that the residuals are independent of each other.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Constant Variance Testing. SigmaPlot tests for constant variance by computing the Spearman rank correlation between the absolute values of the residuals and the observed value of the dependent variable. When this correlation is significant, the constant variance assumption may be violated, and you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming one or more of the independent variables to stabilize the variance.

P Values for Normality and Constant Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.05.

Larger values of P (for example, 0.10) require less evidence to conclude that the residuals are not normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.01 for the normality test requires greater deviations from normality to flag the data as non-normal than a value of 0.05.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with non-constant variances, there are extreme conditions of data distribution that these tests cannot detect. However, these conditions should be easily detected by visually examining the data without resorting to the automatic assumption tests.

Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals for their independence of each other. The Durbin-Watson statistic is a measure of serial correlation between the residuals. The residuals are often correlated when the independent variable is time, and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 more than the entered value, SigmaPlot warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable difference from 2.0. To relax the requirement of independence, increase the acceptable difference from 2.0.

Options for Forward Stepwise Regression: Residuals

Select the Residuals tab in the options dialog box to view the Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.
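As a rough illustration of the Durbin-Watson check described above, the following Python sketch computes the statistic from a list of residuals in observation order and applies the Difference from 2 rule. The manual does not print the formula; the sketch assumes the standard definition (sum of squared successive differences divided by the sum of squared residuals). The function names and the residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def residuals_look_correlated(residuals, difference_from_2=0.50):
    """Flag the residuals when the statistic deviates from 2.0 by more than the limit."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2


# Hypothetical residual series.
drifting = [1.0, 2.0, 3.0, 4.0]             # each residual resembles the previous one
balanced = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

print(durbin_watson(drifting), residuals_look_correlated(drifting))
print(durbin_watson(balanced), residuals_look_correlated(balanced))
```

Lowering `difference_from_2` makes the check stricter, because smaller departures from 2.0 are then enough to raise the warning.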

Predicted Values. Select this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the data worksheet. To assign predicted values to a worksheet column, select the worksheet column you want to save the predicted values to from the corresponding drop-down list. If you select none and the Predicted Values check box is selected, the values appear in the report but are not assigned to the worksheet.

Raw Residuals. The raw residuals are the differences between the predicted and observed values of the dependent variables. To include raw residuals in the report, make sure this check box is selected. Clear the check box if you do not want to include raw residuals.

To assign the raw residuals to a worksheet column, select the number of the desired column from the corresponding drop-down list. If you select none from the drop-down list and the Raw check box is selected, the values appear in the report but are not assigned to the worksheet.

Standardized Residuals. The standardized residual is the residual divided by the standard error of the estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and is a measure of variability around the regression line. To include standardized residuals in the report, make sure this check box is selected. Clear the check box if you do not want to include standardized residuals.

SigmaPlot automatically flags data points lying outside of the confidence interval specified in the corresponding box. These data points are considered to have "large" standardized residuals, i.e., outlying data points. You can change which data points are flagged by editing the value in the Flag Values > edit box.

Studentized Residuals. Studentized residuals scale the standardized residuals by taking into account the greater precision of the regression line near the middle of the data versus the extremes. The Studentized residuals tend to be distributed according to the Student t distribution, so the t distribution can be used to define "large" values of the Studentized residuals. SigmaPlot automatically flags data points with "large" values of the Studentized residuals, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

To include Studentized residuals in the report, make sure this check box is selected. Clear the check box if you do not want to include Studentized residuals.
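The three residual types above can be compared side by side with a small Python sketch. It is restricted to one independent variable to stay short, and it rests on assumptions that are not spelled out in the manual: residuals are taken as observed minus predicted, the standard error of the estimate uses n - 2 degrees of freedom, and the Studentized residual divides by the extra factor sqrt(1 - h), where h is the leverage of the point. The function name and data are hypothetical.

```python
import math


def simple_regression_residuals(x, y):
    """Return raw, standardized, and Studentized residuals for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * xi for xi in x]
    raw = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

    # Standard error of the estimate: two coefficients were fitted.
    std_err = math.sqrt(sum(e * e for e in raw) / (n - 2))
    standardized = [e / std_err for e in raw]

    # Leverage is smallest near the mean of x and largest at the extremes.
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    studentized = [e / (std_err * math.sqrt(1.0 - h))
                   for e, h in zip(raw, leverage)]
    return raw, standardized, studentized


# Hypothetical data with one unusually high last observation.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 10]
raw, standardized, studentized = simple_regression_residuals(x, y)
for row in zip(x, raw, standardized, studentized):
    print(row)
```

Because 1 - h is less than 1, each Studentized residual is at least as large in magnitude as the matching standardized residual, and the gap is widest at the ends of the x range.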

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residual, except that the residual values are obtained by computing the regression equation without using the data point in question.

To include Studentized deleted residuals in the report, make sure this check box is selected. Clear the check box if you do not want to include Studentized deleted residuals.

SigmaPlot can automatically flag data points with "large" values of the Studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To include only the flagged standardized and Studentized deleted residuals in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all standardized and Studentized residuals in the report.

Options for Forward Stepwise Regression: More Statistics

Select the More Statistics tab in the options dialog to view the confidence interval options. You can set the confidence interval for the population, regression, or both, and then save them to the worksheet.

Confidence Interval for the Population. The confidence interval for the population gives the range of values that define the region that contains the population from which the observations were drawn. To include confidence intervals for the population in the report, make sure the Population check box is selected. Clear the check box if you do not want to include the confidence intervals for the population in the report.

Confidence Interval for the Regression. The confidence interval for the regression line gives the range of values that defines the region containing the true mean relationship between the dependent and independent variables, with the specified level of confidence. To include confidence intervals for the regression in the report, make sure the Regression check box is selected, then specify a confidence level by entering a value in the percentage box. The confidence level can be any value from 1 to 99. The suggested confidence level is 95%.

Clear the selected check box if you do not want to include the confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the worksheet, select the column number of the first column you want to save the intervals to from the Starting in Column drop-down list. The selected intervals are saved to the worksheet starting with the specified column and continuing with successive columns in the worksheet.

PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the regression equation fits the data. Leave this check box selected to evaluate the fit of the equation using the PRESS statistic. Clear the selected check box if you do not want to include the PRESS statistic in the report.

Standardized Coefficients. These are the coefficients of the regression equation standardized to dimensionless values,

$$\beta_i = b_i \, \frac{s_{x_i}}{s_y}$$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and $s_y$ = standard deviation of dependent variable $y$.

To include the standardized coefficients in the report, select Standardized Coefficients. Clear the check box if you do not want to include the standardized coefficients in the report.

Options for Forward Stepwise Regression: Other Diagnostics

Select the Other Diagnostics tab in the options dialog box to view the Influence, Variance Inflation Factor, and Power options. If Other Diagnostics is hidden, click the right pointing arrow to the right of the tabs to move it into view. Use the left pointing arrow to move the other tabs back into view.

Influence. Influence options automatically detect instances of influential data points. Most influential points are data points which are outliers, that is, they do not "line up" with the rest of the data points. These points can have a potentially disproportionately strong influence on the calculation of the regression line. You can use several influence tests to identify and quantify influential points.

DFFITS. DFFITS$_i$ is the number of estimated standard errors that the predicted value changes for the $i$th data point when it is removed from the data set. It is another measure of the influence of a data point on the prediction used to compute the regression coefficients. Predicted values that change by more than two standard errors when the data point is removed are considered to be influential.

Select the DFFITS check box to compute this value for all points and flag influential points, i.e., those with DFFITS greater than the value specified in the Flag Values > edit box. The suggested value is 2.0 standard errors, which indicates that the point has a strong influence on the data. To avoid flagging more influential points, increase this value; to flag less influential points, decrease this value.

Leverage. Leverage is used to identify the potential influence of a point on the results of the regression equation. Leverage depends only on the value of the independent variable(s). Observations with high leverage tend to be at the extremes of the independent variables, where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

The expected leverage of a data point is

$$\frac{k + 1}{n}$$

where there are $k$ independent variables and $n$ data points. Observations with leverages much higher than the expected leverages are potentially influential points.

Select the Leverage check box to compute the leverage for each point and automatically flag potentially influential points, i.e., those points that could have leverages greater than the specified value times the expected leverage. The suggested value is 2.0 times the expected leverage for the regression. To avoid flagging more potentially influential points, increase this value; to flag points with less potential influence, lower this value.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. Cook’s distance assesses how much the values of the regression coefficients change if a point is deleted from the analysis. Cook’s distance depends on both the values of the independent and dependent variables.
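The leverage and Cook's distance diagnostics can be sketched in a few lines of Python for the case of a single independent variable (k = 1). This is an illustration only: the formulas are the standard textbook ones rather than anything quoted from SigmaPlot, the thresholds 2.0 (times the expected leverage) and 4.0 (Cook's distance) are the suggested flag values quoted in this chapter, and the function name and data are hypothetical.

```python
def leverage_and_cooks(x, y):
    """Leverage and Cook's distance for the simple regression y = b0 + b1*x."""
    n = len(x)
    p = 2  # fitted parameters: k + 1 with k = 1 independent variable
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)

    # Leverage depends only on x: distance from the mean, scaled by the spread.
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    # Cook's distance combines the residual size with the leverage.
    cooks = [(e * e) / (p * mse) * h / (1.0 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Hypothetical data: the last point sits far out on the x axis.
x = [1, 2, 3, 4, 5, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
leverage, cooks = leverage_and_cooks(x, y)

expected_leverage = (1 + 1) / len(x)  # (k + 1) / n
high_leverage = [i for i, h in enumerate(leverage) if h > 2.0 * expected_leverage]
high_cooks = [i for i, d in enumerate(cooks) if d > 4.0]
print(high_leverage, high_cooks)
```

The leverages always add up to the number of fitted parameters, which is why the expected leverage of a single point is (k + 1)/n.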

Select the Cook’s Distance check box to compute this value for all points and flag influential points, i.e., those with a Cook’s distance greater than the specified value. The suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To avoid flagging more influential points, increase this value; to flag less influential points, lower this value.

Report Flagged Values Only. To include only the influential points flagged by the influential point tests in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all influential points in the report.

Variance Inflation Factor

The Variance Inflation Factor option measures the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit.

Regression procedures assume that the independent variables are statistically independent of each other, i.e., that the value of one independent variable does not affect the value of another. However, this ideal situation rarely occurs in the real world. When the independent variables are correlated, or contain redundant information, the estimates of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique contribution of each independent variable to predicting the dependent variable. When the independent variables are correlated, they contain some common information and "contaminate" the estimates of the parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity. Structural multicollinearity occurs when the regression equation contains several independent variables which are functions of each other. The most common form of structural multicollinearity occurs when a polynomial regression equation contains several powers of the independent variable. Because these powers (e.g., $x$, $x^2$, etc.) are correlated with each other, structural multicollinearity occurs. Including interaction terms in a regression equation can also result in structural multicollinearity.

Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample observations are collected in such a way that the independent variables are correlated (for example, if age, height, and weight are collected on children of varying ages, each variable has a correlation with the others).
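The structural case can be illustrated with a short Python sketch that computes the variance inflation factor for a model holding $x$ and $x^2$. It assumes the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on the others; with only two independent variables, R² is simply their squared correlation. The sketch also shows the effect of centering $x$ before squaring, a common remedy for this kind of multicollinearity. Function names and data are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equal-length columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((v - mean_a) ** 2 for v in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF shared by both columns when they are the only two independent variables."""
    r = correlation(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor and its square.
x = [float(v) for v in range(1, 11)]
x_squared = [v * v for v in x]
vif_raw = vif_two_predictors(x, x_squared)

# Center x first, then form the power term.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
x_centered_squared = [v * v for v in x_centered]
vif_centered = vif_two_predictors(x_centered, x_centered_squared)

print(vif_raw, vif_centered)
```

For these evenly spaced values the raw powers are almost collinear, while the centered terms are uncorrelated, so the second factor drops to its minimum of 1.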

SigmaPlot can automatically detect multicollinear independent variables using the variance inflation factor. Click the Other Diagnostics tab in the Options dialog to view the Variance Inflation Factor option.

Flagging Multicollinear Data. Use the value in the Flag Values > edit box as a threshold for multicollinear variables. The default threshold value is 4.0, meaning that any value greater than 4.0 will be flagged as multicollinear. To make this test more sensitive to possible multicollinearity, decrease this value. To allow greater correlation of the independent variables before flagging the data as multicollinear, increase this value.

When the variance inflation factor is large, there are redundant variables in the regression model, and the parameter estimates may not be reliable. Variance inflation factor values above 4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.

What to Do About Multicollinearity. Sample-based multicollinearity can sometimes be resolved by collecting more data under other conditions to break up the correlation among the independent variables. If this is not possible, the regression equation is over parameterized and one or more of the independent variables must be dropped to eliminate the multicollinearity.

Structural multicollinearities can be resolved by centering the independent variable before forming the power or interaction terms.

For descriptions of how to handle multicollinearity, you can reference an appropriate statistics reference.

Report Flagged Values Only. To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all influential points in the report.

Power

Select the Other Diagnostics tab in the options dialog to view the Power options. If Other Diagnostics is hidden, click the right pointing arrow to the right of the tabs to move it into view. Use the left pointing arrow to move the other tabs back into view.

The power of a regression is the power to detect the observed relationship in the data. The alpha ( α ) is the acceptable probability of incorrectly concluding there is a relationship.

Check the Power check box to compute the power for the stepwise linear regression data. Change the alpha value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.

Setting Backward Stepwise Regression Options

Use the Backward Stepwise Regression options to:
   Specify which independent variables are entered into, replaced in, or removed from the regression equation during forward or backward stepwise regression.
   Set the number of steps permitted before the stepwise algorithm stops.
   Set assumption checking options.
   Specify the residuals to display and save them to the worksheet.
   Set confidence interval options.
   Display the PRESS statistic error.
   Display standardized regression coefficients.
   Display the power of the regression.

To change the Backward Stepwise Regression options:
1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.
2. Select Backward Stepwise Regression from the drop-down list in the Standard toolbar.
3. From the menus select:
   Statistics > Current Test Options

The Options for Backward Stepwise Regression dialog box appears with five tabs: Criterion, Assumption Checking, Residuals, More Statistics, and Other Diagnostics.
   Click the Criterion tab to return to the F-to-Enter, F-to-Remove, and Number of Steps options. For more information, see “Options for Backward Stepwise Regression: Criterion” on page 402.
   Click the Assumption Checking tab to view the Normality, Constant Variance, and Durbin-Watson options. For more information, see “Options for Backward Stepwise Regression: Assumption Checking” on page 404.
   Click the Residuals tab to view the residual options. For more information, see “Options for Backward Stepwise Regression: Residuals” on page 405.
   Click the More Statistics tab to view the confidence intervals, PRESS Prediction Error, and Standardized Coefficients options. For more information, see “Options for Backward Stepwise Regression: More Statistics” on page 407.
   Click the Other Diagnostics tab to view the Influence, Variance Inflation Factor, and Power options. For more information, see “Options for Backward Stepwise Regression: Other Diagnostics” on page 408.
4. To continue the test, click Run Test. For more information, see “Running a Stepwise Regression” on page 412.
5. To accept the current settings and close the dialog box, click OK.

Options settings are saved between SigmaPlot sessions.

Options for Backward Stepwise Regression: Criterion

Select the Criterion tab from the options dialog box to view the F-to-Enter, F-to-Remove, and Number of Steps options. Use these options to specify the independent variables that are entered into, replaced, or removed from the regression equation during the stepwise regression, and to specify when the stepwise algorithm stops.

F-to-Enter Value. The F-to-Enter value controls which independent variables are entered into the regression equation during forward stepwise regression or replaced after each step during backward stepwise regression. The F-to-Enter value is the minimum incremental F value associated with an independent variable before it can be entered into the regression equation. All independent variables producing incremental F values above the F-to-Enter value are added to the model.

The suggested F-to-Enter value is 4.0. Increasing F-to-Enter requires a potential independent variable to have a greater effect on the ability of the regression equation to predict the dependent variable before it is accepted, but may stop too soon and exclude important variables. Reducing the F-to-Enter value makes it easier to add a variable, because it relaxes the importance of a variable required before it is accepted, but may produce redundant variables and result in multicollinearity.

Note: If you are performing backward stepwise regression and you want any variable that has been removed to remain deleted, increase the F-to-Enter value to a large number, e.g., 100000.

Note: The F-to-Enter value should always be greater than or equal to the F-to-Remove value, to avoid cycling variables in and out of the regression model.

F-to-Remove Value. The F-to-Remove value controls which independent variables are deleted from the regression equation during backward stepwise regression, or removed after each step in forward stepwise regression. The F-to-Remove is the maximum incremental F value associated with an independent variable before it can be removed from the regression equation. All independent variables producing incremental F values below the F-to-Remove value are deleted from the model.

The suggested F-to-Remove value is 3.9. Reducing the F-to-Remove value makes it easier to retain a variable in the regression equation because variables that have smaller effects on the ability of the regression equation to predict the dependent variable are still accepted. However, the regression may still contain redundant variables, resulting in multicollinearity. Increasing the F-to-Remove value makes it easier to delete variables from the equation, as variables that contain more predictive value can be removed. Important variables may also be deleted, however.

Note: If you are performing forward stepwise regression and you want any variable that has been entered to remain in the equation, set the F-to-Remove value to zero.

Note: The F-to-Remove value should always be less than or equal to the F-to-Enter value, to avoid cycling variables in and out of the regression model.

Number of Steps. Use this option to set the maximum number of steps permitted before the stepwise algorithm stops. Note that if the algorithm stops because it ran out of steps,

the results are probably not reliable. The suggested number of steps is 20 added or deleted independent variables.

Options for Backward Stepwise Regression: Assumption Checking

Select the Assumption Checking tab from the options dialog box to view the Normality, Constant Variance, and Durbin-Watson options. These options test your data for its suitability for regression analysis by checking three assumptions that a Stepwise Linear Regression makes about the data. A Stepwise Linear Regression assumes:
   That the source population is normally distributed about the regression.
   That the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s).
   That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable these options if you are certain that the data was sampled from normal populations with constant variance and that the residuals are independent of each other.

Normality Testing. SigmaPlot uses the Kolmogorov-Smirnov test to test for a normally distributed population.

Constant Variance Testing. SigmaPlot tests for constant variance by computing the Spearman rank correlation between the absolute values of the residuals and the observed value of the dependent variable. When this correlation is significant, the constant variance assumption may be violated, and you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming one or more of the independent variables to stabilize the variance.

P Values for Normality and Constant Variance. The P value determines the probability of being incorrect in concluding that the data is not normally distributed (P value is the risk of falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaPlot is 0.05.

Larger values of P (for example, 0.10) require less evidence to conclude that the residuals are not normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.01 for the normality test requires greater deviations from normality to flag the data as non-normal than a value of 0.05.

Note: Although the assumption tests are robust in detecting data from populations that are non-normal or with non-constant variances, there are extreme conditions of data distribution that these tests cannot detect. However, these conditions should be easily detected by visually examining the data without resorting to the automatic assumption tests.

Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals for their independence of each other. The Durbin-Watson statistic is a measure of serial correlation between the residuals. The residuals are often correlated when the independent variable is time, and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 more than the entered value, SigmaPlot warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable difference from 2.0.

To relax the requirement of independence, increase the acceptable difference from 2.0.

Options for Backward Stepwise Regression: Residuals

Select the Residuals tab in the options dialog box to view the Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.
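The Durbin-Watson check described under the Assumption Checking options can be sketched in a few lines. This is only an illustration, not SigmaPlot's implementation: it assumes the standard definition of the statistic (the sum of squared differences between successive residuals divided by the sum of squared residuals), and the function names and sample residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation (e.g. time) order."""
    squared_steps = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return squared_steps / sum(e ** 2 for e in residuals)


def residuals_look_correlated(residuals, difference_from_2=0.50):
    """Apply the Difference from 2.0 rule: warn when |DW - 2| exceeds the setting."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2


# Hypothetical residuals, invented for illustration.
drifting = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]             # neighbours resemble each other
balanced = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]  # DW lands exactly on 1.5

print(durbin_watson(drifting), residuals_look_correlated(drifting))
print(durbin_watson(balanced), residuals_look_correlated(balanced))
```

With the suggested setting of 0.50, the drifting residuals are flagged, while a statistic of exactly 1.5 is not, because the rule only warns when the deviation is more than the entered value.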

Standardized Residuals. and is a measure of variability around the regression line. If you select none from the drop-down list and the Raw check box is selected. The standardized residual is the residual divided by the standard error of the estimate.. Studentized residuals scale the standardized residuals by taking into account the greater precision of the regression line near the middle of the data versus the extremes. The raw residuals are the differences between the predicted and observed values of the dependent variables. . The Studentized residuals tend to be distributed according to the Student t distribution. Click the selected check box if you do not want to include raw residuals in the worksheet. Raw Residuals. outlying data points. To assign the raw residuals to a worksheet column. To include standardized residuals in the report. To include Studentized residuals in the report.. To include raw residuals in the report. The standard error of the residuals is essentially the standard deviation of the residuals. Select this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s). SigmaPlot automatically flags data points lying outside of the confidence interval specified in the corresponding box. SigmaPlot automatically flags data points with "large" values of the Studentized residuals. make sure this check box is selected. select the number of the desired column from the corresponding drop-down list. the values appear in the report but are not assigned to the worksheet.e. make sure this check box is selected. select the worksheet column you want to save the predicted values to from the corresponding drop-down list. the values appear in the report but are not assigned to the worksheet. Click the selected check box if you do not want to include raw residuals in the worksheet. then save the results to the data worksheet. If you select none and the Predicted Values check box is selected. i. 
so the t distribution can be used to define "large" values of the Studentized residuals.406 Chapter 8 Predicted Values. You can change which data points are flagged by editing the value in the Flag Values > edit box. To assign predicted values to a worksheet column. These data points are considered to have "large" standardized residuals. make sure this check box is selected. i. Studentized Residuals. Click the selected check box if you do not want to include Studentized residuals in the worksheet.e. the suggested data points flagged lie outside the 95% confidence interval for the regression population. outlying data points.
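The three residual types described above can be compared side by side for a small data set. The sketch below is an illustration under stated assumptions rather than SigmaPlot's own computation: it fits a single independent variable by least squares, takes the raw residual as observed minus predicted (a sign convention assumed here), and uses the usual leverage-based scaling for the Studentized residual. The function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def residual_table(x, y):
    """Return (predicted, raw, standardized, studentized) lists."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    raw = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted
    # Standard error of the estimate: n - 2 degrees of freedom for a line.
    se_estimate = math.sqrt(sum(e ** 2 for e in raw) / (n - 2))
    standardized = [e / se_estimate for e in raw]
    # Leverage is smallest near the mean of x, where the line is most precise.
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    studentized = [e / (se_estimate * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]
    return predicted, raw, standardized, studentized


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted, raw, standardized, studentized = residual_table(x, y)
for row in zip(x, predicted, raw, standardized, studentized):
    print(" ".join(f"{value:8.3f}" for value in row))
```

The Studentized value is always at least as large in magnitude as the standardized one, and the gap is widest at the extremes of x, where the regression line is least precise.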

Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized residuals, except that the residual values are obtained by computing the regression equation without using the data point in question.

To include Studentized deleted residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include Studentized deleted residuals in the worksheet.

SigmaPlot can automatically flag data points with "large" values of the Studentized deleted residual, i.e., outlying data points; the suggested data points flagged lie outside the 95% confidence interval for the regression population.

Note: Both Studentized and Studentized deleted residuals use the same confidence interval setting to determine outlying points.

Report Flagged Values Only. To include only the flagged standardized and Studentized deleted residuals in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all standardized and Studentized residuals in the report.

Options for Backward Stepwise Regression: More Statistics

Select the More Statistics tab in the options dialog to view the confidence interval options. You can set the confidence interval for the population, regression, or both, and then save them to the worksheet.

Confidence Interval for the Population. The confidence interval for the population gives the range of values that define the region that contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make sure the Population check box is selected. Uncheck the selected check box if you do not want to include the confidence intervals for the population in the report.

Confidence Interval for the Regression. The confidence interval for the regression line gives the range of values that defines the region containing the true mean relationship between the dependent and independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make sure the Regression check box is selected, then specify a confidence level by entering a value in the percentage box. The confidence level can be any value from 1 to 99. The suggested confidence level is 95%. Uncheck the selected check box if you do not want to include the confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the worksheet, select the column number of the first column you want to save the intervals to from the Starting in Column drop-down list. The selected intervals are saved to the worksheet starting with the specified column and continuing with successive columns in the worksheet.

PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the regression equation fits the data. Leave this check box selected to evaluate the fit of the equation using the PRESS statistic. Clear the selected check box if you do not want to include the PRESS statistic in the report.

Standardized Coefficients. These are the coefficients of the regression equation standardized to dimensionless values,

$\beta_i = b_i \, \dfrac{s_{x_i}}{s_y}$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and $s_y$ = standard deviation of dependent variable $y$.

To include the standardized coefficients in the report, select Standardized Coefficients. Clear the check box if you do not want to include the standardized coefficients in the worksheet.

Options for Backward Stepwise Regression: Other Diagnostics

Select the Other Diagnostics tab in the options dialog box to view the Influence, Variance Inflation Factor and Power options. If Other Diagnostic is hidden, click the right pointing arrow to the right of the tabs to move it into view. Use the left pointing arrow to move the other tabs back into view.

Influence. Influence options automatically detect instances of influential data points. Most influential points are data points which are outliers, that is, they do not "line up" with the rest of the data points. These points can have a potentially disproportionately strong influence on the calculation of the regression line. You can use several influence tests to identify and quantify influential points.

DFFITS. DFFITSi is the number of estimated standard errors that the predicted value changes for the ith data point when it is removed from the data set. It is another

measure of the influence of a data point on the prediction used to compute the regression coefficients.

Predicted values that change by more than two standard errors when the data point is removed are considered to be influential.

Select the DFFITS check box to compute this value for all points and flag influential points, i.e., those with DFFITS greater than the value specified in the Flag Values > edit box. The suggested value is 2.0 standard errors, which indicates that the point has a strong influence on the data. To avoid flagging more influential points, increase this value; to flag less influential points, decrease this value.

Leverage. Leverage is used to identify the potential influence of a point on the results of the regression equation. Leverage depends only on the value of the independent variable(s). Observations with high leverage tend to be at the extremes of the independent variables, where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

The expected leverage of a data point is $(k + 1)/n$, where there are k independent variables and n data points. Observations with leverages much higher than the expected leverages are potentially influential points.

Select the Leverage check box to compute the leverage for each point and automatically flag potentially influential points, i.e., those points that could have leverages greater than the specified value times the expected leverage. The suggested value is 2.0 times the expected leverage for the regression. To avoid flagging more potentially influential points, increase this value; to flag points with less potential influence, lower this value.

Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. Cook’s distance assesses how much the values of the regression coefficients change if a point is deleted from the analysis. Cook’s distance depends on the values of both the independent and dependent variables.

Select the Cook’s Distance check box to compute this value for all points and flag influential points, i.e., those with a Cook’s distance greater than the specified value.
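The three influence measures can be computed together for a fit with one independent variable. The sketch below is illustrative only and does not reproduce SigmaPlot's code: it assumes the textbook formulas for leverage, Cook's distance and DFFITS, applies the suggested flag values from the text (2.0 times the expected leverage, 2.0 standard errors, and a Cook's distance of 4.0), and uses hypothetical names and data.

```python
import math


def influence_measures(x, y):
    """Leverage, Cook's distance and DFFITS for a one-variable linear fit."""
    n, p = len(x), 2  # p = k + 1 parameters (intercept and slope)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in raw)
    ms_res = ss_res / (n - p)

    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cooks = [e ** 2 / (p * ms_res) * h / (1 - h) ** 2 for e, h in zip(raw, leverage)]
    dffits = []
    for e, h in zip(raw, leverage):
        # Residual variance re-estimated with this point left out.
        ms_deleted = (ss_res - e ** 2 / (1 - h)) / (n - p - 1)
        t_deleted = e / math.sqrt(ms_deleted * (1 - h))
        dffits.append(t_deleted * math.sqrt(h / (1 - h)))
    return leverage, cooks, dffits


# Hypothetical data: the last point sits far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 8.0]
leverage, cooks, dffits = influence_measures(x, y)

expected_leverage = 2 / len(x)  # (k + 1) / n with k = 1
high_leverage = [i for i, h in enumerate(leverage) if h > 2.0 * expected_leverage]
high_cooks = [i for i, d in enumerate(cooks) if d > 4.0]
high_dffits = [i for i, d in enumerate(dffits) if abs(d) > 2.0]
print(high_leverage, high_cooks, high_dffits)
```

In this example the isolated point is flagged by all three tests, even though its raw residual is small: the line is pulled toward it, which is exactly the situation the influence options are meant to expose.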

The suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To avoid flagging more influential points, increase this value; to flag less influential points, lower this value.

Report Flagged Values Only. To include only the influential points flagged by the influential point tests in the report, make sure the Report Flagged Values Only check box is selected. Uncheck this option to include all influential points in the report.

Variance Inflation Factor

Click the Other Diagnostics tab in the Options dialog to view the Variance Inflation Factor option. The Variance Inflation Factor option measures the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit.

Regression procedures assume that the independent variables are statistically independent of each other, i.e., that the value of one independent variable does not affect the value of another. However, this ideal situation rarely occurs in the real world. When the independent variables are correlated, or contain redundant information, the estimates of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique contribution of each independent variable to predicting the dependent variable. When the independent variables are correlated, they contain some common information and "contaminate" the estimates of the parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity. Structural multicollinearity occurs when the regression equation contains several independent variables which are functions of each other. The most common form of structural multicollinearity occurs when a polynomial regression equation contains several powers of the independent variable. Because these powers (e.g., x, x², etc.) are correlated with each other, structural multicollinearity occurs. Including interaction terms in a regression equation can also result in structural multicollinearity.

Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample observations are collected in such a way that the independent variables are correlated (for example, if age, height, and weight are collected on children of varying ages, each variable has a correlation with the others).

SigmaPlot can automatically detect multicollinear independent variables using the variance inflation factor.
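Structural multicollinearity between x and x² is easy to demonstrate numerically. The sketch below is a simplified illustration, not SigmaPlot's procedure: it assumes a model with exactly two independent variables, in which case the variance inflation factor of each is 1 / (1 - r²), where r is their correlation. It also shows the effect of centering the variable before squaring it. The function names and data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """With exactly two predictors, both share the same VIF: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - pearson_r(a, b) ** 2)


# Hypothetical data: a polynomial model using x and x squared.
x = [float(v) for v in range(1, 11)]
raw_vif = vif_two_predictors(x, [v ** 2 for v in x])

# Center x before forming the power term.
mean_x = sum(x) / len(x)
centered = [v - mean_x for v in x]
centered_vif = vif_two_predictors(centered, [v ** 2 for v in centered])

print(raw_vif, raw_vif > 4.0)            # exceeds the default flag value of 4.0
print(centered_vif, centered_vif > 4.0)
```

For these evenly spaced values the uncentered terms are flagged at the default threshold, while the centered terms are uncorrelated and give a variance inflation factor of 1.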

Flagging Multicollinear Data. Use the value in the Flag Values > edit box as a threshold for multicollinear variables. The default threshold value is 4.0, meaning that any value greater than 4.0 will be flagged as multicollinear. To make this test more sensitive to possible multicollinearity, decrease this value. To allow greater correlation of the independent variables before flagging the data as multicollinear, increase this value.

When the variance inflation factor is large, there are redundant variables in the regression model, and the parameter estimates may not be reliable. Variance inflation factor values above 4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.

What to Do About Multicollinearity. Sample-based multicollinearity can sometimes be resolved by collecting more data under other conditions to break up the correlation among the independent variables. If this is not possible, the regression equation is over parameterized and one or more of the independent variables must be dropped to eliminate the multicollinearity.

Structural multicollinearities can be resolved by centering the independent variable before forming the power or interaction terms.

For descriptions of how to handle multicollinearity, you can reference an appropriate statistics reference.

Report Flagged Values Only. To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all influential points in the report.

Power

Select the Other Diagnostics tab in the options dialog to view the Power options. If Other Diagnostic is hidden, click the right pointing arrow to the right of the tabs to move it into view. Use the left pointing arrow to move the other tabs back into view.

The power of a regression is the power to detect the observed relationship in the data. The alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.

Check the Power check box to compute the power for the stepwise linear regression data. Change the alpha value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.

Running a Stepwise Regression

To run a Stepwise Regression you need to select the data to test. You use the Pick Columns dialog box to select the worksheet columns with the data you want to test.

To run a Stepwise Regression:

1. If you want to select your data before you run the regression, drag the pointer over your data.

2. Select Stepwise Regression from the drop-down list on the Standard toolbar or from the menus select:

Statistics Regression Stepwise Forward

or

Statistics Regression Stepwise Backward

The Pick Columns for Forward Stepwise Regression or Pick Columns for Backward Stepwise Regression dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog prompts you to pick your data.

3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Dependent and Independent drop-down list.

The first selected column is assigned to the Dependent Variable row in the Selected Columns list, and the second column is assigned to the Independent Variable row. The

title of selected columns appears in each row. You are only prompted for one dependent and one independent variable column.

4. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

5. Click Finish to run the regression. If you elected to test for normality, constant variance, and/or independent residuals, SigmaPlot performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail any of these tests, SigmaPlot warns you. When the test is complete, the report appears displaying the results of the Stepwise Regression.

Note: Worksheet results can only be obtained using order only stepwise regression.

If you are performing a regression using one order only, and selected to place predicted values, residuals, and/or other test results in the worksheet, they are placed in the specified data columns and are labeled by content and source column.

Interpreting Stepwise Regression Results

The report for both Forward and Backward Stepwise Regression displays the variables that were entered or removed for that step, the regression coefficients, an ANOVA table, and information about the variables in and not in the model.

Regression diagnostics, confidence intervals, and predicted values are listed for the final regression model if these options were selected in the Options for Forward or Backward Regression dialog box. For more information, see “Setting Forward Stepwise Regression Options” on page 390.

For descriptions of the computations of these results, you can reference an appropriate statistics reference.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

What to Do About Influential Points

Influential points have two possible causes:

There is something wrong with the data point, caused by an error in observation or data entry.

The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If you do not know the correct value, you may be able to justify deleting the data point. If the model appears to be incorrect, try regression with different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference an appropriate statistics reference.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

The report identifies the worksheet column used as the dependent variable in the regression computation.

F-to-Enter, F-to-Remove

These are the F values specified in the Options for Stepwise Regression dialog boxes.

F-to-Enter. The F-to-Enter value controls which independent variables are entered into the regression equation during forward stepwise regression, or replaced after each step during backwards stepwise regression. It is the minimum incremental F value associated with an independent variable before it can be entered into the regression equation. All independent variables with incremental F values above the F-to-Enter value are added to the model. The suggested F-to-Enter value is 4.0.

F-to-Remove. The F-to-Remove value controls which independent variables are deleted from the regression equation during Backwards Stepwise Regression, or removed after each step in Forward Stepwise Regression. It is the maximum incremental F value associated with an independent variable before it can be removed from the regression equation. All independent variables with incremental F values below the F-to-Remove value are deleted from the model. The suggested F-to-Remove value is 3.9.
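The two cutoffs can be read as a pair of simple threshold rules. The sketch below only illustrates how they partition the variables at one step; it is not SigmaPlot's stepping algorithm, it assumes the incremental F values have already been computed elsewhere, and the function name, variable names and F values are all hypothetical.

```python
def stepwise_decisions(f_in_model, f_not_in_model, f_to_enter=4.0, f_to_remove=3.9):
    """Return (variables to add, variables to delete) for one step.

    Both arguments map a variable name to its incremental F value.
    """
    if f_to_enter < f_to_remove:
        # A variable whose F lies between the two cutoffs would qualify for
        # both entry and removal, so it could cycle in and out of the model.
        raise ValueError("F-to-Enter should be >= F-to-Remove to avoid cycling")
    to_add = sorted(name for name, f in f_not_in_model.items() if f > f_to_enter)
    to_delete = sorted(name for name, f in f_in_model.items() if f < f_to_remove)
    return to_add, to_delete


# Hypothetical incremental F values, invented for illustration.
in_model = {"age": 12.4, "height": 2.1}
not_in_model = {"weight": 5.3, "dose": 0.8}
print(stepwise_decisions(in_model, not_in_model))
```

Keeping F-to-Enter at or above F-to-Remove leaves no F value that satisfies both rules at once, which is why the suggested values of 4.0 and 3.9 are ordered this way.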

Step

The step number, variable added or removed, R, R² and the adjusted R² for the equation, and standard error of the estimate are all listed under this heading.

R and R². R, the multiple correlation coefficient, and R², the coefficient of determination for Stepwise Regression, are both measures of how well the regression model describes the data. R values near 1 indicate that the equation is a good description of the relation between the independent and dependent variables.

R equals 0 when the values of the independent variables do not allow any prediction of the dependent variables, and equals 1 when you can perfectly predict the dependent variables from the independent variables.

Adjusted R². The adjusted R², $R^2_{adj}$, is also a measure of how well the regression model describes the data, but takes into account the number of independent variables, which reflects the degrees of freedom. Larger $R^2_{adj}$ values (nearer to 1) indicate that the equation is a good description of the relation between the independent and dependent variables.

Standard Error of the Estimate. The standard error of the estimate $s_{y|x}$ is a measure of the actual variability about the regression plane of the underlying population. The underlying population generally falls within about two standard errors of the observed sample. This statistic is displayed for the results of each step.

Analysis of Variance (ANOVA) Table

The ANOVA (analysis of variance) table lists the ANOVA statistics for the regression and the corresponding F value for each step.

SS (Sum of Squares). The sums of squares are measures of variability of the dependent variable.

The sum of squares due to regression measures the difference of the regression plane from the mean of the dependent variable.

The residual sum of squares is a measure of the size of the residuals, which are the differences between the observed values of the dependent variable and the values predicted by the regression model.

DF (Degrees of Freedom). Degrees of freedom represent the number of observations and variables in the regression equation.

The regression degrees of freedom is a measure of the number of independent variables.

The residual degrees of freedom is a measure of the number of observations less the number of terms in the equation.

MS (Mean Square). The mean square provides two estimates of the population variances. Comparing these variance estimates is the basis of analysis of variance.

The mean square regression is a measure of the variation of the regression from the mean of the dependent variable, or

$MS_{reg} = \dfrac{SS_{reg}}{DF_{reg}}$

The residual mean square is a measure of the variation of the residuals about the regression plane, or

$MS_{res} = \dfrac{SS_{res}}{DF_{res}}$

The residual mean square is also equal to $s_{y|x}^2$.

F Statistic. The F test statistic gauges the contribution of the independent variables in predicting the dependent variable. It is the ratio

$F = \dfrac{MS_{reg}}{MS_{res}}$

If F is a large number, you can conclude that the independent variables contribute to the prediction of the dependent variable (i.e., at least one of the coefficients is different from zero, and the "unexplained variability" is smaller than what is expected from random sampling variability of the dependent variable about its mean). If the F ratio is around 1, you can conclude that there is no association between the variables (i.e., the data is consistent with the null hypothesis that all the samples are just randomly distributed).

P Value. The P value is the probability of being wrong in concluding that there is an association between the dependent and independent variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that there is an association.
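The ANOVA quantities and the fit measures listed under Step follow directly from the sums of squares. The sketch below is an illustration under stated assumptions, not SigmaPlot's implementation: it assumes the predicted values come from a least-squares fit that includes a constant term, so that the regression and residual sums of squares add up to the total, and it uses the standard adjusted R² formula. The function name and data are hypothetical.

```python
import math


def anova_summary(y, predicted, k):
    """ANOVA quantities for a fit with k independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    ss_reg = sum((p - mean_y) ** 2 for p in predicted)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    df_reg, df_res = k, n - k - 1
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
    r_squared = ss_reg / (ss_reg + ss_res)
    return {
        "SS": (ss_reg, ss_res),
        "DF": (df_reg, df_res),
        "MS": (ms_reg, ms_res),
        "F": ms_reg / ms_res,
        "R": math.sqrt(r_squared),
        "R2": r_squared,
        "R2_adj": 1 - (1 - r_squared) * (n - 1) / df_res,
        "std_error_of_estimate": math.sqrt(ms_res),  # square root of MS residual
    }


# Hypothetical observations and the least-squares predictions y = 2.2 + 0.6 x
# for x = 1..5, invented for illustration.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = anova_summary(y, predicted, k=1)
for name, value in summary.items():
    print(name, value)
```

The adjusted R² is lower than R² here because the correction for degrees of freedom is large when there are only five observations.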

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Variables in Model

Information about the independent variables used in the regression equation for the current step is listed under this heading. The value of the variable coefficients, standard errors, the F-to-Remove, and the corresponding P value for the F-to-Remove are listed. These statistics are displayed for each step. An asterisk (*) indicates variables that were forced into the model.

Coefficients. The value for the constant and coefficients of the independent variables for the regression model are listed.

Standard Error. The standard errors are estimates of the uncertainty in the regression coefficients (analogous to the standard error of the mean). The true regression coefficients of the underlying population generally fall within about two standard errors of the observed sample coefficients. Large standard errors may indicate multicollinearity.

F-to-Enter. The F-to-Enter gauges the increase in predicting the dependent variable gained by adding the independent variable to the regression equation. It is the ratio of the mean square gained by adding the variable to the residual mean square of the equation that includes it.

If the F-to-Enter for a variable is larger than the F-to-Enter cutoff specified with the Stepwise Regression options, the variable remains in or is added back to the equation.

Note: The F-to-Remove value is the cutoff that determines if a variable is removed from or stays out of the equation.

P Value. P is the P value calculated for the F-to-Enter value. The P value is the probability of being wrong in concluding that adding the independent variable contributes to predicting the dependent variable (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F-to-Enter). The smaller the P value, the greater the probability that adding the variable contributes to the model.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Variables not in Model

The variables not entered or removed from the model are listed under this heading, along with their corresponding F-to-Remove and P values.

F-to-Remove. The F-to-Remove gauges the increase in predicting the dependent variable gained by removing the independent variable from the regression equation.

If the F-to-Remove for a variable is smaller than the F-to-Remove cutoff specified with the stepwise regression options, the variable is removed from or stays out of the equation.

Note: It is the F-to-Enter value that determines which variable is re-entered into or remains in the equation.

P Value. P is the P value calculated for the F-to-Remove value. The P value is the probability of being wrong in concluding that removing the independent variable contributes to predicting the dependent variable (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F-to-Remove). The smaller the P value, the greater the probability that removing the variable contributes to the model.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

PRESS Statistic

PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how well a regression model predicts new data. The smaller the PRESS statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the prediction errors (the differences between predicted and observed values) for each observation, with that point deleted from the computation of the regression equation.
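The leave-one-out computation behind PRESS can be written out directly. The sketch below is an illustration for a fit with a single independent variable, not SigmaPlot's code: each observation is removed in turn, the line is refitted to the remaining points, and the squared error in predicting the removed point is accumulated. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def press_statistic(x, y):
    """Sum of squared prediction errors, each from a fit that omits that point."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_rest, y_rest)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
press = press_statistic(x, y)
print(ss_residual, press)
```

PRESS is larger than the ordinary residual sum of squares because each point is predicted by a model that never saw it, which is what makes it a gauge of predictive ability rather than of fit.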

Durbin-Watson Statistic

The Durbin-Watson statistic is a measure of correlation between the residuals. If the residuals are not correlated, the Durbin-Watson statistic will be 2; the more this value differs from 2, the greater the likelihood that the residuals are correlated. This result appears if it was selected in the Options for Stepwise Regression dialog box.

Regression assumes that the residuals are independent of each other; the Durbin-Watson test is used to check this assumption. If the Durbin-Watson value deviates from 2 by more than the value set in the Options for Stepwise Regression dialog, a warning appears in the report. The suggested trigger value is a difference of more than 0.50, i.e., when the Durbin-Watson statistic is less than 1.5 or greater than 2.5.

Normality Test

The Normality test result displays whether the data passed or failed the test of the assumption that the source population is normally distributed around the regression, and the P value calculated by the test. All regression requires a source population to be normally distributed about the regression. When this assumption may be violated, a warning appears in the report. This result appears unless you disabled normality testing in the Options for Best Subset Regression dialog box.

Failure of the normality test can indicate the presence of outlying influential points or an incorrect regression model.

Constant Variance Test

The constant variance test result displays whether or not the data passed or failed the test of the assumption that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable, and the P value calculated by the test. When the constant variance assumption may be violated, a warning appears in the report.

If you receive this warning, you should consider trying a different model (i.e., one that more closely follows the pattern of the data), or transforming the independent variable to stabilize the variance and obtain more accurate estimates of the parameters in the regression equation.
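As a rough illustration of the Durbin-Watson check in this section, the sketch below uses the standard textbook definition of the statistic (the sum of squared differences between successive residuals divided by the sum of squared residuals). The manual does not print the formula, so treat this as an assumption about the computation. The function names and the 0.50 default trigger argument are hypothetical.

```python
def durbin_watson(residuals):
    """Textbook Durbin-Watson statistic for residuals in observation order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_warning(residuals, trigger=0.50):
    """True when the statistic differs from 2 by more than the trigger value."""
    return abs(durbin_watson(residuals) - 2.0) > trigger


# Illustrative residuals: a slow drift gives a statistic far below 2.
drifting = [1.0, 1.1, 1.2, 1.3]
print(durbin_watson(drifting), dw_warning(drifting))
```

With the suggested trigger of 0.50, the warning condition is the same as the statistic falling below 1.5 or above 2.5.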

Power

This result is displayed if you selected this option in the Options for Stepwise Regression dialog box. The power, or sensitivity, of a regression is the probability that the model correctly describes the relationship of the variables, if there is a relationship. Regression power is affected by the number of observations, the chance of erroneously reporting a difference α (alpha), and the slope of the regression.

Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that the model is correct. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no association when this hypothesis is true). The α value is set in the Power Options dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding the model is correct, but a greater possibility of concluding the model is bad when it is really correct (a Type II error). Larger values of α make it easier to conclude that the model is correct, but also increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics

The regression diagnostic results display only the values for the predicted and residual results selected in the Options for Stepwise Regression dialog. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals as outliers are set in the Options for Stepwise Regression dialog box. If you selected Report Cases with Outliers Only, only those observations that have one or more residuals flagged as outliers are reported; however, all other results for that observation are also displayed.

Predicted Values. This is the value for the dependent variable predicted by the regression model for each observation. If these values were saved to the worksheet, they may be used to plot the regression using SigmaPlot.

Residuals. These are the raw residuals, the difference between the predicted and observed values for the dependent variables.

Standardized Residuals. The standardized residual is the raw residual divided by the standard error of the estimate. If the residuals are normally distributed about the regression, about 68% of the standardized residuals have values between -1 and +1, and about 95% of the standardized residuals have values between -2 and +2. A larger standardized residual indicates that the point is far from the regression; the suggested value flagged as an outlier is 2.5.

Studentized Residuals. The Studentized residual is a standardized residual that also takes into account the greater confidence of the data points in the "middle" of the data set. By weighting the values of the residuals of the extreme data points (those with the lowest and highest independent variable values), the Studentized residual is more sensitive than the standardized residual in detecting outliers. This residual is also known as the internally Studentized residual, because the standard error of the estimate is computed using all data. Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

Studentized Deleted Residual. The Studentized deleted residual, or externally Studentized residual, is a Studentized residual which uses the standard error of the estimate $s_{y|x(-i)}$, that is, the standard error computed with the i-th data point deleted. The Studentized deleted residual is more sensitive than the Studentized residual in detecting outliers, since the Studentized deleted residual results in much larger values for outliers than the Studentized residual. Both Studentized and Studentized deleted residuals that lie outside a specified confidence interval for the regression are flagged as outlying points; the suggested confidence value is 95%.

Influence Diagnostics

The influence diagnostic results display only the values for the results selected in the Options dialog under the Other Diagnostics tab. All results that qualify as outlying values are flagged with a < symbol. The trigger values to flag data points as outliers are also set in the Options dialog under the Other Diagnostics tab. If you selected Report Cases with Outliers Only, only observations that have one or more observations flagged as outliers are reported; however, all other results for that observation are also displayed.

Cook's Distance. Cook's distance is a measure of how great an effect each point has on the estimates of the parameters in the regression equation. It is a measure of how much the values of the regression equation would change if that point is deleted from the analysis.
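The residual diagnostics and Cook's distance described above can be sketched for a regression with one independent variable. The formulas are the standard textbook ones (leverage from the independent variable, internally and externally Studentized residuals, Cook's distance). The manual does not print them, so this is an assumption about the computation and not SigmaPlot's own code. The function name, dictionary keys, and sample data are hypothetical.

```python
import math


def residual_diagnostics(xs, ys, outlier_trigger=2.5):
    """Per-observation diagnostics for the fit y = b0 + b1*x."""
    n = len(xs)
    p = 2  # parameters estimated: constant and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    raw = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in raw)
    s = math.sqrt(sse / (n - p))  # standard error of the estimate
    rows = []
    for x, e in zip(xs, raw):
        h = 1.0 / n + (x - mean_x) ** 2 / sxx  # leverage of this point
        standardized = e / s
        studentized = e / (s * math.sqrt(1.0 - h))  # internally Studentized
        # Standard error of the estimate with this point deleted.
        s_deleted = math.sqrt((sse - e * e / (1.0 - h)) / (n - p - 1))
        deleted = e / (s_deleted * math.sqrt(1.0 - h))  # externally Studentized
        cooks = studentized ** 2 * h / (p * (1.0 - h))
        rows.append({
            "raw": e,
            "leverage": h,
            "standardized": standardized,
            "studentized": studentized,
            "deleted": deleted,
            "cooks": cooks,
            "outlier": abs(standardized) > outlier_trigger,
        })
    return rows


# Illustrative data: the last observation lies well above the trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 26.0]
for row in residual_diagnostics(xs, ys):
    print({key: round(value, 3) for key, value in row.items()})
```

For the point with the largest Studentized residual, the Studentized deleted residual is larger still, which is why the deleted form is the more sensitive outlier detector.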

Cook's distance values above 1 indicate that a point is possibly influential. Cook's distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. Points with Cook's distances greater than the specified value are flagged as influential; the suggested value is 4.

Leverage. Leverage values identify potentially influential points. Observations with leverages a specified factor greater than the expected leverages are flagged as potentially influential points; the suggested value is 2.0 times the expected leverage. The expected leverage of a data point is $\frac{k + 1}{n}$, where there are k independent variables and n data points. Because leverage is calculated using only the independent variables, high leverage points tend to be at the extremes of the independent variables (large and small values), where small changes in the independent variables can have large effects on the predicted values of the dependent variable.

DFFITS. The DFFITS statistic is a measure of the influence of a data point on regression prediction. It is the number of estimated standard errors the predicted value for a data point changes when the observed value is removed from the data set before computing the regression coefficients. Predicted values that change by more than the specified number of standard errors when the data point is removed are flagged as influential; the suggested value is 2.0 standard errors.

Confidence Intervals

These results are displayed if you selected them in the Options for Stepwise Regression dialog. If the confidence interval does not include zero, you can conclude that the coefficient is different than zero with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is different than zero, and the confidence interval is 100(1 - α)%. The specified confidence level can be any value from 1 to 99; the suggested confidence level for both intervals is 95%.

Pred (Predicted Values). This is the value for the dependent variable predicted by the regression model for each observation.

Mean. The confidence interval for the regression gives the range of variable values computed for the region containing the true relationship between the dependent and independent variables, for the specified level of confidence.

Obs (Observations). The confidence interval for the population gives the range of variable values computed for the region containing the population from which the observations were drawn, for the specified level of confidence.

Stepwise Regression Report Graphs

You can generate up to five graphs using the results from a Stepwise Regression. They include a:

• Histogram of the residuals. For more information, see “Histogram of Residuals” on page 547.
• Scatter plot of the residuals. For more information, see “Scatter Plot of the Residuals” on page 545.
• Bar chart of the standardized residuals. For more information, see “Bar Chart of the Standardized Residuals” on page 546.
• Normal probability plot of residuals. For more information, see “Normal Probability Plot” on page 549.
• Line/scatter plot of the regression with confidence and prediction intervals. For more information, see “2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals” on page 550.
• 3D scatter plot of the residuals. For more information, see “3D Residual Scatter Plot” on page 551.

Creating Stepwise Regression Report Graphs

To generate a graph of Stepwise Regression report data:

1. With the report in view, from the menus select:
    Graph
    Create Result Graph
The Create Result Graph dialog box appears displaying the types of graphs available for the Stepwise Regression results.

2. Select the type of graph you want to create from the Graph Type list, then click OK. The specified graph appears in a graph window or in the report. For more information, see “Generating Report Graphs” on page 539.

Best Subsets Regression

Use Linear Best Subsets Regression when you:

• Need to predict a trend in the data, or predict the value of one variable from the values of one or more other variables, by fitting a line or plane (or hyperplane) through the data.
• Do not know which independent variables contribute to the prediction of the dependent variable, and you want to find the subsets of independent variables that best contribute to predicting the dependent variable.

The independent variable is the known, or predictor, variable. When the independent variable is varied, a corresponding value for the dependent, or response, variable is produced.

If you already know which independent variables to use, use Multiple Linear Regression. If you want to select the equation model by incrementally adding or deleting variables from the model, use Stepwise Regression. If the relationship is not a straight line or plane, use Polynomial or Nonlinear Regression.

About Best Subset Regression

Best Subsets Regression is a technique for selecting variables in a multiple linear regression by systematically searching through the different combinations of the independent variables and selecting the subsets of variables that best contribute to predicting the dependent variable.

Best Subset Regression assumes an association between the independent and dependent variables that fits the general equation for a multidimensional plane:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$$

where $y$ is the dependent variable, $x_1, x_2, x_3, \dots, x_k$ are the independent variables, and $b_0, b_1, b_2, \dots, b_k$ are the regression coefficients. As the values for $x_i$ vary, the corresponding value for $y$ either increases or decreases.

Best subsets regression searches for those combinations of the independent variables that give the “best” prediction of the dependent variable. There are several criteria for “best,” and the results depend on which criterion you select.

No predicted values, residuals, graphs, or other results are produced with a best subsets regression. To view results, note which independent variables were used for the desired model, then perform a multiple linear regression using only those independent variables.

"Best" Subsets Criteria

There are three statistics that can be used to evaluate which subsets of variables best contribute to predicting the dependent variable. These criteria are specified in the Options for Best Subset Regression dialog box. For a further discussion of these statistics, you can reference an appropriate statistics reference.

R Squared. $R^2$, the coefficient of determination for multiple regression, is a measure of how well the regression model describes the data. The larger the value of $R^2$, the better the model predicts the dependent variable. However, the number of variables used in the equation is not taken into account. Consequently, equations with more variables will always have higher $R^2$ values.

Adjusted R Squared. The adjusted $R^2$, $R^2_{adj}$, is a measure of how well the regression model describes the data based on $R^2$, but takes into account the number of independent variables.

Mallows $C_p$. $C_p$ is a gauge of the size of the bias introduced into the estimate of the dependent variable when independent variables are omitted from the regression equation, as computed from the number of parameters plus a measure of the difference between the predicted and true population means of the dependent variable. The optimal value is $C_p = p = k + 1$, where p is the number of parameters and k is the number of independent variables. The closer the value of $C_p$ is to the number of parameters, the less likely a relevant variable was omitted. Note that the fully specified model will always have $C_p = p$.

Performing a Best Subset Regression

To perform a Best Subset Regression:

1. Enter or arrange your data in the worksheet. For more information, see “Arranging Best Subset Regression Data” on page 426.
2. If desired, set the Best Subset Regression options. For more information, see “Setting Best Subset Regression Options” on page 426.
3. From the menus select:
    Statistics
    Regression
    Best Subsets
4. Run the test. For more information, see “Running a Best Subset Regression” on page 429.
5. View and interpret the Best Subset Regression report. For more information, see “Interpreting Best Subset Regression Results” on page 430.

Arranging Best Subset Regression Data

Place the data for the observed dependent variable in a single column and the corresponding data for the independent variables in one or more columns. Rows containing missing values are ignored, and the columns must be of equal length.

Setting Best Subset Regression Options

Use the Best Subset Regression options to:

• Specify the criterion to use to predict the dependent variable and the number of subsets used in the equation.
• Enable the variance inflation factor to identify potential difficulties with the regression parameter estimates (multicollinearity).

To change Best Subset Regression options:

1. If you are going to run the test after changing test options, and want to select your data before you run the test, drag the pointer over your data.

2. Select Best Subset Regression from the drop-down list in the Standard toolbar.
3. From the menus select:
    Statistics
    Current Test Options
The Options for Best Subset Regression dialog box appears with the Criterion tab in view. For more information, see “Options for Best Subset Regression: Criterion” on page 427.
4. To continue the test, click Run Test. For more information, see “Running a Best Subset Regression” on page 429.
5. To accept the current settings and close the dialog box, click OK.

Options settings are saved between SigmaPlot sessions.

Options for Best Subset Regression: Criterion

Use the Best Criterion option to select the criterion used to determine the best subsets and the Number of Subsets option to specify the number of subsets to list.

Best Criterion. Select the criterion to determine the best subsets from this drop-down list.

$R^2$. Select R Squared ($R^2$) from the Best Criterion drop-down list to use the largest coefficient of determination to find the best fitting subset. $R^2$ contains no information on the number of variables used, so subsets are listed for each number of possible variables (i.e., one independent variable, two variables, etc., up to all variables selected). The maximum number of subsets listed for each number of possible variables is equal to the Number of Subsets option.

Mallows $C_p$. Select Mallows C(p) from the Best Criterion drop-down list to use a gauge of the bias introduced when variables are omitted to quickly screen large numbers of potential variables and produce a few subsets that include only the relevant variables. The number of subsets listed is equal to the number set with the Number of Subsets option.

Adjusted $R^2$. Select Adjusted R Squared (Adjusted $R^2$) from the Best Criterion drop-down list to use the largest $R^2_{adj}$ values to select the best regressions. $R^2_{adj}$ takes into account the loss of degrees of freedom when additional independent variables are added to the regression equation. The number of subsets listed is equal to the number set with the Number of Subsets option.

Number of Subsets. Use this option to specify the number of most contributing variable groups to list by entering the desired value in the Number of Subsets edit box.

Variance Inflation Factor. Use the Variance Inflation Factor option to measure the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit. For more information, see “Flagging Multicollinear Data” below.

Regression procedures assume that the independent variables are statistically independent of each other, i.e., that the value of one independent variable does not affect the value of another. However, this ideal situation rarely occurs in the real world. When the independent variables are correlated, or contain redundant information, the estimates of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique contribution of each independent variable to predicting the dependent variable. When the independent variables are correlated, they contain some common information and "contaminate" the estimates of the parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity. Structural multicollinearity occurs when the regression equation contains several independent variables which are functions of each other. The most common form of structural multicollinearity occurs when a polynomial regression equation contains several powers of the independent variable. Because these powers (e.g., $x$, $x^2$, and so on) are correlated with each other, structural multicollinearity occurs.

Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample observations are collected in such a way that the independent variables are correlated (for example, if age, height, and weight are collected on children of varying ages, each variable has a correlation with the others).

Flagging Multicollinear Data. Use the value in the Flag Values > edit box as a threshold for multicollinear variables. The default threshold value is 4.0, meaning that any value greater than 4.0 will be flagged as multicollinear. To make this test more sensitive to possible multicollinearity, decrease this value. To allow greater correlation of the independent variables before flagging the data as multicollinear, increase this value.

When the variance inflation factor is large, there are redundant variables in the regression model, and the parameter estimates may not be reliable. Variance inflation factor values above 4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.

What to Do About Multicollinearity. Sample-based multicollinearity can sometimes be resolved by collecting more data under other conditions to break up the correlation among the independent variables. If this is not possible, the regression equation is over parameterized and one or more of the independent variables must be dropped to eliminate the multicollinearity. Structural multicollinearities can be resolved by centering the independent variable before forming the power or interaction terms. For descriptions of how to handle multicollinearity, you can reference an appropriate statistics reference.

Report Flagged Values Only. To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, make sure the Report Flagged Values Only check box is selected. Clear this option to include all influential points in the report.

Running a Best Subset Regression

To run a Best Subset Regression, you need to select the data to test. You use the Pick Columns dialog box to select the worksheet columns with the data you want to test.

To run a Best Subset Regression:

1. If you want to select your data before you run the regression, drag the pointer over your data.
2. From the menus select:
    Statistics
    Regression
    Best Subsets
The Pick Columns for Best Subset Regression dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog prompts you to pick your data.

3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Dependent and Independent drop-down list. The first selected column is assigned to the Dependent Variable row in the Selected Columns list, and the second column is assigned to the Independent Variable row. The title of selected columns appears in each row. You are only prompted for one dependent and one independent variable column.
4. To change your selections, select the assignment in the list, then select new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish to run the regression. The Best Subset Regression is performed. When the test is complete, the Best Subset regression report appears.

Note: No predicted values, residuals and other test results are computed or placed in the worksheet. To view results for models, note which independent variables were used for that model, then perform a Multiple Linear Regression using only those independent variables.

Interpreting Best Subset Regression Results

A Best Subsets Regression report lists a summary table of the "best" criteria statistics for all variable subsets, along with the error mean square and the specific member variables of the subset, and the criterion used to select the best subsets. Detailed results for each subset regression equation are then listed individually.

Note that the number of subsets listed is determined by the number of subsets selected in the Options for Best Subsets Regression dialog. If you used $R^2$, the maximum number of subsets reported for each number of variables included is the number set in the Best Subsets Regression Options dialog box. If you used $R^2_{adj}$ or $C_p$, the number of subset results reported is the number set in the Options for Best Subsets Regression dialog box.

Note: You cannot generate report graphs for Best Subsets Regression. To view a graph, perform a Multiple Linear Regression using the variables in the subset(s) of interest, and graph those results. For more information, see “Multiple Linear Regression” on page 325.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Tip: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Summary Table

Variables. The variables included in the subset are noted by asterisks (*) which appear below the variable symbols on the right side of the table.

Mallows $C_p$. $C_p$ is a gauge of the bias introduced into the estimate of the dependent variable when independent variables are omitted from the regression equation. The optimal value of $C_p$ is equal to the number of parameters (the independent variables used in the subset plus the constant), or $C_p = p = k + 1$, where p is the number of parameters and k is the number of independent variables. The closer the value of $C_p$ is to the number of parameters, the less likely a relevant variable was omitted. Subsets with low orders that also have $C_p$ values close to k + 1 are good candidates for the best subset of variables.

$R^2$. $R^2$, the coefficient of determination for multiple regression, is a measure of how well the regression model describes the data. The closer the value of $R^2$ to 1, the better the model predicts the dependent variable. However, because the number of variables used is not taken into account, higher order subsets will always have higher $R^2$ values, whether or not the additional variables really contribute to the prediction.

Adjusted $R^2$. The adjusted $R^2$, $R^2_{adj}$, is a measure of how well the regression model describes the data based on $R^2$, but takes into account the number of independent variables.
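To make the three summary-table criteria concrete, the sketch below fits every subset of a small set of independent variables and reports $R^2$, adjusted $R^2$, Mallows $C_p$, and the error mean square for each. It uses the standard textbook definitions, which the manual does not print, so the formulas are an assumption and not SigmaPlot's implementation. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(columns, y):
    """Residual sum of squares for y regressed on a constant plus the columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    b = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subsets(variables, y):
    """variables: dict of name -> values. Returns one result dict per subset."""
    n = len(y)
    names = list(variables)
    mean_y = sum(y) / n
    total_ss = sum((v - mean_y) ** 2 for v in y)
    full_p = len(names) + 1
    mse_full = residual_ss([variables[k] for k in names], y) / (n - full_p)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            p = size + 1  # parameters: subset variables plus the constant
            sse = residual_ss([variables[k] for k in subset], y)
            r2 = 1.0 - sse / total_ss
            results.append({
                "variables": subset,
                "r2": r2,
                "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
                "cp": sse / mse_full - n + 2 * p,
                "ms_err": sse / (n - p),
            })
    return results


# Illustrative data only.
variables = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "x3": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0],
}
y = [4.1, 5.4, 9.2, 10.3, 14.1, 15.6, 19.0, 20.4]
for result in best_subsets(variables, y):
    print(result["variables"], round(result["r2"], 4),
          round(result["adj_r2"], 4), round(result["cp"], 3))
```

In this sketch the subset containing all variables has $C_p$ equal to its number of parameters and has the highest $R^2$, which matches the behavior described for both criteria.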

Larger $R^2_{adj}$ values (nearer to 1) indicate that the equation is a good description of the relation between the independent and dependent variables.

Note that the subset that includes all variables always has $C_p = p$.

MSerr (Error Mean Square). The error mean square (residual, or within groups) is an estimate of the variability in the underlying population, computed from the random component of the observations.

Subsets Results

Tables of statistical results are listed for each regression equation identified in the summary table.

Residual Sum of Squares. The residual sum of squares is a measure of the size of the residuals, which are the differences between the observed values of the dependent variable and the values predicted by the regression model.

Coefficient. The value for the constant and coefficients of the independent variables for the regression model are listed.

Std Err (Standard Error). The standard errors are estimates of the uncertainty in these regression coefficients (analogous to the standard error of the mean). The true regression coefficients of the underlying population generally fall within about two standard errors of the observed sample coefficients. Large standard errors may indicate multicollinearity. These values are used to compute t for the regression coefficients.

t Statistic. The t statistic tests the null hypothesis that the coefficient of each independent variable is zero, that is, the independent variable does not contribute to predicting the dependent variable. t is the ratio of the regression coefficient to its standard error, or:

$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$

You can conclude from "large" t values that the independent variable(s) can be used to predict the dependent variable (i.e., that the coefficient is not zero).

P Value. P is the P value calculated for t. The P value is the probability of being wrong in concluding that there is a true association between the variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the independent variable helps predict the dependent variable. Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

VIF (Variance Inflation Factor). The variance inflation factor is a measure of multicollinearity. It measures the "inflation" of a regression parameter (coefficient) for an independent variable due to redundant information in other independent variables. If the variance inflation factor is at or near 1.0, there is no redundant information in the other independent variables. If the variance inflation factor is much larger, there are redundant variables in the regression model, and the parameter estimates may not be reliable. This result appears unless it was disabled in the Options for Best Subset Regression dialog box.

Pearson Product Moment Correlation

Use Pearson Product Moment Correlation when:

• You want to measure the strength of the association between pairs of variables without regard to which variable is dependent or independent.
• You want to determine if the relationship, if any, between the variables is a straight line.
• The residuals (distances of the data points from the regression line) are normally distributed with constant variance.

The Pearson Product Moment Correlation coefficient is the most commonly used correlation coefficient.

If you want to predict the value of one variable from another, use Simple or Multiple Linear Regression. If you need to find the correlation of data measured by rank or order, use the nonparametric Spearman Rank Order Correlation.

About the Pearson Product Moment Correlation Coefficient

When an assumption is made about the dependency of one variable on another, it affects the computation of the regression line. Reversing the assumption of the variable dependencies results in a different regression line.

The Pearson Product Moment Correlation coefficient does not require the variables to be assigned as independent and dependent. Instead, only the strength of association is measured.

Pearson Product Moment Correlation is a parametric test that assumes the residuals (distances of the data points from the regression line) are normally distributed with constant variance.

Computing the Pearson Product Moment Correlation Coefficient

To compute the Pearson Product Moment Correlation coefficient:

1. Enter or arrange your data appropriately in the data worksheet.
2. Select Pearson Correlation from the toolbar, then click the Run button, or from the menus select:
    Statistics
    Correlation
    Pearson Product Moment
3. Run the test by selecting the worksheet columns with the data you want to test using the Pick Columns dialog box.
4. View and interpret the Pearson Product Moment Report and generate report graph.

Arranging Pearson Product Moment Correlation Data

Place the data for each variable in a column. You must have at least two columns of variables, with a maximum of 64 columns. Observations containing missing values are ignored, including missing values created by columns of unequal length.

Running a Pearson Product Moment Correlation

To run a Pearson Product Moment test, you need to select the data to test. The Pick Columns dialog box is used to select the worksheet columns with the data you want to test.

To run a Pearson Product Moment Correlation:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select Pearson Product Moment from the drop-down list on the Standard toolbar or from the menus select:
    Statistics
    Correlation
    Pearson Product Moment
The Pick Columns for Pearson Product Moment dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog box prompts you to pick your data.
3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Variable drop-down list. The selected columns are assigned to the Variables row in the Selected Columns list in the order they are selected from the worksheet. The title of selected columns appears in each row. You can select up to 64 variable columns. SigmaPlot computes the correlation coefficient for every possible pair.
4. To change your selections, select the assignment in the list, then select new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish. The correlation coefficient is computed. When the test is complete, the Pearson Product Moment Correlation Coefficient report appears.

Interpreting Pearson Product Moment Correlation Results

The report for a Pearson Product Moment Correlation displays the correlation coefficient r, the P value for the correlation coefficient, and the number of data points used in the computation, for each pair of variables.

Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Note: The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the buttons in the formatting toolbar to move one page up and down in the report.

Correlation Coefficient

The correlation coefficient r quantifies the strength of the association between the variables. r varies between -1 and +1. A correlation coefficient near +1 indicates there is a strong positive relationship between the two variables, with both always increasing together. A correlation coefficient near -1 indicates there is a strong negative relationship between the two variables, with one always decreasing as the other increases. A correlation coefficient of 0 indicates no linear relationship between the two variables.

P Value

The P value is the probability of being wrong in concluding that there is a true association between the variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P value, the greater the probability that the variables are correlated. Traditionally, you can conclude that there is a true association between the variables when P < 0.05.
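As a sketch of how r is obtained for every pair of columns, the code below uses the standard product moment formula and drops any observation with a missing value in either variable of the pair. The function names, the use of `None` to mark a missing value, and the sample data are assumptions for illustration. The P value is not computed here because it requires the t distribution, which the Python standard library does not provide.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Return (r, n) using only observations present in both columns."""
    # zip() stops at the shorter column, so rows missing because of
    # unequal column lengths are dropped along with None entries.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


def correlation_table(columns):
    """columns: dict of name -> values. Returns {(a, b): (r, n)} for every pair."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Illustrative data only; None marks a missing value.
columns = {
    "height": [150.0, 160.0, 165.0, 172.0, 180.0],
    "weight": [52.0, 58.0, None, 70.0, 77.0],
    "age": [31.0, 25.0, 40.0, 28.0, 35.0],
}
for pair, (r, n) in correlation_table(columns).items():
    print(pair, round(r, 3), n)
```

The second value returned for each pair is the number of observations actually used, which can differ from pair to pair when values are missing.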

Number of Samples
This is the number of data points used to compute the correlation coefficient. This number reflects samples omitted because of missing values in one of the two variables used to compute each correlation coefficient.
Pearson Product Moment Correlation Report Graph
The Pearson Product Moment Correlation matrix is a series of scatter graphs that plot the associations between all possible combinations of variables. The first row of the matrix represents the first set of variables or the first column of data, the second row of the matrix represents the second set of variables or the second data column, and the third row of the matrix represents the third set of variables or third data column. The X and Y data for the graphs correspond to the column and row of the graph in the matrix. For example, the X data for the graphs in the first row of the matrix is taken from the second column of tested data, and the Y data is taken from the first column of tested data. The X data for the graphs in the second row of the matrix is taken from the first column of tested data, and the Y data is taken from the second column of tested data. The X data for the graphs in the third row of the matrix is taken from the second column of tested data, and the Y data is taken from the third column of tested data, etc. The number of graph rows in the matrix is equal to the number of data columns being tested.
Creating the Pearson Product Moment Report Graph
To generate a report graph of Pearson Product Moment report data:
1. With the Pearson Product Moment report in view, from the menus select:
Graph Create Result Graph
The Create Result Graph dialog box appears displaying a Scatter Matrix graph.
2. Click OK. The selected graph appears in a graph window.

Spearman Rank Order Correlation
Use Spearman Rank Order Correlation when:
You want to measure the strength of association between pairs of variables without specifying which variable is dependent or independent.
The residuals (distances of the data points from the regression line) are not normally distributed with constant variance.
If you want to assume that the value of one variable affects the other, use some form of regression. If you need to find the correlation of normally distributed data, use the parametric Pearson Product Moment Correlation.
About the Spearman Rank Order Correlation Coefficient
When an assumption is made about the dependency of one variable on another, it affects the computation of the regression line. Reversing the assumption of the variable dependencies results in a different regression line. The Spearman Rank Order Correlation coefficient does not require the variables to be assigned as independent and dependent. Instead, only the strength of association is measured.
The Spearman Rank Order Correlation coefficient is computed by ranking all values of each variable, then computing the Pearson Product Moment Correlation coefficient of the ranks. Spearman Rank Order Correlation is a nonparametric test that does not require the data points to be linearly related with a normal distribution about the regression line with constant variance.
Computing the Spearman Rank Order Correlation Coefficient
To compute the Spearman Rank Order Correlation coefficient:
1. Enter or arrange your data appropriately in the worksheet.

2. Select Spearman Correlation from the toolbar, then click Run, or from the menus select:
Statistics Correlation Spearman Rank Order

3. Run the test. 4. View and interpret the Spearman rank order correlation report and generate the report graph.

Arranging Spearman Rank Order Correlation Coefficient Data
Place the data for each variable in a column. You must have at least two columns of variables, with a maximum of 64 columns. Observations containing missing values are ignored. However, rank order correlations require columns of equal length.

Running a Spearman Rank Order Correlation
To run a Spearman Rank Order Correlation test, you need to select the data to test. The Pick Columns dialog box is used to select the worksheet columns with the data you want to test and to specify how your data is arranged in the worksheet.
To run a Spearman Rank Order Correlation:

1. If you want to select your data before you run the test, drag the pointer over your data.
2. Select Spearman Correlation from the drop-down list on the Standard toolbar and click the Run Test button, or from the menus select:
Statistics Correlation Spearman Correlation

The Pick Columns for Spearman Correlation dialog box appears. If you selected columns before you chose the test, the selected columns appear in the column list. If you have not selected columns, the dialog box prompts you to pick your data.

3. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Variable drop-down list. The selected columns are assigned to the Variables row in the Selected Columns list in the order they are selected from the worksheet. The title of selected columns appears in each row. You can select up to 64 variable columns. SigmaPlot computes the correlation coefficient for every possible pair.
4. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish. The correlation coefficient is computed. When the test is complete, the Spearman Rank Order Correlation Coefficient report appears.

Interpreting Spearman Rank Correlation Results
The report for a Spearman Rank Order Correlation displays the correlation coefficient r, the P value for the correlation coefficient, and the number of data points used in the computation, for each pair of variables.
Result Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Spearman Correlation Coefficient rs
The Spearman correlation coefficient rs quantifies the strength of the association between the variables. rs varies between -1 and +1. A correlation coefficient near +1 indicates there is a strong positive relationship between the two variables, with both always increasing together. A correlation coefficient near -1 indicates there is a strong negative relationship between the two variables, with one always decreasing as the other increases. A correlation coefficient of 0 indicates no relationship between the two variables.
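
The coefficient described above is the Pearson correlation of the ranks, and that relationship can be sketched outside SigmaPlot in a few lines of Python. This is only an illustration: the function names are hypothetical, and giving tied values the average of their ranks is an assumption, not a documented statement of how SigmaPlot handles ties.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman rs: rank each variable, then correlate the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only the ranks enter the calculation, any strictly increasing relationship, linear or not, gives rs = +1, and any strictly decreasing relationship gives rs = -1.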

P Value
The P value is the probability of being wrong in concluding that there is a true association between the variables (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P value, the greater the probability that the variables are correlated. Traditionally, you can conclude that there is a significant association between the two variables when P < 0.05.

Number of Samples
This is the number of data points used to compute the correlation coefficient. This number reflects samples omitted because of missing values in one of the two variables used to compute each correlation coefficient.
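
Because observations with a missing value in either variable are dropped pair by pair, the number of samples can differ from one pair of columns to the next. The sketch below illustrates that counting in Python. The function name is hypothetical, and representing a missing worksheet cell as None is an assumption made for the example.

```python
from itertools import combinations


def pairwise_sample_counts(columns):
    """Count, for every pair of columns, the rows where both values are present.

    columns maps a column title to a list of values; None marks a missing cell.
    """
    counts = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        counts[(name_a, name_b)] = sum(
            1 for u, v in zip(a, b) if u is not None and v is not None
        )
    return counts
```

For three columns this returns three counts, one for each pair that appears in the report.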

Spearman Rank Order Correlation Report Graph
The Spearman Rank Order Correlation report graph is a matrix of scatter graphs that plot the associations between all possible combinations of variables. The first row of the matrix represents the first set of variables or the first column of data, the second row of the matrix represents the second set of variables or the second data column, and the third row of the matrix represents the third set of variables or third data column. The X and Y data for the graphs correspond to the column and row of the graph in the matrix. For example, the X data for the graphs in the first row of the matrix is taken from the second column of tested data, and the Y data is taken from the first column of tested data. The X data for the graphs in the second row of the matrix is taken from the first column of tested data, and the Y data is taken from the second column of tested data. The X data for the graphs in the third row of the matrix is taken from the second column of tested data, and the Y data is taken from the third column of tested data, etc. The number of graph rows in the matrix is equal to the number of data columns being tested. For more information, see “Generating Report Graphs” on page 539.

Creating the Spearman Correlation Report Graph
To generate the graph of Spearman Correlation report data: 1. With the Spearman Correlation report in view, from the menus select:
Graph Create Result Graph

The Create Result Graph dialog box appears displaying a Scatter Matrix graph. 2. Click OK. The selected graph appears in a graph window.

Chapter 9
Survival Analysis

Survival analysis studies the variable that is the time to some event. The term survival originates from the event death. But the event need not be death; it can be the time to any event. This could be the time to closure of a vascular graft or the time when a mouse footpad swells from infection. Of course it need not be medical or biological. It could be the time a motor runs until it fails. For consistency we will use survival and death (or failure) here. Sometimes death doesn’t occur during the length of the study, or the patient dies from some other cause, or the patient relocates to another part of the country. Though a death did not occur, this information is useful since the patient survived up until the time he or she left the study. When this occurs, the patient is referred to as censored. This comes from the expression censored from observation - the data has been lost from view of the study. Examples of censored values are patients who moved to another geographic location before the study ended and patients who are alive when the study ended. Kaplan-Meier survival analysis includes both failures (death) and censored values.

Three Survival Tests
Use the Survival statistic to obtain one of the following three tests.
Single Group. Use this to analyze and graph one survival curve. For more information, see “Single Group Survival Analysis” on page 446.
LogRank. Use this to compare two or more survival curves. The LogRank test assumes that all survival time data is equally accurate and all data will be equally weighted in the analysis. For more information, see “LogRank Survival Analysis” on page 455.
Gehan-Breslow. Use this to compare two or more survival curves when you expect the early data to be more accurate than later data. Use this, for example, if there are many more censored values at the end of the study than at the beginning. For more information, see “Gehan-Breslow Survival Analysis” on page 470.
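
The LogRank comparison is described later in this chapter as a chi-square formed from the sum, across groups, of the squared difference between the actual and estimated number of events, divided by the estimated number. The Python sketch below illustrates that sum with every event time weighted equally. It is not SigmaPlot's implementation: the function name is hypothetical, and estimating the expected events from each group's share of the subjects at risk at every event time is the usual textbook approach, assumed here.

```python
def logrank_chi_square(groups):
    """Equal-weight chi-square: sum over groups of (O - E)^2 / E.

    groups maps a group name to a list of (time, is_event) pairs,
    where is_event is False for a censored observation.
    """
    event_times = sorted(
        {t for obs in groups.values() for t, is_event in obs if is_event}
    )
    observed = {g: sum(1 for _, e in obs if e) for g, obs in groups.items()}
    expected = {g: 0.0 for g in groups}

    for t in event_times:
        # Subjects still under observation at time t (censored at t still count).
        at_risk = {g: sum(1 for time, _ in obs if time >= t) for g, obs in groups.items()}
        total_at_risk = sum(at_risk.values())
        events_at_t = sum(
            1 for obs in groups.values() for time, e in obs if e and time == t
        )
        for g in groups:
            # Assumption: events are shared out in proportion to the number at risk.
            expected[g] += events_at_t * at_risk[g] / total_at_risk

    chi_square = sum(
        (observed[g] - expected[g]) ** 2 / expected[g]
        for g in groups
        if expected[g] > 0
    )
    return chi_square, observed, expected
```

Identical groups give a chi-square of zero. A Gehan-Breslow style comparison would give the earlier event times more weight, which this sketch does not do.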

Two Multiple Comparison Tests
If the LogRank or Gehan-Breslow statistic yields a significant difference in survival curves, then you have the option to use one of two multiple comparison procedures to determine exactly which pairs of curves are different. These are the Bonferroni and Holm-Sidak tests and are described for each test.
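
The two procedures differ in the critical value each pairwise P value is tested against. The Python sketch below uses the standard textbook definitions, which are assumed here and not confirmed as SigmaPlot's exact computation: Bonferroni divides the significance level equally among the comparisons, while Holm-Sidak tests the P values in ascending order against a critical value that relaxes at each step. The function names are hypothetical.

```python
def bonferroni_critical_values(alpha, n_comparisons):
    """Every pairwise P value is tested against the same value, alpha / m."""
    return [alpha / n_comparisons] * n_comparisons


def holm_sidak_critical_values(alpha, n_comparisons):
    """Critical values for the P values sorted from smallest to largest.

    The k-th smallest P value (k = 0 .. m-1) is tested against
    1 - (1 - alpha) ** (1 / (m - k)); testing stops at the first failure.
    """
    return [
        1 - (1 - alpha) ** (1 / (n_comparisons - k))
        for k in range(n_comparisons)
    ]
```

With three comparisons at a significance level of 0.05, Bonferroni uses about 0.0167 for every pair, while Holm-Sidak uses about 0.0170, 0.0253 and 0.05 in turn.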

Data Format for Survival Analysis
Survival data consists of three variables:
Survival time
Status
Group
The survival times are the times when the event occurred. They must be positive, and all non-positive values will be considered missing values. The data need not be sorted by survival time or group.
The status variable defines whether the data is a failure or censored value. You are allowed to use multiple names for both failure and censored. These can be text or numeric.
The group variable defines each individual survival data set (and curve).
Arrange data in the worksheet in either of two formats:
Raw data format. Column pairs of survival time and status value for each group. For more information, see “Raw Data” on page 445.
Indexed data format. Data indexed to a group column. For more information, see “Indexed Data” on page 445.

Raw Data
To enter the data in Raw data format, enter the survival time in one column and the corresponding status in a second column. Do this for each group. If you wish, you can identify each group with a column title in the survival time column. If you do this then these group titles will be used in the graph and report.
Figure 0-1 Raw Data Format for a Survival Analysis with Two Groups

In the graph above, columns 1 and 2 are the survival time and status values for the first group - Affected Node. Columns 3 and 4 are the same for the second group - Total Node. The report and the survival curve graph will use the text strings (“Affected Node,” “Total Node”) found in the survival time column titles. Note: The worksheet columns for each group must be the same length. If not, the cells in the longer length column will be considered missing. All non-positive survival times will also be considered missing. All status variable values not defined as either a failure or a censored value will be considered missing.

Indexed Data
Indexed data is a three-column format. The survival time and status variable in two columns are indexed on the group names in a third column. Informative column titles are not necessary but are useful when selecting columns in the wizard.

Figure 0-2 Indexed Data Format - a Three-Column Format Consisting of Group, Survival Time, and Status

In the example above, group is in column 1, survival time is in column 2 and the status variable is in column 3. Note: The Transforms menu Index and Unindex commands are not designed for converting between survival analysis data formats. To use these features you must index and unindex the survival time and status variables separately and then reorganize the resulting columns.
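
If you prepare the data outside SigmaPlot, the relationship between the two formats can be sketched in Python. This is an illustration only: the function name and the status labels in the usage note are hypothetical, and a missing cell is represented by None.

```python
def raw_to_indexed(raw_groups):
    """Convert Raw format (one time/status column pair per group) to Indexed rows.

    raw_groups maps a group title to a (times, statuses) pair of columns.
    Returns a list of (group, survival_time, status) rows.
    """
    rows = []
    for group, (times, statuses) in raw_groups.items():
        # zip stops at the shorter column, so extra cells in the longer
        # column are treated as missing.
        for time, status in zip(times, statuses):
            if time is None or status is None or time <= 0:
                continue  # non-positive or empty survival times are missing
            rows.append((group, time, status))
    return rows
```

For example, `raw_to_indexed({"Affected Node": ([5, 8], ["Died", "Censored"])})` returns two rows, each beginning with the group title.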

Single Group Survival Analysis
Single Group Survival Analysis analyzes the survival data from one group, and then creates a report and a graph with a single survival curve. There is no statistical test performed but statistics associated with the data, such as the median survival time, are calculated and presented in the report.
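
The survival curve for a single group can be illustrated with the Kaplan-Meier product-limit estimate. The Python sketch below is a minimal version under stated assumptions, not SigmaPlot's implementation: the function names are hypothetical, the number at risk is counted just before each event time, and the median is taken as the first event time at which the survival probability falls to 0.5 or below.

```python
def kaplan_meier(times, is_event):
    """Return a list of (event_time, events, at_risk, survival) rows.

    times holds the survival times; is_event is True for a failure and
    False for a censored observation.
    """
    observations = list(zip(times, is_event))
    table = []
    survival = 1.0
    for t in sorted({time for time, event in observations if event}):
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, event in observations if event and time == t)
        survival *= 1 - events / at_risk  # product-limit step
        table.append((t, events, at_risk, survival))
    return table


def median_survival_time(table):
    """First event time with survival <= 0.5, or None if never reached."""
    for t, _, _, survival in table:
        if survival <= 0.5:
            return t
    return None
```

Censored observations never produce a step of their own. They only reduce the number at risk at later event times, which is why they show up as jumps in that column.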

Performing a Single Group Survival Analysis
1. Enter or arrange your data in the worksheet. For more information, see “Arranging Single Group Survival Analysis Data” on page 447.

2. If desired, set the Single Group options. For more information, see “Setting Single Group Test Options” on page 448. 3. From the menus select:
Statistics Survival Single Group

4. Select the two worksheet columns with the survival times and status values in the Pick Columns dialog box. 5. Click Next and select the Event and Censored labels. You may select multiple labels for each. 6. Click Finish. 7. View single group survival graph. For more information, see “Single Group Survival Graph” on page 455. 8. Interpret the Single Group survival analysis report and curve. For more information, see “Interpreting Single Group Survival Results” on page 453.

Arranging Single Group Survival Analysis Data
Two data columns are required, a column with survival times and a column with status labels. These can be just two columns in a worksheet or two columns from a multigroup data set. You can select a single pair of columns from the multiple groups in the Raw data format. Note: Use this option to analyze all groups as a single group from an indexed format data set. For example, select the last two columns in the worksheet to analyze both groups as one group. You cannot do this directly with Raw data format since the groups are not concatenated in two columns. You would need to use the Stack transform in Transforms to concatenate the columns.

Setting Single Group Test Options
Use the Survival Curve Test Options to: Specify attributes of the generated survival curve graph. Customize the post-test contents of the report and worksheet.
To change the Survival Curve options:

1. If you are going to analyze your survival curve after changing test options, and want to select your data before you create the curve, then drag the pointer over your data.
2. Select Survival Single Group from the Standard toolbar drop-down list.
3. From the menus select:
Statistics Current Test Options

The Options for Survival Single Group dialog box appears with two tabs: Graph Options. Click the Graph Options tab to view the graph symbol, line and scaling options. You can select additional statistical graph elements here. For more information, see “Options for Survival Single Group: Graph Options” on page 449.
Figure 3-1 The Options for Survival Curve Dialog Displaying the Graph Options

Results. Click the Results tab to specify the survival time units and to modify the content of the report and worksheet. For more information, see “Options for Single Group Survival: Results” on page 450.
Figure 3-2 The Options for Survival Curve Dialog Displaying the Results Options

SigmaPlot saves the options settings between sessions. 4. To continue the test, click Run Test. The Pick Columns panel appears. 5. To accept the current settings and close the dialog box, click OK. Note: All options in these dialog boxes are "sticky" and remain in the state that you have selected until you change them.

Options for Survival Single Group: Graph Options
All graph options apply to graphs that are created when the analysis is run.
Status Symbols. Click the Graph Options tab from the Options for Survival Single Group dialog box to view the status symbols options.
Censored. Censored symbols are graphed by default. Clear this option to not display the censored symbols.
Failures. Select Failures to display symbols at the failure times. These symbols always occupy the inside corners of the steps in the survival curve. As such they provide redundant information and need not be displayed.

Group Color. The color of the objects in a survival curve group may be changed with this option. All objects (for example, survival line, symbols, confidence interval lines) are changed to the selected color.
Survival Scale. You can display the survival graph either using fractional values (probabilities) or percents. Select one of the following:
Fraction. If you select this then the Y-axis scaling will be from 0 to 1.
Percent. Selecting this will result in a Y-axis scaling from 0 to 100.
Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.
Additional Plot Statistics. You can add two different types of graph elements to your survival curve from the Type drop-down list:
95% Confidence Intervals. Selecting this adds the upper and lower confidence lines in a stepped line format.
Standard Error Bars. Selecting this will add error bars for the standard errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival curve. You may change these colors, and other graph attributes, from Graph Properties after creating the graph.
Options for Single Group Survival: Results
Report.
Cumulative Probability Table. Clear this option to exclude the cumulative probability table from the report. This reduces the length of the report for large data sets.
Time Units. Select a time unit from the drop-down list or enter a unit. These units are used in the graph axis titles and the survival report.
Worksheet.
95% Confidence Intervals. Select this to place the survival curve upper and lower 95% confidence interval values into the worksheet. These are placed into the first empty worksheet columns.

Running a Single Group Survival Analysis
To run a single group survival analysis you need to select survival time and status data columns to analyze. Use the Pick Columns panel to select these two columns in the worksheet.
To run a Single Group analysis:
1. Specify any options for your graph and report. For more information, see “Setting Single Group Test Options” on page 448.
2. If you want to select your data before you run the test then drag the pointer over your data. The Survival Time column must precede and be adjacent to the Status column.
3. Select Survival Single Group from the Standard toolbar drop-down list.
4. From the menus select:
Statistics Survival Single Group
The Pick Columns for Survival Single Group dialog box appears prompting you to select your data columns. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 4-1 The Pick Columns for Survival Single Group Panel Prompting You to Select Time and Status Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for drop-down list. The first selected column is assigned to the first row (Time) in the Selected Columns list, and the next selected column is assigned to the next row (Status) in the list. The number or title of selected columns appears in each row.
6. To change your selections, select the assignment in the list and then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
7. Click Next to choose the status variables. The status variables found in the columns you selected are shown in the Status labels in selected columns window. Select these and click the right arrow buttons to place the event variables in the Event window and the censored variable in the Censored window.

Figure 7-1 The Pick Columns for Survival Single Group Panel Prompting You to Select the Status Variables
You can have more than one Event label and more than one Censored label. You must select one Event label in order to proceed. You need not select a censored variable, though, and some data sets will not have any censored values. You need not select all the variables; any data associated with cleared status variables will be considered missing.
8. Click the back arrows to remove labels from the Event and Censored windows. This places them back in the Status labels in selected columns window.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis. If the next data set contains exactly the same status labels, or if you are reanalyzing your present data set, then the saved selections appear in the Event and Censored windows.
9. Click Finish to create the survival graph and report. The results you obtain depend on the Test Options that you selected. For more information, see “Setting Single Group Test Options” on page 448.
Interpreting Single Group Survival Results
The Single Group survival analysis report displays information about the origin of your data, a table containing the cumulative survival probabilities and summary statistics of the survival curve. For descriptions of the derivations for survival curve statistics see Hosmer & Lemeshow or Kleinbaum.

Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display.
Report Header Information
The report header includes the date and time that the analysis was performed. The data source is identified by the worksheet title containing the data being analyzed and the notebook name. The event and censor labels used in this analysis are listed. Also, the time units used are displayed.
Survival Cumulative Probability Table
The survival probability table lists all event times and, for each event time, the number of events that occurred, the number of subjects remaining at risk, the cumulative survival probability and its standard error. The upper and lower 95% confidence limits are not displayed but these may be placed into the worksheet. Censored times are not shown, but you can infer their existence from jumps in the Number at Risk data and the summary table immediately below this table. You can turn the display of this table off by clearing this option in the Results tab of Test Options. This is useful for large data sets.
Data Summary Table
The data summary table shows the total number of cases. The sum of the number of events, censored and missing values, shown below this, will equal the total number of cases.
Statistical Summary Table
The mean and percentile survival times and their statistics are listed in this table. The median survival time is commonly used in publications.

Single Group Survival Graph
Visual interpretation of the survival curve is an important component of survival analysis. For this reason SigmaPlot always generates a survival curve graph. This is different from the other statistical tests where you select a report graph a posteriori.
Figure 9-1 A Single Group Survival Curve
You can control the graph in two ways: Each object in the graph is a separate plot (for example, survival curve, failure symbols, censored symbols, upper confidence limit, etc.) so you have considerable control over the appearance of your graph.
LogRank Survival Analysis
LogRank Survival Analysis analyzes survival data from multiple groups and creates a report and a graph showing multiple survival curves. Statistics associated with each group, such as the median survival time, are calculated and presented in the report.

You can also perform the LogRank test to determine whether survival curves are significantly different. It is a nonparametric test that uses a chi-square statistic to reject the null hypothesis that the survival curves came from the same population. The chi-square is formed from the sum across groups of the square of the difference of the actual and estimated number of events for each group (censored values removed) divided by the estimated number of events ($\sum_i (O_i - E_i)^2 / E_i$). It generates a P value that is the probability of the chance occurrence of survival curves as different (or more so) as those observed. The LogRank test assumes that there is no difference in the accuracy of the data at any given time. This is different from the Gehan-Breslow test that weights the early data more since it assumes that this data is more accurate.
Performing a LogRank Analysis
1. Enter or arrange your data in either Indexed or Raw data format in the worksheet.
2. If desired set the LogRank options. For more information, see “Setting LogRank Survival Options” on page 457.
3. Select Survival LogRank from the Standard toolbar drop-down list.
4. From the menus select:
Statistics Survival LogRank
For more information, see “Running a LogRank Survival Analysis” on page 461.
5. Select the appropriate data format - Indexed or Raw - and click Next.
6. Pick the worksheet columns, multiple Time, Status column pairs for Raw data format or Group, Time, Status for Indexed data format, and click Next.
7. Select the groups from the Group panel if you selected Indexed data format and click Next.
8. Select the Event and Censored labels. You may select multiple labels for each.

9. Click Finish.
10. View and interpret the LogRank survival analysis report. For more information, see “Interpreting LogRank Survival Results” on page 467.
11. View and interpret the LogRank survival analysis graph. For more information, see “LogRank Survival Graph” on page 469.
Arranging LogRank Survival Analysis Data
Multiple Time, Status column pairs (two or more) are required for Raw data format. Indexed data format requires three columns for Group, Time and Status. You can preselect the data to have the column selection panel automatically select the Time, Status column pairs if you organize your worksheet with the Time column preceding the Status column and have all columns adjacent. For Indexed data format, placing the Group, Time and Status variables in adjacent columns and in that order also allows automatic column selection.
Setting LogRank Survival Options
Use the Survival LogRank Test Options to:
Specify attributes of the generated survival curve graph.
Customize the post-test contents of the report and worksheet.
Select the multiple comparison test and its options.
To change the Survival Curve options:
1. If you are going to analyze your survival curve after changing test options, and want to select your data before you create the curve, then drag the pointer over your data.
2. Select Survival LogRank from the Standard toolbar Select Test drop-down list.
3. From the menus select:
Statistics Current Test Options

The Options for Survival LogRank dialog box appears with three tabs:
Graph Options. Click the Graph Options tab to view the graph symbol, line and scaling options. Additional statistical graph elements may also be selected here. For more information, see “Options for Survival LogRank: Graph Options” on page 459.
Figure 3-1 The Options for Survival LogRank Dialog Box Displaying the Graph Options
Results. Click the Results tab to specify the survival time units and to modify the content of the report and worksheet. For more information, see “Options for Survival Log Rank: Results” on page 460.
Figure 3-2 The Options for Survival LogRank Dialog Displaying the Report and Worksheet Results Options

Post Hoc Tests. Click the Post Hoc Tests tab to modify the multiple comparison options. For more information, see “Options for Survival LogRank: Post Hoc Tests” on page 461 below.
Figure 3-3 The Options for Survival LogRank Dialog Displaying the Post Hoc Test Options
SigmaPlot saves options settings between sessions.
4. To continue the test, click Run Test. For more information, see “Running a LogRank Survival Analysis” on page 461.
5. To accept the current settings and close the options dialog box, click OK.
Options for Survival LogRank: Graph Options
All graph options apply to graphs that are created when the analysis is run. Use Graph Properties to modify the attributes of the survival curves after they have been created.
Status Symbols. Select the Graph Options tab on the Options dialog box to view the status symbols options.
Censored Symbols. Censored symbols are graphed by default. Clear this option to not display the censored symbols.
Failures Symbols. Selecting this box displays symbols at the failure times. These symbols always occupy the inside corners of the steps in the survival curve. As such they provide redundant information and need not be displayed.
Group Color. The color of the objects in a survival curve group may be changed with this option. All objects, for example, survival line, symbols, confidence interval lines, will be changed to the selected color or color scheme. A four density gray scale color scheme is used as the default. You may change this to black, where all survival curves and their attributes will be black, or to incrementing, which is a multi-color scheme. Use Graph Properties to modify individual object colors after the graph has been created.
Survival Scale. You can display the survival graph either using fractional values (probabilities) or percents.
Fraction. If you select this then the Y axis scaling will be from 0 to 1.
Percent. Selecting this will result in a Y axis scaling from 0 to 100.
Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.
Additional Plot Statistics. Two different types of graph elements may be added to your survival curves. You can select one of two Types:
95% Confidence Intervals. Selecting this will add the upper and lower confidence lines in a stepped line format.
Standard Error Bars. Selecting this will add error bars for the standard errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival curve. You may change these colors, and other graph attributes, from Graph Properties after the graph has been created.
Options for Survival Log Rank: Results
Report.
Cumulative Probability Table. Clear this option to exclude the cumulative probability table from the report. This reduces the length of the report for large data sets.
P values for multiple comparisons. Select this to show both the P values from the pairwise multiple comparison tests and the critical values against which the pairwise P values are tested. If this is selected for the Bonferroni test, the critical values will be identical for all pairwise tests. The critical values for the Holm-Sidak test will vary for each pairwise test.
Note: You can change the critical P value for the LogRank test on the Options dialog box. This is a global setting for the critical P value and affects all tests in SigmaPlot.
Time Units. Select a time unit from the drop-down list or enter a unit. These units will be used in the graph axis titles and the survival report.

Worksheet.
95% Confidence Intervals. Select this to place the survival curve upper and lower 95% confidence intervals into the first empty worksheet columns.

Options for Survival LogRank: Post Hoc Tests

Multiple Comparisons. You can select when multiple comparisons are to be computed and displayed in the report. LogRank tests the hypothesis of no differences between survival groups but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.
Always Perform. Select this option to always display multiple comparison results in the report. The multiple comparison test is a separate computation from the original comparison test, so it is possible to obtain significant results from the multiple comparison test when the original test was insignificant.
Only when Survival P Value is Significant. Select this to place multiple comparison results in the report only when the original comparison test is significant. If the original comparison test is not significant then the multiple comparison results will generally also be not significant and will just clutter the report. You may elect to always show them by de-selecting the Only when Survival P Value is Significant option.
The significance level can be set to either 0.05 or 0.01 using the Significance Value for Multiple Comparisons drop-down list.
Note: If multiple comparisons are triggered, the report will show the results of the comparison.

Running a LogRank Survival Analysis

To run a LogRank survival analysis you need to select data in the worksheet and specify the status variables.

To run a LogRank Survival analysis:
1. If you want to select your data before you run the test then drag the pointer over your data. The columns must be adjacent and in the correct order (Time, Status for Raw data and Group, Time, Status for Indexed data). For more information, see “Arranging LogRank Survival Analysis Data” on page 457.

2. From the menus select:
   Statistics
   Survival
   LogRank
   The Pick Columns for Survival LogRank dialog box appears. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
3. From the Data Format drop-down list select either:
   Raw data format when you have groups of data in multiple Time, Status column pairs.
   Indexed data format when you have the groups specified by a column.

Figure 3-1 The Data Format Panel With Raw Data Format Selected

4. Click Next to display the Pick Columns panel that prompts you to select your data columns.

Figure 4-1 The Pick Columns Panel for Survival LogRank Raw Data Format Prompting You to Select Multiple Time and Status Columns

5. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for drop-down list. The first selected column is assigned to the first row (Time 1) in the Selected Columns list, and the next selected column is assigned to the next row (Status 1) in the list. The number or title of selected columns appears in each row. Continue selecting Time, Status columns for all groups that you wish to analyze.
6. To change your selections, select the assignment in the list and then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
7. Click Next to choose the status variables. The status variables found in the columns you selected are shown in the Status labels in selected columns: box. Select these and click the right arrow buttons to place the event variables in the Event: window and the censored variable in the Censored: window.

Figure 7-1 The Pick Columns for Survival LogRank Panel Prompting You to Select the Status Variables

You can have more than one Event label and more than one Censored label. You must select one Event label in order to proceed. You need not select a censored variable, though, and some data sets will not have any censored values. You need not select all the variables; any data associated with unselected status variables will be considered missing.

Click the back arrow keys to remove labels from the Event: and Censored: windows. This places them back in the Status labels in selected columns: window.

Figure 7-2 The Pick Columns for Survival LogRank Dialog Showing the Results of Selecting the Status Variables

SigmaPlot saves the Event and Censored labels that you selected for your next analysis. If the next data set contains exactly the same status labels, or if you are reanalyzing your present data set, then the saved selections appear in the Event: and Censored: windows.

8. Click Finish to create the survival graph and report. The results you obtain depend on the Test Options that you selected.
9. If you selected Indexed data format, then the Pick Columns panel asks you to select the three columns in the worksheet for your Group, Time and Status.

Figure 9-1 The Pick Columns Panel for Survival LogRank Indexed Data Format Prompting You to Select Group, Time and Status Columns

10. Click Next to select the groups you want to include in the analysis. If you want to analyze all groups found in the Group column then select Select all groups. Otherwise select groups from the Data for Group drop-down list. You can select subsets of all groups and select them in the order that you wish to see them in the report.

Figure 10-1 The Group Selection Panel for Survival LogRank Indexed Data Format Prompting You to Select Groups to Analyze

11. Click Next to select the status variables as described above and then continue to complete the analysis to create the report and graph.
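The two data formats and the status-label rule described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the worksheet is modeled as plain Python lists, and the names `raw_to_indexed` and `classify_status` are hypothetical helpers, not part of SigmaPlot.

```python
from typing import Collection, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for worksheet columns: one (Time, Status) pair per group.
RawGroup = Tuple[Sequence[float], Sequence[str]]


def raw_to_indexed(raw: Dict[str, RawGroup]) -> List[Tuple[str, float, str]]:
    """Flatten Raw format (Time, Status column pairs) into Indexed rows
    of (Group, Time, Status)."""
    rows: List[Tuple[str, float, str]] = []
    for group, (times, statuses) in raw.items():
        if len(times) != len(statuses):
            raise ValueError(f"Time and Status lengths differ for {group!r}")
        for time, status in zip(times, statuses):
            rows.append((group, time, status))
    return rows


def classify_status(
    label: str,
    event_labels: Collection[str],
    censored_labels: Collection[str],
) -> Optional[bool]:
    """Return True for an event, False for a censored value, and None when
    the label was not selected (such data are treated as missing)."""
    if not event_labels:
        raise ValueError("at least one Event label is required")
    if label in event_labels:
        return True
    if label in censored_labels:
        return False
    return None


# Made-up example data, for illustration only.
raw = {
    "Drug A": ([5.0, 8.0, 12.0], ["died", "lost", "died"]),
    "Drug B": ([3.0, 9.0], ["died", "unknown"]),
}
indexed = raw_to_indexed(raw)
flags = [classify_status(status, {"died"}, {"lost"}) for _, _, status in indexed]
```

Here the label "unknown" was selected as neither Event nor Censored, so its row is flagged as missing (`None`) rather than being counted in the analysis.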

Multiple Comparison Options

LogRank tests the hypothesis of no differences between the several survival groups, but does not determine which groups are different, or the sizes of the differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and LogRank produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparison in the Options for LogRank dialog, the multiple comparison results are displayed in the Report.

There are two multiple comparison tests to choose from for the LogRank survival analysis:
Holm-Sidak. For more information, see “Holm-Sidak Test” below.
Bonferroni. For more information, see “Bonferroni Test” below.

Holm-Sidak Test

The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a control group. It is more powerful than the Bonferroni test and, consequently, it is able to detect differences that the Bonferroni test does not. It is recommended as the first-line procedure for pairwise comparison testing.

When performing the test, the P values of all comparisons are computed and ordered from smallest to largest. Each P value is then compared to a critical level that depends upon the significance level of the test (set in the test options), the rank of the P value, and the total number of comparisons made. A P value less than the critical level indicates there is a significant difference between the corresponding two groups.

Figure 11-1 Holm-Sidak Multiple Comparison Results for VA Lung Cancer Study

Bonferroni Test

The Bonferroni test performs pairwise comparisons with paired chi-square tests. It is computationally similar to the Holm-Sidak test except that it is not sequential (the critical level used is fixed for all comparisons). The critical level is the ratio of the family P value to the number of comparisons. For example, with a family P value of 0.05 and six comparisons, the critical level is constant at 0.05/6 = 0.00833. It is a more conservative test than the Holm-Sidak test in that the chi-square value required to conclude that a difference exists becomes much larger than it really needs to be. Since the critical level does not increase, as it does for the Holm-Sidak test, there will tend to be fewer comparisons with significant differences.

Figure 11-2 Bonferroni Multiple Comparison Results for VA Lung Cancer Study

Interpreting LogRank Survival Results

The LogRank survival analysis report displays information about the origin of your data, tables containing the cumulative survival probabilities for each group, summary statistics for each survival curve and the LogRank test of significance. Multiple comparison test results will also be displayed provided significant differences were found or the Post Hoc Tests Options were selected to display them.

For descriptions of the derivations for survival curve statistics see Hosmer & Lemeshow or Kleinbaum.
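The difference between the fixed Bonferroni critical level and the rank-dependent Holm-Sidak critical levels described above can be sketched as follows. This is a minimal illustration, not SigmaPlot's implementation. The Holm-Sidak level used here, 1 - (1 - alpha)^(1/(k - rank + 1)), is the standard textbook form and is assumed rather than taken from this manual, as is the step-down rule of stopping at the first non-significant comparison. The P values are made up.

```python
from typing import List, Sequence


def bonferroni_level(alpha: float, n_comparisons: int) -> float:
    """Fixed critical level: family P value divided by the number of comparisons."""
    return alpha / n_comparisons


def holm_sidak_levels(alpha: float, n_comparisons: int) -> List[float]:
    """Critical levels for P values ranked from smallest (rank 1) to largest.
    Assumed textbook form: 1 - (1 - alpha) ** (1 / (k - rank + 1))."""
    k = n_comparisons
    return [1.0 - (1.0 - alpha) ** (1.0 / (k - rank + 1)) for rank in range(1, k + 1)]


def holm_sidak_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-down test: walk the ordered P values and stop at the first one
    that is not below its critical level (assumed sequential rule)."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    significant = [False] * len(p_values)
    for level, i in zip(holm_sidak_levels(alpha, len(p_values)), order):
        if p_values[i] >= level:
            break
        significant[i] = True
    return significant


# Six hypothetical pairwise P values (four groups give six comparisons).
p_values = [0.0005, 0.009, 0.012, 0.02, 0.3, 0.6]
levels = holm_sidak_levels(0.05, 6)
holm = holm_sidak_significant(p_values)
bonf = [p < bonferroni_level(0.05, 6) for p in p_values]
```

With these made-up P values the Holm-Sidak levels rise from about 0.0085 to 0.05 as the rank increases, while the Bonferroni level stays at 0.00833, so the sequential test flags more comparisons as significant.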

Figure 11-3 The LogRank Survival Analysis Results Report

Results Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display in the Options dialog box.

Report Header Information

The report header includes the date and time that the analysis was performed. The data source is identified by the worksheet title containing the data being analyzed and the notebook name. The event and censor labels used in this analysis are listed. Also, the time units used are displayed.

Survival Cumulative Probability Table

The survival probability table lists all event times and, for each event time, the number of events that occurred, the number of subjects remaining at risk, the cumulative survival probability and its standard error. The upper and lower 95% confidence limits are not displayed but these may be placed into the worksheet. Censored times are not shown but you can infer their existence from jumps in the Number at Risk data and the summary table immediately below this table.

You can turn the display of this table off by clearing this option in the Results tab of Test Options. This is useful to keep the report a reasonable length when you have large data sets.

Data Summary Table

The data summary table shows the total number of cases. The sum of the number of events, censored and missing values, shown below this, will equal the total number of cases.

Statistical Summary Table

The mean and percentile survival times and their statistics are listed in this table. The median survival time is commonly used in publications.

LogRank Survival Graph

Visual interpretation of the survival curve is an important component of survival analysis. For this reason SigmaPlot always generates a survival curve graph. This is different from the other statistical tests where you select a report graph a posteriori.

the default Test Options.) so you have considerable control over the appearance of your graph. such as the median survival time. creates a report and a graph showing multiple survival curves. etc.. survival curve.g. was used. This is confirmed by the LogRank test. are calculated and presented in the report. Gehan-Breslow Survival Analysis The Gehan-Breslow option analyzes survival data from multiple groups. gray scale colors. solid circle symbols.470 Chapter 9 Figure 11-4 LogRank Survival Curves In the graph above. You can control the graph in two ways: Each object in the graph is a separate plot (e. Squamous and large cell carcinomas do not appear to be significantly different (as well as small cell and adenocarcinoma). censored symbols. It is a nonparametric test that uses a chi-square statistic to reject . upper confidence limit. The Gehan-Breslow test is also performed to determine whether survival curves are significantly different. failure symbols. Statistics associated with each group.

If desired set the Gehan-Breslow options. Pick the worksheet columns and click Next. Select the groups from the Group panel if you selected Indexed data format and click Next. . As an example. see “Running a Gehan-Breslow Survival Analysis” on page 476. This is different from the LogRank test that assumes there is no difference in the accuracy of the survival times. For more information.and click Next. For more information. Performing a Gehan-Breslow Analysis 1.Indexed or Raw . you would want to use Gehan-Breslow if there were many late-survival-time censored values. see “Arrange Gehan-Breslow Survival Analysis Data” on page 472. You may select multiple labels for each. From the menus select: Statistics Survival Gehan-Breslow 4. Select the Event and Censored labels. 8. The chisquare is formed from the sum across groups of the square of the difference of the actual and estimated number of events for each group (censored values removed) divided by the estimated number of events (S(Oi-Ei)2/Ei). For more information. 3. 5. see “Setting GehanBreslow Survival Options” on page 472.471 Survival Analysis the null hypothesis that the survival curves came from the same population. 7. Run the test. Select the appropriate data format . 2. The Gehan-Breslow test assumes that the early survival times are known more accurately than later times and weights the data accordingly. Enter or arrange your data in either Indexed or Raw data format in the worksheet. It generates a P value that is the probability of the chance occurrence of survival curves as different (or more so) as those observed. 6.

9. Click Finish.
10. View and interpret the Gehan-Breslow survival analysis report and curve. For more information, see “Interpreting Gehan-Breslow Survival Results” on page 482.
11. Generate a report graph. For more information, see “Gehan-Breslow Survival Graph” on page 484.

Arrange Gehan-Breslow Survival Analysis Data

Multiple Time, Status column pairs (two or more) are required for Raw data format. Indexed data format requires three columns for Group, Time and Status. You can preselect the data to have the column selection panel automatically select the Time, Status column pairs if you organize your worksheet with the Time column preceding the Status column and have all columns adjacent.

Setting Gehan-Breslow Survival Options

Use the Survival Gehan-Breslow Test Options to:
Specify attributes of the generated survival curve graph.
Customize the post-test contents of the report and worksheet.
Select the multiple comparison test and its options.

To change the Survival Curve options:
1. If you are going to analyze your survival curve after changing test options, and want to select your data before you create the curve, then drag the pointer over your data.
2. Select Survival Gehan-Breslow from the Standard toolbar drop-down list.
3. From the menus select:
   Statistics
   Current Test Options
   The Options for Survival Gehan-Breslow dialog box appears with three tabs:

Graph Options. Click the Graph Options tab to view the graph symbol, line and scaling options. You can select additional statistical graph elements here. For more information, see “Options for Survival Gehan-Breslow: Graph Options” on page 474.

Figure 3-1 The Options for Survival Gehan-Breslow Dialog Displaying the Graph Options

Results. Click the Results tab to specify the survival time units and to modify the content of the report and worksheet. For more information, see “Options for Survival Gehan-Breslow: Results” on page 475.

Figure 3-2 The Options for Survival Gehan-Breslow Dialog Displaying the Report and Worksheet Results Options

Post Hoc Tests. Click the Post Hoc Tests tab to modify the multiple comparison options. For more information, see “Options for Survival Gehan-Breslow: Post Hoc Tests” on page 476.

Figure 3-3 The Options for Survival Gehan-Breslow Dialog Displaying the Post Hoc Test Options

SigmaPlot saves the options settings between sessions.

4. To continue the test, click Run Test. The Pick Columns panel appears. For more information, see “Options for Survival Gehan-Breslow: Post Hoc Tests” on page 476.
5. To accept the current settings, click OK.

Options for Survival Gehan-Breslow: Graph Options

Select the Graph Options tab on the Options dialog box to view the status symbols options. All graph options apply to graphs that are created when the analysis is run. Use Graph Properties to modify the attributes of the survival curves after they have been created.

Status Symbols.
Censored Symbols. Censored symbols are graphed by default. Clear this option to not display the censored symbols.
Failures Symbols. Selecting this box displays symbols at the failure times. These symbols always occupy the inside corners of the steps in the survival curve. As such they provide redundant information and need not be displayed.

Group Color. The color of the objects in a survival curve group may be changed with this option. All objects (for example, survival line, symbols, confidence interval lines) will be changed to the selected color or color scheme. Use Graph Properties to modify individual object colors after the graph has been created. A four density gray scale color scheme is used as the default. You may change this to black, where all survival curves and their attributes will be black, or to incrementing, which is a multi-color scheme.

Survival Scale. You can display the survival graph either using fractional values (probabilities) or percents.
Fraction. If you select this then the Y axis scaling will be from 0 to 1.
Percent. Selecting this will result in a Y axis scaling from 0 to 100.
Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.

Additional Plot Statistics. Two different types of graph elements may be added to your survival curves. You can select one of two Types:
95% Confidence Intervals. Selecting this will add the upper and lower confidence lines in a stepped line format.
Standard Error Bars. Selecting this will add error bars for the standard errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival curve. You may change these colors, and other graph attributes, from Graph Properties after the graph has been created.

Options for Survival Gehan-Breslow: Results

Time Units. Select a time unit from the drop-down list or enter a unit. These units will be used in the graph axis titles and the survival report.

Report.
Cumulative Probability Table. Clear this option to exclude the cumulative probability table from the report. This reduces the length of the report for large data sets.
P values for multiple comparisons. Select this to show both the P values from the pairwise multiple comparison tests and the critical values against which the pairwise P values are tested. The critical values for the Holm-Sidak test will vary for each pairwise test. If this is selected for the Bonferroni test, the critical values will be identical for all pairwise tests.
Note: You can also change the critical P value for the Gehan-Breslow test on the Options dialog box. This is a global setting for the critical P value and affects all tests in SigmaPlot.

Worksheet.

95% Confidence Intervals. Select this to place the survival curve upper and lower 95% confidence intervals into the first empty worksheet columns.

Options for Survival Gehan-Breslow: Post Hoc Tests

Multiple Comparisons. You can select when multiple comparisons are to be computed and displayed in the report. Gehan-Breslow tests the hypothesis of no differences between survival groups but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.
Always Perform. Select this option to always display multiple comparison results in the report. The multiple comparison test is a separate computation from the original comparison test, so it is possible to obtain significant results from the multiple comparison test when the original test was insignificant.
Only when Survival P Value is Significant. Select this to place multiple comparison results in the report only when the original comparison test is significant. If the original comparison test is not significant then the multiple comparison results will generally also be not significant and will just clutter the report. You may elect to always show them by clearing Only when Survival P Value is Significant.
The significance level can be set to either 0.05 or 0.01 using the Significance Value for Multiple Comparisons drop-down list.
Note: If multiple comparisons are triggered, the report shows the results of the comparison.

Running a Gehan-Breslow Survival Analysis

To run a Gehan-Breslow survival analysis you need to select data in the worksheet and specify the status variables.

To run a Gehan-Breslow Survival analysis:
1. Specify any options for your graph, report and post-hoc tests. For more information, see “Setting Gehan-Breslow Survival Options” on page 472.
2. If you want to select your data before you run the test then drag the pointer over your data. The columns must be adjacent and in the correct order, for example: Time, Status for Raw data and Group, Time, Status for Indexed data.

3. Select Survival Gehan-Breslow from the Standard toolbar drop-down list.
4. From the menus select:
   Statistics
   Run Current Test
   The Pick Columns for Survival Gehan-Breslow dialog box appears. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.
5. From the Data Format drop-down list select either:
   Raw. Select the Raw data format if you have groups of data in multiple Time, Status column pairs.
   Indexed. Select the Indexed data format when you have the groups specified by a column.

Figure 4-1 The Data Format Panel With Raw Data Format Selected

6. Click Next.
7. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for drop-down list. The first selected column is assigned to the first row (Time 1) in the Selected Columns list, and the next selected column is assigned to the next row (Status 1) in the list. The number or title of selected columns appears in each row. Continue selecting Time, Status columns for all groups that you wish to analyze.

Figure 7-1 The Pick Columns for Survival LogRank Panel Prompting You to Select Multiple Time and Status Columns

8. To change your selections, select the assignment in the list and then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
9. Click Next to choose the status variables. The status variables found in the columns you selected are shown in the Status labels in selected columns: window. Select these and click the right arrow buttons to place the event variables in the Event: window and the censored variable in the Censored: window.

Figure 8-1 The Pick Columns for Survival Gehan-Breslow Panel Prompting You to Select the Status Variables

You can have more than one Event: label and more than one Censored: label. Select one Event: label to proceed. You don’t need to select a censored variable, though, and some data sets will not have any censored values. You also don’t need to select all the variables; any data associated with cleared status variables are considered missing.

Click the back arrow keys to remove labels from the Event: and Censored: windows. This places them back in the Status labels in selected columns: window.

Figure 9-1 The Pick Columns for Survival Gehan-Breslow Dialog Showing the Results of Selecting the Status Variables

SigmaPlot saves the Event and Censored labels that you selected for your next analysis. If the next data set contains exactly the same status labels, or if you are reanalyzing your present data set, then the saved selections appear in the Event and Censored windows.

10. Click Finish to create the survival graph and report. The results you obtain will depend on the Test Options that you selected.
11. If you selected Indexed data format then the Pick Columns panel asks you to select the three columns in the worksheet for your Group, Time and Status.

Figure 11-1 The Pick Columns Panel for Survival Gehan-Breslow Indexed Data Format Prompting You to Select Group, Time and Status Columns

12. Click Next to select the groups you want to include in the analysis. If you want to analyze all groups found in the Group column then select Select all groups. Otherwise select groups from the Data for Group drop-down list. You can select subsets of all groups and select them in the order that you wish to see them in the report.

Figure 12-1 The Group Selection Panel for Survival Gehan-Breslow Indexed Data Format Prompting You to Select Groups to Analyze

13. Click Next to select the status variables as described above and then to complete the analysis to create the report and graph.

Multiple Comparison Options

Gehan-Breslow tests the hypothesis of no differences between the several survival groups, but does not determine which groups are different, or the sizes of the differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and Gehan-Breslow produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparison in the Options for Gehan-Breslow dialog box, the multiple comparison results are displayed in the Report.

There are two multiple comparison tests to choose from for the Gehan-Breslow survival analysis.
Holm-Sidak. For more information, see “Holm-Sidak Test” on page 481.
Bonferroni. For more information, see “Bonferroni Test” below.

Holm-Sidak Test

The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a control group. It is more powerful than the Bonferroni test and, consequently, it is able to detect differences that the Bonferroni test does not. It is recommended as the first-line procedure for pairwise comparison testing.

When performing the test, the P values of all comparisons are computed and ordered from smallest to largest. Each P value is then compared to a critical level that depends upon the significance level of the test (set in the test options), the rank of the P value, and the total number of comparisons made. A P value less than the critical level indicates there is a significant difference between the corresponding two groups.

Figure 13-1 Holm-Sidak Multiple Comparison Results for VA Lung Cancer Study

Bonferroni Test

The Bonferroni test performs pairwise comparisons with paired chi-square tests. It is computationally similar to the Holm-Sidak test except that it is not sequential (the critical level used is fixed for all comparisons). The critical level for the Bonferroni test is the ratio of the family P value to the number of comparisons. For example, with a family P value of 0.05 and six comparisons, the critical level is constant at 0.05/6 = 0.00833. It is a more conservative test than the Holm-Sidak test in that the chi-square value required to conclude that a difference exists becomes much larger than it really needs to be. Since the critical level does not increase, as it does for the Holm-Sidak test, there will tend to be fewer comparisons with significant differences. This occurs here with three significant comparisons as compared to four for the Holm-Sidak case.

Figure 13-2 Bonferroni Multiple Comparison Results for VA Lung Cancer Study

Interpreting Gehan-Breslow Survival Results

The Gehan-Breslow survival analysis report displays information about the origin of your data, tables containing the cumulative survival probabilities for each group, summary statistics for each survival curve and the Gehan-Breslow test of significance. Multiple comparison test results will also be displayed provided significant differences were found or the Post Hoc Tests Options were selected to display them.

For descriptions of the derivations for survival curve statistics see Hosmer & Lemeshow or Kleinbaum.

Figure 13-3 The Gehan-Breslow Survival Analysis Results Report

Results Explanations

Report Header Information

The report header includes the date and time that the analysis was performed. The data source is identified by the worksheet title containing the data being analyzed and the notebook name. The event and censor labels used in this analysis are listed. Also, the time units used are displayed.

Survival Cumulative Probability Table

The survival probability table lists all event times and, for each event time, the number of events that occurred, the number of subjects remaining at risk, the cumulative survival probability and its standard error. The upper and lower 95% confidence limits are not displayed, but these may be placed into the worksheet. Censored times are not shown, but you can infer their existence from jumps in the Number at Risk data and the summary table immediately below this table.

You can turn the display of this table off by clearing this option in the Results tab of Test Options. This is useful to keep the report a reasonable length when you have large data sets.

Data Summary Table

The data summary table shows the total number of cases. The sum of the number of events, censored and missing values, shown below this, will equal the total number of cases.

Statistical Summary Table

The mean and percentile survival times and their statistics are listed in this table. The median survival time is commonly used in publications.

Gehan-Breslow Survival Graph

Visual interpretation of the survival curve is an important component of survival analysis. For this reason SigmaPlot always generates a survival curve graph. This is different from the other statistical tests where you select a report graph a posteriori.

Figure 13-4 Gehan-Breslow Survival Curves

In the graph above, incrementing colors, percent survival and 95% confidence interval options were selected from Test Options. The Holm-Sidak test showed these two curves to be significantly different at the 0.001 level.

You can control the graph in two ways:
Each object in the graph is a separate plot (for example, survival curve, failure symbols, censored symbols, upper confidence limit, etc.) so you have considerable control over the appearance of your graph.

Survival Curve Graph Examples

You can modify survival curve attributes using Test Options and Graph Properties. For more information, see “Setting Gehan-Breslow Survival Options” on page 472. For more information, see “Editing a Survival Curve Using SigmaPlot” below.

Using Test Options to Modify Graphs

The examples below show four variations that can be achieved by modifying the test options for survival curves. The options used to create the examples below appear on the Graph Options tab of any of the Options for Survival dialog boxes. Once you’ve selected a test from the Statistics toolbar, you can open this dialog box by selecting from the menus:
   Statistics
   Current Test Options

Survival curve with censored symbols. Under Status Symbols, select Censored.

Figure 13-5 Survival Curve with Censored Symbols

Survival curve with censored and failure symbols. Under Status Symbols, select both Censored and Failures.

Figure 13-6 Survival Curve with Censored and Failure Symbols

Survival curve with both symbol types and 95% confidence intervals. To add 95% confidence intervals:
1. Select Additional Plot Statistics.
2. From the Type drop-down list, select 95% Confidence Intervals.

Figure 2-1 Survival Curve with both Symbol Types and 95% Confidence Intervals

Survival curve with standard error bars. To add standard error bars:
3. Select Additional Plot Statistics.

. From the Type drop-down list. see “Using Test Options to Modify Graphs” on page 486.488 Chapter 9 4. For more information. Figure 4-1 Survival Curve with Standard Error Bars Editing Survival Graphs Using Graph Properties This example shows modifications made from Graph Properties to a survival curve with both symbol types and 95% confidence intervals. Figure 4-2 Survival Curve with both Symbol Types and 95% Confidence Intervals The confidence interval lines were changed from small gray dashed to solid blue. select Standard Error Bars. The censored symbol type was also changed from a solid circle to a square.

Figure 4-3 Modifications made using Graph Properties to a Survival Curve with both Symbol Types and 95% Confidence Interval

Failures, Censored Values, and Ties

The relationship between failures, censored values and ties affects the shape of a survival curve. Some rules that characterize survival curves are:
A step decrease occurs at every failure.
Larger step decreases result from multiple failures occurring at the same time (ties).
The curve does not decrease at a censored value.
Censored values cause the survival curve to decrease more slowly.
Tied failure (and failure and censored) values superimpose at the appropriate inside corner of the step survival curve.
The survival curve decreases to zero if the largest survival time is a failure.

It is useful to display symbols for censored values. It is not necessary to display symbols for failures.
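The rules above can be checked numerically with a small Kaplan-Meier (product-limit) calculation. The following is a minimal sketch, not SigmaPlot's implementation: the function name `kaplan_meier` and the sample data are hypothetical, and the data only imitate the kind of contrived example used to illustrate the rules (a single failure, tied failures, censored values, and a largest time that is censored).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct failure time.

    times  -- observed times, one per subject
    events -- True for a failure, False for a censored value
    """
    failures = Counter(t for t, is_failure in zip(times, events) if is_failure)
    observed = Counter(times)      # failures plus censored values at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(observed):
        d = failures.get(t, 0)
        if d:                      # the curve steps down only at failures
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= observed[t]     # failed and censored subjects leave the risk set
    return curve


# Hypothetical data: one failure, two tied failures, four censored values,
# a four-way tie (two failures, two censored), and a final censored value.
times = [1, 2, 2, 3, 5, 5, 7, 8, 8, 8, 8, 19]
events = [True, True, True, False, False, False, False,
          True, True, False, False, False]

for t, s in kaplan_meier(times, events):
    print(f"time = {t:>2}  survival = {s:.4f}")
```

With these data the curve steps down only at the three failure times, the step for the two tied failures is twice the step for the single failure, and the curve never reaches zero because the largest time is censored.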

Figure 4-4 A contrived survival curve with various combinations of failures, censored values and tied data that graphically shows the effects of these rules.

Failures and censored values are shown above as open and filled circles, respectively. You can display failure symbols in SigmaPlot, but by default they are not visible. All failures occur at the inner corners so it is not necessary to display failure symbols.

A single failure is shown at time = 1.0. It is located at the inner corner of the step curve. Two tied failures are shown at time = 2.0. They superimpose at the inner corner of the step that has decreased roughly twice as much as the step for a single failure. Four censored values, two of which are tied, are shown in the time interval between 2.0 and 8.0 (the censored values are slightly displaced for clarity). Censored values do not cause a decrease in the survival curve and nothing unusual occurs at tied censor values. Four tied values, two failures and two censored, are shown at time = 8.0. They occur at the inside corner of the step since that is where failures are located. The censored value at time = 19.0 prevents the survival curve from touching the X-axis.

Cox Regression

Cox Regression is a part of Survival Analysis that studies the impact of potential risk factors on the survival time of a population.

SigmaPlot has two Cox Regression tests: Proportional Hazards and Stratified Model.

About Cox Regression

Cox Regression is a part of Survival Analysis that studies the impact of potential risk factors, or covariates, on the survival time of a population. (The risk factors are also often called predictors or explanatory variables.) Cox Regression defines the model that describes the relationship between the covariates and survival time. This model helps to predict the likelihood of survival at each point in time for any values of the covariates. It also allows us to determine whether each covariate has a significant effect.

Consider the possible effects of gender, age, and two types of drug therapy on the survival of a population suffering from some form of cancer. The survival time may decrease as age increases. Death rates among males may be higher than for females. Finally, drug A may increase survival time more than drug B. In this study, Age, Gender, and Drug Therapy are the covariates that affect the survival experience.

There are two types of covariates. Since the covariate Age can assume a continuous range of numeric values, it is called a continuous covariate. The above covariates, Gender and Drug Therapy, each have two categories of non-numeric values and are called categorical covariates. Frequently, a categorical covariate has numeric values assigned to its categories, but these values are only used for naming purposes and are not used to indicate a measurement.

The simplest way to visualize the effect of covariates on survival time is to construct a survival curve. A survival curve plots the relationship between each value of time and the probability of surviving beyond that value. This relationship is called the survival function (or survivorship function). In Kaplan-Meier survival analysis, one survival function is defined that is independent of any covariates, as estimated from the sampled survival data. In Cox survival analysis, specific values for each of the covariates lead to one estimated survival function for the population. The graph of such a function is called a covariate-adjusted survival curve.

In Cox Regression, the primary object of study is the hazard function of the population. This function is closely related to the survival function. The hazard function (sometimes known as the conditional failure rate, hazard rate, or just the hazard) is defined as the instantaneous rate of change in the likelihood of failure at each point in time, given survival up to that point. As an example, suppose h is the hazard function and suppose h(t) = 0.1 at some

time t, then an interpretation of this value is that there is approximately a 10% chance that a subject will fail within the next unit time period, given the subject has survived up to time t.

Another function, the cumulative hazard function, is defined at each value of time as the integral of the hazard over all previous values of time. It provides a smoothed alternative to the hazard function as estimates of the hazard function itself can be too “noisy” for practical use. If H denotes the cumulative hazard function, then the above definitions can be used to show that the survival function S is defined at each time t by:

$$S(t) = e^{-H(t)}$$

All of the functions discussed above are not only functions of time, but also depend upon the covariates in the survival study. In the Cox model, the hazard function assumes a specific form given by:

$$h(t, X_1, \ldots, X_n) = h_0(t)\, e^{b_1 X_1 + b_2 X_2 + \cdots + b_n X_n}$$

where X1, X2, ..., Xn are the covariates in the study. The function h0 is called the baseline hazard function and only depends upon time. The exponential factor on the right-hand side of the equation involves the covariates, but does not depend on time. In our implementation of Cox Regression, we are assuming that every covariate is time-independent and so its value for each subject remains constant over time (it is possible, however, to extend Cox Regression to include time-dependent covariates).

The coefficients b1, b2, ..., bn in our model are constants, independent of both time and the covariates, and their values are determined from the regression analysis by maximizing a quantity known as the partial likelihood function. The resulting values of the coefficients are called the best-fit coefficients or, sometimes, the maximum likelihood estimates.

The baseline survival function is defined by setting all covariates to zero. Once the coefficients are determined, there is a procedure that estimates the values of the baseline survival function at the sampled event times. Denoting this function by S0, the covariate-adjusted survival functions and cumulative hazard functions are determined for each event time t by:

$$S(t, X_1, \ldots, X_n) = S_0(t)^{\exp(b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)}$$

$$H(t, X_1, \ldots, X_n) = -\ln S_0(t)\, \exp(b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)$$

Our model of the hazard function shows that if there are two specifications for the values of the covariates, then the corresponding values of the hazards are proportional over time.
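As an illustration of how a baseline survival function and a set of best-fit coefficients combine, the sketch below applies the standard Cox relations: the covariate-adjusted survival is the baseline survival raised to the power exp(b1 X1 + ... + bn Xn), and the hazard ratio between two covariate specifications is the exponential of the difference of their linear predictors. The function names, coefficient values, and baseline values are all hypothetical and are not taken from a SigmaPlot report.

```python
import math


def linear_predictor(coefs, covariates):
    """b1*X1 + b2*X2 + ... + bn*Xn."""
    return sum(b * x for b, x in zip(coefs, covariates))


def adjusted_survival(baseline_survival, coefs, covariates):
    """Covariate-adjusted survival: S0(t) ** exp(b . X) at each event time."""
    risk = math.exp(linear_predictor(coefs, covariates))
    return [s0 ** risk for s0 in baseline_survival]


def hazard_ratio(coefs, covariates_a, covariates_b):
    """Ratio of the hazards for two covariate specifications.

    The baseline hazard h0(t) cancels, so the ratio does not depend on time.
    """
    return math.exp(linear_predictor(coefs, covariates_a)
                    - linear_predictor(coefs, covariates_b))


# Hypothetical fit: coefficients for Age (years) and Gender (1 = male, 0 = female),
# and baseline survival estimates at three event times.
coefs = [0.03, 0.7]
baseline = [0.95, 0.90, 0.80]

print(adjusted_survival(baseline, coefs, [10, 1]))   # a 10-year-old male
print(hazard_ratio(coefs, [10, 1], [10, 0]))         # male vs. female, same age
```

Because the time-dependent baseline factor cancels in `hazard_ratio`, the result is a single number for all event times, which is the proportionality property described in the text.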

This proportionality over time is the reason the Cox model is called a proportional hazards model. It is possible that a potential covariate for the model does not satisfy this assumption. For example, suppose we have the covariate Gender in a survival study. If males are dying at twice the rate of females during the first month of a study, and both genders die at the same rate during the next month of the study, then the ratio of the hazards, or the hazard ratio, for males to females is not constant over time and the proportionality assumption fails. Such a covariate cannot be included in the hazard model.

A covariate may also be omitted from the model because its value is based on the design of the study and has secondary importance as a risk factor for survival. For example, when a study is performed at two different clinics to determine the impact of age and drug therapy on patient recovery, then the variable Clinic is such a covariate.

Any variable whose values have been included in the survival data but is not included as a covariate in the hazard model for the reasons described above is called a stratification variable. Each value or level of such a variable is called a stratum; collectively, the levels are the strata. When a stratification variable is present, then the survival study is partitioned into groups, one for each stratum, where each group has its own survival function that is determined from the regression analysis. The best-fit coefficients are the same for each stratum, but the baseline time-dependent factors in the model are different.

Performing a Cox Regression Proportional Hazards Model

1. Enter or arrange your data in the worksheet. For more information, see “Arranging Cox Regression Data” on page 495.
2. If desired, set the Cox Regression Proportional Hazards options.
3. From the menus select:
Statistics > Survival > Cox Regression
4. Select the two worksheet columns with the survival times and status values in the Pick Columns dialog box.
5. Click Next and select the Event and Censored labels. You may select multiple labels for each.

6. Click Finish.
7. Interpret the Cox Regression analysis report and curve. For more information, see “Interpreting Cox Regression Results” on page 502.
8. View the Cox Regression graph. For more information, see “Cox Regression Graph” on page 504.

Performing a Cox Regression Stratified Model

1. Enter or arrange your data in the worksheet. For more information, see “Arranging Cox Regression Data” on page 495.
2. If desired, set the Cox Regression Stratified options.
3. From the menus select:
Statistics > Survival > Cox Regression
4. Select the two worksheet columns with the survival times and status values in the Pick Columns dialog box.
5. Click Next and select the Event and Censored labels. You may select multiple labels for each.
6. Click Finish.
7. Interpret the Cox Regression analysis report and curve. For more information, see “Interpreting Cox Regression Results” on page 502.
8. View the Cox Regression graph. For more information, see “Cox Regression Graph” on page 504.

Arranging Cox Regression Data

Cox Regression in SigmaPlot consists of two separate tests, Proportional Hazards and Stratified Model. Each test requires at least three data columns: a time column, status column, and any number of covariate columns. In the Stratified Model test, you also select the worksheet column containing the strata.

Setting Cox Regression PH Options

Use the Survival Curve Test Options to:
Specify attributes of the generated survival curve graph.
Customize the post-test contents of the report and worksheet.

To change the Survival Curve options:
1. If you are going to analyze your survival curve after changing test options, and want to select your data before you create the curve, then drag the pointer over your data.
2. Select Survival Single Group from the Standard toolbar drop-down list.
3. From the menus select:
Statistics > Current Test Options
The Options for Survival Single Group dialog box appears with two tabs:
Graph Options. Click the Graph Options tab to view the graph symbol, line and scaling options. You can select additional statistical graph elements here. For more information, see “Options for Survival Single Group: Graph Options” on page 496.

Figure 3-1 The Options for Survival Curve Dialog Displaying the Graph Options

Results. Click the Results tab to specify the survival time units and to modify the content of the report and worksheet. For more information, see “Options for Single Group Survival: Results” on page 497.
Note: All options in these dialog boxes are "sticky" and remain in the state that you have selected until you change them. SigmaPlot saves the options settings between sessions.
4. To continue the test, click Run Test. The Pick Columns panel appears.
5. To accept the current settings and close the dialog box, click OK.

Options for Survival Single Group: Graph Options

Click the Graph Options tab from the Options for Survival Single Group dialog box to view the status symbols options. All graph options apply to graphs that are created when the analysis is run.

Status Symbols. Censored symbols are graphed by default.
Censored. Clear this option to not display the censored symbols.
Failures. Select Failures to display symbols at the failure times. These symbols always occupy the inside corners of the steps in the survival curve. As such they provide redundant information and need not be displayed.

Group Color. The color of the objects in a survival curve group may be changed with this option. All objects (for example, survival line, symbols, confidence interval lines) are changed to the selected color.

Survival Scale. You can display the survival graph either using fractional values (probabilities) or percents. Select one of the following:
Fraction. If you select this then the Y-axis scaling will be from 0 to 1.
Percent. Selecting this will result in a Y-axis scaling from 0 to 100.
Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.

Additional Plot Statistics. You can add two different types of graph elements to your survival curve from the Type drop-down list:
95% Confidence Intervals. Selecting this adds the upper and lower confidence lines in a stepped line format.
Standard Error Bars. Selecting this will add error bars for the standard errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival curve. You may change these colors, and other graph attributes, from Graph Properties after creating the graph.

Options for Single Group Survival: Results

Report. Cumulative Probability Table. Clear this option to exclude the cumulative probability table from the report. This reduces the length of the report for large data sets.
Worksheet. 95% Confidence Intervals. Select this to place the survival curve upper and lower 95% confidence interval values into the worksheet. These are placed into the first empty worksheet columns.
Time Units. Select a time unit from the drop-down list or enter a unit. These units are used in the graph axis titles and the survival report.

Setting Cox Regression Stratified Options

Use the Survival Curve Test Options to:
Specify attributes of the generated survival curve graph.
Customize the post-test contents of the report and worksheet.

To change the Survival Curve options:
1. If you are going to analyze your survival curve after changing test options, and want to select your data before you create the curve, then drag the pointer over your data.
2. Select Survival Single Group from the Standard toolbar drop-down list.
3. From the menus select:
Statistics > Current Test Options
The Options for Survival Single Group dialog box appears with two tabs:
Graph Options. Click the Graph Options tab to view the graph symbol, line and scaling options. You can select additional statistical graph elements here. For more information, see “Options for Survival Single Group: Graph Options” on page 496.

Figure 3-1 The Options for Survival Curve Dialog Displaying the Graph Options

Results. Click the Results tab to specify the survival time units and to modify the content of the report and worksheet. For more information, see “Options for Single Group Survival: Results” on page 497.
Note: All options in these dialog boxes are "sticky" and remain in the state that you have selected until you change them. SigmaPlot saves the options settings between sessions.
4. To continue the test, click Run Test. The Pick Columns panel appears.
5. To accept the current settings and close the dialog box, click OK.

Options for Survival Single Group: Graph Options

Click the Graph Options tab from the Options for Survival Single Group dialog box to view the status symbols options. All graph options apply to graphs that are created when the analysis is run.

Status Symbols. Censored symbols are graphed by default.
Censored. Clear this option to not display the censored symbols.
Failures. Select Failures to display symbols at the failure times. These symbols always occupy the inside corners of the steps in the survival curve. As such they provide redundant information and need not be displayed.

Group Color. The color of the objects in a survival curve group may be changed with this option. All objects (for example, survival line, symbols, confidence interval lines) are changed to the selected color.

Survival Scale. You can display the survival graph either using fractional values (probabilities) or percents. Select one of the following:
Fraction. If you select this then the Y-axis scaling will be from 0 to 1.
Percent. Selecting this will result in a Y-axis scaling from 0 to 100.
Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.

Additional Plot Statistics. You can add two different types of graph elements to your survival curve from the Type drop-down list:

95% Confidence Intervals. Selecting this adds the upper and lower confidence lines in a stepped line format.
Standard Error Bars. Selecting this will add error bars for the standard errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival curve. You may change these colors, and other graph attributes, from Graph Properties after creating the graph.

Options for Single Group Survival: Results

Report. Cumulative Probability Table. Clear this option to exclude the cumulative probability table from the report. This reduces the length of the report for large data sets.
Worksheet. 95% Confidence Intervals. Select this to place the survival curve upper and lower 95% confidence interval values into the worksheet. These are placed into the first empty worksheet columns.
Time Units. Select a time unit from the drop-down list or enter a unit. These units are used in the graph axis titles and the survival report.

Running a Cox Regression

To run a single group survival analysis you need to select survival time and status data columns to analyze. Use the Pick Columns panel to select these two columns in the worksheet.

To run a Single Group analysis:
1. Specify any options for your graph and report. For more information, see “Setting Single Group Test Options” on page 448.
2. If you want to select your data before you run the test, then drag the pointer over your data. The Survival Time column must precede and be adjacent to the Status column.
3. Select Survival Single Group from the Standard toolbar drop-down list.

4. From the menus select:
Statistics > Survival > Single Group
The Pick Columns for Survival Single Group dialog box appears prompting you to select your data columns. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list.

Figure 4-1 The Pick Columns for Survival Single Group Panel Prompting You to Select Time and Status Columns

5. To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for drop-down list. The first selected column is assigned to the first row (Time) in the Selected Columns list, and the next selected column is assigned to the next row (Status) in the list. The number or title of selected columns appears in each row.
6. To change your selections, select the assignment in the list and then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
7. Click Next to choose the status variables. The status variables found in the columns you selected are shown in the Status labels in selected columns window. Select these and click the right arrow buttons to place the event variables in the Event window and the censored variable in the Censored window.

Figure 7-1 The Pick Columns for Survival Single Group Panel Prompting You to Select the Status Variables.

You can have more than one Event label and more than one Censored label. You must select one Event label in order to proceed. You need not select a censored variable, though, and some data sets will not have any censored values. You need not select all the variables; any data associated with cleared status variables will be considered missing.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis. If the next data set contains exactly the same status labels, or if you are reanalyzing your present data set, then the saved selections appear in the Event and Censored windows.
8. Click the back arrows to remove labels from the Event and Censored windows. This places them back in the Status labels in selected columns window.
9. Click Finish to create the survival graph and report. The results you obtain depend on the Test Options that you selected. For more information, see “Setting Single Group Test Options” on page 448.

Interpreting Cox Regression Results

The Single Group survival analysis report displays information about the origin of your data, a table containing the cumulative survival probabilities and summary statistics of the survival curve.
For descriptions of the derivations for survival curve statistics see Hosmer & Lemeshow or Kleinbaum.

Figure 9-1 The Cox Regression Report

Results Explanations

In addition to the numerical results, expanded explanations of the results may also appear. You can turn off this text on the Options dialog box. You can also set the number of decimal places to display.

Report Header Information

The report header includes the date and time that the analysis was performed. The data source is identified by the worksheet title containing the data being analyzed and the notebook name. The event and censor labels used in this analysis are listed. Also, the time units used are displayed.

Survival Cumulative Probability Table

The survival probability table lists all event times and, for each event time, the number of events that occurred, the number of subjects remaining at risk, the cumulative survival probability and its standard error. The upper and lower 95% confidence limits are not displayed but these may be placed into the worksheet. Failure times are not shown but you can infer their existence from jumps in the Number at Risk data and the summary table immediately below this table.
You can turn the display of this table off by clearing this option in the Results tab of Test Options. This is useful for large data sets.

Data Summary Table

The data summary table shows the total number of cases. The sum of the number of events, censored and missing values, shown below this, will equal the total number of cases.

Statistical Summary Table

The mean and percentile survival times and their statistics are listed in this table. The median survival time is commonly used in publications.

Cox Regression Graph

Visual interpretation of the survival curve is an important component of survival analysis. For this reason SigmaPlot always generates a survival curve graph. This is different from the other statistical tests where you select a report graph a posteriori.

Figure 9-2 A Single Group Survival Curve

You can control the graph in two ways:
You can set the graph options to become the default values until they are changed.
After the graph is created you can modify it using SigmaPlot’s Graph Properties. Each object in the graph is a separate plot (for example, survival curve, failure symbols, censored symbols, upper confidence limit, etc.) so you have considerable control over the appearance of your graph.


Chapter 10

Computing Power and Sample Size

SigmaStat provides two experimental design aids: experimental power, and sample size computations. Use these procedures to determine the power of an intended test or to determine the minimum sample size required to achieve a desired level of power.

Power and sample size computations are available for:
Unpaired and Paired t-tests
A z-test comparison of proportions
One way ANOVAs
Chi-Square Analysis of Contingency Tables
Correlation Coefficient

About Power

The power, or sensitivity, of a test is the probability that the test will detect a difference or effect if there really is a difference or effect. The closer the power is to 1, the more sensitive the test.
Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting a specified effect with 1 - α confidence (i.e., a 95% confidence when α = 0.05). Power less than 0.001 is noted as "P = < 0.001."

The power of a statistical test depends on:
The specific test
The alpha (α), or acceptable risk of a false positive
The sample size

The minimum difference or treatment effect to detect
The underlying variability of the data

Figure 0-1 The Power Computation Commands Menu

About Sample Size

You can estimate how big the sample size has to be in order to detect the treatment effect or difference with a specified level of statistical significance and power. All else being equal, the larger the sample size, the greater the power of the test.

Determining the Power of a t-Test

You can determine the power of an intended t-test. Use unpaired t-tests to compare two different samples from populations that are normally distributed with equal variances among the individuals. For more information, see "Unpaired t-Test" in Chapter 4.

To determine the power for a t-test, you need to set the:
Expected difference of the means of the groups you want to detect.
Expected standard deviation of the groups.

Expected sizes of the two groups.
Alpha (α) used for power computations.

To find the power of a t-test:
1. With the worksheet in view, from the menus select:
Statistics > Power > t-test
The t-test Power dialog box appears.

Figure 2-1 The t-test Power Dialog Box

2. Enter the size of the difference between the means of the two groups you want to be able to detect in the Expected Difference of Means box. This can be the size you expect to see, as determined from previous samples or experiments, or just an estimate.
3. Enter the estimated size of the standard deviation for the population your data will be drawn from in the Expected Standard Deviation box. This can be the size you expect to see, as determined from previous samples or experiments, or just an estimate.
Note: t-tests assume that the standard deviations of the underlying normally distributed populations are equal.

4. Enter the expected sizes of each group in the Group 1 Size and Group 2 Size boxes.
5. If desired, change the alpha level in the Alpha box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).
The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the power of a t-test at the specified conditions. The Power calculation appears at the top of the dialog box. If desired, you can change any of the settings and click = again to view the new power as many times as desired.
7. Click Save to Report to save the power computation settings and resulting power to the current report and click Close to exit from t-test power computation.

Figure 8-1 The t-Test Power Report

For descriptions of computing the power of a t-test, you can reference an appropriate statistics reference.
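To make the relationship between the four inputs and the resulting power concrete, the following sketch estimates the power of a two-sided unpaired t-test using a normal approximation. This is an assumption made to keep the example dependency-free: it is not the computation SigmaPlot performs, and it slightly overstates the power for small groups compared with an exact calculation based on the noncentral t distribution. The function name `approx_t_test_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_t_test_power(diff, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided unpaired t-test (normal approximation).

    diff  -- expected difference of the group means
    sd    -- expected (common) standard deviation of the groups
    n1,n2 -- expected group sizes
    alpha -- acceptable probability of a Type I error
    """
    z = NormalDist()
    standard_error = sd * sqrt(1 / n1 + 1 / n2)
    shift = abs(diff) / standard_error        # standardized true difference
    z_crit = z.inv_cdf(1 - alpha / 2)         # two-sided critical value
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: detect a difference of 1.0 with SD 1.0 and 17 per group.
print(round(approx_t_test_power(1.0, 1.0, 17, 17), 3))
# Smaller alpha gives a stricter test and therefore lower power.
print(round(approx_t_test_power(1.0, 1.0, 17, 17, alpha=0.01), 3))
```

When the expected difference is zero the function returns alpha, which matches the definition of alpha as the probability of reporting a difference when none exists.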

Determining the Power of a Paired t-Test

You can determine the power of a Paired t-test. Use Paired t-tests to see if there is a change in the same individuals before and after a single treatment or change in condition. The sizes of the treatment effects are assumed to be normally distributed. For more information, see "Paired t-Test" in Chapter 6.

To determine the power for a Paired t-test, you need to set the:
Expected change before and after treatment you want to detect
Expected standard deviation of the changes
Number of subjects
Alpha used for power computations

To find the power of a Paired t-test:
1. From the menus select:
Statistics > Power > Paired t-test
The Paired t-test Power dialog box appears.

Figure 1-1 The Paired t-test Power Dialog Box

2. Enter the size of the change before and after the treatment in the Change to be Detected box. The size of the change is determined by the difference of the means. This can be the size of the treatment effect you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the size of the standard deviation of the change in the Expected Standard Deviation of Change box. This can be the size you expect to see, as determined from previous experiments, or just an estimate.
4. Enter the expected (or estimated) number of subjects in the Desired Sample Size box.
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect.
The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant treatment difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant effect, but a greater possibility of concluding there is no effect when one exists (a Type II error). Larger values of α make it easier to conclude that there is an effect, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the power of a Paired t-test at the specified conditions. If desired, you can change any of the settings and click = again to view the new power as many times as desired.
7. Select Save to Report to save the power computation settings and resulting power to the current report.

Figure 7-1 The Paired t-test Power Computation Results Viewed in the Report

8. Click Close.

For descriptions of computing the power of a Paired t-test, you can reference an appropriate statistics reference.

Determining the Power of a z-Test Proportions Comparison

You can determine the power of a z-test comparison of proportions. A comparison of proportions compares the difference in the proportion of two different groups that fall within a single category. For more information, see "Comparing Proportions Using the z-Test" in Chapter 7.

To determine the power for a proportion comparison, you need to set the:
Expected proportion of each group that falls within the category.
Size of each sample.
Alpha (α) used for power computations.

To find the power of a z-test proportion comparison:
1. From the menus select:
Statistics > Power > Proportions

The Proportions Power dialog box appears.
Figure 1-1 The Proportions Power Dialog Box
2. Enter the expected proportions that fall into the category for each group. This can be the distribution you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the sizes of each group. This can be the sample sizes you expect to obtain, or just an estimate.
4. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant distribution difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference in distribution when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
5. Click = to see the power of a proportion comparison at the specified conditions. If desired, you can change any of the settings and click = again to view the new power as many times as desired.

Note: SigmaStat uses the Yates correction factor if this option is selected in the Options for z-test dialog box. For more information, see "Setting z-test Options" in Chapter 7.
6. Click Save to Report to save the power computation settings and resulting power to the current report.
Figure 6-1 The Proportion Power Computation Results Viewed in the Report
7. Click Close to exit from proportion comparison power computation.
For descriptions of computing the power of a z-test, you can reference an appropriate statistics reference.

Determining the Power of a One Way ANOVA
You can determine the power of a One Way ANOVA (analysis of variance). Use One Way ANOVAs to see if there is a difference among two or more samples taken from populations that are normally distributed with equal variances among the individuals. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.
To determine the power for a One Way ANOVA, you need to specify the:
Minimum difference between group means you want to detect.
Standard deviation of the population from which the samples were drawn.
Estimated number of groups.
Estimated size of a group.

Alpha (α) used for power computations.
To find the power of a One Way ANOVA:
1. From the menus select:
   Statistics
     Power
       ANOVA
The ANOVA Power dialog box appears.
Figure 1-1 The ANOVA Power Dialog Box
2. Enter the minimum size of the expected difference of group means in the Minimum Difference in Group Means to be Detected box. The minimum detectable difference is the minimum difference between the largest and smallest means. This can be the size of a difference you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the estimated standard deviation of the population from which the samples will be drawn. This can be the size you expect to see, as determined from previous experiments, or just an estimate.
4. Enter the expected number of groups and the expected size of each group.

5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, i.e., you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the power of a One Way ANOVA at the specified conditions. The power calculation appears at the top of the dialog. If desired, you can change any of the settings and click = again to view the new power as many times as desired.
7. Select Save to Report to save the power computation settings and resulting power to the current report.
Figure 7-1 The ANOVA Power Computation Results Viewed in the Report
8. Click Close to exit from ANOVA power computation.
For descriptions of computing the power of a One Way ANOVA, you can reference an appropriate statistics reference.
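The minimum detectable difference entered for a One Way ANOVA can be related to a single quantity that drives the power of the F test, the noncentrality parameter. The Python sketch below is an illustration only. It assumes the least favorable arrangement of group means (the two extreme means differ by the minimum difference and every other mean sits at the midpoint), which is a common textbook convention. This section does not describe how SigmaPlot computes power internally, and the function names are hypothetical.

```python
import math


def least_favorable_means(k, delta):
    """Return k group means whose largest and smallest differ by delta,
    with all remaining means at the midpoint (an assumed convention)."""
    if k < 2:
        raise ValueError("need at least two groups")
    return [-delta / 2.0] + [0.0] * (k - 2) + [delta / 2.0]


def noncentrality(means, n, sigma):
    """Noncentrality parameter for equal group size n and common
    standard deviation sigma: n * sum((mu_i - mean)^2) / sigma^2."""
    grand_mean = sum(means) / len(means)
    return n * sum((m - grand_mean) ** 2 for m in means) / sigma ** 2


# Example inputs (made up for illustration): 3 groups of 10,
# minimum difference 5, standard deviation 4.
k, n, delta, sigma = 3, 10, 5.0, 4.0
lam = noncentrality(least_favorable_means(k, delta), n, sigma)
phi = math.sqrt(lam / k)  # the "phi" form used in some power charts
print(lam, phi)
```

Under this arrangement the parameter reduces to n * delta^2 / (2 * sigma^2), so it grows with the group size and with the squared ratio of the difference to the standard deviation. Converting it into a power value requires the noncentral F distribution, which is not shown here.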

Determining the Power of a Chi-Square Test
You can determine the power of a chi-square (χ²) analysis of a contingency table. A χ² test compares the difference between the expected and observed number of individuals of two or more different groups that fall within two or more categories.
The power of a χ² analysis of contingency tables is determined by the estimated relative proportions in each category for each group. Because SigmaStat uses numbers of observations to compute the estimated proportions, you need to enter a contingency table in the worksheet containing the estimated pattern in the observations before you can compute the estimated proportions.
To find the power of a chi-squared test:
1. Enter a contingency table into the worksheet by placing the estimated number of observations for each table cell in a corresponding worksheet cell. These observations are used to compute the estimated proportions.
Figure 8-1 The Contingency Table with Expected Numbers of Observations of Two Groups in Three Categories
Note: You only need to specify the pattern (distribution) of the number of observations. The absolute numbers in the cells do not matter, only their relative values.

Figure 1-1 Contingency Table Data Entered into the Worksheet
The worksheet rows and columns correspond to the groups and categories. The number of observations must always be an integer.
Note: The order and location of the rows or columns corresponding to the groups and categories is unimportant.
2. From the menus select:
   Statistics
     Power
       Chi-Square
The Pick Columns for Chi-Square Power dialog box appears.
Figure 2-1 The Chi-square Power Dialog Box
3. Select the columns of the contingency table from the worksheet as prompted.
4. Click Finish when you've selected the desired columns.

The Chi-Square Power dialog box appears.
5. Enter the total number of observations in the Sample Size box. This can be the number of observations you expect to see, as determined from previous experiments, or just an estimate.
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no effect when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
7. Click = to see the power of a chi-square test at the specified conditions. If desired, you can change any of the settings and click = again to view the new power as many times as desired. However, if you want to change the number of observations per category, you need to click Cancel, edit the table, then repeat the power computation.
8. Select Save to Report to save the power computation settings and resulting power to the current report file, and then click Cancel to exit from chi-square test power computation.
Figure 8-1 The Chi-square Power Computation Results Viewed in the Report
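The procedure above takes a table of expected counts, a total sample size, and alpha. The Python sketch below shows one standard way such a power value can be approximated: it converts the table to proportions, computes an effect size from the departure from independence, and evaluates a noncentral chi-square tail. It is an illustration under stated assumptions, not SigmaPlot's documented algorithm (this section does not describe it, and no Yates correction is applied here). All function names are hypothetical, and only the standard library is used.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-square, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def noncentral_chi2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson-weighted sum of central CDFs."""
    total, j, weight = 0.0, 0, math.exp(-lam / 2.0)
    while True:
        total += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= (lam / 2.0) / j
        if j > lam / 2.0 and weight < 1e-12:
            return total


def chi2_power(table, n_total, alpha=0.05):
    """Approximate power for a contingency table of expected counts.
    Only the relative values in `table` matter; n_total sets the sample size."""
    rows, cols = len(table), len(table[0])
    grand = float(sum(sum(row) for row in table))
    p = [[cell / grand for cell in row] for row in table]
    row_p = [sum(row) for row in p]
    col_p = [sum(p[i][j] for i in range(rows)) for j in range(cols)]
    # Squared effect size: departure of cell proportions from independence.
    w2 = sum((p[i][j] - row_p[i] * col_p[j]) ** 2 / (row_p[i] * col_p[j])
             for i in range(rows) for j in range(cols))
    df = (rows - 1) * (cols - 1)
    return 1.0 - noncentral_chi2_cdf(chi2_critical(alpha, df), df, n_total * w2)


# Two groups in three categories (made-up counts for illustration).
print(chi2_power([[10, 20, 30], [30, 20, 10]], n_total=100))
```

Multiplying every cell of the table by the same constant leaves the result unchanged, which matches the note that only the pattern of the observations matters. The plain series used here is adequate for small tables and moderate sample sizes, but it is not tuned for extreme inputs.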

For descriptions of computing the power of a chi-square analysis of contingency tables, you can reference an appropriate statistics reference.

Determining the Power to Detect a Specified Correlation
You can determine the power to detect a given Pearson Product Moment Correlation Coefficient R. A correlation coefficient quantifies the strength of association between the values of two variables. A correlation coefficient of 1 means that as one variable increases, the other increases exactly linearly. A correlation coefficient of -1 means that as one variable increases, the other decreases exactly linearly. For more information, see "Pearson Product Moment Correlation" in Chapter 8.
To determine the power of a correlation coefficient, you need to specify the:
Correlation coefficient you want to detect.
Desired sample size.
Alpha (α) used for power computations.
To find the power to detect a correlation coefficient:
1. From the menus select:
   Statistics
     Power
       Correlation
The Correlation Power dialog box appears.
Figure 1-1 The Correlation Power Dialog Box

2. Enter the expected correlation coefficient. This can be the correlation coefficient you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the desired number of data points. This can be the sample size you expect to obtain, or just an estimate.
4. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an association. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is an association when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a true association, but a greater possibility of concluding there is no relationship when one exists (a Type II error). Larger values of α make it easier to conclude that there is an association, but also increase the risk of reporting a false positive (a Type I error).
5. Click = to see the power of a correlation coefficient at the specified conditions. The power calculation appears at the top of the dialog box. If desired, you can change any of the settings and click = again to view the new power as many times as desired.
6. Click Save to save the power computation settings and resulting power to the current report, and then click Close to exit from correlation coefficient power computation.
Figure 6-1 The Correlation Power Dialog Box
For descriptions of computing the power to detect a correlation coefficient, you can reference an appropriate statistics reference.
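The three inputs for the correlation power computation (correlation coefficient, sample size, alpha) can be turned into an approximate power value with the Fisher z transformation. The Python sketch below is a textbook approximation for a two-sided test, offered for illustration. SigmaPlot's own computation is not described in this section and may give slightly different values. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect correlation r with n data points,
    using the Fisher z transformation (normal approximation)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = NormalDist()
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    shift = abs(math.atanh(r)) * math.sqrt(n - 3)
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Example inputs (made up): expected r = 0.5 with 20, 30, and 40 data points.
for n in (20, 30, 40):
    print(n, round(correlation_power(0.5, n), 3))
```

When the expected correlation is zero the function returns alpha, which is the Type I error rate described in step 4, and the power rises toward 1 as either the sample size or the magnitude of the correlation increases.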

Determining the Minimum Sample Size for a t-Test
You can determine the minimum sample size for an intended t-test. Unpaired t-tests are used to compare two different samples from populations that are normally distributed with equal variances among the individuals. For more information, see "Unpaired t-Test" in Chapter 4.
To determine the sample size for a t-test, you need to specify the:
Expected difference of the means of the groups you want to detect.
Expected standard deviation of the underlying populations.
Desired power of the t-test.
Alpha level (α) used for determining the sample size.
To determine the sample size of a t-test:
1. From the menus select:
   Statistics
     Sample Size
       t-test
The t-test Sample Size dialog box appears.
Figure 1-1 The t-test Sample Size Dialog Box

2. Enter the size of the difference between the means of the two groups to be detected in the Expected Difference in Means box. This can be the size you expect to see, as determined from previous samples or experiments, or just an estimate.
3. Enter the estimated standard deviation of the underlying population in the Expected Standard Deviation box. This can be the size you expect to see, as determined from previous samples or experiments, or just an estimate.
Note: t-tests assume that the standard deviations of the underlying normally distributed populations are equal.
4. Enter the desired power, or test sensitivity, in the Desired Power box. Power is the probability that the t-test will detect a difference if there really is a difference. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting a difference with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
5. Enter the desired alpha level in the Alpha box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the required sample size for a t-test at the specified conditions. The sample size calculation appears at the top of the dialog. The sample size is the size of each of the groups. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired.
7. Click Save to save the sample size computation settings and resulting sample size to the current report.

Figure 7-1 The t-test Sample Size Results Viewed in the Report
8. Click Close to exit from t-test sample size computation.
For descriptions of computing the sample size for a t-test, you can reference an appropriate statistics reference.

Determining the Minimum Sample Size for a Paired t-Test
You can determine the sample size for a Paired t-test. Use Paired t-tests to see if there is a change in the same individuals before and after a single treatment or change in condition. The sizes of the treatment effects are assumed to be normally distributed. For more information, see "Paired t-Test" in Chapter 6.
To determine the sample size for a Paired t-test, you need to estimate the:
Difference of the means you wish to detect.
Estimated standard deviation of the changes in the underlying population.
Desired power or sensitivity of the test.
Alpha (α) used to determine the sample size.
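The four quantities listed above are enough for a rough hand check of a Paired t-test sample size. The Python sketch below uses the usual normal approximation for a two-sided test. It is an illustration only: exact methods based on the t distribution typically return a slightly larger number of subjects, this section does not state which method SigmaPlot uses, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(change, sd_change, power=0.80, alpha=0.05):
    """Approximate number of subjects needed to detect a mean change of
    `change` when the changes have standard deviation `sd_change`
    (two-sided test, normal approximation)."""
    if change == 0 or sd_change <= 0:
        raise ValueError("change must be nonzero and sd_change positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # guards against Type I error
    z_power = z.inv_cdf(power)              # guards against Type II error
    n = ((z_alpha + z_power) * sd_change / abs(change)) ** 2
    return math.ceil(n)


# Example inputs (made up): detect a change of 5 units when the
# standard deviation of the change is 10.
print(paired_sample_size(5.0, 10.0))
```

The formula shows why the inputs matter: halving the change to be detected, or doubling the standard deviation of the change, roughly quadruples the required number of subjects, and a smaller alpha or a higher desired power also increases it.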

To find the sample size for a Paired t-test:
1. From the menus select:
   Statistics
     Sample Size
       Paired t-test
The Paired t-test Sample Size dialog box appears.
Figure 1-1 The t-test Sample Size Results Viewed in the Report
2. Enter the size of the change before and after the treatment in the Change to be Detected box. This can be the size of the treatment effect you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the size of the standard deviation of the change in Expected Standard Deviation of Change. This can be the size you expect to see, as determined from previous experiments, or just an estimate.
4. Enter the desired power, or test sensitivity. Power is the probability that the paired t-test will detect an effect if there really is an effect. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting an effect with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant treatment difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant effect, but a greater possibility of concluding there is no effect when one exists (a Type II error). Larger values of α make it easier to conclude that there is an effect, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the required sample size for a Paired t-test at the specified conditions. The sample size calculation appears at the top of the dialog box. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired.
7. Click Save to save the sample size computation settings and resulting sample size to the current report.
Figure 7-1 The Paired t-test Sample Size Dialog Box
8. Click Close to exit from paired t-test sample size computation.
For descriptions of computing the sample size for a paired t-test, you can reference an appropriate statistics reference.

Determining the Minimum Sample Size for a Proportions Comparison
You can determine the sample size for a z-test comparison of proportions. A comparison of proportions compares the difference in the proportion of two different groups that falls within a single category. For more information, see "Comparing Proportions Using the z-Test" in Chapter 7.
To determine the sample size for a proportion comparison, you need to specify the:
Proportion of each group that falls within the category.
Desired power or sensitivity of the test.
Alpha (α) used to determine the sample size.
To find the sample size for a z-test proportion comparison:
1. From the menus select:
   Statistics
     Sample Size
       Proportions
The Proportions Sample Size dialog box appears.
Figure 1-1 The Proportions Sample Size Dialog Box

2. Enter the expected proportions that fall into the category for each group in the Group 1 and 2 Proportion boxes. This can be the distribution you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the desired power, or test sensitivity. Power is the probability that the proportion comparison will detect a difference if there really is a difference in proportion. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting a difference with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
4. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant distribution difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference in distribution when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
5. Click = to see the required sample size for a proportion comparison at the specified conditions. The calculated sample size appears at the top of the dialog. The estimated sample size is the sample size for each group. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired.
Note: The Yates correction factor is used if this option was selected in the Options for z-test dialog box. For more information, see "Setting z-test Options" in Chapter 7.
6. Click Save to save the sample size computation settings and resulting sample size to the current report.

Figure 6-1 The Proportions Sample Size Results Viewed in the Report
7. Click Close to exit from proportion comparison sample size computation.
For descriptions of computing the sample size for a z-test, you can reference an appropriate statistics reference.

Determining the Minimum Sample Size for a One Way ANOVA
You can determine the group sample size for a One Way ANOVA (analysis of variance). One Way ANOVAs are used to see if there is a difference among two or more samples taken from populations that are normally distributed with equal variances among the individuals. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.
To determine the sample size for a One Way ANOVA, you need to specify the:
Minimum difference between group means to be detected.
Estimated standard deviation of the underlying populations.
Number of groups.
Desired power or sensitivity of the ANOVA.
Alpha (α) used to determine the sample size.

To find the sample size for a One Way ANOVA:
1. From the menus select:
   Statistics
     Sample Size
       ANOVA
The ANOVA Sample Size dialog box appears.
Figure 1-1 The ANOVA Sample Size Dialog Box
2. Enter the size of the minimum expected difference of group means in the Minimum Detectable Difference box. The minimum detectable difference is the minimum difference between the largest and smallest means. This can be the size of a difference you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the size of the standard deviation of the residuals. This can be the size you expect to see, as determined from previous experiments, or just an estimate. Then enter the expected number of groups. Note that one way ANOVA assumes that the standard deviations of the underlying normally distributed populations are equal.
4. Enter the desired power, or test sensitivity. Power is the probability that the ANOVA will detect a difference if there really is a difference among the groups. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting a difference with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an effect. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive (a Type I error).
6. Click = to see the required sample size for a One Way ANOVA at the specified conditions. The sample size calculation appears at the top of the dialog. The sample size is the size of each group. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired.
7. Select Save to save the sample size computation settings and resulting sample size to the current report, and then click Close.
Figure 7-1 The ANOVA Sample Size Results Viewed in the Report
For descriptions of computing the sample size for a One Way ANOVA, you can reference an appropriate statistics reference.

Determining the Minimum Sample Size for a Chi-Square Test
You can determine the sample size for a chi-square (χ²) analysis of a contingency table. A Chi-square test compares the difference between the expected and observed number of individuals of two or more different groups that fall within two or more categories. For more information, see "Chi-square Analysis of Contingency Tables" in Chapter 7.
The sample size for a chi-square analysis of a contingency table is determined by the estimated relative proportions in each category for each group. Because SigmaStat uses numbers of observations to compute these estimated proportions, you need to enter a contingency table in the worksheet containing the estimated number of observations before you can compute the estimated proportions.
To find the sample size for a Chi-square test:
1. Enter a contingency table into the worksheet by placing the estimated number of observations for each table cell in a corresponding worksheet cell.
Figure 1-1 Contingency Table Data Entered into the Worksheet
The worksheet rows and columns correspond to the groups and categories. The number of observations must always be an integer.

Note that the order and location of the rows or columns corresponding to the groups and categories is unimportant. You can use the rows for category and the columns for group, or vice versa.
2. From the menus select:
   Statistics
     Sample Size
       Chi-Square
The Pick Columns for Chi-Square Sample Size dialog box appears.
Figure 2-1 The Pick Columns for Chi-square Dialog Box
3. Select the columns of the contingency table from the worksheet as prompted.
4. Click Finish when you have selected all three columns.
The Chi-Square Sample Size dialog box appears.
Figure 4-1 The Chi-square Sample Size Dialog Box

5. Enter the desired power, or test sensitivity. Power is the probability that the chi-square test will detect a difference in observed distribution if there really is a difference. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting a difference with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a significant difference, but increase the possibility of concluding there is no effect when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the possibility of concluding there is an effect when none exists (a Type I error).
7. Click = to see the required sample size for a Chi-Square test at the specified conditions. The sample size calculation appears at the top of the dialog. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired. However, if you want to change the number of observations per category, you need to select Close, edit the table, then repeat the sample size computation.
8. Click Save to save the sample size computation settings and resulting sample size to the current report.
Figure 8-1 The Chi-square Sample Size Computation Results Viewed in the Report

9. Click Close to exit from Chi-Square test sample size computation.
For descriptions of computing the sample size required for a Chi-Square analysis of contingency tables, you can reference an appropriate statistics reference.

Determining the Minimum Sample Size to Detect a Specified Correlation
You can determine the sample size necessary to detect a specified Pearson Product Moment Correlation Coefficient R. A correlation coefficient quantifies the strength of association between the values of two variables. A correlation coefficient of 1 means that as one variable increases, the other increases exactly linearly. A correlation coefficient of -1 means that as one variable increases, the other decreases exactly linearly. For more information, see "Pearson Product Moment Correlation" in Chapter 8.
To determine the sample size necessary to detect a specified correlation coefficient, you need to specify the:
Expected value of the correlation coefficient.
Desired power or sensitivity of the test.
Alpha (α) used to determine the sample size.
To find the sample size required for a specific correlation coefficient:
1. From the menus select:
   Statistics
     Sample Size
       Correlation
The Correlation Sample Size dialog box appears.

Figure 1-1 The Correlation Sample Size Dialog Box
2. Enter the expected correlation coefficient in the Correlation Coefficient box. This can be the correlation coefficient you expect to see, as determined from previous experiments, or just an estimate.
3. Enter the desired power, or test sensitivity. Power is the probability that the correlation coefficient quantifies an actual association. The closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that there is an 80% chance of detecting an association with 1 - α confidence (i.e., a 95% confidence when α = 0.05).
4. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly concluding that there is an association. The traditional α value used is 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is an association when P < 0.05. Smaller values of α result in stricter requirements before concluding there is a true association, but a greater possibility of concluding there is no relationship when one exists (a Type II error). Larger values of α make it easier to conclude that there is an association, but also increase the risk of reporting a false positive (a Type I error).
5. Click = to see the required sample size of a correlation coefficient at the specified conditions. The sample size calculation appears at the top of the dialog. If desired, you can change any of the settings and click = again to view the new sample size as many times as desired.

6. Click Save to save the sample size computation settings and resulting sample size to the current report.
Figure 6-1 The Correlation Coefficient Sample Size Results Viewed in the Report
7. Click Close to exit from correlation coefficient sample size computation.
For descriptions of computing the sample size required to detect a correlation coefficient, you can reference an appropriate statistics reference.
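As a rough cross-check on the sample size reported for a correlation coefficient, the same three inputs (expected correlation, desired power, alpha) can be combined with the Fisher z approximation. The Python sketch below is a textbook approximation for a two-sided test, not SigmaPlot's documented method, so results may differ by a data point or two. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_sample_size(r, power=0.80, alpha=0.05):
    """Approximate number of data points needed to detect correlation r
    (two-sided test, Fisher z normal approximation)."""
    if r == 0 or not -1.0 < r < 1.0:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    # atanh(r) has standard error of about 1 / sqrt(n - 3); solve for n.
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3.0
    return math.ceil(n)


# Example input (made up): expected correlation of 0.3 at the traditional
# power of 0.80 and alpha of 0.05.
print(correlation_sample_size(0.3))
```

Weak correlations demand far more data than strong ones, because the required sample size grows roughly with the inverse square of the transformed correlation.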

Chapter 11
Generating Report Graphs
You can generate graphs for all test reports except Two Way Repeated Measures ANOVA, rates and proportions tests, Best Subset and Incremental Polynomial Regression, and Multiple Logistic reports.
To generate a report graph:
1. From the menus select:
   Graph
     Create Result Graph
The Create Result Graph dialog box appears displaying the available graphs for the selected report.
Tip: The Create Result Graph button and Create Result Graph command are dimmed if no report is selected or if the selected report does not generate a graph.
2. Select the report graph you want to create, then click OK, or double-click the graph in the list.

Figure 2-1 The Create Graph Dialog Box for a Report Graph
If you are generating a 2D graph or a 3D graph for a Multiple Linear or a Polynomial Regression with more than two independent variables, a dialog box appears asking you to specify the independent variables to plot.
Figure 2-2 The Select Independent Variable Dialog Box
3. Select the desired variables, then click OK.
The selected graph appears in a graph page window with the name of the page in the window title bar. The graph page is assigned to the test section of its associated report. Graph pages are named according to the type of graph created and are numbered incrementally.

Bar Charts of the Column Means
Bar charts of the column means are available for the following tests:

Descriptive Statistics. The Descriptive Statistics bar chart plots the group means as vertical bars with error bars indicating the standard deviation. For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.
t-test. The t-test bar chart plots the group means as vertical bars with error bars indicating the standard deviation. For more information, see "Unpaired t-Test" in Chapter 4.
One Way ANOVA. For more information, see "One Way Repeated Measures Analysis of Variance (ANOVA)" on page 200.
Figure 3-1 A Bar Chart of the Result Data for a t-test
If the graph data is indexed, the levels in the factor column are used as the tick marks for the bar chart bars, and the column titles are used as the X and Y axis titles. If the graph data is in raw or statistical format, the column titles are used as the tick marks for the bar chart bars and default X Data and Y Data axis titles are assigned to the graph.

Scatter Plot
The scatter plot is available for the following tests:

Descriptive Statistics. For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.
t-test. For more information, see "Unpaired t-Test" in Chapter 4.
One Way ANOVA. For more information, see "One Way Repeated Measures Analysis of Variance (ANOVA)" on page 200.

The scatter plot graphs the group means as single points with error bars indicating the standard deviation.

If the graph data is indexed, the levels in the factor column are used as the tick marks for the scatter plot points, and the column titles are used as the X and Y axis titles. If the graph data is in raw or statistical format, the column titles are used as the tick marks for the scatter plot points and default X Data and Y Data axis titles are assigned to the graph.

Figure 3-2

Point Plot

The point plot is available for the following tests:
Descriptive Statistics. For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.

t-test. For more information, see "Unpaired t-Test" in Chapter 4.
Rank Sum Test.
ANOVA on Ranks.

If the graph data is indexed, the levels in the factor column are used as the tick marks for the plot points, and the column titles are used as the X and Y axis titles. If the graph data is in raw or statistical format, the column titles are used as the tick marks for the plot points and default X Data and Y Data axis titles are assigned to the graph.

Figure 3-3 A Point Plot of the Result Data for an ANOVA on Ranks

Point Plot and Column Means

The point and column means plot is only available for Descriptive Statistics. For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.

The point and column means plot graphs all values in each column as a point on the graph with error bars indicating the column means and standard deviations of each column.
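The bar chart, scatter plot, and point and column means plot all draw error bars from each column's mean and standard deviation. As a rough illustration only, the Python sketch below computes those two quantities for raw-format data. It is not SigmaPlot code: the column titles and values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def column_summary(columns):
    """Return {column title: (mean, sample standard deviation)}."""
    summary = {}
    for title, values in columns.items():
        # stdev() uses the n - 1 denominator (sample standard deviation).
        summary[title] = (mean(values), stdev(values))
    return summary


# Hypothetical raw-format data: one worksheet column per group.
columns = {
    "Control": [4.0, 5.0, 6.0],
    "Treated": [7.0, 9.0, 11.0],
}

for title, (m, sd) in column_summary(columns).items():
    print(f"{title}: mean = {m:.2f}, SD = {sd:.2f}")
```

Each bar or point would sit at the mean, with the error bar extending one standard deviation from it.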

Figure 3-4 A Point and Column Means Plot of the Result Data for a Descriptive Statistics Test

Box Plot

The Rank Sum Test box plot graphs the percentiles and the median of column data. The ends of the boxes define the 25th and 75th percentiles, with a line at the median and error bars defining the 10th and 90th percentiles.

If the graph data is indexed, the levels in the factor column are used as the tick marks for the box plot boxes, and the column titles are used as the axis titles. If the graph data is in raw format, the column titles are used as the tick marks for the box plot boxes, and no axis titles are assigned to the graph.
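The five values a box plot draws (10th, 25th, 50th, 75th, and 90th percentiles) can be sketched in Python as shown below. This is an illustration under stated assumptions, not SigmaPlot's own calculation: the interpolation rule (linear interpolation between ranked values) is one of several common percentile definitions, and the column data is made up.

```python
def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between ranked values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (p / 100) * (len(ordered) - 1)  # fractional index into ordered
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def box_plot_summary(values):
    """Values a box plot needs: error bar ends, box ends, and median."""
    return {
        "p10": percentile(values, 10),  # lower error bar
        "p25": percentile(values, 25),  # bottom of the box
        "median": percentile(values, 50),
        "p75": percentile(values, 75),  # top of the box
        "p90": percentile(values, 90),  # upper error bar
    }


column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical column data
summary = box_plot_summary(column)
print(summary)
```

A different percentile definition would shift the box and error bar ends slightly for small samples, so results from this sketch may not match a report graph exactly.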

Figure 3-5 A Box Plot of the Result Data for the Rank Sum Test

The box plot is available for the following tests:
Descriptive Statistics. For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.
Rank Sum Test.
ANOVA on Ranks.
Repeated Measures ANOVA on Ranks. For more information, see "Friedman Repeated Measures Analysis of Variance on Ranks" on page 239.

Scatter Plot of the Residuals

The 2D scatter plot of the residuals is available for all of the regressions except the Multiple Logistic and the Incremental Polynomial Regressions. For more information, see "Prediction and Correlation" in Chapter 8.

The scatter plots of the residuals plot the raw residuals of the independent variables as points relative to the standard deviations. The X axis represents the independent variable values, the Y axis represents the residuals of the variables, and the horizontal lines running across the graph represent the standard deviations of the data.

Figure 3-6 Scatter Plot of the Simple Linear Regression Residuals with Standard Deviation

Bar Chart of the Standardized Residuals

Bar charts of the standardized residuals are available for all regressions except the Multiple Logistic and the Incremental Polynomial Regressions. For more information, see "Prediction and Correlation" in Chapter 8.

They plot the standardized residuals of the data in the selected independent variable column as points relative to the standard deviations.
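To make the difference between raw and standardized residuals concrete, here is a minimal Python sketch for a simple linear regression. It is an illustration, not SigmaPlot's implementation: the data is hypothetical, and the definition of a standardized residual used here (raw residual divided by the standard error of the estimate) is an assumption.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def residuals(x, y):
    """Return (raw residuals, standardized residuals)."""
    intercept, slope = fit_line(x, y)
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate, with n - 2 degrees of freedom.
    std_error = sqrt(sum(r ** 2 for r in raw) / (len(x) - 2))
    standardized = [r / std_error for r in raw]
    return raw, standardized


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
raw, standardized = residuals(x, y)
for xi, r, s in zip(x, raw, standardized):
    print(f"x = {xi}: raw = {r:+.3f}, standardized = {s:+.3f}")
```

Raw residuals are in the units of the dependent variable, while standardized residuals are unitless, which is why they can be read directly against standard deviation lines.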

Figure 3-7 A Multiple Linear Regression Bar Chart of the Standardized Residuals with Standard Deviations Using One Independent Variable

Histogram of Residuals

The histogram plots the raw residuals in a specified range, using a defined interval set. The residuals are divided into a number of evenly incremented histogram intervals and plotted as histogram bars indicating the number of residuals in each interval. The X axis represents the histogram intervals, and the Y axis represents the number of residuals in each group.
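The binning step described above can be sketched as follows. This is a hypothetical Python illustration, not SigmaPlot code: the function name, the range, and the number of intervals are assumptions chosen for the example, and residuals outside the specified range are simply skipped.

```python
def histogram_counts(residuals, low, high, n_intervals):
    """Count residuals in n_intervals equal-width intervals on [low, high]."""
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for r in residuals:
        if r < low or r > high:
            continue  # outside the specified range, so not counted
        index = int((r - low) / width)
        # A residual exactly equal to `high` belongs to the last interval.
        index = min(index, n_intervals - 1)
        counts[index] += 1
    return counts


# Hypothetical raw residuals.
residuals = [-1.5, -0.5, -0.4, 0.1, 0.2, 0.3, 1.2, 2.0]
counts = histogram_counts(residuals, low=-2.0, high=2.0, n_intervals=4)
print(counts)
```

Each entry in `counts` is the height of one histogram bar, ordered from the lowest interval to the highest.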

Figure 3-8 A Histogram of the Residuals for a t-Test

The histogram of residuals graph is available for the following tests:
t-test. For more information, see "Unpaired t-Test" in Chapter 4.
Paired t-Test. For more information, see "Paired t-Test" on page 177.
One Way ANOVA. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.
Two Way ANOVA. For more information, see "Two Way Analysis of Variance (ANOVA)" in Chapter 4.
Three Way ANOVA. For more information, see "Three Way Analysis of Variance (ANOVA)" in Chapter 4.
One Way Repeated Measures ANOVA. For more information, see "One Way Repeated Measures Analysis of Variance (ANOVA)" on page 200.
Two Way Repeated Measures ANOVA. For more information, see "Two Way Repeated Measures Analysis of Variance (ANOVA)" on page 218.
Linear Regression. For more information, see "Simple Linear Regression" in Chapter 8.
Multiple Linear Regression. For more information, see "Multiple Linear Regression" in Chapter 8.

Polynomial Regression. For more information, see "Polynomial Regression" in Chapter 8.
Stepwise Regression. For more information, see "Stepwise Linear Regression" in Chapter 8.
Nonlinear Regression.
Normality Test. For more information, see "Testing Normality" in Chapter 3.

Normal Probability Plot

The normal probability plot graphs the frequency of the raw residuals. The residuals are sorted and then plotted as points around a curve representing the area of the Gaussian plotted on a probability axis. Plots with residuals that fall along the Gaussian curve indicate that your data was taken from a normally distributed population. The X axis is a linear scale representing the residual values. The Y axis is a probability scale representing the cumulative frequency of the residuals.

Figure 3-9 Normal Probability Plot of the Residuals

The normal probability plot is available for the following test reports:

t-test. For more information, see "Unpaired t-Test" in Chapter 4.
Paired t-Test. For more information, see "Paired t-Test" on page 177.
One Way ANOVA. For more information, see "One Way Analysis of Variance (ANOVA)" in Chapter 4.
Two Way ANOVA. For more information, see "Two Way Analysis of Variance (ANOVA)" in Chapter 4.
Three Way ANOVA. For more information, see "Three Way Analysis of Variance (ANOVA)" in Chapter 4.
One Way Repeated Measures ANOVA. For more information, see "One Way Repeated Measures Analysis of Variance (ANOVA)" on page 200.
Two Way Repeated Measures ANOVA. For more information, see "Two Way Repeated Measures Analysis of Variance (ANOVA)" on page 218.
Linear Regression. For more information, see "Simple Linear Regression" in Chapter 8.
Multiple Linear Regression. For more information, see "Multiple Linear Regression" in Chapter 8.
Polynomial Regression. For more information, see "Polynomial Regression" in Chapter 8.
Stepwise Regression. For more information, see "Stepwise Linear Regression" in Chapter 8.
Nonlinear Regression.
Normality Test. For more information, see "Testing Normality" in Chapter 3.

2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals

The 2D line and scatter plots of the regressions are available for all of the regression reports, except Multiple Logistic and Incremental Polynomial Regressions. For more information, see "Prediction and Correlation" in Chapter 8.

They plot the observations of the regressions as a line/scatter plot. The points represent the data dependent variables plotted against the independent variables, the solid line running through the points represents the regression line, and the dashed lines represent the prediction and confidence intervals. The X axis represents the independent variables and the Y axis represents the dependent variables.
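The dashed confidence and prediction lines can be understood through the standard textbook formulas for simple linear regression. The Python sketch below evaluates both half-widths at a single X value. It is an illustration under stated assumptions, not necessarily how SigmaPlot computes its bands: the function name and data are hypothetical, and the critical t value is supplied by the caller because the Python standard library has no t distribution.

```python
from math import sqrt


def regression_bands(x, y, x0, t_crit):
    """Fitted value at x0 with confidence and prediction half-widths.

    t_crit is the two-sided critical t value for n - 2 degrees of
    freedom, looked up by the caller (for example, from a t table).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_res / (n - 2))  # standard error of the estimate
    leverage = 1 / n + (x0 - x_mean) ** 2 / sxx
    fitted = intercept + slope * x0
    conf_half = t_crit * s * sqrt(leverage)      # band for the mean response
    pred_half = t_crit * s * sqrt(1 + leverage)  # band for a new observation
    return fitted, conf_half, pred_half


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
T_95_DF3 = 3.182  # two-sided 95% critical t for 3 degrees of freedom
fitted, conf_half, pred_half = regression_bands(x, y, x0=3.0, t_crit=T_95_DF3)
print(f"fit = {fitted:.3f}, CI = +/-{conf_half:.3f}, PI = +/-{pred_half:.3f}")
```

The prediction interval is always wider than the confidence interval at the same X value, because it adds the scatter of an individual observation to the uncertainty in the fitted line.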

Figure 3-10 A Line/Scatter Plot of the Linear Regression Observations with a Regression and Confidence and Prediction Interval Lines

3D Residual Scatter Plot

The 3D residual scatter plots are available for the following test reports:
Two Way ANOVA. For more information, see "Two Way ANOVA Report Graphs" on page 122.
Two Way Repeated Measures ANOVA. For more information, see "Two way repeated measures ANOVA report graphs" on page 238.
Multiple Linear Regression. For more information, see "Multiple Linear Regression Report Graphs".
Stepwise Regression. For more information, see "Stepwise Regression Report Graphs".

They plot the residuals of the two selected columns of independent variable data. The X and the Y axes represent the independent variables, and the Z axis represents the residuals.

Figure 3-11 A Multiple Linear Regression 3D Residual Scatter Plot of the Two Selected Independent Variable Columns

Grouped Bar Chart with Error Bars

This graph is available for the Two Way ANOVA. For more information, see "Interpreting Two Way ANOVA Results".

It plots the data means with error bars indicating the standard deviations for each level of the factor columns. The levels in the first factor column are used as the X axis tick marks, and the title of the first factor column and the data column are used as the X and the Y axis titles. The first bar in the group represents the first level of the second factor column and the second bar in the group represents the second level in the second factor column.
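As an illustration of what each bar in a grouped bar chart represents, the following Python sketch groups indexed two-factor data by its factor levels and computes the mean and standard deviation of each group. It is not SigmaPlot code: the factor names, levels, and values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def cell_summary(rows):
    """Group indexed rows by (factor 1 level, factor 2 level).

    Returns {(level1, level2): (mean, sample standard deviation)}.
    """
    cells = defaultdict(list)
    for level1, level2, value in rows:
        cells[(level1, level2)].append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in cells.items()}


# Hypothetical indexed data: factor 1 level, factor 2 level, data value.
rows = [
    ("Low", "Male", 10.0), ("Low", "Male", 12.0),
    ("Low", "Female", 14.0), ("Low", "Female", 18.0),
    ("High", "Male", 20.0), ("High", "Male", 24.0),
    ("High", "Female", 30.0), ("High", "Female", 32.0),
]

summary = cell_summary(rows)
for (level1, level2), (m, sd) in summary.items():
    print(f"{level1} / {level2}: mean = {m:.2f}, SD = {sd:.2f}")
```

In the chart, the first factor's levels ("Low", "High" here) would be the X axis tick marks, and the second factor's levels would be the bars within each group.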

Figure 3-12 A Two Way ANOVA Grouped Bar Chart with Error Bars

3D Category Scatter Graph

This graph is available for the Two Way ANOVA and the Two Way Repeated Measures ANOVA.

The 3D Category Scatter plot graphs the two factors from the independent data columns along the X and Y axes against the data of the dependent variable column along the Z axis. The tick marks for the X and Y axes represent the two factors from the independent variable columns, and the tick marks for the Z axis represent the data from the dependent variable column.

Figure 3-13 A Two Way ANOVA 3D Category Scatter Plot

Before and After Line Plots

The before and after line plot uses lines to plot a subject's change after each treatment.

If the graph plots raw data, the lines represent the rows in the column, the column titles are used as the tick marks for the X axis and the data is used as the tick marks for the Y axis.

If the graph plots indexed data, the lines represent the levels in the subject column, the levels in the treatment column are used as the tick marks for the X axis, the data is used as the tick marks for the Y axis, and the treatment and data column titles are used as the axis titles.
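To show how indexed data maps onto the lines of a before and after plot, the Python sketch below turns (subject, treatment, value) rows into one series per subject. It is a hypothetical illustration, not SigmaPlot code: the function name, subject labels, and values are all made up.

```python
def subject_lines(rows, treatments):
    """Turn indexed rows into one line (list of values) per subject.

    rows: (subject, treatment, value) tuples.
    treatments: treatment levels in X axis order.
    """
    by_subject = {}
    for subject, treatment, value in rows:
        by_subject.setdefault(subject, {})[treatment] = value
    # A missing treatment for a subject shows up as None in its line.
    return {
        subject: [values.get(t) for t in treatments]
        for subject, values in by_subject.items()
    }


# Hypothetical indexed data: subject, treatment, data value.
rows = [
    ("S1", "Before", 5.0), ("S1", "After", 7.5),
    ("S2", "Before", 6.0), ("S2", "After", 5.5),
]

lines = subject_lines(rows, ["Before", "After"])
for subject, values in lines.items():
    print(subject, values)
```

Raw data needs no such step, because each worksheet row already holds one subject's values across the treatment columns.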

Figure 3-14 A Before and After Plot Displaying Data for a Paired t-Test

The before and after line plot is available for the following tests:
Paired t-test. For more information, see "Paired t-Test" on page 177.
Signed Rank Test. For more information, see "Wilcoxon Signed Rank Test" on page 190.
One Way Repeated Measures ANOVA. For more information, see "One Way Repeated Measures Analysis of Variance (ANOVA)" on page 200.
Repeated Measures ANOVA on Ranks. For more information, see "Friedman Repeated Measures Analysis of Variance on Ranks" on page 239.

Multiple Comparison Graphs

The multiple comparison graphs are available for all ANOVA reports. They plot significant differences between levels of a significant factor. There is one graph for every significant factor reported by the specified multiple comparison test. If there is one significant factor reported, one graph appears; if there are two significant factors, two graphs appear, and so on. If a factor is not reported as significant, a graph for the factor does not appear.

Figure 3-15 A Multiple Comparison Graph

Scatter Matrix

The matrix of scatter graphs is available for all the Pearson and the Spearman Correlation reports. The matrix is a series of scatter graphs that plot the associations between all possible combinations of variables.

The first row of the matrix represents the first set of variables or the first column of data, the second row of the matrix represents the second set of variables or the second data column, and the third row of the matrix represents the third set of variables or third data column. The X and Y data for the graphs correspond to the column and row of the graph in the matrix.

For example, the X data for the graphs in the first row of the matrix is taken from the second column of tested data, and the Y data is taken from the first column of tested data. The X data for the graphs in the second row of the matrix is taken from the first column of tested data, and the Y data is taken from the second column of tested data. The X data for the graphs in the third row of the matrix is taken from the second column of tested data, and the Y data is taken from the third column of tested data.

The number of graph rows in the matrix is equal to the number of data columns being tested.

Figure 3-16 A Scatter Matrix for a Pearson Correlation

Profile Plots

A profile plot is a line plot where the horizontal axis represents the levels of one factor and the vertical axis represents the experiment's data. Profile plots are useful when you want to compare the least square means, also called estimated marginal means, in a multifactor ANOVA model. The least square means have the same scale as the data and are positioned relative to the data axis for each factor level on the horizontal axis.

Differences in the means, or effects, among the levels of a specified factor, when computed over a range of levels of the remaining factors, determine how the data is affected by that factor and its interaction with other factors. Profile plots provide a quick qualitative assessment of the various treatment effects so that the investigator can determine the impact of

each factor on the data. The hypothesis testing in ANOVA reports quantifies these effects to determine if any of the differences are statistically significant.

In an ANOVA, the least square means are first computed for the individual cells. A cell is defined as the collection of observations made for a particular combination of levels, where one level is selected from each factor. Generally, the cell means are obtained as the predicted values in a regression model that is associated with the ANOVA model. The cell means determine the two-way interaction effects in a Two-Way ANOVA and the three-way interaction effects in a Three-Way ANOVA.

If the cell means are averaged over all levels of one factor while fixing the levels of the remaining factors, you obtain lower-order effects. This is how the main effects are computed in Two-Way ANOVA and the two-way interaction effects are computed in Three-Way ANOVA. Finally, the main effects for a given factor in a Three-Way ANOVA are determined by averaging the cell means over all levels of the remaining two factors while fixing each level of the given factor.

Profile Plots - Main Effects

Profile Plots - Main Effects graphs are available for the following tests:
Two Way Analysis of Variance (ANOVA). For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.
Three Way Analysis of Variance (ANOVA)

Profile Plots - 2Way Effects

Profile Plots - 2Way Effects graphs are available for the following tests:
Two Way Analysis of Variance (ANOVA). For more information, see "Describing Your Data with Basic Statistics" in Chapter 3.
Three Way Analysis of Variance (ANOVA)

Profile Plots - 3Way Effects

Profile Plots - 3Way Effects graphs are available for the following test:
Three Way Analysis of Variance (ANOVA)
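The averaging of cell means described in this section can be sketched for a Two-Way ANOVA as shown below. This Python example is an illustration under stated assumptions, not SigmaPlot's implementation: it assumes a full factorial model in which every cell contains data, so that each cell's predicted value equals its sample mean, and the factor levels and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows):
    """Mean of the observations in each (factor A level, factor B level) cell."""
    cells = defaultdict(list)
    for a, b, value in rows:
        cells[(a, b)].append(value)
    return {key: mean(vals) for key, vals in cells.items()}


def main_effect_means(cells, factor_index):
    """Average the cell means over the other factor, per level of one factor.

    factor_index is 0 for factor A and 1 for factor B.
    """
    by_level = defaultdict(list)
    for key, cell_mean in cells.items():
        by_level[key[factor_index]].append(cell_mean)
    return {level: mean(vals) for level, vals in by_level.items()}


# Hypothetical unbalanced data: factor A level, factor B level, data value.
rows = [
    ("A1", "B1", 10.0), ("A1", "B1", 12.0),
    ("A1", "B2", 20.0),
    ("A2", "B1", 14.0),
    ("A2", "B2", 30.0), ("A2", "B2", 34.0),
]

cells = cell_means(rows)
means_a = main_effect_means(cells, 0)
means_b = main_effect_means(cells, 1)
print("Cell means:", cells)
print("Factor A means:", means_a)
print("Factor B means:", means_b)
```

Because the cell means are averaged with equal weight, the result for level A1 (15.5) differs from the plain average of the three A1 observations (14.0). That difference appears whenever cell sizes are unequal.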
