CBGSM&EScience

Student Research

How To Run Statistical Tests in Excel

Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting simple mathematical operations on your numbers. It can also run the five basic Statistical Tests. It does have some limitations, however, and for certain tests you may have to turn to a more powerful statistical program like S-Plus or Minitab.

NOTE: The statistical tests are under the Tools menu > Data Analysis. If you do not see Data Analysis anywhere, you will have to add in the Analysis ToolPak, as follows: Tools > Add-Ins > Analysis ToolPak. If at RCC, your computer should find it via the network. If at home, it will probably ask for your Microsoft Office CD.

Alert! The example Data Sets given below were fabricated to fit the example experiments described in Statistical Testing For Dummies.

Data Organization and Descriptive Stats

Initially you'll want to organize your raw data by treatment groups, each in its own column, as shown below. Later, however, for certain tests you'll have to stack the columns (e.g., for Regression and Two-Way ANOVA). This is easy to do in Excel by copying and pasting cells.
            Untrimmed                           Trimmed
            High Marsh  Mid Marsh   Low Marsh   High Marsh  Mid Marsh   Low Marsh
Raw Data    12          9           7           6           7           6
            15          8           12          2           3           1
            7           16          15          7           5           8
            3           5           4           8           9           6
            11          13          10          4           4           3

N           5           5           5           5           5           5
Mean        9.6         10.2        9.6         5.4         5.6         4.8
StdDev      4.7         4.3         4.3         2.4         2.4         2.8
S.E.        2.1         1.9         1.9         1.1         1.1         1.2
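For readers who also work outside Excel, the "stacking" idea can be sketched in a few lines of Python. This is only an illustration and not part of the Excel procedure: the names `wide` and `stack_columns` are made up for this example, and only the three Untrimmed columns from the table above are used.

```python
# Hypothetical illustration: turn side-by-side columns into one stacked table
# with a label column and a value column, as a regression layout requires.
wide = {
    "High Marsh": [12, 15, 7, 3, 11],
    "Mid Marsh": [9, 8, 16, 5, 13],
    "Low Marsh": [7, 12, 15, 4, 10],
}


def stack_columns(columns):
    """Return a list of (group, value) rows, one row per observation."""
    rows = []
    for group, values in columns.items():
        for value in values:
            rows.append((group, value))
    return rows


stacked = stack_columns(wide)
for group, value in stacked:
    print(f"{group}\t{value}")
```

Each printed line is one row of the stacked table, which is the same result you get in Excel by copying each column and pasting it beneath the previous one.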

After organizing your raw data this way, you'll want to calculate Descriptive Statistics for each column. Excel has a ready-made function for each of these except the last. Use COUNT for Sample Size (N), AVERAGE for the Mean, and STDEV for the Standard Deviation. The final stat is the Standard Error in the Mean, which you calculate simply as the standard deviation divided by the square root (SQRT in Excel) of the sample size:

S.E. = StdDev / √N

This is an important stat, as it's probably what you'll use for Error Bars on your graphs!

Hey! Don't forget the little black box trick! Once you plug in all the stat formulas under the first data column, you can simply highlight those cells, grab the little black box in the lower right corner, and drag to the right. It carries the formulas across!
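As a cross-check on the four descriptive statistics, here is a minimal Python sketch using only the standard library. The function name `describe` is hypothetical; the data are the Untrimmed High Marsh column from the table above, and `statistics.stdev` is the sample standard deviation (n - 1 in the denominator), which is what Excel's STDEV computes.

```python
import math
import statistics

# Untrimmed / High Marsh column from the example data set.
untrimmed_high_marsh = [12, 15, 7, 3, 11]


def describe(values):
    """Return N, mean, sample standard deviation and standard error."""
    n = len(values)                       # Excel: COUNT
    mean = statistics.mean(values)        # Excel: AVERAGE
    std_dev = statistics.stdev(values)    # Excel: STDEV (sample SD, n - 1)
    std_err = std_dev / math.sqrt(n)      # S.E. = StdDev / sqrt(N)
    return n, mean, std_dev, std_err


n, mean, std_dev, std_err = describe(untrimmed_high_marsh)
print(f"N={n}  Mean={mean:.1f}  StdDev={std_dev:.1f}  S.E.={std_err:.1f}")
```

Rounded to one decimal place, the results agree with the first column of the table (N = 5, Mean = 9.6, StdDev = 4.7, S.E. = 2.1).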

Standard t-test

1. Running this test is easy. Excel wants your data in two columns, one for each group or treatment level. Give each column a heading. See the example data below.
2. Under the Tools menu select Data Analysis and choose t-Test: Two-Sample Assuming Equal Variances. OK.

3. Excel asks you to specify the range of cells containing the data. Click the first red, white, & blue icon, then highlight your first column of cells, including its heading. Enter. Now click the second red, white, & blue icon, and highlight your second column, including the heading. Enter.

Control     Experimental
12          18
9           24
14          15
20          19
17          19
11          13
10          22
14          20

4. Check the Labels box, so Excel knows you included headings atop each column. OK.
5. Excel whips out an Output table. You can quickly resize the columns by double-clicking up top between the A & B, between the B & C, and between the C & D. There's lots of info here, but all you're really after are those p-values. Use the two-tailed p-value if your original hypothesis predicted that the means would merely be different (≠). Usually, however, you will have specifically predicted one mean higher than the other (< or >). In that case (and if in fact the means match your prediction of greater than or less than), go with the smaller one-tailed p-value.

Paired t-test

1. You can use the powerful paired t-test if (and only if) your study employed a paired design in which a pair of data were collected in parallel from each individual, mirror-image style, such as left versus right or before versus after. Here again, Excel wants your data in two columns, one for each treatment level. Give each column a heading.
2. Under the Tools menu select Data Analysis and choose t-Test: Paired Two Sample for Means. OK.
3. Excel asks you to specify the range of cells containing the data. Click the first red, white, & blue icon, then highlight your first column of cells, including its heading. Enter. Now click the second red, white, & blue icon, and highlight your second column, including the heading. Enter.
4. Check the Labels box, so Excel knows you included headings atop each column. OK.
5. Excel whips out an Output table. You can quickly resize the columns by double-clicking up top between the A & B, between the B & C, and between the C & D. There's lots of info here, but all you're really after are those p-values. Use the two-tailed p-value if your original hypothesis predicted that the means would merely be different (≠). Usually, however, you will have specifically predicted one mean higher than the other (< or >). In that case (and if in fact the means match your prediction of greater than or less than), go with the smaller one-tailed p-value.
Portside    Starboard
537         570
241         234
77          84
427         411
220         282
96          92
625         700
395         450
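To show what the two t-tests calculate behind the scenes, here is a rough Python sketch of the t statistics themselves. It is an illustration under stated assumptions, not a replacement for the Excel output: the function names `pooled_t` and `paired_t` are made up for this example, the Control/Experimental values are read as side-by-side pairs from the flattened example table (an assumption about its layout), and no p-values are computed because Python's standard library has no t distribution. Excel's output table remains the place to read the p-values.

```python
import math
import statistics

# Standard t-test example (layout of the flattened table is an assumption).
control = [12, 9, 14, 20, 17, 11, 10, 14]
experimental = [18, 24, 15, 19, 19, 13, 22, 20]

# Paired t-test example: one Portside and one Starboard value per individual.
portside = [537, 241, 77, 427, 220, 96, 625, 395]
starboard = [570, 234, 84, 411, 282, 92, 700, 450]


def pooled_t(group_a, group_b):
    """t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


def paired_t(first, second):
    """t statistic and degrees of freedom for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


t_pooled, df_pooled = pooled_t(control, experimental)
t_paired, df_paired = paired_t(starboard, portside)
print(f"Standard t-test: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Paired t-test:   t = {t_paired:.2f}, df = {df_paired}")
```

The paired version works on the within-pair differences only, which is why it can be more sensitive when individuals vary a lot from one another, as the Portside/Starboard values do.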

One-Way ANOVA (Single Factor ANOVA)

1. Here, too, Excel wants your data in side-by-side columns, one for each group or treatment level. Give each column a heading.
2. Under the Tools menu select Data Analysis and choose ANOVA: Single Factor. OK.
3. Excel asks you for a single range of cells containing ALL the data. Click the red, white, & blue icon, then highlight all three (or more) columns of cells, including their headings. Enter.
4. Check the Labels box, so Excel knows you included headings atop each column. OK.

Red         Yellow      Blue
5.1         2.9         5.4
4.9         3.4         5.9
5.3         3.7         6.2
4.4         2.7         5.2
5.5         2.5         5
5.6         3.4         5.9
3.9         2.1         4.6
4.2         2.3         4.8
4.7         4.1         6.6
5.6         2.1         4.6

5. Excel whips out an Output table. You can quickly resize the columns by double-clicking up top between the A & B, between the B & C, etc. There's lots of info here, but all you're really after is that Between Groups p-value.

All data is naturally variable or noisy. The ANOVA test attempts to detect a signal of genuine difference amidst all that noise. More precisely, it partitions the natural variance within the groups (the noise) from the variance between the groups (the signal). If the differences between the groups are substantially greater than the differences within the groups, then we say that there's a strong signal-to-noise ratio. And the stronger the signal-to-noise ratio, the lower the p-value!

Important Note! All an ANOVA test can tell you is whether there are statistically significant differences somewhere in the data as a whole. But it cannot tell you just where those differences lie. For example, run an ANOVA on the data above, and you'll get a very low p-value. This means that the independent variable (color of light) does affect the response variable (phytoplankton growth). But it doesn't tell you which colors affect growth differently from which other colors. You can plainly see that the yellow mean is different from the red and blue means, thus giving us our low p-value. But are the red and blue means different from each other (at 95%+ confidence)? The ANOVA itself can only tell you that at least one group in there is different from some other group in there, but not which ones. Therefore IF (and only if) your Between Groups p-value falls below 0.05, then you will want to run a second test called a Multiple Comparisons test (like Tukey's test) in order to pinpoint just where the real differences lie. Unfortunately, this is something that Excel can't do for you, so you will have to turn to some other program such as S-Plus or Minitab. Consult teacher for help.
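The signal-to-noise idea can be made concrete with a short Python sketch that computes the F ratio for the Red/Yellow/Blue example by hand. This is an illustrative calculation only: the function name `one_way_f` is made up, and it stops at the F ratio because turning F into a p-value needs the F distribution, which Python's standard library does not provide. Excel's output table remains the place to read the p-value.

```python
import statistics

groups = {
    "Red": [5.1, 4.9, 5.3, 4.4, 5.5, 5.6, 3.9, 4.2, 4.7, 5.6],
    "Yellow": [2.9, 3.4, 3.7, 2.7, 2.5, 3.4, 2.1, 2.3, 4.1, 2.1],
    "Blue": [5.4, 5.9, 6.2, 5.2, 5, 5.9, 4.6, 4.8, 6.6, 4.6],
}


def one_way_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # Signal: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(values) * (statistics.mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Noise: how far each observation sits from its own group mean.
    ss_within = sum(
        (v - statistics.mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = one_way_f(groups)
print(f"F = {f_ratio:.1f} with {df_between} and {df_within} degrees of freedom")
```

A large F means the between-group variance dwarfs the within-group variance, which is the situation that produces a low Between Groups p-value in Excel.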

Linear Regression

1. To run a regression, you first need to stack your data as shown below. The independent variable goes on the left, the response variable on the right. This probably isn't the way you originally arranged your data, but it's easy to stack it by copying and pasting. In effect, you're setting your data up in ordered pairs (X, Y).

Depth (X)   Fish (Y)
1           43
1           55
1           58
1           79
1           53
1           49
2           60
2           33
2           44
2           39
2           41
2           50
3           38
3           34
3           19
3           29
3           24
3           31
4           18
4           16
4           5
4           25
4           17
4           19
5           0
5           2
5           7
5           4
5           0
5           5

2. Under the Tools menu select Data Analysis and choose Regression. OK.
3. Excel asks you for two ranges of cells, one containing the Y values (i.e., your response variable), and one containing the X values (i.e., your independent variable). Click each red, white, & blue icon, then highlight the appropriate columns of cells, including their headings. Enter.
4. Check the Labels box, so Excel knows you included headings atop each column. Also, check the Line Fit Plots box to generate a graph of your data and a best-fit line. OK.
5. You can quickly resize the columns by double-clicking up top between the A & B, between B & C, etc. There's lots of info here, but only four pieces of interest to you:
o The slope coefficient (labeled with the name of your independent variable, in this case Depth) and the intercept coefficient. These respectively correspond to the slope (m) and the y-intercept (b) of your best-fit line, and you can plug them into y = mx + b to get the equation of that line.

o The p-value for the slope (not the p-value for the y-intercept, which you usually don't care about). If p < .05, then you can reject the null hypothesis that the independent variable has no effect on the response variable. After all, a positive or negative slope is what you were after, and the steeper it is, the stronger the relationship.
o The R Square value. This is a number ranging from 0 to 1, and is a measure of how tightly your data points fit the best-fit line. An R Square of 1.0 is a perfect fit, with every point falling right on the line, and zero means there's absolutely no pattern or fit whatsoever. In the example here, the regression returns an R Square of 0.86, or 86%. A scientist would say that the independent variable (depth) explains 86% of the variation in the response variable (fish).
6. Excel also gave you a graph of the data and the best-fit line, but it's probably all scrunched together. Grab a corner and drag to make it bigger. To widen your plot even more, go ahead and delete the legend (click it, then hit delete). Finally, double-click one of the best-fit points (probably pink), then give it a solid line under the Patterns tab. How's it look?
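The slope, intercept and R Square that Excel reports can be reproduced with a short least-squares calculation. The Python sketch below is a minimal illustration using the Depth/Fish example data; the function name `fit_line` is made up, and the p-value for the slope is left to Excel because it requires the t distribution.

```python
# Depth (X) and Fish (Y) from the stacked example data, six samples per depth.
depth = [1] * 6 + [2] * 6 + [3] * 6 + [4] * 6 + [5] * 6
fish = [
    43, 55, 58, 79, 53, 49,
    60, 33, 44, 39, 41, 50,
    38, 34, 19, 29, 24, 31,
    18, 16, 5, 25, 17, 19,
    0, 2, 7, 4, 0, 5,
]


def fit_line(x, y):
    """Ordinary least squares: return slope m, intercept b and R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(depth, fish)
print(f"y = {slope:.2f}x + {intercept:.2f},  R Square = {r_squared:.2f}")
```

The R Square from this calculation rounds to 0.86, matching the value quoted above, and the negative slope says that fish counts fall as depth increases.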

Note: The data above come from a replicated experiment where fish were repeatedly sampled at a handful of depths at fixed, regular intervals. Regression also works fine when the treatments are spaced at irregular intervals. For example, the study might have instead used depths of 1, 2, 4, & 7. And you can also use a regression to analyze data from a non-replicated study. Suppose you're interested in whether fiddler crabs avoid the edges of a marsh due to the threat of predation. You count the number of burrows per square meter at randomly chosen distances from the waterline. You can now run a regression to see if there's a statistically significant correlation here. Once again, just stack your data in XY pairs, as in the table below.

Dist to Edge (X)   Crab Burrows (Y)
4.7                3
18.6               6
17.9               7
7.7                4
18.7               6
21.7               7
11                 4
4.7                0
22.3               6
4.5                2
12.2               5
20.5               6
0.2                1
12.6               4
1.1                3
1.7                1
24.4               8

Two-Way ANOVA

1. To run a Two-Way ANOVA, you first need to organize your data as shown below, with one independent variable's treatments across the top, and the other IV's treatments stacked atop one another. (Note: How the numbers are aligned within the cells, whether left, centered, or right, makes no difference to the test.)

              Untrimmed   Trimmed
High Marsh    12          6
              15          2
              7           7
              3           8
              11          4
Mid Marsh     9           7
              8           3
              16          5
              5           9
              13          4
Low Marsh     7           6
              12          1
              15          8
              4           6
              10          3

2. Under the Tools menu select Data Analysis and choose ANOVA: Two-Factor With Replication. OK.
3. Excel asks you for a single range of cells containing your data. Click the red, white, & blue icon, then highlight ALL the cells containing your data, including the labels and headings. Enter.

4. In the Rows per sample box, enter your sample size per group. In the example here, N = 5. Note: to run a 2-way ANOVA in Excel, you must have balanced data, meaning that every group has the same number of numbers (no NAs). If your data is unbalanced, consult your teacher.
5. OK. Excel kicks out lots of info. What you're mainly after are the p-values down at the bottom. There are three of them. The Sample p-value tells you whether or not there are statistically significant differences between levels of your first IV, the one you have organized horizontally by rows: in this case, High vs. Mid vs. Low marsh. The Columns p-value tells you whether or not there are statistically significant differences between levels of your second IV, the one you have organized vertically in columns: in this case, Untrimmed vs. Trimmed grass. The Interaction p-value tells you whether there was a statistically significant interaction between the two IVs. This is one of the great things about a 2-way ANOVA: it not only can analyze the influence of each IV on the RV, but also can sniff out interactive effects between those two IVs. For example, does trimming the grass affect the snails more at high elevation than it does at low elevation? Does elevation have more of an effect on snail density for trimmed grass than for untrimmed grass? The ability to detect an Interaction is one of the most powerful advantages of a 2-way experimental design.

Important Note! All an ANOVA test can tell you is whether there are statistically significant differences somewhere in the data. But it can't tell you just where those differences lie. IF (and only if) either your Sample or Columns p-value falls below 0.05, and if you have 3 or more treatment levels under each IV, then you will want to run a second test called a Multiple Comparisons test in order to pinpoint just where the real differences lie. Unfortunately, this is something that Excel can't do for you, so you will have to turn to some other program such as S-Plus. Consult teacher for help.
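To make the three effects concrete, the following Python sketch works out the F ratios for the marsh example by hand. It is an illustration under stated assumptions: the function name `two_way_f` and the dictionary layout are made up for this example, the result keys borrow Excel's Sample, Columns and Interaction labels, and the code assumes balanced data with the same number of replicates in every cell. It stops at the F ratios, since p-values need the F distribution, which Python's standard library lacks.

```python
import statistics

# Example counts keyed by (marsh elevation, grass treatment), five per cell.
cells = {
    ("High Marsh", "Untrimmed"): [12, 15, 7, 3, 11],
    ("High Marsh", "Trimmed"): [6, 2, 7, 8, 4],
    ("Mid Marsh", "Untrimmed"): [9, 8, 16, 5, 13],
    ("Mid Marsh", "Trimmed"): [7, 3, 5, 9, 4],
    ("Low Marsh", "Untrimmed"): [7, 12, 15, 4, 10],
    ("Low Marsh", "Trimmed"): [6, 1, 8, 6, 3],
}


def two_way_f(cells):
    """F ratios for rows (Sample), columns and interaction; balanced data only."""
    row_levels = sorted({row for row, _ in cells})
    col_levels = sorted({col for _, col in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if any(len(values) != n for values in cells.values()):
        raise ValueError("unbalanced data: every cell needs the same count")

    grand = statistics.mean([v for values in cells.values() for v in values])

    def level_mean(index, level):
        # Mean of every observation whose key matches the given level.
        return statistics.mean(
            [v for key, values in cells.items() if key[index] == level for v in values]
        )

    ss_rows = n * len(col_levels) * sum(
        (level_mean(0, r) - grand) ** 2 for r in row_levels
    )
    ss_cols = n * len(row_levels) * sum(
        (level_mean(1, c) - grand) ** 2 for c in col_levels
    )
    ss_cells = n * sum(
        (statistics.mean(values) - grand) ** 2 for values in cells.values()
    )
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_within = sum(
        (v - statistics.mean(values)) ** 2
        for values in cells.values()
        for v in values
    )

    df_rows = len(row_levels) - 1
    df_cols = len(col_levels) - 1
    ms_within = ss_within / (len(row_levels) * len(col_levels) * (n - 1))
    return {
        "Sample": (ss_rows / df_rows) / ms_within,
        "Columns": (ss_cols / df_cols) / ms_within,
        "Interaction": (ss_inter / (df_rows * df_cols)) / ms_within,
    }


f_ratios = two_way_f(cells)
for effect, f_value in f_ratios.items():
    print(f"{effect}: F = {f_value:.2f}")
```

In this fabricated data set the Columns F ratio (Untrimmed vs. Trimmed) is far larger than the Sample and Interaction ratios, so trimming is the only effect that stands out from the noise.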