Sie sind auf Seite 1von 17

Statistics for Ecologists

Using R and Excel

Data collection, exploration,


analysis and presentation

Mark Gardener

DATA IN THE WILD SERIES

Pelagic Publishing | www.pelagicpublishing.com


Published by Pelagic Publishing
www.pelagicpublishing.com
PO Box 725, Exeter, EX1 9QU

Statistics for Ecologists Using R and Excel


Data collection, exploration, analysis and presentation

ISBN 978-1-907807-12-1 (Pbk)


ISBN 978-1-907807-13-8 (Hbk)

Copyright 2012 Mark Gardener

All rights reserved. No part of this document may be produced, stored in a retrieval sys-

recording or otherwise without prior permission from the publisher.

of the information presented, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents
and distributors will be held liable for any damage or loss caused or alleged to be caused
directly or indirectly by this book.

more information visit www.apple.com.

British Library Cataloguing in Publication Data


A catalogue record for this book is available from the British Library.

Cover image istockphoto.com/dulezidar


About the author
Mark began his career as an optician but returned to science and trained as an ecologist.
His research is in the area of pollination ecology. He has worked extensively in the UK as
well as Australia and the United States. Currently he works as an associate lecturer for the
Open University and also runs courses in data analysis for ecology and environmental
science.

Acknowledgements
I am especially grateful to Nigel Massen at Pelagic Publishing for his help and persever-
ance throughout the production of this book.

Thanks go to Anne Goodenough for patiently and thoroughly reviewing the manuscript,
your comments and views were most helpful.

With a book of this nature data examples are always useful. Some of the data illustrated

for allowing me to use these data as examples.

process.

Software used
spreadsheet were used in the preparation of this

although other versions may also be illustrated (including Excel X for Apple Macintosh).

Several versions of the R program were used and illustrated including 2.8.1. for Windows
Downloading free code examples
statistics can be found at:

Reader feedback
We welcome feedback from readers please email us at info@pelagicpublishing.com and

of your email.

Publish with Pelagic Publishing

with a particular focus on ecology, conservation and environment. Pelagic Publishing pro-
duces books that set new benchmarks, share advances in research methods and encourage
and inform wildlife investigation for all.

If you are interested in publishing with Pelagic please contact editor@pelagicpublishing.

statement describing the impact you would like your book to have on readers.
Contents

Introduction viii

1. Planning 1

2. Data recording 23

3. Beginning data exploration using software tools 29

4. Exploring data looking atnumbers 57


vi | Contents

5. Exploring data which test isright? 91

6. Exploring data using graphs 95

7. Tests for differences 103


t-test 103
U-test 112

8. Tests for linking data correlations 123

9. Tests for linking data associations 147

10. Differences between more than two samples 161

11. Tests for linking several factors 195

12. Reporting results 239


Contents | vii

13. Summary 315


Glossary 317
Index 322
Introduction

here, this book is about the processes involved in looking at data. These processes involve
planning what you want to do, writing down what you found and writing up what your
analyses showed. The statistics part is also in there of course but this is not a course
in statistics. By the end I hope that you will have learnt some statistics but in a practical
way, i.e. what statistics can do for you

as well) and a computer program called R. The spreadsheet will allow you to collect your
data in a sensible layout and also do some basic analyses (as well as a few less basic ones).
The R program will do much of the detailed statistical work (although we will also use

get the job done.

and may be summarised by four main headings:


Planning
Data recording
Data exploration
Reporting results
The book is arranged into these four broad categories. The sections are rather uneven in
size and tend to focus on the analysis. The section on reporting also covers presentation
of analyses (e.g. graphs).

Although the emphasis is on ecological work and many of the data examples are of that
sort, I hope that other scientists and students of other disciplines will see relevance to what
they do.

Mark Gardener 2011


6. Exploring data using graphs
Graphs are useful for several reasons. They can help us to visualise the data and decide
-

tackle the data, and secondly to present results. We will look at details of graphs and how
to produce them in Excel and R in Section 12.4 where we examine ways to present

analytical methods to examine our data. Indeed we have already seen some examples in
Chapter 4. In this short chapter we will summarise the graphs we might use to help us
explore our data.

6.1 Exploratory graphs


One of the most common analysis of sample of data is to determine if they are normally

data. There are several ways we can illustrate the distribution of a data sample. We may
use a simple tally plot or a stemleaf plot; we can even do this right from our notebook in

1 | 679
2 | 112334
2 | 5666678899
3 | 01124
3 | 6

In this example, the data are sorted in numerical order in each row but we can still gain
insights into the data distribution if the numbers are not sorted.

1 | 967
2 | 143123
2 | 9568667869
3 | 40121
3 | 6

A simpler version of a stemleaf plot is the tally plot, and in this case we enter the data as
a simple tally mark. In Table 28, we see a tally plot of the same data as our stemleaf plot.
96 | Statistics for Ecologists Using R and Excel

Table 28. A tally plot to show data distribution

Tally Bin
x 16
x 18
x 20
xxx 22
xxx 24
xxxxx 26
xxx 28
xxx 30
xxx 32
x 34
x 36

These are simple plots but nevertheless can be extremely helpful. When we return from

Figure 78. A histogram to illustrate the distribution of a data sample

dataset that lie within each size class, represented on the x-axis. We may decide to use a
6. Exploring data using graphs | 97

Figure 79. A density plot to illustrate the distribution of a data sample


Some types of graph are useful because they show a lot of information in a compact man-

Figure 80. A boxwhisker plot can be used to illustrate data distribution as well as provid-
ing other information, e.g. median, inter-quartiles and max/min

symmetrical about the median stripe. We can use the boxwhisker plot to look at several
98 | Statistics for Ecologists Using R and Excel

Another way we can visualise our data is by using a line graph to show the running average
(mean or median). We met this earlier in Section 4.7 where we used the idea to help deter-

Figure 81. A line graph illustrating the running mean

decision.

6.2 Graphs to illustrate differences

illustrate the situation using bar charts or boxwhisker plots. We met the boxwhisker plot

Figure 82. A boxwhisker plot illustrating differences between three samples


6. Exploring data using graphs | 99

gain some insight into the distribution. A common alternative to the boxwhisker plot is

within each sample.

Figure 83. A bar chart illustrating differences between three samples

6.3 Graphs to illustrate links


When we think of ways to link data together there are two main approaches. In one
approach, we have two sets of values, both are numeric and one represents a dependent
variable and the other an independent variable. We are looking for a correlation. In the
other kind of approach, we have categories of items and we are looking to associate one
set of categories with the other.

6.3.1 Graphs to illustrate correlations

speed of the water in which it lives.


100 | Statistics for Ecologists Using R and Excel

Figure 84. A scatter plot illustrating a correlation


In this case, it appears as though as the water speed increases so does the abundance of the

Figure 85. Multiple scatter plots showing one dependent variable plotted against several
independent variables

than the others; one shows a positive correlation and the other a negative one (although at
6. Exploring data using graphs | 101

6.3.2 Graphs to illustrate associations


When we have categorical variables, we have various choices. We can display the data for

pie charts to be produced (one for each row or column category, depending on how we
want to look at the data). The pie chart shows the data proportionally, each slice of pie
shows the contribution as a proportion of the total.

Figure 86. A pie chart illustrating categorical data. The proportions of common bird species
in a garden habitat

When we have this kind of data we can always represent it in the form of a bar chart
instead. The advantage of the bar chart is that we can show several categories at one time

Figure 87. A bar chart illustrating categorical data. The number of common garden birds in
various habitats
102 | Statistics for Ecologists Using R and Excel

included a legend on the graph so the reader can identify the various bars more easily.

6.4 Graphs a summary

and make important decisions about the analytical approach (Table 29). We should also
use graphs to illustrate our data, which can make them more comprehensible to readers.
When we present graphs we should ensure they are fully labelled and as clear as possible.
Even when we use graphs for our own use it is good practice to label and title them fully.

Label axes and include the units.

-
sary produce two graphs rather than one.

Give a main title explaining what the graph shows. Usually this is done as a caption in a
word processor. The caption should enable a reader to understand what the graph shows

sure you describe the graph so that someone else can understand it.

Table 29. Summary of graph types to use for different purposes

Purpose Types of graph


Illustrating distribution Stemleaf plot, tally plot, histogram, density
chart, boxwhisker plot
Illustrating differences between samples Bar chart, boxwhisker plot
Illustrating correlations Scatter plot
Illustrating associations Pie charts, bar charts
Illustrating sample sizes Line plot of running average (mean or median)

We will examine graphs in more detail in Chapter 12, which will also cover he presentation
of results. Sections 12.4.1 and 12.5 will deal with producing graphs in R and Section 12.4.3
will cover producing graphs in Excel. We will also make some references to graphs in each
of the sections dealing with the details of the various analytical methods. It is important to
remember that our graphical analysis should go alongside the mathematical one.