
CrunchIt! 2.0® Quick Start Guide


Version 1.2

Alan Dabney
Texas A&M University
©2011 by W.H. Freeman and Company

ISBN: 1-4292-6208-7

All rights reserved.

Printed in the United States of America

W.H. Freeman and Company


41 Madison Avenue
New York, NY 10010
Houndmills, Basingstoke
RG21 6XS England

www.whfreeman.com
What is CrunchIt!?
CrunchIt! Version 2 is a web-based statistical calculator and data analysis tool. It can
perform most of the statistical functions described in your W. H. Freeman and Company
text.

About this Document


This document is designed to get you up and running quickly with CrunchIt! There are
four main sections: Test Drive (for a brief run through of some of the key features of
CrunchIt! using an example dataset), Detailed Specifications (for detailed coverage of all
CrunchIt! features), Technical Requirements (for setting up your computer to run
CrunchIt!) and Technical Support.

Test Drive
Let’s take a quick test drive to get a glimpse into what CrunchIt! can do. First of all—
here’s what CrunchIt! looks like:

It looks a lot like spreadsheets you’re familiar with, right? On the left are links to
different datasets. In the middle are links to the main CrunchIt! tools: ‘Data’ for loading
and playing with data, ‘Statistics’ for computing summary statistics and specific
statistical operations, ‘Graphics’ for making pictures, and so on. Let’s load some data. If
we click on ‘Chapter 4’, we’ll see a bunch of datasets. Here’s the data from ‘Exercise
33’:
These data come from a study of which colors best attract beetles to sticky boards; we’d
like to use the color that tends to bring in (and catch) the most bugs. Let’s save a copy
of these data to a comma-delimited (CSV) text file by clicking ‘Data’ then ‘Save Data
Table’:
This brings up a window that prompts us to select a location for the file:

We’ll just use the default and save to the Desktop. Now, let’s make a set of box plots,
with one box per color. Go to ‘Graphics’, then ‘Box Plot’:

This brings up the following window:


How do we create separate boxes for each level of ‘Color’? We need to “unstack” the
data, creating separate columns of beetle counts for each color. Click on the command
‘Unstack Data’,

which will bring up this window:


If we specify that the variable ‘Color’ contains the different levels in which we’re
interested (the different colors, in our case) and the variable ‘Beetles’ contains the actual
numbers in which we’re interested (beetle counts, in our case), then...

And we’ve got separate columns of counts for each color. Now, let’s make the box
plots:
Shift-click to highlight the new columns...

and click ‘OK’:


It looks like blue and white are pretty much worthless. Green brings in a fair number of
bugs, but yellow apparently outperforms them all, with a median count (the line inside
the box) between 45 and 50. Yellow also has less variability than the others (its box is
shortest in height), although there are a couple of extreme values (the two individual
dots). To save a copy of this picture, we can simply right-click the image and choose
‘Save Image As’.

Let’s look at some summary numbers by selecting ‘Statistics’ > ‘Summary Statistics’,
then ‘Column’, to indicate that we want to compute summaries on one column:
The following window will appear:

I’ve selected the ‘Yellow’ column; by default, all the possible summaries are selected,
but we can turn some of them off if we’d like. Here’s what we get:
If you recall, we could tell from the box plot that the median was between 45 and 50; it
turns out that it’s actually 46.5. We also saw that, while the overall variability was low
(the standard deviation came out as about 6.8), there were a couple of extreme values.
These are shown by the minimum and maximum values of 38 and 59, respectively. We
can recompute the data summaries without these two extreme values by using the
‘Filter data’ feature:

In the ‘Select’ box, we specify the ‘Yellow’ column. We then need to filter the data so
that only numbers greater than 38 and less than 59 are included in the calculations. To
do this, we need two filtering criteria. We create an extra filter criterion by clicking the
‘+’ sign:

This is what you will see:

And here are the updated summary statistics:

The median didn’t change (because we removed one large and one small value, we
knew it wouldn’t), the mean dropped just a bit, and the standard deviation dropped a lot
(s.d.’s are pretty sensitive to extreme values).

Going back to the box plots, it looks pretty clear that there are some substantial
differences in beetle counts by wood color. Even so, let’s do a formal test for whether
the mean counts with each color are the same. We’ll use one-way ANOVA:
This brings up the following window:

Notice that we’ve got the option of specifying our data by ‘Columns’ or ‘Factored’. The
‘Columns’ choice means the counts for each color appear in different columns, whereas
‘Factored’ puts all the counts in one column, with a separate “factor” column that
indicates color. Since we’ve unstacked our data, we can go either route. Here we’ll use
‘Factored’:
Our “response” is the ‘Beetles’ column containing the counts, whereas the “treatment”
is the ‘Color’ column containing the color for each count. Here’s what we get (note that I
didn’t do any filtering):

The last column contains the p-value, which is extremely small, indicating strong
evidence against the null hypothesis that the mean beetle counts are the same for all
board colors (as we expected, based on the box plots).

There’s a lot more to explore in CrunchIt! (see the ‘Detailed Specifications’ section
below), but this should give you a feel for the main features. Enjoy!

Detailed Specifications
Data are entered manually or loaded from publisher-created files into a spreadsheet,
then statistical graphs may be created from the data, or computations may be
performed on them. Results of computations and graphs will typically appear in a new
results window on your screen. To close the results window, click the button at the top
right. There are four main menus, shown on the left of the spreadsheet, that you will
typically use:
Data
Statistics
Graphics
Preferences

The Data Table

The Data Table is the spreadsheet into which you enter or load data. Each column
(labeled as Col 1, Col 2, etc.) should contain the values for a single variable in your
dataset. For this reason, we may refer to columns and variables interchangeably. It is
recommended that you name each column by entering text just below the numbered
column header, to the right of the #.

Often, each row of the data table will correspond to a single observation, subject, or run
of an experiment.

Opening Textbook Data Sets

To open and use a data set from your text, double-click on the chapter in the list at left,
then double-click on the appropriate example, exercise, or table number.

Entering Data

Each cell may hold a numeric value or text. Numeric values must be entered in decimal
format. No mathematical expressions will be evaluated. Do not enter commas.

Examples of valid numeric entries:

• 3
• −2.7
• 0.00000956
• 200000

Examples of entries that will be treated as text, not as numbers:

• five
• 2.4e4
• 1,250
• 1+1
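The rule above can be sketched as a small Python check. This is a hypothetical reconstruction inferred only from the listed examples (an optional minus sign, digits, at most one decimal point); CrunchIt!’s actual parser is not documented here, and the name `is_numeric_entry` is illustrative.

```python
import re

# Hypothetical rule inferred from the examples: optional leading minus sign,
# digits, and at most one decimal point. Exponents, commas, words, and
# arithmetic expressions do not match, so such entries are treated as text.
_DECIMAL = re.compile(r"^-?(\d+(\.\d*)?|\.\d+)$")


def is_numeric_entry(cell: str) -> bool:
    """Return True if a cell entry would be read as a number under this sketch's rule."""
    # Also accept the typographic minus sign (U+2212) used in this guide.
    return bool(_DECIMAL.match(cell.strip().replace("\u2212", "-")))


if __name__ == "__main__":
    for entry in ["3", "-2.7", "0.00000956", "200000", "five", "2.4e4", "1,250", "1+1"]:
        print(f"{entry!r:14} -> {'number' if is_numeric_entry(entry) else 'text'}")
```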

Data

Clicking this menu option opens a menu that allows you to clear the data table, save it,
or load data from a file.
Data: Clear Data Table

This menu item erases all entries and labels from the Data Table.

Data: Load Data from File

This menu item allows you to load a data file that is stored on your computer. CrunchIt!
will recognize data files in comma-separated value (.csv) and data (.dat) formats. The
first line of the data file is taken to contain the variable names, so be sure your files are
labeled this way.
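As a minimal sketch of preparing such a file outside CrunchIt!, the Python helper below (the name `write_crunchit_csv` is hypothetical) writes the variable names on the first line and one observation per row after that:

```python
import csv
from typing import Dict, List


def write_crunchit_csv(path: str, columns: Dict[str, List[object]]) -> None:
    """Write named columns to a CSV file whose first line holds the variable names."""
    names = list(columns)
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("all columns must have the same number of entries")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # first line: variable names
        # Remaining lines: one row per observation, one value per column.
        writer.writerows(zip(*(columns[n] for n in names)))
```

For example, `write_crunchit_csv("mydata.csv", {"x": [1, 2, 3], "y": [2.5, 3.1, 4.0]})` (placeholder values) produces a file whose first line is `x,y`.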

Data: Load Data from URL

This menu item allows you to load a data file that is stored on the World Wide Web.

Link to Data

This option reopens CrunchIt! with the current data set pre-loaded; any changes you have
made to it will be erased. This is a convenient option if you have made changes to a data
set that you wish to discard.

Data: Save Data Table

This menu item allows you to store the current data table to your computer in comma-
separated value (.csv) format. You may then load this file into CrunchIt! at a later time,
or import it into a spreadsheet program.

Data: Save Result

This menu item allows you to save the table of results returned by one of CrunchIt!’s
statistical functions. Statistics output is saved in a comma-separated format that can be
opened with a spreadsheet program like Excel; if you give the file a “.doc” extension, the
results can also be opened with Word. Graphics are normally saved as JPEG (.jpg) files.

Data: Random Number Generator

CrunchIt! has a built-in random-number generator that allows you to fill cells in the Data
Table with Uniform random data. This is mainly useful for trying out statistical tests and
graphs on made-up data.

Data: Add/Remove Columns/Rows

This allows you to add columns (variables) to a dataset, remove columns, and add or
delete rows. Added columns appear on the right side of the spreadsheet and added rows
at the bottom (by default, fifty (50) data rows are available). NOTE: There is a minimum
spreadsheet size of 11 columns and 50 rows; you cannot delete rows or columns so that
the dataset falls below these dimensions.

Data: Stack Data

This function takes two or more columns and stacks them on top of one another. The
result has two columns: one holding all of the data values, and one, labeled “ind,” holding
the original column name of each value to identify its “group.”
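A minimal Python sketch of what stacking does, assuming columns are held as a dictionary from column name to values (a representation chosen for illustration, not CrunchIt!’s internal one). The sketch returns the data column and the “ind” column as two lists:

```python
from typing import Dict, List, Tuple


def stack(columns: Dict[str, List[float]]) -> Tuple[List[float], List[str]]:
    """Stack several columns into one data column plus an "ind" column of group names."""
    values: List[float] = []
    ind: List[str] = []
    for name, col in columns.items():
        values.extend(col)             # data, one column on top of the next
        ind.extend([name] * len(col))  # original column name for each value
    return values, ind
```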

Data: Unstack Columns

This function takes a single column of values, together with a separate column of group
"labels", then adds a separate column for the values of each label. For example, ex04-
33 in BPS has beetle counts in one column and wood color in another; there are six
counts for each of the four colors. The unstack command would create four new
columns of beetle counts, one for each of the four colors.
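The unstack operation can be sketched the same way. This hypothetical `unstack` function groups the values by their label, creating one new column per label in the order the labels first appear:

```python
from typing import Dict, List


def unstack(values: List[float], labels: List[str]) -> Dict[str, List[float]]:
    """Split a column of values into one column per group label (first-seen order)."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    out: Dict[str, List[float]] = {}
    for v, lab in zip(values, labels):
        out.setdefault(lab, []).append(v)  # start a new column the first time a label appears
    return out
```

Applied to a column of beetle counts and a column of wood colors, it would return one list of counts per color.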

Statistics

In general, statistical calculations and tests are performed by clicking on the desired
item in the Statistics menu, choosing one or more columns/variables, and entering
appropriate parameters. The results of the test will appear in a separate results window.
To copy the results into another program (like Word), right-click in the results window
and click Select All; right-click again and select Copy. You can then paste the contents
of the results window into your document. To print the results directly, right-click in the
results window and select Print.

The dialog for each menu item has a check box labeled "Insert results into main grid."
Activating this option causes the results of the statistical calculation to be entered into
the data grid itself, rather than appearing in the results pane at the right.

Some statistical tests can be performed only on numeric columns/variables. These are
columns containing only numbers, and no text.

Alternative Hypotheses

Many of the statistical tests are hypothesis tests and consequently require you to select
one of the following options:

• Two-sided (not equal)
• Less than
• Greater than

The selected value indicates the nature of the alternative hypothesis. For instance, in the
case of a one-sample Z test, the null hypothesis could be that the underlying distribution
has mean 0. If ‘Two-sided’ were selected, the alternative hypothesis would be that the
true mean is any value other than 0; if ‘Less than’ were chosen, the alternative hypothesis
would be that the true mean is less than 0. A confidence interval for the parameter that
corresponds to the selected alternative hypothesis will also be created. NOTE: For the
most part, your text only discusses two-sided confidence intervals (that is, the parameter
lies between values a and b) with some level of confidence (specified using the input
box); one-sided intervals place all the “error” (α) in a single tail of the distribution. These
can be interpreted as “the parameter is at most b” or “the parameter is at least a.”
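How the three alternatives change the p-value and the matching confidence interval can be sketched for the one-sample Z test in Python. This is a minimal sketch using the standard library’s `statistics.NormalDist`; the function and argument names are hypothetical, and CrunchIt!’s own output format will differ.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def one_sample_z(xbar, mu0, sigma, n, alternative="two-sided", conf=0.95):
    """Return (z, p_value, (lower, upper)) for a one-sample Z test with known sigma."""
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    if alternative == "two-sided":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        crit = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)  # alpha split over both tails
        ci = (xbar - crit * se, xbar + crit * se)      # "between a and b"
    elif alternative == "less":
        p = STD_NORMAL.cdf(z)
        crit = STD_NORMAL.inv_cdf(conf)                # all of alpha in one tail
        ci = (-math.inf, xbar + crit * se)             # "at most b"
    elif alternative == "greater":
        p = 1 - STD_NORMAL.cdf(z)
        crit = STD_NORMAL.inv_cdf(conf)
        ci = (xbar - crit * se, math.inf)              # "at least a"
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return z, p, ci
```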

Statistics: Summary Statistics: Columns

This function will calculate a variety of standard statistics on any given numeric column.
The available statistics are:

• n: Number of entries in the column
• mean (arithmetic average)
• variance: Sample variance
• std.dev: Sample standard deviation
• std.err: Standard error of the mean (std.dev divided by the square root of n)
• median (50th percentile)
• range: The difference between the maximum and minimum values
• min
• max
• q1: The first quartile (25th percentile)
• q3: The third quartile (75th percentile)
• mode: The mode (most frequently occurring value), if one exists

Additionally, any percentile of the data may be requested by entering a comma-separated
list of values between 0 and 1. For example, entering .33, .66 would yield the
33rd and 66th percentiles.

Statistics: Summary Statistics: Correlation

This function allows you to calculate the correlation matrix (direction and strength of
linear relationships) for any number of numeric columns in your data grid. You should
always plot the data to ensure the relationships are linear and not curved in some
manner.

Statistics: Summary Statistics: Covariance

This function allows you to calculate the covariance matrix for any number of numeric
columns in your data grid. The covariance matrix has the variances of the variables ($s^2$)
on the diagonal and the product of the correlation and standard deviations ($r s_x s_y$) in
the off-diagonal entries. (This concept is usually not covered in an Introductory Statistics
course.)

Statistics: Tables: Frequency

This function generates a frequency table for any column of data. The resulting table will
include the number of times each unique value appears, as well as the corresponding
percentage.

Depending on the selected option, the table may be ordered in one of three ways:

• count: by the number of times each value appears
• value.asc: by value, in ascending order
• value.desc: by value, in descending order

A cutoff value may also be specified. Use the slider to set a proportion; any value whose
relative frequency in the column is less than that proportion will be grouped into an
"Other" category. For example, a cutoff of .1 groups together all values that make up less
than ten percent of the column as "Other."
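The ordering and cutoff rules can be sketched in Python as follows. Placing the “Other” row last is an assumption, and the function name and arguments are hypothetical.

```python
from collections import Counter


def frequency_table(column, order="count", cutoff=0.0):
    """Return (value, count, percent) rows; values rarer than `cutoff` become "Other"."""
    n = len(column)
    counts = Counter(column)
    kept = {v: c for v, c in counts.items() if c / n >= cutoff}
    other = n - sum(kept.values())  # observations pooled into "Other"
    if order == "count":
        rows = sorted(kept.items(), key=lambda vc: vc[1], reverse=True)
    elif order == "value.asc":
        rows = sorted(kept.items())
    elif order == "value.desc":
        rows = sorted(kept.items(), reverse=True)
    else:
        raise ValueError("order must be 'count', 'value.asc', or 'value.desc'")
    if other:
        rows.append(("Other", other))
    return [(v, c, 100 * c / n) for v, c in rows]
```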

Statistics: Tables: Contingency

Given any pair of categorical columns, this function will generate a contingency (two-
way) summary table for those columns, along with the results of a Chi-squared test of
the independence of the selected columns.
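The test statistic behind this output can be sketched by hand: each cell’s expected count under independence is (row total × column total) / grand total. This minimal Python sketch computes the Pearson statistic without a continuity correction (some software applies Yates’ correction to 2 × 2 tables, so CrunchIt!’s value may differ there) and leaves the p-value lookup to a chi-squared table.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # expected count if independent
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df
```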

Statistics: Z Statistics

The Z Statistics functions perform one- and two-sample tests based on the standard
normal distribution and a “known” (or assumed) population standard deviation. These
tests may only be applied to numeric columns. For the one-sample test, a single variable
and the standard deviation of the population from which it is drawn are both required.
The null hypothesis will be that the mean of the population from which the variable is
drawn is equal to a hypothesized value.

For the two-sample test, two numeric variables are required, along with the standard
deviation of the population from which each is drawn. The null hypothesis will be that
the means of the two populations differ by a hypothesized value.

Statistics: Proportions

The Proportions functions will perform one- and two-sample tests on both raw data and
summarized data.

For the one-sample test, select a column, a success criterion, and a null proportion. A
success criterion is one of the values contained in the column. For instance, the column
might be labeled "Color," with each cell containing either "Red," "Green," or "Blue." Your
success criterion might be "Blue," and your null proportion might be 0.33. In this case,
the null hypothesis is that the proportion of "Blue" values is 0.33 in the underlying
population.

With the "One Sample with Summary" option, a one-sample proportion test may also be
calculated based on a summary of the data; instead of selecting a variable and providing
a success criterion, you simply supply the number of successes and total number of
observations.
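For the summary-data case, a large-sample one-proportion Z test can be sketched in Python as below. This assumes the usual normal approximation, with the standard error computed under the null proportion and no continuity correction; CrunchIt!’s exact method is not stated here, so its p-values may differ slightly.

```python
import math
from statistics import NormalDist


def one_prop_z(successes, n, p0, alternative="two-sided"):
    """Return (p_hat, z, p_value) for a one-proportion Z test from summary counts."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))
    elif alternative == "less":
        p = cdf(z)
    elif alternative == "greater":
        p = 1 - cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return p_hat, z, p
```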

Statistics: T Statistics

CrunchIt! will perform one-sample, two-sample, and paired T-tests on numeric columns.
For the one-sample T-test, the null hypothesis is that the distribution has the specified
mean. For the two-sample and paired T-tests, the null hypothesis is that the means of
the distributions differ by the specified amount.

The two-sample T-test requires you to indicate whether to use the pooled variance to
estimate the variance of the difference (typically this box is left unchecked).

The paired T-test requires that both columns have the same number of entries.
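The pooled and unpooled choices differ in how the standard error and degrees of freedom are computed. The Python sketch below shows both, plus the paired version (a one-sample test on the row-by-row differences). It assumes that the unchecked (unpooled) option corresponds to Welch’s statistic with Welch-Satterthwaite degrees of freedom, which is common but not confirmed here; p-values, which need the t distribution, are omitted.

```python
import math
import statistics


def welch_t(x, y, delta0=0.0):
    """Unpooled two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    vx, vy = statistics.variance(x) / len(x), statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y) - delta0) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def pooled_t(x, y, delta0=0.0):
    """Two-sample t statistic using the pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y) - delta0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def paired_t(x, y, delta0=0.0):
    """Paired t statistic: a one-sample t test on the row-by-row differences."""
    if len(x) != len(y):
        raise ValueError("paired columns must have the same number of entries")
    d = [a - b for a, b in zip(x, y)]
    t = (statistics.mean(d) - delta0) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1
```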

Statistics: Variances: Two Sample

Given two numeric columns, this function tests the hypothesis that the ratio of the
variances of the underlying populations is a specified value. Note: use of this test assumes
Normal populations; the test is very sensitive to departures from Normality and can give
misleading results when that assumption fails.
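The statistic itself is simple. A minimal Python sketch (the function name is hypothetical) is below; the p-value would come from an F distribution with the two degrees of freedom shown, which is not in Python’s standard library.

```python
import statistics


def variance_ratio_f(x, y, ratio0=1.0):
    """F statistic for H0: var(x) / var(y) = ratio0, with its two degrees of freedom."""
    f = statistics.variance(x) / statistics.variance(y) / ratio0
    return f, len(x) - 1, len(y) - 1
```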

Statistics: Regression

CrunchIt! can perform both linear and logistic regression with multiple independent
variables. On the left side of the dialog select one or more columns containing
explanatory variables (also called "independent variables" or "treatment variables"). On
the right side of the dialog select a single column containing a response variable (also
called a "dependent variable").

For linear regression, all the columns must be numeric. For logistic regression, the
explanatory variables must be numeric, but the response variable may contain text.
Also, for logistic regression the value of the response variable that corresponds to
success must be specified.

Statistics: ANOVA

CrunchIt! will perform one- and two-way ANOVA. One-way ANOVA will accept data in
separate columns for each “treatment” (Columns) or with all data in a single column with
a separate column to indicate the treatment (group). Two-way ANOVA requires all data
in a single column with two group indicator variables and a balanced design—that is,
each treatment combination has the same number of observations.
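The one-way ANOVA F statistic compares the variation between group means with the variation within groups. A minimal Python sketch, taking one list of values per treatment (the ‘Columns’ layout; factored data would first be grouped by treatment label). The function name is hypothetical and the p-value lookup is omitted.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for one-way ANOVA on a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```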

Statistics: Nonparametrics: Sign Test

Given a numeric column, the Sign Test One-Sample function performs a sign test to test
the null hypothesis that the underlying median is the specified value.

The Sign Test Two-Sample function for paired data takes two numeric columns of the
same length and tests the null hypothesis that the median of the differences in each row
is the specified value.
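Under the null hypothesis, each nonzero difference is equally likely to fall above or below the hypothesized median, so the count above it is Binomial(n, 1/2). A minimal exact sign-test sketch in Python, assuming values equal to the hypothesized median are dropped (a common convention); for the paired version, pass the row-by-row differences.

```python
from math import comb


def sign_test(data, median0=0.0):
    """Exact two-sided sign test of H0: median = median0. Returns (above, n, p_value)."""
    above = sum(1 for v in data if v > median0)
    below = sum(1 for v in data if v < median0)
    n = above + below  # values equal to median0 are dropped

    def prob_le(k):
        # P(X <= k) for X ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    # Double the smaller tail: P(X <= above) or P(X >= above).
    p = 2 * min(prob_le(above), 1 - prob_le(above - 1))
    return above, n, min(1.0, p)
```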

Statistics: Nonparametrics: Chi Squared

This function performs a Chi-squared goodness-of-fit test: do the observed counts agree
with a specified discrete distribution?
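A minimal Python sketch of the goodness-of-fit statistic, comparing observed counts with the counts expected under hypothesized category probabilities (the function name is hypothetical; the p-value lookup is omitted):

```python
def chi_squared_gof(observed, probs):
    """Pearson goodness-of-fit statistic for counts vs. hypothesized category probabilities."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, len(observed) - 1  # df = number of categories - 1
```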

Statistics: Nonparametrics: Wilcoxon

This function performs a Wilcoxon signed rank test on a single numeric column of data.
The null hypothesis is that the underlying distribution has the specified median.

Statistics: Nonparametrics: Wilcoxon Paired

This function performs a Wilcoxon signed rank test on two numeric columns of paired
data. The null hypothesis is that the median of the paired differences equals the
specified amount. (If this amount is 0, the null hypothesis amounts to there being no
systematic difference between the two columns.)
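Both Wilcoxon functions rank the absolute differences from the hypothesized value and add up the ranks of the positive differences (some packages report this sum as V). A minimal Python sketch, assuming zero differences are dropped and tied values receive average ranks; for the paired test, pass the row-by-row differences x − y.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first sorted position of this value
        last[v] = pos             # last sorted position of this value
    return [(first[v] + last[v]) / 2 for v in values]


def signed_rank_statistic(data, median0=0.0):
    """Sum of the ranks of |x - median0| over the positive differences (zeros dropped)."""
    diffs = [x - median0 for x in data if x != median0]
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)
```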

Statistics: Nonparametrics: Mann-Whitney

Given a pair of numeric columns, this function performs a Mann-Whitney test (the
nonparametric equivalent to a two-sample t test). The null hypothesis is that the
locations (medians) of the underlying distributions differ by the specified amount; if the
difference is 0, this amounts to a null hypothesis that the distributions are the same.

This procedure requires the two samples to be in different columns; if the data are in a
single column with another column indicating "group membership," use Kruskal-Wallis to
perform the test.
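The Mann-Whitney statistic can be sketched as a count over all pairs of observations; it equals the Wilcoxon rank-sum of the first sample minus n1(n1 + 1)/2. A minimal Python sketch (hypothetical function name, p-value omitted):

```python
def mann_whitney_w(x, y, delta0=0.0):
    """Number of (x, y) pairs with x - delta0 > y, counting ties as one half."""
    w = 0.0
    for a in x:
        for b in y:
            diff = (a - delta0) - b
            w += 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
    return w
```

Values near 0 or near n1 × n2 suggest the first sample tends to be smaller or larger, respectively, than the second.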

Statistics: Nonparametrics: Kruskal-Wallis

These functions perform a Kruskal-Wallis rank sum test of the null hypothesis that the
location parameters (medians) of the underlying distributions of two or more groups are
the same.
If each column contains a group, use the Kruskal-Wallis function. If one column contains
the group labels ("factor specification variables"), and one column contains the response
variable, use the Kruskal-Wallis Factored function.
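A minimal Python sketch of the Kruskal-Wallis H statistic, taking one list per group (the Kruskal-Wallis layout; for the Factored layout, group the response values by label first). It omits the tie correction that statistical packages usually apply, so results differ slightly when there are ties.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) and its degrees of freedom."""
    pooled = sorted(v for g in groups for v in g)
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    rank = {v: (first[v] + last[v]) / 2 for v in first}  # average rank for ties
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, len(groups) - 1
```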

Statistics: Distribution Calculators

The distribution calculators built into CrunchIt! allow you to calculate the probability that
a random variable will take on certain values, given a wide variety of both continuous
and discrete distributions.

For both the Continuous Distribution Calculator and the Discrete Distribution Calculator,
first choose a specific distribution, such as normal or binomial. Then enter appropriate
parameters for the distribution. For example, the normal distribution has two
parameters: the mean and the standard deviation (sd), whereas the binomial distribution
requires the number of trials (n), and the probability of success for each trial (p).

Once the distribution and its parameters have been selected, choose a comparison and a
value of X. Click "Submit," and CrunchIt! will calculate the probability of a random
variable with the specified distribution taking on a value that satisfies the selected
comparison with respect to X.

For instance, if you were to make the following selections:

• Distribution: normal
• mean: 1
• sd: 0.5
• Comparison: less than or equal to
• X: 0

CrunchIt! would inform you that the probability that a value sampled from the Normal
distribution with mean −1 and standard deviation 0.5 would be less than or equal to 0 is
0.977.
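The same probabilities can be checked with Python’s standard library. The sketch reproduces the normal example with mean −1 and standard deviation 0.5 and adds a hypothetical `binom_cdf` helper for a binomial “less than or equal to” comparison.

```python
from math import comb
from statistics import NormalDist

# P(X <= 0) when X ~ Normal(mean -1, sd 0.5).
prob = NormalDist(mu=-1, sigma=0.5).cdf(0)
print(round(prob, 3))  # 0.977


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n trials, success probability p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# "Greater than" comparisons follow from the complement: P(X > k) = 1 - P(X <= k).
```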

Graphics

Just like the statistical tests, plots are generated by clicking on the desired menu item in
the Graphics menu, choosing one or more columns/variables, and entering appropriate
parameters. The resulting image will appear in a results window. For most plots, the title
and x- and y-axis labels may be specified.

To copy the graph into a document, right-click in the graph and choose Select All, then
right-click again and select Copy. You can then paste the graph into the document.
Alternatively, you can select Print from a right-click.

Graphics: Bar Plot


One-dimensional bar plots may be generated in two ways. The Get Frequencies option
takes a single column and generates a bar plot of that column's frequency table. The
With Data option should be used if you already have data in the form of a table of labels
and counts that you want to turn into a bar plot.

The two-way option creates either side-by-side or stacked bar plots. Here, you specify
the Group factor (the outer category—a bar will be formed for each value of this
variable), the Series Factor (the inner category label), and a column of counts.

Graphics: Pie Chart

Pie charts may be generated in two ways. The Get Frequencies option takes a single
column and generates a pie chart of that column's frequency table. The With Data option
should be used if you already have the table that you want to turn into a pie chart; it
takes one column of category or group labels and one column of counts.

Graphics: Histogram

The Histogram function generates a histogram from a single numeric column. Three
kinds of histograms may be generated:

• Frequency (freq): the height of each bar is the number of observations in it
• Relative Frequency (relative.freq): the height of each bar is the fraction (in decimal
form) of all observations that fall in it
• Density (density): the y axis is scaled so the total area of the histogram is 1
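The three scalings differ only in how the bin counts are divided. A minimal Python sketch for given bin edges; the edges and the rule for which side of a bin is closed are assumptions, since packages differ on both.

```python
def histogram_heights(data, edges):
    """Return (freq, relative_freq, density) bar heights for the given bin edges.

    Bins are [edges[i], edges[i+1]), except the last bin, which also includes its
    right edge (an assumption for this sketch).
    """
    n = len(data)
    counts = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        last = i == len(edges) - 2
        counts.append(sum(1 for v in data if lo <= v < hi or (last and v == hi)))
    relative = [c / n for c in counts]  # fractions of the whole; they sum to 1
    # Dividing by bin width makes the total bar area equal 1.
    density = [c / (n * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]
    return counts, relative, density
```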

Graphics: Stem and Leaf

This function generates a stem-and-leaf diagram from a single numeric column.

Graphics: Box Plot

This function generates a box-and-whisker plot from a numeric column. If more than one
column is specified, side-by-side box plots will be created.

Graphics: Dot Plot

This function generates a Cleveland dot plot from a single numeric column.

Graphics: Scatter Plot

Given one numeric column of X-values and one numeric column of Y-values, this
function generates a scatter plot. The plot may consist of dots, lines, or both. If lines are
drawn, the points are connected in row order. For time plots, use numeric time indices
as the X-values and choose the line option to connect the points.

Graphics: Pairs Plot

Given any number of numeric columns, this function produces a matrix of scatter plots.
In each plot, one column provides the X-values, and one column provides the Y-values.

Graphics: QQ Plot

This function generates a normal QQ plot of the values in a numeric column. The line
passes through the first and third quartiles.

Graphics: Parallel Coordinates

This function generates a parallel coordinates plot for any number of numeric columns.
This is used to graph high-dimensional data and is not typically used in Introductory
Statistics courses.

Graphics: Stars Plot

This plot is used to display multivariate data. Each “spoke” represents a variable, and
each star will represent an observation. This plot is not typically used in Introductory
Statistics courses.

Preferences

This function allows you to set the number of decimal places shown in output—either full
precision or, using the slider, a specified number of places.

Miscellaneous

Filtering Data

In the interface for Statistics and Graphics functions, there is an option to ‘Filter Data’.
This is used to select for display or analysis only a subset of the data. There are three
components to the ‘Filter Data’ widget. The first is a drop-down menu for selecting the
variable for which you wish to create filtering criteria. The second is a drop-down menu
for specifying the operation to be applied to the selected variable. The third is a text box
for entering the number to use as the filtering threshold. There is also a ‘+’ button on
the right, with which you create additional filtering widgets for more complicated filtering
criteria. See the ‘Test Drive’ section at the top of this document for an example.
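Combining several criteria, as in the Test Drive, keeps only the rows that satisfy all of them. A minimal Python sketch, assuming criteria are combined with “and” (as the Test Drive example implies) and rows are stored as dictionaries; the operator symbols are hypothetical stand-ins for the widget’s menu.

```python
import operator

# Hypothetical mapping from the widget's operation menu to comparisons.
OPS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def filter_rows(rows, criteria):
    """Keep rows (dicts) that satisfy every (variable, op, threshold) criterion."""
    return [r for r in rows if all(OPS[op](r[var], thr) for var, op, thr in criteria)]
```

For example, `filter_rows(rows, [("Yellow", ">", 38), ("Yellow", "<", 59)])` mirrors the Test Drive filter that drops the two extreme yellow counts.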

Multi-column Analysis with Factors

With functions that involve multiple columns, there will sometimes be the option of
carrying out the analysis by ‘Columns’ or ‘Factored’. The ‘Columns’ option is appropriate
when the values of the dependent variable are in separate columns for each value of the
independent variable. The ‘Factored’ option is appropriate when all values of the
dependent variable are in a single column, and the different values of the independent
variable are in their own column. See the ‘Test Drive’ section at the top of this
document for an example.

Technical Requirements
If you have or can install the recommended browser on your computer, you can run
CrunchIt!

Internet Access

While a high-speed connection is recommended, CrunchIt! will work on a dial-up
connection. Certain operations, such as analyzing large data sets, are not recommended
on a dial-up connection.

Operating Systems

CrunchIt! runs on any of the three major platforms (Mac, PC, Unix). The only requirement
is a compatible browser and version of the Flash plug-in. Virtually all computers
purchased in the last five years will have this software pre-installed.

Recommended Browsers

We recommend that you use:

• Microsoft Internet Explorer: v6.0 or above
• Firefox/Mozilla: v1.0 or above

We do not currently support the Safari browser.

To download the latest available browser versions for your operating system, click the
link(s) below.

You will need to disable any pop-up blocking software. For more information on disabling
different pop-up blocking software visit:

http://www.safetyontheweb.com/support/disablepb.asp

The Macromedia Flash 6.0 (or above) plug-in is also required. While this is likely to be on
your computer already, to download Flash go to:

http://get.adobe.com/flashplayer/otherversions/

If you are uncertain about whether your system can run CrunchIt!, go to:
http://courses.bfwpub.com/syscheck/

Technical Support

For Students:

http://www.bfwpub.com/newcatalog.aspx?page=support/studenttechsupport.html#topform

For Instructors:

http://www.bfwpub.com/newcatalog.aspx?page=support\instructortechsupport.html#topform
