Volume 11
2007
We've got it in the bag: Predictive analytics helps a leading food retailer create profitable promotions. Top Shelf Promotions, PG 5
DIRECTIONS
2007 North American Conference
Turning Data into Decisions
Join us at the SPSS Directions North American User Conference, the most educational conference in the predictive analytics industry. Learn the latest trends and best practices in statistics, data mining, market research, and business intelligence technology.

Keynote Speakers
Tom Davenport
- Leading business strategy analyst
- Author, Competing on Analytics: The New Science of Winning

- Former U.S. secretary of defense, senator, and congressman
- Thought leader on defense, economic, and international issues
and integration across SPSS product lines helps trigger ideas and opportunities we hadn't previously considered.
Dear valued SPSS customer:

Welcome to our 2007 Product Catalog! We've titled this catalog "Decisions" because we feel SPSS products and solutions are perfectly positioned to help your organization make the best decisions, based on predictive analytics.

SPSS customers are amazing people, and they're using SPSS analytic software tools and solutions to do some amazing things. Whether it's identifying their most profitable customers, fighting crime, performing clinical research, detecting fraud, or improving graduation and retention rates, they are doing amazing things with predictive analytics.

I know that when it comes to the importance of using analytics for better decision making, I'm preaching to the choir. But now it appears that the rest of the world is finally catching on to the importance of this message. Thomas Davenport, management guru and author of the Harvard Business School Press book Competing on Analytics: The New Science of Winning, says that organizations are finally getting it and making good use of their massive amounts of data. These high-performing enterprises are using data-driven insights to build their competitive strategies, and those strategies are generating solid results. The bottom line: you can successfully compete on analytics, meaning sophisticated quantitative and statistical analysis and predictive modeling.

In this catalog you'll also notice many new product innovations that will help you compete on predictive analytics. One of the biggest announcements is that SPSS 16.0 and most of the add-on modules now run on the most popular operating platforms: Windows, Mac, and Linux. While SPSS 16.0 may look similar to earlier versions, the programming code underneath is radically different, allowing us to be much more flexible in continually upgrading this and other products. This is just one example of how SPSS is committed to staying ahead of the curve, and to helping our customers do the same.
Table of contents
SPSS 16.0 ........................................................ 8
Analyze data using comprehensive statistical software
Discover complex relationships in your data
SPSS Complex Samples 16.0 ....................................... 22
SPSS Conjoint 16.0 .............................................. 24
SPSS Data Preparation 16.0 ...................................... 26
SPSS Exact Tests 16.0 ........................................... 28
Reach accurate conclusions with small samples or rare occurrences
SPSS Regression Models 16.0 ..................................... 32
SPSS Tables 16.0 ................................................ 34
SPSS Trends 16.0 ................................................ 36
Build expert time-series forecasts in a flash
Maximize productivity with SPSS Server
SPSS Stand-alone Products: complement SPSS with these products to form a complete analytical system
ShowCase Suite 7.0 .............................................. 40
Attention System i5 users: get more intelligence from your data
Sincerely,
Editorial Departments
Cover story....................................................................................................5
Predictive analytics helps Stop & Shop create profitable promotions
Some companies leave it to chance, or a gut feeling, about which product promotions will attract the most customers. Not Stop & Shop. The large New England food retailer wanted to understand and predict its customers' behavior to zero in on promotions that increase revenue and customer loyalty. It turned to predictive analytics from SPSS Inc. to help create timely, compelling promotional offers.

"SPSS predictive analytics will help our organization better understand our diverse group of customers and their unique needs with relation to products and promotions," said Ed Garabedian, manager of BI analysis and support at Stop & Shop. This multibillion-dollar supermarket corporation also sees predictive analytics as the key to identifying the specific product attributes most important to shoppers and to predicting product demand in order to better manage production and distribution.

Jack Noonan, SPSS president and CEO, said, "Stop & Shop is one of many advanced retailers that recognize the importance of understanding and predicting consumer behavior in order to create customer loyalty and drive revenue. SPSS predictive analytics is clearly a cutting-edge competitive advantage in all customer-driven industries."
Stop & Shop is the largest food retailer in New England, employing 59,000 associates in its network of stores, distribution centers, manufacturing plants, and offices, which stretch across more than 180 communities in Connecticut, Massachusetts, New Hampshire, Maine, New York, New Jersey, and Rhode Island. Sixteen of the top 20 retailers worldwide use SPSS software for such applications as customer understanding, operational efficiency, and product development.
1. Planning
Beginning with the end result in mind, the first steps are to set objectives, identify the data sources, and carefully craft the process.
Products: SamplePower, SPSS Conjoint, SPSS Complex Samples

2. Data collection
Data is collected through surveys, online activity, call centers, and more.
Products: Dimensions, SPSS Data Entry

3. Data access
Data is brought in from available sources, using ODBC or direct file input into SPSS.
Product: SPSS Base

5. Data analysis
This is where the data gets examined, tested, explored, and transformed. Patterns are identified, hypotheses are tested, and information is extracted.
Products for understanding data: SPSS Base, SPSS Data Preparation, SPSS Complex Samples
Products for predicting numerical outcomes: SPSS Base, SPSS Regression Models, SPSS Advanced Models, SPSS Complex Samples, SPSS Neural Networks, Amos
Products for identifying groups: SPSS Base, SPSS Regression Models, SPSS Advanced Models, SPSS Complex Samples, SPSS Tables, SPSS Categories, SPSS Classification Trees, SPSS Exact Tests, SPSS Neural Networks
Product for forecasting time-series data: SPSS Trends

6. Reporting
Data is summarized, put into tables and charts, and made ready for consumption.
Products: SPSS Base, SPSS Tables, Dimensions

7. Deployment
Data, reports, and procedures are distributed to end users globally, but with central management of interaction and access.
Product: SPSS Predictive Enterprise Services

8. Success!
Take a moment to celebrate: you've done it! But then it's back to planning how to maintain your competitive advantage, once again using SPSS products to help you reach the next level of success.
SPSS 16.0
Analyze data using comprehensive statistical software
With SPSS you can:
- Save time with easy data access and management
- Use an even broader range of statistics for better analysis
- Report results in easy-to-understand formats
Specifications
Symbol indicates a new feature

Data access and data export
- Open multiple data files simultaneously in a single SPSS session
- Stata data file import/export
- Dimensions data model, enabling you to import/export data to/from Dimensions products
- Ability to import from and export to OLE DB data sources without having to go through ODBC
- Database Wizard
- Import SAS data
- Text Wizard
- Import/export data in Excel format
- Import/export data in Excel 2007 format
- Easily write back to databases from SPSS by using the Database Wizard. For example, you can:
  - Create a new table and export it to your database
  - Add new rows to an existing table
  - Add new columns to an existing table
  - Export data to existing columns in a table
- Save comma-separated value (CSV) text files from SPSS data files
- Export output to PowerPoint, Word, and Excel

Data management and preparation
- Support the use of Unicode data
- Prepare continuous-level data for analysis with the Visual Binner
- Create your own custom programs with the Output Management System (OMS). Turn output from SPSS procedures into data and create your own programs for bootstrapping; jackknifing and leave-one-out methods; and Monte Carlo simulations
- Create custom routines in SPSS with the OMS Control Panel
- Improved Data Editor:
  - Configure attributes, so that some can be hidden
  - Spell-check value labels and variable labels
  - Sort by variable name, type, format, etc.
  - Use Find and Replace functionality
- Easily eliminate duplicate records with the Identify Duplicate Cases tool
- Make sense of and keep track of your data files by adding notes to them with the Data File Comments command
- Create read-only datasets
- More accurately describe your data using longer variable names (up to 64 bytes)
- Create value labels up to 120 characters
- Clone or duplicate datasets
- Apply an extended Variable Properties command to customize properties for individual users

SPECIFICATIONS CONTINUED ON PG 10 AND PG 11
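The bootstrapping workflow that OMS enables, re-estimating a statistic on many resampled copies of the data, can be sketched outside SPSS as well. Below is a minimal pure-Python illustration of a percentile bootstrap; the data values are invented for the example, and this is not SPSS's implementation:

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, reps=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Re-estimate the statistic on `reps` resamples drawn with replacement
    estimates = sorted(stat([rng.choice(data) for _ in range(n)])
                       for _ in range(reps))
    lo = estimates[int((alpha / 2) * reps)]
    hi = estimates[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

sales = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.0]  # hypothetical data
low, high = bootstrap_ci(sales)
```

With OMS, the same loop is driven by turning SPSS procedure output back into data, so any statistic a procedure produces can be bootstrapped.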
The Identify Duplicate Cases tool and the Restructure Data Wizard help make sure your data is clean and organized properly for analysis.
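The idea behind duplicate-case identification, flagging the first occurrence of each key as primary and later occurrences as duplicates, can be illustrated in a few lines of Python. The field names here are made up for the example:

```python
def flag_duplicates(cases, key_fields):
    """Mark each case as primary (first occurrence) or duplicate,
    matching on the given key fields -- similar in spirit to a
    duplicate-cases tool."""
    seen = set()
    flagged = []
    for case in cases:
        key = tuple(case[f] for f in key_fields)
        flagged.append({**case, "primary": key not in seen})
        seen.add(key)
    return flagged

records = [
    {"id": 1, "name": "Ayers", "dob": "1970-01-02"},
    {"id": 2, "name": "Ayers", "dob": "1970-01-02"},  # duplicate of id 1
    {"id": 3, "name": "Brook", "dob": "1985-06-30"},
]
out = flag_duplicates(records, ["name", "dob"])
```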
SPSS, the premier statistical software product for data analysis and data management, helps solve your business and research problems. It is a modular, tightly integrated, full-featured product line that allows you to add modules and products to ensure you meet all your analytical needs. Compared to other data analysis packages, SPSS is easier to use, has a lower total cost of ownership, and comprehensively addresses the entire analytical process. Underlying this offering are more than 38 years of SPSS analytical expertise, assuring users that the included statistics and procedures are tried, tested, and proven as among the best in the field.
Create, distribute, and manipulate information for ad hoc decision making, featuring SPSS' award-winning pivoting technology.
5
Quickly access massive amounts of data from numerous database sources with the SPSS Database Wizard. SPSS gives you direct access to Excel, SAS, and text data. You will never have to waste time re-keying data for analysis.
Eliminate the time-consuming task of labeling all your data by using the Define Variable Properties tool. Create your labels once, and the Define Variable Properties tool copies and applies your labels to your entire dataset.
Export your SPSS results directly into Microsoft Word, PowerPoint, and Excel, or as a PDF document. For continuous-level data, the Visual Binner lets you easily create bands (e.g., break ages or income into specific ranges). A data pass provides you with a histogram that helps you specify logical cutpoints.
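Banding a continuous variable at fixed cutpoints, as the Visual Binner does, amounts to the lookup below. This is only a sketch of the idea; the cutpoints and labels are invented for illustration:

```python
import bisect

def band(value, cutpoints, labels):
    """Assign a value to a band. Cutpoints are inclusive upper bounds,
    so len(labels) must equal len(cutpoints) + 1."""
    return labels[bisect.bisect_left(cutpoints, value)]

age_cuts = [25, 40, 60]                       # hypothetical cutpoints
age_labels = ["<=25", "26-40", "41-60", "60+"]

ages = [19, 33, 40, 58, 71]
bands = [band(a, age_cuts, age_labels) for a in ages]
```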
SPSS 16.0
New features and capabilities
- Greater flexibility
- Expanded programmability functionality
Specifications
- Longer text strings (up to 32,000 bytes)
- Define Variable Properties tool
- Copy Data Properties tool
- Data Restructure Wizard
- Define missing values and value labels for strings of any length
- Aggregate data to an external file or to the active data file
- Automatically convert string variables to numeric with Autorecode
- Use an autorecode template to append existing recode schemes
- Recode a set of variables that has a single scheme at one time
- Autorecode blank strings so they are defined as user-missing
- Date and Time Wizard: easily work with data containing times and dates in SPSS
  - Create a time/date variable from a string containing a date variable
  - Create a time/date variable from variables that include individual date units, such as month or year
  - Calculate times and dates
  - Round instead of truncating date/time information, if desired
  - Add decimal places to time data, if desired
  - Separate a date unit from a time/date variable
- Apply splitters in the Data Editor for easier viewing of wide or long data files
- Create your own dictionary information for variables by using Custom Attributes. For example, create a custom attribute describing transformations for a derived variable, with information explaining how it was transformed.
- Customize the viewing of extremely wide files with Variable Sets. You can instantly reduce the variables shown in the Variable View and Data View windows to a subset while keeping the entire file loaded and available for analysis.
- Use syntax to change string length and basic data type
- Set a permanent default working directory
A new interface enables you to resize dialog boxes and drag and drop variables from one pane to another. Because it supports Unicode, you can work more easily with data in multiple languages. Plus, for the first time, you can use SPSS on Windows, Mac, or Linux platforms.
- Write scripts in Python to automate repetitive tasks in the user interface, such as customizing output
- Access procedures written in R to expand the breadth of statistical options available
- Create a new data source, including the simultaneous creation of variables and cases, without having to import the original data source in its entirety into SPSS
Better reporting
- A new visualization engine replaces IGRAPH, making graph editing faster and easier
- The enhanced Chart Editor delivers a similar level of functionality to the previous IGRAPH editor
- Search-and-replace capability added to the Output Viewer
Using the new add-on module, SPSS Neural Networks, you can explore your data and uncover unexpected connections. Neural networks are non-linear data modeling tools with input and output layers plus one or more hidden layers of interconnected nodes.
- Enhanced Date/Time Wizard:
  - Round instead of truncating date/time information, if desired
  - Add decimal places to time data, if desired
- Enhanced Data Editor:
  - Use Find and Replace in both the Variable View and the Data View
  - Spell-check value labels and variable labels
  - Sort by variable name, type, and format
  - Configure the data so that only certain attributes are seen
- Define missing values and value labels for data strings of any length
- Change string length, or change the basic data type, through SPSS syntax
- Suppress the number of active datasets in the user interface
- Set a permanent default working directory
- Import and export data in Excel 2007 format
Transformations
- Easily find and replace text strings in your data using the find/replace function
- Recode string or numeric values
- Recode values into consecutive integers
- Create conditional transformations using DO IF, ELSE IF, ELSE, and END IF statements
- Use programming structures, such as DO REPEAT-END REPEAT, LOOP-END LOOP, and vectors
- Compute new variables using arithmetic, cross-case, date and time, logical, missing-value, random-number, statistical, or string functions
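The DO IF / ELSE IF / ELSE / END IF pattern above maps directly onto ordinary conditional logic. As a hedged Python equivalent of a conditional recode (the variable names and thresholds are invented for illustration):

```python
def loyalty_tier(visits, spend):
    """Conditional transformation in the style of DO IF / ELSE IF /
    ELSE / END IF: derive a new variable from existing ones."""
    if visits >= 20 and spend >= 500:
        return "gold"
    elif visits >= 10:
        return "silver"
    else:
        return "bronze"

customers = [(25, 620.0), (12, 180.0), (3, 45.0)]   # (visits, spend)
tiers = [loyalty_tier(v, s) for v, s in customers]
```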
Multithreaded algorithms improve performance on machines containing multiple processors or multiple cores. The following algorithms are now multithreaded: in SPSS Base, Linear Regression, Correlation, Partial Correlation, and Factor Analysis; and, in SPSS Complex Samples, Complex Samples Select.

Continued integration with SPSS Predictive Enterprise Services:
- Store and manage SPSS output files, syntax files, and Python script files
- Gain performance benefits
10
Specifications continued
- Count occurrences of values across variables
- Make transformations permanent or temporary
- Execute transformations immediately, batched, or on demand
- Linear Regression
- Ordinal regression (PLUM)
- Multithreaded algorithms: correlation, partial correlation, linear regression, factor analysis
Descriptive statistics
- Crosstabulations
- Frequencies; Descriptives; Explore; descriptive ratio statistics

Bivariate statistics
- Means; t tests; ANOVA
- Correlation (bivariate, partial, distances)
- Non-parametric tests

Prediction for numerical outcomes and identifying groups
- Factor Analysis
- K-means Cluster Analysis
- Hierarchical Cluster Analysis
- TwoStep Cluster Analysis
- Discriminant
Reporting
- Reports: OLAP cubes, case summaries, report summaries

Graphs
Categorical charts:
- 3-D Bar: simple, cluster, and stacked
- Bar: simple, cluster, stacked, dropped shadow, and 3-D
- Line: simple, multiple, and drop-line
- Area: simple and stacked
- Pie: simple, exploding, and 3-D effect
- High-low: high-low-close, difference area, and range bar
- Box plot: simple and clustered
- Error bar: simple and clustered
- Error bars: add to bar, line, and area charts; confidence level, S.D., or S.E.
- Dual-Y axes and overlay

Scatterplots:
- Simple, grouped, scatterplot matrix, and 3-D
- Fit lines: linear, quadratic, or cubic regression; Lowess smoother; confidence interval control; for total or subgroups; display spikes to line
- Bin points by color or marker size to prevent overlap

Density charts:
- Population pyramids: mirrored axis to compare distributions; with or without normal curve
- Dot charts: stacked dots show distribution; symmetric, stacked, and linear
- Histograms: with or without normal curve; custom binning options

Quality control charts:
- Pareto, X-Bar, range, Sigma, individual chart, or moving range chart
- Automatic flagging of points that violate Shewhart rules, the ability to turn off rules, and the ability to suppress charts

Diagnostic and exploratory charts:
- Caseplots and time-series plots
- Probability plots
- Autocorrelation and partial autocorrelation function plots
- Cross-correlation function plots
- Receiver Operating Characteristic curves

Multiple use charts:
- 2-D line charts (with two scale axes)
- Charts for multiple response sets

Custom charts:
- Graphics Production Language (GPL), a custom chart creation language, enables advanced users to attain a broader range of chart and option possibilities than the interface supports, to create mixed charts and more

Editing options:
- Automatically sort and reorder categories by label, value, or statistic
- Data value labels: drag and drop, add connecting lines, and match color to subgroup
- Select and edit specific elements directly within a chart: colors, text, and styles
- Choose from a wide range of line styles and weights
- Display gridlines, reference lines, legends, titles, footnotes, and annotations
- Y=X reference line

Layout options:
- Paneled charts: create a table of subcharts, one panel per level or condition; multiple rows and columns
- 3-D effects: rotate, modify depth, and display backplanes

Chart templates
- Save selected characteristics of a chart and apply them to others automatically
- Apply the following attributes at creation or edit time: layout; titles, footnotes, and annotations; chart element styles; data element styles; axis scale range; axis scale settings; fit and reference lines; and scatterplot point binning
- Tree-view layout and finer control of template bundles
- Export SPSS output to PDF:
  - Choose to optimize the PDF for Web viewing
  - Control whether PDF-generated bookmarks correspond to Navigator Outline entries in the Output Viewer. Bookmarks facilitate navigation of large documents.
  - Control whether fonts are embedded in the document. Embedded fonts ensure that the reader of your document sees the text in its original font, preventing font substitution.
- Easily open, save, and create new output files through syntax

System requirements

SPSS for Windows
Operating system: Microsoft Windows XP (32-bit versions) or Vista (32-bit or 64-bit versions)
Hardware:
- Intel or AMD x86 processor running at 1GHz or higher
- RAM: 512MB or more; 1GB recommended
- 450MB of available hard-disk space
- CD-ROM drive
- Super VGA (800x600) or higher-resolution monitor
- For connecting with an SPSS Server, a network adapter running the TCP/IP network protocol
Browser: Internet Explorer 6, Mozilla Firefox 1.0.4, or Netscape 7.1

SPSS for Mac OS X
Operating system: Apple Mac OS X 10.4 (Tiger)
Hardware:
- PowerPC or Intel processor
- RAM: 512MB or more; 1GB recommended
- 800MB of available hard-disk space
- CD-ROM drive
- Super VGA (800x600) or higher-resolution monitor
Browser: Safari 1.3.1, Firefox 1.5, or Netscape 7.2

SPSS 16.0 for Linux
Operating system*: any Linux OS that meets the following requirements:
- Kernel 2.4.33.3 or higher
- glibc 2.3.2 or higher
- XFree86 4.0 or higher
- libstdc++5
Hardware:
- Intel or AMD x86 processor running at 1GHz or higher
- RAM: 512MB or more; 1GB recommended
- 450MB of available hard-disk space
- CD-ROM drive
- Super VGA (800x600) or higher-resolution monitor
Browser: Konqueror 3.4.1, Firefox 1.0.6, or Netscape 7.2
11
Specifications
Multilayer Perceptron (MLP) procedure
- Fits an MLP neural network, which uses a feedforward architecture
- Can have multiple hidden layers
- One or more dependent variables may be specified: scale, categorical, or a combination of these
- Predictors can be factors or covariates
- EXCEPT subcommand excludes selected variables
- RESCALE subcommand rescales covariates or scale dependent variables
- PARTITION subcommand specifies the method of partitioning the active dataset into training, testing, and holdout samples
- ARCHITECTURE subcommand is used to specify the neural network architecture:
  - The number of hidden layers in the neural network
  - The activation function to use for all units in the hidden layers (hyperbolic tangent or sigmoid)
  - The activation function to use for all units in the output layer (hyperbolic tangent, sigmoid, or softmax)
- CRITERIA subcommand is used to specify computational resources
- STOPPINGRULES subcommand specifies the rules that determine when to stop training the neural network
- MISSING subcommand controls whether user-missing values for categorical variables (that is, factors and categorical dependent variables) are treated as valid values
- PRINT subcommand indicates the tabular output to display and can be used to request a sensitivity analysis. Various display options are available.
- PLOT subcommand indicates the chart output to display. Various choices are available.
- SAVE subcommand writes optional temporary variables to the active dataset. You can save the predicted value or category, or the predicted pseudo-probability.
- OUTFILE subcommand saves XML-format files containing the synaptic weights

Radial Basis Function (RBF) procedure
SPSS Neural Networks provides a complementary approach to the statistical techniques available in SPSS Base and its modules. From the familiar SPSS interface, you can mine your data for hidden relationships, using either the Multilayer Perceptron (MLP) or Radial Basis Function (RBF) procedure. With either of these approaches, the procedure operates on a training set of data and then applies that knowledge to the entire dataset, and to any new data.
A new addition to the SPSS product family, SPSS Neural Networks offers techniques that enable you to explore your data in new ways and, as a result, build more accurate and effective predictive models. A computational neural network is a set of non-linear data modeling tools consisting of input and output layers plus one or two hidden layers. The connections between neurons in each layer have associated weights, which are iteratively adjusted by the training algorithm to minimize error and provide accurate predictions. You set the conditions under which the network learns and can finely control the training stopping rules and network architecture, or let the procedure automatically choose the architecture for you.
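The feedforward computation described above, weighted sums passed through activation functions layer by layer, can be sketched in a few lines of Python. The weights here are fixed by hand purely for illustration; in a real network they would be fitted iteratively by the training algorithm:

```python
import math

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One forward pass: tanh hidden layer, sigmoid output unit."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(hidden_w, hidden_b)]
    z = sum(wi * hi for wi, hi in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-z))   # predicted probability

# Two inputs -> two hidden units -> one output (hand-picked weights)
p = mlp_forward([0.5, -1.0],
                hidden_w=[[0.8, -0.4], [-0.3, 0.9]],
                hidden_b=[0.1, -0.2],
                out_w=[1.2, -0.7],
                out_b=0.05)
```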
- Fits an RBF neural network, which uses a feedforward, supervised architecture
- Has only one hidden layer
- Trains the network in two stages and, in general, is faster than an MLP network
- Subcommands listed for the MLP procedure (above) perform similar functions for the RBF procedure, except that:
  - In the relevant subcommand, users can specify the Gaussian radial basis function used in the hidden layer: either Normalized RBF or Ordinary RBF
  - In the relevant subcommand, users can specify the computation settings for the RBF procedure, including the hidden-unit overlapping factor that controls how much overlap occurs among the hidden units
12
The results of exploring data with neural network techniques can be shown in a variety of graphic formats. This simple bar chart is one of many options.
From the Multilayer Perceptron (MLP) dialog, you select the variables and covariates that you want to include in your model.
"[I am] very impressed. [SPSS Neural Networks] runs quickly with large datasets... this is definitely a feature I will be using."
Daniel Robertson
Senior Research and Planning Associate, Institutional Research and Planning, Cornell University
In an MLP network like the one shown here, the data feeds forward from the input layer through one or more hidden layers to the output layer.
13
Specifications
GENLIN and GEE

The GENLIN procedure provides a unifying framework that includes classical linear models with a normally distributed dependent variable, logistic and probit models for binary data, and loglinear models for count data, as well as various other nonstandard regression-type models. GEE procedures extend the generalized linear model to correlated longitudinal data and clustered data; in particular, GEE models correlations within subjects.

- Provides a common framework for the following outcomes: continuous outcomes, count data, event/trial data, claim data, ordinal outcomes, combinations of discrete and continuous outcomes, and correlated responses within subjects
- Specify model effects, an offset or scale weight variable if either exists, the probability distribution, and the link function
- Include or exclude the intercept
- Specify an offset variable or fix the offset at a number
- Specify a variable that contains Omega weight values for the scale parameter
- Choose from probability distributions: binomial, gamma, inverse Gaussian, negative binomial, normal, multinomial ordinal, Tweedie, and Poisson
- Choose link functions: complementary log-log, identity, log, log complement, logit, negative binomial, negative log-log, odds power, probit, and power
- Control statistical criteria for generalized linear models and specify numerical tolerance for checking singularity
Specify:
- Type of analysis for each model effect: Type I, Type III, or both
- A value for the starting iteration for checking complete and quasi-complete separation
- The confidence interval level for coefficient estimates and estimated marginal means
- Parameter estimate covariance matrix: model-based estimator or robust estimator
- Hessian convergence criterion
- Initial values for parameter estimates
- Log-likelihood convergence criterion
- Form of the log-likelihood function
- Maximum number of iterations for parameter estimation and log-likelihood
- Maximum number of steps in the step-halving method
- Model parameter estimation method: Fisher scoring method or Newton-Raphson method
- Parameter convergence criterion
- Method of fitting the scale parameter: maximum likelihood, deviance, Pearson chi-square, or fixed at a number
- Tolerance value used to test for singularity

Available on the following platforms: Windows, Mac, and Linux

Specify the working correlation matrix structure used by the GEE to model correlations within subjects, and control statistical criteria in the non-likelihood-based iterative fitting algorithm. Specify:
- The within-subject or time effect
- Correlation matrix structure: independent working, AR(1) working, exchangeable working, fixed working, m-dependent working, and unstructured working
- Whether to adjust the working correlation matrix estimator by the number of nonredundant parameters
- Whether to use the robust or the model-based estimator of the parameter estimate covariance matrix for GEE
- The Hessian convergence criterion for the GEE
- Maximum iterations
- Relative or absolute parameter convergence criterion
- The number of iterations between updates of the working correlation matrix

Display estimated marginal means of the dependent variable for all level combinations of a set of factors.
Specify:
- The cells for which estimated marginal means are displayed
- The covariate values to use when computing the estimated marginal means
- Whether to compute estimated marginal means based on the original scale of the dependent variable or on the link function transformation
- The factor or set of crossed factors whose levels or level combinations are compared using the specified contrast type
- The type of contrast to use for the levels of the factor, or level combinations of the crossed factors. The following contrast types are available: pairwise, deviation, difference, Helmert, polynomial, repeated, and simple.
- The method of adjusting the significance level used in tests of the contrasts: least significant difference, Bonferroni, sequential Bonferroni, Sidak, and sequential Sidak
- Display the following: correlation matrix for parameter estimates, covariance matrix for parameter estimates, case processing summary, descriptive statistics, goodness of fit, general estimable function, iteration history, Lagrange multiplier test, set of contrast coefficient (L) matrices, model information, parameter estimates and corresponding statistics, model summary statistics, and working correlation matrix
- Save to the working data file: predicted value of the linear predictor, estimated standard error of the predicted value of the linear predictor, predicted value of the mean of the response, confidence interval for the mean of the response, leverage value, raw residual, Pearson residual, deviance residual, standardized Pearson residual, standardized deviance residual, likelihood residual, and Cook's distance
- Save to an external file: the parameter correlation matrix and other statistics to an SPSS dataset, the parameter covariance matrix and other statistics to an SPSS dataset, and the parameter estimates and parameter covariance matrix to an XML file

MIXED
- Expands the general linear model used in the GLM procedure so that data can exhibit correlation and non-constant variability
- Fit the following types of models:

SPECIFICATIONS CONTINUED ON PG 16
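To make the idea of a link function concrete: in a logit-link model, the linear predictor is mapped to a probability through the inverse link. Below is a small stdlib-only sketch; the coefficients are invented for illustration, and this is not a GENLIN implementation:

```python
import math

def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_prob(intercept, coefs, x):
    """Mean response under a binomial model with a logit link."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return inverse_logit(eta)

# Hypothetical fitted coefficients for two predictors
p = predict_prob(-1.5, [0.8, 0.3], [2.0, 1.0])   # eta = 0.4
```

Swapping the link (probit, log, complementary log-log, etc.) changes only the mapping between the linear predictor and the mean response, which is what makes the generalized linear model a unifying framework.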
Created to provide you with more statistical power, SPSS Advanced Models enables you to reach more accurate conclusions. Consider what it would be like to harness sophisticated univariate and multivariate analytical techniques and unleash them on your data. Break through the barrier between general analysis and advanced modeling, and begin reaping the rewards today.
- Require multiple outcomes
- Want to measure outcomes over time
- Analyze data with hierarchical structure
- Estimate length of time until an event
14
A marketing group tests three campaigns to determine which promotion has the greatest effect on sales.
The random effects variance is significant and larger than the residual variance. Most of the variability unaccounted for by the fixed effects is due to market-to-market sales variation.
15
SPSS Advanced Models specifications continued

- Fixed effects ANOVA model, randomized complete blocks design, split-plot design, purely random effects model, random coefficient model, multilevel analysis, unconditional linear growth model, linear growth model with person-level covariate, repeated measures analysis, and repeated measures analysis with time-dependent covariate
- Opt to apply frequency weights or regression weights
- Use one of six covariance structures offered
- Select from 11 non-spatial covariance types
- Choose CRITERIA to control the iterative algorithm used in estimation and to specify numerical tolerance for checking singularity
- Specify the mixed model fixed effects: no intercept, Type I sum of squares, and Type III sum of squares
- Specify the random effects: identify the subjects and covariance structure (first-order autoregressive, compound symmetry, Huynh-Feldt, identity, and unstructured variance components)
- Depending on the covariance type specified, random effects specified may be correlated
- Estimation methods: maximum likelihood and restricted maximum likelihood
- Print the covariance matrix of residuals
- Specify the residual covariance matrix in the mixed effects model
- Save fixed predicted values, predicted values, and residuals
Customize hypotheses tests by specifying null hypotheses as linear combinations of parameters Save standard error of prediction Means subcommand for xed effects, which displays the dependent variables estimated marginal means in the cells and its standard errors for the specied factors
GLM
n Describe the relationship between a dependent variable and a set of independent variables
n Select univariate and multivariate lack-of-fit tests
n Regression model
n Fixed effect ANOVA, ANCOVA, MANOVA, and MANCOVA
n Random or mixed ANOVA and ANCOVA
n Repeated measures: univariate or multivariate
n Doubly multivariate design
n Four types of sums of squares
n Full-parameterization approach to estimate parameters in the model
n General linear hypothesis testing for parameters in the model
n Write a covariance or correlation matrix of the parameter estimates in the model to a matrix data file
n Plots: spread vs. level, residual, and profile
n Post hoc tests for observed cell means
n Estimated population marginal means for predicted cell means
n Save variables to the active file: unstandardized predicted values, weighted unstandardized predicted values, unstandardized residuals, weighted unstandardized residuals, deleted residuals, standardized residuals, Studentized residuals, standard errors of predicted value, Cook's distance, and uncentered leverage values
n Pairwise comparisons of expected marginal means
n Linear hypothesis testing of an effect vs. a linear combination of effects
n Option to save design matrices
n Contrasts: deviation, simple, difference, Helmert, polynomial, repeated, and special
n Print: descriptive statistics, tests of homogeneity of variance, parameter estimates, partial Eta², general estimable function table, lack-of-fit tests, observed power for each test, and a set of contrast coefficient (L) matrices
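As a rough illustration of what a fixed-effects one-way ANOVA in GLM computes, the sketch below derives the F statistic from between-group and within-group mean squares using only the standard library. The three groups of sales figures are invented, standing in for the three-campaign test described earlier:

```python
def anova_f(groups):
    """One-way fixed-effects ANOVA F statistic: the ratio of the
    between-group mean square to the within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df1, df2 = k - 1, n - k
    return ss_between / df1 / (ss_within / df2), df1, df2

# Invented sales data for three promotional campaigns.
f, df1, df2 = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(round(f, 2), df1, df2)  # 3.0 2 6
```

The F value would then be compared against an F distribution with (df1, df2) degrees of freedom to judge whether the campaigns differ.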
VARCOMP
n Variance component estimation
n Estimation methods: ANOVA, MINQUE, maximum likelihood, and restricted maximum likelihood
n Type I and Type III sums of squares for the ANOVA method
n Choice of zero-weight or uniform-weight methods
n Choice of ML and REML calculation methods
n Save variance component estimates and covariance matrices
n Print: expected mean squares, iteration history, and sums of squares

LOGLINEAR (For a full description, see www.spss.com/advanced_models)

HILOGLINEAR
n Hierarchical loglinear models for multiway contingency tables
n Simultaneous entry and backward elimination methods
n Print: frequencies and residuals
n Parameter estimates and partial associations for saturated models
n Criteria specification: convergence, maximum iterations, probability of Chi-square for model, and maximum steps
n Specified cell weights and maximum order of terms
n Plots of standardized residuals vs. observed and expected counts
n Normal probability plots of standardized residuals
GENLOG
n Fit loglinear and logit models to count data by means of a generalized linear model approach
n Model fit using ML estimation under Poisson loglinear and multinomial loglinear models
n Accommodate structural zeros
n Generalized log-odds ratio facility tests whether specific generalized log-odds ratios are equal to zero, and can print confidence intervals
n Diagnostic plots: scatterplots and normal probability plots of residuals
n Criteria specification: confidence interval, iterations, convergence, Delta, and Epsilon values used as tolerance in checking singularity

System requirements
n SPSS Base 16.0
n Other system requirements vary according to platform
According to the 2005 AGA Survey of Casino Entertainment, a record 54.1 million people visited casinos nationwide in 2004. A large gaming organization needed some key answers to help develop a new market strategy to grow its business:
n What should the casino look like, and what amenities should be featured?
n What factors (both gambling and adjunct activities) affect visitation?
n What type (profile) of people will the casino attract?
n Where will they come from?
n What is the predictive estimate of the planned casino's users?
To assist it in getting the answers to these questions, the organization hired principal statistician William Bailey of WMB & Associates. "This was a huge, complex project," stated Bailey. So he turned to SPSS for Windows and SPSS Advanced Models to help him analyze the data. "We created key drivers and factors to determine the viability of the project, and specific suggestions regarding activities beyond gaming that would encourage visits to the casino," said Bailey. This guidance helped his client design the optimal floor plan and entertainment mix for the casino. Hundreds of pages of numbers and tables were generated from the data. Bailey and his team created dozens of perceptual maps with SPSS. From this data they were able to make their findings more understandable for the client.

"The client very favorably received our presentation, partly because SPSS allowed us to give them answers in an easy-to-understand format. Even though this was such an enormous job, I knew SPSS for Windows with SPSS Advanced Models could easily handle the project," said Bailey.
One-size-fits-all education: probably not the best way to maximize your software investment
SPSS Education Services offers training to meet your specific needs and help you get long-term value from your analytics solutions
Predictive analytics software from SPSS can help you use your data to competitive advantage. The right education will ensure you get the highest return on your software investment. We understand that not everyone's educational needs are the same, so we offer programs to suit every skill level and learning style. Choose from beginning to advanced courses in almost every product line. Or let us analyze your needs and design a custom learning plan for your environment. Learn online or in the classroom, on-site or at one of our locations worldwide. Our expert Education Consultants will show you how to tap into the power and functionality of SPSS products, so you can gain maximum value from your software. SPSS education offers:
n 60+ public courses, available at more than 20 locations worldwide
n Education needs analysis to identify your organization's education requirements
n Private education fully customized to your needs
n On-site education at your location
n One-to-one, in-person education
n Online instruction for individuals and groups
n Web-based, on-demand learning
Visit www.spss.com/training today for course descriptions, dates, times, and locations. Visit www.spss.com/training/special.htm to check out our many money-saving education discounts.
Specifications
PREFSCAL
n Multidimensional unfolding analysis
n Read one or more rectangular matrices of proximities
n Read weights, initial configurations, and fixed coordinates
n Optionally transform proximities with linear, ordinal, smooth ordinal, or spline functions
n Specify multidimensional unfolding with identity, weighted Euclidean, or generalized Euclidean models
n Specify fixed row and column coordinates to restrict the configuration
n Specify initial configuration (classical triangle, classical Spearman, Ross-Cliff, correspondence, centroids, random starts, or custom), iteration criteria, and penalty parameters
n Specify plots for multiple starts, initial common space, stress per dimension, final common space, space weights, individual spaces, scatterplot of fit, residuals plot, transformation plots, and Shepard plots
n Specify output that includes the input data, multiple starts, initial common space, iteration history, fit measures, stress decomposition, final common space, space weights, individual spaces, fitted distances, and transformed proximities
n Write common space coordinates, individual weights, distances, and transformed proximities to a file

PROXSCAL
n Statistics: iteration history, stress measures, stress decomposition, coordinates of the common space, object distances within the final configuration, individual space weights, individual spaces, transformed proximities, and transformed independent variables
n Plots: stress plots, common space scatterplots, individual space weight scatterplots, individual spaces scatterplots, transformation plots, Shepard residual plots, independent variables transformation plots, and correlations plots

CATPCA
n Statistics: frequencies, missing values, optimal scaling level, mode, variance accounted for by centroid coordinates, vector coordinates, total per variable and per dimension, component loadings for vector-quantified variables, category quantifications and coordinates, iteration history, correlations of the transformed variables and eigenvalues of the correlation matrix, correlations of the original variables and eigenvalues of the correlation matrix, and object scores
With SPSS Categories' sophisticated procedures in your toolbox, you are no longer hampered by categorical or high-dimensional data. These techniques give you all the tools you need to easily analyze and interpret your multivariate data and its relationships more completely.
The data are a 2x5x6 table containing information on two genders, five age groups, and six products. This plot shows the results of a two-dimensional multiple correspondence analysis of the table. Notice that products such as A and B are chosen at younger ages and by males, while products such as G and C are preferred at older ages.
Plots: Joint category plots, transformation plots, residual plots, projected centroid plots, object plots, biplots, triplots, and component loadings plots
CORRESPONDENCE
n Statistics: correspondence measures; row and column profiles; singular values; row and column scores; inertia, mass, row, and column score confidence statistics; and singular value confidence statistics
n Plots: transformation plots, row point plots, column point plots, and biplots

CATREG
n Statistics: frequencies, regression coefficients, ANOVA table, iteration history, and category quantifications
n Plots: correlations between untransformed predictors, correlations between transformed predictors, residual plots, and transformation plots
MULTIPLE CORRESPONDENCE
n Statistics: model summary; history statistics; descriptive statistics; discrimination measures; category quantifications; inertia of the categories; contribution of the categories to the inertia of the dimensions and contribution of the dimensions to the inertia of the categories; iteration history; correlations of the transformed variables and the eigenvalues of this correlation matrix; correlations of the original variables and the eigenvalues of this correlation matrix; and object scores
n Plots: object points, category points (centroid coordinates), discrimination measures, transformation (optimal category quantifications against category indicators), residuals per variable, objects and variables (centroids), and joint plot of the category points for the variables in the varlist
OVERALS
n Statistics: frequencies, centroids, iteration history, object scores, category quantifications, weights, component loadings, and single and multiple fit
n Plots: object scores plots, category coordinates plots, component loadings plots, category centroids plots, and transformation plots

System requirements
n SPSS 16.0
n Other system requirements vary according to platform
Unleash the full potential of your data through optimal scaling and dimension reduction techniques
n Correspondence analysis (CORRESPONDENCE): Describe the relationships between two nominal variables in a low-dimensional space, while simultaneously describing the relationships between the categories of each variable.
n Categorical regression (CATREG): Predict the values of a categorical dependent variable from a combination of categorical independent variables.
n Multiple correspondence analysis (MULTIPLE CORRESPONDENCE): Analyze a categorical multivariate data matrix when all the variables are analyzed at the nominal level. Similar to correspondence analysis, except it doesn't limit you to only two variables.
n Categorical principal components analysis (CATPCA): Use alternating least squares to generalize principal components analysis to accommodate variables of mixed measurement levels. Specify a transformation type of nominal, ordinal, or numeric on a variable-by-variable basis.
n Nonlinear canonical correlation (OVERALS): Use alternating least squares to generalize canonical correlation analysis. It allows more than one set of variables to be compared to one another on the same graph.
n Proximity scaling (PROXSCAL): Take a matrix of similarity or dissimilarity distances between observations in a high-dimensional space and assign them positions in a low-dimensional space, so you can gain a spatial understanding of how the objects relate.
n Preference scaling (PREFSCAL): Set up the Preference Scaling procedure in syntax to perform multidimensional unfolding on two sets of objects in order to find a common quantitative scale.
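Correspondence analysis starts from the row (or column) profiles and masses of a contingency table. The sketch below shows that first step on an invented table in the spirit of the staff-group-by-smoking example; the counts are made up for illustration:

```python
# Small two-way table of counts: rows = staff groups, cols = smoking categories.
# Values are invented for illustration.
table = [[4, 2, 2],
         [1, 3, 4]]

total = sum(sum(row) for row in table)

# Row masses: each row's share of the grand total.
row_masses = [sum(row) / total for row in table]

# Row profiles: each row rescaled to sum to 1, so rows become comparable
# points that correspondence analysis then maps into a low-dimensional space.
row_profiles = [[v / sum(row) for v in row] for row in table]

print(row_masses)       # [0.5, 0.5]
print(row_profiles[0])  # [0.5, 0.25, 0.25]
```

Rows with similar profiles end up close together in the resulting plot, which is what makes the category maps shown here readable.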
Use correspondence analysis to easily display and analyze differences between categories. In this example, researchers present and analyze the two-dimensional table relating staff group to smoking category in a particular workplace.
Easily incorporate supplementary information on additional variables in SPSS Categories. Here, additional information on alcohol consumption by staff group is known. These additional columns can be projected into the staff group by smoking category space.
n Visualize how the rows and columns of large tables of counts or means relate
n Determine how closely customers perceive your products relative to others in your offering set or your competitors'
n Understand which characteristics consumers relate most closely to a product or brand
n Work with and understand ordinal and nominal data with procedures similar to conventional regression, principal components, and canonical correlation
n Perform regression analysis with a categorical dependent variable
Specifications
Key features
Create tree-based classification models for:
n Segmentation
n Stratification
n Prediction
n Data reduction and variable screening
n Interaction identification
n Category merging and discretizing continuous variables
n Classify cases into groups or predict values of a dependent (target) variable based on values of independent (predictor) variables
n Validation tools for exploratory and confirmatory classification analysis
n View nodes in one of several ways: show bar charts of your target variables, tables, or both in each node
n Collapse and expand branches without deleting the model
n Generate syntax automatically from the UI
n Re-run tree building using syntax in production mode
n Score data based on results, or use results in further analysis with other SPSS procedures

Algorithms
Four powerful tree-modeling algorithms:
n CHAID by Kass (1980)
n Exhaustive CHAID by Biggs, de Ville, and Suen (1991)
n Classification & Regression Trees (C&RT) by Breiman, Friedman, Olshen, and Stone (1984)
n QUEST by Loh and Shih (1997)

Evaluation
n Evaluation graphs enable visual representation of gains summary tables
n Misclassification functionality
n Gains chart: identify segments by highest (and lowest) contribution

Deployment
n Export output objects to any of SPSS' available output formats
n Generate rules that define selected segments, in SQL to score databases or in SPSS syntax to score SPSS files
n Export XML models to score cases using the scoring engine feature in SPSS Server (version 13.0 and higher)

System requirements
n SPSS Base 16.0
n Other system requirements vary according to platform
SPSS Classification Trees creates classification and decision trees directly within SPSS to identify groups, discover relationships, and predict future events. By creating visual trees, you can present results in an intuitive manner, so you can more clearly explain results to non-technical audiences.
n Access directly within SPSS: you never leave the SPSS environment
n Identify groups, segments, and patterns in a highly visual manner with classification trees
n Choose from CHAID, Exhaustive CHAID, C&RT, and QUEST to find the best fit for your data
n Present results in an intuitive manner, perfect for non-technical audiences
n Save information from trees as new variables in data (information such as terminal node number, predicted value, and predicted probabilities)
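At the heart of CHAID is a Pearson chi-square comparison of candidate splits: the predictor whose split most strongly separates the target categories wins. A minimal stdlib sketch of that statistic, with an invented donor-response table (not data from the case study):

```python
def chi_square(table):
    """Pearson chi-square for a contingency table: sum over cells of
    (observed - expected)^2 / expected, with expected counts from the margins.
    CHAID uses this statistic to compare candidate predictor splits."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / total) ** 2
        / (row_tot[i] * col_tot[j] / total)
        for i in range(len(table)) for j in range(len(table[0]))
    )

# Hypothetical split: rows = two age bands, cols = responded / did not respond.
print(round(chi_square([[30, 10], [10, 30]]), 2))  # 20.0
```

A larger chi-square (smaller p-value, after a Bonferroni-style adjustment in CHAID proper) means a more informative split on that predictor.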
With more than one million donors, UNICEF Germany finances its projects to help children in less-developed countries through donations and sales of greeting cards. Faced with an increasingly tight budget and a continuously growing number of competitors, UNICEF turned to Ogilvy & Mather Dataconsult (O&MDC) to better segment its target donors. "The goal was to uncover particular donor characteristics to maximize returns," said Matthias Singer-Fischer, senior consultant with Ogilvy & Mather Dataconsult. Using five years of historical data from the UNICEF database, O&MDC gathered approximately 30 variables, including standard demographics, donation frequency, date of last donation, sum of all recent donations, and several variables specific to UNICEF, such as preferred causes.
Because of the host of variables and different scale levels, O&MDC used a Chi-square-based (CHAID) segmentation method. The CHAID procedure is one of four powerful decision-tree algorithms in SPSS Classification Trees. The results gave UNICEF Germany a clearer understanding of its donors, enabling it to:
n Increase direct mail response rates up to 80 percent
n Raise the return on investment more than 65 percent
n Target donors who are partial to the featured topic, thus ensuring a better distribution among UNICEF donors
n Decrease mailing volume dramatically without affecting revenue from donations
Segment and group cases directly within the data. Since you are creating classification trees directly within SPSS, you can use the results to segment and group cases directly within the data, without ever leaving the SPSS environment. Additionally, you can generate selection or classification/prediction rules in the form of SPSS syntax, SQL statements, or simple text. Display these rules in the Viewer and save them to an external file for later use to make predictions about individual and new cases.
Create tree models in SPSS using CHAID, Exhaustive CHAID, C&RT, or QUEST.
Directly select cases or assign predictions in SPSS from the model results, or export rules for later use.
Specifications
Key features
n Complex Samples Plan (CSPLAN): Provides a common place to specify the sampling frame to create a complex sample design or analysis design used by procedures in SPSS Complex Samples. To sample cases, use a sample design created by CSPLAN as input to the CSSELECT procedure. To analyze sample data, use an analysis design created by CSPLAN as input to the CSDESCRIPTIVES, CSTABULATE, CSGLM, CSLOGISTIC, CSORDINAL, or CSCOXREG procedures.
n Complex Samples Selection (CSSELECT): Selects complex, probability-based samples from a population. It chooses units according to a sample design created through the CSPLAN procedure.
n Complex Samples Descriptives (CSDESCRIPTIVES): Estimates means, sums, and ratios, and computes their standard errors, design effects, confidence intervals, and hypothesis tests for samples drawn by complex sampling methods.
n Complex Samples Tabulate (CSTABULATE): Displays one-way frequency tables or two-way crosstabulations and associated standard errors, design effects, confidence intervals, and hypothesis tests for samples drawn by complex sampling methods.
n Complex Samples General Linear Model (CSGLM): Enables you to build linear regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA) models for samples drawn using complex sampling methods.
n Complex Samples Ordinal (CSORDINAL): Performs regression analysis on a binary or ordinal polytomous dependent variable using the selected cumulative link function for samples drawn by complex sampling methods.
n Complex Samples Logistic Regression (CSLOGISTIC): Performs binary logistic regression analysis, as well as multinomial logistic regression analysis, for samples drawn by complex sampling methods.
n Complex Samples Cox Regression (CSCOXREG): Applies Cox proportional hazards regression to the analysis of survival times, that is, the length of time before the occurrence of an event, for samples drawn by complex sampling methods.
If you're working with complex sample designs, such as stratified, clustered, or multistage sampling, you need specialized statistical techniques to account for the sample design and its associated standard error. The SPSS Complex Samples add-on module for SPSS gives you everything you need for working with complex samples, from the planning and sampling stages through to the analysis stage.
Get a clearer view of what your data holds with complex sampling
A marketing manager wants to know whether big-ticket customers (organizations that spend more than $100,000) are more satisfied than smaller-value customers. He initially decided to survey the customer database and build a model to predict customer satisfaction. But he soon realized that surveying the entire database would not be cost effective. Instead, he chose 1,000 customers as the sample of the population. If a simple random sample of 1,000 customers were pulled, it would not provide enough big-ticket customers to build a reliable model, as big-ticket customers are rare. Due to the variability of characteristics, it is necessary to apply scientific sample designs in the sample selection process to reduce the risk of a distorted view of the population. To build an accurate predictive model for the complex sample, the manager used SPSS Complex Samples. It was necessary to stratify the customers into big-ticket and smaller-value customer groups. He then drew a random sample of 500 within each of these groups to conduct the survey.
SPSS Complex Samples enabled the marketing manager to produce a more accurate model to predict customer satisfaction. As a result, the organization was able to make better business decisions.
22
Use the following types of sample design information with SPSS Complex Samples:
n Stratified sampling: Increase the precision of your sample, or ensure a representative sample from key groups, by sampling within subgroups of the survey population. For example, subgroups might be a specific number of males or females, or contain people in certain job categories, people of a certain age group, and so on.
n Clustered sampling: Select clusters, which are groups of sampling units, for your survey. Clusters can include schools, hospitals, or geographic areas, with sampling units that might be students, patients, or citizens. Clustering often helps make surveys more cost-effective.
n Multistage sampling: Select an initial or first-stage sample based on groups of elements in the population, then create a second-stage sample by drawing a subsample from each selected unit in the first-stage sample. By repeating this option, you can select a higher-stage sample. For example, in a face-to-face survey, you might sample city blocks, then households, then individuals within households.
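The stratified design from the big-ticket example above can be sketched in a few lines of stdlib Python. The population, spend figures, and per-stratum sample size here are all invented for illustration; real complex-sample selection (CSSELECT) also records inclusion probabilities and weights, which this sketch omits:

```python
import random

def stratified_sample(population, strata_key, n_per_stratum, seed=0):
    """Draw an equal-sized simple random sample from each stratum,
    mirroring the big-ticket / smaller-value split described above."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(strata_key(unit), []).append(unit)
    return {k: rng.sample(v, min(n_per_stratum, len(v)))
            for k, v in strata.items()}

# Invented customer base: every tenth customer is a rare big-ticket account.
customers = [{"id": i, "spend": 200_000 if i % 10 == 0 else 5_000}
             for i in range(1000)]
sample = stratified_sample(customers, lambda c: c["spend"] > 100_000, 50)
print(len(sample[True]), len(sample[False]))  # 50 50
```

A simple random sample of 100 from this population would contain only about 10 big-ticket customers; stratifying guarantees 50 from each group.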
The accurate analysis of survey data is easy in SPSS Complex Samples. Start with one of the wizards (which one depends on your data source) and then use the interactive interface to create plans, analyze data, and interpret results.
Specifications
Key features

Orthoplan: Generates orthogonal main-effects fractional factorial designs
n Specify the desired number of cards for the plan
n Generate holdout cards to test the fitted conjoint model
n Orthoplan can mix the training and holdout cards, or can stack the holdout cards after the training cards
n Save the plan file as an SPSS data file

Plancards: A utility procedure used to produce printed cards for a conjoint experiment; the printed cards are used as stimuli to be sorted, ranked, or rated by the subjects
n Specify the variables to be used as factors and the order in which their labels are to appear in the output
n Choose a format: listing-file format or card format
n Write the cards to an external file or the listing file

Conjoint: Performs an ordinary least squares analysis of preference or rating data
n Work with the plan file generated by Plancards or a plan file you input
n Work with individual-level rank or rating data
n Provide individual-level and aggregate results
n Treat the factors in any of a number of ways; Conjoint indicates reversals
n Experimental cards have one of three scenarios: training, holdout, and simulation
n Three conjoint simulation methods: max utility, Bradley-Terry-Luce (BTL), and logit
n Write utilities to an external file
n Print results: attribute importance; utility (part-worth) and standard error; graphical indication of most to least preferred levels of each attribute; counts of reversals and reversal summary; Pearson R for training and holdout data; Kendall's tau for training and holdout data; simulation results and simulation summary

System requirements
n SPSS 16.0
n Other system requirements vary according to platform
n Identify product features important to new customers
n Discover which product attributes are most important to current customers
n Determine the influence product attributes have on customer preferences
Thoroughly understand consumer preferences, tradeoffs, and price sensitivity with SPSS Conjoint. By using conjoint analysis, you can uncover more information about how customers compare products in the marketplace and measure how individual product attributes affect consumer behavior. Armed with this knowledge, you can design, price, and market products and services tailored to your customers' needs.
A software firm planned to develop some new training programs in addition to its traditional instructor-led training. Since many options were available, the firm decided to perform a conjoint study to evaluate the proposed product. Six key attributes believed to influence consumer preference were identified: method of delivery, video content, types of examples, certification test, method of asking questions remotely, and price. Four of these attributes have two levels, while the other two have three. The resulting full-factorial design would have 144 alternative product bundles (2 x 2 x 2 x 2 x 3 x 3), making for an unreasonably large study. Using the Orthoplan procedure in SPSS Conjoint 16.0, the researcher reduced this to 16 hypothetical product bundles (such as those in Figure 1) while ensuring that they still received all the information needed to perform a complete analysis. These 16 bundles were then printed with Plancards and given to a sample of target users to rank.
How does your product rank? SPSS Conjoint provides the tools you need to find out
SPSS Conjoint offers the procedures you need to plan, implement, and analyze efficient conjoint surveys. With these techniques, you can discover how respondents rank their preferences and product attributes.
n Orthoplan: Produces an orthogonal array of product attribute combinations, dramatically reducing the number of questions you must ask while ensuring enough information to perform a full analysis.
n Plancards: Prints cards to elicit respondents' preferences. Quickly generate cards that respondents can sort to rank alternative products.
n Conjoint procedure: Get results you can act on, such as which product attributes are important and at what levels they are most preferred. Plus, perform simulations that tell you expected market shares for alternative products.
Add SPSS Conjoint to your competitive research and develop products and services that are more successful.
Rank preferences
Conjoint quickly guides you through creating plan cards that respondents sort to rank their preferences of alternative products.
When these preference rankings are analyzed with SPSS Conjoint, the results shown in Figure 2 are produced. Two attributes stand out as very important: inclusion of video and price. The certification test and types of examples are relatively unimportant. The Utility and Factor columns in Figure 2 show the relative preference for each level of each attribute. Among the question methods, instant messaging is the most preferred and no support is the least preferred.
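For an orthogonal design like this one, each level's part-worth utility can be estimated as the mean preference score of the cards carrying that level, minus the grand mean (equivalent to OLS with effects coding for a balanced design). A minimal sketch with invented ratings for a single two-level attribute, not figures from the study:

```python
# Invented card ratings for one attribute, "video content", across four cards.
cards = [("video", 9), ("video", 7), ("no video", 4), ("no video", 2)]

grand = sum(rating for _, rating in cards) / len(cards)

# Group ratings by attribute level, then center each level mean on the grand
# mean to get its part-worth utility.
levels = {}
for level, rating in cards:
    levels.setdefault(level, []).append(rating)
utilities = {lvl: sum(rs) / len(rs) - grand for lvl, rs in levels.items()}

print(utilities)  # {'video': 2.5, 'no video': -2.5}
```

The spread between a factor's highest and lowest part-worths is what drives the attribute-importance percentages reported in Figure 2.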
Generate charts
After you gather and input the data using your plan cards, SPSS Conjoint performs an ordinary least squares analysis of the preference or rating data. It then generates charts to simulate expected market shares. For instance, in this chart you can quickly see that subjects consider package design and price the most important.
Figure 1
Figure 2
Specifications
Validate data
n Use the Validate Data procedure to validate data in the working data file
n Basic checks: maximum percentage of missing values, single-category cases, and cases with a count of 1; minimum coefficient of variation; minimum standard deviation; flag incomplete IDs, duplicate IDs, and empty cases
n Standard rules: describe the data, view single-variable rules, and apply them to analysis variables
n Description of data: distribution shows a thumbnail-size bar chart for categorical variables or a histogram for scale variables; minimum and maximum data values shown
n Single-variable rules: apply rules to identify missing or invalid values
n User-defined custom rules: define cross-variable rule expressions in which respondents' answers violate logic
n Output: reports for invalid data; casewise report, specified by case; specify the minimum number of violations needed for a case; specify the maximum number of cases in the report; standard validation rules reports; summarize violations by analysis variable and rule; display descriptive statistics
n Save: save variables that record rule violations and use them to clean data and filter out bad cases
n Summary variables: empty case indicator, duplicate ID indicator, incomplete ID indicator, validation rule violation, and indicator variables that record all validation rule violations

Identify unusual cases
n Use the Anomaly Detection procedure to search for unusual cases, based upon deviations from their peer group, and reasons for the deviations
n VARIABLES subcommand: specify categorical, continuous, and ID variables, and list variables that are excluded from the analysis
n HANDLEMISSING subcommand: specifies the methods of handling missing values in this procedure
n The CRITERIA subcommand specifies the following settings: number of peer groups; adjustment weight on the measurement level; number of reasons in the anomaly list; percentage and number of cases considered as anomalies and included in the anomaly list
SPSS Data Preparation enables you to easily identify suspicious and invalid cases, variables, and data values; view patterns of missing data; and summarize variable distributions. You can streamline the data preparation process so that you can get ready for analysis faster and reach more accurate conclusions.
With the Optimal Binning procedure, you can more accurately use algorithms designed for nominal attributes (such as Naïve Bayes and logit models). Optimal Binning enables you to bin, or set cutpoints for, scale variables. Select from three types of optimal binning for preprocessing data prior to model building:
n Unsupervised: Create bins with equal counts
n Supervised: Take the target variable into account to determine cutpoints. This method is more accurate than unsupervised binning; however, it is also more computationally intensive.
n Hybrid approach: Combines the unsupervised and supervised approaches. This method is particularly useful if you have a large number of distinct values.
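The unsupervised, equal-count method is simple enough to sketch directly: sort the values and place cutpoints at equally spaced ranks. This is an illustrative stdlib implementation of the general equal-frequency idea, not the exact rule SPSS uses for ties or bin boundaries:

```python
def equal_frequency_bins(values, n_bins):
    """Unsupervised equal-frequency binning: return cutpoints that put
    (roughly) equal counts of cases into each bin."""
    ordered = sorted(values)
    n = len(ordered)
    # One cutpoint between each pair of adjacent bins.
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

# Twelve evenly spread values split into four bins of three cases each.
print(equal_frequency_bins(list(range(1, 13)), 4))  # [4, 7, 10]
```

The supervised MDLP method instead searches for cutpoints that make the target variable's classes as pure as possible within each bin, which is why it needs a guide variable.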
n Quickly identify invalid cases so you can inspect them prior to analysis
n Eliminate labor-intensive manual checks by performing automatic data checks based on each variable's measurement level, categorical or continuous
n Prevent outliers from skewing a predictive model by using anomaly detection prior to model building
n Quickly preprocess your data to get it ready for analysis
Specifications continued

Identify suspicious or invalid cases, variables, and data values easily with SPSS Data Preparation
n Cutpoint of the anomaly index to determine whether a case is considered an anomaly
n Save additional variables to the working data file, including: anomaly index; peer group ID, size, and size in percentage; variable, variable impact measure, variable value, and norm value associated with a reason
n OUTFILE subcommand: write a model to a filename as XML
n PRINT subcommand prints: case-processing summary; anomaly index list, anomaly peer ID list, and anomaly reason list; the Continuous Variable Norms table, for continuous variables, and the Categorical Variable Norms table, for categorical variables; Anomaly Index Summary; Reason Summary Table

Optimal Binning
n Preprocess data with Optimal Binning, which categorizes one or more continuous variables by distributing the values of each into bins
n Select from the following methods:
n Unsupervised binning via the equal frequency algorithm, which discretizes the binning input variables; no guide variable required
n Supervised binning via the MDLP (Minimal Description Length Principle) algorithm, which discretizes binning input variables without any preprocessing; ideal for small datasets; guide variable required
n Hybrid MDLP binning, which involves preprocessing via the equal frequency algorithm, followed by the MDLP algorithm; ideal for large datasets; guide variable required
n Specify the following criteria: how to define the minimum and maximum cutpoint for each binning input variable, and the lower limit of an interval; whether to force-merge sparsely populated bins; whether missing-value handling uses listwise or pairwise deletion
n Save new variables with binned values, and syntax to an SPSS syntax file
n PRINT subcommand prints: the binning input variables' cutpoint sets; descriptive information for all binning input variables; model entropy for binned variables

System requirements
n SPSS 16.0
n Other system requirements vary according to platform
Standard rules: Apply rules to individual variables that identify invalid values, such as values outside a valid range or missing values. Save: You can also save variables that record rule violations; these can help you clean your data and filter out bad cases. When you press OK, the Validate Data dialog produces reports that summarize invalid values and cases.
Variables tab: The Validate Data dialog is used to validate your data. The Variables tab shows the variables in your file. Start by selecting the variables you are interested in and moving them to the Analysis Variables list.
Define standard rules: The Validate Data dialog lets you create your own rules or apply predefined rules.
Basic checks: You can specify basic checks to apply to variables and cases in your file. For example, you can obtain reports that identify variables with a high percentage of missing values, or empty cases.
Define custom rules: Create cross-variable rules that flag cases in which respondents' answers violate logic (such as pregnant males).
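The standard and custom rules described above amount to simple predicates over variables. A hedged Python sketch (pandas, with illustrative column names; this is not the Validate Data dialog itself) shows both kinds of check:

```python
import pandas as pd

# Hypothetical survey records; column names are illustrative only.
df = pd.DataFrame({
    "age":      [34, 150, 27, -2],
    "gender":   ["F", "M", "M", "F"],
    "pregnant": ["no", "no", "yes", "yes"],
})

# Standard rule: flag values outside a valid range.
df["age_invalid"] = ~df["age"].between(0, 120)

# Custom cross-variable rule: answers that violate logic (pregnant males).
df["logic_violation"] = (df["gender"] == "M") & (df["pregnant"] == "yes")

# Filter out cases with any rule violation, as the Save options describe.
clean = df[~(df["age_invalid"] | df["logic_violation"])]
print(len(clean))
```

Saving the violation flags as columns, rather than dropping rows immediately, mirrors the catalog's point that recorded violations can be reviewed before cases are filtered out.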
Specifications
Tests and statistics
n Pearson chi-square test, likelihood ratio test, and Fisher's exact test: exact 1-tailed and 2-tailed p-values for 2x2 tables; exact 2-tailed p-values for general RxC tables; Monte Carlo 2-tailed p-values and confidence intervals for general RxC tables
n Linear-by-linear association test: exact 1-tailed and 2-tailed p-values and exact point probability; Monte Carlo 1-tailed and 2-tailed p-values and CIs
n Contingency coefficient, phi, Cramer's V, Goodman and Kruskal tau, uncertainty coefficient (symmetric or asymmetric), kappa, gamma, Kendall's tau-b and tau-c, Somers' d (symmetric and asymmetric), Pearson's r, and Spearman correlation: exact 2-tailed p-value; Monte Carlo 2-tailed p-value and CIs
n McNemar test: exact 1-tailed and 2-tailed p-values and point probability
n Sign test and Wilcoxon signed-rank test: exact 1-tailed and 2-tailed p-values and point probability; Monte Carlo 1-tailed and 2-tailed p-values and CIs
n Marginal homogeneity test: asymptotic, exact, and Monte Carlo 1-tailed and 2-tailed p-values, and point probability
n Two-sample Kolmogorov-Smirnov test: exact 2-tailed p-value and point probability; Monte Carlo 2-tailed p-values and CIs
n Mann-Whitney U or Wilcoxon rank-sum W test: exact 1-tailed and 2-tailed p-values and point probability; Monte Carlo 1-tailed and 2-tailed p-values and CIs
n Wald-Wolfowitz runs test: exact 1-tailed p-value and point probability; Monte Carlo 1-tailed p-value and CIs
n Jonckheere-Terpstra test: asymptotic, exact, and Monte Carlo 1-tailed and 2-tailed p-values, and point probability

System requirements
n SPSS 16.0
n Platform: Microsoft Windows
SPSS Exact Tests ensures you'll always have the right statistical test for your data. And because SPSS Exact Tests is part of the SPSS integrated product line, you can count on beginning-to-end solutions for your modeling and data analysis needs.
SPSS Exact Tests is the add-on module to turn to when you'd like to analyze rare occurrences in large databases or work more accurately with small samples. With more than 30 exact tests, you'll be able to analyze your data where traditional tests fail. With SPSS Exact Tests, you can use smaller sample sizes and be confident of your results. SPSS Exact Tests has the tests and statistics you need, including:
n Exact p-values
n Monte Carlo p-values
n Pearson chi-square test
n Linear-by-linear association test
n Contingency coefficient
n Uncertainty coefficient (symmetric or asymmetric)
n Wilcoxon signed-rank test
n Cochran's Q test
n Binomial test
n Operating with a small number of cases
n Working with variables that have a high percentage of responses in one category
n Subsetting your data into fine breakdowns
n Searching for rare occurrences in large datasets
"SPSS Exact Tests fills the deficiencies that many of the major statistics programs have in the area of nonparametric inference and hypothesis tests..."
Vincent C. Arena, Ph.D.
University of Pittsburgh, Graduate School of Public Health Department of Biostatistics
In this example, even though there are only 10 cases, SPSS Exact Tests helps you determine that a significant relationship exists.
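For a sense of what an exact test buys you at this sample size, here is a sketch using SciPy's Fisher's exact test on a hypothetical 2x2 table of 10 cases (a general-purpose Python analogue, not SPSS Exact Tests itself; the counts are made up for illustration):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table from a 10-case sample:
# rows = group A / group B, columns = outcome present / absent.
table = [[5, 0],
         [1, 4]]

# Fisher's exact test gives a valid p-value even at n = 10,
# where the chi-square approximation would be unreliable.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(p_value)
```

Here the exact two-sided p-value falls below 0.05, so even with only 10 cases the association can be called significant, which is exactly the situation the screenshot caption describes.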
We needed to drill down into the organization to determine where targets of intervention needed to be implemented to raise our overall level of customer orientation. Without the factor analysis feature in SPSS Advanced Models, I could never have developed a valid survey instrument, tested its reliability (coefficient alphas), etc. Also, the degree of fit for the model that I developed would have been impossible to ascertain without Amos! We obtained immediate results. I tested the data on our entire salaried workforce and found statistically significant differences between various groups, where I targeted specific interventions via training to raise the respective groups' levels of customer orientation.
Dr. Dean Bartles Vice President and General Manager (Fortune 100 Company) St. Petersburg, FL
We respond to over 40,000 emergency medical service calls annually. Using SPSS, I was able to very quickly extract the data points needed from our patient care report (PCR) database and then compare medical care provided vs. patient outcomes for time-critical medical conditions for a report that went out to all of our 31 fire stations. I then monitored the same parameters three months later and saw behavior had been modified and patient care had improved.
Stew McGehee Fire Department Battalion Chief CA
Most traditional statistics courses still use formulas and small artificial data sets. With SPSS, I can use large and realistic data sets, case studies, etc. to give the students experience with real-life situations. Since people in the workplace rarely use formulas, but have statistical software available, this is a more realistic preparation for the future. Using SPSS is a much better way to prepare students for the future.
Barbara Rose
Create higher-value data and build better models when you estimate missing data
With SPSS Missing Value Analysis you can:
n Overcome missing data issues
n Reach more statistically significant results by taking missing values into account
n Determine the extent of missing data quickly
Even in the best-designed and monitored study or survey, observations can be missing: a person inadvertently skips a question, a sample or response is illegible, or there are technical malfunctions. SPSS Missing Value Analysis allows you to quickly and easily diagnose your missing data and fill in the blanks to create higher-value data, which results in better models. When you ignore or exclude missing data, you risk reaching invalid and insignificant results. Make SPSS Missing Value Analysis a part of your data management and preparation step, and you'll enter the data analysis stage using data that takes missing values into account.
Don't risk invalid results! With SPSS Missing Value Analysis 16.0, you can:
n Diagnose whether you have a serious missing data problem
n Replace missing values with estimates
n Ensure you enter the data analysis stage using data that takes missing values into account
n Improve survey questions: identify possibly troublesome or confusing questions, based on observed missing data patterns
n Draw more valid conclusions and remove hidden bias from your data
A consumer goods company's primary source of customer information is a few survey questions on the warranty card returned by customers. The survey collects data on age, occupation, gender, marital status, family size, and income. The marketing department analyzes the data to better understand the demographics of their customer base and to more effectively target promotions. They used SPSS Missing Value Analysis to investigate the extent of missing data. First, they produced a summary table, which gave an overview of the responses for each question. The question with the highest rate of missing data is income (34%). Further investigation reveals that 31 customers did not report occupation and income. Missing data appears to be a potential problem.
SPSS Missing Value Analysis offers two methods for maximum likelihood estimation and imputation: the EM (Expectation-Maximization) algorithm and regression. Both of these highly sophisticated methods can be used to inspect the results. When used by the company, these methods produced different results than the data with missing values. Compare the results to the data with missing values:
Incomplete data (calculated with missing values)
n Largest segment: 38% married women
n Income range: $18,000 to $42,000
n Occupation: any
Draw more valid conclusions with SPSS Missing Value Analysis
It's easy for you to evaluate the effect of missing data, especially with small datasets, because SPSS Missing Value Analysis offers the features and benefits below.
n Six tailor-made displays: Examine data from several angles using six diagnostic reports to uncover missing data patterns.
n Diagnose missing data problems: Get a case-by-case overview of your data with the data patterns report. A snapshot of each type of missing value and any extreme values for each case lets you determine the extent of missing data quickly.
n Better summary statistics: Adjust for missing values so you get a more accurate description of your data. Choose from four methods: listwise deletion, pairwise deletion, EM, and regression.
n Fill in missing data: Easily replace missing values with estimates and increase your chance of reaching statistically significant results. Choose from the EM and regression algorithms to predict missing values based on data you already have.
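The EM and regression approaches described above both predict missing values from the observed variables instead of discarding cases. A hedged Python sketch using scikit-learn's regression-based imputer (an analogue, not SPSS's implementation; the age/income data is made up) shows the idea:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data: income is missing for two cases but correlated with age.
X = np.array([
    [25, 30_000.0],
    [35, 45_000.0],
    [45, 60_000.0],
    [30, np.nan],
    [50, np.nan],
])

# Regression-based imputation: predict each missing income from age,
# so all five cases can be represented in the analysis.
imputer = IterativeImputer(random_state=0)
X_complete = imputer.fit_transform(X)
print(X_complete)
```

Because income rises with age in the observed cases, the imputed values land between the observed extremes and preserve the relationship, rather than pulling every statistic toward the complete-case subset as listwise deletion would.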
Next, explore patterns of missing data. Here, the missing value patterns show a large discrepancy in GDP. The GDP per capita is $3,108 for the 59 countries with no missing data and $16,554 for the 15 countries where Literacy_Male and Literacy_Female are missing.
Estimate means, standard deviations, covariances, and correlations using listwise (complete cases only), pairwise, EM, and/or regression methods, so that all cases are represented in your analysis. Here, the estimated means from the four estimation methods are shown in a summary table.
First, investigate where the missing values are located and how extensive they are. This summary table shows 31.2% of the cases are missing daily calorie intake values.
Complete data
n Largest segment: 46% married women
n Income range: $17,000 to $39,000
n Occupation: non-professional
While these differences may appear subtle, they translate into significant cost savings. Consider the case where the marketing department purchases a list for a direct mail promotion based on these target market demographics. With the incomplete-data profile, more people outside the target market are mailed to, lowering the response rate. Overall, the direct mail campaign based on complete data is more profitable.
Mailing
n Cost per piece: $1
n 100,000 names purchased
n Total cost for mailing: $100,000
Incomplete data
n Response rate: 2%
n Total number of responses: 2,000
n Closed sales: 1,000
n Average revenue per sale: $100
n Total revenue: $100,000
n Profit: $0

Complete data
n Response rate: 4%
n Total number of responses: 4,000
n Closed sales: 2,000
n Average revenue per sale: $100
n Total revenue: $200,000
n Profit: $100,000
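The campaign economics above reduce to a few lines of arithmetic. A small sketch (figures taken directly from the example; the close rate of one sale per two responses is implied by those figures) makes the comparison explicit:

```python
def campaign_profit(response_rate, mailed=100_000, cost_per_piece=1.0,
                    close_rate=0.5, revenue_per_sale=100.0):
    """Profit for the direct-mail example: responses -> closed sales -> revenue."""
    responses = mailed * response_rate
    sales = responses * close_rate          # half of responders buy, as in the example
    revenue = sales * revenue_per_sale
    cost = mailed * cost_per_piece
    return revenue - cost

print(campaign_profit(0.02))  # targeting based on incomplete data
print(campaign_profit(0.04))  # targeting based on complete data
```

Doubling the response rate doubles revenue against the same fixed mailing cost, which is why the complete-data campaign clears $100,000 while the incomplete-data campaign only breaks even.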
Specifications
Multinomial logistic regression
n Control the values of the algorithm-tuning parameters
n Include interaction terms
n Customize hypotheses by directly specifying null hypotheses as linear combinations of parameters
n Specify a dispersion scaling value
n Build equations with or without a constant
n Use a confidence interval for odds ratios
n Save the following statistics: predicted probability, predicted response category, probability of the predicted response category, and probability of the actual response category
n Find the best predictors from dozens of possible predictors using stepwise functionality
n Use Score and Wald methods to quickly reach results with a large number of predictors
n Assess model fit
n Diagnostics for the classification table

Binary logistic regression (BLR)
n Forward/backward stepwise and forced-entry modeling
n Transform categorical variables by using deviation contrasts, simple comparison, difference (reverse Helmert) contrasts, Helmert contrasts, polynomial contrasts, comparison of adjacent categories, user-defined contrasts, or indicator variables
n Criteria for model building: probability of score statistic for entry; probability of Wald or likelihood ratio statistic for removal
n Save the following statistics: predicted probability and group, residuals, deviance values, logit, Studentized and standardized residuals, leverage value, analog of Cook's influence statistic, and difference in beta
n Export the model using XML

Constrained nonlinear regression (CNLR)
n Save predicted values, residuals, and derivatives
n Choose numerical or user-specified derivatives

Nonlinear regression (NLR)
n Specify loss function options
n Use bootstrap estimates of standard errors

Weighted least squares (WLS)
n Calculate weights based on a source variable and delta values, or apply weights from an existing series
n Output for calculated weights: log-likelihood functions for each value of delta; R, R2, adjusted R2, standard errors, analysis of variance, and t tests of individual coefficients for the delta value with the maximized log-likelihood function
n Display output in pivot tables
SPSS Regression Models gives you an even wider range of statistics so you can get the most accurate results for specific data types. Do you build predictive models but find ordinary least squares regression too limiting? If so, SPSS Regression Models can make your life easier.
n Market research: Study consumer buying habits
n Medical research: Study response to dosages
n Loan assessment: Analyze good and bad credit risks
n Institutional research: Measure academic achievement tests
n Use Score and Wald methods for faster, more accurate variable selection
n Apply a highly scalable, high-performance algorithm to handle big datasets
n Save time by specifying the reference category of your outcome variable in the user interface; you no longer need to recode the dependent variable to set up the desired reference category
n Use the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to better assess model fit
n Binary logistic regression (BLR): Predict dichotomous variables such as buy or not buy, vote or not vote. This procedure offers many stepwise methods to select the main and interaction effects that best predict your response variable.
n Nonlinear regression (NLR) and constrained nonlinear regression (CNLR): Gain control over your model and your model expression. These procedures give you two methods for estimating the parameters of nonlinear models.
n Weighted least squares regression (WLS): Give more weight to measurements within a series
n Probit analysis (PROBIT): Analyze the potency of responses to stimuli, such as medicine doses, prices, or incentives. Probit evaluates the value of the stimuli using a logit or probit transformation of the proportion responding.
n You need to predict a categorical outcome
n The relationship between the outcome and a set of predictors is thought to be nonlinear
n Your data violate the assumptions of linear regression
Two-stage least squares (2SLS)
n Structural equations and instrumental variables
n Control for correlations between predictor variables and error terms
n Display output in pivot tables

Probit
n Transform predictors: base 10, natural, or user-specified base
n Natural response rate: estimate or specify
n Algorithm control parameters: convergence, iteration limit, and heterogeneity criterion probability
n Statistics: frequencies, fiducial confidence intervals, relative median potency, test of parallelism, plots of observed probits or logits
n Display output in pivot tables
System requirements SPSS 16.0 Other system requirements vary according to platform
Logistic regression helps target the right customers and increase profits
A mail-order bookstore is seeking to increase the average dollar value of its orders. By increasing the size of each order, the bookseller can increase profits significantly, especially since its fixed costs are covered by its current product revenue. Since mailing to their entire customer database is costly, the bookseller wants to identify the customers most likely to accept the offer. This way they can target future promotions specifically to these high-value customers and earn a higher profit while reducing promotional costs. They did this by taking a closer look at customers' past transaction history and identifying whether they had responded to past promotions by making a high-value purchase. To identify potential high-value customers, the bookseller used binary logistic regression to model customer response to the promotion. They found the following variables to have significant value for predicting which customers were more likely to respond to the promotion:
n Total dollars spent on products in the previous year
n Items purchased for children in the past year (none, under $30, more than $30)
n Number of best sellers purchased in the past three months (none, one, or more than one)
These variables formed the basis for a model that the bookseller used to score each customer in its database for likelihood to respond to a promotion. The bookseller used these scores to rank customers and identify the best prospects for the promotion. Then they could target a mailing to these individuals. In addition, management was able to review the findings and discovered that customers who made purchases in the children's category, and those who purchased several best sellers in a short period of time, were particularly good prospects for large purchases. By gaining better insight into their customers' behaviors and needs, they were able to more accurately plan and target future ads and promotions, increase sales, and reduce marketing spend.
Use the binary logistic regression procedure to test the impact of service offers (new or expanded) on customer satisfaction. The classification table indicates that this model can correctly predict satisfaction with 90% accuracy.
Predict the presence or absence of a characteristic or outcome based on the values of a set of predictor variables. In this example, a wireless telephone service provider is interested in identifying dissatisfied customers so it can intervene before they switch to a competitor.
Specifications
Graphical user interface
n Simple drag-and-drop table builder interface allows you to preview tables as you select variables and options
n Single, unified table builder instead of multiple menu choices and dialog boxes for different table types

Control contents
n Create tables with up to three display dimensions: rows (stub), columns (banner), and layers
n Nest variables to any level in all dimensions
n Crosstabulate multiple independent variables in the same table
n Display frequencies for multiple variables side by side with tables of frequencies
n Display all categories when multiple variables are included in a table, even if a variable has a category without responses
n Display multiple statistics in rows, columns, or layers
n Place totals in any row, column, or layer
n Create subtotals for subsets of categories of a categorical variable
n Custom control over category display order, and the ability to selectively show or hide categories

Statistics
n Select from over 40 summary statistics
n Calculate statistics for each cell, subgroup, or table
n Calculate percentages at any or all levels for nested variables
n Calculate counts and percentages for multiple-response variables based on the number of responses or the number of cases
n Select percentage bases to include or exclude missing responses

Formatting controls
n Sort categories by any summary statistic in the table
n Hide the categories that make up subtotals: remove a category from the table without removing it from the subtotal calculation
n Directly edit any table element, including formatting and labels
n Sort tables by cell contents in ascending or descending order
n Automatically display labels instead of coded values
n Specify minimum and maximum width of table columns (overrides TableLooks)
n Show a name, label, or both for each table variable
n Display missing data as blank, zero, ".", or any other user-defined term
n Add titles and captions
n Quickly create tables with a drag-and-drop interface
n Preview tables as you create them to get it right the first time
n Customize tables to make it easier for your audience to understand
Automate frequent reports: Run large production jobs and complex table structures with ease to automatically build similar tables with new data
SPSS Tables enables you to turn your analysis into high-quality tabular reports. Easily display your data analysis in presentation-quality, production-ready tables. With SPSS Tables, you have the features you need to easily create and work with tabular reports:
n Preview tables as you build them: Preview your table as you select variables and table options with a simple drag-and-drop interface, and take the guesswork out of table building
n Control your table output: Choose from a variety of formats to represent multi-way information in a two-way table and generate the view you want
n Customize your table structure: Exclude specific categories, display missing value cells, and add subtotals to your table
n Get in-depth analyses: Run chi-square, column proportions, and column means tests, and add more insight to your tables to identify differences, changes, or trends in your data
n Preview tables as you build them with SPSS Tables' drag-and-drop capabilities and a preview pane
n Display information the way you want with SPSS Tables' category management features
n Give your readers reports that let them dig into the information, and make more informed decisions, with SPSS Tables' inferential statistics
n Share results more easily with interactive pivot tables: quickly export to Word or Excel
n Save time and effort by automating frequent reports using SPSS Tables syntax and automation in production mode
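For readers who want to prototype this kind of stub-and-banner crosstab outside SPSS, a minimal pandas sketch (illustrative column names; this is not SPSS Tables output) looks like this:

```python
import pandas as pd

# Hypothetical survey data; column names are illustrative.
df = pd.DataFrame({
    "region":    ["North", "North", "South", "South", "South", "North"],
    "satisfied": ["yes", "no", "yes", "yes", "no", "yes"],
})

# A simple crosstab with row/column totals, in the spirit of the
# banner (columns) and stub (rows) tables described above.
table = pd.crosstab(df["region"], df["satisfied"],
                    margins=True, margins_name="Total")

# Row percentages, analogous to cell percentages on the row base.
pct = pd.crosstab(df["region"], df["satisfied"], normalize="index").round(2)

print(table)
print(pct)
```

Counts and percentages are built as two separate tables here; part of what SPSS Tables adds is placing multiple statistics in the same table, along with presentation-quality formatting.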
Output as SPSS pivot tables
n Specify corner labels
n Customize labels for statistics
n Display the entire label for variables, values, and statistics
n Choose from a variety of numerical formats
n Apply pre-formatted TableLooks
n Define the set of variables that are related to multiple-response data and save it with your data definition for subsequent analysis
n Accepts both long- and short-string elementary variables
n Imposes no limit on the number of sets that can be defined or the number of variables that can exist in a set
Tests of significance
n Chi-square
n Column means
n Column proportions
n Exclude categories from significance tests
n Significance tests for multiple-response variables
System requirements SPSS 16.0 Other system requirements vary according to platform
Syntax and printing formats
n Simpler, easy-to-understand syntax
n Syntax converter (for upgrade users)
n Specify page layout: top, bottom, left, and right margins, and page length
n Use the global break command to produce a table for each value of a variable when the variable is used in a series of tables
Matthew Liao-Troth, assistant professor of management, has to juggle a heavy research workload with teaching. But Liao-Troth has found that he can maximize his research time and cover more ground in the classroom by using SPSS and the SPSS Tables module. Liao-Troth describes his experience working with SPSS Tables:
"I really noticed a difference when I added the SPSS Tables module to SPSS. It is so easy to use: what used to take hours now takes me less than five minutes. SPSS Tables has not only helped with my regular research, but it's made a big difference in how I teach.
"For example, last quarter I was demonstrating the gender effects on salary negotiations in one of my core business classes. Thanks to SPSS and the SPSS Tables module, in a one-and-a-half-hour class I was able to run the simulation, enter the data, put it into tables, create graphs, and run the statistical tests. Instead of describing to students what the tables might look like hypothetically, we were able to discuss the actual outcomes and concentrate on what we could learn from the statistical tests."
Start with your SPSS data. SPSS Tables works seamlessly in the SPSS environment so you can easily turn your data into presentation-quality tables in no time.
Drag and drop your variables on the table preview builder and see what your table looks like as you create it.
Add summary statistics, inferential stats, and subtotals to make it easier to understand results.
Add additional variables by dragging and placing them where you want. Once all your variables are in place, push the OK button to create your final table. Apply the optional TableLooks for a more polished appearance.
Specifications
TSMODEL
n Model a set of time-series variables by using the Expert Modeler or by specifying the structure of an ARIMA or exponential smoothing model
n Allow the Expert Modeler to select the best-fitting predictor variables and models
n Limit the search space to only ARIMA or only exponential smoothing models
n Treat independent variables as events
n Specify custom ARIMA models: produces maximum likelihood estimates for seasonal and non-seasonal univariate models; general or constrained models specified by autoregressive or moving average order, order of differencing, seasonal autoregressive or moving average order, and seasonal differencing; two dependent variable transformations (square root and natural log)
n Automatically detect or specify outliers: additive, level shift, innovational, transient, seasonal additive, local trend, and additive patch
n Specify seasonal and nonseasonal numerator, denominator, and difference transfer function orders and transformations for each independent variable
n Specify custom exponential smoothing models: four non-seasonal model types (simple, Holt's linear trend, Brown's linear trend, and damped trend); three seasonal model types (simple seasonal, Winters' additive, and Winters' multiplicative); two dependent variable transformations (square root and natural log)
n Display forecasts, fit measures, Ljung-Box statistic, parameter estimates, and outliers by model
n Generate tables and plots to compare statistics across all models
n Eight goodness-of-fit measures: stationary R2, R2, root mean square error, mean absolute percentage error, mean absolute error, maximum absolute percentage error, maximum absolute error, and normalized BIC
n Tables and plots of the residual autocorrelation function (ACF) and partial autocorrelation function (PACF)
n Plot observed values, forecasts, fit values, confidence intervals for forecasts, and confidence intervals for fit values for each series
n Filter output to a fixed number or percentage of best- or worst-fitting models
n Save predicted values, lower confidence limits, upper confidence limits, and noise residuals for each series back to the dataset
n Specify forecast period, treatment of user-missing values, and confidence intervals
n Support time-series analysis
n Find the best model for your data using the new Expert Modeler
n Apply saved models to what-if scenarios to optimize your decisions
Will a change in fees affect the number of new customers we gain? How will tuition increases affect enrollment?
Time-series analysis is the most powerful procedure you can use to analyze historical information, build models, predict trends, and forecast future events. SPSS Trends 16.0 is the best way to quickly create powerful forecasts with confidence. With better forecasts, long-term goals can be set, with insight on how to achieve them, based on your organization's past performance and knowledge of your industry. Unlike spreadsheet programs, SPSS Trends has the advanced statistical techniques you need in order to work with time-series data. But you don't need to be an expert statistician to use it. Regardless of your level of experience, you can analyze historical data and predict trends faster, and deliver information in ways that your organization's decision makers can understand and use. SPSS Trends 16.0 will help you find answers to tough questions:
n If I increase my advertising budget, how will it affect sales by product or region?
n How will increasing assembly line capacity affect production?
If you're new to building models from time-series data, SPSS Trends helps you by:
n Generating reliable models, even if you're not sure how to choose exponential smoothing parameters or ARIMA orders, or how to achieve stationarity
n Automatically testing your data for seasonality, intermittency, and missing values, and selecting appropriate models
n Detecting outliers and preventing them from influencing parameter estimates
n Generating graphs showing confidence intervals and the model's goodness of fit
If you're experienced at forecasting, SPSS Trends allows you to:
n Control every parameter when building your data model
n Use the SPSS Trends Expert Modeler recommendations as a starting point or to check your work
n Develop reliable forecasts quickly, regardless of the size of the dataset or number of variables
n Update and manage forecasting models efficiently
n Reduce forecasting error by automating appropriate model and parameter selection
n Gain more control over choices affecting models, parameters, and output
n Deliver high-resolution graphs and communicate results effectively
TSAPPLY
n Apply saved models to new or updated data
n Simultaneously apply models from multiple XML files created with TSMODEL
n Re-estimate model parameters and goodness-of-fit measures from the data, or load them from the saved model file
n Selectively choose which saved models to apply
n Override the periodicity (seasonality) of the active dataset
n Same output, fit measures, statistics, and options as TSMODEL
n Export re-estimated models to an XML file
SEASON
n Estimates multiplicative or additive seasonal factors for periodic time series
n Multiplicative or additive model
n Moving averages, ratios, seasonal and seasonal adjustment factors, seasonally adjusted series, smoothed trend-cycle components, and irregular components

SPECTRA
n Decomposes a time series into its harmonic components, a set of regular periodic functions at different wavelengths or periods
n Produces and plots the univariate or bivariate periodogram and spectral density estimate
n Bivariate spectral analysis
n Smooth periodogram values with weighted moving averages
n Spectral data windows available for smoothing: Tukey-Hamming, Tukey, Parzen, Bartlett, equal weight, no smoothing, and user-specified weights
n High-resolution charts available: periodogram, spectral and cospectral density estimate, squared coherency, quadrature spectrum estimate, phase spectrum, cross amplitude, and gain
System requirements SPSS 16.0 Other system requirements vary according to platform
SPSS Trends 16.0 provides tremendous flexibility in creating forecasts with Expert Modeler
Expert Modeler: This feature was specifically designed to support forecasting with time-series data. Now you can produce time-series models even if you have little or no experience with time-series data. The Expert Modeler feature enables you to:
n Automatically determine the best-fitting ARIMA (autoregressive integrated moving average) or exponential smoothing model for your time-series data
n Model hundreds of different time series at once, rather than having to run the procedure for one variable at a time
n Test your data for seasonality, intermittency, missing values, and outliers
Also in SPSS Trends 16.0:
n Save models to an XML file so that forecasts can be updated without having to reset parameters or re-estimate the model
n Write scripts so that updates can be performed automatically
This screenshot of the Time Series Modeler shows how it provides you with the ability to model multiple series simultaneously. Because the module presents results in an organized fashion, you can concentrate on the models that need closer examination. This screenshot displaying a forecast for women's apparel shows how you can automatically determine which model best fits your time series and independent variables.
n Analyze massive data files faster
n Increase productivity
n Use capital resources more efficiently
large temporary les, which are often associated with time-consuming tasks such as sorting and aggregation.
In addition to analytical tools for the desktop, SPSS Inc. offers you an enterprise-level solution: SPSS Server 16.0. With SPSS Server, you can help your organization deliver enterprise-strength scalability and enhanced performance. You'll benefit from added speed, security, scalability, data centralization, and additional procedures.

- Reduce network traffic and improve performance with the data-free client feature
- Score new data in the scoring engine using previously created models, via the interface or syntax
- Use Feature Selection and Naïve Bayes analysis when working with large datasets

Available on the following platforms: Windows, Sun Solaris, Linux, IBM AIX, and HP-UX

- Process data on the server, rather than on your client machine
- Access data directly from the server and free up network and desktop resources
- Ensure security with the secure sockets layer (SSL) protocol and by requiring a login and password for server access
- Increase the tools available for preparing data and creating reports using features unique to the server version of SPSS
Direct Wines, the world's largest mail-order wine company, is an experienced database user. However, over time it found that its existing systems could not cope with the volume and number of mailings. When searching the market, a Direct Wines customer database analyst said, "We were looking for a huge leap in function and greater flexibility to do the things we wanted to do without being dictated to by the limitations of the system. We also wanted a system that would be user friendly and not too statistical, so that non-expert users could take advantage of it."

Direct Wines has a server running SPSS Server and three workstations running the client version. SPSS is also used as the data-retrieval system. White stated, "We could not have grown as a company without SPSS." He adds, "At first sight, SPSS has an interface with many different statistics options and transformation tools. However, the speed of its transformation and manipulation is staggering, and its ability to store syntax and scripts allows analysis to be repeated many times with ease."
See it in action: a free event for SPSS customers, with seminars highlighting the newest SPSS products, plus a complimentary networking lunch.

- A coupon for $50 off SPSS training*
- SPSS Predictive Analytics Orb
- SPSS T-shirt
- The SPSS Programming and Data Management book
* With the purchase of SPSS 16.0 or SPSS Text Analysis for Surveys 2.0
Why System i?
Information technology, information services: whatever the department is called, these professionals play a key role in shaping your organization's strategies and initiatives. Not only do these professionals choose your computer hardware and networking technologies, they also influence the selection and manage the deployment of key software applications. That's why many IT organizations find the System i5 an ideal platform on which to combine a variety of business applications.

- Optimize your use of the hardware's processing and partitioning capabilities
- Efficiently design, build, and maintain databases tailored to your business
- Make information available widely without compromising security or system performance

The ShowCase Suite is a comprehensive software solution for the IBM System i5 (iSeries/AS/400) computing platform. It provides your organization with a central, reliable source of information, along with analytical tools and the means to deliver information rapidly and cost-effectively to hundreds, even thousands, of decision makers.

"Thanks to Query, the sales group can get a daily snapshot to see how close they're coming to projections. Another benefit is that they no longer need to go to IS staff to access sales data, and can generate their own reports via Excel and Access."
David Ashley Business Systems Manager Mississippi Chemical Corporation
Role-based data access and security is supported through ShowCase Warehouse Manager, the component that interfaces with System i5 and ERP vendors' security controls. With Warehouse Manager, your organization also efficiently optimizes system resources.

Data warehousing

ShowCase Warehouse Builder automates and streamlines the process of creating and maintaining relational and multidimensional databases. This helps structure transactional and operational data in ways that make it faster and easier to find the answers you need and deliver more intelligence from your data.
Information analysis
One of the key technologies for analyzing information is online analytical processing (OLAP). With ShowCase Essbase, your company can build multidimensional databases or cubes that allow you and other decision makers to analyze data from a number of different perspectives. Essbase offers unrivaled speed, analytical sophistication, and scalability. It includes hundreds of built-in calculations to support trending, scenario analysis, and budget development, as well as basic data mining algorithms.
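The core idea of an OLAP cube, pre-aggregating a measure over combinations of dimensions so that any slice can be answered by lookup, can be sketched in a few lines of Python. The column names and data layout below are invented for the example, and a real Essbase cube is far more sophisticated:

```python
from collections import defaultdict
from itertools import combinations

# Toy cube builder: sums a measure over every subset of the dimensions.
# Dimension and measure names are illustrative, not Essbase constructs.

def build_cube(rows, dims, measure):
    cube = defaultdict(float)
    for row in rows:
        for r in range(len(dims) + 1):
            for keys in combinations(dims, r):          # every subset of dimensions
                cell = tuple((k, row[k]) for k in keys)
                cube[cell] += row[measure]
    return cube
```

A question such as "total sales in region E" then becomes a single lookup: `cube[(("region", "E"),)]`.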
Business users can choose from a variety of interfaces. Financial analysts, for example, can explore and manipulate data using Excel or other multidimensional analysis tools. Sales and marketing managers typically prefer the visual capabilities of ShowCase Analyzer. Analyzer enables them to rank, filter, calculate, and display data in a variety of graphic formats. They don't need to be familiar with database structures or understand programming in order to use Analyzer. It makes it easy for your organization's managers and others to evaluate current conditions, highlight trends, and decide what action to take. And they can do all this through a Web browser, which can be especially valuable if your company has a widely dispersed workforce.

solutions and allows your organization to leverage data contained in solutions from many other vendors. There are more than 2,200 ShowCase Suite customers worldwide, including small to mid-sized companies, large multinational corporations, not-for-profit organizations, and government agencies. The ShowCase Suite makes companies more competitive in a variety of industries, including retail, wholesale/distribution, manufacturing, consumer packaged goods, transportation, banking, and insurance. For more than 15 years, our company has partnered with IBM to deliver products and solutions that leverage the capabilities of the System i5. Find more details, such as system requirements, customer stories, and other information about the ShowCase Suite, and why it's the choice for BI on the System i5, at www.BIforSystemi.com.

PUMA North America is a major producer of athletic footwear, apparel, and accessories. It has more than 70 independent sales consultants who are required to make important daily business decisions based on data regarding orders, shipments, and product availability. PUMA's data was housed in an ERP system that provided limited reporting capabilities. To view sales data, salespeople contacted PUMA's internal database analysts, who sorted through the database, extracted the requested information, and forwarded it to the sales representatives via e-mail. Salespeople would then import the information into an application, view it, and make the appropriate decisions. For analysts, the majority of their daily tasks focused on fielding requests rather than on proactively managing the database. For sales staff, it took hours to receive the information they needed to make critical business decisions.

PUMA invested in three components of the SPSS ShowCase Suite: Enterprise Reporting, Query, and Report Writer. These products allow an unlimited number of employees, business partners, suppliers, and customers to access and share information through a standard browser, as well as create professional-looking printed reports. "In the past, it would take a day or more to create a report. Now we can obtain reports in less than an hour, and in some cases as quickly as 10 to 20 minutes. With SPSS, we have completely replaced 80 percent of our ERP system reports!" Karen King, Database Analyst, PUMA
Amos 16.0
Take your analysis to the next level
With Amos you can:
Obtain Bayesian estimates of model parameters and other quantities
Bayesian analysis enables you to apply your subject-area expertise or business insight to improve estimates by specifying an informative prior distribution. Markov chain Monte Carlo (MCMC) is the underlying computational method for Bayesian estimation. The MCMC algorithm is fast and the MCMC tuning parameter can be adjusted automatically.
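As a toy illustration of the MCMC machinery described above (not Amos's actual algorithm), the following Python sketch uses a Metropolis sampler to estimate a posterior mean for normal data under a normal prior. All parameter names and defaults are invented for the example:

```python
import math
import random

# Toy Metropolis sampler for the posterior mean of normal data under a
# normal prior; a stand-in for real MCMC estimation, not Amos's method.

def log_posterior(mu, data, sigma, prior_mean, prior_sd):
    lp = -((mu - prior_mean) ** 2) / (2 * prior_sd ** 2)          # prior
    lp += sum(-((y - mu) ** 2) / (2 * sigma ** 2) for y in data)  # likelihood
    return lp

def metropolis(data, sigma=1.0, prior_mean=0.0, prior_sd=10.0,
               steps=5000, step_size=0.5, seed=42):
    rng, mu, samples = random.Random(seed), 0.0, []
    for _ in range(steps):
        prop = mu + rng.gauss(0, step_size)
        delta = (log_posterior(prop, data, sigma, prior_mean, prior_sd)
                 - log_posterior(mu, data, sigma, prior_mean, prior_sd))
        if delta >= 0 or rng.random() < math.exp(delta):  # accept/reject step
            mu = prop
        samples.append(mu)
    half = len(samples) // 2
    return sum(samples[half:]) / (len(samples) - half)    # mean after burn-in
```

Here `step_size` plays the role of the tuning parameter mentioned above; Amos adjusts its equivalent automatically.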
Specifications

Modeling capabilities
- Create structural equation models with observed and latent variables
- Specify each individual candidate model as a set of equality constraints on model parameters
- Analyze data from several populations at once
- Save time by combining factor and regression models into a single model, and then fit them simultaneously

Bayesian estimation
- Fit models with ordered-categorical and censored data
- Markov chain Monte Carlo (MCMC) simulation

Computationally intensive modeling
- Evaluate parameter estimates with normal or non-normal data using powerful bootstrapping options

Analytical capabilities and statistical functions
- Determine probable values for missing or partially missing data values in a latent variable model
- Use full information maximum likelihood estimation in missing data situations for more efficient and less biased estimates
- Use a variety of estimation methods, including maximum likelihood, unweighted least squares, generalized least squares, Browne's asymptotically distribution-free criterion, and scale-free least squares
- Evaluate models using more than two dozen fit statistics, including chi-square; AIC; Bayes and Bozdogan information criteria; Browne-Cudeck (BCC); ECVI, RMSEA, and PCLOSE criteria; root mean square residual; Hoelter's critical N; and Bentler-Bonett and Tucker-Lewis indices

Data imputation
- Impute numerical values for ordered-categorical and censored data
- Impute missing values and latent variable scores
- Choose from three different methods: regression, stochastic regression, and Bayesian

System requirements
- Operating system: Microsoft Windows XP or Windows Vista
- Memory: 256MB RAM minimum
- Minimum free drive space: 125MB
- Web browser: Internet Explorer 6

- Easily perform structural equation modeling (SEM)
- Quickly create models to test hypotheses and confirm relationships among observed and latent variables
- Move beyond regression to gain additional insight

- Perform market segmentation studies
- Estimate the size of each cluster or segment
- Perform mixture regression and mixture modeling
- Perform mixture factor analysis
- Estimate the probability of group membership for individual cases
- Train the classification algorithm: assign some cases to groups ahead of time and allow the program to classify the remaining cases
- Require some model parameters to be equal across groups while allowing other parameters to vary across groups
View output
Input data from a variety of file formats (SPSS, Excel, text files, and many others). Select grouping variables and group values. Amos also accepts data in a matrix format if you've computed a correlation or covariance matrix.
Use drag-and-drop drawing tools to quickly specify your path diagram model. Click on objects in the path diagram to edit values, such as variable names and parameter values. Or simply drag variable names from the variable list to the object in the path diagram to specify variables in your model.
Specifications

Key features
- Context-sensitive help, how-tos, and tips
- Online tutorials
- Open multiple surveys or forms at once
- Status bar
- Auto-backup, auto-save

Form creation
- Drag-and-drop form design
- Toolbox with many response options:
  - Text box for long open responses
  - Option button to select one from a list of choices
  - Check box to select all that apply (multiple response)
  - List boxes to easily present a list of options
  - Drop-down lists to save space
  - Combo boxes to give space for customized responses (e.g., "Other")
  - Matrix to organize similar-style questions in a grid
- Define data file and SPSS dictionary as you build the form
- Define value labels from response text
- Define variable labels from question text
- Set missing values
- Define multiple response sets
- Enable definition of variable types
- Long variable names permitted
- Copy and paste variable properties
- Drag-and-drop variables to automatically create questions
- Question Library of over 300 sample questions and responses
- Automatic question renumbering
- Spell-checker to catch errors
- Customizable design with images
- Annotation text for headings, instructions, or comments
- Flexible formatting capabilities
- Property Inspector to control the look of every element of your form, including colors, fonts, size, and borders
- Efficient entry screens
- Produce printed surveys
- Multiple form support in one file

Data collection
- Skip-and-fill rules guide entry
- Smart navigation features
- Auto-jump automatically skips to the next field after the maximum number of characters is entered
- Go to next case, go to case
- Find and replace

With SPSS Data Entry you can:
- Quickly design paper-based surveys and desktop data entry forms
- Eliminate the need to clean and prepare data for analysis
- Work directly with SPSS analytical software

SPSS Data Entry products work together to provide an integrated system for collecting and managing survey research data. Reap the rewards of flexible, professional-looking survey design and efficient on-screen entry forms. Plus, gain powerful data collection and cleaning capabilities, and have data that's ready for immediate analysis in SPSS. All SPSS Data Entry products offer complete integration with SPSS, so you can move from data collection to analysis in a single step.
The hows and whys of survey research: getting the most value through an effective survey research process
(Excerpt from Step 1: Planning and survey design)

Write the questions
The key to a successful survey is to ensure that your questions are concise and easy to understand. This way, you will get valid and reliable information. No matter how well other features of the survey are designed and executed, poor questions will reduce the value of the data gathered. Use well-written and tested pre-existing questions as much as possible, especially from surveys done in your specific industry or topic area. You can find well-written questions in question libraries. Some software programs have question libraries built in, which can help guide you through professionally written questions. Keep in mind that no question is usable in every situation, so you have to examine the questions for your particular survey research. Pretesting questions is the best method to determine whether a question is correct for your own survey. If you are going to write questions on your own, you might consider taking a training course to learn proven methods for question writing.

Design the questionnaire
A poorly formatted survey can deter people from responding. It can also give skewed results. There are two key goals to keep in mind when designing a questionnaire: minimizing measurement error and reducing non-response. Your questionnaire should be constructed so that:
- Respondents are motivated to complete it
- The questions are all read correctly and thoroughly
- Respondents understand how to respond to each question, or how to skip it, with clear instructions throughout the document
- Returning the questionnaire is an easy and straightforward task

Specifications continued

Data entry verification
- Powerful data cleaning rules
- Validation rules, checking rules, and skip-and-fill rules
- Flexible cleaning methods
- Interactive rules checking
- Batch mode rules checking
- Compare versions verification
- Checking report by case or rules
- Rules Wizard for step-by-step rules setup
- Rules editor with scripting for advanced and customized rules
- Copy rules between surveys
- Standard or custom alerts

System requirements (SPSS Data Entry Builder only)
- Microsoft Windows 98, 2000, NT 4.0, Me, XP, or Windows Server
- Processor: 233MHz or faster Intel Pentium or Pentium-class processor
- 64MB RAM (128MB recommended)
- Hard drive space: 75MB
Begin with speedy form design (1) using preformatted questions (2) from Data Entry Builder's Question Library: choose from more than 300 questions, or create your own. (3) Enable open-ended responses up to 4,000 characters. Plus, spell-check is available in design mode.

After you design your survey (4), define the data rules (5), and SPSS Data Entry Builder ensures you collect accurate data (6).

With Data Entry Station, collect cleaner, more accurate data in less time; data entry staff and respondents are guided through relevant questions.

Once your data is collected, it's immediately ready for analysis using SPSS.
SamplePower 2.0
Save time, effort, and money by identifying the sample size you need
Specifications

General features
- Show power and precision (depends on test) with varied sample sizes, power only, or with varied effect sizes and alphas
- Find N for any power
- Show Cohen's effect size conventions for specific tests

Working with results
- Pivot tables
- New export and print options, including export to Excel
- Export graphs as WMF, EMF, or BMP, and to Word or PowerPoint

Statistical tests
Means
- One-sample and paired t tests when the mean equals zero or equals a specified value
- Precision t test for two independent groups with variance known and unknown
Proportions
- One-sample test that a proportion = 0.50 or a specific value
- 2x2 for independent samples
- 2x2 for paired samples (McNemar)
ANOVA
- One-way analysis of variance and analysis of covariance
Regression
- Templates for study design
- Error model
Logistic regression
- One continuous predictor or two continuous predictors
- One categorical predictor with two or more levels
Survival analysis
- Accrual options: subjects entered prior to first study interval, subjects entered during study at a constant rate, and accrual varies
- Hazard rate options: constant and varies
- Attrition rate options: no attrition, constant rate, and rate varies
Equivalence tests
- Equivalence tests for means and for proportions

System requirements
- Pentium-class processor; Microsoft Windows 95, 98, 2000, or Windows NT 4.0

SamplePower is the front end of a complete line of research and analysis software from SPSS. Whether you're an advanced or beginning statistician or researcher, you'll easily identify the appropriate sample size, every time, for any research criteria. Strike the right balance among confidence level, statistical power, effect size, and sample size using SamplePower.
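The balance among alpha, power, and effect size comes down to arithmetic like the following Python sketch, which uses the common normal approximation for a two-group comparison of means. SamplePower itself uses exact methods such as the noncentral t distribution; the function names and defaults here are illustrative:

```python
import math

# Back-of-the-envelope sample size per group for a two-sample comparison of
# means, via the normal approximation; a sketch, not SamplePower's method.

def z_quantile(p, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection on the erf-based CDF."""
    cdf = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z = z_quantile(1 - alpha / 2) + z_quantile(power)
    return math.ceil(2 * (z / effect_size) ** 2)
```

For a medium effect (d = 0.5) at alpha = 0.05 and 80 percent power, this gives 63 subjects per group, close to the textbook t-based value of 64.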
Simply use the Report tool and a complete report is displayed on the screen. You have the flexibility to embed the report in a word processor document and to create convincing proposals.

The Graph tool generates graphs relating power to sample size. Graphs provide a visual aid that helps determine how sample size affects power. You can even include the charts in documents.
SamplePower's interactive guide leads you smoothly through your analysis. The guide explains terms and takes you through the steps necessary to determine an effective sample size.

SamplePower's tables and graphs allow you to easily assess how different combinations of your research parameters (such as proposed sample size, alpha levels, and duration) affect your statistical power.

SamplePower's interactive summary panel gives you concise summaries of power and precision at any point, so you can see how each decision affects your results.

SamplePower's Tool menu provides Cohen's effect size conventions, which allow you to determine effect sizes for particular tests by simply clicking on an icon. Cohen's conventions give users a rule of thumb for determining otherwise ambiguous small, medium, and large effect sizes. Plug these effect sizes into the main screen to see how varying the effect size affects power or precision.

The Stored Scenarios tool gives you optimum control over the flow of your research. You can vary the alpha level, power, effect size, or sample size in the main screen and store your results as you continue. This illustration shows how the sample size varies as other settings, such as alpha, are changed.

"I cannot believe the time SamplePower saves and how easy it is to use."
Mandel Bellmore, Ph.D., President, Block, McGibony, Bellmore & Assoc., Health and Hospital Consultants

SamplePower's Find N tool finds the sample size for the default power setting in one click. You also have the flexibility to choose different power-size settings to compare results.
SPSS created SPSS Text Analysis for Surveys to help you gain full value from text responses without the drudgery and expense associated with manual coding. Specifically designed for survey text, SPSS Text Analysis for Surveys is based on our automated natural language processing (NLP) software technologies. Using this software, you can automate the creation of categories and the categorization of responses to transform unstructured survey data into quantitative data, without having to read text responses word for word. When you use SPSS Text Analysis for Surveys, you are empowered to gain greater insight from text responses using these capabilities:
- Dictionary-based text-extraction technology: This product ships with libraries and resources to automate concept extraction. You can easily customize these libraries by adding topic-specific terms to match your needs.
- Proven linguistic technologies: This product is based on SPSS natural language processing (NLP) technologies that enable you to quickly create categories and reliably categorize responses.
- Create conditional rules to enhance categorization: Use extraction results and Boolean operators to categorize responses based on more complex information and filter erroneous responses.
- Visualization capabilities aid category refinement: Use bar charts, Web graphs, and Web tables to quickly reveal which categories contain co-occurring responses. Then decide whether to combine certain categories or to create new ones that better account for shared responses.
- Flag responses to monitor coding progress: For example, mark responses as complete or important. These flags can be exported.
- Reuse and share categories: Export categories for reuse by others in new projects to save time and ensure reliability across the same or similar studies.
- Export results to SPSS or Excel: Analyze and graph results in other software for use in decision making.
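The idea of conditional rules built from Boolean operators can be illustrated with a small Python sketch. The rule format below (an AND of OR-groups over extracted terms) is invented for this example and is not SPSS's actual rule syntax:

```python
# Toy rule-based categorizer over extracted terms; the rule format here is
# invented for illustration and is not SPSS Text Analysis syntax.

def categorize(terms, rules):
    """A response matches a category when every OR-group contributes a term."""
    matched = set()
    for category, groups in rules.items():
        if all(any(t in terms for t in group) for group in groups):
            matched.add(category)
    return matched

# Hypothetical rules: each category requires one term from each group.
rules = {
    "service complaint": [{"staff", "support"}, {"slow", "rude", "unhelpful"}],
    "price praise": [{"price", "cost"}, {"great", "fair", "low"}],
}
```

A response whose extracted terms include "support" and "slow" would match "service complaint" but not "price praise".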
- Project Library: Stores dictionary changes for a particular project
- Core Library: Contains reserved Type Dictionaries for Person, Location, Product, and Organization
- Budget Library: Contains a built-in type for words or phrases that represent qualifiers and adjectives
- Opinions Library: Contains seven built-in types that group terms for qualifiers and adjectives

System requirements
- Operating system: Microsoft Windows Vista Business and Home Basic (32- and 64-bit); Windows XP Professional, Service Pack 2 (32-bit)
- Processor: Intel Pentium-class; 3GHz recommended
- Monitor: 1024 x 768 (SVGA) resolution
- Memory: 256MB RAM minimum; 512MB recommended; 1GB or more for large datasets
- Minimum free space: 300MB; more recommended for larger datasets
Refine categories
Import text responses from a variety of sources, including SPSS, Microsoft Excel, the Dimensions product family, and any ODBC-compliant database program.
Visualization capabilities enable you to quickly see which categories share responses. This can help you to refine categories manually.
Extract key concepts automatically from responses to an open-ended question. The software creates a list of terms, types, and patterns.
A Web graph showing which categories share responses enables the user to decide whether to combine certain categories or to create new ones that better account for shared responses.
Simply click the Extract button (lower-left pane) to automatically extract concepts from the text responses. Automatic color coding identifies which terms have been extracted and identifies their type. Positive items are purple; negative ones are red. The Data pane shows the full text of all responses to the question.

When you are satisfied with your categories, you can export results either as dichotomies or as categories. These can be used to create tables and graphs, either separately or in combination with other survey data.
Automatically create categories and categorize responses using term derivation, term inclusion, a semantic network, or frequency. Also, categorize responses manually by dragging terms, types, and responses within the interface.
Export results to SPSS to create crosstabs or whatever your analysis requires. Results can also be exported to SPSS to create graphs that communicate survey findings.
Click the Create Categories based on Linguistics button at upper left to automatically create categories and categorize responses.
Clementine 11.1
Harness the power of data mining
With Clementine you can:
Clementine is designed with business users in mind, so you don't need to be an expert in data mining to enjoy its benefits. And Clementine can be installed quickly on a personal computer, so you can begin mining your data right away.

Specifications

Data understanding
- Obtain a comprehensive first look at your data using Clementine's data audit node
- View data quickly through graphs, summary statistics, or an assessment of data quality
- Create graphs such as histograms, distributions, line plots, and point plots (2-D and 3-D)
- Edit your graphs to communicate results more clearly
- Interact with data by selecting a region of a graph and seeing the selected information in a table, or use the information in a later phase of your analysis
- Access SPSS statistics and reporting tools from Clementine, including reports and graphics

Data preparation
- Access structured (tabular) data from ODBC data sources; delimited and fixed-width text files; SPSS files; SAS 6, 7, 8, and 9 files; and Excel spreadsheets
- Access unstructured (free-text) data, automatically extracting concepts and links from text, using Text Mining for Clementine
- Automatically extract Web site events from Web server logs using Web Mining for Clementine
- Directly access survey data stored in the Dimensions Data Model or in the data files of Dimensions survey products
- Export data to ODBC databases, delimited text, Excel spreadsheets, SPSS files, and SAS 6, 7, 8, and 9 files

Data cleaning
- Remove or replace invalid data
- Use predictive modeling to automatically impute missing values
- Automatically generate operations for the detection and treatment of outliers and extreme values

Data transformation
- Field filtering, naming, derivation, binning (including optimized binning), re-categorization, value replacement, and field reordering
- Record selection, sampling, merging (inner joins, full outer joins, partial outer joins, and anti-joins), and concatenation; sorting, aggregation, and balancing
- Data restructuring, including transposition
- Extensive string functions: string creation, substitution, search and matching, whitespace removal, and truncation
- Preparing data for time-series analysis
- Partition data automatically into training, test, and validation subsets
- Transform multiple variables visually and automatically
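As one concrete example of the transformations listed above, equal-width binning can be sketched in a few lines of Python. Clementine's "optimized" binning is more sophisticated; the names here are illustrative:

```python
# Minimal equal-width binning sketch: map each value to one of k bins of
# equal width between the field's minimum and maximum.

def equal_width_bins(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0   # guard against a constant field
    return [min(int((v - lo) / width), k - 1) for v in values]
```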
- Solve practical problems quickly and easily
- Use business knowledge to guide the process
- Support the entire data mining process
Data mining enables you to generate new hypotheses from data. If you use SPSS statistical tools, you have the algorithms, computer power, and statistical expertise to dig deeply into a large amount of data. But data mining gives you an even broader analysis. Data mining uncovers previously unknown patterns and connections in data, enabling you to improve business processes and make the right decisions at the right time.
Access data management and transformations performed in SPSS directly from Clementine
Modeling
- Interactive model and equation browsers and advanced statistical output
- Combine models through meta-modeling
- Import PMML models from other tools such as SPSS
- Use the Clementine External Module Interface for custom algorithms

The Clementine Base module includes:
- C&RT, CHAID, and QUEST: Decision tree algorithms, including interactive tree building
- K-means clustering
- GRI: Generalized rule induction association discovery algorithm
- Factor/PCA: Data reduction using factor analysis and principal component analysis
- Linear Regression: Best-fit linear equation modeling
Analyze overall model accuracy with coincidence matrices and other automatic evaluation tools
Evaluation
- Easily evaluate models using lift, gains, profit, and response graphs
- Use a one-step process that shortens project time when evaluating multiple models
- Define business rules to measure model performance
A cell phone service provider uses Clementine's data mining to discover which customers are most likely to switch to another provider. The company learns that males age 30-40 with a particular model of handset are 50 percent likely to churn. This is a totally new insight that contradicts the company's previous conventional wisdom. To ensure that a Clementine model has truly identified something new, company analysts run a number of statistical analyses using SPSS. They measure the statistical significance of the discovered pattern and perform a survival analysis to help understand the length of customer relationships (before customers churn). These procedures give the company added confidence that it is committing its resources wisely to address the churn profile identified by Clementine. Finally, the company uses SPSS to create charts and graphs that are placed in a presentation to communicate the new insights and recommendations to decision makers.
All Clementine models can be browsed to gain insight into what has been discovered; many provide powerful, graphical, interactive browsers. As with graphs, models can be used interactively to select data or enhance it.

Clementine Server offers enterprise-level scalability
Clementine provides outstanding performance on the desktop, but sometimes you need to go beyond the desktop to analyze millions of records, or to work as part of an enterprise architecture. Clementine Server provides a client/server architecture, in-database mining, and batch processing. Organizations switch to Clementine Server from other data mining platforms that cannot match its outstanding scalability.

Clementine provides a wide range of data mining algorithms for prediction, classification, forecasting, clustering and segmentation, anomaly detection, and association discovery. These include decision trees, neural networks, rule-based profiling, Bayesian self-learning, linear and logistic regression, a wide range of statistical algorithms, and many other advanced techniques.
Clementine provides many interactive graphs and reports. Clementine provides powerful interactive model browsers.
Specications
The Clementine Classication Module includes: Binary classier: Automate the creation and evaluation of multiple models Decision list: Interactive rule-building for marketing and customer applications Self-learning response model: Bayesian model with incremental learning Time-series: Generate and automatically select time-series forecasting models C5.0 decision tree and rule set algorithm Neural networks: Multi-layer perceptrons with back-propagation learning, and radial basis function networks Binomial and multinomial logistic regression Discriminant analysis Generalized linear models The Clementine Segmentation Module includes: Kohonen Network: Clustering neural network TwoStep Clustering: Select the right number of clusters automatically Anomaly Detection: Detect unusual records through the use of a clusterbased algorithm The Clementine Association Module includes: Apriori: Popular association discovery algorithm with advanced evaluation functions CARMA: Association algorithm which supports multiple consequents Sequence: Sequential association algorithm for order-sensitive analyses
The algorithms in the Clementine base module are powerful tools for making predictions, but when you want the best possible model, you need more options. The Clementine Classication Module provides these options best-of-breed algorithms to make the best possible predictions and forecasts, and automation to help you nd the best model easily.
The Clementine Classification Module includes:
- Neural networks, a powerful predictive model inspired by biological systems
- Decision List, a rule-based predictive algorithm that can be used interactively for tasks such as generating customer profiles
- Self-learning response model, a Bayesian algorithm that can update existing models as well as create new ones
- Time series modeling, including standard methods and an automated modeler that automatically finds the best time series model for the data
Clementine's Decision List algorithm creates rule-based customer profiles under user control.
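Conceptually, a Decision List model is an ordered set of if-then rules evaluated from top to bottom, with the first matching rule determining the outcome. The following minimal Python sketch illustrates that idea; the rules, field names, and labels are hypothetical examples, not SPSS output:

```python
# Hypothetical rule list in the spirit of a Decision List model:
# ordered (condition, label) pairs; the first matching rule wins.
RULES = [
    (lambda c: c["tenure_months"] < 6 and c["complaints"] > 2, "high churn risk"),
    (lambda c: c["monthly_spend"] > 500, "likely responder"),
]
DEFAULT = "no action"

def classify(customer):
    """Return the label of the first rule this customer record satisfies."""
    for condition, label in RULES:
        if condition(customer):
            return label
    return DEFAULT
```

In the product itself, analysts build, test, and reorder such rules interactively in the Decision List browser rather than writing them by hand.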
What kinds of customers do you have, and how does their behavior differ? Can you identify the different segments in your customer base that should drive your product strategy? Are your product strategy and customer processes based on actual customer behavior, or guesswork? Can you tell when a customer, or a transaction, does not fit the pattern?
These questions are about segmentation, clustering, and anomaly detection. The Clementine Segmentation module provides powerful algorithms to help you solve problems like these. The clustering models generated with this module come with a browser, which allows you to interact with the model in order to understand the meaning of its clusters.
The cluster viewer shows the relation of clusters to data attributes.
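The module's own algorithms (Kohonen networks and TwoStep clustering) are more sophisticated, but the core idea of segmentation, grouping records so that similar ones share a cluster, can be sketched with a plain k-means loop. This is a pure-Python illustration over hypothetical (recency, spend) data, not SPSS code:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive k-means: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centers[j])))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return clusters

# Two obvious groups of hypothetical (recency, spend) records
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
segments = kmeans(points, 2)
```

Clementine's cluster browser then lets you inspect what each discovered segment means in terms of the original data attributes.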
What items, individuals, and events are found together in the data? Which products are purchased by the same customers? What Web pages are viewed in the same visit? And when events are connected, in what order do they occur? Questions of this type are answered by association detection. The Clementine Association Module features three popular, proven algorithms for this purpose.
- The Apriori algorithm is well known for its ability to efficiently find links between related items. The Clementine Association Module's Apriori implementation provides a wide range of evaluation functions so that you can select the most likely method of detecting interesting and valuable connections.
- The CARMA association detection algorithm allows you to find more complex relationships by producing rules that have multiple conclusions (or consequents) as well as multiple conditions. This simplifies the rules that are discovered, while helping you to find previously unknown relationships.
- The Sequence association algorithm detects sequential associations in your data: which items go together, and in what order. This means that you can find common sequences of events that lead to outcomes of interest.
Clementine's Association models discover links between related items.
With the Clementine Association Module, you can produce scores, enabling you to easily make predictions on the basis of the relationships you have discovered.
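The standard measures behind association rules of this kind are support (how often an itemset occurs) and confidence (how often the consequent accompanies the antecedent). A toy calculation over hypothetical market-basket data shows the arithmetic:

```python
# Hypothetical transactions; each set holds the items in one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent,
    the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

# e.g. the rule {bread} -> {milk}:
# support is 2/4 and confidence is (2/4) / (3/4)
```

Algorithms such as Apriori search the space of itemsets efficiently by pruning any candidate whose subsets already fall below a minimum support threshold.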
Specications
Linguistic extraction
- Extract text data from files, spreadsheets, databases, or RSS feeds
- Manage errors in punctuation and spelling
- List extracted concepts by type, frequency, document coverage, and other user-defined classifications
- Calculate and highlight synonyms using sophisticated linguistic algorithms and embedded or user-specified linguistic resources
- Organize concepts by person, organization, term, product, location, and other user-defined types
- Extract non-linguistic entities such as address, currency, time, phone number, and Social Security number; templates for non-linguistic entities are available

Text mining modeling node
- Create clusters based on term co-occurrence using concept clustering algorithms
- Intelligently group text documents and records based on content, using text classification algorithms
- Aggregate concepts from a wide variety of unstructured text data and group them into a small number of categories
- Save and re-use categories; score any new text documents or data based on the text they contain
- Accelerate and improve data management
- Convert selected concepts to structured form for use in Clementine predictive modeling algorithms

Text link analysis
- Identify and extract sentiments (for example, likes and dislikes) from text
- Identify links and associations between, for example, people and events, or diseases and genes
- Include opinions, semantic relationships, and linked events in deployable predictive models
- Reveal complex relationships through interactive graphs that show multiple semantic links between concepts

Resource Editor
- Create and edit custom libraries in the Text Mining for Clementine interface
- Define and edit domain-specific terms, non-linguistic entities, synonyms, and concept libraries
- Edit the CRM, opinion, competitive intelligence, security intelligence, and genomics dictionaries provided with the software
Text Mining for Clementine provides powerful analysis of text data. Customers' own words are analyzed and categorized so that positive and negative feelings about issues are automatically identified and can be used for further modeling.
Did you know that text makes up about 80 percent of your organization's data? The customer e-mails, call center notes, open-ended survey responses, Web forms, and other text sources that your organization captures, including content from RSS feeds such as blogs and news feeds, contain roughly four times as much valuable data as the data you store in databases and other structured formats. Text Mining for Clementine enables you to combine this valuable unstructured data with traditional structured data to significantly increase your understanding of customers, the public, and other groups. This product transforms Clementine into a fully integrated data and text mining workbench. You can perform both text mining and data mining within the interactive, visual Clementine environment. Using a proven natural language processing (NLP) linguistic extraction process, Text Mining for Clementine pulls key concepts from unstructured data and groups them into categories. In addition, Text Mining for Clementine identifies and extracts sentiments, such as preferences and opinions, all of which helps you to create more in-depth predictive models and obtain more accurate results.
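The product's extraction relies on full linguistic resources (dictionaries, synonym handling, entity templates), but the gist of turning free text into countable concepts can be sketched with a crude term-frequency pass. The feedback strings and stopword list below are hypothetical, and this is not the product's NLP engine:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real linguistic resources are far richer.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it"}

def extract_concepts(texts):
    """Count non-stopword terms across a collection of text records."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts

feedback = ["The battery is great", "Battery life was poor", "Great screen"]
concepts = extract_concepts(feedback)
```

Each extracted concept can then join the structured fields of a record as an input to downstream predictive models.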
Text Mining for Clementine can be used in nearly any business or research situation that involves unstructured data. For example, by analyzing customer communication records for problems or complaints that typically precede churn, you can take steps to prevent it. Other applications include streamlining product development and refinement, including drug discovery; improving cross-selling and up-selling; making marketing campaigns more cost effective; finding patterns of potentially suspicious behavior; and much more. Text Mining for Clementine can process Dutch, English, French, German, Italian, Portuguese, or Spanish text. By using the Language Weaver option, it can also process text translated into English from 14 languages, including Arabic, Chinese, Persian, and Russian.
Traditional Web analytics often fail to provide the insight decision makers need to successfully manage their online business: they simply count and report Web usage. While an important analytical foundation, these simple statistics offer limited support for answering critical online business questions. Web Mining for Clementine makes it easy for analysts to perform ad hoc, predictive Web analysis within Clementine's intuitive visual workflow interface. The Clementine data mining workbench enables analysts to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Unlike traditional Web analytics methods, Web Mining for Clementine delivers more meaningful customer intelligence on both the current and the future state of your online business.
By bringing together the leading technologies for both Web analytics and data mining, Web Mining for Clementine sets a new standard for Web analysis. For example, you can easily transform raw Web data into data on key business events, such as conversion and purchase, using proven technology. You can also:
- Automatically discover user segments: Discover user groupings based on actual online behavior using automated segmentation, and provide your organization with a clearer, more accurate understanding of online prospects and customers.
- Detect the most significant sequences: Sequence detection techniques automatically identify which pages are critical to specific online business goals (such as improving search engine effectiveness), highlighting important chains of activity.
- Understand product and content affinities: Long used by retailers to understand product associations in the mind of the consumer, affinity or market basket analysis identifies online customer preferences to improve cross-selling and optimize content.
- Predict user propensity to convert, buy, or churn: One of the most powerful analytical best practices that data mining brings to Web analytics is propensity modeling: the ability to predict the likelihood that an individual user will convert, purchase, or churn.
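Propensity modeling of this kind typically boils down to estimating a conversion probability from behavioral features. A minimal logistic-regression sketch in pure Python shows the shape of such a model; the features and session data are hypothetical, and Web Mining for Clementine uses its own modeling nodes rather than this code:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of log loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def propensity(w, b, x):
    """Predicted probability that this visitor converts."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sessions: (pages_viewed, past_purchases) -> converted?
X = [[1, 0], [2, 0], [3, 0], [7, 1], [8, 1], [9, 2]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
```

Scoring each visitor's propensity lets you target offers or retention actions at the users most likely to convert or churn.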
Specications
SPSS Predictive Enterprise Repository (Server)
The minimum hardware and software requirements are:
- Operating system: Microsoft Windows 2003 or Windows 2000; Sun Solaris 9
- Hardware:
  - Processor: Intel Pentium-compatible processor, 1.8GHz or faster; or UltraSPARC, 1.2GHz or faster
  - Memory: 1GB RAM or more
  - Minimum free disk space: 1GB or more
  - A CD-ROM drive is required for installation
- Software:
  - Database: Microsoft SQL Server 2000 or 2005; Oracle 9i or 10g
  - Application server: JBoss 4.0.3; BEA WebLogic 9.1; IBM WebSphere 6.1
- If using with Clementine, version 11.1 is required
- If using with SPSS, version 14.0 or higher is required
- If using with ShowCase Suite, version 8.0 or higher is required

SPSS Predictive Enterprise Manager (Administrative client)
The minimum hardware and software requirements are:
- Operating system: Windows Vista, Windows XP, or Windows 2000
- Hardware:
  - Processor: Pentium processor, 1.8GHz or faster
  - Memory: 512MB RAM or more
  - Minimum free drive space: 100MB or more
  - A network adapter running the TCP/IP protocol
  - A CD-ROM drive is required for installation
- Software:
  - If using with Clementine, version 11.1 is required
  - If using with SPSS, version 14.0 or higher is required
  - If using with ShowCase Suite, version 8.0 or higher is required
Access models and processes used by SPSS statistical tools and the Clementine data mining workbench, as well as queries and reports created with ShowCase 8.0. You can link jobs, so that an analysis conducted with SPSS triggers a model refresh in Clementine, for example. Because jobs can run unattended, you are able to optimize your technological and infrastructure resources and gain unparalleled efficiency.
SPSS Predictive Enterprise Services is an essential foundation for data-driven organizations. Its powerful content management and process automation capabilities are designed specifically for handling analytical content, enabling you to use predictive analytics to make better decisions, meet organizational goals, and improve business outcomes.
Store, access, and retrieve all of your analytical assets in the centralized repository, and easily automate analytical jobs that comprise multiple steps and analytical tools.
Using SPSS Server in conjunction with SPSS Predictive Enterprise Services, you can embed analytics into your organizations core business processes and use your data to guide everything from daily operations to long-range planning.
- Find out about the latest product releases
- Use the Tech Tips and Trainer Tips to get the most out of your software
- Read customer stories from various industries around the world
- Refresh your skills with training classes
- Register for upcoming seminars and conferences
- And more!
Improve data collection and analysis and drive your organization's success
Dimensions can support all of your data collection projects, including:
- Customer and employee satisfaction surveys
- Community-based studies of attitudes, behavior, or public program utilization
- Alumni research and development
- Customer profiling
What if you could collect more accurate survey research data? And use it to make better business decisions? You'd be able to streamline the survey process. Speed data analysis. Achieve more meaningful results. And drive your organization's success.

Dimensions 4.5
A complete survey research platform
Dimensions from SPSS helps you use survey data to greater competitive advantage. It's a complete platform that supports the entire survey research process, from design and data collection to analysis and reporting. Easily create questionnaires and field them through any channel: Web, phone, paper, or PDA. Author surveys in any language. Share results with anyone, anywhere, online. Create interactive reports from your desktop. With Dimensions, you control every aspect of the research process. And you can easily use SPSS to perform advanced analysis of your data. As a result, you can gain deeper insight from your data and make more customer-focused decisions.
"Our goal is to ensure our internal people spend less time manipulating the data into a form they can use and more time actually interpreting and using the results. Dimensions allows us to give them the tools to achieve that goal; there hasn't been anything that we wanted to do that it has not done."
David Zotter
Chief Technology Officer
NOP Gfk
mrPaper and mrScan
- Quickly develop and format any type of paper questionnaire, easily maintaining your company's design standards and preferred formatting.
- Automate the preparation of paper surveys for scanning, saving your staff time and making it cost effective to scan even small or occasional surveys.

mrTables
- Create graphs and tables using data obtained through mrInterview or other data collection or analysis software.
- Share reports with anyone, anywhere, through a secure portal.
- Control the content and level of interactivity available to each viewer through profile-based permissions.

mrTranslate
- Efficiently manage multilingual survey projects, creating translated versions of a core questionnaire, including instructions.
- Field each survey in your preferred interviewing mode, with data captured centrally for immediate analysis.
- Easily provide results in more than one language.

Desktop Author
- Save time by creating appealing surveys incorporating any type of question from a simple, intuitive interface.
- You don't need to know scripting or programming; it's as easy as creating a slide deck. (Available Q3 2007)
Dimensions is the perfect platform to help your organization:
- Streamline the survey process
- Improve the accuracy of the data you collect
- Conduct highly targeted research that reaches the right people via the right channel
- Collect cleaner data that's ready for immediate analysis
- Analyze your data using any modeling technique or application, including SPSS and Clementine
- Deliver findings faster via reports or the Web
Desktop Reporter
Generate interactive reports from your survey research quickly and easily, from your desktop. Filter variables, create new ones based on multiple response variables, and export results to Microsoft Office applications.
All Dimensions products are designed to work with each other for seamless survey research. Call 1.800.543.5815 to speak with an SPSS representative, or visit www.spss.com/Dimensions for more information.
Dimensions ASP
"For anyone looking for more control over survey programming, monitoring results, and/or managing samples, SPSS ASP survey software is a great application. Their client service option allows you the added flexibility of doing as little or as much as you want on each and every project."

Tracy Trawick
Manager, Consumer Insights
Hamilton Beach, Inc.
SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2007 SPSS Inc. All rights reserved. SCATV11-0807