Beruflich Dokumente
Kultur Dokumente
IBM SPSS Advanced Statistics IBM SPSS Direct Marketing IBM SPSS Amos IBM SPSS Modeler 14.1
www.ibm.com/spss/uk
Business Analytics
Dear valued customer: Welcome to our 2011 catalogue, Decisions. This is an exciting time to be in analytics because analytics is increasingly seen as the key to a smarter planet. Smarter schools. Smarter businesses. Smarter healthcare. Smarter government. In the following pages, youll find a wealth of information about the products that support effective, reliable analytics and about how some of your peers have used these products to drive improved outcomes and achieve outstanding results. For instance, check out how Avis cut the cost of email marketing as a percentage of revenue by almost half; and how Zorg en Zekerheid, an independent health insurer in the Netherlands, uses predictive analytics to detect and prevent fraud. We continue to add to both the technical sophistication of our products and their ease of use. Weve made our core IBM SPSS Statistics products even easier to use, with syntax enhancements and Automatic Linear Models IBM SPSS Statistics Server offers greater speed and performance Weve improved our highly popular IBM SPSS Direct Marketing product by adding new scoring capabilities And IBM SPSS Modeler continues to lead the field in data mining, and now offers direct integration with Cognos Business Intelligence software Perhaps, in the coming year, youll use our software to bring greater focus to your colleges recruitment efforts. Or perhaps youll manage the product assortment at multiple store locations more effectively. Or maybe uncover surprising insights in your customers blog posts that will lead to a product breakthrough. Whatever your needs, were committed to continuing to offer you a full range of innovative software to help you reach your goals. And we look forward to building a lasting relationship that will help us all build a smarter planet. Sincerely,
table of contents
Introduction to predictive analytics
The Data Analysis Path to Success 4 The SPSS end-to-end story Get More Value with Every Release 6 See what youll gain when you upgrade from an earlier version of IBM SPSS Statistics IBM SPSS Statistics 19 8 Analyse data with the worlds leading statistical software IBM SPSS Forecasting 19 24 Build expert time-series forecasts in a flash IBM SPSS Categories 19 36 Predict outcomes and reveal relationships through perceptual maps of categorical data IBM SPSS Conjoint 19 38 Discover what drives your customers purchase decisions IBM SPSS Data Collection Data Entry 6 40 A faster, more effective way to collect and manage survey research data
Special-purpose products
IBM SPSS Bootstrapping 1926 Ensure the stability of your models IBM SPSS Complex Samples 19 28 Correctly and easily compute statistics for complex samples IBM SPSS Statistics Server 19 30 Analyse big data and data in dispersed organisations IBM SPSS Direct Marketing 19 31 Easily identify the right contacts and improve campaigns IBM Developer 19 32 Complete flexibility to build customised functionality and procedures in R or Python your way IBM SPSS Neural Networks 1933 Discover complex relationships in your data more easily IBM SPSS Visualization Designer 134 Easily create and share customised visualisations IBM SPSS Exact Tests 19 35 Reach accurate conclusions with small samples or rare occurrences SPSS
Complementary products
IBM SPSS Text Analytics for Surveys 4 42 Easily make your survey text responses usable in quantitative analysis IBM SPSS SamplePower 3 44 Save time, effort and money by finding the appropriate sample size for your study IBM SPSS Amos 19 46 Get your research noticed take your analysis to the next level
Data Access
IBM SPSS Statistics Base Stat a All Modules
Deployment
IBM SPSS Collaboration and Deployment Services
Reporting
IBM SPSS Statistics Base IBM SPSS Custom Tables IBM SPSS Visualization Designer IBM SPSS Data Collection
Data Analysis A
IBM SPSS Statistics Base IBM SPSS Amos IBM SPSS Advanced Statistics IBM SPSS Bootstrapping IBM SPSS Categories IBM SPSS Direct Marketing IBM SPSS Exact Tests IBM SPSS Forecasting IBM SPSS Neural Networks IBM SPSS Regression
* All products named here were formerly part of the Predictive Analytics Software (PASW) portfolio. Now that SPSS is an IBM company, they have been re-named to re ect IBM branding.
5. Data analysis
In this step, the data is examined, tested, explored and transformed. Patterns are identified, hypotheses are tested and information is extracted. Products for understanding data: IBM SPSS Statistics Base IBM SPSS Data Preparation IBM SPSS Complex Samples IBM SPSS Modeler Products for predicting numerical outcomes: IBM SPSS Statistics Base IBM SPSS Regression IBM SPSS Advanced Statistics IBM SPSS Complex Samples IBM SPSS Neural Networks IBM SPSS Amos IBM SPSS Modeler Products for identifying groups: IBM SPSS Statistics Base IBM SPSS Direct Marketing IBM SPSS Regression IBM SPSS Advanced Statistics IBM SPSS Complex Samples IBM SPSS Custom Tables IBM SPSS Categories IBM SPSS Decision Trees IBM SPSS Exact Tests IBM SPSS Neural Networks IBM SPSS Modeler
6. Reporting
Data is summarised, put in tables and charts and ready for consumption. Products: IBM SPSS Statistics Base IBM SPSS Custom Tables IBM SPSS Visualization Designer IBM SPSS Data Collection
2. Data collection
Data is collected through surveys, online activity, call centers and more. Products: IBM SPSS Data Collection Web Interviews IBM SPSS Data Collection Data Entry
7. Deployment
Data, reports and procedures are distributed to end users globally, with interaction and access managed centrally. Products: IBM SPSS Collaboration and Deployment Services
3. Data access
Data is brought in from available sources, using ODBC or direct file input. Products: IBM SPSS Statistics Base All modules
8. Success!
Take a moment to celebrate youve done it! But now its back to planning how to maintain your competitive advantage once again using IBM SPSS products to help you reach the next level of success.
Easily access, manage and analyse any kind of dataset Gain reliable results with a broad range of tests and procedures Report results in easy-to-understand formats
preparation to analysis, reporting and deployment. It enables you to get a quick look at your data, formulate hypotheses for additional testing and then carry out a number of statistical and analytic procedures to help clarify relationships between variables, create clusters, identify trends and make predictions.
In addition, Visual Binning allows you to easily create bands (such as breaking income into bins of 10,000 or ages into demographic groups). A histogram then enables you to interactively create cutpoints and automatically create data value labels for them.
Whether you are a beginner or an experienced analyst or statistician, IBM SPSS Statistics puts the power of advanced statistical analysis in your hands.
specifications
System requirements
For IBM SPSS Statistics 19 for Windows perating system: Microsoft O Windows XP (Professional, 32-bit) or Vista (Home, Business, 32- or 64-bit), Windows 7 (32- or 64-bit) Hardware Intel or AMD x86 processor running at 1 GHz or higher Memory: 1 GB or more recommended Minimum free drive space: 800 MB* DVD drive Super VGA (800x600) or higherresolution monitor For connecting with an IBM SPSS Statistics Server, a network adapter running the TCP/IP network protocol Browser Internet Explorer 7 or 8 For IBM SPSS Statistics 19 for MAC OS X Operating system Apple Mac OS 105 (Leopard) or 106 (Snow Leopard) (32-bit or 64-bit versions) Hardware Intel processor Memory: 1 GB or more recommended Minimum free drive space: 800 MB* DVD drive Super VGA (800x600) or a higher-resolution monitor Browser Web browser: Mozilla Firefox 2x and 3x IBM SPSS Statistics 19 for Linux Operating system: Any Linux OS that meets the following requirements (32 bit only): Kernel 263 glibc 28 libstdc++5 XFree86-47 Hardware Processor: Intel or AMD x86 processor running at 1 GHz or higher Memory: 1 GB or more recommended* Minimum free drive space: 800 MB DVD drive Super VGA (800x600) or a higherresolution monitor Browser Mozilla Firefox 2X and 3X * Installing Help in all languages requires 1.1 GB free drive space
Available on the following platforms: Windows n Mac n Linux
Driving to greater customer insight Avis cuts email marketing costs with predictive analytics
Avis Europe is a leading car rental company with a network of more than 2,800 locations. Approximately 86 percent of Avis Europes revenues in 2008 were generated in France, Germany, Italy, Spain and the United Kingdom. Brand leadership, service differentiation and cost effectiveness are part of Avis Europes strategic focus and We Try Harder philosophy. For the Avis customer, this translates into quick, professional services, a high quality vehicle at a reasonable price and targeted communications. Avis Europe turned to IBM SPSS software to help create targeted and cost-effective email campaigns and build customer retention through timely and relevant contact. The car rental group selected the IBM SPSS Modeler data mining workbench to develop customer profiles and segment its data more accurately. As a result of using IBM SPSS predictive analytics software, the cost of email marketing as a percentage of revenue was cut almost by half in 2009 compared to 2008. According to Chris Parker, direct analytics specialist at Avis Europe, We are now better at sending the right emails to the right people at the right time. This new targeted approach helps Avis cut email marketing costs and, hence, maximise revenue. The Customer Segmentation project allows us to keep in touch with our large database, but with all the benefits of a one-to-one relationship, he adds. The ability to identify and stay ahead of customers ever-changing activities and needs is one of the biggest benefits provided by IBM SPSS predictive analytics software.
We offer a variety of training options and courses customised to suit your needs and quickly get you up and running with IBM SPSS software and solutions. Our instructors and consultants use an interactive, engaging hands-on approach in a small group environment. Such focused instruction can dramatically increase your performance Click here for more information.
Quickly access and manage data Get a first look through descriptive statistics Perform fundamental tests to classify data, uncover relationships and make predictions
You can take the analytical process from start to finish with IBM SPSS Statistics Base. In addition to the data preparation, data management, output management and charting features now available in all IBM SPSS Statistics modules, IBM SPSS Statistics Base offers the most frequently used procedures for statistical analysis the fundamental toolset that every analyst should have.
These procedures will enable you to get a quick look at your data, formulate hypotheses for additional testing and then carry out a number of procedures to help clarify relationships between variables, create clusters, identify trends and make predictions.
8.59% 16.02%
19.53%
See and graph the characteristics of your dataset quickly and efficiently with IBM SPSS Statistics Base.
IBM SPSS Statistics Base also includes a number of procedures to help you identify groups and predict outcomes. These procedures include: Factor analysis K-means Cluster Analysis Hierarchical Cluster Analysis TwoStep Cluster Analysis Discriminant analysis Linear regression
The Data Editor makes it easy to manage data from IBM SPSS Statistics and IBM SPSS Data Collection files, as well as text files and data from other applications and databases.
Even if you use one or more of the other modules in the IBM SPSS Statistics family for specific kinds of analysis, IBM SPSS Statistics Base will continue to form the basis of many deployments, since it contains statistical tests and procedures that are fundamental to many analyses.
specifications
Key features Descriptive statistics
Crosstabulations Frequencies; Descriptives; Explore; Descriptive ratio statistics TwoStep Cluster Analysis Discriminant Linear Regression Ordinal regression PLUM Multithreaded algorithms: SORT, correlation, partial correlation, linear regression, factor analysis Nearest Neighbor analysis, which can be used for prediction or for classification Non-parametric tests provide multiple comparisons and perform efficiently on large datasets Data Editor onfigure attributes, so that some C can be hidden Spell check value labels and variable labels Sort by variable name, type, format, etc Use Find and Replace functionality Easily eliminate duplicate records with the Identify Duplicate Cases tool Make sense and keep track of your data files by adding notes to them with the Data File Comments command Create read-only datasets More accurately describe your data using longer variable names (up to 64 bytes) Create value labels up to 120 characters Clone or duplicate datasets Apply an extended Variable Properties command to customise properties for individual users Longer text stings (up to 32,000 bytes) Define Variables Properties tool Copy Data Properties tool Data Restructure Wizard
Continued on PG. 11
Bivariate statistics
Means; t tests; ANOVA; Correlation (Bivariate, Partial, Distances); and Non-parametric tests
10
Automatic Linear Models (ALM) is a great procedure for handling modeling of data - this is something we can add to our list of tools we can now offer our clients. While ALM is the feature I liked best, the faster table rendering will likely have the biggest impact on our business given the large amount of IBM SPSS Statistics output we produce, most of which uses IBM SPSS Custom Tables functionality.
Brian Robertson, PhD Director of Research, Market Decisions
Aggregate data to external or to the active data file Automatically convert string variables to numeric with Autorecode Spell-checking of long text strings Date and Time Wizard: Easily work with data containing time and dates in IBM SPSS Statistics Create a time/date variable from a string containing a date variable Create a time/date variable from variables that include individual date units, such as month or year Calculate times and dates Separate a date unit from a time/ date variable Apply splitters in the Data Editor for easier viewing of wide or long data files Create your own dictionary information for variables by using Custom Attributes Customise the viewing of extremely wide files with Variable Sets Use syntax to change string length and basic data type Set a permanent default working directory
Use programming structures, such as do repeat-end repeat, loop-end loop and vectors Compute variables using arithmetic, cross-case, date and time, logical, missing-value, random-number, statistical or string functions Create variables that contain the values of existing variables from preceding or subsequent cases Count occurrences of values across variables Make transformations permanent or temporary Execute transformations immediately, batched or on demand
Reporting
Reports OLAP cubes Case summaries Report summaries
Graphs
Categorical charts 3-D Bar: Simple, cluster and stacked Bar: Simple, cluster, stacked, dropped shadow and 3-D Line: Simple, multiple and drop-line Area: Simple and stacked Pie: Simple, exploding and 3-D effect High-low: High-low-close, difference area and range bar Box plot: Simple and clustered Error bar: Simple and clustered Error bars: Add to bar, line and area charts; and confidence level, SD or SE
Transformations
Easily find and replace text strings in your data using the find/replace function Recode string or numeric value Recode values into consecutive integers Create conditional transformations using DO IF, ELSE IF, ELSE and END IF statements
Dual-Y axes and overlay Scatterplots Simple, grouped, scatterplot matrix and 3-D Fit lines: Linear, quadratic, or cubic regression; Lowess smoother; confidence interval control; and for total or subgroups, display spikes to line Bin points by color or marker size to prevent overlap Density charts Population pyramids: Mirrored axis to compare distributions; with or without normal curve Dot charts: Stacked dots show distribution; symmetric, stacked and linear Histograms: With or without normal curve; custom binning options Quality control charts Pareto, X-Bar, range, Sigma, individual chart or moving range chart Rule-checking performed on primary and secondary charts utomatic flagging of points that A violate Shewhart rules, the ability to turn off rules and the ability to suppress charts Diagnostic and exploratory charts Caseplots and time-series plots Probability plots Autocorrelation and partial autocorrelation function plots Cross-correlation function plots Receiver-Operating Characteristics Multiple use charts 2-D line charts (with 2 scale axes) Charts for multiple response sets Custom charts
Graphics Production Language (GPL), a custom chart creation language, enables advanced users to attain a broader range of chart and option possibilities than the interface supports to create mixed charts and more Layout options Paneled charts: Create a table of subcharts, one panel per level or condition; multiple row and columns 3-D effects: Rotate, modify depth and display backplanes Chart templates Save selected characteristics of a chart and apply them to others automatically Apply the following attributes at creation or edit time: Layout, titles, footnotes and annotations; chart element styles; data element styles; axis scale range; axis scale settings; fit and reference lines; and scatterplot point binning ree-view layout and finer control T of template bundles
System requirements
Please see p 10 for complete system requirements
Available on the following platforms: Windows n Mac n Linux
11
Streamline the data preparation process Eliminate labor-intensive manual checks Reach more accurate conclusions
IBM SPSS Data Preparation enables you to easily identify suspicious and invalid cases, variables and data values; view patterns of missing data; and summarise variable distributions. You can streamline the data preparation process so that you can get ready for analysis faster and reach more accurate conclusions.
Bin or set cutpoints for scale variables With the Optimal Binning procedure, you can more accurately use algorithms designed for nominal attributes (such as Nave Bayes and logit models). Optimal Binning enables you to bin or set cutpoints for scale variables. Select from three types of optimal binning for preprocessing data prior to model building: Unsupervised: Create bins with equal counts Supervised: Take the target variable into account to determine cutpoints. This method is more accurate than unsupervised; however, it is also more computationally intensive. Hybrid approach: Combines the unsupervised and supervised approaches. This method is particularly useful if you have a large amount of distinct values.
specifications
Key features
Automated Data Preparation Recommends steps to speed up model building and improve predictive power Determine objective, prepare dates and times for modeling, exclude low-quality input fields, prepare fields to improve data quality, rescale fields, continuous input and target fields, transform fields, perform feature selection and construction, name fields and apply transformations to data Minimum standard deviation Flag incomplete IDs, duplicate IDs and empty cases Standard rules: Describe the data, view single variable rules and apply them to analysis variables Description of data Distribution: Shows a thumbnailsize bar chart for categorical variables or histogram for scale variables Min/max data values shown Single-variable rules: Apply rules to identify missing or invalid values User-defined Custom rules: Define cross-variable rule expressions in which respondents answers violate logic Output: Reports for invalid data Casewise report, specify by case Specify the minimum number of violations needed for a case Specify the maximum number of cases in the report Standard validation rules reports Summarise violations by analysis variable and rule Display descriptive statistics Save: Save variables that record rule violations and use them to clean data and filter out bad cases Summary variables: Empty case indicator Duplicate ID indicator Incomplete ID indicator Validation rule violation Indicator variables that record all validation rule violations
Validate data
Validate data in the working data file Basic checks: Maximum percent of missing values, single category cases and cases with a count of 1 Minimum coefficient of variation
12
Identify suspicious or invalid cases, variables and data values easily with IBM SPSS Data Preparation
Variables tab: The Validate Data dialog is used to validate your data. The Variables tab shows variables in your file. Start by selecting the variables you are interested in and moving them to the Analysis Variables list.
Basic checks: You can specify basic checks to apply to variables and cases in your file. For example, you can obtain reports that identify variables with a high percentage of missing values or empty cases.
Standard rules: Apply rules to individual variables that identify invalid values, such as values outside a valid range or missing values.
Define standard rules: The Validate Data dialog lets you create your own rules or apply predefined rules.
Define custom rules: Create cross-variable rules in which respondents answers violate logic (such as pregnant males).
Recommendations: Automated Data Preparation delivers recommendations and allows users to drill in and examine the recommendations.
Percentage and number of cases considered as anomalies and included in the anomaly list Cutpoint of the anomaly index to determine whether a case is considered as an anomaly Save additional variables to the working data file including: Anomaly index Peer group ID, size and size in percentage Variable, variable impact measure, variable value and norm value associated with a reason OUTFILE subcommand: Write a model to a filename as XML PRINT subcommand prints: Case-processing summary Anomaly index list, anomaly peer ID list and anomaly reason list
The Continuous Variable Norms table, for continuous variable, and the Categorical Variable Norms, for categorical variable Anomaly Index Summary Reason Summary Table
Optimal Binning
Preprocess data with Optimal Binning Categorises one or more continuous variables by distributing the values of each into bins Select from the following methods: Unsupervised binning via the equal frequency algorithm It uses the equal frequency algorithm to discretise the binning input variables Guide variable not required
Supervised binning via the MDLP (Minimal Description Length Principle) algorithm Discretises binning input variables using the MDLP algorithm without any preprocessing Ideal for small datasets Guide variable required Hybrid MDLP binning Involves preprocessing via the equal frequency algorithm, followed by the MDLP algorithm Ideal for large datasets Guide variable required Specify the following criteria: ow to define the minimum and H maximum cutpoint for each binning input variable, and the lower limit of an interval Whether to force-merge sparsely populated bins
Whether missing values uses listwise or pairwise deletion Save new variables with binned values and syntax to a IBM SPSS Statistics syntax file PRINT subcommand prints: The binning input variables cutpoint sets Descriptive information for all binning input variables Model entropy for binned variables
System requirements
Requirements vary according to platform
13
Create classification trees using your choice of algorithms Identify patterns, segments and groups in your data Choose from four established tree-growing algorithms
IBM SPSS Decision Trees creates classification and decision trees to identify groups, discover relationships and predict future events. By creating visual trees, you are able to present results in an intuitive manner so you can more clearly explain results to non-technical audiences.
Why IBM SPSS Decision Trees should be added to your desktop: Identify groups, segments and patterns in a highly visual manner with classification trees Choose from CHAID, Exhaustive CHAID, C&RT and QUEST to find the best fit for your data
Present results in an intuitive manner perfect for non-technical audiences Save information from trees as new variables in data (information such as terminal node number, predicted value and predicted probabilities)
Greater Manchester Police make contact Community-based policing addresses signal crime
Formed in 1974, Greater Manchester Police (GMP) has a workforce of more than 7,000 police officers and 3,500 support staff. Serving some 2.5 million people covering an area of 500 square miles and 10 metropolitan boroughs, it is one of the largest police forces in the United Kingdom. Public consultation plans, which report on community concerns, safety, crime and disorder, are a statutory function of GMP. As part of the National Reassurance Project, GMP sent a survey to 9,000 households to establish a community-based policing program. The concept focused on identification of signal crime. Before implementing an IBM SPSS solution, GMP had been using a paper system and Microsoft Excel. This method proved unfruitful due to Excels limited analysis capabilities. Keith Bentley, chief superintendent (retired), said, We wanted a solution that would not only find answers to questions that basic packages would miss, but also make data entry faster and more reliable. GMP decided to adopt an IBM SPSS solution to gather public views on policing that could be incorporated into the National Reassurance Project. The questionnaires were seamlessly created within the system, and the responses were scanned in electronically. The solution has enabled GMP to reduce the cost and time spent on survey research and analysis. Not only did we benefit from a massive reduction in time for this project, saving approximately 20,000 in two weeks, but these results are now being referenced by other Greater Manchester Police divisions, said Bentley.
14
Use the highly visual trees to discover relationships that are currently hidden in your data. IBM SPSS Decision Trees diagrams, tables and graphs are easy to interpret.
specifications
Key features
Create tree-based classification models for: Segmentation Stratification Prediction Data reduction and variable screening Interaction identification Category merging and discretising continuous variables Classify cases into groups or predict values of a dependent (target) variable based on values of independent (predictor) variables Validation tools for exploratory and confirmatory classification analysis View nodes using one of several ways: Show bar charts of your target variables, tables or both in each node Collapse and expand branches without deleting the model Generate syntax automatically from the UI Re-run tree building using syntax in production mode Score data based on results or use results in further analysis using other IBM SPSS Statistics procedures Exhaustive CHAID by Biggs, de Ville and Suen (1991) lassification & Regression Trees C (C&RT) by Breiman, Friedman, Olshen and Stone (1984) QUEST by Loh and Shih (1997)
Create tree models in IBM SPSS Decision Trees using CHAID, Exhaustive CHAID, C&RT or QUEST.
Evaluation
Evaluation graphs enable visual representation of gains summary tables Misclassification functionality Gains chart: Identify segments by highest (and lowest) contribution
Deployment
Export output objects to any of IBM SPSS Statistics available output formats Generate rules that define selected segments in SQL to score databases or IBM SPSS Statistics syntax to score IBM SPSS Statistics files Export XML models to score new cases
Use tree model results to score cases directly in IBM SPSS Statistics Base.
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
Algorithms
Four powerful tree-modeling algorithms: CHAID by Kass (1980)
Directly select cases or assign predictions in IBM SPSS Statistics Base from the model results, or export rules for later use.
15
Quickly create tables with a drag-and-drop interface Preview tables as you create them to get it right the first time Customise tables to make them easier for your audience to understand
IBM SPSS Custom Tables enables you to display your analyses as presentation-quality, production-ready tables. It gives you all the tools you need to easily create and work with tabular reports. One of the most popular modules in the IBM SPSS Statistics family, IBM SPSS Custom Tables delivers many valuable capabilities: Create new fields directly in output tables to perform calculations (e.g., sum, difference, percentage difference) on output categories See the results of significance tests directly in output tables
Preview tables as you build them: Take out the guesswork by previewing your table as you select variables and table options with a simple drag-anddrop interface Control your table output: Choose from a variety of formats to represent multi-way information in a two-way table and generate the view you want Customise your table structure: Exclude specific categories, display missing value cells and add subtotals to your table Use in-depth analyses: Run Chisquare, column proportions and column means tests, and add more insight to your tables by identifying differences, changes or trends in your data
Automate frequent reports: Run large production jobs and complex table structures with ease to automatically build similar tables with new data
specifications
Key features
Graphical user interface Simple drag-and-drop table builder interface allows you to preview tables as you select variables and options Single, unified table builder instead of multiple menu choices and dialog boxes for different table types Control contents Create tables with up to three display dimensions: Rows (stub), columns (banner) and layers Nest variables to any level in all dimensions Crosstabulate multiple independent variables in the same table Display frequencies for multiple variables side-by-side with tables of frequencies Display all categories when multiple variables are included in a table even if a variable has a category without responses Display multiple statistics in rows, columns or layers Place totals in any row, column or layer Create subtotals for subsets of categories of a categorical variable Perform calculations (eg, sum, difference, percentage difference) on output categories, and display the results in new fields created directly in Custom Tables output No limit on the number of c alculated fields Show significance test results directly in Custom Tables output instead of in a separate table No need to combine findings in a Word document Complies with APA guidelines Control category display order and the ability to selectively show or hide categories Statistics Select from over 40 summary statistics Calculate statistics for each cell, subgroup or table Calculate percentages at any or all levels for nested variables Calculate counts and percentages for multiple-response variables based on the number of responses or the number of cases Select percentage bases for missing values to include or exclude missing responses Formatting controls Sort categories by any summary statistic in the table
Continued on PG. 17
16
Once all your variables are in place, push the OK button to create your final table. Apply the optional TableLooks for a more polished appearance.
Add additional variables by dragging and placing them where you want.
Hide the categories that make up subtotals remove a category from the table without removing it from the subtotal calculation Directly edit any table element, including formatting and labels Sort tables by cell contents in ascending or descending order Automatically display labels instead of coded values Specify minimum and maximum width of table columns (overrides TableLooks) Show a name, label, or both for each table variable Display missing data as blank, zero, . or any other user-defined term
Add titles and captions Output as IBM SPSS Statistics pivot tables Specify corner labels Customise labels for statistics Display the entire label for variables, values and statistics Choose from a variety of numerical formats Apply pre-formatted TableLooks Define the set of variables that are related to multiple response data and save it with your data definition for subsequent analysis Use both long- and short-string elementary variables
Define an unlimited number of sets, or variables that can exist in a set Tests of significance: Chi-square Column means Column proportions Exclude categories from significance tests Significance tests for multiple response variables Syntax and printing formats Simpler, easy-to-understand syntax Syntax converter (for upgrade users) Specify page layout: Top, bottom, left and right margins and page length
Use the global break command to produce a table for each value of a variable when the variable is used in a series of tables
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
17
Move beyond basic analysis Build flexible models using a wealth of model-building options Achieve more accurate predictive models using a wide range of modeling techniques
Created to provide you with more statistical power, IBM SPSS Advanced Statistics enables you to reach more accurate conclusions. Consider what it would be like to harness sophisticated univariate and multivariate analytical techniques and unleash them on your data. You can reach more dependable conclusions with procedures designed to fit the inherent characteristics of data describing complex
relationships. IBM SPSS Advanced Statistics provides a powerful set of sophisticated techniques designed to help you solve realworld problems, such as: Medical research: Analyse patient survival rates Manufacturing: Assess production processes Pharmaceutical: Report test results to the FDA Market research: Determine product interest levels
Its time to step up your analysis when you require multiple outcomes, want to measure outcomes over time, analyse data with hiearchical structure or estimate length of time until an event. Break through the barrier between general analysis and advanced modeling, and begin reaping the rewards today.
specifications
Key features Generalized Linear Mixed Models (GLMM)
Extends the linear model so that: 1) The target is linearly related to the factors and covariates through a specified link function, 2) The target can have a non-normal distribution, 3) The observations can be correlated Generalized linear mixed models cover a wide variety of models, from simple linear regression to complex multilevel models for non-normal longitudinal data Specify the subject structure for repeated measurements and how the errors of the repeated measurements are correlated Choose among the 8 covariance types Specify the target, optional offset and optional analysis (regression) weight Choose among the following probability distributions: binomial, gamma, inverse Gaussian, multinomial, negative binomial, normal, Poisson Choose among the following link functions: identity, complementary log-log log-link, log complement, logit, negative log-log, power, probit
repeated measures analysis, and repeated measures analysis with time-dependent covariate Use one of six covariance structures offered Select from 11 non-spatial covariance types
GENLOG
Fit loglinear and logit models to count data by means of a generalized linear model approach Model fit, using ML estimation under Poisson loglinear model and multinomial loglinear models Accommodate structural zeros Generalized log-odds ratio facility tests the specific generalized logodds ratios are equal to zero, and can print confidence intervals Diagnostic plots: Scatterplots and normal probability plots of residuals
GLM
Describe the relationship between a dependent variable and a set of independent variables n Select univariate and multivariate lack-of-fit tests Regression model Fixed effect ANOVA, ANCOVA, MANOVA and MANCOVA Random or mixed ANOVA and ANCOVA Repeated measures: Univariate or multivariate Doubly multivariate design
MIXED
Expands the general linear model used in the GLM procedure so that data can exhibit correlation and nonconstant variability Fit the following types of models: Fixed effects ANOVA model, randomised complete blocks design, split-plot design, purely random effects model, random coefficient model, multilevel analysis, unconditional linear growth model, linear growth model with person-level covariate,
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
VARCOMP
Variance component estimation Estimation methods: ANOVA MINQUE, maximum likelihood and restricted maximum likelihood Type I and Type III sums of squares for the ANOVA method Choices of zero-weight or uniformweight methods Choices of ML and REML calculation methods
18
Market size, age of store location and promotion are selected as the fixed effects. MarketID is chosen as the random effect.
A marketing group tests three campaigns to determine which promotion has the greatest effect on sales. First the marketing group specifies the structure of the data with MarketID as the subjects.
Output shows that promotion is significantly related to units sold. Specifically, promotion 1 has a higher amount of units sold than promotion 3 and promotion 2 has a lower number of units sold than promotion 3.
19
Predict categorical outcomes with more than two categories Easily classify your data into two groups Gain more control over your model
IBM SPSS Regression gives you an even wider range of statistics so you can get the most accurate response for specific data types. Do you build predictive models but find ordinary least squares regression too limiting? If so, IBM SPSS Regression can make your life easier. Use IBM SPSS Regression for: Market research: Study consumer buying habits Medical research: Study response to dosages Loan assessment: Analyse good and bad credit risks Institutional research: Measure academic achievement tests
of constraints such as yes/no answers, MLR allows you to model which factor predicts if the customer buys product A, B or C. Choose from four methods for choosing predictors: forward entry, backward elimination, forward stepwise and backward stepwise Stepwise function in MLR: Save time and easily find the best predictors for your data Use Score and Wald methods for a faster and more accurate conclusion for variable selection Apply a highly scalable, highperformance algorithm to handle big datasets Save time by specifying the reference category in your outcome variable in the user interface. You no longer need to recode the dependent variable set up in the desired reference category. Use Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to better assess model fit
Binary logistic regression (BLR): Predict dichotomous variables such as buy or not buy, vote or not vote. This procedure offers many stepwise methods to select the main and interaction effects that best predict your response variable. Nonlinear regression (NLR) and constrained nonlinear regression (CNLR): Get control over your model and your model expression. These procedures give you two methods for estimating parameters of non-linear models. Weighted least square regression (WLS): Give more weight to measurements within a series Probit analysis (PROBIT): Analyse potency of responses to stimuli, such as medicine doses, prices or incentives. Probit evaluates the value of the stimuli using a logit or probit transformation of the proportion responding.
specifications
Key features Multinomial logistic regression
Control the values of the algorithmtuning parameters Include interaction terms Customise hypotheses by directly specifying null hypotheses as linear combinations of parameters Specify a dispersion scaling value Build equations with or without a constant Use a confidence interval for odds ratios Save the following statistics: predicted probability, predicted response category, probability of the predicted response category and probability of the actual response category Find the best predictor from dozens of possible predictors using stepwise functionality Use Score and Wald methods to quickly reach results with a large number of predictors Assess model fit Diagnostics for the classification table
Criteria for model building: probability of score statistic for entry, probability of Wald, or likelihood ratio statistic for removal Save the following statistics: Predicted probability and group, residuals, deviance values, logit, Studentised and standardised residuals, leverage value, analog of Cooks influence statistic and difference in Beta Export the model using XML
Continued on PG. 21
20
Use the binary logistic regression procedure to test the impact of service offers (new or expanded) on customer satisfaction. The classification table indicates that this model can correctly predict satisfaction with 90 percent accuracy.
Output for calculated weights: Log-likelihood functions for each value of Delta; R, R2, adjusted R2, standard errors, analysis of variance and t tests of individual coefficient for Delta value with maximised log-likelihood function Display output in pivot tables
Probit
Transform predictors: Base 10, natural or user-specified base Natural response rate estimates or specified Algorithm control parameters: Convergence, iteration limit and heterogeneity criterion probability Statistics: Frequencies, fiducial confidence intervals, relative median potency, test of parallelism, plots of observed probits or logits Display output in pivot tables
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
21
Overcome missing data issues Use multiple imputation to replace missing data Build models taking missing data into account
When you ignore or exclude missing data, you risk obtaining biased or insignificant results. Use IBM SPSS Missing Values to impute your missing data and draw more valid conclusions. IBM SPSS Missing Values is a critical tool for anyone concerned about data validity. You can easily examine your data to uncover missing data patterns. Then, estimate summary statistics and impute missing values through statistical algorithms.
For example, improve survey questions that youve identified as possibly confusing based on observed missing data patterns. You can even determine if missing values for one variable are related to missing values of another with the percent mismatch of patterns table. You might find that respondents who skip a question on income might also bypass a question about education level. Use this information to enhance the quality of your surveys in the future.
specifications
Key features Analyse patterns
Display missing data and extreme cases for all cases and all variables using the data patterns table Display system-missing and three types of user-defined missing values Sort in ascending or descending order isplay actual values for specified D Variables Display patterns of missing values for all cases that have at least one missing value using the missing patterns table Group similar missing value patterns together Sort by missing patterns and variables isplay actual values for specified D variables Determine differences between missing and non-missing groups for a related variable with the separate variance t test table t test, degrees of freedom, mean, p value and count Show differences between present and missing data for categorical variables using the distribution of categorical variables table Produce crosstabs showing product and missing data for each category of one variable by the other variables Assess how much missing data for one variable relates to the missing data of another variable using the percent mismatch of patterns table Sort matrices by missing value patterns or variables Identify all unique patterns with the tabulated patterns table, which summarises each missing data pattern and displays the count for each pattern plus means and frequencies for each variable Display count and averages for each missing value pattern using the summary of missing value patterns table
Multiple Imputation
Specify which variables to impute and specify constraints on the imputed values, such as minimum and maximum values You can also specify which variables are used as predictors when imputing missing values of other variables Impute values for categorical and continuous variables Logistic regression is used for categorical variables and linear regression for continuous variables Predictive mean matching is an option for continuous outcomes; this ensures that the imputed values are reasonable (within the range of the original data) Missing data pattern detection helps determine which imputation method to use Three imputation methods are offered: onotone: an efficient method M for data that have a monotone pattern of missingness
ully conditional specification F (FCS): an iterative Markov Chain Monte Carlo (MCMC) method that is appropriate when the data have an arbitrary (monotone or no monotone) missing pattern Automatic: scans the data to determine the best imputation method (monotone or FCS) Specify: The number of imputations The range of imputed values Whether or not interaction effects are used when imputing Optionally, turn off imputation for variables that have a higher percentage of missing values Tolerance levels, to check for singularity You can also specify a variable containing analysis (regression) weights The procedure incorporates analysis weights in regression and classification models used to impute missing values Analysis weights are also used in summaries
Continued on PG. 23
22
Lnsfrskringar Swedish insurance company increases the number of direct marketing campaigns
Lnsfrskringar comprises 24 independent, regional insurance companies collaborating through the jointly owned Lnsfrskringar AB and its subsidiaries. Lnsfrskringar offers non-life insurance, accident and medical insurance, life assurance, pension-saving plans, fund savings and various banking services. Lnsfrskringars campaign selection was based on the experience of the sales force. The work carried out to obtain information from different data sources was inefficient and administratively cumbersome. There was a great need for a more effective solution whereby information could be structured and analysed more systematically to optimise campaign selections and the products offered to customers. Lnsfrskringar implemented IBM SPSS Modeler to analyse large quantities of data from various data sources and to identify patterns of customer behaviour. By building and deploying models, Lnsfrskringar can now analyse customer behaviour and needs, and leverage this information to customise its marketing campaigns. According to Ola Gustafsson, Customer Data Analyst at Lnsfrskringar, The efficiency of the analyst team has been enhanced significantly work which used to take a week can now be done in a day and a half. We have also seen a qualitative improvement, with results that are up to four times more reliable and accurate, he added. In addition, it has been possible to standardise the way the work is done. Today, Lnsfrskringar has a very straightforward way to predict the likelihood of a customer having an interest in a specific insurance policy. As a result, Lnsfrskringar haS moved from executing two nationwide campaigns per year to running 13 unique, customer-specific campaigns a day.
of imputed values (eg, mean, standard deviation and standard error) Display an overall summary of missingness in your data as well as an imputation summary and the imputation model for each variable whose values are imputed You can obtain analysis of missing values by variable as well as tabulated patterns of missing values Optionally, you can obtain descriptive statistics for imputed values Graphically summarise missingness for cases, variables and individual data (cell) values Request an IBM SPSS Statistics data file containing imputed values and/or an FCS iteration history Multiple imputation datasets can be analysed using supported analysis procedures to obtain final (combined) parameter estimates that take into account the inherent uncertainty in the various sets of imputed values
Analysis
Supported analysis procedures for Multiple Imputation (note: you must have purchased the proper module in which the procedure is located) Descriptive procedures: frequencies, descriptives, crosstabs, correlations, non-parametric correlation, partial correlation Comparison of means: means, t test, non-parametric tests, one-way ANOVA, univariate ANOVA Models: General Linear Models, Generalized Linear Models, linear regression, multinomial logistic regression, binary logistic regression, discriminant analysis, ordinal regression, linear mixed models Survival analysis techniques: Cox regression
Pooling
Pooling of output: output is pooled using one of two levels of pooling produces pooled parameters Pooling Diagnostics Relative Increase in Variance: measure of relative variability in parameter estimate across imputations Fraction of Missing Information: relative increase in variance scaled as a proportion A measure of uncertainty due to nonresponse elative Efficiency: efficiency of R estimate for M imputations relative to that for an infinite number of imputations Obtain Model PMML for pooled parameter estimates: Linear regression, Generalized Linear Models, multinomial logistic regression, binary logistic regression, discriminant analysis, Cox Regression
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
23
Support time-series analysis Find the best model for your data using the Expert Modeler Apply saved models to what-if scenarios to optimise your decisions
Time-series analysis is the most powerful procedure you can use to analyse historical information, build models, predict trends, and forecast future events. IBM SPSS Forecasting is the best way to quickly create powerful forecasts with confidence. With better forecasts, long-term goals can be set with insight on how to achieve them based on your organisations past performance and knowledge of your industry. Unlike spreadsheet programs, IBM SPSS Forecasting has the advanced statistical techniques you need in order to work with time-series data. But you dont need to be an expert statistician to use it. Regardless of your level of experience, you can analyse historical data and predict trends faster, and deliver information in ways that your organisations decision makers can understand and use.
IBM SPSS Forecasting will help you find answers to tough questions: If I increase my advertising budget, how will it affect sales by product or region? How will increasing assembly line capacity affect production? Will a change in fees affect the number of new customers we gain? How will tuition increases affect enrollment? If youre new to building models from time-series data, IBM SPSS Forecasting helps you by: Generating reliable models, even if youre not sure how to choose exponential smoothing parameters or ARIMA orders, or how to achieve stationarity Automatically testing your data for seasonality, intermittency and missing values, and selecting appropriate models Detecting outliers and preventing them from influencing parameter estimates
Generating graphs showing confidence intervals and the models goodness of fit If youre experienced at forecasting, IBM SPSS Forecasting allows you to: Control every parameter when building your data model Or use IBM SPSS Forecastings Expert Modeler recommendations as a starting point or to check your work Use IBM SPSS Forecasting to: Develop reliable forecasts quickly, regardless of the size of the dataset or number of variables Update and manage forecasting models efficiently Reduce forecasting errors by automating appropriate model selection and parameters Gain more control over choices affecting models, parameters and output Deliver high-resolution graphs and communicate results effectively
specifications
Key features TSMODEL
Model a set of time-series variables by using the Expert Modeler or specifying the structure of an ARIMA or exponential smoothing model Allow Expert Modeler to select the best- fitting predictor variables and models Limit search space to only ARIMA or only exponential smoothing models Treat independent variables as events Specify custom ARIMA models Produces maximum likelihood estimates for seasonal and non-seasonal univariate models General or constrained models specified by autoregressive or moving average order, order of differencing, seasonal autoregressive, or moving average order, and seasonal differencing Two dependent variable transformations: Square root and natural log Automatically detect or specify outliers: Additive, level shift, innovational, transient, seasonal additive, local trend and additive patch Specify seasonal and nonseasonal numerator, denominator and difference transfer function orders and transformations for each independent variable Specify custom exponential smoothing models Four non-seasonal model types: Simple, Holts linear trend, Browns linear trend and damped trend Three seasonal model types: Simple seasonal, Winters additive and Winters multiplicative Two dependent variable transformations: Square root and natural log Display forecasts, fit measures, Ljung-Box statistic, parameter estimates and outliers by model Generate tables and plots to compare statistics across all models Eight goodness of fit measures available: stationary R2, R2, root mean square error, mean absolute percentage error, mean absolute error, maximum absolute percentage error, maximum absolute error and normalised BIC Tables and plots of residual autocorrelation function (ACF) and partial autocorrelation function (PACF) Plot observed values, forecasts, fit value, confidence intervals for forecasts and confidence intervals for fit values for each series Filter output to fixed number or percentage of best or worst fitting models Save predicted values, lower confidence limits, upper confidence limits and noise residuals for each series back to the dataset
Continued on PG. 25
24
This screenshot of the time-series modeler shows how it provides you with the ability to model multiple series simultaneously. Because the module presents results in an organised fashion, you can concentrate on the models that need closer examination.
This screenshot displaying a forecast for womens apparel shows how you can automatically determine which model best fits your time-series and independent variables.
Specify forecast period, treatment of user-missing values, and confidence intervals Export models to an XML file for later use by TSAPPLY
TSAPPLY
Apply saved models to new or updated data Simultaneously apply models from multiple XML files created with TSMODEL Re-estimate model parameters and goodness of fit measures from the data or load from the saved model file Selectively choose saved models to apply
Override the periodicity (seasonality) of the active dataset Same output, fit measure, statistics and options as TSMODEL Export re-estimated models to an XML file
SPECTRA
Decomposes a time series into its harmonic components, a set of regular periodic functions at different wavelengths or periods Produces/plots univariate or bivariate periodogram and spectral density estimate Bivariate spectral analysis Smooth periodogram values with weighted moving averages Spectral data windows available for smoothing: Tukey-Hamming, Tukey, Parzen, Bartlett, equal weight, no smoothing and user-specified weights
High-resolution charts available: Periodogram, spectral and cospectral density estimate, squared coherency, quadrature spectrum estimate, phase spectrum, cross amplitude and gain
SEASON
Estimates multiplicative or additive seasonal factors for periodic time series Multiplicative or additive model Moving averages, ratios, seasonal and seasonal adjustment factors, seasonally adjusted series, smoothed trend-cycle components and irregular components
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
25
Quickly estimate the sampling distribution of an estimator Eliminate outliers and anomalies that degrade accuracy Bootstrap many analytical procedures
The models your organisation creates drive important decisions. They may be used to shape public policy, to prevent the spread of disease or to determine a multi-million dollar investment. Its important that your models are stable, so that they will produce accurate, reliable results. Bootstrapping is a useful technique for testing model stability, and IBM SPSS Bootstrapping makes it easy to do. This module of IBM SPSS Statistics provides an efficient way to ensure your models are stable and reliable. It estimates the sampling distribution of an estimator by re-sampling with replacement from the original sample. With IBM SPSS Bootstrapping, you can reliably estimate the standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient, regression coefficient and numerous others.
sample to a population, simply running the model once on the sample data on hand may not be the best approach because results are dependent on your sample data. Resampling with replacement will provide you with more accurate estimates of the reliability of your data. In order to identify precisely how suitable your model is, you will want to bootstrap the model to assess its stability.
Through the IBM SPSS Bootstrapping dialog box, you can easily control the numbers of bootstrap samples, set a random number seed, specify confidence intervals and indicate if a simple or stratified method is appropriate.
IBM SPSS Bootstrapping is available for installation as client-only software but, for greater performance and scalability, a server-based version is also available.
specifications
Key features
IBM SPSS Bootstrapping provides the ability to bootstrap a number of analytical procedures found throughout the IBM SPSS Statistics product family, including: t tests Correlations/nonparametric correlations Partial correlations IBM SPSS Advanced Statistics: GLMM GENLIN Linear Mixed Models Cox Regression IBM SPSS Regression: Regression Nominal Regression Logistic Regression
Available on the following platforms: Windows n Mac n Linux
Modeling Procedures
IBM SPSS Statistics Base: One-way UniAnova PLUM Discriminant
Descriptive Procedures
IBM SPSS Statistics Base: Descriptives Frequencies Examine Means Crosstabs
26
what if
...you could easily identify the best customers for a marketing campaign?
Todays marketers understand the value of leveraging their data to gain insight into customer behaviour and preferences. But that value is only delivered when the insight (and in the case of predictive analysis, the foresight based on customer propensities and predicted behaviour) is turned into action at the point of customer contact. IBM SPSS Predictive Analytics lets you embed improved decision-making and automation into CRM and other customer-facing systems, based on data insights. Find out how IBM SPSS Predictive Analytics can turn data into actions.
27
Get a more accurate picture of your data when working with large-scale surveys Achieve more statistically valid inferences for populations Reach correct point estimates for statistics such as totals, means and ratios, and obtain standard errors of these statistics Predict numerical and categorical outcomes from non-simple random samples Take up to three stages into account when analysing data from a multistage design
If youre working with complex sample designs, such as stratified, clustered or multistage sampling, you need specialised statistical techniques to account for the sample design and its associated standard error. IBM SPSS Complex Samples has everything you need to correctly and easily compute statistics and their standard errors from complex sample designs.
You can apply it to: Survey research Obtain descriptive and inferential statistics for survey data Market research Analyse customer satisfaction data Health research Analyse large public-use datasets on public health topics such as health and nutrition or alcohol use and traffic fatalities
Social science Conduct secondary research on public survey datasets Public opinion research Characterise attitudes on policy issues In addition to giving you the ability to assess your designs impact, this module also produces a more accurate picture of your data because subpopulation assessments take other subpopulations into account.
IBM SPSS Complex Samples helps agencies drive better outcomes for public policy
Analyzing data for government agencies poses special challenges. For some projects, enough data exists so that obtaining a random sample for analysis is relatively straightforward. For other projects, however, if an analyst drew a simple random sample, there would probably be too few cases in some of the categories, because these conditions are relatively rare. Therefore, an analyst must oversample certain groups in which there are relatively few cases. For example, if an analyst for a city public health department wanted to evaluate the effectiveness of several different substance abuse programs, there might not be enough cases in each age group, in each geographic area, for each treatment option being studied resulting in some sub-groups being over-sampled. The procedures in IBM SPSS Complex Samples enable analysts to apply scientific sample designs in the sample selection process in order to reduce the risk of a distorted view of the population. In addition, analysts can document how a sample was selected, so that subsequent evaluations can replicate the sample design and obtain results that can be used to reliably identify trends and make predictions about future developments. This not only helps agencies evaluate the effectiveness of their programs but also anticipate and plan for future needs.
28
Plan les
Analyse data
Results
specifications
Key Features
Complex Samples Plan (CSPLAN): Provides a common place to specify the sampling frame to create a complex sample design or analysis design used by procedures in IBM SPSS Complex Samples Complex Samples Selection (CSSELECT): Selects complex, probability-based samples from a population It chooses units according to a sample design created through the CSPLAN procedure Complex Samples Descriptives (CSDESCRIPTIVES): Estimates means, sums and ratios, and computes their standard errors, design effects, confidence intervals and hypothesis tests Complex Samples Tabulate (CSTABULATE): Displays one-way frequency tables or two-way crosstabulations and associated standard errors, design effects, confidence intervals and hypothesis tests Complex Samples General Linear Model (CSGLM): Enables you to build linear regression, analysis of variance (ANOVA) and analysis of covariance (ANCOVA) models Complex Samples Ordinal (CSORDINAL): Performs regression analysis on a binary or ordinal polytomous dependent variable using the selected cumulative link function Complex Samples Logistic Regression (CSLOGISTIC): Performs binary logistic regression analysis, as well as multinomial logistic regression analysis Complex Samples Cox Regression (CSCOXREG): Applies Cox proportional hazards regression to analysis of survival times that is, the length of time before the occurrence of an event
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
29
Analyse massive data files faster Support distributed offices in performing analytics efficiently Improve analyst productivity
IBM SPSS Statistics Server offers all the features of IBM SPSS Statistics but with faster performance because the processing is centralised on the server machine. This eliminates the need to transfer data over the network, which saves time, improves productivity and enhances security. Analysts can work in their application of choice without any disruption while waiting for a long-running job to complete and can initiate multiple jobs at the same time.
creating pivot tables in the output is now two to three times faster than before. In addition, you can now run IBM SPSS Statistics Server on your IBM System z machines using Linux for more powerful, enterprise-wide analysis.
from meeting the analytic needs of a single department to meeting those of hundreds and even thousands of users across the enterprise. When you combine the strength of worldclass analytical tools and techniques with the flexibility and speed of server functionality, you have a powerful solution for supporting better decision-making throughout your enterprise.
Faster performance
IBM SPSS Statistics Server provides faster performance across your enterprise when working with large datasets having multiple predictors. There is no limit on the number of CPUs or cores that an analytical procedure can use, and no limit on the number of threads that can be used for multithreaded procedures. Operations like sorts and aggregates can be pushed back to the database, where they can be performed faster. Temporary files created by analytical procedures can be striped over multiple disks, which also speeds analytical processing. And with this latest release,
IBM SPSS Statistics represents another important step to satisfy the needs of business clients with innovations that help describe, explain and predict.
Alexander Vinogradov, PhD Associate Professor Social Sciences and Social Technologies Faculty National University - Kyiv-Mohila Academy Ukraine
specifications
System requirements Operating system
Microsoft Windows Server 2008 or 2003 (32-bit or 64-bit); Sun Solaris 9 or 10 (SPARC 64-bit machine); IBM AIX 53 or 61; IBM zSeries running Linux, 64-bit only (PowerPC); HP-UX 11i v3 (64-bit Itanium); or Red Hat Enterprise Linux 4x or 5 (32-bit and 64-bit), Advanced Platform (32-bit and 64-bit) or Advanced Server 4x (64-bit)
Hardware
Minimum CPU: Two CPUs recommended, running 1 GHz or higher Memory: 4 GB RAM per expected concurrent user Minimum free drive space: 500 MB for installation Additional space is required to run the program (for temporary files) Other: A network adapter running the TCP/IP protocol
Available on the following platforms: Windows n Sun Solaris n Linux n IBM AIX n HP-UX n IBM zSeries Running Linux
30
Analyse response rates to marketing campaigns Classify contacts according to common characteristics Test and target specific customer or donor clusters Connect to Salesforce.com to extract data, collect details and perform analyses
a campaign can help increase your return on investment. And performing recencyfrequency-monetary value (RFM) analysis can give you insight into the lifetime value of your customers or donors. With IBM SPSS Direct Marketing, you get the features you need to: Classify customers and prospects based on identifying characteristics: These include age, marital status, job function, where they live, etc. Segment prospects or customers into clusters, which are groups that share similar qualities and differ from others. Compare campaign performance: Test existing campaigns against new campaigns, collect your data and run a control package test. Use color coding to quickly see which test package would outperform your control package. Create responder profiles: Generate profiles of those who responded to test campaigns. Use the prospect profiling tool to pinpoint specific characteristics in the data, such as age, marital status and job functions.
Maximising your marketing programs in terms of both efficiency and impact is critical in todays marketplace. Doing so by uncovering patterns in your customer data, however, can be a tedious process requiring highly specialised skills.
Generate propensity-to-purchase scores: Target customers most likely to respond to campaigns or offers, and generate XML files to score other data. This helps eliminate less active or unlikely customers from your list. Analyse responses by postal code: Identify postal codes with high rates of response to your marketing campaigns. Even find the best location for adding an agency or a brick-andmortar store. Refine individual customer lists: Use the improved scoring interface to write RFM scores, prospect profiles and response rates back to existing or new datasets. Quickly build targeted lists and adapt marketing strategies for each customer group. IBM SPSS Direct Marketing helps you better understand your customer groups, identify the most active customers for your organisation and maximise your ROI. Whether youre launching or testing campaigns, looking to increase cross-sell and up-sell revenue, or planning to open a store, use IBM SPSS Direct Marketing to develop and execute successful marketing programs.
specifications
Key features Analytical capabilities
Compute RFM scores from a dataset in which each row contains the aggregated data for one customer or the data for one transaction Accept recency data in the form of a transaction date or the time interval since the transaction Generate scores indicating which contacts are most likely to respond to similar campaigns Create training and testing groups for diagnostic purposes Recode the response field to represent positive or negative responses Cluster analysis works with continuous and categorical fields Contact profiling works with nominal, ordinal, string or numeric fields
Data output
Illustrate in an average monetary value chart how recency, frequency and spending are related in the sample Display counts and percentages of positive and negative responses for each group and identify which are significant Examine descriptions of each profile group, response rates and cumulative response rates using Contact Profiles feature
Export all analytics to Excel Leverage improved, descriptive text to help explain in everyday language the output that results from executed procedures
User options
Write the computed scores (and ID variables, where necessary) to the active dataset, a new dataset or to a file Append the RFM results directly to your data or a new data file to quickly identify and build lists of high-value customers Create descriptive profiles that indicate who responded to which previous campaigns
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
31
Wrap custom functionality and procedures from R and Python in IBM SPSS Statistics syntax Distribute wrapped packages via the Web or email
Non-Programmers can:
Gain access to custom procedures and functionality developed by advanced users Run wrapped packages in the IBM SPSS Statistics environment
syntax. A wrapped package can then be invoked through IBM SPSS Statistics, and it can easily be given a dialog box interface that makes it indistinguishable from any of IBM SPSS Statistics built-in dialogs. Furthermore, wrapped procedures can produce standard pivot tables in the style of IBM SPSS Statistics. IBM SPSS Statistics Developer features the same Programmability Extension and Custom Dialog Builder features now present in all of the IBM SPSS Statistics modules. These features allow the advanced user to quickly share highly customised work with any user of any IBM SPSS Statistics module, complete with a user interface that they can make as complex or simple as they like. An ideal platform for your unique solutions, IBM SPSS Statistics Developer gives statistical programmers a development platform on which to implement customised or unique analytical solutions that can be accessed and executed by anyone familiar with IBM SPSS Statistics. Available in 10 languages, IBM SPSS Statistics Developer includes all of the core functionality found in IBM SPSS Statistics except for IBM SPSS Statistics analytical procedures. This makes it the perfect blank slate on which advanced users can write their customised procedures.
IBM SPSS Statistics Developer has all of the features found in other IBM SPSS Statistics products except for the statistical algorithms they contain. It provides a lower-priced option for users who need the power of IBM SPSS Statistics to support their R and Python functionality. The R programming language is very flexible, and provides fine control over the way statistical functions are executed. However, it can be complex and difficult to wield and requires data to be held in memory, preventing it from being easily used with large datasets. Users of Python face similar challenges.
IBM SPSS Statistics Developer allows R and Python users to overcome the limitations of using these languages alone and integrates their efforts into an interface that also features robust data access, management, preparation and reporting tools. IBM SPSS Statistics provides a complete framework into which users can place customised features and algorithms built using R and/or Python for their own use, or for deployment to users of any IBM SPSS Statistics module.
specifications
Key features Advantages for R programmers:
Simple and easy to learn how to wrap R packages and algorithms Easy to distribute via Web download or email Custom Dialog Builder allows programmers to create specific interfaces that simplify users access to the functionality they have provided Makes their skills and expertise available to a far wider range of users in both academic and commercial worlds Easy to install just point IBM SPSS Statistics at the wrapped package, and it installs automatically Easy to use packages are invoked through the standard IBM SPSS Statistics interface or, for more advanced users, through IBM SPSS Statistics syntax Runs in the familiar IBM SPSS Statistics environment, which provides Full access to IBM SPSS Statistics data handling functions The ability to work with larger datasets no in-memory limitations fficient production of more E graphs and other forms of output
Pre-wrapped functions
Estimate a linear model and carry out the Breusch-Pagan heteroscedasticity test, using the ncvtest function from the R car package Estimate conditional quantiles for a linear model, using the rq function from the R quantreg package Estimate a Rasch model for item response data, using the rasch function from the R ltm package Estimate a linear regression model, robustly with an M estimator, using the rlm function from the R MASS package Estimate a regression model with optional fixed upper and/or lower bounds for the dependent variable, using the tobit function from the R AER package
Calculate correlations for nominal, ordinal, and scale variables accounting for the measurement levels of the variables using the hector function from the R polycor package Create boxplots with different options from standard IBM SPSS Statistics boxplots, using the R boxplot function
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
32
Explore data using non-linear data modeling tools Uncover hidden connections in your data Improve model performance
IBM SPSS Neural Networks offers techniques that enable you to explore your data in new ways and, as a result, build more accurate and effective predictive models. A computational neural network is a set of non-linear data modeling tools consisting of input and output layers plus one or two hidden layers. The connections between neurons in each layer have associated weights, which are iteratively adjusted by the training algorithm to minimise error and provide accurate predictions. You set the conditions under which the network learns and can finely control the training stopping rules and network architecture, or let the procedure automatically choose the architecture for you. IBM SPSS Neural Networks provides a complementary approach to the statistical techniques available in IBM SPSS Statistics Base and its modules. From the familiar IBM SPSS Statistics interface, you can mine your data for hidden relationships,
using either the Multilayer Perceptron (MLP) or Radial Basis Function (RBF) procedure. With either of these approaches, the procedure operates on a training set of data and then applies that knowledge to the entire dataset, and to any new data.
specifications
Key features Multilayer Perceptron (MLP)
Fits an MLP neural network, which uses a feedforward architecture Can have multiple hidden layers One or more dependent variables may be specified scale, categorical or a combination of these EXCEPT subcommand excludes selected variables RESCALE subcommand rescales covariates or scale dependent variables PARTITION subcommand specifies the method of partitioning the active dataset into training, testing and holdout samples ARCHITECTURE subcommand specifies the network architecture The number of hidden layers The activation function to use for all units in the hidden layers The activation function to use for all units in the output layer CRITERIA subcommand is used to specify computational resources STOPPINGRULES subcommand specifies the rules that determine when to stop training the network MISSING subcommand controls whether user-missing values for categorical variables are treated as valid values PRINT subcommand indicates the tabular output to display and can be used to request a sensitivity analysis PLOT subcommand indicates the chart output to display SAVE subcommand writes temporary variables to the active dataset OUTFILE subcommand saves XMLformat files containing the synaptic weights Normalised RBF or Ordinary RBF When using the CRITERIA subcommand, users can specify the computation settings for the RBF procedures, specifying the hidden-unit overlapping factor that controls how much overlap occurs among the hidden units
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
33
Create compelling visualisations without programming skills Share and reuse visualisation templates within IBM SPSS products Customise and brand your data visualisations
IBM SPSS Visualization Designer makes the creation of powerful and compelling data visualisations accessible to non-technical users. With an intuitive, drag-and-drop design interface, it eliminates the need for advanced programming skills. And because IBM SPSS Visualization Designer gives you the ability to create reusable chart/graph templates, you can share your unique visualisations with a broad array of end users.
With IBM SPSS Visualization Designer you dont need extensive programming skills to conceive, create and share compelling visualisations.
specifications
Key features
Easy creation of new visualisations for use within other IBM SPSS products (IBM SPSS Modeler and IBM SPSS Statistics) Create and apply custom styles to visualisations Automatic suggestions of visualisations to fit your data Data source support (Delimiterseparated, SPSS Data Files, and relational databases including DB2, SQL Server, Oracle, and Sybase) Dozens of available built-in visualisation templates GUI interface to update/edit visualisations Drag-and-drop interface to construct and visualise data Factor many data elements with multi-faceted and multi-dimensional visualisations Create advanced graphics using The Grammar of Graphics (Lee Wilkinson) principles using GPL syntax Includes support for the VisualizationML visualisation markup language Integration with IBM SPSS Collaboration and Deployment Services Export visualisations as images Built-in sampling for large datasets Memory: 512MB RAM (1GB recommended) Available hard-drive space: 250MB or more Web browser: Internet Explorer 7 or 8
Available on the following platforms: Windows
System requirements
Microsoft Windows 7 Enterprise and Professional (32- and 64-bit); Windows Vista Enterprise and Business (32- and 64-bit) SP1; or Windows XP Professional (32- and 64-bit) SP3 Hardware: Processor speed: 18GHz or higher
34
Use smaller sample sizes and be confident of your results Analyse rare occurrences in large datasets with more than 30 exact tests
IBM SPSS Exact Tests is the add-on module to turn to when youd like to analyse rare occurrences in large databases or work more accurately with small samples. With more than 30 exact tests, youll be able to analyse your data where traditional tests fail, because you can use smaller sample sizes and be confident of your results. And now IBM SPSS Exact Tests operates on Mac and Linux platforms, as well as on Windows. IBM SPSS Exact Tests has the tests and statistics you need, including: Exact p values Monte Carlo p values Pearson Chi-square test Linear-by-linear association test Contingency coefficient Uncertainty coefficient symmetric or asymmetric Wilcoxon signed-rank test Cochrans Q test Binomial test IBM SPSS Exact Tests ensures youll always have the right statistical test for your data. Because it is part of the IBM SPSS Statistics
product line, you can count on beginningto-end solutions for your modeling and data analysis needs. IBM SPSS Exact Tests is crucial to your research if you are: Operating with a small number of cases Working with variables that have a high percentage of responses in one category Subsetting your data into fine breakdowns Searching for rare occurrences in large datasets
IBM SPSS Exact Tests fills the deficiencies that many of the major statistics programs have in the area of nonparametric inference and hypothesis tests ...
Vincent C. Arena, PhD University of Pittsburgh, Graduate School of Public Health, Department of Biostatistics
specifications
Tests and statistics
Pearson Chi-square test, Likelihood ratio, and Fishers Exact Exact 1-tailed, 2-tailed p values for 2x2 table Exact 2-tailed p values for general RxC table Monte Carlo (MC) 2-tailed p values and general RxC table
In this example, even though there are only 10 cases, IBM SPSS Exact Tests helps you determine that a significant relationship exists.
coefficient symmetric or asymmetric, Kappa, Gamma, Kendalls Tau-b and Tau-c, Somers D symmetric and asymmetric, Pearsons R and Spearman correlation
Exact 2-tailed p values MC 2-tailed p values and CIs
Marginal homogeneity
Asymptotic, exact, MC 1- and two 2-tailed p values and point probability
Jonckheere-Terpstra
Asymptotic, exact, MC 1- and 2-tailed p-values and point probability
2-Sample Kolmogorov-Smirnov
Exact 2-tailed p values, and point probability MC 2-tailed p-values and CIs
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
McNemar
Exact 1- and 2-tailed p values and point probability
Linear-by-linear association
Exact 1- and 2-tailed p-values and exact point probability MC 1- and 2-tailed p values and CIs
Wald-Wolfowitz Runs
Exact 1-tailed p-value and point probability
35
Visualise and explore complex categorical and numeric data, as well as high-dimensional data Understand information in large two-way and multi-way tables Use biplots, triplots and perceptual maps to see relationships in your data
With the sophisticated features of IBM SPSS Categories in your toolbox, you are no longer hampered by categorical or highly dimensional data. These techniques ensure you have all the tools you need to easily analyse and interpret your multivariate data and its relationships more completely. For example, use IBM SPSS Categories to understand which characteristics consumers relate most closely to your product or brand, or to determine customer perception of your products compared to other products you or your competitors offer. IBM SPSS Categories uses categorical regression procedures to predict the values of a nominal, ordinal or numerical outcome variable from a combination of numeric and (un)ordered categorical predictor variables. You can use regression with optimal scaling to describe, for example, how job satisfaction can be predicted from job category, geographic region and the amount of work-related travel.
Optimal scaling techniques quantify the variables in such a way that the Multiple R is maximised. Optimal scaling may be applied to numeric variables when the residuals are non-normal or when the predictor variables are not linearly related with the outcome variable. Three new regularisation methods Ridge regression, the Lasso and the Elastic Net improve prediction accuracy by stabilising the parameter estimates. Automatic variable selection makes it possible to analyse highvolume datasets more variables than objects. And by using the numeric scaling level, you can do regularisation in regression by using the Lasso or the Elastic Net for your numeric data as well.
The data are a 2x5x6 table containing information on two genders, five age groups and six products. This plot shows the results of a two-dimensional multiple correspondence analysis of the table. Notice that products such as A and B are chosen at younger ages and by males, while products such as G and C are preferred at older ages.
specifications
Key features PREFSCAL
Multidimensional unfolding analysis Read one or more rectangular matrices of proximities Read weights, initial configurations, and fixed coordinates Plots: Stress plots, common space scatterplots, individual space weight scatterplots, and individual space scatterplots Plots: Joint category plots, transformation plots, residual plots, projected centroid plots, object plots, biplots, triplots and component loadings plots
CATREG
Statistics: Frequencies, regression coefficients, ANOVA table, iteration history and category quantifications Plots: Correlations between untransformed predictors, correlations between transformed predictors, residual plots and transformation plots Three regularisation methods: Ridge regression, the Lasso and the Elastic Net Improve prediction accuracy by stabilising the parameter estimates Analyse high-volume data (more variables than objects)
Continued on PG. 37
CATPCA
Statistics: Frequencies, missing values, optimal scaling level, mode, variance accounted for by centroid coordinates, vector coordinates, total per variable and per dimension, component loadings for vectorquantified variables, category quantifications and coordinates, iteration history
CORRESPONDENCE
Statistics: Correspondence measures; row and column profiles; singular values; row and column scores; and inertia, mass Plots: Transformation plots, row point plots, column point plots and biplots
PROXSCAL
Statistics: Iteration history, stress measures, stress decomposition, coordinates of the common space, and object distances within the final configuration
36
Unleash the full potential of your data through optimal scaling and dimension reduction techniques
Correspondence analysis (CORRESPONDENCE): Describe the relationships between two nominal variables in a low-dimensional space, while simultaneously describing the relationships between categories for each variable. Categorical regression (CATREG): Predict the values of a categorical dependent variable from a combination of categorical independent variables. Optimal scaling techniques quantify the variables in such a way that the Multiple R is maximised. Regularisation methods improve prediction accuracy by stabilising the parameter estimates. Multiple correspondence analysis (MULTIPLE CORRESPONDENCE): Analyse a categorical multivariate data matrix for two or more nominal variables. Categorical Principal Components Analysis (CATPCA): Use alternating least squares to generalise principal components analysis to accommodate variables of mixed measurement levels. Specify a transformation type of nominal, ordinal or numeric on a variableby-variable basis. Nonlinear canonical correlation (OVERALS): Use alternating least squares to generalise canonical correlation analysis. It allows more than one set of variables to be compared to one another on the same graph. Proximity scaling (PROXSCAL): Takes a matrix of similarity and dissimilarity distances between observations in a high-dimensional space and assigns them to a position in a low-dimensional space in order for you to gain spatial understanding of how objects relate. Preference scaling (PREFSCAL): Set up the Preference Scaling procedure (PREFSCAL) in syntax to perform multidimensional unfolding on two sets of objects in order to find a map that represents the relationships between these two sets of objects as distances between two sets of points.
Use correspondence analysis to easily display and analyse differences between categories. In this example, researchers present and analyse the two-dimension table relating staff group to smoking category in a particular workplace.
Easily incorporate supplementary information on additional variables in IBM SPSS Categories. Here, additional information on alcohol consumption by staff group is known. These additional columns can be projected into the staff group by smoking category space.
Obtain automatic variable selection from the predictor set Write regularised models and coefficients to a new dataset for later use Two model selection and predictive accuracy assessment methods: the 632 bootstrap and Cross Validation (CV) Find the model that is optimal for prediction with the 632(+) bootstrap and CV options
Obtain nonparametric estimates of the standard errors of the c oefficients with the bootstrap Systematic multiple starts
MULTIPLE CORRESPONDENCE
Statistics: Model summary; history statistics, descriptive statistics; discrimination measures; category quantifications; inertia of the categories; contribution of the categories to the inertia of the dimensions and contribution of the dimensions to the inertia of the categories; and iteration history
Plots: Object points, category points (centroid coordinates), discrimination measures and transformation (optimal category quantifications against category indicators)
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
OVERALS
Statistics: Frequencies, centroids, iteration history, object scores, category quantifications, weights, component loadings and single and multiple fit Plots: Object scores plots and category coordinates plots
37
Identify product features important to new customers Discover which product attributes are most important to current customers Determine the influence product attributes have on customer preferences
Thoroughly understand customer preferences, tradeoffs and price sensitivity with IBM SPSS Conjoint. Using conjoint analysis, you can uncover more information about how customers compare products in the marketplace and measure how individual product attributes affect consumer behaviour. Armed with this knowledge, you can design, price and market products and services
tailored to your customers needs. IBM SPSS Conjoint 19 outputs its results to pivot tables, which: Provide better, more professionallooking and presentable charts Offer more formatting options than text output Can be exported to Microsoft Word, Excel or PowerPoint Can be used by the SPSS Output Management System (OMS)
IBM SPSS Conjoint helps uncover answers to critical questions such as: Which features or attributes of a product or service drive the purchase decision? Which feature combinations will have the most success? What market segment is most interested in the product? What marketing messages will most appeal to that segment? What is the optimal price to charge customers for a product or service? Who are the closest competitors in the market?
Zorg en Zekerheid
Health insurer detects fraud with IBM SPSS predictive analytics
Zorg en Zekerheid is a medium-sized and independent regional health insurer in the Netherlands, with more than 460 employees and more than 380,000 policyholders. The company is committed to providing accessible and affordable healthcare. The majority of policyholders and healthcare providers submit claims for treatments that have actually taken place. However, a small number commit fraud. There are millions of records, and the challenge is to quickly identify the records that are fraudulent. In 1999, Zorg en Zekerheid set up a unit to detect and combat fraud using software that analysed data for pre-defined risk indicators. This required manually selecting the data to determine if fraud was involved, but this was time-consuming and did not always produce the desired results. After three years, Zorg en Zekerheid sought a solution that would produce more accurate results. The main criterion was that the chance of being caught had to be greater. Zorg en Zekerheid began examining various data mining solutions, including SAS and IBM SPSS software. In a pilot project, a model was created to detect deviations in claims. We had just solved a fraud case and I gave the data regarding that case to IBM SPSS consultants, continued De Vries. They selected five healthcare providers who were potentially submitting fraudulent claims, from a total of over a hundred providers. The healthcare provider which had just turned out to be fraudulent, was also on that list. From that moment on, I was sure that I was dealing with the right party.
38
Find out how your product ranks with IBM SPSS Conjoint
IBM SPSS Conjoint offers the procedures you need to plan, implement and analyse efficient conjoint surveys. With these techniques, you can discover how respondents rank their preferences and product attributes. Orthoplan: Produces an orthogonal array of product attribute combinations, dramatically reducing the number of questions you must ask while ensuring enough information to perform a full analysis. Plancards: Print cards to elicit respondents preferences. Quickly generate cards that respondents can sort to rank alternative products. Conjoint procedure: Get results you can act on, such as which product attributes are important and at what levels they are most preferred. Plus, perform simulations which will tell you expected market shares for alternative products. Add IBM SPSS Conjoint to your competitive research and develop products and services that are more successful.
Establish the parameters of your study with the Orthoplan design generator. Orthoplan gives you an orthogonal array of alternative potential products that combine different product features at specified levels. This saves you time and money by presenting a fraction of all possible alternatives.
Generate charts
After you gather and input the data using your plan cards, IBM SPSS Conjoint performs an ordinary least squares analysis of preference or rating data. It then generates charts to simulate expected market shares. For instance, in this chart you can quickly see that subjects strongly consider package design and price as the most important.
specifications
Key features
Orthoplan Generates orthogonal main effects fractional factorial designs Specify the desired number of cards for the plan Generate holdout cards to test the fitted conjoint model Orthoplan can mix the training and holdout cards or can stack the holdout cards after the training cards Plancards A utility procedure used to produce printed cards for a conjoint experiment; the printed cards are used as stimuli to be sorted, ranked, or rated by the subjects Specify the variables to be used as factors and the order in which their labels are to appear in the output Choose a format Listing-file format Card format Conjoint Performs an ordinary least squares analysis of preference or rating data Work with the plan file generated by Plancards or a plan file input Work with individual level rank or rating data Provide individual level and aggregate results Treat the factors in any of a number of ways; conjoint indicates reversals Experimental cards have one of three scenarios: Training, holdout and simulation Three conjoint simulation methods: Max utility; Bradley-Terry-Luce (BTL); and logit Print results Attribute importance Utility (part-worth) and standard error Graphical indication of most to least-preferred levels of each attribute Counts of reversals and reversal summary Pearson R for training and holdout data Kendalls tau for training and holdout data simulation results and simulation summary
System requirements
Requirements vary according to platform
Available on the following platforms: Windows n Mac n Linux
39
Maximise productivity and assure data quality Analyse results in real-time with IBM SPSS Statistics software
The innovative technology of IBM SPSS Data Collection Data Entry accelerates and simplifies your survey research projects. It helps you create your survey more rapidly, collect accurate results more quickly and start analysis without the typical wait. Part of the IBM SPSS Data Collection family of survey research solutions, IBM SPSS Data Collection Data Entry comprises two powerful products, IBM SPSS Data Collection Data Entry Author and IBM SPSS Data Collection Data Entry Interviewer. Both integrate seamlessly with each other, as well as with other IBM SPSS products, so you can move from data collection to analysis, modeling and reporting in a single step.
you through the survey creation process, you dont have to be an expert to create the best surveys. Once created, you can use other products in the Data Collection family to centrally manage and deploy your survey across multiple channels, including online, phone and in-person, and in any language.
specifications
IBM SPSS Data Collection Data Entry combines two powerful technologies, IBM SPSS Data Collection Data Entry Author and IBM SPSS Data Collection Data Entry Interviewer multiple response text date numeric complex questions such as matrices and grids Sophisticated validation to keep your data clean and accurate including: go-to and conditional routing statements (if then) list-filtering skip-and-fill Configurable, keyboard-driven data entry interface ensures that users can enter data quickly, accurately and comfortably Real-time validation, with visual and audio cues, keeps data clean during the data entry process, so you can start your analysis at any point Oversight and visibility into data entry through validation and verification, including double-entry Data entry project and case management lets users work on multiple projects easily Ability to stop and re-start records and keep track of users work Support for single user data entry as well as sophisticated data entry operations, involving multiple data entry specialists and supervisors
Key features
Create professional surveys as easily as you would a Microsoft PowerPoint presentation Leverage existing IBM SPSS Statistics files to jump-start your data collection efforts Supports all question types including: single response
System requirements
Requirements vary according to platform
Available on the following platform: Windows
40
41
Gain greater analytical value from your text responses Save time by automating the creation of categories and the categorisation of responses Save money by eliminating or reducing the need for outside coding services
With IBM SPSS Text Analytics for Surveys you can gain full value from text responses without the drudgery and expense associated with manual coding. Specifically designed for survey text, this product is based on our automated natural language processing (NLP) software technologies. Using this software, you can automate the creation of categories and categorisation of responses to transform unstructured survey data into quantitative data without having to read text responses word-for-word. You can get better out-of-the-box results in the English-language version by using pre-built Text Analysis Packages (TAPs) designed for categorising customer, product or employee satisfaction surveys. New TAPs are also available for evaluating opinions on advertising, on banking satisfaction and on brand awareness.
Two versions of IBM SPSS Text Analytics for Surveys are available for separate purchase: one for analysing English, Dutch, French, German and Spanish survey text; and a second for analysing Japanese text. When you use IBM SPSS Text Analytics for Surveys, you are empowered to gain greater insight from text responses using these capabilities: Dictionary-based text-extraction technology: This product ships with libraries and resources to automate concept extraction. You can easily customise these libraries by adding topic-specific terms to match your needs. Proven linguistic technologies: This product is based on natural language processing (NLP) technologies that enable you to quickly create categories and reliably categorise responses.
Create conditional rules to enhance categorisation: Use extraction results and Boolean operators to categorise responses based on more complex information and filter erroneous responses. Visualisation capabilities aid category refinement: Use bar charts, Web graphs and Web tables to quickly reveal which categories contain co-occurring responses. Then decide whether to combine certain categories or to create new ones that better account for shared responses. Re-use and share categories: Export categories for use by others in new projects to save time and ensure reliability across the same or similar studies. Export results to IBM SPSS Statistics or Excel: Analyse and graph results in other software for use in decision-making.
specifications
Key features
Import data from: IBM SPSS Statistics (SAV), IBM SPSS Data Collection (MDD), Excel (XLS), Excel (XLSX) for Office 2007 and ODBCcompliant databases New! TAPs are available for evaluating opinions on advertising, on banking satisfaction and on brand awareness* Supports the reuse of categories created in other programs Use linguistic algorithms and a semantic network to automatically create categories and categorise responses New! IBM SPSS Semantic Network supports hundreds of general themes for enhanced category building* Create conditional rules to categorise responses by using extraction results and Boolean operators Allows the force-in/force-out of responses into/out of categories without changing the category definition Print category lists and some visualisations
Refine results
Categorise responses manually by dragging terms, types and responses within the interface Sort categories by relevance View response co-occurrence by using a category bar chart, Web graph or Web table Use flags to mark responses as complete or to follow-up Profile categories by overlaying reference variables onto bar charts
Export results
Export data from: IBM SPSS Statistics (SAV) and Excel (XLS)
Share resources
Share project files that contain extracted results, categories and linguistic resources Export categories and definitions for reuse by others in new projects
Dictionaries (customisable)
Type Dictionary: Supports the grouping of similar terms
Continued on PG. 43
42
Refine categories Visualisation capabilities enable you to quickly see which categories share responses. This can help you to refine categories manually.
A Web graph showing which categories share responses enables the user to decide whether to combine certain categories or to create new ones that better account for shared responses.
Export results for analysis and graphing When you are satisfied with your categories, you can export results either as dichotomies or as categories. These can be used to create tables and graphs, either separately or in combination with other survey data.
Export results to IBM SPSS Statistics to create crosstabs or whatever your analysis requires.
Create categories and categorise text responses Automatically create categories and categorise responses using term derivation, term inclusion, a semantic network or frequency. Also, categorise responses manually by dragging terms, types and responses within the interface.
Click the Create Categories based on Linguistics button at upper left to automatically create categories and categorise responses. Results can also be exported to IBM SPSS Statistics to create graphs that communicate survey findings.
Substitution Dictionary: Groups similar terms under a target name Exclude Dictionary: Contains terms to be ignored during extraction
Libraries
Survey Library: Contains resources related to pattern rules and types, as well as a predefined list of synonyms and excluded terms (proprietary) Project Library: Stores dictionary changes for a particular project Core Library: Contains reserved Type Dictionaries for Person, Location, Product and Organsation Budget Library: Contains a built-in
type for words or phrases that represent qualifiers and adjectives Opinions Library: Contains seven built-in types that group terms for qualifiers and adjectives New! Additional libraries are available for banking, emoticons, finance and slang
System requirements
For Microsoft Windows 7 Enterprise and Professional (32- and 64-bit); Windows Vista Ultimate, Enterprise, Business, Home Premium, and Home Basic (32- and 64-bit) SP1; or Windows XP Professional (32and 64-bit) SP3
Japanese language version supports Microsoft Windows 7 Enterprise and Professional (32-bit); Windows Vista Ultimate, Enterprise, Business, Home Premium, and Home Basic (32-bit) SP1; or Windows XP Professional and Home Edition (32-bit) SP3 Hardware: Processor: Intel Pentium-class; 3 GHz recommended Monitor: 1024 x 768 (SVGA) resolution Memory: 512 MB RAM minimum (1 GB for Japanese language version); 1 GB or more for large
datasets (2 GB for Japanese language version) Minimum free space: 800 MB minimum (1 GB for Japanese language version); more recommended for larger datasets * Indicates an exclusive feature of the English-language version
Available on the following platforms: Windows
43
Easily find the optimal sample size Quickly perform What if analyses to find the sample size under various sets of assumptions
Whether youre an advanced or beginning statistician or researcher, youll easily identify the appropriate sample size every time for any research criteria. Strike the right balance among confidence level, statistical power, effect size and sample size by using IBM SPSS SamplePower.
I cannot believe the time IBM SPSS SamplePower saves and how easy it is to use.
Mandel Bellmore, PhD President, Block, McGibony, Bellmore & Assoc. Health and Hospital Consultants
as a table showing power and precision at select sample sizes. Easily view how sample size affects power or precision. Plus, you can embed this table in a document for informative presentations. The graph tool generates graphs relating power to sample size. The resulting graphs make it easier to see how sample size affects power. You can also include charts in documents.
specifications
Key features
Find power as a function of effect size and sample size Find sample size needed to yield a given level of power Find precision as a function of sample size (for select procedures) Display Cohens effect-size conventions (for select procedures)
Two interfaces
Classic interface for experienced users allows you to quickly enter data, modify parameters and find the appropriate sample size Easy interface that prompts you with a series of simple questions providing an executive summary of the rationale for the sample size and a detailed report*
Statistical tests
Means One-sample and paired t tests when mean equals zero or equals a specified value Precision
t test for two independent groups with variance known and unknown Proportions One-sample test that proportion = 0.50 or specific value 2x2 for independent samples 2x2 for paired samples (McNemar) ANOVA One-way Analysis of Variance and Analysis of Covariance Regression Templates for study design Error model
Continued on PG. 45
44
IBM SPSS SamplePowers Find N tool finds the sample size for the default power setting in one click. You also have the flexibility to choose different power-size settings to compare results.
The Stored Scenarios tool gives you optimum control over the flow of your research. You can vary Alpha level, power, effect size or sample size in the main screen and store your results as you continue. This illustration shows how the sample size varies as other settings, such as Alpha, are changed.
Logistic regression One continuous predictor or two continuous predictors One categorical predictor with two or more levels Survival analysis Accrual options: Subjects entered prior to first study interval, subjects entered during study at constant rate, and accrual varies Hazard rate options: Constant and varies Attrition rate options: No attrition, constant rate and rate varies
Cluster-randomised trials* One level of clustering (for example, patients within hospitals) and one level of treatment (for example, Treated vs Control) Find the optimal (most cost-effective) allocation ratio Equivalence tests Equivalence tests for means and for proportions *Indicates a new feature in IBM SPSS SamplePower 3
System requirements
Microsoft Windows 2000, XP or Vista XGA monitor Pentium class processor 105 MB drive space
Available on the following platform: Windows
45
Easily perform structural equation modeling (SEM) Quickly create models to test hypotheses and confirm relationships among observed and latent variables Move beyond regression to gain additional insight
When you conduct research, youre probably already using factor and regression analyses in your work. But whats the next step in creating more precise models and confidence in your research?
numeric variable. Its rich, visual framework lets you easily compare, confirm and refine models. Quickly build graphical models using IBM SPSS Amos simple drag-and-drop drawing tools. Models that used to take days to create are just minutes away from completion. And once the model is finished, simply click your mouse and assess your models fit. Then make any modifications and print a presentationquality graphic of your final model.
specifications
Key features
Modeling capabilities Create structural equation models with observed and latent variables Specify each individual candidate model as a set of equality constraints on model parameters Analyse data from several populations at once Save time by combining factor and regression models into a single model, and then fit them simultaneously Bayesian estimation Fit models with ordered-categorical and censored data Markov chain Monte Carlo (MCMC) simulation Computationally intensive modeling Evaluate parameter estimates with normal or non-normal data using powerful bootstrapping options Analytical capabilities and statistical functions Determine probable values for missing or partially missing data values in a latent variable model Use full information maximum likelihood estimation in missing data situations for more efficient and less biased estimates Use a variety of estimation methods, including maximum likelihood, unweighted least squares, generalised least squares, Brownes asymptotically distribution-free criterion and scale-free least squares Evaluate models using more than two dozen fit statistics, including Chi-square; AIC; Bayes and Bozdogan information criteria; Browne-Cudeck (BCC); ECVI, RMSEA and PCLOSE criteria; root mean square residual; Hoelters critical n; and Bentler-Bonett and Tucker-Lewis indices Bootstrapping of user-defined functions of the model parameters, using new User-Defined Estimands shortcut in Start menu [new feature]
Continued on PG. 47
46
IBM SPSS Amos makes structural equation modeling easy and accessible
Structural equation modelings approach to multivariate analysis encompasses and extends standard methods including regression, factor analysis, correlation and analysis of variance. IBM SPSS Amos makes SEM easy. Its the perfect modeling tool for a variety of purposes, including market research, business planning and academic program evaluation.
View output
Input data from a variety of file formats (IBM SPSS Statistics, Excel, text files, or many others). Select grouping variables and group values. Amos also accepts data in a matrix format if youve computed a correlation or covariance matrix.
Amos output provides standardised or un-standardised estimates of covariances and regression weights as well as a variety of model fit measures. Hotlinks in the help system link to explanations of the analysis in plain English. Use drag-and-drop drawing tools to quickly specify your path diagram model. Click on objects in the path diagram to edit values, such as variable names and parameter values. Or simply drag variable names from the variable list to the object in the path diagram to specify variables in your model.
Data imputation Impute numerical values for ordered-categorical and censored data Impute missing values and latent variable scores Choose from three different methods: Regression, stochastic regression and Bayesian
System requirements
Operating system: Microsoft Windows XP or Windows Vista Memory: 256 MB RAM minimum Minimum free drive space: 125 MB Web browser: Internet Explorer 6
Available on the following platform: Windows
47
Access, manipulate and model any type of data, from any source, from within a single intuitive interface Integrate predictive intelligence into your operational processes Proactively and repeatedly identify revenue opportunities and reduce costs Boost productivity through collaboration and improve scalability and performance with server-based options
users in mind, so you dont need to be an expert in data mining to enjoy its benefits. IBM SPSS Modelers intuitive graphical interface makes it easy to visualise the data mining process, and how it all fits together. From this interface, you can easily access your operational data, regardless of its format, along with text, Web 2.0 and survey data. In version 14.1, you can read from and write to a Cognos Business Intelligence environment, as well as integrate with IBM Infosphere technology, so you can gain even more predictive intelligence and integrate it into your business processes.
IBM SPSS Modeler is a powerful, versatile data and text mining workbench that helps you gain unprecedented insight from your data. Its breadth and depth of techniques allows you to build predictive models easily, efficiently and rapidly. Organisations use IBM SPSS Modeler to improve business outcomes in critical areas such as customer relationship management, marketing, fraud and risk mitigation, inventory management and resource planning.
IBM SPSS Modelers powerful automated data preparation helps you get your data ready for analysis in a single step, while automated modeling takes the guess-work out of choosing an analysis method by applying all relevant techniques with a single click. Data preparation and modeling are presented in an interactive, highly visual viewer that makes models easy to understand and communicate. Once models are created, you can apply them to new data to identify cross-sell/up-sell opportunities proactively, manage customer retention strategically and identify fraud before it happens.
48
You can install the software on your desktop and begin mining your data right away. Full integration with IBM SPSS Statistics allows you to not only use all of Modelers predictive capabilities, but to use IBM SPSS Statistics data transformation, hypothesis testing and reporting capabilities to complement and enhance your modeling efforts. Choose the IBM SPSS Modeler software that best meets your analysis needs: IBM SPSS Modeler Professional includes all of the tools needed to leverage your structured data such as behaviours and interactions tracked in your CRM systems, demographics, purchasing behaviour and sales data
IBM SPSS Modeler Premium extends the functionality of IBM SPSS Modeler Professional by including a powerful text mining workbench for extracting key concepts, sentiments and relationships from textual or unstructured data and converting them to a structured format that can be used to make predictive models more accurate
Premium, provides client/server architecture, in-database mining, SQL pushback and batch processing. In-database mining exposes modeling algorithms in operational databases such as SQL, DB2 and Oracle and allows you to leverage database resources, vastly improving performance while enabling you to use the additional algorithms IBM SPSS Modeler provides. SQL pushback allows you to push data transformations and some key models directly into your operational database, improving performance significantly by leveraging the power of the database for high-volume, mission-critical tasks.
IBM SPSS Modeler offers leading ease-of-use features such as automatic data preparation and automatic modeling, making it easy to build models that leverage single or multiple (ensemble) techniques.
IBM SPSS Modeler includes advanced, interactive visualisation for models that use a single technique, or ensemble models that combine techniques, making modeling results easy to understand and communicate.
IBM SPSS Modeler introduces the ability to read from and write to a Cognos Business Intelligence environment directly, ensuring business analysts can leverage the same view of the data across their analytical work and can easily integrate predictive intelligence into their Business Intelligence dashboards directly from the IBM SPSS Modeler interface.
IBM SPSS Modeler Premium adds the ability to integrate unstructured data (text) into modeling efforts; analysts can extract concepts and associated sentiment, visualise relationships between concepts and sentiment and structure unstructured data for use in predictive models from a single interface.
49
specifications
Key features
IBM SPSS Modeler Premium includes all of the data mining features available in IBM SPSS Modeler Professional (see p 54), as well as the following text analytics capabilities Spanish or Japanese or translate virtually any language using Language Weaver Extract domain-specific concepts such as uniterms, expressions, abbreviations, acronyms and more Calculate synonyms using sophisticated linguistic algorithms and embedded or user-specified linguistic resources Name concepts by person, organisation, term, product, location and other user-defined types Extract non-linguistic entities such as address, currency, time, phone number and Social Security number Use and customise pre-built templates and libraries for sentiment analysis, CRM, security and intelligence, market intelligence, life sciences and IT Leverage pre-packaged Text Analytics Packages (TAPs) for the most common business applications, or create your own Create clusters based on term co-occurrence using concept clustering algorithms, which provide an at-a-glance view of main topics and the way in which they are related Intelligently group text documents and records based on content, using text classification algorithms Enable advanced concept selection and deselection for use in predictive modeling Use text-based and visual reports to interrogate concept relationship, occurrence, frequency and type
50
A major healthcare survey in just six months, thanks to SPSS, an IBM Company Building society explores customer data with predictive analytics
Sharpening product development and marketing efforts
Newcastle Building Society (NBS) is the eighth largest mutual building society in the UK and the biggest building society in the North East of England with a network of 34 branches. NBS experienced dramatic changes following the economic crisis of 2007, which intensified issues surrounding customer retention, profitability and customer acquisition. Banks and building societies across the country were forced to seek more cost-effective methods of attracting new customers and retaining current ones. First, NBS had to gain a deeper understanding of its customers. NBS had previously used a blanket marketing method, but this could not distinguish between customers. So NBS implemented a marketing database (MDB) that included geo-demographic data and research from a Customer Panel of 3,000 NBS customers. NBS used IBM SPSS Statistics to understand correlations within various customer segments, to link responses back to the overall customer base, and to build propensity models. It also used the software to analyse the customer panel results. The ability to question specific demographics meant that NBS could identify smaller, more targeted groups. Mike Moses, Customer Insight Analyst at NBS, said: Our advice-led strategy is all about understanding the needs and motivations of our customers, so we are able to offer products and services that more accurately meet their requirements. Predictive analytics allows us to drill down to the core issues. Having identified the needs, motivations and preferences of our customers, we have improved our communications so they are more tailored to the individual. We have seen significant improvements in overall response rates, he concluded.
Identify links and associations between, for example, people and events or diseases and genes Identify and extract content from URLs within blogs Include opinions, semantic relationships and linked events in deployable predictive models Reveal complex relationships through interactive graphs that show multiple semantic links between two concepts xtract domain-specific concepts E such as uniterms, expressions, abbreviations, acronyms and more
Calculate synonyms using sophisticated linguistic algorithms and embedded or user-specified linguistic resources Name concepts by person, organisation, term, product, location and other user-defined types Extract non-linguistic entities such as address, currency, time, phone number and Social Security number Use and customise pre-built templates and libraries for sentiment analysis, CRM, security and intelligence, market intelligence, life sciences and IT
Leverage pre-packaged Text Analytics Packages (TAPs) for the most common business applications, or create your own Create clusters based on term co-occurrence using concept clustering algorithms, which provide an at-a-glance view of main topics and the way in which they are related Intelligently group text documents and records based on content, using text classification algorithms Enable advanced concept selection and deselection for use in predictive modeling
Use text-based and visual reports to interrogate concept relationship, occurrence, frequency and type
System requirements
Requirements vary according to platform
Available on the following platform: Windows
51
specifications
Key features Data understanding
Create a wide range of interactive graphs with automatic assistance Use visual link analysis to see the associations in your data Select regions or items on a graph and view the selected information; or select key data for use in analysis Access IBM SPSS Statistics graphs and reporting tools Use predictive modeling to automatically impute missing values Automatically generate operations for the detection and treatment of outliers and extreme values
Data transformation
Extensive data transformation, selection and manipulation techniques Partition data automatically into training, test, and validation sub-sets Access data management and transformations performed in IBM SPSS Statistics directly from IBM SPSS Modeler
Data preparation
Access structured (numeric) data from databases, delimited and fixed-width text files, IBM SPSS Statistics files, SAS files, Excel spreadsheets and IBM SPSS Data Collection survey data Apply automatic data preparation to interrogate and condition data for analysis in a single step Access unstructured (free-text) data, automatically and extract concepts from text Use RFM scoring to analyse customer value and behavior Export data to operational databases, delimited text, Excel spreadsheets, Statistics files, and SAS files
Data cleaning
Remove or replace invalid data
Decision List Interactive rulebuilding algorithm K-Means, Kohonen, Two Step, Discriminant, Support Vector Machine (SVM) Clustering and segmentation algorithms Factor/PCA, Feature Selection Data reduction algorithms Regression, Linear, GenLin (GLM) Linear equation modeling Self-learning response model (SLRM) Bayesian model with incremental learning Time-series Generate and automatically select time-series forecasting models Neural Networks Multi-layer perceptrons with back-propagation learning, and radial basis function networks Support Vector Machines Accurate performance for wide datasets Bayesian Networks Graphical probabilistic models Cox regression Calculate likely time to an event Anomaly Detection Detect unusual records KNN Nearest neighbor modeling and scoring algorithm Apriori Popular association discovery algorithm with advanced evaluation functions
CARMA Association algorithm which supports multiple consequents Sequence Sequential association algorithm for order-sensitive analyses
Evaluation
Customisation and Extensibility Through the integration of Statistics, use R to extend analysis options Use scripts to automate complex or repetitive tasks Customise the palettes to suit your style of working Use the Modeler Extension Framework to add new operations
Deployment
Export models using SQL or PMML (the XML-based standard format for predictive models) Leverage IBM SPSS Collaboration and Deployment Services for analytics management, process automation and deployment
System requirements
Requirements vary according to platform
Available on the following platform: Windows
52
Index
Lnsfrskringar ............................................23 Newcastle Building Society ...........................51 Norg en Zekerheid .........................................38 Follow the path to improved outcomes............4 Government scenario .....................................28 IBM SPSS Training ...........................................9 IBM SPSS Advanced Statistics 19 .................18 IBM SPSS Amos 19 .......................................46 IBM SPSS Bootstrapping 19 ..........................26 IBM SPSS Categories 19 ...............................36 IBM SPSS Complex Samples 19 ...................28 IBM SPSS Conjoint 19 ...................................38 IBM SPSS Custom Tables 19 .........................16 IBM SPSS Data Collection Data Entry 6 ........40 IBM SPSS Data Preparation 19......................12 IBM SPSS Decision Trees 19 .........................14 IBM SPSS Developer 19 ................................32 IBM SPSS Direct Marketing 19 ......................31 IBM SPSS Exact Tests 19 ..............................35 IBM SPSS Forecasting 19 ..............................24 IBM SPSS Missing Values 19 .........................22 IBM SPSS Modeler 14.1 ................................48 IBM SPSS Modeler Premium 14.1 .................50 IBM SPSS Modeler Professional 14.1 ............52 IBM SPSS Neural Networks 19 ......................33 IBM SPSS Regression 19 ...............................20 IBM SPSS SamplePower 3 ............................44 IBM SPSS Statistics 19 ....................................8 IBM SPSS Statistics Base 19 .........................10 IBM SPSS Statistics Server 19.......................30 IBM SPSS Text Analytics for Surveys 4 .........42 IBM SPSS Visualization Designer 1................34 Greater Manchester Police .............................14 Fiat .................................................................48 Whats new in IBM SPSS Statistics 19.............7 Who uses IBM SPSS analytics?.......................3 University of Ulster .........................................21
53