
Nonlinear Regression: Modern Approaches and Applications
What is Nonlinear Regression?
Nonlinear regression is a form of regression analysis in which observational data are modeled by a function that is a nonlinear combination of the model parameters and depends on one or more independent variables. In the past, advanced modelers would work with nonlinear functions, including exponential functions, logarithmic functions, trigonometric functions, power functions, Gaussian functions, and Lorenz curves. Some of these functions, such as the exponential or logarithmic functions, would then be transformed so that they became linear. Once transformed, standard linear regression would be performed. This classical approach has significant problems, however, especially if the modeler is working with larger datasets and/or if the data include missing values, nonlinear relationships, local patterns, and interactions.
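As a rough illustration of the classical approach described above (not part of the software discussed in this whitepaper), the sketch below fits a hypothetical exponential trend two ways: by log-transforming the model so that ordinary linear regression applies, and by fitting the nonlinear form directly. All data and parameter values are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data following an exponential trend with multiplicative noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 50)
y = 2.0 * np.exp(0.7 * x) * rng.lognormal(sigma=0.1, size=x.size)

# Classical approach: transform y = a * exp(b * x) into log(y) = log(a) + b * x
# and fit the transformed model with ordinary least squares.
b_hat, log_a_hat = np.polyfit(x, np.log(y), deg=1)
print("transformed (linear) fit: a =", np.exp(log_a_hat), " b =", b_hat)

# Direct nonlinear least squares on the original scale, no transformation needed.
def expo(x, a, b):
    return a * np.exp(b * x)

(a_nl, b_nl), _ = curve_fit(expo, x, y, p0=(1.0, 0.5))
print("direct nonlinear fit:     a =", a_nl, " b =", b_nl)
```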
This paper is one of a series covering improvements to conventional and logistic regression, including a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. We begin with Multivariate Adaptive Regression Splines (MARS).
Nonlinear Regression Techniques:

Logistic Regression
Regularized Regression: GPS Generalized Path Seeker
Nonlinear Regression: MARS Regression Splines
Nonlinear Ensemble Approaches: TreeNet Gradient Boosting; Random Forests; Gradient
Boosting incorporating RF
Ensemble Post-Processing: ISLE; RuleLearner

This whitepaper will focus on MARS nonlinear regression and offer case study examples.

What is "MARS" Nonlinear Regression?


MARS, an acronym for Multivariate Adaptive Regression Splines, is a multivariate nonparametric regression procedure introduced in 1991 by world-renowned Stanford statistician and physicist Jerome Friedman (Friedman, 1991). Salford Systems' MARS nonlinear regression, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.

How does MARS nonlinear regression differ from linear regression?


Linear regression models typically fit straight lines to data. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight-line methods. MARS builds its model by piecing together a series of straight-line segments, each allowed its own slope. This permits MARS to trace out any pattern detected in the data.
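A minimal sketch of this piecewise-linear idea, using hand-picked knots and hinge basis functions of the form max(0, x - t) rather than MARS's automatic knot search; the data and knot locations are hypothetical.

```python
import numpy as np

def hinge(x, knot):
    """Right hinge basis function: max(0, x - knot)."""
    return np.maximum(0.0, x - knot)

# Hypothetical data with a bend in the trend around x = 3.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.where(x < 3.0, x, 3.0 + 2.5 * (x - 3.0)) + rng.normal(0.0, 0.3, x.size)

# Design matrix: intercept, a global linear term, and hinges at hand-picked knots.
knots = [3.0, 6.0]
X = np.column_stack([np.ones_like(x), x] + [hinge(x, t) for t in knots])

# Ordinary least squares on the hinge basis yields a piecewise-linear fit,
# with each segment free to take its own slope.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted coefficients:", coef)
```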

Overview of the MARS Nonlinear Regression Methodology


The MARS nonlinear regression procedure builds flexible nonlinear regression models by fitting separate splines (or basis functions) to distinct intervals of the predictor variables. Both the variables to use and the end points of the intervals for each variable, referred to as knots, are found via a brute-force, exhaustive search procedure that uses very fast update algorithms and efficient program coding. Variables, knots, and interactions are optimized simultaneously by evaluating a "loss of fit" (LOF) criterion; at each step MARS chooses the candidate that most improves the LOF. In addition to searching variables one by one, MARS also searches for interactions between variables, allowing any degree of interaction to be considered.
The "optimal" MARS nonlinear regression model is selected in a two-phase process. In the first phase, the model is grown by adding basis functions (new main effects, knots, or interactions) until an overly large model is reached. In the second phase, basis functions are deleted in order of least contribution to the model until an optimal balance of bias and variance is found. By allowing arbitrary shapes for the response function as well as for interactions, and by using this two-phase model selection method, MARS can reliably track the very complex data structures that often hide in high-dimensional data.
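One widely used loss-of-fit criterion in MARS-style modeling is generalized cross-validation (GCV). The sketch below shows a simplified backward pass that repeatedly drops the basis column whose removal yields the best GCV; the penalty convention and the assumption that column 0 is the intercept are illustrative choices, not Salford's exact implementation.

```python
import numpy as np

def gcv(X, y, penalty=3.0):
    """Generalized cross-validation score for an OLS fit on basis matrix X.
    Effective parameters are approximated as k + penalty * (k - 1), a common
    convention for MARS-style models (k = number of basis columns)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    c_m = k + penalty * (k - 1)
    return (rss / n) / (1.0 - c_m / n) ** 2

def backward_prune(X, y):
    """Simplified backward pass: repeatedly drop the basis column (never the
    intercept, assumed to be column 0) whose removal gives the lowest GCV,
    and keep the best model seen along the way."""
    cols = list(range(X.shape[1]))
    best_cols, best_score = cols[:], gcv(X, y)
    while len(cols) > 1:
        score, drop = min(
            (gcv(X[:, [c for c in cols if c != d]], y), d) for d in cols[1:]
        )
        cols.remove(drop)
        if score < best_score:
            best_cols, best_score = cols[:], score
    return best_cols, best_score
```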

Core Capabilities
MARS core capabilities include:

Automatic variable search: Large numbers of variables are examined using efficient algorithms, and all promising variables are identified.
Automatic variable transformation: Every variable selected for entry into the model is repeatedly checked for a nonlinear response. Highly nonlinear functions can be traced with precision via piecewise regression.
Automatic limited interaction searches: MARS repeatedly searches through the interactions allowed by the analyst. Unlike recursive partitioning schemes, MARS models may be constrained to forbid interactions of certain types, allowing some variables to enter only as main effects while other variables enter as interactions, but only with a specified subset of other variables.
Variable nesting: Certain variables are treated as meaningful (non-missing) in the model only if particular conditions are met (e.g., X has a meaningful non-missing value only if categorical variable Y has a value in some range).
Built-in testing regimens: The analyst can choose to reserve a random subset of data for testing, or use v-fold cross-validation to tune the final model selection parameters (see the sketch after this list).
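As a rough illustration of the v-fold idea mentioned in the last item (using scikit-learn's KFold purely for convenience, not as part of MARS), a candidate basis matrix such as the one from the earlier hinge sketch could be scored as follows; competing model sizes would then be compared on this score.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_mse(X, y, n_splits=10):
    """Mean squared prediction error of an ordinary least squares fit on the
    basis matrix X, estimated by v-fold (here, n_splits-fold) cross-validation."""
    fold_errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        fold_errors.append(np.mean((y[test_idx] - X[test_idx] @ coef) ** 2))
    return float(np.mean(fold_errors))

# Usage: compute cv_mse for each candidate basis matrix (e.g., different knot
# sets or model sizes) and keep the one with the lowest cross-validated error.
```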

Applications

This new, flexible regression modeling tool is applicable to a wide variety of data analyses, particularly those in which variables may need transformation and interaction effects are likely to be relevant. The software can help a data analyst rapidly search through many plausible models and quickly identify important interactions, insights that can lead to significant model improvements. Further, because the software can be run with intelligent default settings, analysts at all levels can, for the first time, easily access MARS innovations.

Graphical User Interface


Salford Systems' MARS has an easy-to-use, intuitive graphical user interface (GUI). As shown
below, the interface allows the user to control the variables and functional forms to be entered
into the model and the interactions to be considered or forbidden, while allowing the MARS
nonlinear regression algorithm to optimize those parts of the model the analyst chooses to leave
free.
Once the model is selected, the user can easily remove or add terms, instantly see the impact of
changes on model fit, review diagnostics that assist in model selection, save the model and apply
the model to new data for prediction. Other MARS GUI features include an optional
batch/command-line mode, spreadsheet-style browsing of the input data set, and summary text
reports. The enhanced MARS text report includes extensions to the "classic" output (e.g.,
addition of residual sums of squares, log-likelihood, and other useful diagnostics), making the
results easier to comprehend and assisting the analyst in refining the model in subsequent runs.
In addition, the MARS interface provides all essential data management facilities for:

New variable creation and deletion,


Sorting, merging, and concatenating of data sets,
Deletion of cases,
Random, stratified and exact count sampling, and
Filtering of cases into training and/or test and hold-out samples.

Visualization of Results
In addition to summary text reports, MARS results are also displayed in the Results dialog box.
The GUI output includes ANOVA decomposition, variable importance, and final model tables as
well as graphical plots. MARS automates both the selection of variables and the non-parametric
transformation of variables to achieve the best model fit. Variable transformation is
accomplished implicitly through the piecewise regression function used by MARS to trace
arbitrary non-linear functions. MARS communicates this non-parametric transformation
graphically, displaying the predicted response as a function of either one or two variables.
MARS automatically produces 2-D plots for main effects (response variable as a function of each predictor) and 3-D surface plots for interactions, with options to spin and rotate. For higher-order interactions, the user can choose slices of the function for display as 2-D and 3-D subspaces. Examples of main effects and interaction plots are shown below.
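As a rough sketch of the one-variable response curves described above, the helper below plots a fitted model's predictions as one predictor varies over its observed range while the remaining predictors are held at their means; the predict function and variable names are placeholders, not part of the MARS GUI.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_main_effect(predict, X, var_index, var_name, n_points=100):
    """Plot the predicted response as one predictor varies over its observed
    range, holding all other predictors at their column means."""
    grid = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
    base = np.tile(X.mean(axis=0), (n_points, 1))
    base[:, var_index] = grid
    plt.plot(grid, predict(base))
    plt.xlabel(var_name)
    plt.ylabel("predicted response")
    plt.title(f"Main effect of {var_name}")
    plt.show()
```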
References


Friedman, J. H. (1991), Multivariate Adaptive Regression Splines (with discussion), Annals of Statistics, 19(1), 1-141.
Steinberg, D. and Colla, P. L. (1995), CART: Tree-Structured Nonparametric Data Analysis, San Diego, CA: Salford Systems.
Steinberg, D., Colla, P. L., and Martin, K. (1999), MARS User Guide, San Diego, CA: Salford Systems.

Salford Systems 2013
