
Penalized likelihoods

Scaling and invariance

Patrick Breheny

BST 764: Applied Statistical Modeling

August 30

Outline:

- Problems with maximum likelihood and subset selection
- Penalized likelihoods
- Scaling and invariance

Maximum likelihood estimation

One begins by specifying a distribution f(x|θ) for the data.
This distribution in turn induces a likelihood function L(θ|x), which specifies how likely the observed data are for various values of the unknown parameters.
Estimates can be found by finding the most likely values of the unknown parameters.
Hypothesis tests and confidence intervals can be carried out by evaluating how the likelihood changes in response to moving away from these most likely values.
(A small numerical sketch follows.)
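The sketch below is a minimal illustration of this workflow; the simulated data, the normal model, and the use of scipy's optimizer are assumptions added for illustration and are not part of the slides.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=200)   # simulated data; true theta = (3, 2)

def neg_loglik(theta):
    # theta = (mu, log_sigma); the log parametrization keeps sigma positive
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_loglik, x0=np.array([0.0, 0.0]))   # maximize ell by minimizing -ell
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)   # close to the sample mean and SD of x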


Maximum likelihood estimation, however, is often unsatisfactory in regression problems for two reasons:

Large variability: when p is large with respect to n, or when columns of X are highly correlated, the variance of β̂ is large.
Lack of interpretability: if p is large, we often desire a smaller set of predictors in order to gain insight into the most relevant relationships between y and X.


Subset selection

One common remedy is to select a subset of important explanatory variables based on ad hoc trial and error, using p-values to guide whether a variable should be in the model or not.
However, this approach is impossible to replicate, and therefore impossible to study the statistical properties of.
A more systematic approach is to exhaustively search all possible subsets and then use some criterion, such as AIC or BIC, to choose an optimal subset (a sketch of such a search appears below).
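The following is a hedged sketch of that exhaustive search; the simulated data and the use of statsmodels' OLS and its AIC attribute are illustrative assumptions, not part of the course notes.

from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 80, 6
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)   # only two variables matter

best_aic, best_subset = np.inf, ()
for k in range(p + 1):                      # 2^p candidate models in total
    for subset in combinations(range(p), k):
        cols = list(subset)
        design = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
        aic = sm.OLS(y, design).fit().aic   # criterion used to rank the subsets
        if aic < best_aic:
            best_aic, best_subset = aic, subset

print(best_subset, best_aic)   # typically recovers variables 0 and 1 here

Even for p = 6 this loop already visits 64 models; the exponential growth in p discussed next is what makes the exhaustive version impractical in larger problems.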


Unfortunately, the number of possible subsets grows exponentially with p.
Even with today's computers, searching all possible subsets is computationally infeasible for p much larger than 40 or 50.
For this reason, modifications such as forward selection, backward selection, and forward/backward hybrids have been proposed.


Furthermore, subset selection is discontinuous, in the sense that an infinitesimally small change in the data can result in completely different estimates.
As a result, subset selection is often unstable and highly variable, especially in high dimensions.
As we will see, penalized regression allows us to accomplish the same goals as subset selection, but in a more stable, continuous, and computationally efficient fashion.


It is also common practice to report inferential results from ordinary least squares models following subset selection as if the model had been prespecified from the beginning.
This is unfortunate, as the resulting inferential procedures violate every principle of statistical estimation and hypothesis testing:

Test statistics no longer follow t/F distributions.
Standard errors are biased low, and confidence intervals are falsely narrow.
p-values are falsely small.
Regression coefficients are biased away from zero.


Penalized likelihoods

The idea behind penalized likelihood is to attach a penalty to the likelihood: instead of maximizing ℓ(θ|x) = log{L(θ|x)}, we maximize the function

M(θ) = ℓ(θ|x) − λ·P(θ),

where P is a function that penalizes what one would consider less realistic values of the unknown parameters, and λ controls the tradeoff between the two parts.
The function M is called the objective function.
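A minimal sketch of maximizing such an objective numerically follows; the normal-mean setup and the ridge-type penalty P(μ) = μ² are illustrative assumptions chosen only to show the mechanics of M(θ) = ℓ(θ|x) − λ·P(θ).

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=20)     # small sample, so the penalty matters

def neg_objective(mu, lam):
    loglik = np.sum(norm.logpdf(x, loc=mu, scale=2.0))   # sigma treated as known here
    return -(loglik - lam * mu**2)                        # minimize -M(mu) = -(ell - lam*P)

for lam in (0.0, 1.0, 10.0):
    fit = minimize_scalar(neg_objective, args=(lam,))
    print(lam, fit.x)   # the estimate shrinks from the MLE toward zero as lam grows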


Penalized regression is often described in terms of minimizing a loss function (usually squared error loss) rather than maximizing a likelihood; an equivalent formulation is that we estimate θ by minimizing

M(θ) = L(θ|x) + λ·P(θ),

where L is now a loss function (proportional to a negative log-likelihood, such as the residual sum of squares).
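To make the connection concrete (a standard argument under the added assumption of normal errors, not something stated on the slides): if y_i ~ N(x_iᵀβ, σ²), then ℓ(β|y) = −RSS(β)/(2σ²) + constant, so maximizing ℓ(β|y) − λ·P(β) is equivalent to minimizing RSS(β)/(2σ²) + λ·P(β), or, after rescaling, RSS(β) + λ*·P(β) with λ* = 2σ²λ. The two formulations therefore agree up to how λ is scaled.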


Penalty functions

By "less realistic values of the unknown parameters", in the first section of this course we mean that coefficient values around zero are more believable than those far away from zero, and P is therefore a function which penalizes coefficients as they get further away from zero.
The two penalties that we will cover in depth in this section of the course are ridge regression and the lasso:

Ridge: P(β) = Σ_{j=1}^{p} β_j²
Lasso: P(β) = Σ_{j=1}^{p} |β_j|

(A short fitting sketch follows.)
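As a hedged sketch (the simulated data and the use of scikit-learn are assumptions added here, not part of the slides), the two penalties behave quite differently in practice: ridge shrinks every coefficient toward zero, while the lasso sets many of them exactly to zero. In scikit-learn, alpha plays the role of λ, though its internal scaling conventions differ slightly between the two estimators.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]          # only three truly nonzero coefficients
y = X @ beta + rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # penalty: sum of squared coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # penalty: sum of absolute coefficients

print(np.round(ridge.coef_, 2))      # all shrunk toward zero, none exactly zero
print(np.round(lasso.coef_, 2))      # typically several coefficients exactly zero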


Penalization is also used, for example, when fitting curves to data, to favor simple, smooth curves over wiggly, erratic ones; here, P is a measure of the "wiggliness" of the curve.
There are plenty of other varieties of penalization in existence that we will not address in this course.
They all operate on the same basic principle, however: all other things being equal, we would tend to favor the simpler explanation over the more complicated one.


The parameter λ controls the tradeoff between the penalty and the fit (loss/likelihood):

When λ is too small, we tend to overfit the data and end up with models that have high variance.
When λ is too large, we tend to underfit the data and end up with models that are too simplistic and thus potentially biased.

The parameter λ is called the regularization parameter; changing it allows us to directly balance the bias-variance tradeoff.
Obviously, selection of λ is a very important practical aspect of fitting these models (one common approach, cross-validation, is sketched below).
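One common practical choice, sketched here under the same illustrative assumptions as the previous example, is to pick λ by cross-validation; scikit-learn's LassoCV does this over a grid of candidate values.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(size=n)

cv_fit = LassoCV(cv=5).fit(X, y)     # 5-fold CV over an automatic grid of alphas
print(cv_fit.alpha_)                 # regularization parameter chosen by CV
print(np.round(cv_fit.coef_, 2))     # coefficients at that value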


Bayesian interpretation

The penalty can be thought of as arising from a prior distribution on the parameters.
The objective function is then (proportional to) the log of the posterior distribution of θ given the data.
By optimizing this objective function, we are finding the mode of the posterior distribution of θ.
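As a concrete example (a standard result, added here as an aside rather than taken from the slides): if y ~ N(Xβ, σ²I) and the coefficients have independent priors β_j ~ N(0, τ²), the log posterior is, up to constants, −RSS(β)/(2σ²) − Σ_j β_j²/(2τ²). Maximizing it is equivalent to minimizing RSS(β) + λ·Σ_j β_j² with λ = σ²/τ², so the ridge estimate is the posterior mode under a normal prior; the lasso corresponds in the same way to independent double-exponential (Laplace) priors.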


Constrained regression

Penalties can also be viewed as implying a constraint on the values of θ.
Suppose we were trying to maximize the likelihood subject to the constraint that P(θ) ≤ t.
A standard approach to solving such a problem is to introduce a Lagrange multiplier; when we do so, we arrive at the same objective function as earlier.
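Concretely (a sketch of the standard argument, not additional slide content): to maximize ℓ(θ|x) subject to P(θ) ≤ t, form the Lagrangian ℓ(θ|x) − λ{P(θ) − t} with multiplier λ ≥ 0. For a fixed λ, the term λt does not involve θ, so the maximizer is the same as that of ℓ(θ|x) − λ·P(θ); each constraint level t thus corresponds to some value of λ, and vice versa.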


As noted above, penalized regression is built around an assumption that coefficient values around zero are more believable than those far away from zero.
Some care is needed, however, in the application of this idea.
First of all, it does not make sense to apply this idea to the intercept, unless you happened to have some reason to think that the mean of y should be zero.
Hence, the intercept is not included in the penalty; if it were, the estimates would not be invariant to changes of location.


Standardization

Penalizing all the coefficients equally only makes sense if a one-unit change in x means the same thing for all the variables.
For example, suppose x1 varied from 0 to 1, while x2 varied from 0 to 1 million; clearly, a one-unit change in x does not mean the same thing for both of these variables.
Thus, the explanatory variables are usually standardized prior to model fitting to have mean zero and standard deviation 1; i.e.,

x̄_j = 0 and x_jᵀx_j = n for all j

(a numpy sketch of this convention follows).
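The sketch below illustrates this convention on simulated data (the data themselves are an assumption): dividing each centered column by its standard deviation computed with denominator n gives columns with mean 0 and x_jᵀx_j = n.

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(loc=10.0, scale=(1.0, 100.0, 0.01), size=(n, p))   # wildly different scales

center = X.mean(axis=0)
scale = X.std(axis=0)                 # denominator n (ddof=0), matching x_j^T x_j = n
X_std = (X - center) / scale

print(X_std.mean(axis=0))             # ~ 0 for every column
print((X_std ** 2).sum(axis=0))       # ~ n for every column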


Standardization (cont'd)

Any location shifts for X are absorbed into the intercept.
Scale changes can be reversed after the model has been fit: if x̃_ij = x_ij / a, then x̃_ij · (a·β_j) = x_ij · β_j, so the fit is unchanged; divide the transformed solution β̃_j by a to obtain β̂_j on the original scale.
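Continuing the earlier sketch (same illustrative data; fitting with scikit-learn's Ridge is likewise an assumption made only for illustration): coefficients fit on the standardized scale are mapped back to the original scale by dividing by each column's scale, with the location shifts folded into the intercept.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(loc=10.0, scale=(1.0, 100.0, 0.01), size=(n, p))
y = X @ np.array([1.0, 0.02, 50.0]) + rng.normal(size=n)

center, scale = X.mean(axis=0), X.std(axis=0)
X_std = (X - center) / scale

fit = Ridge(alpha=1.0).fit(X_std, y)                    # fit on the standardized scale
beta_orig = fit.coef_ / scale                           # undo the scaling (divide by s_j)
intercept_orig = fit.intercept_ - center @ beta_orig    # absorb the location shift

# Predictions agree whichever scale the coefficients are expressed on:
print(np.allclose(X_std @ fit.coef_ + fit.intercept_,
                  X @ beta_orig + intercept_orig))      # True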


Further benefits

Centering and scaling the explanatory variables has added benefits:

The intercept no longer depends on the other terms in the model, meaning that in the standardized covariate space, β̂_0 = ȳ regardless of what goes on in the rest of the model.
In other words, if we center y by subtracting off its mean, we don't even need to estimate β_0.
Also, standardization simplifies the solutions; recall from BST 760 that for simple linear regression

β̂_0 = ȳ − β̂·x̄,   β̂ = Σ_i (y_i − ȳ)(x_i − x̄) / Σ_i (x_i − x̄)²;

with a standardized x and centered y, the slope reduces to the much simpler expression β̂ = xᵀy / n.

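A quick numerical check of this simplification (purely illustrative data, not from the slides): with x standardized as above and y centered, the least-squares slope equals xᵀy/n.

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()          # standardized: mean 0, x^T x = n
y = 2.0 + 1.5 * x + rng.normal(size=n)
y_c = y - y.mean()                    # centered response

slope_ls = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
slope_simple = x @ y_c / n            # the simplified expression x^T y / n

print(np.allclose(slope_ls, slope_simple))   # True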


Standardization summary

Standardizing each explanatory variable to have mean 0 and standard deviation 1 has the following benefits:

Estimates of β are location-scale invariant.
Computational savings (we only need to estimate p parameters, not p + 1).
Simplicity.
No loss of generality, as we can transform back to the original scale once we finish fitting the model.
