
Estimating Parameters

• Model parameters are unknown

• Estimate of expected Y

• Parameters

• Fitting a model to the data


Ordinary Least Squares (OLS)

• Purpose:
• One way to fit a model to data
• Measures the variability in the data that is not explained by the model
• Also known as:
• Error or residual sum of squares
• SSE, SSR, RSS
• Method:
• Fit many lines through the data
• Find the vertical distance (the error) from each observed point to the fitted line at the same x-value
• Square each difference and sum them to calculate the SSE
• The line with the smallest SSE is the Ordinary Least Squares line and is used for the parameter estimates
• Interpretation:
• A perfect fit of the data points to the line gives SSE = 0
• A poor fit of the data gives a high SSE value

[Figure: observed points with vertical error distances to a fitted line; predicted points fall on the line]
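The steps above can be sketched numerically. This is a minimal Python illustration (the slides show SAS/R/Minitab output; the dataset here is invented for illustration): any candidate line has an SSE, and the closed-form least-squares estimates minimize it.

```python
# Minimal sketch of the OLS idea: among candidate lines, the one with the
# smallest SSE is the least-squares line. Data are made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sse(b0, b1):
    """Sum of squared vertical distances from each point to y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least-squares estimates (these minimize SSE):
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

print(b0, b1)         # intercept and slope of the OLS line
print(sse(b0, b1))    # minimal SSE
print(sse(2.0, 0.7))  # any other line has a larger SSE
```

For this data the OLS line is y = 2.2 + 0.6x with SSE = 2.4; any other slope/intercept pair gives a larger SSE.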
Ordinary Least Squares equations

Least squares regression analysis

Total Sum of Squares (SST)
• Compare points to the average y
• This is the total variation in Y due to the actual data points
• SST = Σ(yᵢ − ȳ)²

Residual Sum of Squares (SSE)
• Compare points to the expected y
• This is the comparison of actual Y values to expected Y values
• SSE = Σ(yᵢ − ŷᵢ)²

Model Sum of Squares (SSM)
• Compare the expected y to the average y
• This is the variation in Y due to expected Y values from the model
• SSM = Σ(ŷᵢ − ȳ)²
Least squares equations

SST = SSE + SSM

SST = Σ(yᵢ − ȳ)²

SSE = Σ(yᵢ − ŷᵢ)²

SSM = Σ(ŷᵢ − ȳ)²
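The decomposition SST = SSE + SSM can be verified numerically. A minimal Python sketch (dataset invented for illustration):

```python
# Numeric check of the decomposition SST = SSE + SSM, using a tiny
# made-up dataset and the OLS line fitted to it.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]  # expected (fitted) y values

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
print(sst, sse, ssm)  # sst equals sse + ssm (up to rounding)
```

Here SST = 6.0 splits into SSE = 2.4 (unexplained) plus SSM = 3.6 (explained by the line).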
Mean Square Error (MSE)

• Purpose:
• Measures how close the fitted line is to the data points, and thus evaluates the performance of the estimator
• This estimates the variance and bias of the error variable (ε)
• Also known as:
• Residual mean square
• Method:
• Define the SSE
• Identify the number of observations (N)
• Identify the number of parameters
• In Y = β0 + β1X there are 2 parameters (β0 and β1)
• Interpretation:
• The smaller the MSE, the closer the fit
• This value is used for the F-test

MSE = σ̂² of ε = SSE / (N − #parameters)

Root Mean Square Error (RMSE)

• Purpose:
• Measures the average distance from the data points to the fitted line
• Also known as:
• Residual mean square
• Method:
• Define the SSE
• Interpretation:
• The smaller the RMSE, the better the fit
• This is in the same units as the dependent variable, so a small RMSE is contextual to the range of values

RMSE = √MSE
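The MSE and RMSE formulas above amount to a few lines of arithmetic. A minimal Python sketch (dataset invented for illustration):

```python
import math

# MSE = SSE / (N - #parameters); RMSE = sqrt(MSE).
# Dataset and OLS fit are made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

n_params = 2            # beta0 and beta1 in Y = beta0 + beta1*X
mse = sse / (n - n_params)
rmse = math.sqrt(mse)   # same units as the dependent variable
print(mse, rmse)
```

Here SSE = 2.4 over N − 2 = 3 degrees of freedom gives MSE = 0.8 and RMSE ≈ 0.894, in the units of y.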
Coefficient of Determination (𝑅2 )

• Purpose:
• The proportion of variance of Y that is
explained by the model
• Method:
R² = 1 − SSE / SST

• Interpretation:
• Ranges from 0 to 100%
• In a perfectly fit model, SSE = 0 and R² = 1
• Limitations:
• Cannot identify bias

docs.statwing.com

http://work.thaslwanter.at/Stats/html/statsRelation.html
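The R² formula can be computed directly from the two sums of squares. A minimal Python sketch (dataset invented for illustration):

```python
# R^2 = 1 - SSE/SST: the proportion of variance in Y explained by the model.
# Dataset and OLS fit are made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - sse / sst   # equivalently SSM / SST
print(r2)
```

Here R² = 1 − 2.4/6.0 = 0.6: the line explains 60% of the variance in y.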
Pearson Correlation Coefficient (𝑟)

• Purpose:
• Measures the strength of the linear association between two variables
• Method:

r = √R² (taking the sign of the slope)

• Interpretation:
• Ranges from −1 to 1
• In a perfectly fit model, r = 1 or −1
• In a poorly fitted model, r = 0
• Limitations:
• There may be a distinct pattern, but this measurement may not capture it

http://work.thaslwanter.at/Stats/html/statsRelation.html
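In simple linear regression, r computed from the covariance formula agrees with √R² (up to the sign of the slope). A minimal Python check (dataset invented for illustration):

```python
import math

# Pearson r = Sxy / sqrt(Sxx * Syy); in simple linear regression
# |r| = sqrt(R^2). Dataset is made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
r2 = r ** 2                      # coefficient of determination
print(r, r2)
```

Here r ≈ 0.775 (positive, matching the upward slope) and r² = 0.6, matching the R² computed from the sums of squares.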
T-test

• Purpose:
• This test is used to determine if there is a
significant linear relationship between
the independent and dependent
variables
• Test if the intercept and the slope of the
regression line are non-zero

• Method:
• Choose a significance level (often 0.05)
• Identify the degrees of freedom (n − 2)

• Interpretation:
• Look at the associated p-value
• If the p-value is below your chosen level of significance, then the null hypothesis is rejected (i.e., the slope is not 0)
T-distribution

http://financialexamhelp123.com/

P-values

http://ichthyosapiens.com/School/Statistics/ttable.jpg
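The t statistic for the slope can be computed by hand: t = b1 / SE(b1) with SE(b1) = √(MSE/Sxx) and n − 2 degrees of freedom. A minimal Python sketch (the slides rely on software output and a printed t-table; the dataset is invented for illustration):

```python
import math

# t-test for H0: slope = 0 in simple linear regression.
# Dataset is made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)             # 2 estimated parameters
se_b1 = math.sqrt(mse / sxx)    # standard error of the slope
t = b1 / se_b1                  # test statistic for H0: slope = 0
df = n - 2                      # degrees of freedom
print(t, df)
```

Here t ≈ 2.12 on 3 degrees of freedom; compare |t| against the t-table critical value for the chosen significance level (3.182 for a two-sided 0.05 test at df = 3), so this tiny sample does not reject the null.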
F-test
• Purpose:
• This is another lack of fit test that
tests the null hypothesis: slope = 0
• This test is different from the t-test
since it uses the sums of squares
• Method:
• Review the output from the ANOVA
• Choose a significance level (often
0.05)
• Identify the degrees of freedom for
both MSM and MSE

• Interpretation:
• Look at the associated p-value
• If the p-value is below your chosen level of significance, then the null hypothesis is rejected (i.e., the slope is not 0)
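The F statistic comes straight from the sums of squares: F = MSM / MSE with (1, n − 2) degrees of freedom, and in simple linear regression it equals the square of the slope's t statistic. A minimal Python sketch (dataset invented for illustration):

```python
# F-test for H0: slope = 0, built from the ANOVA sums of squares.
# Dataset is made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
msm = ssm / 1          # model df = 1 (one predictor)
mse = sse / (n - 2)    # error df = n - 2
f_stat = msm / mse     # compare to an F-table with (1, n-2) df
print(f_stat)
```

Here F = 3.6 / 0.8 = 4.5, which is exactly t² ≈ 2.12² for the same data, illustrating the t-test/F-test equivalence in simple linear regression.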
Confidence Intervals
• Purpose:
• This is a range of parameter estimates for a given x-value rather than a single y-value estimation
• Method:
• Uses the:
• estimate of the slope
• the standard error of the slope
• a chosen level of significance
• the associated t-value and df

• Interpretation:
• Describes a range of Y-values expected to contain the true mean response at the chosen confidence level (1 − LOS)
• The CI is narrowest around the mean X
• If the value 0 is not contained in the interval for the slope, the null hypothesis (slope = 0) is rejected

http://work.thaslwanter.at/Stats/html/statsRelation.html
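The confidence interval for the slope combines exactly the ingredients listed above: slope estimate, standard error, and the t-table value for the chosen level and df. A minimal Python sketch (dataset invented for illustration; the critical value is the standard t-table entry for df = 3 at 95%):

```python
import math

# 95% confidence interval for the slope: b1 +/- t_crit * SE(b1).
# Dataset is made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_b1 = math.sqrt((sse / (n - 2)) / sxx)

t_crit = 3.182          # t-table value, two-sided 0.05, df = n - 2 = 3
half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width
print(lower, upper)
```

Here the interval is roughly (−0.30, 1.50); because it contains 0, the null hypothesis of zero slope cannot be rejected, consistent with the t-test result on the same data.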
SLR and ANOVA Output

www.southalabama.edu
Output from SAS, R, Minitab

SAS
R

Minitab
Residual Plots

• Purpose:
• Evaluate the fit of the model
• Also known as:
• Standardized residuals (same Y axis)
• Studentized residuals (divided by an
estimate of its standard deviation)
• Method:
• Plot predicted values against the
standardized residuals
• Look for patterns in the residual plots
• Interpretation:
• Should be symmetrically distributed around the 0 line
• Clustered around the lower digits of the y-axis
• No obvious patterns

docs.statwing.com
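Computing the residuals and standardizing them is straightforward; the plot itself is just predicted values against these standardized residuals. A minimal Python sketch (dataset invented for illustration; standardizing here divides by the RMSE, a common simple choice):

```python
import math

# Residuals and a simple standardization (residual / RMSE).
# Dataset is made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]            # predicted values (plot x-axis)

residuals = [y - yh for y, yh in zip(ys, y_hat)]
rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardized = [e / rmse for e in residuals]  # plot y-axis
print(residuals)      # OLS residuals always sum to zero
print(standardized)
```

Plotting `y_hat` against `standardized` and scanning for curvature, funnels, or outliers is the check the slide describes; studentized residuals refine this by using a per-point standard deviation estimate.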
Residual Plots that Meet Requirements

docs.statwing.com

Plots that Indicate Poor Model Fit

docs.statwing.com
Example: Loblolly data in R
Linear relationships of 3 measurements
Empirical Solutions for Models with Poor Residual Fit

Example: Revenue Data


• Data is not normally distributed
• Try transforming the y values
• Square root the data
• Log-transform the data

docs.statwing.com
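The two transformations in the bullets above can be sketched in a few lines. A minimal Python illustration (the revenue values are invented; real revenue data would come from the example dataset):

```python
import math

# Square-root and log transforms of a right-skewed response variable.
# Values are made up to mimic skewed revenue data.
ys = [1, 10, 100, 1000]

sqrt_y = [math.sqrt(y) for y in ys]  # compresses large values moderately
log_y = [math.log(y) for y in ys]    # compresses large values strongly
print(sqrt_y)
print(log_y)
```

On this multiplicative-looking data the log transform produces evenly spaced values (each step is ln 10), which is why it often tames right-skewed responses better than the square root; refit the model on the transformed y and re-check the residual plot.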
Empirical Solutions for Models with Poor Residual Fit

http://stattrek.com/
