its models, and what a corporate credit risk modeling desk can do about it
The distinction between expected and unexpected loss is important when dealing with a diversified portfolio of exposures. The expected loss
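The expected loss of a portfolio is commonly computed as the sum of per-exposure expected losses, EL = PD x LGD x EAD. This formulation and the figures below are standard in credit risk but illustrative here, not taken from this text:

```python
# Expected loss of a portfolio as the sum of per-exposure expected losses:
# EL_i = PD_i * LGD_i * EAD_i (probability of default, loss given default,
# exposure at default). All numbers are illustrative.

portfolio = [
    # (PD, LGD, EAD in EUR)
    (0.02, 0.45, 1_000_000),
    (0.01, 0.60, 500_000),
    (0.05, 0.40, 250_000),
]

expected_loss = sum(pd * lgd * ead for pd, lgd, ead in portfolio)
print(f"Portfolio expected loss: {expected_loss:,.0f} EUR")
```

The unexpected loss, by contrast, concerns the dispersion of losses around this expectation and depends on default correlations across the portfolio.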
Appendix 3A
A1 Derivation of Least Squares Estimates
A2 Linearity and Unbiasedness Properties of Least-Squares Estimators
A3 Variances and Standard Errors of Least-Squares Estimators
A4 Covariance between Beta1 and Beta2
A5 The Least-Squares Estimators of Beta2
A6 Minimum-Variance Property of Least-Squares Estimators
A7 Consistency of Least-Squares Estimators
5.1 Statistical Prerequisites
probability
probability distributions
Type I and Type II errors
level of significance
power of a statistical test
confidence interval
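The listed prerequisites can be tied together in one small calculation. A minimal sketch for a one-sided z-test with known sigma (the hypothesized means, sigma, n, and sample mean below are illustrative assumptions):

```python
from statistics import NormalDist

# One-sided z-test of H0: mu = mu0 vs H1: mu = mu1 > mu0, known sigma.
mu0, mu1, sigma, n = 0.0, 0.5, 1.0, 25
alpha = 0.05                          # level of significance = P(Type I error)
z = NormalDist()

z_crit = z.inv_cdf(1 - alpha)         # reject H0 when the z statistic exceeds this
se = sigma / n ** 0.5                 # standard error of the sample mean

# Type II error (beta): failing to reject H0 although H1 is true.
beta = z.cdf(z_crit - (mu1 - mu0) / se)
power = 1 - beta                      # power of the test

# Two-sided 95% confidence interval around a hypothetical sample mean.
xbar = 0.4
half_width = z.inv_cdf(0.975) * se
ci = (xbar - half_width, xbar + half_width)

print(f"power = {power:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```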
5.2
5.3
5.4
5.5
Furthermore, the results of the ROC/AUC, power statistics, and KS statistic showed that there is little difference in the performance of the survival models and the logistic regression.
The model shows no improvement in performance but has certain advantages compared to the current model. It requires significantly less data cleaning because it estimates the survival probability over the entire data set, in contrast to logistic regression, which only estimates the survival probability for a fixed time interval.
Some remarks for further research are the incorporation of truncation into the survival functions. This is another type of missing data and is not developed here because it was beyond the scope of this thesis. Furthermore, the logrank transformation outperforms the logrank transformation and is recommended; this should be researched further.
censored data
truncated data
3.3 Types of survival models
3.3.1 Kaplan-Meier estimator
3.3.2 Parametric models
3.3.3 Accelerated failure time
3.3.4 Fully parametric proportional hazards model
3.3.5 Cox proportional hazards model
3.4 Comparison
Although there are similarities between the different models, the models are very different.
The advantages of the KM estimator are that it is easy to compute and to interpret. Furthermore, it doesn't require any assumptions about a baseline. One of the main drawbacks of this estimator is that it doesn't account for variables that are related to the survival time.
It is a descriptive estimator and only describes the estimation of the survival function of the population. Therefore it is only applicable to homogeneous samples. This model can be used to get a quick impression of the survival function of a population.
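The computational simplicity noted above is visible in the product-limit definition itself: S(t) is the product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. A minimal sketch on toy data (the durations and censoring flags are made up for illustration):

```python
# Kaplan-Meier estimator computed directly from the product-limit definition.

def kaplan_meier(times, events):
    """times: observed durations; events: 1 = event (default), 0 = censored."""
    surv, s = [], 1.0
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n = sum(1 for ti in times if ti >= t)      # number still at risk at t
        if d > 0:
            s *= 1 - d / n
            surv.append((t, s))
    return surv

times  = [3, 5, 5, 8, 10, 12, 12, 15]
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 0 marks right-censored observations

for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Note that censored observations still contribute to the risk sets n_i up to their censoring time, which is how the estimator uses censored data without treating it as an event.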
The AFT model assumes that a covariate is able to accelerate or decelerate the time to a certain event by some constant. These models have two main advantages: they are easily interpreted, and they are more robust to omitted covariates and less affected by the choice of probability distribution compared to proportional hazards models.
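The acceleration idea can be made concrete: under an AFT specification, S(t | x) = S0(t * exp(-beta * x)), so every survival quantile is stretched by the same factor exp(beta * x). The Weibull baseline and the value of beta below are illustrative assumptions, not taken from this text:

```python
import math

# AFT sketch: a covariate rescales event time by a constant factor,
# S(t | x) = S0(t * exp(-beta * x)). Parameters are illustrative.

def s0(t, shape=1.5, scale=10.0):
    """Baseline Weibull survival function."""
    return math.exp(-((t / scale) ** shape))

beta, x = 0.7, 1.0
accel = math.exp(-beta * x)            # time runs more slowly for this group

def s(t):
    return s0(t * accel)

# The median survival time is stretched by exactly exp(beta * x):
median0 = 10.0 * math.log(2) ** (1 / 1.5)   # baseline Weibull median
median_x = median0 * math.exp(beta * x)     # median for the x = 1 group
print(round(s(median_x), 6))                # 0.5 by construction
```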
The basic idea behind proportional hazards models is that the effect of the covariate is to multiply the baseline hazard by some constant. In order to use these models the proportional hazards assumption should hold. This assumption states that the ratio of the risks of default of different groups is constant over time. For example, if at the start Facility 1 has a risk of default twice as high as Facility 2, then the risk of default for Facility 1 should be twice as high everywhere in time. There are two types of PH models: parametric PH models and Cox PH models. The difference is that the parametric PH models assume the baseline hazard function follows a specific distribution, whereas the Cox model does not make assumptions about the baseline. The Cox model makes estimations on the basis of the ranks of the survival times.
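The two-facilities example above can be checked numerically: for Weibull hazards sharing a common shape parameter, the hazard ratio is the same constant at every point in time, which is exactly the PH assumption. The shapes and scales below are illustrative:

```python
# PH sketch: with h1(t) = c * h0(t), the hazard ratio is constant in t.
# For Weibull hazards with a COMMON shape k, h(t) = (k / s) * (t / s)**(k - 1),
# so the ratio depends only on the scale parameters. Values are illustrative.

def weibull_hazard(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1)

k = 1.4                                        # same shape for both facilities
h_f2 = lambda t: weibull_hazard(t, k, scale=10.0)
h_f1 = lambda t: weibull_hazard(t, k, scale=10.0 * 0.5 ** (1 / k))  # doubled hazard

for t in (0.5, 1, 5, 20):
    print(t, round(h_f1(t) / h_f2(t), 6))      # ratio of 2 at every t
```

If the two facilities had different shape parameters, the ratio would drift with t and the PH assumption would fail.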
There are several reasons for the popularity of the Cox PH model.
First, the model does not require any assumptions about the baseline, so the model is robust, flexible, and a safe choice in many cases. Furthermore, the model is capable of handling discrete and continuous measures of event times, and it is possible to incorporate time-dependent covariates in order to correct for changes in the value of covariates over the course of the observation periods.
The Cox PH model is chosen for the development of the model.
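The rank-based estimation mentioned above works through the partial likelihood: for each event time, the contribution is exp(b * x_i) divided by the sum of exp(b * x_j) over everyone still at risk, so only the order of the event times matters and the baseline hazard cancels out. A minimal sketch for one covariate without ties, on made-up data and with a crude grid search in place of a real Newton-Raphson fit:

```python
import math

# Cox partial likelihood for a single covariate x, no ties, no censoring.
# Data and the grid search are illustrative, not from the text.

times = [2, 4, 5, 7, 9]                # distinct event times
xs    = [1.0, 0.0, 1.0, 0.0, 0.0]      # covariate values per subject

def log_partial_likelihood(b):
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for pos, i in enumerate(order):
        risk_set = order[pos:]          # subjects still at risk at times[i]
        ll += b * xs[i] - math.log(sum(math.exp(b * xs[j]) for j in risk_set))
    return ll

# Crude maximization over a grid; a real fit would use Newton-Raphson.
grid = [i / 100 for i in range(-300, 301)]
b_hat = max(grid, key=log_partial_likelihood)
print("estimated beta:", b_hat)
```

Here the x = 1 subjects fail early, so the maximizer is a positive beta; note that no distributional assumption about the baseline hazard was needed anywhere.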
In traditional approaches the split of the factors was based upon the good-bad ratio (default rate) or similar measures.
Tong et al. stratified on the home
1. Distribution of American baseball players' salaries in 1994. The horizontal axis shows salaries in millions of dollars.
Chapter 8 Multiple Regression Analysis: The Problem of Inference
8.3 Hypothesis Testing about Individual Regression Coefficients
8.4 Testing the Overall Significance of the Sample Regression
The Analysis of Variance Approach to Testing the Overall Significance of an Observed Multiple Regression: The F Test
Testing the Overall Significance of a Multiple Regression: The F Test
An Important Relationship between R2 and the F Test
Testing the Overall Significance of a Multiple Regression in Terms of R2
Testing the Equality of Two Regression Coefficients
1. This chapter extended and refined the ideas of interval estimation and hypothesis testing first introduced in Chapter 5 in the context of the two-variable linear regression model.
2. In a multiple regression, testing the individual significance of a partial regression coefficient (using the t test) and testing the overall significance of the regression (all partial slope coefficients are zero, or R2 = 0) are not the same thing.
3. In particular, the finding that one or more partial regression coefficients are statistically insignificant on the basis of the individual t test does not mean that all partial regression coefficients are also (collectively) statistically insignificant. The latter hypothesis can be tested only by the F test.
4. The F test is versatile in that it can test a variety of hypotheses, such as whether:
(1) an individual regression coefficient is statistically significant
(2) all partial slope coefficients are zero
(3) two or more coefficients are statistically equal
(4) the coefficients satisfy some linear restrictions
(5) there is structural stability of the regression model
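The relationship between R2 and the overall F test can be verified numerically: F = (R2 / k) / ((1 - R2) / (n - k - 1)), with k slope coefficients, is algebraically identical to the ANOVA form (ESS / k) / (RSS / (n - k - 1)). The data below are synthetic, generated only to check the identity:

```python
import numpy as np

# Numerical check of the R2 / F relationship for the overall-significance test.

rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta

rss = resid @ resid                              # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()                # total sum of squares
ess = tss - rss                                  # explained sum of squares
r2 = 1 - rss / tss

f_anova   = (ess / k) / (rss / (n - k - 1))      # ANOVA form
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))  # R2 form

print(round(f_anova, 6), round(f_from_r2, 6))    # identical by algebra
```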
5. As in the two-variable case, the multiple regression model can be used for the purpose of mean and/or individual predictions.
Chapter 9 Dummy Variable Regression Models
In Chapter 1 we discussed
Several topics related to dummy variables are discussed in the literature that are rather advanced, including
(1) random, or varying, parameters models
(2) switching regression models
(3) disequilibrium models
In the regression models considered in this text it is assumed that the parameters, the betas, are unknown but fixed entities. The random coefficient models (and there are several versions of them) assume that the betas can be random too. A major reference work in this area is by Swamy.
10.1 The Nature of Multicollinearity
10.2 Estimation in the Presence of Perfect Multicollinearity
10.3 Estimation in the Presence of "High" but "Imperfect" Multicollinearity
10.4 Multicollinearity: Much Ado about Nothing? Theoretical Consequences of Multicollinearity
10.5 Practical Consequences of Multicollinearity
Large Variances and Covariances of OLS Estimators
Wider Confidence Intervals
"Insignificant" t Ratios
Variance-inflating Factor (VIF)
A High R2 but few significant t Ratios
Sensitivity of OLS Estimators and Their Standard Errors to Small Changes in Data
Consequences of Micronumerosity
1. One of the assumptions of the classical linear regression model is that there is no multicollinearity among the explanatory variables, the X's. Broadly interpreted, multicollinearity refers to the situation where there is either an exact or approximately exact linear relationship among the X variables.
2. The consequences of multicollinearity are as follows: if there is perfect collinearity among the X's, their regression coefficients are indeterminate and their standard errors are not defined. If collinearity is high but not perfect, estimation of regression coefficients is possible but their standard errors tend to be large. As a result, the population values of the coefficients cannot be estimated precisely. However, if the objective is to estimate linear combinations of these coefficients, the estimable functions, this can be done even in the presence of perfect multicollinearity.
3. Although there are no sure methods of detecting collinearity, there are several indicators of it, which are as follows:
(a) The clearest sign of multicollinearity is when R2 is very high but none of the regression coefficients is statistically significant on the basis of the conventional t test. This case is, of course, extreme.
(b) In models involving just two explanatory variables, a fairly good idea of collinearity can be obtained by examining the zero-order, or simple, correlation coefficient between the two variables. If this correlation is high, multicollinearity is generally the culprit.
(c) However, the zero-order correlation coefficients can be misleading in models involving more than two variables, since it is possible to have low zero-order correlations and yet find high multicollinearity. In situations like these, one may need to examine the partial correlation coefficients.
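A related diagnostic is the variance-inflating factor from the outline above: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from the auxiliary regression of X_j on the remaining X's. The synthetic data below (a near-exact linear relationship among three regressors) is an illustrative construction, not from the text:

```python
import numpy as np

# VIF computed from its definition via auxiliary regressions.
# Synthetic data: x3 is almost a linear combination of x1 and x2.

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)   # near-exact linear relationship
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress X_j on the other columns plus an intercept."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
    return 1 / (1 - r2)

print("pairwise corr(x1, x3):", round(np.corrcoef(x1, x3)[0, 1], 3))
for j in range(3):
    print(f"VIF of x{j + 1}: {vif(X, j):.1f}")   # far above the common cutoff of 10
```

Here the pairwise correlation between x1 and x3 is only moderate, yet every VIF is enormous, which is the point of indicator (c): pairwise correlations alone can understate the collinearity in the full set of regressors.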