
Problem 1

Statistical learning is a set of tools for understanding data. These tools can be classified as supervised or unsupervised. Supervised statistical learning generally involves building a statistical model for predicting, or estimating, an output based on one or more inputs, while unsupervised statistical learning involves inputs but no supervising output; nevertheless, we can still learn relationships and structure from such data.
Statistical learning generally deals with variables, which are attributes that can take different values depending on the instance. These variables can be qualitative, where the values are non-numerical, or quantitative, where the values are numerical. Data generated from the variables are fitted to some function that relates the dependent variable to the independent variables; this function can then be used for prediction (estimating the value of the dependent variable for given inputs) or inference (understanding properties of the underlying function). If the output is continuous, the task is called a regression problem; if it is categorical, it is a classification problem. The observed set of data points used to generate the function is called the training data, and the data not used in fitting but applied to evaluate the fitted function is the test data. The estimation of the unknown function can be model-based (parametric modelling), where the training data are mapped onto some pre-specified functional form, or non-parametric, where no assumption is made about the form and we seek an estimate that gets as close to the data points as possible without being too rough.
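The workflow above (fit a parametric model on training data, then predict on held-out test data) can be sketched as follows. This is a minimal illustration on synthetic data; the variable names and the true coefficients (slope 2, intercept 1) are hypothetical, not taken from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (hypothetical) data: a quantitative response generated from one input.
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)

# Split into training data (used to estimate the function) and test data (held out).
train, test = np.arange(70), np.arange(70, 100)

# Parametric modelling: assume a linear form y = b0 + b1*x and estimate b0, b1
# from the training data by least squares.
X_train = np.column_stack([np.ones(len(train)), x[train]])
b, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# Prediction on the test data, i.e. data not used in generating the function.
y_hat = b[0] + b[1] * x[test]
test_mse = np.mean((y[test] - y_hat) ** 2)
```

Because this is a regression problem (continuous output), test performance is measured by mean squared error on the held-out points.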
Problem 2
Show that

$$\mathbb{E}[Y] = \arg\min_{c}\, \mathbb{E}[(Y - c)^2],$$

i.e. the minimum of $\mathbb{E}[(Y-c)^2]$ is attained when $c = \mathbb{E}[Y]$.

Expanding around $\mathbb{E}[Y]$:

$$\mathbb{E}[(Y-c)^2] = \mathbb{E}\big[\big((Y - \mathbb{E}[Y]) + (\mathbb{E}[Y] - c)\big)^2\big]$$
$$= \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2 + 2\,\mathbb{E}[(Y - \mathbb{E}[Y])(\mathbb{E}[Y] - c)].$$

But for the cross term, $\mathbb{E}[Y] - c$ is a constant, so

$$\mathbb{E}[2(Y - \mathbb{E}[Y])(\mathbb{E}[Y] - c)] = 2\,\mathbb{E}[Y - \mathbb{E}[Y]]\,(\mathbb{E}[Y] - c) = 0,$$

since, letting $\mu = \mathbb{E}[Y]$, we have $\mathbb{E}[Y - \mu] = \mathbb{E}[Y] - \mu = 0$.

Therefore

$$\mathbb{E}[(Y-c)^2] = \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2.$$

But $(\mathbb{E}[Y] - c)^2 \geq 0$, so

$$\mathbb{E}[(Y-c)^2] \geq \mathbb{E}[(Y - \mathbb{E}[Y])^2],$$

with equality if and only if $c = \mathbb{E}[Y]$. Therefore $\arg\min_c \mathbb{E}[(Y-c)^2] = \mathbb{E}[Y]$.
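The result can be checked numerically: evaluating the empirical MSE over a grid of candidate constants $c$, the minimizer coincides (up to grid resolution) with the sample mean. The distribution and grid below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, 10_000)   # draws of Y

# Empirical E[(Y - c)^2] evaluated on a grid of candidate constants c.
cs = np.linspace(0.0, 10.0, 1001)
mse = np.array([np.mean((y - c) ** 2) for c in cs])

best_c = cs[np.argmin(mse)]        # grid minimizer
# best_c agrees with the sample mean of y up to the grid spacing of 0.01
```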

Problem 3
The practical benefit of this result is that the mean is the summary statistic of our data that minimizes the mean squared error (MSE); i.e. $\mathbb{E}[(Y - \mathbb{E}[Y])^2]$ is the lower bound for the MSE over all constant predictors.

Show that

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}(\varepsilon).$$

Given $Y = f(x) + \varepsilon$, with $\mathbb{E}[\varepsilon] = 0$ and $\varepsilon$ independent of $\hat{f}(x)$:

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathbb{E}\big[\big((f(x) - \hat{f}(x)) + \varepsilon\big)^2\big]$$
$$= \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \mathbb{E}[\varepsilon^2] + 2\,\mathbb{E}\big[\varepsilon\,(f(x) - \hat{f}(x))\big].$$

Since $\varepsilon$ is independent of $\hat{f}(x)$ and $\mathbb{E}[\varepsilon] = 0$,

$$2\,\mathbb{E}\big[\varepsilon\,(f(x) - \hat{f}(x))\big] = 2\,\mathbb{E}[\varepsilon]\,\mathbb{E}\big[f(x) - \hat{f}(x)\big] = 0,$$

and $\mathbb{E}[\varepsilon^2] = \mathrm{Var}(\varepsilon)$, so

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \mathrm{Var}(\varepsilon).$$

Also, from Problem 2,

$$\mathbb{E}[(Y - c)^2] = \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2;$$

replacing $Y$ with $\hat{f}(x)$ and $c$ with $f(x)$ and applying the same principle,

$$\mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2.$$

Therefore,

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}(\varepsilon),$$

where the middle term is the squared bias of $\hat{f}(x)$, as was to be proved.
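The decomposition can be verified by Monte Carlo: refit an estimator on many fresh training sets, and compare the average squared test error at a fixed point with the sum of the estimator's variance, squared bias, and the noise variance. The true function, noise level, and the deliberately simple (biased) straight-line estimator below are all illustrative choices, not the assignment's model.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)            # true function f (arbitrary choice)
sigma = 0.5                        # sd of the irreducible noise epsilon
x0 = 1.0                           # fixed test point

preds, errs = [], []
for _ in range(20_000):
    # Fresh training set each round: Y = f(X) + eps
    X = rng.uniform(0, 2, 30)
    Y = f(X) + rng.normal(0, sigma, 30)
    # A deliberately simple (biased) estimator: fit a straight line.
    b1, b0 = np.polyfit(X, Y, 1)
    fhat = b0 + b1 * x0
    preds.append(fhat)
    # Independent test response at x0
    y0 = f(x0) + rng.normal(0, sigma)
    errs.append((y0 - fhat) ** 2)

preds = np.array(preds)
lhs = np.mean(errs)                                   # E[(Y - fhat(x0))^2]
rhs = preds.var() + (preds.mean() - f(x0)) ** 2 + sigma ** 2
# lhs and rhs agree up to Monte Carlo error
```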

Problem 4
(b) The training set contains 80 observations and the test set contains 31; in total we have 111 observations.
(c) There are four variables, with values as shown in the table below:

Variable      Range (max − min)   Mean    Standard deviation
Ozone         167                 42.1    33.274
Radiation     327                 184.8   91.152
Temperature   40                  77.8    9.530
Wind          18.4                9.9     3.559

(d) The plot below shows scatterplots for all pairs of variables.
[Figure: pairwise scatterplots of ozone, radiation, temperature, and wind]
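The table's statistics are each one-line computations. The assignment's air-quality data file is not reproduced here, so the short array below is a hypothetical stand-in that only demonstrates the pattern (note the sample standard deviation uses the n − 1 divisor).

```python
import numpy as np

# Hypothetical stand-in values; the real data come from the assignment's file.
ozone = np.array([41.0, 36.0, 12.0, 18.0, 23.0, 19.0, 8.0, 16.0, 11.0, 14.0])

value_range = ozone.max() - ozone.min()   # range (max - min)
mean = ozone.mean()                       # mean
sd = ozone.std(ddof=1)                    # sample standard deviation (n - 1 divisor)
```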

Pearson correlation coefficient values

              ozone        radiation    temperature   wind
ozone          1.0000000    0.3483417    0.6985414    -0.6129508
radiation      0.3483417    1.0000000    0.2940876    -0.1273656
temperature    0.6985414    0.2940876    1.0000000    -0.4971459
wind          -0.6129508   -0.1273656   -0.4971459     1.0000000

The range of the observed Pearson correlation coefficients is 1.0000000 − (−0.6129508) = 1.6129508.
A correlation of zero implies that there is no linear relationship between the variables involved (though it does not by itself imply independence).
The correlation between wind and each of the other variables is negative; hence, as wind decreases, the other variables tend to increase. All other pairs of variables yield a positive correlation, meaning they tend to increase together.
This inference can also be seen visually in the scatterplots.
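The sign pattern above is exactly what a correlation computation reports. As a sketch on hypothetical stand-in data (the generating relationships below are invented purely so that one pair is positively and one negatively correlated):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins: ozone rises with temperature, wind falls with it.
temperature = rng.uniform(60, 100, 200)
ozone = 2.0 * temperature - 100 + rng.normal(0, 15, 200)
wind = -0.5 * temperature + 60 + rng.normal(0, 3, 200)

# np.corrcoef returns the full correlation matrix; [0, 1] is the off-diagonal entry.
r_pos = np.corrcoef(ozone, temperature)[0, 1]   # positive, like the table
r_neg = np.corrcoef(ozone, wind)[0, 1]          # negative, like the table
```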
(f) Below is a scatter plot of the true responses against the predicted responses.
[Figure: scatterplot of actual vs. predicted values]

The residual sum of squares, RSS $= \sum_i (y_i - \hat{y}_i)^2$, is 8208.509, and the correlation of the true responses with the predicted values is:

               actual value   predicted
actual value   1.0000000      0.8268958
predicted      0.8268958      1.0000000

(g) Below is a plot of RSS against k for the test set.
[Figure: test-set RSS as a function of k]

The most suitable value of k is 5, as it gives the lowest RSS value. Note that KNN classification assumes the response is categorical.

(h) I would select the linear model, since it gives better RSS and correlation values. This is to be expected, because the response here is continuous, so treating each distinct value as its own class for KNN produces far too many categories.
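The assignment's R code is not included in this write-up, so as a hedged sketch of the procedure in part (g): the snippet below uses KNN *regression* (averaging the k nearest training responses, rather than classifying) on hypothetical one-predictor data, and picks the k with the lowest test-set RSS. The data-generating model and the 80/31 split sizes mirror the counts in part (b) but the values are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in data; 80 training and 31 test observations as in (b).
x = rng.uniform(0, 10, 111)
y = 3.0 * x + rng.normal(0, 2.0, 111)
x_tr, y_tr = x[:80], y[:80]
x_te, y_te = x[80:], y[80:]

def knn_predict(x0, k):
    # Average the responses of the k nearest training points (KNN regression).
    idx = np.argsort(np.abs(x_tr - x0))[:k]
    return y_tr[idx].mean()

# Test-set RSS for each candidate k; the best k has the lowest RSS.
rss = {k: sum((yt - knn_predict(xt, k)) ** 2 for xt, yt in zip(x_te, y_te))
       for k in range(1, 21)}
best_k = min(rss, key=rss.get)
```

Plotting `rss` against k would reproduce the kind of curve described in (g), with its minimum at `best_k`.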
