
Problem 1

Statistical learning is a set of tools for understanding data. These tools can be classified as supervised or unsupervised. Supervised statistical learning generally involves building a statistical model for predicting, or estimating, an output based on one or more inputs, while unsupervised statistical learning involves inputs but no supervising output; nevertheless, we can still learn relationships and structure from such data.
Statistical learning generally deals with variables, which are attributes that can take different values depending on the instance. These variables can be qualitative, where the values are non-numerical, or quantitative, where the values are numerical. Data generated from the variables are fitted to some function that relates the dependent variable to the independent variables; this function can then be used for prediction (estimating the value of the dependent variable for given inputs) or inference (understanding properties of the underlying function). If the output is continuous, the task is called a regression problem; if it is categorical, it is a classification problem. The observed set of data points used to generate the function is called the training data, and the data not used in fitting but applied to evaluate the fitted function is the test data. The estimation of the unknown function can be model-based (parametric modelling), where the training data are mapped onto some pre-specified functional form, or non-parametric, where no assumption is made about the form and we seek an estimate that gets as close to the data points as possible without being too rough.
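The workflow above (fit a parametric model on training data, then predict on held-out test data) can be sketched as follows. This is a minimal illustration on synthetic data; the variable names and the true coefficients (slope 2, intercept 1) are hypothetical, not taken from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (hypothetical) data: a quantitative response generated from one input.
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)

# Split into training data (used to estimate the function) and test data (held out).
train, test = np.arange(70), np.arange(70, 100)

# Parametric modelling: assume a linear form y = b0 + b1*x and estimate b0, b1
# from the training data by least squares.
X_train = np.column_stack([np.ones(len(train)), x[train]])
b, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# Prediction on the test data, i.e. data not used in generating the function.
y_hat = b[0] + b[1] * x[test]
test_mse = np.mean((y[test] - y_hat) ** 2)
```

Because this is a regression problem (continuous output), test performance is measured by mean squared error on the held-out points.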
Problem 2
Show that

$$\mathbb{E}[Y] = \arg\min_{c}\, \mathbb{E}[(Y - c)^2],$$

i.e. the minimum of $\mathbb{E}[(Y-c)^2]$ is attained when $c = \mathbb{E}[Y]$.

Expanding around $\mathbb{E}[Y]$:

$$\mathbb{E}[(Y-c)^2] = \mathbb{E}\big[\big((Y - \mathbb{E}[Y]) + (\mathbb{E}[Y] - c)\big)^2\big]$$
$$= \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2 + 2\,\mathbb{E}[(Y - \mathbb{E}[Y])(\mathbb{E}[Y] - c)].$$

But for the cross term, $\mathbb{E}[Y] - c$ is a constant, so

$$\mathbb{E}[2(Y - \mathbb{E}[Y])(\mathbb{E}[Y] - c)] = 2\,\mathbb{E}[Y - \mathbb{E}[Y]]\,(\mathbb{E}[Y] - c) = 0,$$

since, letting $\mu = \mathbb{E}[Y]$, we have $\mathbb{E}[Y - \mu] = \mathbb{E}[Y] - \mu = 0$.

Therefore

$$\mathbb{E}[(Y-c)^2] = \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2.$$

But $(\mathbb{E}[Y] - c)^2 \geq 0$, so

$$\mathbb{E}[(Y-c)^2] \geq \mathbb{E}[(Y - \mathbb{E}[Y])^2],$$

with equality if and only if $c = \mathbb{E}[Y]$. Therefore $\arg\min_c \mathbb{E}[(Y-c)^2] = \mathbb{E}[Y]$.
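The result can be checked numerically: evaluating the empirical MSE over a grid of candidate constants $c$, the minimizer coincides (up to grid resolution) with the sample mean. The distribution and grid below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, 10_000)   # draws of Y

# Empirical E[(Y - c)^2] evaluated on a grid of candidate constants c.
cs = np.linspace(0.0, 10.0, 1001)
mse = np.array([np.mean((y - c) ** 2) for c in cs])

best_c = cs[np.argmin(mse)]        # grid minimizer
# best_c agrees with the sample mean of y up to the grid spacing of 0.01
```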

Problem 3
The practical benefit of this result is that the mean is the summary statistic of our data that minimizes the mean squared error (MSE); i.e. $\mathbb{E}[(Y - \mathbb{E}[Y])^2]$ is the lower bound for the MSE over all constant predictors.

Show that

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}(\varepsilon).$$

Given $Y = f(x) + \varepsilon$, with $\mathbb{E}[\varepsilon] = 0$ and $\varepsilon$ independent of $\hat{f}(x)$:

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathbb{E}\big[\big((f(x) - \hat{f}(x)) + \varepsilon\big)^2\big]$$
$$= \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \mathbb{E}[\varepsilon^2] + 2\,\mathbb{E}\big[\varepsilon\,(f(x) - \hat{f}(x))\big].$$

Since $\varepsilon$ is independent of $\hat{f}(x)$ and $\mathbb{E}[\varepsilon] = 0$,

$$2\,\mathbb{E}\big[\varepsilon\,(f(x) - \hat{f}(x))\big] = 2\,\mathbb{E}[\varepsilon]\,\mathbb{E}\big[f(x) - \hat{f}(x)\big] = 0,$$

and $\mathbb{E}[\varepsilon^2] = \mathrm{Var}(\varepsilon)$, so

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \mathrm{Var}(\varepsilon).$$

Also, from Problem 2,

$$\mathbb{E}[(Y - c)^2] = \mathbb{E}[(Y - \mathbb{E}[Y])^2] + (\mathbb{E}[Y] - c)^2;$$

replacing $Y$ with $\hat{f}(x)$ and $c$ with $f(x)$ and applying the same principle,

$$\mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2.$$

Therefore,

$$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \mathrm{Var}\big(\hat{f}(x)\big) + \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}(\varepsilon),$$

where the middle term is the squared bias of $\hat{f}(x)$, as was to be proved.
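The decomposition can be verified by Monte Carlo: refit an estimator on many fresh training sets, and compare the average squared test error at a fixed point with the sum of the estimator's variance, squared bias, and the noise variance. The true function, noise level, and the deliberately simple (biased) straight-line estimator below are all illustrative choices, not the assignment's model.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)            # true function f (arbitrary choice)
sigma = 0.5                        # sd of the irreducible noise epsilon
x0 = 1.0                           # fixed test point

preds, errs = [], []
for _ in range(20_000):
    # Fresh training set each round: Y = f(X) + eps
    X = rng.uniform(0, 2, 30)
    Y = f(X) + rng.normal(0, sigma, 30)
    # A deliberately simple (biased) estimator: fit a straight line.
    b1, b0 = np.polyfit(X, Y, 1)
    fhat = b0 + b1 * x0
    preds.append(fhat)
    # Independent test response at x0
    y0 = f(x0) + rng.normal(0, sigma)
    errs.append((y0 - fhat) ** 2)

preds = np.array(preds)
lhs = np.mean(errs)                                   # E[(Y - fhat(x0))^2]
rhs = preds.var() + (preds.mean() - f(x0)) ** 2 + sigma ** 2
# lhs and rhs agree up to Monte Carlo error
```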

Problem 4
(b) The training set contains 80 observations and the test set contains 31; in total we have 111 observations.
(c) There are four variables, with values as shown in the table below:

Variable      Range (max − min)   Mean    Standard deviation
Ozone         167                 42.1    33.274
Radiation     327                 184.8   91.152
Temperature   40                  77.8    9.530
Wind          18.4                9.9     3.559

(d) The plot below shows scatterplots for all pairs of variables.
[Figure: pairwise scatterplots of ozone, radiation, temperature, and wind]
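The table's statistics are each one-line computations. The assignment's air-quality data file is not reproduced here, so the short array below is a hypothetical stand-in that only demonstrates the pattern (note the sample standard deviation uses the n − 1 divisor).

```python
import numpy as np

# Hypothetical stand-in values; the real data come from the assignment's file.
ozone = np.array([41.0, 36.0, 12.0, 18.0, 23.0, 19.0, 8.0, 16.0, 11.0, 14.0])

value_range = ozone.max() - ozone.min()   # range (max - min)
mean = ozone.mean()                       # mean
sd = ozone.std(ddof=1)                    # sample standard deviation (n - 1 divisor)
```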

Pearson correlation coefficient values

              ozone        radiation    temperature   wind
ozone          1.0000000    0.3483417    0.6985414    -0.6129508
radiation      0.3483417    1.0000000    0.2940876    -0.1273656
temperature    0.6985414    0.2940876    1.0000000    -0.4971459
wind          -0.6129508   -0.1273656   -0.4971459     1.0000000

The range of the observed Pearson correlation coefficients is 1.0000000 − (−0.6129508) = 1.6129508.
A correlation of zero implies that there is no linear relationship between the variables involved (though it does not by itself imply independence).
The correlation between wind and each of the other variables is negative; hence, as wind decreases, the other variables tend to increase. All other pairs of variables yield a positive correlation, meaning they tend to increase together.
This inference can also be seen visually in the scatterplots.
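The sign pattern above is exactly what a correlation computation reports. As a sketch on hypothetical stand-in data (the generating relationships below are invented purely so that one pair is positively and one negatively correlated):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins: ozone rises with temperature, wind falls with it.
temperature = rng.uniform(60, 100, 200)
ozone = 2.0 * temperature - 100 + rng.normal(0, 15, 200)
wind = -0.5 * temperature + 60 + rng.normal(0, 3, 200)

# np.corrcoef returns the full correlation matrix; [0, 1] is the off-diagonal entry.
r_pos = np.corrcoef(ozone, temperature)[0, 1]   # positive, like the table
r_neg = np.corrcoef(ozone, wind)[0, 1]          # negative, like the table
```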
(f) Below is a scatter plot of the true responses against the predicted responses.
[Figure: scatterplot of actual vs. predicted values]

The residual sum of squares, RSS $= \sum_i (y_i - \hat{y}_i)^2$, is 8208.509, and the correlation of the true responses with the predicted values is:

               actual value   predicted
actual value   1.0000000      0.8268958
predicted      0.8268958      1.0000000

(g) Below is a plot of RSS against k for the test set.
[Figure: test-set RSS as a function of k]

The most suitable value of k is 5, as it gives the lowest RSS value. Note that KNN classification assumes the response is categorical.

(h) I would select the linear model, since it gives better RSS and correlation values. This is to be expected, because the response here is continuous, so treating each distinct value as its own class for KNN produces far too many categories.
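The assignment's R code is not included in this write-up, so as a hedged sketch of the procedure in part (g): the snippet below uses KNN *regression* (averaging the k nearest training responses, rather than classifying) on hypothetical one-predictor data, and picks the k with the lowest test-set RSS. The data-generating model and the 80/31 split sizes mirror the counts in part (b) but the values are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in data; 80 training and 31 test observations as in (b).
x = rng.uniform(0, 10, 111)
y = 3.0 * x + rng.normal(0, 2.0, 111)
x_tr, y_tr = x[:80], y[:80]
x_te, y_te = x[80:], y[80:]

def knn_predict(x0, k):
    # Average the responses of the k nearest training points (KNN regression).
    idx = np.argsort(np.abs(x_tr - x0))[:k]
    return y_tr[idx].mean()

# Test-set RSS for each candidate k; the best k has the lowest RSS.
rss = {k: sum((yt - knn_predict(xt, k)) ** 2 for xt, yt in zip(x_te, y_te))
       for k in range(1, 21)}
best_k = min(rss, key=rss.get)
```

Plotting `rss` against k would reproduce the kind of curve described in (g), with its minimum at `best_k`.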
