00:03
In this session I will introduce multiple linear regression. Whenever we do this kind of study, the first thing we should always do is a scatter plot, because the scatter plot will tell you whether the variables are really related to each other in a linear way or not. By linear I mean that if you plot x and y and get a pattern like this, it looks linear: as X goes up, Y also goes up. So it could be positive linear, or it could be negative linear, going in the opposite direction. The scatter plot will confirm whether or not there is a linear relation between two variables, or among different variables.

00:52
For multiple linear regression, when you are developing this equation you obviously have more than one x variable, so your equation may look like b0, which is the y-intercept, plus the slope b1 times the first independent variable X1, plus b2 times X2, and so on, depending on how many variables you have; it can go from 1 to n:

y = b0 + b1·X1 + b2·X2 + ... + bn·Xn

So here, labor cost = b0 + b1 × labor hours + b2 × mileage.

01:36
Now, it depends on whether labor hours or mileage makes a significant contribution to the overall model. If either of them does not make a significant contribution, it may be dropped. For example, if mileage is not contributing, it may go out, and we may be left with a simple linear regression in which labor cost depends only on labor hours.

02:07
I will import the data set, which is on my desktop. The file is a CSV file, not Excel; I will simply open it and import it. You can see this vehicle file has 1,624 observations with seven variables (one, two, three, four, five, six, seven), but in this particular study we are only interested in mileage, labor hours and labor cost, so we'll ignore all the other variables. I am going to close this tab. Note that closing that view does not mean the data is no longer available: under Data, on the right side, you can see that the vehicle file is still there.

02:57
If you are starting something new, you can always click on this plus sign and choose R Script, and it will open an empty window. If you want to clear everything and start afresh in the console window, you can press Ctrl+L, which will clean everything.

03:14
So let's look at head(vehicle), just to see what type of data we have. Now we will use pairs on vehicle, and we will use only some of the columns: you can see here this is the first, the second, and then the third, fourth and fifth variables, so I will say 3:5. I want to see that scatter plot, and you can see that labor hours and labor cost are related to each other quite strongly.
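The speaker judges the strength of each linear relation from the pairs() scatter plots by eye. A numeric counterpart is Pearson's correlation coefficient r, which R computes with cor(). Below is a minimal Python sketch, standard library only, of the pairwise r values behind such a scatter-plot matrix; the column names and numbers are hypothetical toy values, not the vehicle data used in the video.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping column name -> list of values."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy data (NOT the vehicle.csv values from the video).
data = {
    "mileage": [12000, 30500, 8000, 45000, 21000, 16000],
    "labor_hours": [1.5, 2.0, 1.0, 3.5, 2.5, 1.2],
    "labor_cost": [112.0, 150.0, 75.0, 260.0, 185.0, 90.0],
}
r = correlation_matrix(data)
for (a, b), value in r.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {value:.3f}")
```

Values of r near +1 or -1 correspond to the tight positive or negative linear patterns described above; values near 0 mean no linear relation, though a scatter plot is still worth checking for curved patterns that r does not capture.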
03:51
Mileage versus labor hours is not that strong a correlation, and similarly mileage versus labor cost is not a very strong correlation, but we'll go ahead and try to develop the first model. What we'll do is develop this multiple linear regression model and save the results in a variable called results. This name is given by us; you could simply call it a, b, c, z, anything. I'm using results here, followed by this less-than and minus sign (<-), which means that results will be assigned whatever we do on the right side.

04:36
On the right side we are going to use a function called lm; lm stands for linear model. We are developing a linear model for labor cost, so I am using this name as it is, then the tilde sign (~), and then we include mileage and labor hours. If you have more than two variables, you can keep adding them with a plus sign, so you can add as many variables as you like. Once you have listed all the variables you are interested in, you have to specify your data file; our file is vehicle.

05:17
So now I can run this, and if you want to look at the results, type results and run it. It will give you the outcome, including all the coefficients. The intercept is 1.375, so our model will look like this. The coefficient for mileage is -8.47 × 10⁻⁵: because the exponent is -5, there are four zeros after the decimal point and then 847, so with the negative sign that is -0.0000847. That's our coefficient for mileage, and 73.55 is the coefficient for labor hours:

labor cost = 1.375 - 0.0000847 × mileage + 73.55 × labor hours

This is the linear model we get by fitting this data.

06:25
You can get some more output by looking at the summary. If we run summary(results), you will get more information for the coefficients: not only the coefficients we just wrote (1.375, the negative mileage value and so on) but also some further statistical analysis. The most important column is the last one, the probability value. These are the p-values, which tell you whether a variable in the model is really contributing significantly or not. The stars indicate how significant that variable is in the model, that is, whether it is playing a significant role or not. Three stars mean very highly significant, two stars slightly less, and one star slightly less again but still significant. If there is no star, as for mileage, it means mileage is not playing any statistically significant role in this model.
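For intuition about what lm() returns, here is a minimal Python sketch that fits y = b0 + b1·X1 + b2·X2 by ordinary least squares, solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination. Treat it as an illustration under stated assumptions: R's lm() solves the same least-squares problem but uses a QR decomposition, which is numerically more stable, and the function names and toy data below are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(y, *predictors):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    n = len(y)
    X = [[1.0] + [float(p[i]) for p in predictors] for i in range(n)]  # leading 1 = intercept
    k = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical toy data built from known coefficients: cost = 2 + 0.5*miles + 70*hours
miles = [10, 20, 30, 40, 50, 60]
hours = [1, 3, 2, 5, 4, 6]
cost = [2 + 0.5 * m + 70 * h for m, h in zip(miles, hours)]
b0, b1, b2 = ols_fit(cost, miles, hours)
print(f"labor cost = {b0:.3f} + {b1:.3f} * mileage + {b2:.3f} * labor hours")
```

Because the toy data has no noise, the fit recovers the coefficients it was built from; with real data the residuals are non-zero and summary() adds standard errors and p-values on top of these estimates.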
07:35
So, in this model of predicting labor cost from mileage and labor hours, if a variable is not playing a significant role, the natural thing is to delete it from further analysis. We can drop this variable and develop our model based only on labor hours, because mileage is not contributing.

07:56
Now look at the stars again. Two stars mean the variable is significant at the 0.01 level. How do we interpret that? In this case labor hours has three stars, so let's use the three-star value, 0.001. If you compute 1 minus p, which here is 1 - 0.001, you get 0.999, or 99.9 percent. This means that if you treat labor hours as a statistically significant variable, your confidence level is 99.9 percent. That is a high level of confidence, and that's what we look for: the default is 95 percent, and anything above 95 percent is really good. That's the interpretation of the stars and the p-values.

09:05
Similarly, look at the p-value for mileage, which is 0.201; call it 0.2. 1 - 0.2 is 0.8, so if you conclude that mileage is important or significant, your confidence level is only 80 percent. Anything less than 95 percent is not really good enough, so 80 percent confidence is not good; that's why we say mileage is not significant. These interpretations are about individual variables, using their p-values.

09:40
Now, towards the bottom you will see some further information. There is an F-statistic, and based on that F-statistic another p-value is reported. Its interpretation is the same as I described earlier: you just compute 1 minus this number. Here the number is very, very small, on the order of 10⁻¹⁶, so 1 minus it is almost 100 percent.

10:11
Two other numbers you may notice here are Multiple R-squared and Adjusted R-squared. Here they are about 0.95, which means the variables in the model explain 95 percent of the overall variability in labor cost. That is very good: the variables are contributing a lot. If you see R-squared values like 0.01, or only 2, 5 or 10 percent, that is not a good model, or those variables are not contributing much to predicting labor cost. For this data we are covering almost 95 percent of the variation using just two variables, which is a very good sign. Only the remaining 5 percent is unexplained, so if you bring more variables into this model, you are addressing only that remaining 5 percent, and that is not going to add much value.

11:10
Generally, we develop the final model based on the significant variables, so I will take out mileage and keep only labor hours in the model.
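The summary() quantities discussed above can be reproduced by hand. Below is a minimal Python sketch: r_squared returns multiple R² = 1 - SS_res/SS_tot and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for a model with an intercept and k predictors, and signif_stars maps a p-value to the codes in the legend R prints under the coefficient table (*** below 0.001, ** below 0.01, * below 0.05, . below 0.1). The function names are hypothetical, and behaviour for a p-value exactly at a cutoff has not been checked against R.

```python
def r_squared(y, fitted, k):
    """Return (multiple R^2, adjusted R^2) for a model with an intercept and k predictors."""
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors
    return r2, adj_r2


def signif_stars(p):
    """Significance code for a p-value, following the legend printed by summary() for lm fits."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""


# Mileage's p-value from the video (0.201) and an illustrative tiny p-value.
for name, p in (("tiny p-value", 1e-16), ("mileage", 0.201)):
    print(f"{name}: p = {p}, code = {signif_stars(p)!r}, 1 - p = {1 - p:.3f}")
```

Adjusted R² subtracts a penalty that grows with k, so adding a predictor that explains almost nothing (like mileage here) can leave it flat or even lower it, which matches the point that extra variables can only chase the remaining unexplained share.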
11:20
With only labor hours in the model, I repeat those three lines, and labor hours is still significant. If you have to state your final model based on this data, you should write -0.236 plus the labor hours coefficient times labor hours. This is the final model, because we have taken out the variable that was not statistically significant, in this case mileage.

11:58
You can also compare the full model, in which all the variables we are interested in are included, against the reduced model, which contains only the variables we believe are contributing significantly. Let's say we write the reduced linear model: labor cost as a function of labor hours, with vehicle as our data. I will run this line, and I will also write down the full model, again with vehicle as the data file, and run that model too. Then we can do what is called ANOVA on reduced, full; ANOVA stands for analysis of variance, and it summarizes the results in the form of a table.

13:04
Let's run that. I will run this line, and you get the result. Again, the result is in the form of a p-value; all the statistical analysis is summarized in that p-value. Here the p-value is not significant: as you can see, 1 - 0.2 is only 0.8, so your confidence level would be only 80 percent if you treated this as significant. This means that by adding the variable mileage we are not adding any significant information to the model; we are not gaining anything significant by adding this variable.

13:45
I have used this simple case, but depending on your data, your full model may include three more variables, or fifteen more variables, and just by using these three lines you can figure out whether adding all those extra variables creates any real value or not. In this case it does not, because the result is not significant.

14:13
Obviously, we develop models so that we can make predictions. We will use the default 95 percent confidence. We want to predict from results; remember that our latest results are based on the model with only the significant variable. So we call predict on results and specify a data frame with data.frame, where you give labor hours; let's say we want to predict the labor cost for 10 hours. Then, after a comma, interval equals "confidence", to specify that we are doing a confidence-interval prediction.

15:02
Once you run this line, you can see it gives you a fitted value together with a lower and an upper value. The fitted value is the average, and the interval goes from $728.92 to $740.78.
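Two computations in this step can be sketched by hand. For nested linear models, anova(reduced, full) reports a partial F statistic, F = ((RSS_reduced - RSS_full)/q) / (RSS_full/df_full), where q is the number of extra coefficients in the full model and df_full its residual degrees of freedom. When q = 1 this F equals the square of the added variable's t statistic, which is why the ANOVA p-value agrees with mileage's individual p-value of about 0.2. For a simple regression, predict(..., interval = "confidence") gives fit ± t·s·sqrt(1/n + (x0 - x̄)²/Sxx). The Python sketch below is a minimal illustration under stated assumptions: the function names and toy data are hypothetical, and since the standard library has no t distribution, the caller supplies the critical value t_crit and no p-values are computed.

```python
import math


def partial_f(rss_reduced, df_reduced, rss_full, df_full):
    """Partial F statistic for nested linear models, as reported by anova(reduced, full)."""
    q = df_reduced - df_full  # extra coefficients estimated by the full model
    if q <= 0:
        raise ValueError("the full model must have fewer residual degrees of freedom")
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)


def mean_ci_simple(x, y, x0, t_crit):
    """Fitted mean and confidence interval at x0 for the simple regression y = b0 + b1*x.

    t_crit is the two-sided critical value of the t distribution with n - 2
    degrees of freedom, supplied by the caller (close to 1.96 for large n at 95%).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual standard error
    fit = b0 + b1 * x0
    se_fit = s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)  # std. error of the mean response
    return fit, fit - t_crit * se_fit, fit + t_crit * se_fit


# Hypothetical toy data, not the vehicle data: labor hours vs labor cost.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cost = [75.0, 148.0, 220.0, 296.0, 368.0, 441.0]
fit, lower, upper = mean_ci_simple(hours, cost, 10.0, t_crit=2.776)  # t(0.975, df = 4)
print(f"fit = {fit:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"partial F = {partial_f(120.0, 10, 100.0, 9):.2f}")
```

This interval is for the average labor cost at 10 hours; interval = "prediction" in R would instead give a wider interval for a single new job, because it also adds the residual variance s² inside the square root.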
15:24
Those are the lower and upper bounds of this 95 percent confidence interval. On the course website I have also uploaded a file called multiple linear regression, which gives you similar details using another example, and I suggest you go through it. Also note that they read the data file with a command that is, obviously, written for their particular computer, so it may not work on yours, but you get the idea by looking at the commands and the output and what they are trying to do.
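The data-reading command in the course file only works on the machine it was written for because it points at a location on that computer. One common remedy is to pass the path in, or keep the CSV next to your script. Below is a minimal Python sketch using the standard csv module; the function name and the header names in the demo are hypothetical, since the file's actual column names are not shown in this transcript.

```python
import csv
import io
from pathlib import Path


def load_columns(source, names):
    """Read the numeric columns `names` from a CSV path or an open text stream."""
    if isinstance(source, (str, Path)):
        with open(source, newline="") as fh:
            return load_columns(fh, names)
    columns = {name: [] for name in names}
    for row in csv.DictReader(source):
        for name in names:
            columns[name].append(float(row[name]))  # KeyError if a column is missing
    return columns


# Demo with an in-memory file; the header names here are hypothetical.
sample = io.StringIO("Vehicle,Mileage,lh,lc\n1,1000,1.5,110.2\n2,2500,2.0,150.0\n")
print(load_columns(sample, ["Mileage", "lh", "lc"]))
```

On your own machine you might call load_columns(Path(__file__).parent / "vehicle.csv", [...]) so the path does not depend on where a particular desktop folder lives.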
