Correlation Between Distance and Tardies

My partner and I chose to study the relationship between how far a student lives from the school, in miles, and how many times that student has been tardy to their first hour class. The explanatory variable in this study was the distance in miles the individual lived from the school, while the response variable was the number of tardies that student had to first hour. Going into the study we expected a moderate positive correlation, but after recording the data and creating a scatter plot we found there was almost no correlation. Most of the responses for first hour tardies were either zero or a very low number below 10. Because many students gave the same or nearly the same answers, several data points fall in the same spot on the scatter plot, which makes it look like there are fewer points than we actually collected. This explains the points stacked on top of one another in our scatter plots.

After this, we found x̄ (x bar) and ȳ (y bar), the averages of the two sets of data we collected. The x̄ value, the average number of miles that individuals lived from school, was 4.64 miles. The ȳ value, the average number of tardies to first hour, was 1.3. So the typical student in our sample lived about 4.64 miles from school and had about 1.3 tardies to first hour so far this year.
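x̄ and ȳ are just the ordinary averages of each column of data, each computed on its own. The Python sketch below shows this. The function name `means` is our own illustration. Our raw survey responses are not reproduced here, so any lists passed in are hypothetical.

```python
def means(xs, ys):
    """Return (x_bar, y_bar), the separate averages of two paired lists.

    xs: distances from school in miles (hypothetical input)
    ys: first hour tardy counts for the same students (hypothetical input)
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_bar = sum(xs) / len(xs)  # average distance in miles
    y_bar = sum(ys) / len(ys)  # average number of first hour tardies
    return x_bar, y_bar
```

The two averages are computed independently. They describe a typical student, not a rate of tardies per mile.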

After finding this information we calculated the correlation coefficient of our data, r = .168. Since this value is so close to zero, it shows a very weak positive correlation, close to no correlation at all. We then found r², which was .028. This means the line of regression explains very little of the variation in tardies: our data is tightly clustered in some areas, and several outliers lie far from the line of regression.

Possible lurking variables include the time of year and undocumented tardies. It is still early in the school year, when attendance is better, and students could be late without being marked tardy. Either of these could contribute to the lack of correlation, and data collected at the end of the school year would give a clearer picture of whether a correlation exists. In addition, if students are late to first period without being marked, they will not know whether they were officially marked down, and they will most likely not remember off the top of their head how many times they were tardy.
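The correlation coefficient and r² discussed above can be computed from paired data with the standard Pearson formula. The minimal Python sketch below does this. The function name `pearson_r` is illustrative, not from any particular library. The only real value used is the r = .168 reported in this study, which is squared to confirm the r² of about .028.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# r^2 is simply r squared; using the r reported in this study:
r = 0.168
r_squared = r ** 2
print(round(r_squared, 3))  # 0.028
```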

One influential point we found in our data was a student who lived 7 miles away and had 20 tardies to their first period class. The other students did not necessarily live any closer, but they had substantially fewer tardies so far this year. This influential point matters because it pulls the slope of the line of regression up much more than it would be if that point were not there.
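One way to measure how much a single point pulls on the line is to refit the least-squares slope without that point and compare the two slopes. The Python sketch below does this. `ls_slope` and `slope_without` are hypothetical helper names, and the inputs are any lists of distances and tardy counts, not our recorded data.

```python
def ls_slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx for paired (x, y) data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    return sxy / sxx

def slope_without(xs, ys, i):
    """Slope refit after dropping the i-th point, to gauge its influence."""
    return ls_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
```

If the slope changes a lot when one point is removed, that point is influential.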

An example of interpolation on the data we collected would be a student who lives 9 miles from the school and has roughly 2 tardies to first hour. This is interpolation because the distance lies within the range of our data set and the point lies along the line of regression. Extrapolation, by contrast, means making a prediction beyond the data we collected, for example estimating tardies for a student who lives farther from school than anyone in our sample; the line of regression may not hold there. A student who lives three miles away and has 35 tardies so far this year would instead lie far outside the tardy counts we collected, and if that student were in our data set, the point would be an outlier that pulls strongly on the line of regression.

The amount of explained variation in the information we collected was 2.8%, the value of r². This leaves 97.2% of the variation in tardies unexplained by distance. That share is due to other factors, such as lurking variables, while only 2.8% is accounted for by how far students live from school.

Marginal change is the number of units the response variable changes for each one-unit change in the explanatory variable. For our data, the line of regression showed that for every additional mile a student lives from school, the predicted number of tardies increases by about .15. This was very surprising to see at the end of our study.
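A least-squares line always passes through the point (x̄, ȳ), so the summary values reported in this study are enough to sketch the line in point-slope form. The Python sketch below uses that form to reproduce the interpolation, marginal change, and unexplained variation figures. It assumes the rounded values reported above (x̄ = 4.64, ȳ = 1.3, slope = .15, r = .168), so its predictions are approximate. The name `predict_tardies` is our own illustration.

```python
# Summary values reported in this study (rounded)
X_BAR = 4.64   # mean distance from school, miles
Y_BAR = 1.3    # mean first hour tardies
SLOPE = 0.15   # marginal change: predicted tardies per additional mile
R = 0.168      # correlation coefficient

def predict_tardies(miles):
    """Point-slope form of the regression line through (X_BAR, Y_BAR)."""
    return Y_BAR + SLOPE * (miles - X_BAR)

print(round(predict_tardies(9), 2))                        # 1.95
print(round(predict_tardies(6) - predict_tardies(5), 2))   # 0.15
print(round(1 - R ** 2, 3))                                # 0.972
```

The prediction of about 1.95 tardies at 9 miles matches the "roughly 2 tardies" interpolation example. Such predictions are only trustworthy for distances within the range we actually surveyed.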

In conclusion, although my partner and I thought our study would lead to a moderate to

strong positive correlation, we were wrong and instead found almost no correlation at all.