My partner and I chose to study whether the distance, in miles, a student lives from school is
related to the number of tardies that student has to their first hour class. The explanatory variable
in this study was the number of miles the student lived from the school, while the response variable
was the number of tardies that student had to first hour. Going into the study we expected a moderate
positive correlation, but after recording the data and creating a scatter plot we found there was
almost no correlation. Most of the responses for tardies in first hour were either zero or a
low number below 10. Several students reported the same distance and the same number of tardies,
so their points fall on the same spot in the scatter plot, making it look like there are fewer
points than there really are. This is the explanation for the points stacked over one another within our scatter plots.
After this, we were able to find our x bar and y bar amounts, which are the averages of the two sets
of data we collected. The x bar amount, or average number of miles that students lived away
from school, was 4.64 miles. Our y bar amount, or average number of tardies to first hour, was 1.3. So
the average student in our sample lived 4.64 miles away from school and had 1.3 tardies to first hour.
After finding this information we found the correlation coefficient of our data, which was
r = .168. Since this number is so close to zero, it shows our data has a very weak positive
correlation, close to no correlation. We then found our r² value, which was .028.
This tells us that only about 2.8% of the variation in tardies is explained by the linear
relationship with distance. The points are tightly clustered in some parts of the plot, with several
outliers that our line of regression does not come close to. Possible lurking variables are that
attendance tends to be better early in the school year, and that students could be tardy without it being
documented. Both could play a role in showing no correlation, and data from the end of the
school year would give us a better picture of whether there is a correlation or not. If students
actually are late to first period and are not marked for it, they will not know whether
they were officially marked down and will most likely not remember the number of times they
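As a quick check of the figures above, the short sketch below squares the reported correlation
coefficient to get r² and splits the variation in tardies into explained and unexplained shares.
Only r = .168 comes from our study; the variable names are made up for illustration.

```python
# Correlation coefficient reported in the study (distance vs. first hour tardies).
r = 0.168

# r squared: the share of the variation in tardies that is explained by the
# linear relationship with distance.
r_squared = r ** 2

# Express the explained and unexplained shares as percentages.
explained_pct = round(r_squared * 100, 1)
unexplained_pct = round(100 - explained_pct, 1)

print(f"r^2 = {r_squared:.3f}")
print(f"explained: {explained_pct}%  unexplained: {unexplained_pct}%")
```

The rounded results match the .028, 2.8%, and 97.2% figures quoted in this report.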
One influential point we found in our data was a student who lived 7 miles away and had
20 tardies to their first period class. The remaining students did not necessarily live closer, but
had substantially fewer tardies so far this year. The significance of this influential point is that it pulls
the slope of the line of regression up much more than it would be if that point were not there.
miles away from the school, and had roughly 2 tardies first hour. This would be an example of
interpolation because it lies within the range of our data, along the line of regression.
Extrapolation means using the line of regression to predict beyond the range of distances we
collected, where the prediction is less reliable. A student who lived three miles away and had 35
tardies so far this year would be an outlier rather than an extrapolation, and could also be an
influential point if it were in our data set. The amount of explained variation we found within the
information we collected was 2.8%. This leaves 97.2% of the variation in tardies unexplained by
distance, meaning it is due to other factors, including lurking variables. Marginal change is the
number of units the response variable changes for each one-unit change in the explanatory variable.
After forming our line of regression we found that for every additional mile a student lives away
from school, their predicted number of tardies increases by .15. This was very surprising to see at the end of
our study.
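To make the marginal change concrete, here is a minimal sketch that rebuilds an approximate line
of regression from the numbers reported above. It relies on the fact that a least-squares line
always passes through the point (x bar, y bar). The slope of .15 is rounded, so the intercept is
only approximate, and the function name predict_tardies is hypothetical.

```python
# Values reported in the study.
x_bar = 4.64   # average distance from school, in miles
y_bar = 1.3    # average number of first hour tardies
slope = 0.15   # marginal change: extra predicted tardies per extra mile (rounded)

# A least-squares line passes through (x_bar, y_bar), so the intercept
# can be recovered from the slope and the two averages.
intercept = y_bar - slope * x_bar


def predict_tardies(miles):
    """Predicted first hour tardies for a student living `miles` from school."""
    return intercept + slope * miles


# Compare the influential student (7 miles, 20 tardies) with the line's prediction.
predicted_at_7 = predict_tardies(7)
residual_at_7 = 20 - predicted_at_7

print(f"intercept is about {intercept:.3f}")
print(f"predicted tardies at 7 miles: {predicted_at_7:.2f}")
print(f"residual for the 20-tardy student: {residual_at_7:.2f}")
```

Under these assumptions the line predicts fewer than 2 tardies at 7 miles, which shows how far the
student with 20 tardies sits above the rest of the data.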
In conclusion, although my partner and I expected our study to show a moderate
positive correlation, we were wrong and instead found almost no correlation at all.