The following information was given to show the winning height (in centimeters) of the men's high jump at the Olympic Games. Note that the Olympic Games did not occur in 1940 and 1944.
The graphs above show the winning heights of the men's high jump at the Olympic Games. The x-axis is labelled with the year in which the Olympic Games took place, which follows a pattern of every four years starting from 1932, with the exception of 1940 and 1944, when the Games did not occur. The y-axis is labelled with the height (in centimeters) of the gold medalist's high jump at each Games.

Parameters
The parameter of this data is the year in which the Olympic Games took place, because it is what determines the high jump score. In other words, the year determines the score, as it reflects how well the gold medalist trained and what technology was available that year to help achieve that score. For example, if 1932 is Person A's first score, then Person A would improve and achieve higher scores as time progresses. So by 1948, Person A would have a higher score than in 1932, having undergone more training and having better equipment. Also, the chart displays the year in its first row and, as is shown in the graph, the year is placed on the x-axis. Moreover, if the data is put into an equation, there is an x-value which determines the value of y, which in this case is the height of the high jump. For example, if an equation such as $y = 3x^2 + 7x + 8$ is used to model the data, y would be the height and x would be the year in which the event took place. The year (the independent variable) is the value that determines the height (the dependent variable) of the gold medalist's jump in that year. The x-value is needed in order to find the height (y). Thus, the year is the parameter of the above set of data showing the heights of the high jump at the Olympic Games.
Constraints
One constraint of this task is that, when performing a regression analysis on the data, it is difficult to find an exact equation that models the data perfectly, as the data has some outliers and does not completely follow a pattern. The data does not follow an absolute pattern, as the points of the equation $y = x^2$ do, and so it is difficult to determine an exact value (i.e. a future or a previous high jump score) from a regression equation fitted to the data. Furthermore, the regression analysis does not make allowance for outliers, which can throw off estimates of future scores. Thus, the regression analysis only gives an estimate and not necessarily the actual value. Another constraint of this task is that, because the Olympic Games did not occur in 1940 and 1944, any interpretation, analysis or equation derived from the data is less accurate, as two points are missing. The absence of those two years leaves a gap in the data and in any pattern that it could have contained. Also, these two points are not the first or the last points in this set of data; rather, they are located within the data itself.
Year         1932  1936  1948  1952  1956  1960  1964  1968  1972  1976  1980
Height (cm)   197   203   198   204   212   216   218   224   223   225   236
The graphs of the standardized years look as follows. The actual look of the graph does not change much: the window settings have been altered to fit the data, but there are no major changes to the shape of the graph itself. Please note that the first coordinate does not have a negative x-value. Technical problems limited the appearance of the first coordinate when Xmin is set to 0, which is why Xmin is set to -10.
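The standardization of the years can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the coordinates (0, 197) and (48, 236) used later suggest) that each year is standardized by subtracting 1932; the names `BASE_YEAR` and `standardize` are illustrative and not part of the original task.

```python
# Olympic years with a recorded winning height (the 1940 and 1944 Games were not held).
YEARS = [1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980]
HEIGHTS_CM = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]

BASE_YEAR = 1932  # assumption: standardized year = year - 1932


def standardize(year: int) -> int:
    """Return the number of years elapsed since the 1932 Games."""
    return year - BASE_YEAR


# Pair each standardized year with its winning height.
points = [(standardize(y), h) for y, h in zip(YEARS, HEIGHTS_CM)]
print(points)
```

Working with small x-values such as 0 to 48 rather than 1932 to 1980 keeps the coefficients of any fitted equation easier to read.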
Graph of $y = \sqrt{x}$
This does not look fully like the plot of data that was constructed, because shifts, stretches/compressions and/or restrictions still need to be applied to the parent function. I chose to use this function because its shape is very similar to the data. It is difficult to see the resemblance between the data points and the root function at such a zoomed-in setting; however, when the graph is zoomed out a little, it becomes easier to see that both graphs have a similar shape. Below is a display of the graph of the Olympic high jump heights at a different zoom setting.
The reason I chose the root function to model this data is its domain and range (i.e. all of the values lie in the positive quadrant of the Cartesian plane). Its domain is $D: \{x \in \mathbb{R},\ x \ge 0\}$ and its range is $R: \{y \in \mathbb{R},\ y \ge 0\}$, which is very similar to how the data is graphed (i.e. all the values lie in the positive quadrant of the plane). Another reason this function was chosen is its shape. The shape is very similar to the pattern that the high jump height data follows, in that the coordinates rise to a certain point and then keep increasing. However, the key point is that the rate of increase is, in context, extremely slow to insignificant. Similarly, the graph of the high jump heights increases, but as each year passes it increases at such a slow rate that the increase is hardly noticeable when looking at the data over a short time period. For example, if the root function is zoomed in (representing a short time period), the change is difficult to see. The graph below illustrates this judgement.
Zoomed In Function $y = \sqrt{x}$
The zoomed-in view shows that the function does not increase by a large amount; there is, however, a slow increase. This further emphasizes how the increase is so slow that it is considered insignificant. In order to create the equation to model this data, the a, k, c and d values must be identified in the base function. The base function is $y = a f[k(x - d)] + c$, where a gives a vertical stretch or compression and a reflection in the x-axis, f is the parent function, k gives a horizontal stretch or compression and a reflection in the y-axis, x is the variable, d is a shift along the x-axis and c is a shift along the y-axis. All these values must now be determined and inserted into the equation $y = a\sqrt{k(x - d)} + c$, i.e. the root function. The c-value is 197, determined by the y-intercept of the graph of high jump heights being at 197. The d-value is 0, as there is no need for a horizontal shift. The a-value also needs no change and thus stays at 1. The k-value, however, is 31. This means that the graph of $y = \sqrt{x}$ is compressed horizontally by a factor of 1/31, which is equivalent to a vertical stretch by a factor of $\sqrt{31} \approx 5.57$. I obtained the value of 31 systematically by substituting values so that the y-value would be close to the last coordinate (48, 236). My exact method was to substitute the x-value into the equation and solve for y, i.e. $f(48) = \sqrt{31(48)} + 197 \approx 236$. Thus, the equation to model the set of data is $f(x) = \sqrt{31x} + 197$. This function is graphed below.
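The substitution check described above can be reproduced with a short sketch. It assumes the first-attempt equation is the square root of 31x plus 197 (a = 1, k = 31, d = 0, c = 197); the function name `initial_root_model` is illustrative only.

```python
import math


def initial_root_model(x: float) -> float:
    """First attempt from the text: a = 1, k = 31, d = 0, c = 197."""
    return math.sqrt(31 * x) + 197


# The y-intercept fixes c, and the last data point (48, 236) guides the choice of k.
print(initial_root_model(0))
print(round(initial_root_model(48), 2))
```

Trying other k-values in the same way shows how the end of the curve moves towards or away from the last data point.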
This makes more sense when put into the right context. Below is the graph of the function $f(x) = \sqrt{31x} + 197$ with the high jump score coordinates.
As is evident in the above graph, the equation $y = \sqrt{31x} + 197$ does not model the data well and is, therefore, inaccurate. A number of differences have arisen between the curve and the data. Firstly, the most obvious one is that the graph of the root function only comes close to three points, i.e. (0, 197), (4, 203) and (48, 236). These points are close enough to the curve to be considered part of the shape of the root function; however, the curve is simply not in the right position to be considered a model of the set of data. Another difference is that the shape of the root function does not precisely correspond with the shape of the high jump height data. Furthermore, this equation is flawed because it tries to model every data point, including outliers. The outliers throw off the accuracy of the root equation, which instead needs to follow the pattern that is evident within the graph. The limitation of the model is that the function only represents two points of the actual data set, and so it is limited to showing only those two points. Moreover, if one were to predict a given high jump score with the equation, the answer would be inaccurate. The same would happen when estimating a future high jump score: the estimate would be inaccurate because the curve only comes into contact with two of the points and does not come close to most of the others. In order to refine the modeling of the data points, the pattern within the high jump heights needs to be clearly identified. This is illustrated below.
The circled area represents a pattern. More specifically, it shows the shape of the square root function. This makes it easier to model the data, as the outliers are now identified and will not interfere with the accuracy of the model. After many attempts at changing the original function $y = \sqrt{31x} + 197$ by trial and error, the final equation to model the gold medal heights is $y = \sqrt{132x + 80} + 155$. This equation, graphed alongside the gold medal high jump heights, is shown below.
As is shown in the previous graph, the new, refined equation $y = \sqrt{132x + 80} + 155$ models the data much better, as it is closer to more points on the graph, making it a better equation of best fit. Moreover, it better fits the pattern that was noted previously, making it a better function to model the set of data than the previous equation. The +80 inside the root comes from a negative d-value: in the base function $y = a f[k(x - d)] + c$ the shift appears as $x - d$, so a negative d gives two negative signs which simplify into a positive. Factoring gives $132x + 80 = 132(x + 80/132)$, so $d = -80/132 \approx -0.61$. The d-value was included because a negative shift was needed, which pushes the whole function left so that it extends slightly into the upper-left quadrant of the Cartesian plane, making the domain $D: \{x \in \mathbb{R},\ x \ge -80/132\}$.
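The claim that the refined curve lies closer to more of the data points can be checked with a rough sketch. It assumes standardized years (year minus 1932) and uses an arbitrary closeness threshold of 4 cm chosen purely for illustration; `TOLERANCE_CM` and `points_within` are hypothetical names, not part of the original analysis.

```python
import math

# Standardized years (year - 1932) and winning heights in centimeters.
X = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
HEIGHTS_CM = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]

TOLERANCE_CM = 4  # assumption: "close" means within 4 cm of the curve


def initial_model(x: float) -> float:
    return math.sqrt(31 * x) + 197


def refined_model(x: float) -> float:
    return math.sqrt(132 * x + 80) + 155


def points_within(model, tol: float = TOLERANCE_CM) -> int:
    """Count the data points whose height is within tol cm of the model."""
    return sum(1 for x, h in zip(X, HEIGHTS_CM) if abs(model(x) - h) <= tol)


print("initial:", points_within(initial_model))
print("refined:", points_within(refined_model))
```

Under this threshold the refined curve is close to more of the points than the first attempt, while deliberately missing the first two points that were treated as outliers.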
The judgement that the linear and root functions have similar slopes is clearly expressed in the above diagram of the functions $y = \frac{1}{5}x + 1.5$ and $y = \sqrt{x}$. As is evident from the diagrams, the linear equation and the root equation have similar slopes, which means that they both increase very slowly. This shows why the linear function would be a great fit for modeling the data.
The r value, or correlation coefficient, shows that the data points are closely related when analyzed by a linear regression (i.e. the line of best fit is very close to the data points). As is evident, the equation for the linear model is $y = 0.755x + 194.138$. Notice how the slope of the line is very low, i.e. below 1, which was observed previously. The next page shows the equation when it is graphed alongside the data points and the root function equation $y = \sqrt{132x + 80} + 155$.
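The linear regression can also be reproduced by hand with the ordinary least-squares formulas. The sketch below is not the GDC's own routine; it assumes the x-values are the standardized years (year minus 1932), and the helper name `linear_regression` is illustrative.

```python
import math

# Standardized years (year - 1932) and winning heights in centimeters.
X = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
Y = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def linear_regression(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r


slope, intercept, r = linear_regression(X, Y)
print(f"y = {slope:.3f}x + {intercept:.3f}, r = {r:.3f}")
```

The slope and intercept agree with the linear model quoted above to three decimal places.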
Linear Regression Line, $y = \sqrt{132x + 80} + 155$ and High Jump Heights
A few differences should be noted between the two graphs used to model the data. The first is that the linear function is able to represent all the data points, whereas the root function is restricted to the pattern that was identified previously. Another difference is that the domain and range of the linear function are $D: \{x \in \mathbb{R}\}$ and $R: \{y \in \mathbb{R}\}$ respectively. Note that the x and y values of the linear function cover all real numbers, including negatives, whereas the y values of the root function are positive only. Also note that the root function has a definite starting point near the y-axis (it passes through (0, 163.944), i.e. $y = \sqrt{132(0) + 80} + 155 \approx 163.944$), whereas the linear function does not have a starting point. Another difference is that, over the long run, the linear function is steeper than the root function: its slope is constant, whereas the slope of the root function keeps decreasing. This shows that if the data set is extended along the line, the predicted heights will continue to increase at a higher rate than under the root function. The linear function would, in this case, be less accurate for estimating future scores, as the nature of the data is that it can only reach a certain point and cannot increase any higher. This is because high jumpers can only jump so high, even if they undergo many extra years of training. The linear function is thus accurate when modeling the data set in the context of the data only, and under the notion that the data will continue to rise. However, the root function is more contextually sound, as it reflects that even though the data follows a pattern, there is a limit to that pattern. Furthermore, the root function increases very slowly after it reaches a certain point, which can be expected of future scores. Future scores will not constantly increase, because this data comes from a human sport and the human is a finite being that can only go up to a certain point. Moreover, the data will have a very low chance of increasing after it reaches its maximum. The notion that the slope of the root function decreases is illustrated by looking at its derivative at certain points along the curve. Using the GDC, the derivative at various points along the curve will be found. See below.
As is shown above, it is clear that the root function increases, but at a decreasing rate. For example, the slope drops by ~0.442 between x = 4 and x = 6. The slope keeps decreasing as it moves along the curve horizontally, and the drops become smaller, as the following calculation shows: between x = 48 and x = 50 the slope drops by only ~0.016.
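The slopes can be checked without a GDC. The sketch below uses the analytic derivative of the root model, 66 divided by the square root of (132x + 80), and compares it with a central-difference estimate; it is only an illustration, and the function names are assumptions rather than anything defined in the original work.

```python
import math


def root_model(x: float) -> float:
    return math.sqrt(132 * x + 80) + 155


def root_slope(x: float) -> float:
    """Analytic derivative: d/dx sqrt(132x + 80) = 66 / sqrt(132x + 80)."""
    return 66 / math.sqrt(132 * x + 80)


def numeric_slope(x: float, h: float = 1e-5) -> float:
    """Central-difference estimate, used only to cross-check the formula."""
    return (root_model(x + h) - root_model(x - h)) / (2 * h)


for x in (4, 6, 48, 50):
    print(x, round(root_slope(x), 3), round(numeric_slope(x), 3))

# How much the slope falls over two early years versus two late years.
early_drop = root_slope(4) - root_slope(6)
late_drop = root_slope(48) - root_slope(50)
print(round(early_drop, 3), round(late_drop, 3))
```

The slope stays positive everywhere, so the curve never turns downward; it only flattens.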
The reason I chose to use the linear regression model to find the coordinates is that the linear model is more accurate than the root function model. The root model is human-made and is likely to contain errors, and it tends to be less precise than the linear function, which is found by means of a technological device. The technological device (GDC) is designed so that it makes fewer errors than human-made calculations and is also more accurate. An obvious reason why the estimated values are sound is that they are not outliers and are very close to the real values. Another reason the linear function was used is that it is a model of all of the data, whereas the root function is limited to a pattern within the data set. The image on the next page shows how the scores for 1940 and 1944 would be inaccurate if determined through the root function.
Another reason the linear regression function was used to estimate the values is that, when the 1940 and 1944 values are plotted alongside the linear regression equation (i.e. the one obtained without the new coordinates), they are part of the line of best fit. This can be more clearly visualized below.
As is shown in the image above, both of the points lie on the linear regression line and are, therefore, accurate when shown alongside the other, real values.
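The interpolation for the two missing Games can be sketched as follows. The sketch assumes the rounded regression coefficients quoted earlier (slope 0.755, intercept 194.138) and standardized years equal to the year minus 1932; the printed heights are what this sketch produces and are not values quoted from the original work.

```python
SLOPE = 0.755        # rounded linear regression slope from the text
INTERCEPT = 194.138  # rounded linear regression intercept from the text
BASE_YEAR = 1932     # assumption: standardized year = year - 1932


def linear_model(x: float) -> float:
    """Height in centimeters predicted by the regression line."""
    return SLOPE * x + INTERCEPT


estimates = {}
for year in (1940, 1944):
    x = year - BASE_YEAR
    estimates[year] = linear_model(x)
    print(year, x, round(estimates[year], 3))
```

Because both estimates are computed from the regression line itself, they lie on that line by construction.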
To estimate the winning heights for 1984 and 2016, the standardized years are inserted into the root equation $y = \sqrt{132x + 80} + 155$. The reason for inserting the values into the root equation and not the linear equation is that, as was stated previously, the linear function is less accurate when extrapolating because it assumes that the data will constantly increase. On the contrary, the data will level off at a certain point in the long term, because the results come from humans, who are finite beings. The root function, on the other hand, is more contextually based and is therefore more suitable for predicting the scores. The method for finding the value for 1984 is as follows: 52 is the standardized value for the year 1984 (i.e. 1984 - 1932 = 52). Now, substitute 52 into the equation $y = \sqrt{132x + 80} + 155$ like this,
$y = \sqrt{132(52) + 80} + 155 = \sqrt{6944} + 155 \approx 238.331$
Thus, the coordinate for the 1984 winning high jump score would be (52,238.331). Below is the graph of the point alongside the original high jump scores data and the root model equation.
The data here is increasing, but by a small increment, as the slope of the function reflects its very slow rate of increase. Now, the estimated score for 2016 will be calculated. The standardized value for the year 2016 is 84 (i.e. 2016 - 1932 = 84). 84 will be substituted into the equation $y = \sqrt{132x + 80} + 155$, which is then solved for y.
$y = \sqrt{132(84) + 80} + 155 = \sqrt{11168} + 155 \approx 260.679$
Thus, the coordinate for the height during 2016 is (84,260.679). On the next page is its graph alongside the extrapolated 1984 coordinate and the genuine high jump scores.
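The two extrapolations can be reproduced with a short sketch. It assumes the refined root model and standardized years equal to the year minus 1932; the function name `root_model` is illustrative.

```python
import math

BASE_YEAR = 1932  # assumption: standardized year = year - 1932


def root_model(x: float) -> float:
    """Refined root model from the text: y = sqrt(132x + 80) + 155."""
    return math.sqrt(132 * x + 80) + 155


predictions = {}
for year in (1984, 2016):
    x = year - BASE_YEAR
    predictions[year] = root_model(x)
    print(year, x, round(predictions[year], 3))
```

The same function can be evaluated at any other standardized year, although estimates far outside 0 to 48 rest entirely on the assumption that the root-shaped trend continues.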
The answers for both values seem to follow the trend set by the root function, which is a slow, gradual increase. Also, note that the 2016 value has increased by a larger amount than the 1984 value. This is perhaps due to increased training time and/or technological advances which have aided high jump scores.
Additional Data
The following data was given to show additional gold medalist high jump scores.
Year: 1896 1904 1908 1912 1920 1928 1984 1988 1992 1996 2000 2004 2008
Standardized year: -36 -28 -24 -20 -12 -4 52 56 60 64 68 72 76
Height (cm): 180 191 193 193 194 235 236 236
This data will be graphed below alongside the root and linear model functions.
The root function does not model the data well, as its slope is too steep for the data and the function keeps rising at a rate that is faster than the rate of increase of the data itself. The linear function is a little better than the root function because it comes closer to more of the points; however, it does not model the newer data well. The linear function is, therefore, only good for interpolating within the values that were given (i.e. the original scores from 1932 to 1980). In the negative region, the graph of this data starts off high, dips down and then increases again. This pattern also occurs at the points after 1980, where it occurs more frequently. In other words, the dip-and-increase pattern occurs throughout the data, but after every few repetitions of the pattern the heights slowly increase overall. The modification that needs to be made to the root function is that a vertical compression and a horizontal stretch need to be applied. The reason for the vertical compression is that it will reduce the slope to make the function fit the data better. The horizontal stretch spreads a given rise over a longer interval. For example, if, before the horizontal stretch, the function rises by a certain amount between x = 4 and x = 5, then after the stretch the same rise would be spread over a longer interval, for example from x = 4 to x = 8.
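The effect of the two proposed modifications on the slope can be illustrated with a small sketch. The compression factor a = 0.8 and the halved k-value of 66 (a horizontal stretch by a factor of 2) are hypothetical numbers chosen only for illustration; they are not fitted values and do not come from the original work.

```python
import math


def slope(x: float, a: float, k: float) -> float:
    """Slope of y = a*sqrt(k*x + 80) + c at x (c does not affect the slope)."""
    return a * k / (2 * math.sqrt(k * x + 80))


x = 60  # standardized year for 1992
original = slope(x, a=1.0, k=132)    # the refined root model as it stands
compressed = slope(x, a=0.8, k=132)  # vertical compression only (hypothetical a)
stretched = slope(x, a=1.0, k=66)    # horizontal stretch only (hypothetical k)

print(round(original, 3), round(compressed, 3), round(stretched, 3))
```

Both changes lower the slope at this point, which is the direction needed for the curve to follow the flatter post-1980 data more closely.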