
Math Portfolio 2

The following data give the winning height (in centimeters) of the men's high jump at the Olympic Games. Note that the Olympic Games did not take place in 1940 and 1944.

Winning Men's High Jump Height at Olympic Games


Year    Height (cm)
1932    197
1936    203
1948    198
1952    204
1956    212
1960    216
1964    218
1968    224
1972    223
1976    225
1980    236

Height of Jump at the Olympic Games

Window Settings of the Graph

As shown in the graphs above, the winning height of the men's high jump at the Olympic Games is plotted against the year. The x-axis is labelled as the year in which the Olympic Games took place; the Games occur every four years starting from 1932, with the exception of 1940 and 1944, when they were not held. The y-axis is labelled as the height (in centimeters) of the gold medalist's winning jump.

Parameters

The parameter of this data is the year in which the Olympic Games took place, because it is the quantity on which the high jump score depends. In other words, the year determines the score, since it reflects how well the gold medalist was able to train and what technology was available that year to help them achieve that score. For example, if Person A set their first score in 1932, they would be expected to improve as time progresses: by 1948, Person A would have a higher score than in 1932 because they would have undergone more training and had better equipment. Also, the table lists the year first and, as shown in the graph, the year is placed on the x-axis. Moreover, if the data is put into an equation, the x-value determines the value of y, which in this case is the height of the high jump. For example, if an equation such as $y = 3x^2 + 7x + 8$ were used to model the data, y would be the height and x would be the year in which the event took place. The year (the independent variable) determines the height (the dependent variable) of the gold medalist's jump in that year, so the x-value is needed to find the height (y). Thus, the year is the parameter of the above set of data showing the heights of the high jump at the Olympic Games.
Constraints

The constraints of this task are that, when performing a regression analysis on the data, it would be difficult to find an equation that models the data perfectly, because the data has some outliers and does not completely follow a pattern. The data does not follow an exact pattern like the points of $y = x^2$, so it would be difficult to determine an exact value (i.e. a future or a previous high jump score) from a regression equation based on the data. Furthermore, a regression analysis treats outliers like any other point, so they can pull the equation away from the general trend and throw off estimates of future scores. Thus, the regression analysis would only give an estimate, not necessarily the actual value. Another constraint of this task is that, because the Olympic Games did not take place in 1940 and 1944, any analysis or equation derived from the data is less accurate due to the two missing points. The absence of those two years leaves a gap in the data and in any pattern it might have shown. Also, these two missing points are not the first or the last points in the data set; they fall within the data itself.

Standardizing the Data


In order to standardize the data, I will represent the years from 1932 to 1980 by smaller, more manageable x-values: x is the number of years since 1932, so x = 0 at 1932, x = 4 at 1936 and so on. The purpose of standardizing is to replace the large year values with small numbers that follow a simple, regular pattern (multiples of 4), which puts the years on a common starting point and makes them easier to work with. The table of values of the standardized gold medalist high jump heights is shown below, with the standardized year in the middle column. Note that the standardized year does not simply count the Olympic Games held (i.e. it is not x = 1 for 1932, x = 2 for 1936 and so on). Instead, it counts every calendar year, whether or not the Olympic Games were held that year, which is why 1948 corresponds to x = 16.

Year    Standardized year (x)    Height (cm)
1932     0                       197
1936     4                       203
1948    16                       198
1952    20                       204
1956    24                       212
1960    28                       216
1964    32                       218
1968    36                       224
1972    40                       223
1976    44                       225
1980    48                       236
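As a small illustration (a Python sketch, not part of the original GDC work), the standardization rule x = year - 1932 can be applied to every year in the table at once. The names `standardize` and `BASE_YEAR` are hypothetical and chosen only for this example.

```python
# Winning heights (cm) from the table above; 1940 and 1944 are absent
# because the Games were not held.
years = [1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]

BASE_YEAR = 1932  # hypothetical constant: x = 0 at 1932


def standardize(year: int, base: int = BASE_YEAR) -> int:
    """Return the number of calendar years since the base year."""
    return year - base


xs = [standardize(y) for y in years]

for year, x, h in zip(years, xs, heights):
    print(f"{year}  x = {x:>2}  height = {h} cm")
```

Because every calendar year is counted, the missing 1940 and 1944 Games show up as the jump from x = 4 to x = 16.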

The graph of the standardized data is shown below. There is not much change in the appearance of the graph: the window settings have been altered to fit the data, but the shape of the graph itself has not changed. Please note that the first coordinate has an x-value of 0, not a negative x-value. Because of technical problems, the first coordinate did not display when Xmin was set to 0, which is why Xmin is set to -10.

Standardized Year of High Jump

Window Settings of the Graph

Modeling the Data


The best type of function to model the behavior of this set of data would be the root function. The equation of the parent root function is $y = \sqrt{x}$. The graph of the root parent function is shown below.

Graph of $y = \sqrt{x}$

This does not look exactly like the plot of the data that was constructed, because shifts, stretches/compressions and/or restrictions still need to be applied to the parent function. I chose this function because its shape is very similar to that of the data. It is difficult to see the resemblance between the data points and the root function at such a zoomed-in setting; however, when the graph is zoomed out a little, it becomes easier to see that both graphs have a similar shape. Below is the graph of the Olympic high jump heights at a different zoom setting.

High Jump Heights

Window Settings of the Graph

I chose the root function to model this data because of its domain and range (all of the data values lie in the first quadrant of the Cartesian plane). The root function has a similar domain and range: its domain is $D: \{x \in \mathbb{R} \mid x \geq 0\}$ and its range is $R: \{y \in \mathbb{R} \mid y \geq 0\}$, which matches how the data is plotted (all the values lie in the first quadrant). Another reason this function was chosen is its shape, which is very similar to the pattern that the high jump data follows: the values rise quickly at first and then continue to increase only slowly. The key point is that, in context, the rate of increase becomes extremely slow, almost insignificant. Similarly, the high jump heights rise up to a point, but from then on they increase so slowly that the increase is hardly noticeable over a short time period. For example, if the root function is zoomed in (representing a short time period), the change is difficult to see. The graph below illustrates this.

Zoomed-In Function $y = \sqrt{x}$

Window Settings of the Graph

The zoomed-in graph shows that the function is not increasing by a large amount, although there is a slow increase. This emphasizes how slow the increase is, to the point of being almost insignificant. To create the equation that models this data, the a, k, c and d values of the base function must be identified. The base function is $y = a f[k(x - d)] + c$, where a is a vertical stretch or compression (with a reflection in the x-axis if a is negative), f is the parent function, k is a horizontal stretch or compression (with a reflection in the y-axis if k is negative), x is the variable, d is a shift along the x-axis and c is a shift along the y-axis. For the root function, these values are inserted into $y = a\sqrt{k(x - d)} + c$. The c-value is 197, determined by the y-intercept of the graph of the high jump heights, which is 197. The d-value is 0, as there is no need for a horizontal shift. The a-value also needs no change and therefore stays at 1. The k-value, however, is 31. This means that the graph of $y = \sqrt{x}$ is compressed horizontally by a factor of $\frac{1}{31}$ (equivalently, since $\sqrt{31x} = \sqrt{31}\sqrt{x}$, stretched vertically by a factor of $\sqrt{31} \approx 5.57$). I obtained the value of 31 systematically, by substituting values so that the y-value would be close to the last coordinate (48, 236). My exact method was to substitute the x-value into the equation and solve for y, i.e. $f(48) = \sqrt{31 \times 48} + 197 \approx 235.6 \approx 236$. Thus, the equation to model the set of data is $f(x) = \sqrt{31x} + 197$. This function is graphed below.

Graph of the Function $f(x) = \sqrt{31x} + 197$

Window Settings of the Graph

This makes more sense when put into the right context. Below is the graph of the function $f(x) = \sqrt{31x} + 197$ with the high jump score coordinates.

High Jump Heights and Graph of $f(x) = \sqrt{31x} + 197$

Window Settings of the Graph
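To see numerically how far the first model is from each point, a short Python sketch can evaluate $f(x) = \sqrt{31x} + 197$ at every standardized year and print the residual (actual height minus model value). The function name `first_model` is hypothetical; the sketch only re-checks the substitution described earlier and is not the author's method.

```python
import math

# Standardized years and winning heights (cm), 1932-1980.
xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def first_model(x: float, k: float = 31.0, c: float = 197.0) -> float:
    """First root model y = sqrt(k * x) + c, with a = 1 and d = 0."""
    return math.sqrt(k * x) + c


# The value of k was chosen so that the curve lands near (48, 236).
print(f"f(48) = {first_model(48):.2f}")

# Residual = actual height - model value; large residuals show a poor fit.
residuals = [h - first_model(x) for x, h in zip(xs, heights)]
for x, h, res in zip(xs, heights, residuals):
    print(f"x = {x:>2}  actual = {h}  residual = {res:+.1f}")
```

Small residuals appear only near the ends of the data, while the middle points sit well below the curve.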

As is evident in the above graph, the equation $y = \sqrt{31x} + 197$ does not model the data closely and is, therefore, inaccurate. A number of differences arise between the curve and the data. The most obvious is that the graph of the root function only comes close to three points, i.e. (0, 197), (4, 203) and (48, 236). These points are close enough for the data to be considered part of the shape of the root function; however, the curve is simply not in the right position to be considered a model of the data set. Another difference is that the shape of the root function does not precisely correspond to the shape of the high jump data. Furthermore, this equation is flawed because it tries to model every data point, including outliers. The outliers reduce the accuracy of the root equation, so the model should instead follow the pattern that is evident within the graph. The limitations of the model are that the function only closely matches two of the actual data points, (0, 197) and (48, 236), so it is limited to representing only those two points. Moreover, if one were to predict a given high jump score with this equation, the answer would likely be inaccurate. The same would happen when estimating a future high jump score, since the curve only meets two of the points and does not come close to most of the others. To refine the model, the pattern within the high jump data points needs to be clearly identified. This is illustrated below.

Pattern in the High Jump Score Data

The circled area represents a pattern; more specifically, its shape resembles the square root function. Identifying this pattern makes it easier to model the data, because the points outside it can now be recognized as outliers that should not interfere with the accuracy of the model. After many rounds of trial and error, adjusting the original function $y = \sqrt{31x} + 197$ to model this data, the final equation to model the gold medal heights is $y = \sqrt{132x + 80} + 155$. The equation, graphed alongside the gold medal high jump data, is shown below.

High Jump Heights and Graph of $y = \sqrt{132x + 80} + 155$

Window Settings of the Graph

As shown in the previous graph, the new, refined equation $y = \sqrt{132x + 80} + 155$ models the data much better, as it is closer to more points on the graph, making it a better curve of best fit. It also fits the pattern noted previously better than the previous equation did. Written in the base form $y = a\sqrt{k(x - d)} + c$, the equation becomes $y = \sqrt{132\left(x + \frac{80}{132}\right)} + 155$, so $a = 1$, $k = 132$, $c = 155$ and $d = -\frac{80}{132} \approx -0.606$; the term inside the root is positive because the two negative signs in $x - d$ simplify into a positive. The negative d-value shifts the whole function about 0.606 units to the left, so the curve starts in the upper left quadrant of the Cartesian plane, and the domain is $D: \{x \in \mathbb{R} \mid x \geq -\frac{80}{132}\}$.
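The refined model can be checked the same way. The Python sketch below (the function name `refined_model` is hypothetical) evaluates $y = \sqrt{132x + 80} + 155$ at the standardized years and computes the left end of its domain from $132x + 80 = 0$; it raises an error for x-values where the square root is undefined.

```python
import math

xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def refined_model(x: float) -> float:
    """Refined root model y = sqrt(132x + 80) + 155."""
    radicand = 132 * x + 80
    if radicand < 0:
        raise ValueError(f"x = {x} is outside the domain x >= -80/132")
    return math.sqrt(radicand) + 155


# The domain starts where the expression under the root equals zero.
domain_start = -80 / 132
print(f"domain: x >= {domain_start:.3f}")

for x, h in zip(xs, heights):
    print(f"x = {x:>2}  actual = {h}  model = {refined_model(x):.1f}")
```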

Using the Linear Function to Model the Data


A linear function can also be used to model the data, because a linear function with a positive slope, like the root function, is always increasing. If the slope of the linear function is very low, its rate of increase would be similar to that of the root function, i.e. a slow increase. Below is an illustration of how similar the linear and root functions can be.

Graph of Linear and Root Functions

Window Settings of the Graph

The judgement that the linear and root functions have similar slopes is clearly expressed in the above diagram of the functions $y = \frac{1}{5}x + 1.5$ and $y = \sqrt{x}$. As is evident from the diagram, the linear and root functions have similar slopes, which means that both increase slowly. This shows why a linear function could be a good fit for modeling the data.
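One way to make "similar slopes" concrete is to compare average rates of change over a few intervals. The Python sketch below (function names are hypothetical) does this for $y = \frac{1}{5}x + 1.5$ and $y = \sqrt{x}$: the line always rises at 0.2 per unit, while the root function starts steeper than the line and then becomes flatter.

```python
import math
from typing import Callable


def line(x: float) -> float:
    return x / 5 + 1.5


def root(x: float) -> float:
    return math.sqrt(x)


def average_rate(f: Callable[[float], float], x0: float, x1: float) -> float:
    """Average rate of change (slope of the secant) of f on [x0, x1]."""
    return (f(x1) - f(x0)) / (x1 - x0)


# Perfect squares keep the root values exact: sqrt(1)=1, sqrt(4)=2, ...
for x0, x1 in [(1, 4), (4, 9), (9, 16)]:
    print(f"[{x0}, {x1}]  line: {average_rate(line, x0, x1):.3f}  "
          f"root: {average_rate(root, x0, x1):.3f}")
```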

Finding the Equation of the Linear Function Using a GDC


The equation of the line of best fit for the high jump data will be obtained using a GDC. The equation will be in the form $y = ax + b$, where a is the slope of the line and b is its y-intercept (a vertical shift). The GDC also gives the r and $r^2$ values, which are the correlation coefficient and the coefficient of determination respectively. Together, these values make up the linear regression for the data set. The correlation coefficient is a number between -1 and 1 that measures how closely two variables are linearly related. If the data points lie exactly on a straight line with a positive slope, r = 1; if they lie exactly on a straight line with a negative slope, r = -1. In general, the closer the absolute value of r is to 1, the more closely the variables are related, and the closer it is to 0, the weaker the relationship. The coefficient of determination is the square of r and gives the proportion of the variation in y that is explained by the model; it is used to judge the fit when the relationship between the variables is not always increasing. For example, for points that form a parabola, which increases up to a certain point and then decreases, the coefficient of determination would be used. On the other hand, for data points that form an exponential curve, the correlation coefficient would be used, as the variables are always increasing. The correlation coefficient will be used here to measure the strength of the data's correlation, as the data is generally increasing (although it is apparent that the high jump scores cannot keep increasing forever, the data itself does not stop increasing). The a, b, r and $r^2$ values are shown in the image below.

Linear Regression of High Jump Heights
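The values in the image above come from the GDC's linear regression. Assuming the GDC uses ordinary least squares (the standard method for a linear regression), the same slope, intercept, r and $r^2$ can be reproduced with the short Python sketch below; `linear_regression` is a hypothetical helper written for this example, not a calculator function.

```python
import math

xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def linear_regression(x, y):
    """Least-squares line y = a*x + b, plus Pearson r and r**2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx               # slope
    b = mean_y - a * mean_x     # the line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r, r * r


a, b, r, r2 = linear_regression(xs, heights)
print(f"y = {a:.3f}x + {b:.3f}, r = {r:.3f}, r^2 = {r2:.3f}")
```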

The r value, or correlation coefficient, shows that the data points are closely related when analyzed by a linear regression (i.e. the line of best fit lies very close to the data points). As is evident, the equation of the linear model is $y = 0.755x + 194.138$. Notice how low the slope of the line is (below 1), as was observed previously. The graph below shows the equation alongside the data points and the root function $y = \sqrt{132x + 80} + 155$.

Linear Regression Line, $y = \sqrt{132x + 80} + 155$ and High Jump Heights

Window Settings of the Graph

A few differences should be noted between the two graphs used to model the data. The first is that the linear function is fitted to all of the data points, whereas the root function is restricted to the pattern identified previously. Another difference is that the domain and range of the linear function are $D: \{x \in \mathbb{R}\}$ and $R: \{y \in \mathbb{R}\}$ respectively. Note that the x- and y-values of the linear function include all real numbers, including negatives, whereas the y-values of the root function are never below 155, so they are all positive. Also note that the root function has a starting point (the left end of its domain), whereas the linear function does not; at x = 0, the root function has the value $y = \sqrt{132(0) + 80} + 155 \approx 163.944$. Another difference is that the linear function keeps a constant slope of about 0.755, while the slope of the root function keeps decreasing: the root function is still steeper than the line at x = 48, where its slope is about 0.824, but its slope falls below 0.755 at roughly x = 57, beyond which the linear function rises faster. This shows that, if the linear model were extended, it would keep increasing at a constant rate that eventually exceeds the rate of the root function. In this case, the linear function would be less accurate for estimating future scores, because the nature of the data is that it can only reach a certain point and cannot increase much further: high jumpers can only jump so high, even after many extra years of training. The linear function is therefore accurate when modeling the data set within the range of the data, and under the assumption that the data will continue to rise. However, the root function is more contextually sound, as it reflects the fact that, even though the data follows a rising pattern, there is a limit to how fast it can keep rising. Furthermore, once the root function reaches a certain point it increases very slowly, which is what can be expected of future scores: they will not increase at a constant rate, because this data comes from a human sport, and humans are finite beings who can only go up to a certain point. Moreover, the data has a very low chance of increasing after it reaches its maximum. The notion that the slope of the root function decreases is illustrated by looking at its derivative at certain points along the curve. Using the GDC, the derivative at several points along the curve will be found. See below.

Window Settings of the Derivative Graphs

Derivative When $x = 4$. Tangent Line Is $y = 2.677x + 168.951$ (Slope 2.677)

Derivative When $x = 6$. Tangent Line Is $y = 2.235x + 171.119$ (Slope 2.235)

Derivative When $x = 48$. Tangent Line Is $y = 0.824x + 195.549$ (Slope 0.824)

Derivative When $x = 50$. Tangent Line Is $y = 0.808x + 196.355$ (Slope 0.808)

As shown above, the root function increases, but at a decreasing rate: its slope falls from about 2.677 at x = 4 to about 0.808 at x = 50. For example, the slope drops by about 0.442 between x = 4 and x = 6, whereas it drops by only about 0.016 between x = 48 and x = 50, which shows that the slope changes more and more slowly as we move along the curve.
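The derivative values above can also be checked by hand. Differentiating $y = \sqrt{132x + 80} + 155$ with the chain rule gives $\frac{dy}{dx} = \frac{132}{2\sqrt{132x + 80}} = \frac{66}{\sqrt{132x + 80}}$. The Python sketch below (hypothetical helper names) uses this formula and builds the tangent line at each point, assuming the equations in the captions are the tangent lines drawn by the GDC.

```python
import math


def refined_model(x: float) -> float:
    return math.sqrt(132 * x + 80) + 155


def derivative(x: float) -> float:
    """Exact derivative: d/dx sqrt(132x + 80) = 66 / sqrt(132x + 80)."""
    return 66 / math.sqrt(132 * x + 80)


def tangent_line(x0: float) -> tuple[float, float]:
    """Slope m and intercept b of the tangent y = m*x + b at x0."""
    m = derivative(x0)
    return m, refined_model(x0) - m * x0


for x0 in (4, 6, 48, 50):
    m, b = tangent_line(x0)
    print(f"x = {x0}: y = {m:.3f}x + {b:.3f}")

print(f"slope drop from x=4 to x=6:   {derivative(4) - derivative(6):.3f}")
print(f"slope drop from x=48 to x=50: {derivative(48) - derivative(50):.3f}")
```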

Estimating the Scores for 1940 and 1944


To estimate the gold medalist high jump scores as if the Olympic Games had occurred in 1940 and 1944, the standardized values for these two years will be substituted into the linear regression equation, $y = 0.7550655542x + 194.1382598$. Note that the values have been kept to the full precision given by the GDC, because rounding them would reduce the accuracy of the estimates. First, the x-value 8 (1940) is substituted into the equation: $y = (0.7550655542 \times 8) + 194.1382598 \approx 200.179$. Therefore, the coordinate for the score in 1940 is (8, 200.179). The same steps are used to find the coordinate for the score in 1944: $y = (0.7550655542 \times 12) + 194.1382598 \approx 203.199$. Therefore, the coordinate for the score in 1944 is (12, 203.199). Below is a graph of the 1940 and 1944 scores, with arrows indicating their coordinates.

Estimates of the 1940 and 1944 Scores
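The two substitutions for 1940 and 1944 can be repeated in a few lines of Python, using the unrounded regression coefficients quoted in the text; `linear_model` is a hypothetical name used only here.

```python
# Unrounded coefficients of the GDC linear regression quoted in the text.
SLOPE = 0.7550655542
INTERCEPT = 194.1382598


def linear_model(x: float) -> float:
    """Linear regression y = ax + b with x = years since 1932."""
    return SLOPE * x + INTERCEPT


estimates = {}
for year in (1940, 1944):
    x = year - 1932  # standardized year
    estimates[year] = linear_model(x)
    print(f"{year}: x = {x}, estimated height = {estimates[year]:.3f} cm")
```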

I chose the linear regression model to find these coordinates because it is more accurate here than the root function model. The root model was built by hand through trial and error, so it is more likely to contain errors and tends to be less precise than the linear function, which is computed by a technological device. The GDC calculates the line of best fit directly from the data (by the method of least squares), so it avoids the errors of hand calculation. Another reason the estimates are sound is that they are not outliers: they are very close to the actual heights recorded in the neighboring years (203 cm in 1936 and 198 cm in 1948). A further reason is that the linear function models all of the data, whereas the root function is limited to a pattern within the data set. The image below shows how scores for 1940 and 1944 would be inaccurate if determined through the root function.

High Jump Heights and Graph of $y = \sqrt{132x + 80} + 155$

Another reason the linear regression function was used to estimate the values is that, when the 1940 and 1944 values are plotted alongside the linear regression equation (i.e. the one obtained without the new coordinates), they lie on the line of best fit. This can be seen more clearly below.

Graphs of Estimated High Jump Heights and Linear Regression

As shown in the image above, both points lie on the linear regression line and therefore fit in well with the other, real values.

Estimating Scores for 1984 and 2016


To estimate the 1984 and 2016 scores for the gold medal high jump at the Olympic Games, the standardized values of the years will be substituted into the equation $y = \sqrt{132x + 80} + 155$. The reason for using the root equation rather than the linear function is that, as stated previously, the linear function is less accurate when extrapolating, because it assumes that the data will keep increasing at a constant rate. In contrast, the data will level off over a long period, because the results come from humans, who are finite beings. The root function is more contextually based and is therefore more suitable for predicting the scores. The method for finding the value for 1984 is as follows: 52 is the standardized value for 1984 (1984 - 1932 = 52). Substituting 52 into the equation $y = \sqrt{132x + 80} + 155$ gives

$y = \sqrt{(132 \times 52) + 80} + 155 = \sqrt{6864 + 80} + 155 = \sqrt{6944} + 155 = 4\sqrt{434} + 155 \approx 238.331$.

Thus, the coordinate for the 1984 winning high jump score would be (52,238.331). Below is the graph of the point alongside the original high jump scores data and the root model equation.

Graph of High Jump Data, 1984 Value and Root Function

Window Settings of the Graph

The estimate is still increasing, but only by a small increment, as the slope of the function reflects its very slow rate of increase. Next, the estimated score for 2016 will be calculated. The standardized value for 2016 is 84 (i.e. 2016 - 1932 = 84). Substituting 84 into the equation $y = \sqrt{132x + 80} + 155$ and solving for y gives

$y = \sqrt{(132 \times 84) + 80} + 155 = \sqrt{11088 + 80} + 155 = \sqrt{11168} + 155 = 4\sqrt{698} + 155 \approx 260.679$.

Thus, the coordinate for the height during 2016 is (84,260.679). On the next page is its graph alongside the extrapolated 1984 coordinate and the genuine high jump scores.

Graph of High Jump Data, Extrapolated Values and Root Function

Window Settings of the Graph

Both values seem to follow the trend set by the root function, which is a slow, gradual increase. Also note that the 2016 value is higher than the 1984 value by a larger amount (about 22.3 cm) than the 1984 value is above the last real score from 1980 (about 2.3 cm), because it lies 32 years further along the curve rather than 4. In context, such a rise might be attributed to increased training time and/or technological advances that aid high jump scores.
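The 1984 and 2016 extrapolations can be reproduced in Python. The sketch below (hypothetical helper names) evaluates the root model at the standardized years and checks the simplifications $\sqrt{6944} = 4\sqrt{434}$ and $\sqrt{11168} = 4\sqrt{698}$ used in the working; for comparison only, it also prints the linear regression's value at the same years.

```python
import math


def refined_model(x: float) -> float:
    """Root model y = sqrt(132x + 80) + 155."""
    return math.sqrt(132 * x + 80) + 155


def linear_model(x: float) -> float:
    """Linear regression from the GDC, shown for comparison only."""
    return 0.7550655542 * x + 194.1382598


predictions = {}
for year in (1984, 2016):
    x = year - 1932  # standardized year
    predictions[year] = refined_model(x)
    print(f"{year}: x = {x}, root = {predictions[year]:.3f} cm, "
          f"linear = {linear_model(x):.3f} cm")

# The simplified radicals used in the hand working.
print(math.isclose(math.sqrt(6944), 4 * math.sqrt(434)))
print(math.isclose(math.sqrt(11168), 4 * math.sqrt(698)))
```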

Additional Data
The following data was given to show additional gold medalist high jump scores.

Year    Standardized year (x)    Height (cm)
1896    -36                      190
1904    -28                      180
1908    -24                      191
1912    -20                      193
1920    -12                      193
1928     -4                      194
1984     52                      235
1988     56                      238
1992     60                      234
1996     64                      239
2000     68                      235
2004     72                      236
2008     76                      236

This data will be graphed below alongside the root and linear model functions.

Additional Points with Linear and Root Models

Window Settings of the Graph

The root function does not model the data well: its slope is too steep for the newer data, so it keeps rising faster than the data itself, and it is not even defined for the negative standardized years before 1932 (where $132x + 80 < 0$). The linear function is a little better than the root function because it comes closer to more of the points; however, it does not model the newer data well either. The linear function is, therefore, only good for interpolating values within the original data (i.e. the scores from 1932 to 1980). In the negative region, the graph of this data starts off high, dips down and then increases again. This pattern also occurs at the points after 1980, but more frequently. This dip-and-increase pattern occurs throughout the data, but after every few repetitions the overall level slowly increases. In other words, the data repeatedly dips and recovers while still trending upward. The modifications that need to be made to the root function are a vertical compression and a horizontal stretch. The reason for the vertical compression is clear: it reduces the slope so that the function fits the data better. The horizontal stretch spreads each part of the curve over a longer interval. For example, if before the stretch the function rises by 1 between x = 4 and x = 5, then after a horizontal stretch by a factor of 4 the same rise would take place between x = 16 and x = 20, so the slope over that part of the curve would fall from 1 to $\frac{1}{4}$.
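To check these comments against the additional data, the Python sketch below evaluates both models at each additional standardized year. It is only an illustration: `linear_model` and `root_model` are hypothetical names, and `root_model` returns `None` where $132x + 80 < 0$, since the square root is not defined there.

```python
import math
from typing import Optional

years = [1896, 1904, 1908, 1912, 1920, 1928,
         1984, 1988, 1992, 1996, 2000, 2004, 2008]
heights = [190, 180, 191, 193, 193, 194,
           235, 238, 234, 239, 235, 236, 236]


def linear_model(x: float) -> float:
    """GDC linear regression fitted to the 1932-1980 data."""
    return 0.7550655542 * x + 194.1382598


def root_model(x: float) -> Optional[float]:
    """Refined root model; None where 132x + 80 < 0 (outside the domain)."""
    radicand = 132 * x + 80
    if radicand < 0:
        return None
    return math.sqrt(radicand) + 155


for year, h in zip(years, heights):
    x = year - 1932
    root = root_model(x)
    root_text = "undefined" if root is None else f"{root:.1f}"
    print(f"{year}  x = {x:>3}  actual = {h}  "
          f"linear = {linear_model(x):.1f}  root = {root_text}")
```

Both models overshoot the 2008 height of 236 cm, and the root model has no value at all for the six pre-1932 years.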

Math SL Portfolio 2 Gold Medal Heights

Ibrahim Asadullah MHF4UA Wednesday, February 8, 2012
