
Assignment No. 1

Statistics and Research Methodology

Correlation and regression are used to find the relationship between two or more variables. Here one dependent and one independent variable are taken to find their degree of association and to observe how one changes when the other is varied. The dependent variable chosen is Temperature, in degrees Celsius, and the independent variable is Altitude, in meters. Altitude is measured vertically upward from a reference point, generally sea level. The depth of the troposphere is approximately 11 km, i.e. 11000 m; it is the lowest portion of the earth's atmosphere and contains about 80% of its mass. Temperature measures the degree of hotness or coldness. As we move up through the troposphere, the temperature changes.

1. Collection of data samples

A random sample of 30 observations is selected for the correlation and regression analysis. Simple correlation is used because only two variables are studied. One independent and one dependent variable are taken, as given below:

Independent variable x: Altitude (in meters)
Dependent variable y: Temperature (in degrees Celsius)

The two variables are inversely related because the observations are taken within the troposphere, which is the only layer of the atmosphere considered here. If we go vertically up into the stratosphere, which covers roughly 12 km to 50 km, temperature increases with height. In the mesosphere, between about 50 km and 80 km, temperature starts decreasing again. Each layer therefore exhibits a different relation. To find the relationship between these two variables we have to consider a single layer of the atmosphere. If we mix layers, we cannot relate the variables with one trend, because the relationship is sometimes direct and sometimes inverse. We need to be very careful while choosing the variables and the range over which they are sampled.

The data table of collected sample values of Altitude and Temperature is given below.

Sr. no.   Altitude (meters)   Temp (°C)
1         0                   15
2         305                 11.2
3         610                 12.1
4         914                 9.1
5         1219                5.2
6         1524                3.8
7         1829                3.1
8         2134                2.8
9         2438                -0.1
10        2743                -1.9
11        3048                -3.8
12        3353                -6.5
13        3658                -10.8
14        3962                -11.8
15        4267                -12.1
16        4572                -13.9
17        4877                -15.5
18        5182                -16.8
19        5406                -18.2
20        5791                -18.9
21        6096                -21.6
22        6401                -22.4
23        6706                -26.2
24        7010                -32.6
25        7315                -33.5
26        7620                -34.8
27        7925                -34.9
28        8230                -35.1
29        8534                -36.5
30        8839                -45.1

2. Scatter diagram between variable x and y

A scatter diagram is plotted to examine the relationship between the two variables. The independent variable, Altitude, is plotted on the x-axis and the dependent variable, Temperature, on the y-axis. Each point represents one pair of values of the independent and dependent variable. The sample size is 30, so 30 data pairs appear on the graph. The scatter diagram shows both the direction of the correlation between the variables and the degree of relationship between them. The downward slope of the points shows that the variables are inversely related: if we increase the altitude, the temperature decreases. The points do not fall exactly on a straight line, but they lie close to the line of fit, which indicates that the relationship is approximately linear and the scatter around it is small.

[Figure: Scatter Diagram: Correlation between Altitude and Temperature. x-axis: Altitude (meters); y-axis: Temperature (°C); series: Temp. (Degree Celsius) with its linear trend line.]

3. Mean and standard deviation of Altitude and Temperature

The mean of altitude is 4416.93, i.e. the average altitude of the 30 samples is 4416.93 m. The standard deviation is high, indicating that values typically lie within about 4416.93 ± 2682.19 m. The deviation from the mean is high because the sampled altitudes are spread over a wide range. A high deviation does not always indicate a higher amount of error. Similarly, for Temperature the average value of the sample is -13.02 °C and the deviation from the mean is 16.9 °C, which is numerically much lower than that of Altitude, so values typically lie within about -13.02 ± 16.9 °C. It is also observed that reducing the sample size reduces the deviation here, but a small sample does not give a reliable value. So we need to maintain a proper balance between sample size and variation, keeping the cost involved and the time required in mind.
Parameters   Altitude (meters)   Temp (°C)
Mean         4416.93             -13.02
SD           2682.19             16.9
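The mean and standard deviation can be recomputed directly from the 30 samples. The following is a minimal sketch using Python's standard library; it assumes the reported SD is the sample form (dividing by n - 1), which the report does not state explicitly, and the variable names are illustrative.

```python
import statistics

# The 30 samples from the data table (altitude in meters, temperature in deg C).
altitude_m = [0, 305, 610, 914, 1219, 1524, 1829, 2134, 2438, 2743,
              3048, 3353, 3658, 3962, 4267, 4572, 4877, 5182, 5406, 5791,
              6096, 6401, 6706, 7010, 7315, 7620, 7925, 8230, 8534, 8839]
temp_c = [15.0, 11.2, 12.1, 9.1, 5.2, 3.8, 3.1, 2.8, -0.1, -1.9,
          -3.8, -6.5, -10.8, -11.8, -12.1, -13.9, -15.5, -16.8, -18.2, -18.9,
          -21.6, -22.4, -26.2, -32.6, -33.5, -34.8, -34.9, -35.1, -36.5, -45.1]

mean_alt = statistics.mean(altitude_m)
sd_alt = statistics.stdev(altitude_m)    # sample SD (assumption: n - 1 divisor)
mean_temp = statistics.mean(temp_c)
sd_temp = statistics.stdev(temp_c)       # sample SD (assumption: n - 1 divisor)

print(f"Altitude: mean = {mean_alt:.2f} m, SD = {sd_alt:.2f} m")
print(f"Temp:     mean = {mean_temp:.2f} C, SD = {sd_temp:.2f} C")
```

Using statistics.pstdev instead would give the population form, which is slightly smaller for a sample of 30.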

4. Covariance between two variables x and y

Covariance measures how two variables vary together. It can be positive, negative or zero.

Cov (x,y) = -45078.27

The negative value shows that the two variables are inversely related, i.e. the correlation is negative. Covariance also feeds into the regression coefficient, so it helps in predicting the value of the dependent variable, temperature. Covariance is bilinear: rescaling either variable rescales the covariance by the same factor. The magnitude obtained here is large mainly because altitude is measured in meters and spans thousands of units, so the strength of the relationship is better judged from the correlation coefficient.
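As a rough check, the covariance can be computed from the sample data. This sketch assumes the sample form (dividing by n - 1); the helper name sample_covariance is hypothetical. It also expresses altitude in kilometers to illustrate how the value scales with the units.

```python
# The 30 samples from the data table (altitude in meters, temperature in deg C).
altitude_m = [0, 305, 610, 914, 1219, 1524, 1829, 2134, 2438, 2743,
              3048, 3353, 3658, 3962, 4267, 4572, 4877, 5182, 5406, 5791,
              6096, 6401, 6706, 7010, 7315, 7620, 7925, 8230, 8534, 8839]
temp_c = [15.0, 11.2, 12.1, 9.1, 5.2, 3.8, 3.1, 2.8, -0.1, -1.9,
          -3.8, -6.5, -10.8, -11.8, -12.1, -13.9, -15.5, -16.8, -18.2, -18.9,
          -21.6, -22.4, -26.2, -32.6, -33.5, -34.8, -34.9, -35.1, -36.5, -45.1]


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cov_xy = sample_covariance(altitude_m, temp_c)

# Same data with altitude in kilometers: the covariance shrinks by 1000.
altitude_km = [a / 1000 for a in altitude_m]
cov_km = sample_covariance(altitude_km, temp_c)

print(f"Cov(x, y) with altitude in m:  {cov_xy:.2f}")
print(f"Cov(x, y) with altitude in km: {cov_km:.5f}")
```

The sign is the same in both units; only the magnitude changes, which is why the size of a covariance on its own says little about how strong the relationship is.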

5. Correlation coefficient between x and y

The correlation coefficient is denoted by r. It shows the degree of association between two variables. It is also called the Pearson product-moment correlation coefficient. Its value ranges from -1 to +1. A negative value indicates an inverse relation while a positive value indicates a direct relation between the two variables.

r = Cov (x,y) / (SD of x × SD of y) = -45078.27 / (2682.19 × 16.9) ≈ -0.99

Here the negative sign indicates an inverse relation, which means that if the value of Altitude increases then Temperature decreases. The magnitude of about 0.99 shows that the variables are very strongly correlated, so the choice of variables is reasonable and valid. This agrees with the known behaviour of the troposphere: as altitude increases from sea level, atmospheric pressure falls, so air expands and cools, which reduces the temperature.

6. Regression coefficient of y on x

Regression analysis gives a measure of the relationship between the dependent and independent variable and helps in forecasting values, which supports decision making. The plotted points do not lie exactly on a straight line, but they are close enough to one that a straight line is fitted here. The regression coefficient of y on x is given by

b (yx) = -0.0063

The negative sign indicates the inverse relation: on average, each additional meter of altitude lowers the temperature by 0.0063 degree Celsius, i.e. about 6.3 °C per kilometer. The regression coefficient is the slope of the line obtained from the regression equation.

7. Linear regression equation of y on x

It is given by

y = -0.0063x + 14.653

where x = Altitude (meters), y = Temperature (°C)

If we change the value of altitude, the corresponding value of temperature is obtained, so the equation can be used for prediction. We can choose the value of the independent variable, altitude, and read off the expected temperature. In this way we can easily estimate the temperature at a particular height above the earth's surface.
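The coefficients in sections 5 to 7 can be cross-checked from the summary statistics alone. The sketch below uses the rounded values reported in sections 3 and 4, so its results agree with the quoted coefficients only approximately; the names (b_yx, a_yx, predict_temp) are illustrative and not part of the original analysis.

```python
# Summary statistics as reported in sections 3 and 4 (rounded values).
mean_x, sd_x = 4416.93, 2682.19   # altitude, meters
mean_y, sd_y = -13.02, 16.9       # temperature, deg C
cov_xy = -45078.27

# Correlation coefficient: covariance scaled by both standard deviations.
r = cov_xy / (sd_x * sd_y)

# Regression of y on x: slope = Cov(x, y) / Var(x); the line passes
# through the point of means (mean_x, mean_y).
b_yx = cov_xy / sd_x ** 2
a_yx = mean_y - b_yx * mean_x


def predict_temp(altitude_m):
    """Estimated temperature (deg C) at a given altitude (m) on the fitted line."""
    return a_yx + b_yx * altitude_m


print(f"r    = {r:.4f}")
print(f"b_yx = {b_yx:.4f} deg C per meter")
print(f"a_yx = {a_yx:.3f} deg C")
print(f"Estimated temperature at 5000 m: {predict_temp(5000):.1f} deg C")
```

The estimate is only meaningful inside the sampled range of 0 to 8839 m, since the line was fitted to tropospheric data.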

[Figure: Altitude vs Temperature. x-axis: Altitude (meters); y-axis: Temperature (°C); series: Temp. (Degree Celsius) with linear trend line y = -0.0063x + 14.653, R^2 = 0.9888.]

R^2 is called the coefficient of determination. It measures the proportion of the variation in the dependent variable that is explained by the regression. So we can say that 98.88% of the variation in temperature is explained by the equation given above. We can conclude that the regression fit is good, with only about 1.1% of the variation in temperature left unexplained.

8. Linear regression equation of x on y

Now we reverse the case. The equation of x on y is calculated as

y = -157.8x + 2361.8

where x = Temperature (°C), y = Altitude (meters)

With this equation we can vary temperature in order to get different values of altitude. In this case the dependent variable is altitude and the independent variable is temperature. The regression coefficient changes from -0.0063 to -157.8, which means that on average a change in temperature of one degree Celsius corresponds to a change in altitude of 157.8 meters in the opposite direction. So, in its own units, altitude varies far more than temperature.

9. Plotting equation on scatter diagram

If we plot the above equation on the scatter diagram we get the following line:

[Figure: Temperature vs Altitude. x-axis: Temp. (Degree Celsius); y-axis: Altitude (meters); series: Altitude (meters) with linear trend line y = -157.8x + 2361.8, R^2 = 0.9888.]

The graph is plotted for the 30 data samples. The downward slope indicates the inverse relation between the variables: at different temperatures the value of altitude changes. We can observe that the coefficient of determination is the same in both cases. It is not affected if the variables are interchanged.

10. Estimate the values of y by changing the value of x

We have the equation

y = -157.8x + 2361.8

where x = Temperature (°C) and y = Altitude (meters)

By using the above equation we can estimate different values of y by changing the values of x.

Temp. (°C)   Altitude (meters)
200          -29198.2
100          -13418.2
15           -5.2
-100         18141.8
-200         33921.8

[Figure: Scatter Diagram: Temperature vs Altitude, plotting the estimated values above.]

Here we take some values of temperature and estimate the corresponding values of altitude from the equation, which can help in the decision making process. The estimated altitude falls as the temperature rises, reflecting the inverse relation. Most of the sun's heat is absorbed at the ground, which is why the temperature is higher in the lower region. For a temperature of 15 °C the estimated altitude is -5.2 m, i.e. essentially sea level, which agrees with the first sample (15 °C at 0 m). For -100 °C the estimate is 18141.8 m and for -200 °C it is 33921.8 m; both lie above the troposphere and outside the sampled range of 0 to 8839 m, so they are extrapolations and should be treated with caution. For 100 °C and 200 °C the estimated altitudes are negative (-13418.2 m and -29198.2 m), i.e. below the reference level. Temperature does increase as we go deep inside the earth, because its interior is hot, but the fitted line is based on tropospheric data only and cannot be relied on there.
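The estimates in this section follow from substituting temperatures into the x on y equation. The sketch below assumes, as the equation is defined, that temperature is the input x and altitude the output y. The function names and the range check are illustrative additions; the range limits are the lowest and highest temperatures in the sample data.

```python
# Coefficients of the regression of altitude on temperature (section 8).
SLOPE_XY = -157.8        # meters per deg C
INTERCEPT_XY = 2361.8    # meters

# Temperature range covered by the 30 samples (deg C).
TEMP_MIN_C, TEMP_MAX_C = -45.1, 15.0


def estimate_altitude(temp_c):
    """Estimated altitude (m) for a given temperature (deg C)."""
    return SLOPE_XY * temp_c + INTERCEPT_XY


def in_sampled_range(temp_c):
    """True if the temperature lies within the range the line was fitted on."""
    return TEMP_MIN_C <= temp_c <= TEMP_MAX_C


for t in (200, 100, 15, -100, -200):
    note = "" if in_sampled_range(t) else "  (extrapolated)"
    print(f"{t:>6} C -> {estimate_altitude(t):>10.1f} m{note}")
```

Flagging inputs outside the sampled range is a simple guard: the fitted line describes only the troposphere data it was built from.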

11. Utility of the concept of correlation and regression

Utility of correlation:
- The utility of correlation can be observed in many fields such as physics, economics, biology and social science.
- Correlation reduces uncertainty, which helps in taking better decisions.
- Correlation helps in finding the factors which can stabilize disturbed situations.
- The relationship between variables is helpful in opening new frontiers of knowledge and promoting research.
- Knowing the relation between two variables helps in reducing deficiencies.

Utility of regression:
Regression analysis is very useful in real-life scenarios. The utility of studying regression is given below:
- The relationship between a cause and its effect can be quantified by regression.
- It helps in predicting and forecasting values, which is essential in today's scenario.
- It helps in minimizing errors to get more accurate data.
- By using regression we can estimate the value of one variable from another.
- It saves cost and time.

So both correlation and regression are very helpful in analysing any situation.

Correlation and regression are interrelated. Together they help in finding how one variable fluctuates with another. Correlation tells us whether the variables hold any relation or not, and whether that relation is direct or inverse. This can be seen from the trend of the graph. The coefficient of correlation indicates the degree of relatedness; it is generally considered that if its magnitude is greater than 0.75 then the variables are highly related. Regression analysis, in turn, describes how a change in one variable affects the other. By using regression, errors can be minimized because we can interpret the relation between the variables, which also increases the confidence level. The independent variable can then be varied to estimate the corresponding value of the dependent variable. Correlation and regression together give a better picture of any situation.