ETC1000 / ETC9000 Business and Economic Statistics
Demonstration Lecture Week 4: Simple Regression
This lecture provides examples of the material taught in this week's lectures, to help you see its potential for real-world application, and to reinforce the ideas being communicated.
Case Study: The Relationship Between Anonymous Revenue and GDP: Is It Stable?
Background: We are looking at the performance of a former Government-owned enterprise, which has recently been privatised. Its name is Anonymous. Market analysts are very interested in this company, as it is very large and has many shareholders. Being able to predict its future performance would be extremely valuable, as share traders could then act accordingly.
A group of market analysts have undertaken research that shows a very strong relationship between the total revenue of Anonymous and Australia's Gross Domestic Product (GDP). They argue that this relationship can be used to predict the future performance of Anonymous very accurately.
Our task is to assess whether the analysis that has been performed is sensible. How will we do that?
So, the evidence is there for a very strong relationship.
Now, let's do some critique.
1. Checking the Data
First, notice a big jump in revenue in 1991/92. Revenue grew by more than 28% in just one year, compared to typical growth of about 10-15%. This is a sign of something unusual happening, a one-off type event. To ignore this unusual jump will potentially distort the rest of the analysis.
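The check described above (comparing each year's growth with typical growth) can be sketched in a few lines. This is a minimal illustration only: the revenue figures are made up, not the Anonymous data, and the 20% threshold and the function names `growth_rates` and `flag_jumps` are assumptions chosen for the example.

```python
def growth_rates(values):
    """Year-on-year percentage growth for a sequence of levels."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


def flag_jumps(values, threshold_pct):
    """Positions in `values` whose growth over the previous year exceeds threshold_pct."""
    return [i + 1 for i, g in enumerate(growth_rates(values)) if g > threshold_pct]


# Hypothetical revenue levels for illustration only (NOT the Anonymous data)
revenue = [1000.0, 1120.0, 1250.0, 1610.0, 1800.0]

print([round(g, 1) for g in growth_rates(revenue)])
print(flag_jumps(revenue, threshold_pct=20.0))
```

A flagged year is only a prompt to investigate, as in the case study: the jump may reflect a change in definition rather than a change in the business.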
On enquiring further, we discover that in 1991/92 there was a change in the definition of total revenue of this company: a subsidiary that was previously not included was now included in revenue. So this jump is just an anomaly associated with this change in definition. We have to correct for this.
To do that, we use some extra information and statistical techniques (too complex to go into here) that adjust the revenue data so that it is comparable across the whole sample period.
Here's the scatter plot with the adjusted revenue data.
Correlation:
                             Adjusted anonymous revenue   GDP(nom)
Adjusted anonymous revenue   1
GDP(nom)                     0.997573                     1
2. Does correlation mean causality and forecastability?
Recall that the analysts' motivation for this work was to be able to predict future company performance so they could plan share market trading. So the real test of the analysis is whether we can use information about GDP to predict the future.
Let's estimate a simple regression model first.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.997573
R Square            0.995151
Adjusted R Square   0.994931
Standard Error      330.1253
Observations        24

ANOVA
             df   SS         MS         F          Significance F
Regression   1    4.92E+08   4.92E+08   4514.997   5.87E-27
Residual     22   2397619    108982.7
Total        23   4.94E+08

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   -1330.3        144.4322         -9.21052   5.27E-09   -1629.83    -1030.76
GDP(nom)    0.028052       0.000417         67.19373   5.87E-27   0.027186    0.028918
What do we learn from this?
Note first the Multiple R value of 0.997573. This is the same as the correlation analysis. These are, in simple regression, the same quantity.
Next, look at the R squared value: 99.5% is very high!
Now we turn to the estimated coefficients (intercept and slope). To interpret these, we need to be reminded of the units of the original data. Both GDP and revenue are measured in billions of dollars.
b1 = 0.028052: the model predicts that a $1 billion increase in GDP would on average lead to an increase in company revenue of $0.028 billion ($28 million).
b0 = -1330.3: the model predicts that if GDP were zero, revenue of this company would be -$1330 billion. As is often the case, this is not very sensible: GDP is unlikely to be anywhere near zero, and negative revenues are not possible!
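The two interpretations above can be checked directly from the fitted line. The sketch below uses the coefficients from the regression output; the function names `predict_revenue` and `breakeven_gdp` are hypothetical, and GDP and revenue are taken to be in the same units as the original data.

```python
B0 = -1330.3    # intercept, from the regression output
B1 = 0.028052   # slope on GDP(nom), from the regression output


def predict_revenue(gdp):
    """Fitted revenue for a given GDP level (same units as the data)."""
    return B0 + B1 * gdp


def breakeven_gdp():
    """GDP level at which the fitted line crosses zero revenue."""
    return -B0 / B1


# A one-unit rise in GDP raises fitted revenue by exactly the slope
print(predict_revenue(101.0) - predict_revenue(100.0))

# At GDP = 0 the fitted value is just the intercept
print(predict_revenue(0.0))

# Below this GDP level the line predicts negative revenue
print(breakeven_gdp())
```

The breakeven level makes the point about the intercept concrete: the fitted line is only a description of the data within the observed range of GDP, and extrapolating it far outside that range gives nonsensical values.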
Note also that the slope coefficient is clearly not zero: the p-value is extremely small, indicating there is clearly a relationship between GDP and revenue.
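The figures in the summary output are tied together by a few identities that hold in simple regression, and it can be reassuring to verify them. The sketch below uses only numbers copied from the output above; small discrepancies are expected because the output is rounded (the total sum of squares in particular is shown to three significant figures).

```python
import math

# Values copied from the regression output above
multiple_r = 0.997573
r_squared = 0.995151
std_error = 330.1253
ss_residual = 2397619.0
ss_total = 4.94e8        # reported to three significant figures only
df_residual = 22
t_slope = 67.19373
f_stat = 4514.997

# R squared is the square of the correlation (Multiple R)
print(multiple_r ** 2, "vs", r_squared)

# R squared = 1 - SS_residual / SS_total
print(1.0 - ss_residual / ss_total, "vs", r_squared)

# Standard error of the regression = sqrt(SS_residual / residual df)
print(math.sqrt(ss_residual / df_residual), "vs", std_error)

# With one explanatory variable, F equals the square of the slope t statistic
print(t_slope ** 2, "vs", f_stat)
```

The last identity also explains why the slope p-value and Significance F are the same number (5.87E-27) in the output.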
A useful gauge of the accuracy of a model is forecasting performance. This can give quite a different picture to that suggested by such a good within-sample fit (R-squared). In particular, it is of interest to assess how well the model can forecast growth rates in Anonymous Revenue. The simple regression model of Revenue on GDP was re-estimated with a sequence of moving samples (1975/76 to 1982/83, then moving ahead a year each time), and one-year-ahead forecasts of revenue were generated. These forecasts were then used to calculate forecasts of the percentage growth in revenue. The table below shows these forecasts compared to actual growth rates. Whilst the forecasts are reasonable, they often vary quite substantially from actual revenue growth. The story is of a much less accurate model than a 99.5% R-squared would suggest.
Note also that the model is consistently under-predicting revenue.
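The moving-sample forecasting exercise can be sketched as follows. Several details are assumptions, since the notes do not spell them out: the window is held at a fixed width of 8 years (matching 1975/76 to 1982/83) rather than expanding, next year's actual GDP is used to form the forecast, and forecast growth is measured against the last actual revenue in the window. The data are synthetic, not the Anonymous series.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def rolling_growth_forecasts(gdp, revenue, window=8):
    """One-year-ahead forecasts of % revenue growth from a moving sample.

    Returns a list of (forecast_growth_pct, actual_growth_pct) pairs,
    one for each year that can be forecast.
    """
    results = []
    for start in range(len(gdp) - window):
        end = start + window  # index of the year being forecast
        b0, b1 = fit_line(gdp[start:end], revenue[start:end])
        forecast = b0 + b1 * gdp[end]  # assumes next year's GDP is known
        last_actual = revenue[end - 1]
        forecast_growth = 100.0 * (forecast - last_actual) / last_actual
        actual_growth = 100.0 * (revenue[end] - last_actual) / last_actual
        results.append((forecast_growth, actual_growth))
    return results


# Synthetic series for illustration only (NOT the Anonymous data)
gdp = [100.0 * 1.08 ** t for t in range(24)]
revenue = [0.03 * g * (1.0 + 0.02 * (-1) ** t) for t, g in enumerate(gdp)]

for forecast_growth, actual_growth in rolling_growth_forecasts(gdp, revenue):
    print(f"forecast {forecast_growth:6.2f}%   actual {actual_growth:6.2f}%")
```

Averaging the gap between actual and forecast growth across the pairs gives a simple measure of systematic under- or over-prediction.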
3. What is the underlying behavioural relationship we need to quantify?
The industry we are studying has undergone significant changes in the past decades. The opening up of the market to competition has had a massive impact on Anonymous's performance. Anonymous's share of total market revenue has declined from around 99.5% in 1991/92 to 81.5% in 1998/99. At the same time, the industry has enjoyed unprecedented growth, with the creation of a wide range of new products and growth in demand for services. Data on Total Market Revenue suggest that the industry grew steadily from 2.0% of GDP in 1976/77 to 2.7% in 1989/90, and then leapt to 3.75% in 1998/99.
Such changes in the market are bound to have some impact on the relationship between Anonymous Revenue and GDP. We would argue that the general level of economic activity (GDP) is a key driver of the overall market (Total Market Revenue), and that in turn within-market competition determines the market share which Anonymous claims. These two behavioural links drive the reduced form relationship between GDP and Anonymous Revenue. This reduced form relationship can only be adequately understood and assessed by evaluating the two component relationships. The reduced form relationship is only as stable as the two component relationships.
The graph below shows how market share has declined over the sample period as more and more competition has entered.
The reason we are interested in the stability of relationships over time is that this is a test of how robust they are. If there is a change in the relationship over time, then it means there are other complexities that are not captured in the model. The model will not forecast well in these cases.
To look at the question of stability, we estimate the model for various sub-samples, much as we did to produce the 1-step ahead forecasts above. We then calculate elasticities from our regression estimates: an elasticity tells us the % response in revenue to a 1% change in GDP. The graph below shows the results:
[Figure: Elasticities of Revenue to GDP. Vertical axis: elasticity, 0.0 to 1.8. Horizontal axis: 82/83 to 98/99. Series: Anonymous, Total.]
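The notes do not say how the elasticities were computed from the regression estimates. One common choice for a linear model, used here purely as an assumption, is the elasticity evaluated at the sample means: the slope multiplied by mean GDP over mean revenue. The function name `elasticity_at_means` and the small data sets are hypothetical.

```python
def elasticity_at_means(x, y):
    """Elasticity of y with respect to x from a linear fit, at the sample means.

    Fits y = b0 + b1 * x by least squares and returns b1 * mean(x) / mean(y).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return b1 * mean_x / mean_y


# Illustrative data only (NOT the Anonymous series)
x = [1.0, 2.0, 3.0, 4.0]

# y proportional to x: a 1% change in x gives a 1% change in y
print(elasticity_at_means(x, [2.0, 4.0, 6.0, 8.0]))

# Same slope but a negative intercept (y = -1 + 2x): elasticity above 1
print(elasticity_at_means(x, [1.0, 3.0, 5.0, 7.0]))
```

Under this definition, a fitted line with a negative intercept and positive fitted values always gives an elasticity above 1 at the means. Applying the function to each sub-sample in turn would give a series of elasticities over time. An alternative would be to regress log revenue on log GDP and read the elasticity off the slope.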
What do we learn from this graph? First, the elasticity of Anonymous revenue to GDP appears quite stable at a little over 1: a 1% rise in GDP goes with a rise in revenue of a little over 1%. This is what the market analysts found initially.
BUT: what we also see is that total market revenue has a much bigger elasticity in the latter half of the sample, and it has changed quite a lot during the sample period. Why is this? As noted above, this is a dynamic and growing industry. During the 1990s especially, it grew very rapidly, much faster than GDP. Changes in technology have led this rapid growth.
What does all this mean for the relationship between revenue of Anonymous and GDP?
The evidence suggests that the relationship between GDP and Anonymous Revenue, which operates via Total Market Revenue, is a combination of two quite unstable relationships, and hence is itself unstable.
The appearance of stability in the relationship between Anonymous Revenue and GDP, as seen in the graph above, is a coincidence of these two unstable factors cancelling each other out. This is why forecasts based on a belief in a stable relationship between Anonymous Revenue and GDP are likely to be very poor.
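The cancellation can be made explicit with a short derivation. The notation is introduced here for illustration and is not from the original analysis: R_A is Anonymous revenue, R_M is Total Market Revenue, and s is Anonymous's market share. Treating the share as something that moves with GDP is only a descriptive device, since the share is driven by competition rather than by GDP itself.

```latex
\begin{align*}
R_A &= s \cdot R_M \\
\ln R_A &= \ln s + \ln R_M \\
\underbrace{\frac{d \ln R_A}{d \ln \mathrm{GDP}}}_{\varepsilon_A}
  &= \underbrace{\frac{d \ln s}{d \ln \mathrm{GDP}}}_{\varepsilon_s}
   + \underbrace{\frac{d \ln R_M}{d \ln \mathrm{GDP}}}_{\varepsilon_M}
\end{align*}
```

If the market elasticity rises while the Anonymous elasticity stays a little over 1, the share term must have become more negative by an offsetting amount. Nothing guarantees that this offset will continue, which is why the apparent stability cannot be relied upon for forecasting.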