
STA2604/1

Department of Statistics

Forecasting

Study guide for STA2604

Table of contents

UNIT 1: An Introduction to Forecasting
1.1 Introduction
1.1.1 Forecasting
1.1.2 Data
1.1.3 Components of a time series
1.1.4 Applications of forecasting
1.2 Forecasting methods
1.2.1 Qualitative methods
1.2.2 Quantitative methods
1.3 Errors in forecasting and forecast accuracy
1.3.1 Absolute deviation
1.3.2 Mean absolute deviation
1.3.3 Squared error
1.3.4 Mean squared error
1.3.5 Absolute percentage error (APE)
1.3.6 Mean absolute percentage error (MAPE)
1.3.7 Forecasting accuracy
1.4 Choosing a forecasting technique
1.4.1 Factors to consider
1.4.2 Strike the balance
1.5 An overview of quantitative forecasting techniques
1.6 Conclusion

UNIT 2: Model Building and Residual Analysis
2.1 Introduction
2.2 Multicollinearity
2.2.1 Clarification of multicollinearity
2.2.2 The variance inflation factor (VIF)
2.2.3 Comparing regression models
2.3 Basic residual analysis
2.3.1 Residual plots
2.3.2 Constant variation assumption
2.3.3 Correct functional form assumption
2.3.4 Normality assumption
2.3.5 Independence assumption
2.3.6 Remedy for violations of assumptions
2.4 Outliers and influential observations
2.4.1 Leverage values
2.4.2 Residuals
2.4.3 Studentised residuals
2.4.3.1 Deleted residuals
2.4.4 Cook's distance
2.4.5 Dealing with outliers and influential observations
2.5 Conclusion

UNIT 3: Time Series Regression
3.1 Introduction
3.2 Modeling trend by using polynomial functions
3.2.1 No trend
3.2.2 Linear trend
3.2.3 Quadratic and higher order polynomial trend
3.3 Detecting autocorrelation
3.3.1 Residual plot inspection
3.3.2 First-order autocorrelation
3.3.2.1 Durbin-Watson test for positive autocorrelation
3.3.2.2 Durbin-Watson test for negative autocorrelation
3.3.2.3 Durbin-Watson test for autocorrelation
3.4 Seasonal variation types
3.4.1 Constant and increasing seasonal variation
3.5 Use of dummy variables and trigonometric functions
3.5.1 Time series with constant seasonal variation
3.5.2 Use of dummy variables
3.5.3 High season and low season
3.5.4 Use of trigonometric functions on a model with a linear trend
3.6 Growth curve models
3.7 AR(1) and AR(p)
3.8 Use of trend and seasonality and forecast development
3.9 Conclusion

UNIT 4: Decomposition of a Time Series
4.1 Introduction
4.2 Multiplicative decomposition
4.2.1 Trend analysis
4.2.2 Seasonal analysis
4.2.3 Analysis of random variations in a time series
4.2.4 Obtaining a forecast
4.3 Additive decomposition
4.4 Conclusion

UNIT 5: Exponential Smoothing
5.1 Introduction
5.2 Simple exponential smoothing
5.3 Tracking signals
5.4 Holt's trend-corrected exponential smoothing
5.5 Holt-Winters methods
5.5.1 Additive Holt-Winters method
5.5.2 Multiplicative Holt-Winters method
5.6 Damped trend exponential smoothing
5.7 Conclusion


ABOUT THIS MODULE


Prologue
Forecasting is the process of making statements about events whose actual outcomes have (typically) not yet been observed. A commonplace example is estimating the expected value of some variable of interest at a specified future date. Prediction is a similar, but more general, term. Both may refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgemental methods. More will be seen at various points in the presentation of the module.
The module is about Forecasting, which deals with the methods used to predict the future, i.e. to
forecast. Can you think of a situation where predictions of the future are needed or cases where
forecasting is done? By its nature it is a quantitative method that uses numeric data. There are
various forecasting methods, some of them being qualitative because they are based on non-numeric
data. Even though qualitative methods feature in some of our discussions, they are not dealt with in
depth in this module.

This module presents fundamental aspects of time series analysis used in forecasting. The prescribed textbook for this module is Bowerman, O'Connell and Koehler (2005). We will not study all the chapters in the book for this module, but will focus on Chapters 1, 5, 6, 7 and 8.

The module is done in one semester. Make sure that you are registered for the right semester and that the material you receive is correct.
About the book
The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards
the practice of forecasting. The authors are experienced practitioners in the field of time series. The
book will assist you in understanding concepts and methodology, and in applying these in practice
(i.e. in real-life situations).

The computer and the calculator


We recommend that you acquire a non-programmable scientific calculator of your own. It is imperative to have your own calculator in the examination. It is important, although not compulsory, to have access to a computer in order to undertake the tasks in this module. You may visit a Regional Centre to use a computer. The text contains output from Excel, MINITAB, JMP IN and SAS. However, we encourage the use of any software to which you may have access. The above list of computer software/packages may be used, as well as R, SPSS, Stata, S-Plus and EViews. Your ability to use such software will increase your marketability in the workplace. You are encouraged to experiment with the packages at your disposal.


REFERENCES
The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a
number of user-friendly textbooks on Time Series that are available in the Unisa library. You do not
need to buy the recommended books for this module.

PRESCRIBED BOOK
Bowerman, B. L., O'Connell, R. T. & Koehler, A. B. (2005). Forecasting, time series and regression: an applied approach, 4th edition. Singapore: Thomson Brooks/Cole.

ADDITIONAL USEFUL BOOKS FOR THIS MODULE


Crosby, J. V. (2000). Cycles, trends, and turning points: practical marketing and sales forecasting
techniques. Lincolnwood, IL: NTC Business Books.
Chapter 4 of this book deals specifically with Time Series, while chapters 1, 2, 3, 7, 10 and 20 deal
with other topics that are very relevant in this module. The remaining chapters illustrate applications
that may expose you even more to time series. It is useful.
Curwin, J. & Slater, R. (2002). Quantitative methods for business decisions (Chapter 14). London:
Thomson Learning.
This book also presents measures that we use in statistics and in time series applications. It can be
used for other modules as well. Find time to read it.
Dexter, B. (1996). Business mathematics (Chapter 15). London: Macdonald and Evans.
Only chapter 15 presents Time Series, and in not more than 12 pages. Production planning and
forecasting are presented in Chapter 4 of this book to expose you to real-life applications. I seriously
advise you to look at these two chapters.
Hair, J. R., Anderson, R. E., Tatham, R. L. & Black, W. C. (1998). Multivariate data analysis, 5th
edition. Prentice-Hall, Inc.
Appendix 4A of this book presents some distance measures that are useful in this module. Cook's distance is presented on pages 225 and 234 of this appendix. You are urged to read them. This book is very useful in exposing various applications of multivariate statistics. Read and enjoy it.
Kendall, M. G. (1990). Time series, 3rd edition. London: Edward Arnold.
Simply the best! Kendall exposes us to time series. His is one of the greatest names remembered when time series are mentioned. Even his previous editions still present good information about the topic. Why not cash in on time series from the horse's mouth!


THE PRESENTATION OF THE MODULE


This study guide summarises the five prescribed chapters of the textbook.
Prior knowledge
It is important that you are familiar with a section before moving to the next one. This will serve
as a foundation for the forthcoming work. Leaving out work without understanding it can only add
to the accumulation of problems during the examination. This is also true about the prerequisites
from first-year statistics and the knowledge you have acquired through the years. Sensible or smart
application is based on the use of the accumulated techniques, experiences and knowledge. Plotting
of graphs, fitting a linear model, and so on, are needed in some places. You are urged, therefore,
to incorporate all the useful techniques in the solutions to exercises. We advise you to revisit these
topics in your first-year module.
It is necessary to realise that numbers alone do not provide all the answers. It should be clear to you that aspects of a qualitative nature add value to the predictions made by making the data context clear.
This study guide
In this study guide we attempt to present explanations of the concepts in the textbook. It contains
easy examples as well as activities for you to practise. You are encouraged to do the activities
in order to learn effectively. Reading of feedback alone leaves gaps in your learning. There are
discussions following the activities so that the feedback is immediate. Do not just read through them;
try to explore them by testing that you can do them as well, even if you use alternative methods.
The exercises selected for assignments are important in reinforcing what you need to understand in
this module. Take time to understand the aspects that go with them. Analyse the postulates in the
given statements and thereafter the requirements so that it becomes easy to recall what is necessary
in compiling a solution. In that way you do not only solve the problem, you understand it and enjoy
solving it. At the end of the semester there is a two-hour closed-book examination. The discussions
in the study guide and the textbook prepare you for that examination.
This study guide is prepared to guide you through the prescribed book. Therefore, we will always use it together with the prescribed book. Read them together. The textbook presents the concepts; the study guide attempts to bring the concepts closer to you.
Each study unit starts with the outcomes in order to show you what you need to know and to evaluate
yourself. The table of outcomes also gives each outcome together with the way the outcome will
be assessed, the content needed for that outcome, the activities that will be used to support the
understanding of the content and the way feedback will be given. Your input in the form of positive
criticism to improve the presentation will be of importance in the review of this study guide. You are
therefore encouraged to suggest ways that you believe can improve the presentation of this module.

Module position in the curriculum
We have been offering a postgraduate module on Time Series at Unisa, but have become aware of
the need to introduce the module at undergraduate level due to its necessity in the workplace and in
order to fill the gap that is evident when students attempt the postgraduate time series module.
This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure
is as follows:
1st year: STA1501, STA1502, STA1503
2nd year: STA2601, STA2602, STA2603, STA2604 (Forecasting - we are here), STA2610
3rd year: STA3701, STA3702, STA3703, STA3704, STA3705, STA3710

You should already be familiar with some of the modules mentioned above. Knowledge from
STA2604 will help you in STA3704 (Forecasting III).

ASSIGNMENTS
There are two assignments for this module, which are intended to help you learn through various
activities. They also serve as tests to prepare you for the examination. As you do the assignments,
study the reading texts, consult other resources, discuss the work with fellow students or tutors or
do research, you are actively engaged in learning. Looking at the assessment criteria given for
each assignment will further help you to understand what is required of you. The two assignments
per semester prescribed for this module form part of the learning process. The typical assignment
question is a reflection of a typical examination question. There are fixed submission dates for the
assignments and each assignment is based on specific chapters (or sections) in the prescribed book.
You have to adhere to these dates as assignments are only marked if they are received on or before
the due dates.
Both assignments are compulsory as
- they are the sole contributors towards your year mark, and
- they form an integral part of the learning process and indicate the form and nature of the questions you can expect in the examination.

Please note that the submission of assignment 01 is the guarantee for examination entry. If you do not submit assignment 01, UNISA (not the Department of Statistics) will deny you examination entry.

You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this
module. Do not wait until the assignment due date or the examination to make contact with lecturers.
It is helpful to be ready long in advance. You are also encouraged to work with your own peers, colleagues, friends, etc. Details about the assignments will be given in Tutorial Letter 101.


Time series has its own useful terminology that should be understood. In order to familiarise yourself
with it, let us start with an easy activity. Activities help in the creation of a mind map of the module.
The more you attempt these activities, the better you will understand the work.

GLOSSARY OF TERMS
ACTIVITY 0.1
(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the
prescribed book. They serve as your glossary.
(b) Attempt meanings of these concepts before you deal with the various sections so that you have
an idea before we get there.

DISCUSSION OF ACTIVITY 0.1


(a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It appears with other terms or phrases. The term is data. You came across the term many times when you studied other modules and in some other contexts. We emphasise it because it is a vital aspect of forecasting. If you do not have data, you will not be able to make forecasts.
(b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook. The intention was to make you aware of aspects on which to focus in your learning. What is required from you is a step-by-step journey through the prescribed material.
ACTIVITY 0.2
What is the meaning of the word data?
DISCUSSION OF ACTIVITY 0.2
There is a general misconception that data and information are the same. This is not necessarily the case. Data are records of occurrences from which we obtain information. Data are not necessarily information on their own, though they may sometimes be. The truth is, data contain information that becomes visible after some analysis. They are often the raw answers we receive from an investigation.

WHAT TO EXPECT IN THE MODULE


In this module we use a scientific calculator to perform calculations. We will also draw graphs, form mathematical models (equations) that are used to develop forecasts, and make decisions based on time series data. Most of these aspects were taught at first-year level. The new topic is the pattern of time series data. Time series data appear in a unique form; without this form, data cannot qualify as time series data.

PREREQUISITES
- The ability to use a scientific calculator.
- Access to a computer package and the ability to use it are highly recommended.
- First-year statistics. These topics appear below and there will be a quick reminder whenever we need them. We will need:
  - simple linear regression
  - correlation measures
  - polynomials
  - graph plotting

When you draw plots required for statistical analysis, these plots should be accurate. Hence, use
a ruler and a lead pencil (not a pen) to construct plots. If you have access to a computer, you are
also encouraged to practise using any statistical package of your choice. Assignments may also be
prepared by means of a computer. Just make sure that you use the correct notation. Avoid using a
computer if you cannot write the correct notation. Remember that you are always welcome to contact
the lecturers whenever you have problems with any aspect of the module.

OUTCOMES
At the end of the module you should be able to do the following:
- Define and apply components of time series.
- Apply time series methods to develop forecasts.
- Specify a prototype forecast model, estimate its parameters and then validate it.
- Use the specified model to derive forecasts.


TABLE OF OUTCOMES
At the end of the module you should be able to achieve the outcomes below. Each outcome is listed with the way it will be assessed, the content needed for it, the activities that support it, and the way feedback will be given.

Outcome: explain and expose time series components
Assessment: analyse data; plot graphs
Content: trend; seasonality; cycles; irregularity
Activities: examine data visually; plot graphs
Feedback: discuss likely errors

Outcome: select a model
Assessment: analyse errors; plot graphs
Content: choosing a technique
Activities: scrutinise models
Feedback: balance factors

Outcome: develop a model
Assessment: forming an equation
Content: regression; exponential smoothing
Activities: small build-up exercises
Feedback: emphasise aptness

Outcome: estimate parameters
Assessment: perform estimations
Content: estimation methods
Activities: perform calculations
Feedback: discuss alternatives

Outcome: validate a model
Assessment: statistical tests
Content: hypothesis testing
Activities: test hypotheses
Feedback: peruse the various tests

Outcome: develop forecasts
Assessment: demonstrate patterns
Content: model building
Activities: form equations
Feedback: visit various alternatives

You will know that you understand this module once you understand the above issues.
Feedback is not just a follow-up of the preceding concepts. It is an opportunity to reinforce some
concepts and revise others. Make use of this opportunity. Feedback is given after every activity,
sometimes with some discussion after the activity, but in many instances, it follows immediately after
the activity.

OVERVIEW
Two of the five study units comprising this module are presented in this study guide.
Unit 1: Narration of the forecasting domain and support elements
(Chapter 1 of Bowerman et al.)
In this unit we will learn more about
- situations requiring forecasts and forecasting
- issues about useful data and the use of data in developing forecasts
- basic types of data and approaches (quantitative and qualitative methods)
- errors, problems and pitfalls in forecasting, as well as the depiction of good forecasts
- factors useful in choosing a forecast technique
- more about quantitative methods

Do the above issues evoke some response from you? Do you have any idea of what they mean or imply? Think and chat with your colleagues, peers or family members. Remember that learning becomes real and effective only when sharing is involved.
Unit 2: Building a forecast model and examining / verifying its strength
(Chapter 5 of Bowerman et al.)
In this study unit we will learn about
Multicollinearity of variables:
- variance inflation factors
- R²
- adjusted R²
- standard error
- interval length
- C-statistic

Residual analysis:
- residual plots
- the constant variance assumption
- assumption of correct functional form
- normality assumption
- the independence assumption

Outliers and influential observations:
- outliers
- influential data
- diagnostic methods to detect outliers and influential observations
- leverage points
- residuals
- Cook's distance measure


The measures dealt with in this unit ensure that the model built for use in forecasting has desirable properties: limited error and minimal influence from individual observations. It is also necessary to distinguish between outliers and seasonal variations; sometimes an effect of seasonality is mistakenly interpreted as an outlier.
We hope you have come across some of the concepts or issues above. Discuss these with your
colleagues, peers, friends or family members.

DIFFICULTIES IN FORECASTING TECHNOLOGY


Nearly all futurists describe the past as unchangeable, consisting of a collection of knowable facts.
We generally perceive the existence of only one past. When two people give conflicting stories of
the past, we tend to believe that one of them must be lying or mistaken.
This widely accepted view of the past might not be correct. Historians often interject their own beliefs and biases when they write about the past. Facts become distorted and altered over time. It may be that the past is a reflection of our current conceptual reference. In the most extreme viewpoint, the concept of time itself comes into question.
The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the
past provide the raw materials from which the mind makes estimates of the future. All forecasts are
opinions of the future (some more carefully formulated than others). The act of making a forecast is
the expression of an opinion. The future consists of a range of possible future phenomena or events.

DEFINING A USEFUL FORECAST


The usefulness of a forecast is not something that lends itself readily to quantification along any
specific dimension (such as accuracy). It involves complex relationships between many things,
including the type of information being forecast, our confidence in the accuracy of the forecast, the
magnitude of our dissatisfaction with the forecast, and the versatility of ways that we can adapt to or
modify the forecast. In other words, the usefulness of a forecast is an application sensitive construct.
Each forecasting situation must be evaluated individually regarding its usefulness.
One of the first rules is to consider how the forecast results will be used. It is important to consider
who the readers of the final report will be during the initial planning stages of a project. It is wasteful
to apply resources on an analysis that has little or no use. The same rule applies to forecasting. We
must strive to develop forecasts that are of maximum usefulness to planners. This means that each
situation must be evaluated individually as to the methodology and type of forecasts that are most
appropriate to the particular application.

xiv

FORECASTS CREATE THE FUTURE


Often the way we contemplate the future is an expression of our desire to create that future.
Arguments are that the future is invented, not predicted. The implication is that the future is an
expression of our present thoughts. The idea that we create our own reality is not a new concept. It
is easy to imagine how thoughts might translate into actions that affect the future.
Forecasting can, and often does, contribute to the creation of the future, but it is clear that other
factors are also operating. A holographic theory would stress the interconnectedness of all elements
in the system. At some level, everything contributes to the creation of the future. The degree to
which a forecast can shape the future (or our perception of the future) has yet to be determined
experimentally and experientially.
Sometimes forecasts become part of a creative process, and sometimes they do not. When two people make mutually exclusive forecasts, both of them cannot be true. At least one forecast is wrong. Does one person's forecast create the future while the other's does not? The mechanisms involved in the construction of the future are not well understood on an individual or social level.

ETHICS IN FORECASTING
Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours?
Note that the desire for control is implicit in all forecasts. Decisions made today are based on forecasts, which may or may not come to pass. The forecast is a way to control today's decisions. The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting is that the forecasts will be used by policy-makers to make decisions. It is therefore important to discuss the ethics of forecasting. Since forecasts can and often do take on a creative role, no one has the absolute right to make forecasts that involve other people's futures.
Nearly everyone would agree that we have the right to create our own future. Goal setting is a form
of personal forecasting. It is one way to organize and invent our personal future. Each person has
the right to create their own future. On the other hand, a social forecast might alter the course of an
entire society. Such power can only be accompanied by equivalent responsibility.
There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting,
i.e. the idea that social forecasting must involve physical, cultural and societal values. However,
forecasters cannot leave their own personal biases out of the forecasting process. Even the most
mathematically rigorous techniques involve judgmental inputs that can dramatically alter the forecast.


Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a socially desirable future for one person might be another person's nightmare. For example, modern
ecological theory says that we should think of our planet in terms of sustainable futures. The finite
supply of natural resources forces us to reconsider the desirability of unlimited growth. An optimistic
forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the
idea of zero growth, is a catastrophic nightmare for the corporate and financial institutions of the free
world. The system of profit depends on continual growth for the well-being of individuals, groups,
and institutions.
"Desirable futures" is a subjective concept that can only be understood relative to other information. The ethics of forecasting certainly involves the obligation to create desirable futures for the person(s) that might be affected by the forecast. If a goal of forecasting is to create desirable futures, then the forecaster must ask the ethical question: desirable for whom?
To embrace the idea of liberty is to recognise that each person has the right to create their own future. Forecasters can promote libertarian beliefs by empowering the people that might be affected by the forecast. Involving these people in the forecasting process gives them the power to become co-creators of their futures.

BENEFITS OF FORECASTING
Forecasting can help you make the right decisions, and earn/save money. Here are a few examples.

Define better sales strategies

If a product is declining, maybe it is a good idea to consider discontinuing it. But maybe not: maybe it is just your sales that are declining, and not your competitors'? In this case, is there a chance that you can get your market share back?
Forecasting techniques provide answers to these questions, which are vital to your business.

Size your inventories optimally

Time is money. Room is money. So what you want to do is use all means at your disposal to reduce your stocks, without experiencing any shortages, of course. How? By forecasting!
Forecasting is designed to help decision making and planning in the present. Forecasts empower people because their use implies that we can modify variables now to alter (or be prepared for) the future. A prediction is an invitation to introduce change into a system. There are several assumptions about forecasting:
- There is no way to state what the future will be with complete certainty. Regardless of the methods that we use, there will always be an element of uncertainty until the forecast horizon has come to pass.
- There will always be blind spots in forecasts. We cannot, for example, forecast completely new technologies for which there are no existing paradigms.
- Providing forecasts to policy-makers will help them formulate social policy. The new social policy, in turn, will affect the future, thus changing the accuracy of the forecast.


STUDY UNIT 1: An Introduction to Forecasting


1.1 Introduction
Table of outcomes for the study unit

Outcome: define time series terms
Assessment: data plots and measures
Content: time series word list
Activities: experiment with data
Feedback: discuss each activity

Outcome: decompose time series
Assessment: graph, visual
Content: time series components
Activities: plot graphs
Feedback: critique the graphs

Outcome: calculate time series measures
Assessment: stepwise exercises
Content: errors in forecasting
Activities: various calculations

If you understand the above outcomes, it will be an indication that you understand this study unit. It
is based on Chapter 1 of the prescribed book.

Forecasting is the scientific process of estimating some aspects of the future in usually unknown situations. Prediction is a similar, but more general, term. Both can refer to estimation of time series, cross-sectional or longitudinal data. Usage can differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period. Note the emphasis that in this module forecasting is scientific. This is to ensure that we do not consider subjective predictions and spiritual prophecies as part of our scope for this forecasting module. Risk and uncertainty are central to forecasting and prediction. Forecasting is used in the practice of customer demand planning in everyday business forecasting for manufacturing companies. The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces both statistical forecasting and a consensus process. Forecasting is commonly used in discussions of time series data. In this module the terms follow the prescribed book fairly directly.

Forecasting has application in many situations, from supply chain management and weather forecasting to economic, political and sales forecasting. (A fuller list of applications is given in section 1.1.4.)

ACTIVITY 1.1
Consider the terms forecasting, cross-sectional data and time series, which are the main focus
of this study unit.
(a) Attempt to define these terms.
(b) Check the definitions in the book and compare your answers in (a).

Before we discuss the above activity, start by reading slowly through the following discussion. Make
sure you follow the discussion.

1.1.1 Forecasting
Study section 1.1 on page 2 up to the second bullet on page 3.
The few people with whom we discussed the term "forecasting" seemed to have an understanding of the concept only in a nutshell. Many of them made reference to the weather forecast presented on radio, television and the internet. A gap existed in their main understanding of forecasting.
History shows that people, whenever they lived, were always interested in the future. There are stories from history that inform us that when people dreamed, there were experts to explain the meanings of these dreams in terms of the future. When signs of future drought arose, the implications of the drought were noted and plans were made to offset the anticipated impacts. Drought led to hunger. Thus, when predictions were made that drought was coming, preparations were made so that at the time of the drought there would be enough food for every member of the community for the duration of the drought. Predicting the future, even as it was done in those days, can be referred to as forecasting. The predicted future was then used to plan for the future, as explained above.
Modern practice has encouraged that this "anticipation of the future" practice be conceptualised. It was then formally termed forecasting. The current approaches are scientific in order to ensure that forecasting is practised systematically. The predictions made are now called forecasts. In other words, forecasts are future expectations based on scientific guidelines.
DISCUSSION OF ACTIVITY 1.1
The first term we listed in Activity 1.1 was forecasting. Did you get that? Forecasting is a natural operation. We have always done it, sometimes unconsciously. As was explained, predicting has always been practised, even in ancient times. For self-evaluation in terms of the time series concept, did you define the term forecasting in line with predicting the future?
Forecasting indicates more or less what to expect in the future. Once the future is known, preparation for equitable allocation of resources can be made. Wastage can thus be reduced or eliminated and gains can be enhanced (or increased).
FURTHER DISCUSSION ON FORECASTING
Forecasting is applied in various real-life situations. Six examples of applications are listed on pages
2 and 3 of the prescribed book. We are close to them at different levels. But what about something
that we as students of the University of South Africa can appreciate?
The number of student enrolments at Unisa is the starting point. The trend pattern will give an
indication of whether there has been a decline or growth in the student numbers over the years. If
you are observant, you will realise that there has been an increase in student numbers over the past
few years. Our forecast for next year (2013) is that there will be more students than in 2012.
ACTIVITY 1.2
Weather forecasting was mentioned as a known example where forecasting is used abundantly.
There are many others.
(a) Provide an easy example of a situation where forecasting is needed.
(b) Attempt to explain the details of the example you provided in (a).

DISCUSSION OF ACTIVITY 1.2


We discussed the Unisa example. If you are interested in Southern African politics and elections you
will be interested in making predictions about political parties that are going to be in the forefront in

the next election. We might anticipate extreme growth of one party (MDC) and decline of others in
Zimbabwe, based on the trends in the previous elections and developments that prevail. Therefore,
(a) one can for example predict how the political parties will perform in the next election; and
(b) recent performance of the various parties in previous elections may be revisited and analysed,
the current activities of the parties may be analysed closely and one may interact with people to
determine their impressions about various parties.
N.B.: Here we assume normal election conditions where no intimidation and harassments take place.

1.1.2 Data
For this topic you need to study from the middle paragraph of page 3 to the end of page 4.
Data are important for forecasting. Quality data, which loosely means reliable and valid data, are what forecasting needs. We may be misled if we use data of poor quality because the results are likely to be poor as well, even if the best methods are used by a proficient analyst. The term data refers to groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. Data (the plural of "datum", which is seldom used) are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and knowledge are derived. Raw data refers to an unprocessed collection of numbers, characters, images or other outputs from devices that convert physical quantities into symbols.
Without data there will be no forecasting. However, it is important that data be correct (reliable, valid, realistic, etc). Data need to be both valid for the exercise and reliable. If either of these is missing, be warned that your forecasts may mislead you or any user. Also, the data collected may be inadequate to support the reasoning behind some findings. Experience shows that when data are collected under certain contexts, explanations become clearer when findings are associated with those contexts. Thus, if you assist in the collection of time series or any statistical data, advise, whenever possible, on the inclusion of details of the circumstances of the data. Giving details about happenings helps reduce the need for assumptions, which may sometimes be incorrect.
The type of information used in forecasting determines the quality of the forecasts. Not all of us like
boxing, but let us discuss the next scenario. Imagine that two boxers were going to fight on the next
Saturday. We were required to make a prediction in order to win a million rand competition. Many
participants looked at the past records of these boxers. They were informed that in the previous
seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won
22 of the 30 fights he had in the same period. Gumbu was known for winning well while Blood had


lost dismally in a recent fight. Let us pause and enjoy the predictions (forecasts) made, just to make a good point.
ACTIVITY 1.3
Either as a person interested in boxing or someone hoping to win the money, you may be tempted to
take a chance at the answer. Make a prediction of the outcome of the fight based on the explanation
given.

DISCUSSION OF ACTIVITY 1.3


Let us determine the odds as statisticians. Using frequencies, Gumbu had a probability of 0.93 of winning the fight while Blood had a probability of 0.73 of winning. On the basis of these odds, many participants predicted that Gumbu was going to win.
Do you know how the probabilities 0.93 and 0.73 were obtained? If it is not clear, divide the number of successes (wins) of each boxer by the total number of fights that boxer had fought.
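If you want to check these relative frequencies yourself, here is a minimal sketch in Python (Python is our choice for illustration; any of the packages mentioned earlier would serve equally well):

```python
# Empirical (relative-frequency) probability of winning for each boxer,
# computed exactly as described above: wins divided by total fights.
gumbu_wins, gumbu_fights = 25, 27
blood_wins, blood_fights = 22, 30

p_gumbu = gumbu_wins / gumbu_fights
p_blood = blood_wins / blood_fights
print(round(p_gumbu, 2), round(p_blood, 2))  # 0.93 0.73
```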
The data given were based on certain assumptions. Among others, there was the impression that
the opponents of the two boxers were of the same quality. If they were not, then the prediction would
be carrying some inaccuracies. Among other omissions, we were not told that the boxing bout
was going to be held in the catchweight division, where boxers came from different weight divisions
and could not both fall within a single previously defined weight division. Blood had fought only
world-class opponents and came from two weight divisions heavier than the weight to which Gumbu
belonged. That is, there was a difference between the original weights of the two boxers. Gumbu,
on the other hand, was a boxer who talked too much. He had fought some mediocre opponents and
wanted to pretend he was an excellent boxer. He had asked for the fight. In insisting on the fight,
he had called Blood a coward until the bout was sanctioned. At the time he was preparing for an
elimination bout in his weight division after which he was going to fight for a world title if he won.
The planned elimination bout was probably going to be the first real test for Gumbu as a professional fighter. "It is going to come after I am done with Blood," boasted Gumbu.
In the street some people were predicting that Gumbu was going to lose, but they did not bet as
money was required. None of those who paid to enter the competition predicted correctly. The fight
ended with a first-round knockout. Blood was the winner. Gumbu was no match.

DISCUSSION OF THE BOXING SCENARIO


The records given were correct, but not complete. Records are past data. We need complete
data and the exact context in which they occurred in order to be able to make accurate forecasts.
The analyses that were made about the boxers were correct, but some assumptions were wrong.
Assumptions are used to build cases, and methods are developed on conditions that are given as

6
assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases
where information can be found to limit the use of assumptions, this should be done. However, many
cases provide inadequate information, leaving us with no choice but to depend on assumptions.
Analysis should depend on reasonable assumptions. If in actual practice assumptions are made
for the sake of doing something, decisions and results reached may lead to improper actions. The
analyst should learn the art of making appropriate or reasonable assumptions.
In the case of the example/scenario given, details were missing, such as that the two boxers were of different weights. Had we known this, it would have helped our analysis. Sometimes, in predicting forthcoming games, one also needs to know the quality of the opposition that the two opponents have met in accumulating their records. This was also missing in the example. We will insist on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give inaccurate predictions. The paragraph after the last bullet on page 3 of the prescribed book explains possible repercussions that come with wrong assumptions (Bowerman, 2005: 3).
Types of data that are common in real life are cross-sectional data and time series data. Study the definition of cross-sectional data in the rectangle on page 3. Cross-sectional data refers to data collected by observing many subjects (such as individuals, firms or countries/regions) at the same point in time, or without regard to differences in time. Analysis of cross-sectional data usually consists of comparing the differences among the subjects. For example, suppose we want to measure current obesity levels in a population. We could draw a sample of 1,000 people randomly from that population (also known as a cross-section of that population), measure their weight and height, and calculate what percentage of that sample is categorised as obese. Even though we may analyse cross-sectional data for quality forecasts, in this module we use time series data.
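As a minimal sketch of the cross-sectional calculation just described (the sample values and the common BMI cut-off of 30 are illustrative assumptions, not from the textbook):

```python
# Percentage of a (hypothetical) cross-sectional sample categorised as obese,
# using the body-mass-index rule BMI = weight / height^2 >= 30.
weights_kg = [95, 70, 88, 102, 65, 78]
heights_m = [1.75, 1.80, 1.68, 1.70, 1.72, 1.83]

bmi = [w / h**2 for w, h in zip(weights_kg, heights_m)]
obese_share = 100 * sum(b >= 30 for b in bmi) / len(bmi)
print(f"{obese_share:.0f}% of the sample is categorised as obese")
```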


Study the definition of time series on page 4.


We will have to be careful when we collect time series data. If the data are listed without time specification, then we should not consider the data to be time series.
SCENARIO
Read the following scenario carefully and make notes as we will keep on referring back to it.
Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and VUDEC UNISA campuses. Very fortunately for Jabulani, his milk cows increased and his market on these campuses also increased from year to year. Jabulani's business runs from Mondays to Sundays. (In a time series analysis a typical question would be: what can we say about the trend of the sales? Asked differently: should we believe that the sales have a decreasing or increasing trend?) It will become clear later on that the sales levels differ according to days, high on some days and low on others. The pattern of low or high sales on different days has an important connotation in time series analysis. This will be discussed.

ACTIVITY 1.4
You have done some first-year statistics modules/courses and some of you did mathematics modules
as well. Let us consider the following data sets and look at them quite closely.
Data set 1.1

16 18 21 24
14 15 15 17
19 21 20 24
26 24 27 31
11 12 13 14
24 21 25 27
10  9 11 13

Data set 1.2

16 14 19 26 11 24 10
18 15 21 24 12 21  9
21 15 20 27 13 25 11
24 17 24 31 14 27 13

(a) The two data sets have exactly the same numbers. There is something strange about their
appearances though. Compare the two data sets.
(b) Can these two data sets be classified as time series data sets? Explain.

DISCUSSION OF ACTIVITY 1.4


On whether data are time series or not
When information about the data presented is limited, there also tends to be limited feedback from any analysis made of them. You probably realised that the rows of data set 1.1 are the same as the
columns of data set 1.2 and vice versa. Or, in short, that the data sets are transposes of each other.
The data in their current form cannot be classified as time series data since no chronological pattern
of the time at which they were collected is given. This will become clearer as we proceed.
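You can verify the transposition claim quickly; here is a minimal sketch using numpy (our own choice, since the module prescribes no particular package):

```python
import numpy as np

# Data set 1.1 (7 rows of 4) and data set 1.2 (4 rows of 7), as printed above.
d1 = np.array([[16, 18, 21, 24], [14, 15, 15, 17], [19, 21, 20, 24],
               [26, 24, 27, 31], [11, 12, 13, 14], [24, 21, 25, 27],
               [10, 9, 11, 13]])
d2 = np.array([[16, 14, 19, 26, 11, 24, 10],
               [18, 15, 21, 24, 12, 21, 9],
               [21, 15, 20, 27, 13, 25, 11],
               [24, 17, 24, 31, 14, 27, 13]])

print(np.array_equal(d1.T, d2))  # True: each data set is the transpose of the other
```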
Discussion
The data above do not necessarily represent time series data, but they can be presented in another way to form time series data, provided they were collected chronologically over regular time intervals. Suppose data set 1.1 represents the sales of milk sold by Jabulani from Monday to Sunday for four weeks. Let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data should therefore be presented as follows:
Data set 1.3: Litres of milk sold by Jabulani

         Week 1   Week 2   Week 3   Week 4
Day 1        16       18       21       24
Day 2        14       15       15       17
Day 3        19       21       20       24
Day 4        26       24       27       31
Day 5        11       12       13       14
Day 6        24       21       25       27
Day 7        10        9       11       13


We emphasise that in the initial presentation there was simply no information to explain or
demonstrate the chronological sequence with respect to time and that the data were therefore not
time series data.
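To make the chronological structure concrete, here is a minimal sketch, assuming the pandas package (our choice for illustration); the variable names are our own:

```python
import pandas as pd

# Data set 1.3 as a table: rows are days (1 = Monday, ..., 7 = Sunday),
# columns are weeks 1 to 4.
sales = pd.DataFrame(
    {1: [16, 14, 19, 26, 11, 24, 10],
     2: [18, 15, 21, 24, 12, 21, 9],
     3: [21, 15, 20, 27, 13, 25, 11],
     4: [24, 17, 24, 31, 14, 27, 13]},
    index=range(1, 8))

# Stacking the weeks one after another gives the single chronological
# sequence (Mon week 1, ..., Sun week 4) that makes this a time series.
series = pd.concat([sales[w] for w in sales.columns], ignore_index=True)
print(series.head(10))
```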
ACTIVITY 1.5
You are required to use graphs in addition to other methods to detect patterns in time series data. Graphical plots reveal information visually, but cannot always be drawn with ease. The example that follows is one of the easy cases where we can draw graphical plots. Analyse the data about Jabulani's business by answering the following questions. Make any comments that you believe are relevant.
(a) Are they time series data? Justify your answer.
(b) Plot the data to reveal the pattern using the following approaches:
(i) Plot the data for each week separately.
(ii) Plot the data of all the weeks in one graphical display.
(iii) Compare the shapes of the graphs.
(c) Which plot provides us with a better idea of comparison?
DISCUSSION OF ACTIVITY 1.5
Whether data sets form time series or not depends entirely on the form: the chronological order in which the various data points are presented. Did you answer "yes" to question (a)? If not, what did you conclude, and how did you reach that conclusion?
(b) Graphs of the activity
(i) Graphs for separate weeks
[Four plots, one for each of Weeks 1 to 4, each showing litres of milk sold against the day of the week (1-7).]
(ii) Graph for data of all the weeks
[A single plot showing the four weekly series together: litres of milk against day of the week, one line per week.]

(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays
and Wednesdays (in order from highest to lowest). The lowest sales were revealed for
Sundays, Fridays, Tuesdays and Mondays (in the order from lowest to highest).
(c) The graphs can be difficult to compare when they are on separate systems of axes. The last graph makes comparison very easy, revealing that the patterns for all four weeks are similar.
The patterns of the highest and lowest activity of a phenomenon are important in time series. Jabulani can easily tell when he does the most business and when he does the least, and can plan to find better ways to improve business. Let us start formalising these patterns.
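If you have access to a computer, the combined display can be produced along these lines; a minimal sketch assuming the matplotlib package (any of the packages mentioned earlier would work):

```python
import matplotlib.pyplot as plt

# Data set 1.3 again: litres of milk per day (1 = Monday, ..., 7 = Sunday),
# one list per week.
weeks = {1: [16, 14, 19, 26, 11, 24, 10],
         2: [18, 15, 21, 24, 12, 21, 9],
         3: [21, 15, 20, 27, 13, 25, 11],
         4: [24, 17, 24, 31, 14, 27, 13]}

days = range(1, 8)
for w, litres in weeks.items():
    plt.plot(days, litres, marker="o", label=f"Week {w}")
plt.xlabel("Day of the week (1 = Monday, ..., 7 = Sunday)")
plt.ylabel("Litres of milk")
plt.legend()
plt.show()
```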

1.1.3 Components of a time series


The components of a time series serve as the building blocks of a time series and describe its pattern
(study p. 5-7 of textbook up to the end of section 1.2).
Components are important because they enable us to see the salient features of a structure. Through them we can describe what we need to analyse. When we deal with something that we can describe, we are better able to know the requirements for dealing with it. A time series likewise has components that need to be considered and taken care of in its analysis.
Trend
The first component we discuss is trend. The term trend is about long-term decline or growth of
an activity. It is defined formally as the upward and downward movements that characterise a time
series over a period of time.


Time series data may show an upward trend or a downward trend over a period of years. This may be due to factors such as an increase in population, technological progress, large-scale shifts in consumer demand, and so on. For example, population increases over a period of time, prices increase over a period of years, and production of goods in the country's capital market increases over a period of years. These are examples of an upward trend. The sales of a commodity may decrease over a period of time because better products come to the market. This is an example of a declining or downward trend. The increase or decrease in the movements of a time series is called trend.
Usually one would not be able to determine from looking at the data whether there is a decreasing
or increasing trend. There are times (but rarely) when we can see the pattern by inspection. Often a
graphical plot clearly shows the trend. The trend may be given in shapes such as linear, exponential,
logarithmic, polynomial, power function, quadratic, and other forms. In general, we use the graphical
displays to find out if there is a decline or increase in the activity. Some examples of trend applications
that we must look at are given on page 5 of Bowerman et al. (2005). Study them.
- Technological changes in the industry
Currently, companies increase ICT usage in their activities for a competitive edge over those that do not incorporate it. Institutions of higher learning, especially distance education ones, have aggressively incorporated ICT in facilitating learning.
- Changes in consumer tastes
Housing is very expensive and scarce, but for obvious reasons remains a priority for households.
Recently, cities such as Cape Town, Durban, East London, Johannesburg, Port Elizabeth and
Pretoria have experienced a high influx of people from other areas, and employment is biased
towards the youth. As a result housing in these cities is biased towards townhouses and flats.
- Increases in total population
There is an increase since there are more births than deaths. In SA, there is also an influx of people from other countries. In other countries, natural deaths and deaths resulting from holocausts, wars, terrorism and natural disasters such as tsunamis have been many, but still far fewer than the births that have occurred over the years. That is why there is an increase in the world's population.
- Market growth
In Gauteng, the market for umbrellas decreases in the period April to July. During the rainy season, which in Gauteng happens to be the summer season, the sales of umbrellas increase.
- Inflation or deflation (price changes)
If we consider one item for simplicity: maize is produced in the period October to May, approximately. Early in the period, the price of maize is high because there are more people looking for a less available commodity. During the period November to January, maize is in abundance and the prices drop. As the production level declines, the prices start increasing again.
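Unit 3 treats trend fitting formally, but the idea can be previewed now. Here is a minimal sketch, using numpy and made-up numbers (both our own assumptions), of fitting a straight-line trend:

```python
import numpy as np

# Hypothetical yearly observations with an apparent upward movement.
y = np.array([102.0, 108.0, 113.0, 121.0, 125.0, 133.0])
t = np.arange(1, len(y) + 1)  # time index 1, 2, ..., n

# Least-squares straight line; for deg=1, polyfit returns (slope, intercept).
slope, intercept = np.polyfit(t, y, deg=1)
print(f"fitted trend: y = {intercept:.1f} + {slope:.1f} t")
# A positive slope suggests an increasing trend, a negative one a decreasing trend.
```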
ACTIVITY 1.6
Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical
variations, and irregular effects.
DISCUSSION OF ACTIVITY 1.6
You should mention a sequence of observations of a variable presented in chronological form when you describe a time series. Trend should imply a long-term tendency of that time series. Seasonality should include a periodic pattern in the data. Describing cycles should imply up-and-down movements of observations around trend levels. The irregular pattern is the portion of the time series which cannot be accounted for by the three patterns discussed above.
Exploration data set
The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were
demanded from Jabulani. Whether there was stock or not is not an issue here. The data set will be
revisited time and again.
Data set 1.4: Litres of milk demanded from Jabulani

         Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7
Week 1      16      14      19      26      11      24      10
Week 2      18      15      21      24      12      21       9
Week 3      21      15      20      27      13      25      11
Week 4      24      17      24      31      14      27      13
In general, methods of forecasting that depend on non-numeric information are qualitative forecasting methods. (Do you remember this from first-year Statistics?) Qualitative data are nominal (word) data. Quantitative forecasting methods, on the other hand, depend on numerical data.
Bowerman et al. (2005: 7) present a graphical plot, Figure 1.1 (a), to display an example of a trend in a time series. There is no trend line to describe the trend, but can you explain whether there is a decreasing or increasing trend in the plot to which we are referring?
Cycle
The next component of time series that we discuss is cycle. When trends have been identified, there
may be some recurring up and down movements visible around trend levels. These movements are
called cycles. Cycles occur over long and medium terms. Page 5 of Bowerman et al. (2005) presents
this component.
Some interesting explanation is presented by Bowerman et al. (2005: 5) about business cycles.
Study it in detail. Bowerman et al. (2005: 7) present Figure 1.1 (c) to display an example of a cycle


in a time series. We need to note that generally, natural occurrences have shown some cyclical
patterns over the years.
The impact of cycles on a time series is either to stimulate or depress its activity, but in general,
their causes are difficult to identify and explain. Certain actions by institutions such as government,
trade unions, world organisations, and so on, can induce levels of pessimism and optimism into the
economy which are reflected in changes in the time series levels. Economic indices are usually used
to describe cyclical fluctuations.
Cyclical variations are recurrent upward or downward movements in a time series, but the period of a cycle is greater than a year. This makes them different from trend. Also, cyclical variations are not as regular as seasonal variation. There are different types of cycles, varying in length and size. The ups and downs in business activities are the effects of cyclical variation. A business cycle showing these oscillatory movements has to pass through four phases: prosperity, recession, depression and recovery. In a business, these four phases follow one another in this order. Together, they form a cycle.
Cycles are useful in long-term forecasting, which here may mean centuries and millennia. Our capabilities and interest in this module do not require us to look beyond a decade. Hence, methods for developing forecasts that include cycles (or cyclical components) are not in the scope of this module. However, you still need to understand when cycles are discussed or implied in a forecasting situation.
Seasonality
The example about milk is given over weekly periods. The definition given by Bowerman et al. (2005: 6) is somewhat misleading! The impression it gives is that the observations being investigated must run over a year. This is simply not the case. Even values occurring within a day can be seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality, which will be used in the module. The one given in Bowerman et al. works when the periods are yearly. Let us define the concept in the next line:
Seasonal variations are systematic variations that occur within a period and are tied to some properties of that period. They are repeated within the period. They are indeed periodic patterns in a time series that complete themselves within a calendar period and are repeated on the basis of that period.
Seasonal variations are short-term fluctuations in a time series which occur periodically within a period, such as a year; in that case they would continue to be repeated year after year. The major factors responsible for the repetitive pattern of seasonal variations are weather conditions and the customs of people. More woollen clothes are sold in winter than in summer. Regardless of the trend, we can observe that in each year more ice cream is sold in summer and very little in winter. Sales in department stores are higher during festive seasons than on normal days.
Irregular fluctuations
We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business.
Now we are giving you bad news.
Irregular fluctuations are variations in a time series that are short in duration, erratic in nature and follow no regular pattern of occurrence. These variations are also referred to as residual variations, since by definition they represent what is left in a time series after trend, cyclical and seasonal variations have been accounted for. Irregular fluctuations result from the occurrence of unforeseen events such as floods, earthquakes, wars and famines.
Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue each morning he left for work. One Tuesday afternoon, after he had counted what he thought was his revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged from continuing with his business. This had happened for the first time. Could he have anticipated being robbed on that day? We also could not have predicted that event.
The point is, that irregular event changed what could have been the revenue and/or profit for that
day. In time series, irregular fluctuations, which are also called irregular variations, refer to random
fluctuations that are attributed to unpredictable occurrences. Bowerman et al. (2005: 6) appropriately
define them as erratic movements in a time series that follow no recognisable or regular pattern. The
presentation about this concept simply implies that these patterns cannot be accounted for. They
are once-off events. Examples are natural disasters (such as fires, droughts, floods) or man-made
disasters (strikes, boycotts, accidents, acts of violence and so on).
Note that all the components of a time series influence the time series and can occur in any combination.

The most important problem to be solved in forecasting is trying to match the appropriate model to the pattern of the time series data.

1.1.4 Applications of forecasting


Forecasting has application in many situations. Among others, it can be applied in:

Supply chain management - Forecasting can be used in supply chain management to make sure that the right product is at the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help them meet consumer demand.

Weather forecasting, Flood forecasting and Meteorology


Transport planning and Transportation forecasting


Economic forecasting
Technology forecasting
Earthquake prediction
Land use forecasting
Product forecasting
Player and team performance in sports
Telecommunications forecasting
Political Forecasting

1.2 Forecasting Methods


This topic is discussed on pages 7 to 12. Study these pages. On page 7 there is a reminder that there is no single best forecasting method. There are, however, appropriate methods for any time series situation. The forecasting methods are described along the same lines as the types of data that you dealt with in your Statistics courses/modules at first-year level. They are qualitative and quantitative in nature.

1.2.1 Qualitative methods


Study this topic from page 8 to page 11.
The textbook explains on page 8 that, generally, qualitative forecasting methods become an option for developing forecasts in situations where there are no historical numeric data or where trained time series statisticians are not available. Opinions of experts are generally used to make predictions in such cases. Predictions are necessary in all situations, even where there is no data. When this occurs, qualitative methods are involved.

Common examples of qualitative forecasting methods are judgemental methods. Judgemental forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates. They include:

Composite forecasts
Surveys
Delphi method
Scenario building
Technology forecasting
Forecast by analogy

You do not need to learn more about these for the requirements of this module. However, you
may come across them in applications. Hence, your encounter with them may be of help in future
applications.

1.2.2 Quantitative methods


Quantitative forecasting methods are used (and only possible) when historical data that occur in
numeric form are available. These methods may occur as univariate forecasting methods or as
causal methods (Bowerman et al., 2005: 11).
Univariate forecasting methods depend only on past values of the time series to predict future
values. In this method, data patterns are identified from historical data, the assumption is made
that the patterns will continue in the future and then the pattern is extrapolated in order to develop
forecasts. Study this topic on page 11.

Causal forecasting models start by identifying variables that are related to the one to be predicted. This is followed by forming a statistical model that describes the relationship between these variables and the variable to be forecasted. The common ones are regression models and ordinary polynomials. Study this topic on page 11.
In the causal forecasting method, the variable of interest, which is the one whose forecasts are
required, depends on other variables. It is thus the dependent variable. The ones on which the
variable of interest depends are known as the independent variables.
Discussion about dependence/independence
Note that Jabulani's customers are mostly people who receive wages on a weekly basis. Some are
paid on Saturday afternoon, but an overwhelming majority is paid on Friday afternoon. In addition,
on Saturday afternoon, there is an item P that is also liked by many milk buyers. If item P is available
before milk arrives, then this item is bought in large quantities, leaving limited disposable income for
the milk purchases. Fortunately for Jabulani, he has in the past four weeks, managed to deliver milk
before item P was delivered. However, most of the buyers who are paid on Saturday tend to meet
the P seller before their milk purchases on Sunday morning.

It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail to understand them, you may fall into the trap of making wrong assumptions; influences that affect your forecasts, and the constraints that come with correlated variables, may lead to inaccurate models and thus to wrong forecasts.
Useful common examples are time series and causal methods. There are others as well, but the
following may be of help in your development.


Time series methods

Time series methods use historical data as the basis of estimating future outcomes. A rolling forecast is a projection into the future based on past performance, routinely updated on a regular schedule to incorporate data. Common time series techniques include:

Moving average
Extrapolation
Trend estimation
Exponential smoothing
Linear prediction
Growth curve

Causal / econometric methods


Some forecasting methods use the assumption that it is possible to identify the underlying factors
that might influence the variable that is being forecasted. For example, sales of umbrellas might
be associated with weather conditions. If the causes are understood, projections of the influencing
variables can be made and used in the forecast.

Regression analysis using linear regression or non-linear regression

Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins

Autoregressive moving average (ARMA)

Econometrics

Other methods

Simulation

Probabilistic forecasting and ensemble forecasting

Prediction market

Reference class forecasting

These methods are given to you so that when you consult other forecasting sources, you will be able to understand where they belong in your module. However, they are not required to the extent presented in those other sources.
ACTIVITY 1.7
Do you see any dependence of the variables?

Hint: Focus on milk purchases and disposable income.

DISCUSSION OF ACTIVITY 1.7
Keeping to the hint, the purchase of an item that is in high demand depends on the availability of
disposable income.

ACTIVITY 1.8
(a) Classify the milk sales in the latest scenario as a dependent or independent variable.
(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable
income.
(c) Identify the dependent variable and the independent variable.

DISCUSSION OF ACTIVITY 1.8


Regarding (a), milk sales depend on the availability of disposable income. Hence, (b) milk sales
represent the dependent variable. This leads to (c) that sales are the dependent variable and
disposable income is the independent variable.

1.3 Errors in forecasting and forecast accuracy


When it was said that the pattern of information given, such as Jabulani's milk sales, can help you make future predictions, no one said your predictions would be perfect.
It is time to note that if the forecasts prepared/developed are not accurate, they may be useless, since they are probably going to mislead the user. We insist on a scientific method in forecasting to ensure that we can monitor the methods and test the models so that the inaccuracies in them are reduced or, ideally, eliminated.
It is important to know the likely errors when you attempt to make predictions or develop forecasts.
If you know them, you can avoid or minimise them. Error is as simple as when you thought Jabulani
was going to sell 500 litres in a specific week and he ends up selling 520 litres. (Note that you could
make an error in litres of milk by overestimating as well.)
The next sections require your learned skill of drawing graphs and interpreting them. The most common ones you should expect to encounter (draw and interpret) are the scatter diagram (or scatterplot) and the time plot. Revise them if you have already forgotten how they are drawn.

Further, you are soon going to engage in a number of calculations. Thus, ensure that you are ready to perform them, and that you remember the descriptive statistics you learnt in your early years of Statistics. It is also very important to be able to know why the calculations are necessary in any exercise of building a forecast model.
Bowerman et al. (2005: 12) name two types of forecasts, the point forecast and the prediction
interval. A point forecast is a single number that estimates the actual observation. A prediction
interval is a range of values that gives us some confidence that the actual value is contained in the
interval.
The forecast error as defined in Bowerman et al. (2005: 13) requires that the estimate be found and
be paired with the actual observation.
In statistics, a forecast error is the difference between the actual or real and the predicted or forecast
value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared
with an outcome at a single time-point and a summary of forecast errors is constructed over a
collection of such time-points. Here the forecast may be assessed using the difference or using
a proportional error. By convention, the error is defined using the value of the outcome minus the
value of the forecast. In other cases, a forecast may consist of predicted values over a number of
lead-times; in this case an assessment of forecast error may need to consider more general ways of
assessing the match between the time-profiles of the forecast and the outcome. If a main application
of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing
the forecast is to use the timing error: the difference in time between when the outcome crosses
the threshold and when the forecast does so. When there is interest in the maximum value being
reached, assessment of forecasts can be done using any of:

the difference of times of the peaks;

the difference in the peak values in the forecast and outcome;

the difference between the peak value of the outcome and the value forecast for that time point.
Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to
summarize the forecast error over a group of units. If we observe the average forecast error for a
time-series of forecasts for the same product or phenomenon, then we call this a calendar forecast
error or time-series forecast error. If we observe this for multiple products for the same period, then
this is a cross-sectional performance error.
To calculate the forecast errors we subtract the estimates (ŷi) from the actual observations (yi). The difference is the forecast error. Can you tell what the values of the forecast errors imply? For example, some may be smaller than others, some negative and others positive!
When Jabulani plans his sales, he makes some estimation of the litres of milk that he hopes to sell. In Week 3, prior to getting to the market, he had made the following estimations (ŷi):

Day (Week 3):             1   2   3   4   5   6   7
Estimated litres (ŷi):   27  11  20  26  14  22   9

Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi ).
ACTIVITY 1.9
(a) On which days was there overestimation?
(b) On which days was there underestimation?
(c) Calculate the forecast errors for these estimates.
(d) Identify the day on which the milk sales were most disappointing! Explain.
(e) On which day did he make the best prediction? Why?

DISCUSSION OF ACTIVITY 1.9


We have not defined the terms overestimation and underestimation formally. They have been
defined in other modules, but we wish to make a reminder. If you make a prediction and the actual
observation turns out to be smaller, we will have overestimated. What is the sign of the forecast
error? Can you now define the term underestimation? What about the sign of the forecast error?
Let us get into the questions of the activity. The setup of Week 3 is as follows:

Day:                           1   2   3   4   5   6   7
Actual observations (yi):     21  15  20  27  13  25  11
Estimated observations (ŷi):  27  11  20  26  14  22   9

(a) Overestimations are visible after pairing by observing the pairs in which the actual observations
are lower than the estimates. These were on Day 1 and Day 5.
(b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.
(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.
(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only sold 21 litres. It is the day he made the biggest loss, that is, the day with the largest negative error.
(e) He made the best prediction on Day 3, where the sales were equal to the estimates.

If there was no day when the sales and estimates were equal, then the day with the smallest forecast
error in absolute value would have been the one on which the best prediction was made. This means
that Day 4 and Day 5 are the days on which good predictions were made. However, we note that Day 5 was not a happy day for the seller because some stock was left unsold, whereas on Day 4 all
stock was sold and one customer did not get milk.
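
As a quick check of these calculations, the forecast errors can be computed in a few lines of Python. This is a minimal sketch; the variable names are ours, and the data come from the Week 3 table above.

    actual   = [21, 15, 20, 27, 13, 25, 11]   # observed litres, y_i
    estimate = [27, 11, 20, 26, 14, 22, 9]    # estimated litres, y-hat_i

    errors = [y - f for y, f in zip(actual, estimate)]
    print(errors)  # [-6, 4, 0, 1, -1, 3, 2]

    for day, e in enumerate(errors, start=1):
        if e < 0:
            print(f"Day {day}: overestimated by {-e} litres")
        elif e > 0:
            print(f"Day {day}: underestimated by {e} litres")
        else:
            print(f"Day {day}: perfect prediction")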
Examining the forecast errors over time provides some information on the accuracy of the estimates.
- Random forecast errors demonstrate that patterns that existed in the data were considered when
the estimates were made (Figure 1.5 (a), Bowerman et al., 2005: 14).
- If there is an increasing (or decreasing) trend, and in making an estimation this trend was not
taken care of, then the scatter plot of forecast errors would reveal an increasing (or decreasing)
trend. On Figure 1.5 (b) of Bowerman et al., (2005: 14) an example is shown of a forecast error
plot that did not account for an increasing trend.
- If estimates of seasonal data did not account for seasonality, the scatter plot of forecast errors would reveal the seasonal pattern that was not taken care of (Figure 1.5 (c), Bowerman et al., 2005: 14).
- Similar arguments hold for cyclical data. In Bowerman et al. (2005: 14) Figure 1.5 (d) shows a
forecast error plot that did not account for cycles.

ACTIVITY 1.10
(a) Plot the forecast errors calculated in Activity 1.9.
(b) Do the data reveal any pattern that was not accounted for?

DISCUSSION OF ACTIVITY 1.10


(a) The plot is not difficult to draw. The forecast errors to be used were calculated in Activity 1.9. They are

Forecast errors (ei): −6, 4, 0, 1, −1, 3, 2

Figure: Plot of the forecast errors of Activity 1.9 (forecast errors, ranging between −6 and 4, on the vertical axis; days on the horizontal axis).
(b) The plot looks almost random. This means that the forecasting technique provides a good fit to
the data.

1.3.1 Absolute deviation


Forecast errors are used to calculate absolute deviations. The absolute deviation (Bowerman et al.,
2005: 15) requires the forecast errors in absolute terms, i.e., a matter of how far is the estimate from
the actual observation.
ACTIVITY 1.11
Calculate the absolute deviations for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.11
The calculation is fairly straightforward. We need the forecast errors, which were calculated as

Forecast errors (ei): −6, 4, 0, 1, −1, 3, 2

The absolute deviations are the absolute values of the forecast errors, which we can recall from our high-school days. The absolute deviations are thus

Absolute deviations (|ei|): 6, 4, 0, 1, 1, 3, 2

1.3.2 Mean absolute deviation


The absolute deviations give us the mean absolute deviation (MAD) when we obtain their average in
the usual way. The MAD (Bowerman et al., 2005: 15) requires the following steps: take the absolute
deviations, add them, divide the sum by their number and the result in the MAD.

ACTIVITY 1.12
Calculate the MAD for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.12


Absolute deviations (|ei|): 6, 4, 0, 1, 1, 3, 2

The MAD is therefore

\[ \mathrm{MAD} = \frac{\sum_{i=1}^{7} |e_i|}{n} = \frac{17}{7} = 2.42857. \]


1.3.3 Squared error


Another way to get rid of positive and negative errors is squared errors (Bowerman et al. (2005: 15)).

ACTIVITY 1.13
Calculate the squared errors for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.13


Forecast errors (ei): −6, 4, 0, 1, −1, 3, 2

The squared errors are therefore

Squared errors (ei²): 36, 16, 0, 1, 1, 9, 4

1.3.4 Mean squared error


The MSE is the average of the squared errors.

ACTIVITY 1.14
Calculate the MSE for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.14


To calculate the MSE we need the squared errors, which were calculated as

Squared errors (ei²): 36, 16, 0, 1, 1, 9, 4

The MSE is therefore

\[ \mathrm{MSE} = \frac{\sum_{i=1}^{7} e_i^2}{n} = \frac{67}{7} = 9.57143. \]

Now, let us pause a little. We have done a few useful calculations. We have also answered a few
questions about errors.

Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also
see what is meant by a poor estimate? Now can you say what is meant by a good estimate? You
will recall that the errors need to be as small as possible. So far it is not absolutely clear what small
entails.
The MAD and MSE are the measures that we will use to determine if the errors are small which will
indicate a good model. The objective is to select a good forecast model. The model that will be
selected must produce forecasts that are close to the actual observations. The MAD and the MSE
will serve as our tools to select a forecast model.
We need to understand the MAD and the MSE as they relate to the forecast model. The steps are
as follows:
MAD steps:
1. Calculate the forecast errors.
2. Determine the absolute deviations.
3. Add the absolute deviations.
4. Divide by their number.

MSE steps:
1. Calculate the forecast errors.
2. Determine the squared errors.
3. Add the squared errors.
4. Divide by their number.

MAD is not in any way mad. It is an objective route to good forecasting. The MSE serves the same
purpose.
Sometimes the effectiveness of a model is measured in percentages. Such measures are the
absolute percentage error (APE) and the mean absolute percentage error (MAPE) (Bowerman et
al., 2005: 18).

1.3.5 Absolute percentage error (APE)


APE is the absolute error divided by the corresponding actual observation multiplied by 100.

ACTIVITY 1.15
Calculate the APE for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.15


To calculate the APE we need the absolute deviations and the actual observations, which are

Day:                           1        2        3        4        5        6        7
Absolute deviations (|ei|):    6        4        0        1        1        3        2
Actual observations (yi):     21       15       20       27       13       25       11

The APE values (in percent) are therefore

APEi:                     28.5714  26.6667    0.00   3.7037   7.6923    12.00  18.1818


1.3.6 Mean absolute percentage error (MAPE)


MAPE is the mean of the APEs. It is defined as

\[ \mathrm{MAPE} = \frac{\sum_{i=1}^{n} \mathrm{APE}_i}{n}. \]

ACTIVITY 1.16
Calculate the MAPE corresponding to the estimates in Activity 1.11.
DISCUSSION OF ACTIVITY 1.16
To calculate the MAPE we need the APEs, which are

APEi: 28.5714, 26.6667, 0.00, 3.7037, 7.6923, 12.00, 18.1818

We obtain

\[ \sum_{i=1}^{7} \mathrm{APE}_i = 96.8159. \]

The MAPE is therefore

\[ \mathrm{MAPE} = \frac{96.8159}{7} = 13.8308. \]

The intention when measuring the error is to monitor and control it, so as to reduce it and increase the accuracy of these methods.

1.3.7 Forecasting accuracy


This section summarises the error measures presented above and presents them as the level of accuracy achieved. It is important to know that forecast accuracy starts with the forecast error. As you have seen, the forecast error is the difference between the actual value and the forecast value for the corresponding period:

\[ e_t = y_t - F_t \]

where e_t is the forecast error at period t, y_t is the actual value at period t, and F_t is the forecast for period t. A summary of the measures is given in the next table.

Measures of aggregate error:

\[ \mathrm{MAD} = \frac{\sum |e_t|}{n} \qquad \mathrm{MAPE} = \frac{100}{n} \sum \left| \frac{e_t}{y_t} \right| \qquad \mathrm{MSE} = \frac{\sum e_t^2}{n} \qquad \mathrm{RMSE} = \sqrt{\frac{\sum e_t^2}{n}} \]

Please note that business forecasters and practitioners sometimes use different terminology in industry. They may refer to the PMAD as the MAPE, although it is computed as a volume-weighted MAPE. Please stick to the textbook notation.
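
The four measures in the table can be computed directly. The following Python sketch applies them to the Week 3 milk data of Activity 1.9; the names and layout are ours, not the textbook's.

    import math

    actual   = [21, 15, 20, 27, 13, 25, 11]
    forecast = [27, 11, 20, 26, 14, 22, 9]

    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)

    mad  = sum(abs(e) for e in errors) / n
    mse  = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100 / n * sum(abs(e / y) for e, y in zip(errors, actual))

    print(f"MAD  = {mad:.5f}")   # 2.42857 (= 17/7)
    print(f"MSE  = {mse:.5f}")   # 9.57143 (= 67/7)
    print(f"RMSE = {rmse:.5f}")
    print(f"MAPE = {mape:.4f}")  # 13.8308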

1.4 Choosing a forecasting technique


We have to learn various forecasting techniques as well as to choose one of them during forecasting
for a number of obvious and good reasons. If you know a number of techniques and you do not
know how to decide on the appropriate one, you may end up using an inappropriate one to forecast.
Also, if you know only one method, you will use it in every forecasting exercise even where it is not
suitable. The measures defined in the previous section will be needed in selecting a model.
In this section we discuss important features in the selection process.

1.4.1 Factors to consider


Bowerman et al. (2005: 19-20) list factors that need to be considered when a forecasting method is
selected:
- Time frame
A forecasting method may take a long or short time to develop. The time frames, or the time
horizons, are short, medium, or long.
- Data patterns
The patterns we identified in the earlier discussion are trend, cycle and seasonality. If a forecasting
situation requires a pattern that the method does not take into account, then the method becomes
inappropriate.
- Forecasting cost
Costs could be the money or skills needed to develop a forecasting method. If the cost of developing forecasts is higher than the benefits, a cheaper method must be used or forecasts
should not be developed. Also, the more complex forecasting methods are more expensive to
develop while simple ones are usually less expensive.
- Desired accuracy
Obviously, it is ideal that forecasts be perfectly accurate. Some situations require the best possible
accuracy level because of their high sensitivity. As an example, life-threatening situations such
as HIV/AIDS, typhoid, cholera and others, due to risk of loss of life, require the best possible
forecasts with superior accuracy.
- Data availability
When there are no numeric data or no detail, we cannot develop quantitative forecasts. Some
situations though, may have limited data, or data of a form that is not required. The forecaster
will have to accommodate the data and choose an appropriate method that will suit the data even
though it is not ideal for the problem. We are warned that forecasting methods give inaccurate
forecasts if inaccurate, outdated or irrelevant data are used to develop the forecasts.
- Convenience
Convenience in this case means the ease of use by the forecaster as well as his understanding of
the method. If the forecaster lacks understanding of the methods he or she uses, then there will
not be much confidence assigned to the forecasts.

ACTIVITY 1.17
Suppose that you are to develop forecasts for the number of tourists using the services of a tourism
organisation in the country. You are given data of the number of tourists using these services for
the years 2002 to 2007, and they have been increasing annually. You also realise from the graphs
provided that in the months of January, March, June and December the tourists used this company
even more.
(a) As a time series specialist you are requested to develop forecasts and the marketing manager
insists on a specific method. How would you react?
(b) Is the pattern of the data clear? Explain.

DISCUSSION OF ACTIVITY 1.17


(a) One should not hesitate to differ from the marketing manager by refusing to use the method he or
she prescribed. When using the method, the user needs to be able to explain the rationale for it.
The marketing manager must give reasons for the choice, and these reasons must be consistent with the time series methodology. The method must be able to account for the high tourism
numbers in January, March, June and December. It must also be able to show the increasing
numbers.
(b) The patterns are clear. The four months with high tourist numbers indicate seasonality while the
increasing numbers indicate an increasing trend.

1.4.2 Strike the balance


The discussion in Bowerman et al. (2005: 20) highlights that the forecasting technique chosen
should balance the factors we discussed. The situation will dictate the weight to be given to the
factors.
ACTIVITY 1.18
Develop a forecast model to predict the milk sales of Jabulanis business (Data set 1.4).
(a) Explain the patterns that exist from the record presented.
Hint: Take note of the seasonality pattern.
(b) If we assume that the display in the past four weeks will recur, can we expect growth in this
business? Explain.

DISCUSSION OF ACTIVITY 1.18


As per the explanations given, the data period is not enough to warrant the existence of cycles. The
irregular component also, by definition, cannot be accounted for. Hence (a) requires examination of
trend and seasonality.
Plots of Data set 1.4

Figure (a) Examination of the trend: litres of milk demanded over the 28 days of the observation period, with a fitted trend line.


Here the data for the different weeks were combined so that the trend can be examined. There is an
increasing trend that is demonstrated by the trend line.
Can we determine the rate of increase? Here, the rate of increase is given by the equation of the
trend line. You must be able to show that the equation of the trend line is
y = 0.1571x + 16.365.
Figure (b) Examination of seasonal patterns: litres of milk demanded per day, plotted for Weeks 1 to 4 of the study.


The milk sales are clearly high on Day 4 and Day 6 for all the weeks and low on Day 2, Day 5 and
Day 7. Therefore, from the graph, the seasonal pattern is very evident in the data set.

1.5 An overview of quantitative forecasting techniques


Points on regression analysis
Regression analysis is an important topic that requires adequate attention in the discipline of
statistics. Time series also tends to use regression analysis in model building for some forecasting
problems. We discuss some necessary points here, but for full knowledge on regression analysis, it
is better to enrol for a module in Regression Analysis.
Regression analysis (Bowerman et al., 2005: 21) relates variables through the use of linear
equations. It is a statistical methodology that has a wide range of applications. The variable of
interest, denoted by y , is made the subject of the formula. It is always made the dependent variable
because it is a function of the variables on which it depends. It is also called the response variable
because when anything is done to other variables, the variable of interest behaves in a certain way.
The variables that are related to the response variable are the independent variables that are allowed
to vary within their feasible values. These are the predictor variables often denoted by x1 , x2 , ..., xk .
The objective of regression is to build a regression model which is a prediction equation that relates y
to x1 , x2 , ..., xk . This model is used to describe, predict and control y on the basis of the independent
variables.
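
As a minimal illustration of such a prediction equation, the following Python sketch fits ŷ = b0 + b1 x1 + b2 x2 by least squares; the data are invented purely to show the mechanics.

    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
    y  = np.array([5.1, 6.9, 10.2, 11.8, 15.1, 16.9])

    X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept, x1, x2
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b.round(3))        # fitted coefficients b0, b1, b2
    print((X @ b).round(2))  # fitted values y-hat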

Depending on the application needed to address a problem, a regression model can use quantitative independent variables (that assume numerical values) or qualitative independent variables (that assume non-numerical values). Make sure that you understand the exposition on p. 21 of Bowerman et al. (2005).

This module requires full manipulation of simple linear regression models and some applications of
multiple regression models. In addition to the regression models, the scope of the module covers
time series, decomposition methods and exponential smoothing.

1.6 Conclusion
We have acquired useful introductory knowledge for the module. We defined forecasting, explained
its necessity, and explained qualitative and quantitative forecasting methods. Time series data were
discussed, its components explained, errors in forecasting were defined, as well as measures to
detect them. Factors for choosing a forecasting technique were discussed, and use of regression
analysis in forecasting was discussed briefly. As with all exercises, the next exercises are intended to make you fit for the tasks ahead.

Self-evaluation exercises
Do exercises 1.1 up to 1.6 on page 25 of Bowerman et al. (2005).
If you encounter any problems with these exercises, do not hesitate to contact your lecturer. Just
indicate what is difficult for you.
You are welcome to discuss your solutions with the lecturer, and you are encouraged to do so by
sending these solutions directly to the lecturer(s) of the module.


STUDY UNIT 2: Model Building and Residual Analysis


2.1 Introduction
In order to do a good job, we need to be well equipped for it. In simple terms we need the knowledge
to do the job as well as the facilities or tools to use. To build a house, we need a good foundation and
we need to be able to construct a wall. The wall would normally require bricks, which are laid solidly
against one another. They are glued by cement. The cement is mixed with specific proportions of
water and sand. The mix requires specific skill to be effective. A mistake in one of these steps may lead to bad results, which may reveal themselves only some years after construction. Developing a
forecast also requires an amount of knowledge mix. Fortunately for us, when forecasting is done,
there are also some tests or measures to indicate that the forecasts can be trusted. Good forecasts
will represent the actual truth well, with no or minor deviations. On the other hand, bad forecasts
would mislead the forecaster completely.
We need to know the future so that we can plan for it. If you remember the milk sales, Wednesdays
were good days for business and there was almost always more stock of milk to cater for the
increased market. If his predictions were inaccurate, it could happen that there was less stock
when the demand was high.
This study unit focuses on model building and some important aspects of residual analysis. The
main purpose of the unit is to learn to build forecasting models, while residual analysis measures the
accuracy of the model.
Outcomes table for the study unit

Outcome - at the end of the unit you should be able to work with multicollinearity.
  Assessment: analyse the covariance matrix
  Content: correlations; the variance inflation factor
  Activities: calculate; test hypotheses
  Feedback: discuss each activity

Outcome - comparison of regression models.
  Assessment: use selected measures
  Content: R², adjusted R², s, the C-statistic
  Activities: perform calculations
  Feedback: explain calculations

Outcome - residual analysis.
  Assessment: use plots
  Content: residual plots; assumed forms
  Activities: calculations; graph plotting
  Feedback: link with the patterns

Outcome - diagnostics.
  Assessment: compare measures with limits
  Content: leverage points; residuals
  Activities: calculate measures; plot graphs
  Feedback: calculate and discuss measures

Where there are concepts that are necessary for us to learn a skill, we will look for the skills wherever
they are in the book. As an example, R2 appears in earlier chapters before Chapter 5. Many of these
concepts were dealt with in first-year Statistics. Fortunately they are all in the prescribed book.
This study unit deals with parts of chapter 5 of Bowerman et al. (2005: 221-278). We will highlight
some of the concepts:
Section 5.1
multicollinearity (pp. 222-226) with reference to the variance inflation factor on p. 224
R2 (pp. 226-227)
adjusted R2 (p. 228)
the standard error s (p. 227)
the C -statistic (p.230)
stepwise regression and backward elimination (pp. 231-235): read for interest's sake only, not for examination purposes

Section 5.2
residual plots (pp. 236-238)
the constant variance assumption (pp. 238-239)
the assumption of correct functional form (pp. 239-240)
the normality assumption (pp. 240-243)
the independence assumption (pp. 242-245)

Section 5.3
can be omitted

Section 5.4
the leverage values (pp. 255-256)
all kinds of residuals (pp. 257-258)
Cook's distance (pp. 258-259)
outlying and influential values (pp. 259-260)

We suggest that you work thoroughly through these pages.


Some explanations
Time series data in this study unit shall consist predominantly of numeric data collected over regular
intervals. Similar to building a house on a good solid foundation, with intact walls and roof, in
forecasting you also need an appropriate framework to use your data wisely and then develop useful
(and not misleading) forecasts.
The four basic steps for this are as follows:
Step 1: Specify a tentative model.
Step 2: Estimate any unknown parameters.
Step 3: Validate the model.
Step 4: Develop the required forecasts.
In forecasting using time series, model building is the foundation. The model is an equation with
unknown parameters. If the parameters are wrong, the model would not provide correct predictions.
In addition, when a statistical analysis of a time series has been completed, we will often find that
there exist relationships between the variables of interest. It is important to know what to do with
these relationships, otherwise we may build models that do not represent the actual pattern of the
activity. The next topic explains this aspect of relationships.

2.2 Multicollinearity
We learnt about the correlation coefficient in first-year Statistics. When more than two variables
are considered, the correlation coefficient is generalised to the correlation matrix. Bowerman et al.
(2005: 223) present an example of a correlation matrix. We also came across the coefficient
of determination when we studied regression. The correlation coefficient and the coefficient of
determination are useful in measuring multicollinearity.
We know from regression analysis that we may express a variable of interest (dependent variable) as
a function of other variables (independent variables). When two independent variables are related,
there is collinearity. If more than two independent variables are related, there is multicollinearity. An
extreme case of multicollinearity is singularity, in which an independent variable is perfectly predicted
by another independent variable (or more than one). Do you recall the value of the correlation
measure under perfect correlation? Justify your answer.

2.2.1 Clarification of multicollinearity


Study p. 223 carefully to understand the concept of multicollinearity and its numerical implication.

ACTIVITY 2.1
Provide an example of a real-life case where multicollinearity can exist.

DISCUSSION OF ACTIVITY 2.1


This seems to be a difficult question at first glance, but no doubt it is very interesting. Let us take an
easy example as follows.
Define:

y  = productivity of the workforce
x1 = approach used by management in motivating staff
x2 = training received by staff

Surely, y depends on x1 and x2. It is put to you that there are no grounds to believe that x1 and x2 can be correlated. Do you have any counter-reflection regarding this assertion? Think of other examples. Your examples need not be in the form of mathematical equations. They should just get you thinking.

2.2.2 The variance inflation factor (VIF)


We studied variances in the first year and this gave us an idea of variation. Another topic that we hear
about in economics is inflation. In the current discussion we are not going to discuss economics, just
in case you think it refers to that.
The variance inflation factor (VIF) is a measure we will use to determine the extent of multicollinearity
and is defined on p. 224 of the textbook.

ACTIVITY 2.2
Calculate the VIF for the Wednesday data.
Recall that in Data set 1.4 in Unit 1 we had the following data for Week 3:

Day:   1   2   3   4   5   6   7
yi:   21  15  20  27  13  25  11
ŷi:   27  11  20  26  14  22   9

Hint: Recall the multiple coefficient of determination (p. 156 of the prescribed book).

DISCUSSION OF ACTIVITY 2.2


We need to have more than one independent variable. In the above case there is only one. The VIF
cannot be defined here. Did the exercise make you think?


Further discussion about VIF


Let us pay more attention to the calculation.
Usually we calculate the regression line of y on x1, x2, ..., xj, ..., xk. Suppose we rather regress xj (as the dependent variable) on the remaining x-variables; then Rj² is the multiple coefficient of determination for the regression model that relates xj to the other independent variables. Then

\[ \mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \text{where } R_j^2 = \frac{\text{explained variation}}{\text{total variation}}. \]

When we look for possible relationships among the independent variables, independent variables
take turns to assume the role of a dependent variable regressed on the rest of the independent
variables. Then the coefficient of determination is calculated for each independent variable.
In the example under discussion (data tables 4.2 and 5.1 in the textbook), y was regressed on
x1 , x2 , ..., x8 . This means that the focus is on x1 , x2 , ..., x8 .

Let us inspect Table 5.2, page 224 of Bowerman et al. (2005). The eight variables of interest are
displayed on page 222, in the paragraph before Table 5.1. The correlation matrix and the SAS output
where the VIFs appear in the last column are given on pp. 223-224.
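
The VIF computation can be sketched in Python. The helper below regresses each independent variable on the others and applies VIF_j = 1/(1 − R_j²); the function name and the illustrative data are made up for this sketch, not taken from the textbook.

    import numpy as np

    def vif(X):
        # X: (n, k) array of independent-variable values.
        # For each column j, regress x_j on the remaining columns and
        # apply VIF_j = 1 / (1 - R_j^2).
        n, k = X.shape
        out = []
        for j in range(k):
            y = X[:, j]
            A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
            out.append(1 / (1 - r2))
        return out

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=30)
    x2 = x1 + rng.normal(scale=0.1, size=30)  # nearly collinear with x1
    x3 = rng.normal(size=30)
    print(vif(np.column_stack([x1, x2, x3])))  # large VIFs for x1 and x2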

ACTIVITY 2.3
Suppose that you are given the following data together with the corresponding estimates.
y:            39    41    33    45    29    42    21
y-estimates: 36.1  33.9  37.3  40.2  31.7  38.9  34.8

Calculate the coefficient of determination for the data.

DISCUSSION OF ACTIVITY 2.3


We use Excel to perform the calculations. If you have access to a statistical package, you are
welcome to use it.
These values are given:

yi:   39    41    33    45    29    42    21
ŷi:  36.1  33.9  37.3  40.2  31.7  38.9  34.8

The sample mean for the actual values is ȳ = 35.7143. The required squares are

i:            1        2        3        4        5        6        7       Sum
(yi − ȳ)²:  10.796   27.939   7.3673   86.224   45.082   39.510  216.51   433.4286
(ŷi − ȳ)²:  0.1488   3.2916   2.5145   20.122   16.114   10.149   0.8359   53.17571

so that the corresponding sums of squares are Σ(yi − ȳ)² = 433.4286 and Σ(ŷi − ȳ)² = 53.1757. Thus, the coefficient of determination is

\[ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{53.1757}{433.4286} = 0.122686. \]

This is how we would calculate the coefficient of determination. The value of R2 is needed for VIF. In
calculating VIF though, only the independent variables are used. We alternate each one of them to
be regressed on the others.

NB: Rj² = 0 implies that xj is not related to the other independent variables.
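
A short Python check of the R² calculation above, using the data of Activity 2.3 (the variable names are ours):

    y     = [39, 41, 33, 45, 29, 42, 21]
    y_hat = [36.1, 33.9, 37.3, 40.2, 31.7, 38.9, 34.8]

    y_bar = sum(y) / len(y)                           # 35.7143
    explained = sum((f - y_bar) ** 2 for f in y_hat)  # 53.1757
    total     = sum((v - y_bar) ** 2 for v in y)      # 433.4286
    print(explained / total)                          # 0.122686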

ACTIVITY 2.4
What is the value of VIFj when Rj² = 0?

DISCUSSION OF ACTIVITY 2.4


If we do our calculation right, it is easy to see that VIFj = 1. Therefore, we have discovered that if xj is regressed on x1, x2, ..., x(j−1), x(j+1), ..., xk, the value VIFj = 1 tells us that xj is not related to the other independent variables.


What does Rj² = 1 tell us?

This is a very rare occasion, but possible. It tells us that xj can be explained perfectly by regressing it on the other independent variables. When this value is attained, the VIF becomes arbitrarily large, and we write VIFj = ∞.

The last case is used to explain the extent of multicollinearity. If the coefficient of determination of

one independent variable on others is very large (i.e., close to 1), the corresponding VIF is very large.
These two situations lead us to the guidelines for interpreting multicollinearity. To decide about the
severity of multicollinearity, we focus on the maximum VIF and the average of the VIFs. The guide
from Bowerman et al. (2005: 224) is to consider multicollinearity as severe if one of the following is
true:
The largest VIF > 10.
The mean of the VIFs is substantially greater than 1.

This means that if one of the above conditions is met, we can conclude that there is severe
multicollinearity between the independent variable that was regressed on and the others. However,
it is not easy to say what substantially greater than 1 means. We have to make it definite for the
sake of this module.
We rephrase the rule to be:
Consider multicollinearity as severe if one of the following is true:
The largest VIF > 10.
The mean VIF > 5.

ACTIVITY 2.5
Consider the sales territory performance data (p.222). Determine if we can conclude that there is
severe multicollinearity among the independent variables.

DISCUSSION OF ACTIVITY 2.5


The VIFs, from Figure 5.2, p. 224, are

VIFs: 3.34262, 1.97762, 1.91021, 3.23576, 1.60173, 5.63932, 1.81835, 1.80856

We find that the maximum VIF = 5.63932. This value is not larger than 10, so we cannot decide until the second condition has been checked. Upon calculating the mean, we find that the mean VIF = 2.6667, which is much less than 5. We conclude that the independent variables are not severely multicollinear.
We have used the coefficient of determination to calculate the VIF in order to test for multicollinearity.
You may ask: Do we need this measure for other purposes? Yes, it does have other uses as well.


2.2.3 Comparing regression models


The model developed must reflect the patterns in the data. It is quite possible to develop more than
one model that will reflect the data patterns. In this case the analyst needs to make a decision on
the best model. The question is how to choose one of the models over the others.
The measures that we will use for this purpose are R², adjusted R², s (or prediction interval length), and the C-statistic (Bowerman et al., 2005: 226-228). These values measure how well the independent variables work together to describe, predict and control the dependent variable accurately. In this module we examine this by checking whether the overall model gives a high R² and adjusted R² (denoted by R̄²), and a small s (or a short prediction interval).


R²

This measure was dealt with to some extent earlier and is explored further in this section. When we add an independent variable to a regression model, it decreases the unexplained variation and increases the explained variation, thus increasing R². This is true even when the added variable is an unimportant one.

ACTIVITY 2.6
Make sure that you understand the behaviour of the measure R2 when an additional independent
variable is added to the regression model.

DISCUSSION OF ACTIVITY 2.6


Study the second paragraph on p. 227.

Adjusted R²

The adjusted R² is written as R̄². It is a measure used to avoid overestimating the importance of additional independent variables and is provided by most computer packages.
of the independent variable and is provided by most computer packages.

ACTIVITY 2.7
How does this measure behave when an additional independent variable is included in the regression
model?


DISCUSSION OF ACTIVITY 2.7


This measure depends on R², with a correction for the number of predictors. Since we saw in the previous activity that adding any independent variable increases the value of R², R̄² will often increase as well, although the correction means that a negligible increase in R² can leave R̄² unchanged or even lower it. Since these two measures do not provide adequate assistance on their own, let us try s, the standard error.


s

Consider the notation used earlier. The sum of squared forecast errors (SSE) is defined as

\[ \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2. \]

One criterion considered better than R² and adjusted R² for measuring the value of including an additional independent variable is

\[ s = \sqrt{\frac{\mathrm{SSE}}{n - k - 1}}. \]

The guideline is that if s increases when we add another independent variable, then that
independent variable should not be added. It is desirable to have a small s. A large s is equivalent
to a long confidence interval. If we were to use the predicted interval length, short confidence
intervals are then indicators of a desired model. We will only use s in this module, but note that in
practice you may be required to use confidence intervals. Note the equivalence.
The next measure for comparing regression models that will be discussed is the C-statistic.
The C-statistic

The C-statistic, also called the Cp-statistic, is another valuable measure for comparing regression models. Let sp² denote the mean square error based on a model using all p potential independent variables. If SSE denotes the unexplained variation for another particular model that has k independent variables, then the C-statistic for this model is

\[ C = \frac{\mathrm{SSE}}{s_p^2} - \left[ n - 2(k + 1) \right]. \]

ACTIVITY 2.8
Show that the C-statistic may be rewritten as

\[ C = \frac{\mathrm{SSE}}{s_p^2} + 2k + 2 - n. \]

DISCUSSION OF ACTIVITY 2.8


If you can remember the BODMAS rule, then the activity is reasonably straightforward. Can you
complete it by yourself ?
In the use of the C -statistic, we recall that we want SSE to be small. Thus, we want the C -statistic
to be small to trust in the model.
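
A small Python sketch of both measures; the numbers passed in are invented purely to show the mechanics, and the function name is ours.

    import math

    def s_and_c(sse, n, k, s2_p):
        # sse : unexplained variation of the candidate model with k predictors
        # s2_p: mean square error of the model using all p potential predictors
        s = math.sqrt(sse / (n - k - 1))
        c = sse / s2_p - (n - 2 * (k + 1))
        return s, c

    s, c = s_and_c(sse=350.0, n=25, k=3, s2_p=16.0)  # invented numbers
    print(f"s = {s:.4f}, C = {c:.4f}")  # look for small s and C close to k + 1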

ACTIVITY 2.9
It says in the description of SSE that we want SSE to be small. Explain why we want this measure
to be small.

DISCUSSION OF ACTIVITY 2.9


If one looks at the formula for the measure, it may be written as

\[ s^2 = \frac{\mathrm{SSE}}{n - k - 1} = \frac{\sum (y_i - \hat{y}_i)^2}{n - k - 1}. \]

In isolation we analyse

\[ \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2. \]

This is the sum of the squared differences between the actual values and the estimates. Ideally, if
the estimates are perfect predictions, they will replicate the actual values. Then the differences will
be zero. This will therefore result in SSE = 0, the smallest possible value of SSE. Therefore, if the
model used predicts the actual values satisfactorily, then the differences will be small and SSE will
be small.
Look at Example 5.1 (Bowerman et al., 2005: 228). The output from MINITAB and SAS that appears on page 229 resulted from calculating R², R̄², s and the Cp-statistic.

The MINITAB output gives the two best models of each size in terms of s, R² and the C-statistic. Thus, we find the two best one-variable models, the two best two-variable models, ..., the two best eight-variable models. Note that the adjusted R² increases considerably when a second variable is added. There is no problem with the inclusion of ACCTS because it is a good predictor of the dependent variable.


ACTIVITY 2.10
Use the output on p. 229 to answer the following.
(a) If a model with only two variables is to be used, which variables would you use?
(b) A model using five variables is the best. Do you agree? Justify your answer.

DISCUSSION OF ACTIVITY 2.10


(a) The model using ACCTS and ADVERT as predictors explains 77.5% of the variation (R² = 77.5), more than the model including MKTPOTEN and MKTSHARE.
(b) The models using five predictors have the smallest C-statistics (4.4), and C roughly equals k + 1 (the number of predictors plus one).

We now move on to residual analysis. If you have an interest in regression analysis you may study
stepwise regression on page 232 and backward elimination on page 235, but these two topics do not
form part of our syllabus.

Discussion

We know that most of the time series models we will develop in future as forecasters will not be 100% accurate. The error is e = y − ŷ, the deviation between the actual value and the estimate. In statistics we use interesting terms; we speak of a residual when we mean an error.

There are methods in statistics to deal with these deviations so that our predictions remain useful regardless of the presence of the errors. We refer to them as residual analysis.

2.3 Basic residual analysis


Bowerman et al. (2005: 236) explain the distribution of residuals used in simple regression. We
defined residuals earlier, and we have also used them in other calculations in this module. Do you
remember where?

ACTIVITY 2.11
Indicate whether the following measures involve residuals (Yes/No). You may explain your answer for each.

Chapter 1 measures: Forecast error; Absolute deviation; MAD; Squared error; MSE; APE; MAPE
Chapter 2 measures: Mean; Standard deviation; VIF; R²; Adjusted R²; Standard error; Mean square error; SSE; C-statistic
This is very interesting. There are links among these measures. Do you see the links? This activity
also ensures that we revise previous work. Can you see how much we have learnt so far?
If you answered "yes" it is an indication of the importance of residuals. The vehicle we will utilise in
this module to show this importance, is residual analysis.
Residual analysis assists us in the prediction task. It helps us to detect errors in the model we
develop, and gives us an indication of whether we are on the right track.
For this we use graphical plots of residuals. We call them residual plots.

2.3.1 Residual plots


Residuals are calculated for each observed y-value and then plotted against the values of the independent variable, the predicted values ŷi, or the time variable. Now study Table 5.2 and the plots on p. 238. These plots are used to test the assumptions of constant error variance, correct functional form, independence and normality.


2.3.2 Constant variation assumption


Let us start with an easy activity from a previous scenario.

ACTIVITY 2.12
From Unit 1, Data set 1.4, Week 3 was as follows:

Day:   1   2   3   4   5   6   7
yi:   21  15  20  27  13  25  11
ŷi:   27  11  20  26  14  22   9

Plot the residuals.

DISCUSSION OF ACTIVITY 2.12


We start by writing the data in the required form.

yi:        21  15  20  27  13  25  11
ŷi:        27  11  20  26  14  22   9
yi − ŷi:   −6   4   0   1  −1   3   2

That is, the residuals are −6, 4, 0, 1, −1, 3 and 2.

Figure: Residual plot of milk data (residual values on the vertical axis, days 1 to 7 on the horizontal axis).

Remember that we are using residual plots to test the assumption that e = y − ŷ has a normal distribution with mean 0 and variance σ². We use the above plot to test the constant variance assumption. If the residuals are randomly distributed around the zero mean, we can assume constant error variance. If, however, the residual plot "fans out" or "funnels in" (see Figure 5.7 in the textbook), we have an increasing or decreasing error variance, which implies that the assumption of constant error variance is violated.
Let us share something with you about the residual plot for the milk data.
If you visually place the residual plot in the box below and use lines to explain its shape, it cannot be
appropriately explained by a parallel band of the following form:

Also, it does not look like it can be appropriately explained by a fan shape of the form


Instead, it looks very much like it can be appropriately explained by a funnel shape of the form

Thus the residuals for the milk data violate the assumption of constant variance. You are urged to
view these shapes as presented in Figures 5.6 and 5.7 on p. 238 of Bowerman et al. (2005).
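
For practice, the residual plot for the milk data can be produced with matplotlib, as in the following sketch (assuming matplotlib is installed; the names are ours):

    import matplotlib.pyplot as plt

    days      = [1, 2, 3, 4, 5, 6, 7]
    residuals = [-6, 4, 0, 1, -1, 3, 2]

    plt.scatter(days, residuals)
    plt.axhline(0, linestyle="--")  # residuals should scatter randomly about 0
    plt.xlabel("Day")
    plt.ylabel("Residual")
    plt.title("Residual plot of milk data")
    plt.show()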

2.3.3 Correct functional form assumption


The model specified from the given data may be correct or incorrect. Using a residual plot, we can
determine whether this functional form is correct or not. If the functional form is incorrect, a correct
one can be found from the residual plot constructed from the derived model by displaying the pattern
of the appropriate model. For example, if we use a simple linear regression model when the true
relationship between y and x is curved, the residual plot would appear as a curve. Refer to Figure
3.4 in the textbook.

2.3.4 Normality assumption


We remember that a normal distribution exists when we can represent data by a bell shape with
symmetry around a central point. In the current setup, if the normality assumption holds, a histogram
and/or stem-and-leaf display of the residuals should assume a bell shape. Consider the residuals for
the milk data used in Activity 2.12.

Figure: Plot of the residuals for the milk data (residual values on the vertical axis, days on the horizontal axis).

The above plot shows no evidence of a bell shape. The normality assumption is violated.
We can also employ a normal plot of the residuals to determine normality. The procedure for the
normal plot is explained on p. 240 of the textbook.

ACTIVITY 2.13
Use a normal plot for the data of Activity 2.12 to determine whether the data come from a normal
distribution or not.

DISCUSSION OF ACTIVITY 2.13


In Activity 2.12, n = 7 and the residuals are as follows:

ei: −6, 4, 0, 1, −1, 3, 2

The ordered residuals are

e(i): −6, −1, 0, 1, 2, 3, 4

The values of (3i − 1)/(3n + 1) for i = 1, 2, ..., 7 are

Value: 0.090909, 0.227273, 0.363636, 0.500000, 0.636364, 0.772727, 0.909091

The corresponding points from the normal probability table are

z(i): −1.336, −0.747, −0.345, 0.000, 0.345, 0.747, 1.336

Therefore, we plot z(i) against e(i):

e(i):    −6      −1       0       1      2      3      4
z(i):  −1.336  −0.747  −0.345  0.000  0.345  0.747  1.336

Does your graph give you a straight line?


Comment to conclude the activity.

Two normal plots appear in Bowerman et al. (2005: 241). The discussion on p. 242 seems to
suggest that the straight line shape is not evident. What is your observation from the graph? That is,
do you agree with the authors?
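
A normal plot such as this can be produced with scipy's probplot, as sketched below for the milk residuals (assuming scipy and matplotlib are installed):

    import matplotlib.pyplot as plt
    from scipy import stats

    residuals = [-6, 4, 0, 1, -1, 3, 2]
    stats.probplot(residuals, dist="norm", plot=plt)  # ordered residuals vs normal scores
    plt.show()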

2.3.5 Independence assumption


In time series, an important concept that is often discussed together with independence is autocorrelation.
Autocorrelation defines a pattern of the errors. We say that error terms that occur over time have
positive autocorrelation if a positive error term in some time period tends to produce or be followed
by another positive error term in a future period. On the basis of the above, how would you define
negative autocorrelation?
Since these are time series, the resulting error terms are also time-based. In the case where the
time-dependent errors do not display a cyclic or alternating pattern, we say that the error terms are
statistically independent.

2.3.6 Remedy for violations of assumptions


Bowerman et al. (2005: 246) deal with this topic. A common approach to remedy the problem is
to transform the dependent variable. One may raise the dependent variable to suitable powers. For
example, a rapidly increasing error variance which occurs when the dependent variable increases,
may be remedied by a root (such as the square root) or logarithmic transformation.
There may be other reasons for transforming data. Transformation of data may also be done by
multiplying by appropriate factors to restrain it. What example can you think of where multiplication
by a factor is needed? We leave the rest for your own reading.
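
As an illustration of such transformations, the following sketch applies square-root and logarithmic transformations to an invented, rapidly growing series:

    import numpy as np

    y = np.array([4, 9, 20, 55, 160, 440, 1200], dtype=float)  # invented series

    y_sqrt = np.sqrt(y)  # mild variance-stabilising transformation
    y_log  = np.log(y)   # stronger transformation for rapidly increasing variance
    print(y_sqrt.round(2))
    print(y_log.round(2))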


2.4 Outliers and influential observations


Observations that lie far away from the bulk of your data are called outliers. Some outliers influence the measures derived from your data; these are called influential observations.

Influential observations have a serious effect on the analysis. To test for the effects caused by a suspected data point, we perform calculations and estimations, e.g. leverage values, studentised residuals and Cook's measure. Then we could remove the suspected data point and perform the same calculations to observe the change in the findings.

Outliers are not necessarily errors, as we may be led to believe. They are often very high or very low values that occur because of conditions that existed at the time they were observed. Some of them may indicate a fortune while others may be an indication of hardship. When high successes are experienced, analysts may examine the factors that contribute to the high levels of success. It is better to take note of the conditions that produced the outlier than simply to eliminate it!

Be warned also that sometimes low values and high values may occur due to seasonality, not because they are outliers. Out of the time series context they may be judged as bad or good, while under the time series scope they may be normal values with a useful implication.
ACTIVITY 2.14
Are there outliers in the following data set? Identify them.

x:   40   36   49   1207    23   38   27   44   45   30
y:   90   77   87     46   290   79   58   66   87   66

DISCUSSION OF ACTIVITY 2.14


In the x data set, most values lie in the region of the twenties to the forties. The outlier is therefore
x = __________. The y -values lie in the forties to the nineties so that the outlier is y = __________.

ACTIVITY 2.15
Calculate the means and the standard deviations of the data in Activity 2.14.


DISCUSSION OF ACTIVITY 2.15

     Mean     Standard deviation
x    153.9    370.11
y     94.6     70.09

ACTIVITY 2.16
Remove the values which you said were outliers in Activity 2.14. Calculate the means and standard
deviations. Were these data points influential?
DISCUSSION OF ACTIVITY 2.16
The new data sets are

x:   40   36   49   38   27   44   45   30
y:   90   77   87   79   58   66   87   66

If you did not get the correct answers in Activity 2.14, this is the time to update your answers to that question.

     Mean      Standard deviation
x    38.625     7.520
y    76.25     11.780

Are there substantial differences from these measures based on the original data? Well, this is
obvious. What do you conclude?

ACTIVITY 2.17
Explain if outliers and influential observations are the same.

DISCUSSION OF ACTIVITY 2.17


This is a question given to remove a possible misconception that if a value lies far away from the
others, it will also influence measures calculated from the data set. There are some statistical
measures that are easily influenced by outliers, such as the mean and the standard deviation. But
the median and the mode are not influenced that easily. Do you see why?

2.4.1 Leverage values


Leverage values are used to identify outliers with respect to the x-values. The leverage value for an observation xi is the distance value of that observation. In computer output these values are usually labelled "Hat Diag H" or H.1.

A leverage value is considered to be large if it is greater than twice the average of all the leverage values, which can be calculated as 2(k + 1)/n, where k is the number of predictors and n the sample size. A small sketch of this calculation follows.
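A minimal numpy sketch computing the leverage (hat) values for a simple linear regression, using the x-data of Activity 2.14; the 2(k + 1)/n rule flags large values.

import numpy as np

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
n, k = len(x), 1                       # k = 1 predictor

X = np.column_stack([np.ones(n), x])   # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
leverage = np.diag(H)                  # the "Hat Diag H" values

cutoff = 2 * (k + 1) / n               # = 0.4 here
print(np.where(leverage > cutoff)[0])  # index 3 (x = 1207) is flagged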

2.4.2 Residuals
In order to identify outliers with respect to their y-values, we can use residuals as before. The rule of thumb is that any residual that is substantially different from the others is suspect. This topic is presented in Bowerman et al. (2005: 257). Before going any deeper, we should experiment with our data and calculate the residuals.

ACTIVITY 2.18
This activity is included to give you a feeling for the calculations done when analysing residuals. It
is unrealistic data, just to prove the point. In real life this analysis will be done by a computer. Make
sure that you understand the computer output given for the exercises, p. 262-277.
The following data are given:

x:   40   36   49   1207    23   38   27   44   45   30
y:   90   77   87     46   290   79   58   66   87   66

(a) Find the regression equation y = a + bx using the method of least squares.
(b) Calculate the residuals.
(c) Identify residuals that are suspect.

DISCUSSION OF ACTIVITY 2.18


(a) The method of least squares provides the values of a and b as follows:

    b = [n Σxy - (Σx)(Σy)] / [n Σx² - (Σx)²]
      = [10(86194) - (1539)(946)] / [10(1469709) - (1539)²]
      = -593954 / 12328569
      = -0.048

and

    a = [Σy - b Σx] / n
      = [946 - (-0.048)(1539)] / 10
      = 102.

The equation is therefore

    ŷ = 102 - 0.048x.

(b) To calculate the residuals, we estimate the y-values using the equation above. The observed values, the fitted values and the residuals e = y - ŷ are:

x:     40       36        49        1207     23       38        27        44        45       30
y:     90       77        87          46    290       79        58        66        87       66
ŷ:    100.08   100.272    99.648     44.064 100.896  100.176   100.704    99.888    99.84   100.56
e:    -10.08   -23.272   -12.648      1.936 189.104  -21.176   -42.704   -33.888   -12.84  -34.56

(c) The residuals that are suspect are the fourth and the fifth ones, namely e = 1.936 and e = 189.104. The value 1.936 is extremely small in magnitude compared with the other residuals, while 189.104 is extremely large.

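A minimal numpy sketch of this calculation, reproducing a of roughly 102, b of roughly -0.048 and the residuals above.

import numpy as np

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
y = np.array([90, 77, 87, 46, 290, 79, 58, 66, 87, 66], dtype=float)
n = len(x)

b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n
print(a, b)            # roughly 102 and -0.048

y_hat = a + b * x      # fitted values
e = y - y_hat          # residuals; e[3] and e[4] stand out
print(e)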

2.4.3 Studentised residuals


We also note that we "should" suspect that the fifth y-value is an outlier. It is only sensible that we should find a way to confirm it. To identify outliers with respect to y, we can use residuals. A studentised residual is the observation's residual divided by its standard error. If its absolute value is greater than 2, we can suspect the observation to be an outlier.
In the computer output on p. 256 the studentised residuals are given under the column SRES1. See if you can identify the outlier with respect to y.

ACTIVITY 2.19
Use the data of Activity 2.14 to calculate the studentised residuals.

DISCUSSION OF ACTIVITY 2.19


First, we need SSE:

    SSE = Σ(yi - ŷi)² = 41346.94554.

Then

    s = √[SSE/(n - 2)] = √(41346.94554/8) = 71.8914.

Now we want the distance values Di so that we can evaluate si = s√(1 + Di), where

    Di = 1/n + (xi - x̄)²/SSxx

with

    x̄ = Σxi/n = 1539/10 = 153.9

and

    SSxx = Σ(xi - x̄)² = 1232856.9.

The distance values are:

i:    1           2          3          4          5          6          7          8          9          10
Di:   0.1105229   0.111275   0.108926   0.999553   0.113898   0.110896   0.113062   0.109797   0.109619   0.112452

The values si = s√(1 + Di) are:

i:    1           2          3          4          5          6          7          8          9          10
si:   200.6332    200.7011   200.4889   269.2188   200.9379   200.6669   200.8624   200.5676   200.5516   200.8074

The studentised residuals are then derived from

    ei(stud) = ei/si.

They are:

i:           1           2          3          4          5          6          7          8          9          10
ei(stud):   -0.2046521  -0.147832  -0.17183    1.748726   1.233615  -0.153787  -0.071044  -0.077081  -0.179804  -0.104926

Since no studentised residual exceeds 2 in absolute value, the studentised residuals do not suggest any outliers with respect to y.

How can an obvious outlier not be flagged by our measure? Let us find an additional guideline. Studentised deleted residuals may also be used. Thereafter we will also consult Cook's distance (Bowerman, 2005: 257-258).

2.4.3.1 Deleted residuals


Let us try the deleted residual first. The deleted residual for observation i is the difference between yi and the point estimate ŷ(i) computed using least squares estimation based on all n observations except observation i. This is done because if yi is an outlier with respect to its y-value, using this observation to compute the usual least squares point estimates might draw the usual point prediction ŷi towards yi and thus cause the resulting usual residual to be small. This would falsely imply that observation i is not an outlier with respect to its y-value. Studentised deleted residuals are computed by most software packages and are denoted by RStudent (SAS) and TRES1 (Minitab) on p. 256.

ACTIVITY 2.20
Inspect the output on p. 256 of the textbook.

2.4.4 Cook's distance


Cook's distance (CD) is a useful statistic that is sensitive to both outliers and leverage points. Because of this, it makes an effective measure for detecting them. There are other measures, but this one is considered the single most representative measure of influence on overall fit (Hair et al., 1998: 225). Cook's distance (CDi) measures the change in the regression coefficients that would occur if the ith observation was omitted. It is defined as

    CDi = [(yi - ŷi)² / ((k + 1) s²)] × [Di / (1 - Di)²]

where s² = SSE/(n - k - 1).

Cook's distance can be compared to F critical values to see if it is significant. To guide us further we shall use the following rule of thumb:

    A value of CDi > 1.0 would generally be considered large.

A sketch of these computations in software follows.
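As a rough sketch, all the influence measures of this section can be read off in Python with the statsmodels package, applied here to the data of Activity 2.18. Note that statsmodels uses the standard definitions of these diagnostics, which may differ in small details from the distance-value formulas quoted above.

import numpy as np
import statsmodels.api as sm

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
y = np.array([90, 77, 87, 46, 290, 79, 58, 66, 87, 66], dtype=float)

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

print(infl.hat_matrix_diag)              # leverage values
print(infl.resid_studentized_internal)   # studentised residuals
print(infl.resid_studentized_external)   # studentised deleted residuals
cooks_d, _ = infl.cooks_distance         # Cook's distance per observation
print(np.where(cooks_d > 1.0)[0])        # rule of thumb: CD > 1 is large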

ACTIVITY 2.21
Study the output on p. 256. Which observation has a significant Cook's D? Which critical values did you use to make your decision?

2.4.5 Dealing with outliers and influential observations


When we analyse data, we do not want a great influence from only a few elements, since it is
important to get information about the majority. Therefore, these influential observations should first
be dealt with (Bowerman et al., 2005: 259).
In practical situations outliers could have important implications. The patterns of time series, such
as seasonality, could be the result of outlying elements in the data. To identify outliers we inspect
leverage points and residuals using the techniques studied above.

2.5 Conclusion
This study unit explained model building, and checking a model for usefulness by checking how far it deviates from the real observations. Some useful statistics were introduced and experiments took place so that you could appreciate them. These statistics are important and you should know how to use them. You are not required to memorise them, and you are also not expected to derive them. However, you need to be able to interpret computer output on this topic.

EXERCISES
Consider the values of the pair (X, Y) given below:

i:   1    2    3     4    5    6
X:   2   15   11   100   25    9
Y:  18  129   90   805  210   88

Calculate
(a) SSxx
(b) SSxy, where SSxy = Σxiyi - (Σxi)(Σyi)/n
(c) the distance values Di
(d) 2(k + 1)/n where k = 1. Why is k = 1?
(e) Which Di's are larger than the value of 2(k + 1)/n?
(f) Can you conclude that there are outliers in the data? Explain.

Open questions
(a) Why do we, as forecasters, have to study residuals, outliers, influential observations and the
underlying measures?
(b) What is the role of residuals and of deleted residuals? Clarify your answer. Do residuals also
explain deleted residuals?
(c) Why do we need to identify influential observations?

Textbook exercises
Exercise 5.4
Exercise 5.5
Exercise 5.7
Exercise 5.16


UNIT 3: Time series regression


Outcomes table for the study unit

Outcomes - At the end of the module you should be able to:

- use polynomial functions in modeling trend
  Assessment: data plots, parameter estimation and measures
  Content: model trend using polynomial functions
  Activities: plot graphs, experiment with data and interpret data
  Feedback: discuss the activity

- detect autocorrelation
  Assessment: Durbin-Watson test, graphs
  Content: autocorrelation detection, DW statistic
  Activities: perform exercises with DW
  Feedback: discuss the activities

- use dummy variables to model seasonality
  Assessment: regression of seasonality using dummy variables
  Content: modeling with dummy variables; find lengths of seasonality, develop forecasts
  Feedback: discuss the activities

3.1 Introduction
This unit is based on Chapter 6 of Bowerman et al. (2005), which is Time Series Regression. It does not require full fluency in regression; your basic knowledge of polynomials will suffice. We discussed regression models roughly in the past study units. There we stated that the variable of interest (y), which is the dependent variable, is regressed on the variables (factors) on which it depends. These factors vary freely, and the manner in which they vary shapes the manner in which the dependent variable behaves. Since these factors vary randomly, they are random variables. In the past two units we plotted and interpreted some graphs. Did you find them useful? Quadratic equations were also dealt with at school. Do you remember the parabola? This is the graph of a quadratic equation. You are welcome to refer to school textbooks for these graphs.
These topics, together with the ones we learnt in study units 1 and 2 such as the components of time
series, will be integrated in this study unit. Do you still remember the components of time series?
Attempt to name them.
We defined trend, seasonality and cyclic patterns in the earlier study units. We will treat trend as
it may occur in a linear pattern, a quadratic pattern and where there is no trend. The linear and
quadratic patterns will include decreasing and increasing trends.
One of the elements we dealt with in the previous study units is independence. Residuals are useful
in detecting if the data are independent or not. Time series data are observations of the same
phenomenon recorded over consecutive time periods. Hence, they cannot be fully independent. The


usual relationship in time series data is autocorrelation. When adjacent residuals are correlated with each other, we say they are autocorrelated.
Autocorrelation can be negative or positive. Positive autocorrelation exists when, over time, a positive error term tends to be followed by another positive error term and a negative error term tends to be followed by another negative error term. On the other hand, negative autocorrelation exists when, over time, a positive error term tends to be followed by a negative error term and a negative error term tends to be followed by a positive error term. We will explore this idea further. Residual plots and the Durbin-Watson statistic will be involved.
Do you remember that some data do not have a seasonal pattern? Analysing data will reveal the
presence or absence of seasonality and when present, we should be able to determine the pattern.
We will show how dummy variables and trigonometric functions may be used to deal with seasonality.
Growth curve models will also be studied. The unit will also show how to deal with autocorrelated
errors using first-order autocorrelated process.

3.2 Modeling trend by using polynomial functions


This section is based on Section 6.1, page 280 in Bowerman et al. (2005). A time series need not have all the components, and each component may be analysed separately. We start by assuming that the time series being dealt with is described fully by a trend model. Bowerman et al. (2005: 280) define the trend model as:

    yt = TRt + εt

where
    yt = the value of the time series in period t
    TRt = the trend in time period t
    εt = the error term in time period t

The time series yt can be represented by an average level μt, which changes over time according to the equation μt = TRt, together with the error term εt. Recalling that random fluctuations often occur in a process, the error term represents random fluctuations that cause the yt values to deviate from the average level μt. The three trends that we are going to study in this module are no trend, linear trend, and quadratic trend.
ACTIVITY 3.1
What do you think no trend means?

DISCUSSION OF ACTIVITY 3.1


We said that trend describes long term growth or decline. Thus, "no trend" means there is no growth
or decline. Hence we anticipate a constant process.


3.2.1 No trend
See point number 1 in the second rectangular box on page 280 of the textbook. In qualitative terms one may describe the condition as stable. This is a case of no deterioration and no improvement, therefore a case of no trend. In this case there is a generally constant pattern displayed, with no long-run growth or decline over time. The trend takes some constant value β0 and is modeled as TRt = β0. For a depiction of the shape of a process that shows no trend, see Figure 6.1 (a) on page 281 of the textbook. Generally the case of no trend is undesirable, but it may happen. Who would not want to see change?
Note that the case of no trend does not necessarily mean absolutely no change. If the changes are shown by fluctuations (the ups and downs) in such a way that the average seems constant in the long run, then we have no trend.

3.2.2 Linear trend


This topic is point number 2 in the second rectangular box (p. 280) of the textbook. Here the trend is linear, and it is regressed not on x but on time t. At school a linear function was written as y = a + bx, with constants a and b to be estimated. We now denote these constant parameters by β0 and β1, respectively, so the equation yt = TRt + εt becomes:

    yt = TRt + εt = β0 + β1t + εt

The values β0 and β1 in the above equation provide us with the shape of the line graph. Try to recall the values that lead to various shapes.
ACTIVITY 3.2
Discuss the implications of the parameters β0 and β1 on the shape of the linear graph.

DISCUSSION OF ACTIVITY 3.2


The intercept on the vertical axis can provide some information about the history of a time series. In the case of the slope, the sign explains the trend. A negative slope (β1 < 0) shows a decline. Can you explain the cases β1 > 0 and β1 = 0? There is an obvious relationship between the case β1 = 0 and the case of no trend we discussed. Do you see it?
Study Figure 6.1 (c) of Bowerman: as the values on the horizontal axis increase, the values on the vertical axis decrease.


In Figure 6.1 (b) of Bowerman an increase in the values on the horizontal axis is accompanied by an
increase in the values of the vertical axis.

3.2.3 Quadratic and higher order polynomial trend


Quadratic trend is point number 3 in the second rectangular box on page 280 of the textbook. We recall that the quadratic equation has the form y = a + bx + cx². The equation for the trend is TRt = β0 + β1t + β2t². The highest exponent determines the overall shape, which is an indicator of the behaviour of the dependent variable. The knowledge we acquired in the school years also becomes handy here!
The quadratic trend may show either an increase or a decrease in the dependent variable. It is now time to separate these two so that more details of each can be revealed.
Trend showing growth
The graphs showing growth are given in Figures 6.1 (d) and (e) on page 281 of the textbook. Growth may occur at an increasing rate, which is shown in Figure 6.1 (d). It may also occur at a decreasing rate, which is shown in Figure 6.1 (e).
Trend showing decline
The graphs showing decline are given in Figures 6.1 (f) and (g) on page 281 of the textbook. Decline may occur at an increasing rate, which is shown in Figure 6.1 (f). It may also occur at a decreasing rate, which is shown in Figure 6.1 (g).
A more general model is the pth-order polynomial trend given by:

    yt = TRt + εt = β0 + β1t + β2t² + ... + βptp + εt

A sketch of fitting such trend models follows.
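A minimal numpy sketch fitting no-trend, linear and quadratic trend models to a series y observed at t = 1, ..., n; the data here are synthetic and illustrative only.

import numpy as np

t = np.arange(1, 25)
y = 200 + 8 * t + np.random.normal(0, 20, size=t.size)  # synthetic series

b0_none = y.mean()            # no trend: TR_t = beta_0
b_lin = np.polyfit(t, y, 1)   # linear: [beta_1, beta_0]
b_quad = np.polyfit(t, y, 2)  # quadratic: [beta_2, beta_1, beta_0]

print(b0_none, b_lin, b_quad)
trend_at_25 = np.polyval(b_lin, 25)  # linear trend forecast for period 25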

ACTIVITY 3.3
Write down the equation for the 3rd -order polynomial trend model.

DISCUSSION OF ACTIVITY 3.3


See the paragraph discussing this on page 281 of Bowerman. The model is:

    yt = β0 + β1t + β2t² + β3t³ + εt

Regression methods may be used to calculate the least squares point estimates of the parameters in these trend models. The assumption in the module is that the error term εt satisfies the constant variance, independence and normality assumptions.

ACTIVITY 3.4
How would you identify the violations of the assumptions?

DISCUSSION OF ACTIVITY 3.4
We know that the behaviour of the residuals indicates what we missed in the estimation. A horizontal band in the residual plot confirms the constant variance assumption. Fanning out indicates increasing variance and funneling in shows decreasing variance. The normality assumption can be checked using normal plots. Apart from these, histograms and stem-and-leaf diagrams can reveal the normality pattern as well. We leave the discussion of the independence assumption for Section 6.2.

DISCUSSION OF EXAMPLE 6.1


Let us discuss Example 6.1 on page 282 of Bowerman et al. (2005). When there is mention of minimum and maximum, we must remember from first year that it is confidence intervals that are being implied. Note that the two samples have been combined so that we have observations from month 1 to month 24. Now, do you recall interval estimation? In forecasting we speak of forecasts when point estimates of future values are of interest, and of prediction interval forecasts when confidence intervals for the predicted future values are of interest. The example claims that the plot of the data reveals random fluctuation around a constant average level. Let us look at Figure 6.2 at the bottom of page 282. Do you see the random fluctuation around some constant?
Study the plot carefully. You need to know how to read off the values from a graph. When looking at this plot, what would you say are the minimum and the maximum values? To me it looks like the minimum is 276. Do you see it the same way? The maximum is slightly difficult to read, and 425 seems to be too high. Let me settle for 405. Does it make sense to you? What would you choose? No problem, these are in Table 6.1, page 282.
Now, since we assumed random fluctuation around a constant level, it makes sense to believe that a no-trend model would best describe the problem at hand. Hence the example came to the conclusion that the regression model to be used in forecasting the cod catch in future months is:

    yt = TRt + εt = β0 + εt

The parameter β0 is a constant. What type of trend is this?


The point forecast of any future value is ŷ = ȳ, and the 100(1 - α)% prediction interval is given by:

    [ ȳ - t_{α/2}^{[n-1]} s √(1 + 1/n) ,  ȳ + t_{α/2}^{[n-1]} s √(1 + 1/n) ].

Do you remember the formula? Here t_{α/2}^{[n-1]} is the table value read from a t-table corresponding to n - 1 degrees of freedom; it is common to write this table value as t_{α/2, n-1} in other books at your disposal. Here, n is the sample size. What is the value of n? Also,

    s = √[ Σ (yt - ȳ)² / (n - 1) ].

We want the 95% prediction interval. What is the value of α? Immediately we write n = 24 and read off t_{0.025}^{[23]} = 2.069 from the t-tables. The other calculations are:


Month t:       1     2     3     4     5     6     7     8     9    10    11    12
Cod catch:   362   381   317   297   399   402   375   349   386   328   389   343
(yt - ȳ)²:   115   883  1176  2948  2276  2571   562  5.25  1205   543  1422  68.8

Month t:      13    14    15    16    17    18    19    20    21    22    23    24
Cod catch:   276   334   394   334   384   314   344   337   345   362   314   365
(yt - ȳ)²:  5669   299  1824   299  1070  1391  53.2   204  39.6   115  1391   188

    ȳ = 351.2917;    Σ(yt - ȳ)² = 26314.96;

    s = √[ Σ(yt - ȳ)² / (n - 1) ] = √(26314.96/23) = 33.82497

We are left with the final calculation of the prediction interval. The 95% prediction interval is:

    [ ȳ - t_{0.025}^{[23]} s √(1 + 1/n) ,  ȳ + t_{0.025}^{[23]} s √(1 + 1/n) ]
    = [ 351.2917 - 2.069(33.82497)√(1 + 1/24) ,  351.2917 + 2.069(33.82497)√(1 + 1/24) ]
    = (279.8647;  422.7187)

A sketch of this computation follows.
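A minimal Python sketch of the prediction interval above; it reproduces roughly (279.86, 422.72).

import numpy as np
from scipy.stats import t as t_dist

y = np.array([362, 381, 317, 297, 399, 402, 375, 349, 386, 328, 389, 343,
              276, 334, 394, 334, 384, 314, 344, 337, 345, 362, 314, 365],
             dtype=float)
n = y.size

y_bar = y.mean()                          # point forecast, about 351.29
s = y.std(ddof=1)                         # about 33.82
t_val = t_dist.ppf(1 - 0.05 / 2, n - 1)   # t_{0.025}^{[23]} = 2.069
half_width = t_val * s * np.sqrt(1 + 1 / n)
print(y_bar - half_width, y_bar + half_width)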

DISCUSSION OF EXAMPLE 6.2


These are the sales of the Bismark X-12 calculator. The data in Table 6.2 are plotted in Figure 6.4, page 284 of Bowerman. The figure gives an indication of the trend. Can you explain the trend type of these data? Close inspection reveals a linear trend! Therefore, we shall employ the regression equation of the form:

    yt = TRt + εt = β0 + β1t + εt

In general, since t has taken the role of x, we can show that the least squares point estimates of β1 and β0, respectively, are:

    b1 = [n Σt yt - Σt Σyt] / [n Σt² - (Σt)²]  and  b0 = [Σyt - b1 Σt] / n.

Can you show these? Do you notice the equivalence with the equations on p. 285 of Bowerman et al. (2005)? Performing these calculations in detail, we have:
Month t:      1     2     3     4     5     6     7     8     9    10    11    12
t²:           1     4     9    16    25    36    49    64    81   100   121   144
Sales yt:   197   211   203   247   239   269   308   262   258   256   261   288
t yt:       197   422   609   988  1195  1614  2156  2096  2322  2560  2871  3456

Month t:     13    14    15    16    17    18    19    20    21    22    23    24
t²:         169   196   225   256   289   324   361   400   441   484   529   576
Sales yt:   296   276   305   308   356   393   363   386   443   308   358   384
t yt:      3848  3864  4575  4928  6052  7074  6897  7720  9303  6776  8234  9216

The required sums are:

    Σt = 300;    Σt² = 4900;    Σyt = 7175;    Σt yt = 98973

Then the calculations are:

    b1 = [n Σt yt - Σt Σyt] / [n Σt² - (Σt)²]
       = [24(98973) - (300)(7175)] / [24(4900) - 300²]
       = 8.0743

and

    b0 = [Σyt - b1 Σt] / n = [7175 - (8.0743)(300)] / 24 = 198.0296.

The fitted regression equation is:

    ŷt = 198.0296 + 8.0743t

We can forecast the sales for any future month in year 3, year 4 and so on. For example, the forecast for January of year 3 corresponds to month t = 25 of the entire setup. Hence:

    ŷ25 = 198.0296 + 8.0743(25) = 399.8871

Suppose that we want a forecast for May of the seventh year. This is the 77th month (t = 77) in the current model. Hence, the forecast is:

    ŷ77 = 198.0296 + 8.0743(77) = 819.7507

The point forecasts for January (i.e. ŷ25) and February (i.e. ŷ26) of year 3 have been calculated on page 285 of Bowerman et al. (2005). Are you happy with the manner in which they are presented? For linear trend, quadratic trend and polynomials of higher order, point estimation is adequate. A quick numerical check follows.
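A short numpy check of the Example 6.2 computations: the fitted coefficients and the forecasts for t = 25 and t = 77.

import numpy as np

y = np.array([197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
              296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384],
             dtype=float)
t = np.arange(1, 25)

b1, b0 = np.polyfit(t, y, 1)        # about 8.0743 and 198.0296
print(b0 + b1 * 25, b0 + b1 * 77)   # about 399.89 and 819.75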


DISCUSSION OF EXAMPLE 6.3


The values and the appropriate calculations are:

Month t   Loan requests y     t²    (t - t̄)²      t y     (y - ȳ)²
 1          297                1     132.25        297      50606.1
 2          249                4     110.25        498      31314.1
 3          340                9      90.25       1020      71801.49
 4          406               16      72.25       1624     111527.9
 5          464               25      56.25       2320     153631.1
 6          481               36      42.25       2886     167246.6
 7          549               49      30.25       3843     227488.9
 8          553               64      20.25       4424     231320.6
 9          556               81      12.25       5004     234215.3
10          642              100       6.25       6420     324852.1
11          670              121       2.25       7370     357553.8
12          712              144       0.25       8544     409546.2
13          808              169       0.25      10504     541634.2
14          809              196       2.25      11326     543107.1
15          867              225       6.25      13005     631958.2
16          855              256      12.25      13680     613023.2
17          965              289      20.25      16405     797374
18          921              324      30.25      16578     720729.7
19          956              361      42.25      18164     781381.7
20          990              400      56.25      19800     842646.9
21         1019              441      72.25      21399     896729.5
22         1021              484      90.25      22462     900521.3
23         1033              529     110.25      23759     923440.3
24         1127              576     132.25      27048    1112936

    Σt = 300;   Σy = 17290;   Σt² = 4900;   Σty = 258380;   Σ(y - ȳ)² = 11676587

The loan requests can reasonably be predicted using the quadratic trend model fitted on page 287 of Bowerman et al. (2005):

    ŷt = 199.62 + 50.937t - 0.5677t²

ACTIVITY 3.5
Determine the forecast of the loan requests for April of year 7.

DISCUSSION OF ACTIVITY 3.5


We note that December of year 6 is t = 72. Can you see why? It then becomes easy to see that April of year 7 is t = 76. Thus, we are required to determine the value of ŷ76:

    ŷ76 = 199.62 + 50.937(76) - 0.5677(76)² = 791.80

The forecasts for January and February of year 3 have also been calculated in Bowerman. Are you comfortable with the respective values given for the subscripts 25 and 26, i.e. ŷ25 and ŷ26?

3.3 Detecting autocorrelation


Study Section 6.2, p. 288 of the textbook. Correlation analysis measures the strength of association between variables. The prefix auto- indicates that the strength of the association is examined for the same variable: correlation is studied on the same variable with its observations collected over time. We recall that correlation can be negative or positive. In fact it lies between -1 and 1: -1 ≤ r ≤ 1.

First, being observations from the same variable, it is common for the time-ordered error terms to be autocorrelated. When this happens, it violates the regression assumption that error terms need to be independent. Interestingly, there is an easy way to determine whether the error terms are autocorrelated and to determine the direction (negative or positive) of the autocorrelation. The pattern of autocorrelation has been discussed earlier.

3.3.1 Residual plot inspection


You may encounter residual plot inspection at various sections as you study. It is therefore an
important topic.
ACTIVITY 3.6
Consider the following residual plots of time series data. State in each case whether the error terms are negatively autocorrelated, positively autocorrelated, or not autocorrelated. The space in the verdict below each graph allows you to fill in the answer.
[Figure: Residual plot (a) - residuals plotted against time]

Verdict: The residual plot above shows a ____________ autocorrelation.

[Figure: Residual plot (b) - residuals plotted against time]

Verdict: The residual plot above shows a ____________ autocorrelation.

[Figure: Residual plot (c) - residuals plotted against time]

Verdict: The residual plot above shows a ____________ autocorrelation.

DISCUSSION OF ACTIVITY 3.6


You have made the verdicts by deciding the appropriate pattern for each graph given. Are you happy with your answers? Residual plot (a) is fully characterised by the positive autocorrelation clarification phrase. Residual plot (b) cannot be related to either of the two phrases; hence it is an example of a case where there is no autocorrelation. Lastly, residual plot (c) is fully characterised by the negative autocorrelation clarification phrase. To convince ourselves even more, we read off the values from these graphs. The three residual data sets used are:

residuals (a):   2   7   4  -3  -9  -14  -4   1   5   3   1  -2  -5  -3
residuals (b):  -2  -7   4  -3   9    0  -4  -1   5  -3   1  -1   5   3
residuals (c):   2  -7   4  -3   9  -14   4  -1   5  -3   1  -2   5  -3

These residuals confirm the verdicts in the above discussion and also conform to the expressions given in the clarification phrases. Later we will give a formula to calculate, so that you do not rely only on eye inspection. But before we get there, try the use of runs on page 290 of Bowerman et al. (2005).
A run is simply a set of values with the same sign following each other. If the signs of successive residuals appear as runs, then we have positive autocorrelation. If the signs alternate, we have negative autocorrelation. Where neither of these patterns appears, there is a random pattern. This is the case where the assumption of independent errors is confirmed. The two cases of autocorrelation are undesirable since they violate the assumption.

3.3.2 First-order autocorrelation


There is an informal discussion of this topic on p. 290 of the textbook. Study it. AR stands for autoregressive. It is interesting that residuals may be related to their immediate predecessors. In this case, the error term in period t (εt) is related to the error term in period t - 1, namely εt-1. This is called first-order autocorrelation. We write AR(1) for this case and represent it by the equation

    εt = φ1εt-1 + at.

Here we assume that:
φ1 is the correlation coefficient between error terms separated by one time period; and
a1, a2, ... are values randomly and independently selected from a normal distribution having mean zero and a variance independent of time.

We promised to show how to determine negative or positive autocorrelation. The Durbin-Watson test will assist in achieving this. This test can be one-sided (one-tailed) or two-sided (two-tailed). It is important to note the meaning given by each version of a one-sided test. The Durbin-Watson (DW) statistic is used for all three versions. The DW statistic should not be used if the residuals are fewer than 15 in number or more than 100. We also need the power (k) of the polynomial from which the residuals were derived. Let e1, e2, ..., en be the time-ordered residuals. The DW statistic is:

    d = Σ (et - et-1)² / Σ et²

where the sum in the numerator runs over t = 2, ..., n and the sum in the denominator over t = 1, ..., n.
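A small Python sketch of the DW statistic; applied to the residuals of Activity 3.7 below it gives d = 340/462, about 0.736.

import numpy as np

def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

e = [2, 7, 4, -3, -9, -14, -4, 1, 5, 3, 1, -2, -5, -3, 1, 4]
print(durbin_watson(e))  # 0.7359...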

Positive autocorrelation is the first of the three versions that we look at in the use of the DW statistic.


3.3.2.1 Durbin-Watson test for positive autocorrelation


This version is a one-sided test for positive autocorrelation. The test is in Bowerman, page 291. It is formulated in clearer detail as follows:
Positive autocorrelation test
1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are positively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%).
4. Read off the values dL,α and dU,α from the table, Table A5 Bowerman, page 598.
5. Make a decision based on the rule:
5.1 Reject H0 if d < dL,α
5.2 Do not reject H0 if d > dU,α
5.3 We are unable to reach a decision if dL,α ≤ d ≤ dU,α
An easy illustration follows:

ACTIVITY 3.7
Use the DW test to determine if the following residuals are positively AR(1). Assume that the model for the residuals was of the fourth power.

Error terms: 2  7  4  -3  -9  -14  -4  1  5  3  1  -2  -5  -3  1  4

       ei      ei²     (ei - ei-1)²
        2        4
        7       49         25
        4       16          9
       -3        9         49
       -9       81         36
      -14      196         25
       -4       16        100
        1        1         25
        5       25         16
        3        9          4
        1        1          4
       -2        4          9
       -5       25          9
       -3        9          4
        1        1         16
        4       16          9
Total          462        340

DISCUSSION OF ACTIVITY 3.7
1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are positively autocorrelated.

2. d = Σ(ei - ei-1)² / Σei² = 340/462 = 0.7359

3. We choose α = 0.01.

4. Since we assume that the model used was of the fourth power, and using α = 0.01, we read off from the table (Table A6, Bowerman, page 599) the values corresponding to k = 4 and n = 16. These values are dL,0.01 = 0.53 and dU,0.01 = 1.66.

5. We are unable to reach a decision, since dL,α ≤ d ≤ dU,α.

While we are discussing this activity, one realises that the choice of α can be an important factor. To illustrate this point, suppose in the above activity we chose α = 0.05. We would then have dL,0.05 = 0.74 and dU,0.05 = 1.93, and the decision would be to reject H0 since d < dL,α. Interesting! What do you think? To address the activity fully, the decision reached implies that at the 5% level of significance we conclude that the error terms are positively autocorrelated.
DISCUSSION OF EXAMPLE 6.5, BOWERMAN (page 292)
At the beginning, the example is based on a linear trend model; hence, for that part, k = 1. For the second part a quadratic trend is assumed, hence k = 2. Do you see why the two models have different values of the DW statistic? We note that a wrong decision may be made if the error terms are calculated from an incorrect model. We proceed to the DW test for negative AR(1).


3.3.2.2 Durbin-Watson test for negative autocorrelation


This version is a one-sided test for negative autocorrelation. The test is in Bowerman, page 293. It is formulated in similar detail as for positive autocorrelation:

Negative autocorrelation test


1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are negatively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%).
4. Read off the values dL,α and dU,α from the table, Table A5 Bowerman.
5. Make a decision based on the rule:
5.1 Reject H0 if (4 - d) < dL,α
5.2 Do not reject H0 if (4 - d) > dU,α
5.3 We are unable to reach a decision if dL,α ≤ (4 - d) ≤ dU,α
An easy illustration follows.
ACTIVITY 3.8
Use the DW test to determine if the following residuals are negatively AR(1). For argument's sake, assume that the model from which these residuals were derived had a quadratic equation. Use α = 0.05.

Error terms: -2  -7  4  -3  9  0  -4  -1  5  -3  1  -1  5  3  -4  9  -4

DISCUSSION OF ACTIVITY 3.8
1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are negatively autocorrelated.

2. d = Σ(ei - ei-1)² / Σei² = 992/359 = 2.7639

3. We were instructed to use α = 0.05.

4. With k = 2, n = 17 and α = 0.05, we have dL,0.05 = 1.02 and dU,0.05 = 1.54.

5. Now, (4 - d) = 1.2361, so dL,α ≤ (4 - d) ≤ dU,α.


The test is inconclusive: at the 5% significance level we are unable to reach a decision about negative autocorrelation.

When you are required to test any hypothesis, show the steps you follow. This is the reason we formulated the steps formally for this test, to make it easy. There is a tendency for students to start with the statistics, then read off the table values and make a decision about a hypothesis that they did not state. When this happens, note that it is meaningless. It is a serious academic offence. No marks are awarded for it. You have been advised. We now move to the box on p. 294.

3.3.2.3 Durbin-Watson test for autocorrelation


Positive or negative autocorrelation test
1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are positively or negatively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%).
4. Read off the values dL,α/2 and dU,α/2 from the table, Table A5 Bowerman et al. (2005: 598).
5. Make a decision based on the rule:
5.1 Reject H0 if d < dL,α/2 or if (4 - d) < dL,α/2
5.2 Do not reject H0 if d > dU,α/2 and (4 - d) > dU,α/2
5.3 The test fails if dL,α/2 ≤ d ≤ dU,α/2 or if dL,α/2 ≤ (4 - d) ≤ dU,α/2

The pattern is similar for the three tests. We note that the steps are the same for all three statistical hypothesis tests. There is one possibility where the test does not give a clue: we then say that the test is inconclusive, or that the test fails.
ACTIVITY 3.9
Use the DW test to determine if the following residuals are positively or negatively AR(1). For argument's sake, assume that the model from which these residuals were derived was of the fifth power. Use α = 0.10.

Error terms: -2  7  -4  3  -9  14  -4  1  -5  3  -1  2  -5  3  -9  5  -2  7  -1

DISCUSSION OF ACTIVITY 3.9


1. H0: The error terms are not autocorrelated.
vs
Ha: The error terms are negatively or positively autocorrelated.

2. d = Σ(ei - ei-1)² / Σei² = 2045/605 = 3.3802

3. α = 0.10, so we use the table values for α/2 = 0.05.

4. With k = 5 and n = 19, dL,0.05 = 0.75 and dU,0.05 = 2.02.

5. Here (4 - d) = 0.6198 < dL,0.05, so we reject H0.

Conclusion: We reject H0 and conclude that the error terms are autocorrelated; since the rejection comes from (4 - d) < dL,α/2, the autocorrelation is negative.

3.4 Seasonal variation types


Bowerman, page 295, discusses this topic. A time series is discussed in terms of itself, not in terms of the residuals. We are limited to graphical plots of a time series when identifying the type of seasonal variation.
Seasonality was defined in the earlier units. Two types of seasonal variation are defined, namely constant seasonal variation and increasing seasonal variation.
Before we get to the real stuff, recall the patterns of fluctuations in waves. Waves have peaks and troughs, like the sine and cosine curves. The magnitude of the fluctuation in these patterns is indicated by the minimum and maximum levels that peaks and troughs can reach. A swing is a fluctuation shown by peaks and troughs.
Seasonal variation is a component of a time series which is defined as the repetitive and predictable movement around the trend line in one year or less. It is detected by measuring time intervals in small units, such as days, weeks, months or quarters. Organisations facing seasonal variations, like the motor vehicle industry, are often interested in knowing their performance relative to the normal seasonal variation. The same holds for employment: the Department of Labour in South Africa expects unemployment to increase in December (maybe even January to March) because recent graduates are just arriving in the market and schools have closed for the summer vacation. The main point is whether the increase is more or less than expected. Organisations affected by seasonal variation need to identify and measure this seasonality to help with planning for temporary increases or decreases in labour requirements, inventory, training, periodic maintenance, and so forth.

Apart from this, organisations need to know whether the seasonal variation they experience is more or less than the average rate.
Reasons for studying seasonal variation
There are four main reasons for studying seasonal variation.
1. The description of the seasonal effect provides a better understanding of the impact this component has upon a particular series.
2. After establishing the seasonal pattern, methods can be implemented to eliminate it from the time series in order to study the effect of other components, such as cyclical and irregular variations. This elimination of the seasonal effect is referred to as deseasonalising or seasonal adjustment of the data.
3. Knowledge of the seasonal variations is a must when projecting past patterns into the future.
4. Prediction of the future trend.

Assumptions
A decision maker or analyst must select one of the following assumptions when treating the seasonal
component:
1. The impact of the seasonal component is constant from year to year.
2. The seasonal effect is changing slightly from year to year.
3. The impact of the seasonal influence is changing dramatically.

Seasonal Index
Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that indicates the percentage of an actual observation relative to what it would be if no seasonal variation were present in a particular period. It is attached to each period of the time series within a year. This implies that if monthly data are considered there are 12 separate seasonal indexes, one for each month, and 4 separate indexes for quarterly data. The following methods are used to calculate seasonal indices to measure the seasonal variation of time series data.
1. Method of simple averages
2. Ratio-to-trend method
3. Ratio-to-moving-average method
4. Link relatives method

In this module you will be required to develop forecasts by focusing on only two of these methods, namely the method of simple averages and the ratio-to-moving-average method.


An example
Now let us try to understand the measurement of seasonal variation by using the ratio-to-moving-average method. This technique provides an index to measure the degree of the seasonal variation in a time series. The index is based on a mean of 100, with the degree of seasonality measured by variations away from the base. For example, if we observe the hotel rentals in a winter resort, we may find that the winter-quarter index is 124. The value 124 indicates that 124 percent of the average quarterly rental occurs in winter. If the hotel management records 1436 rentals for the whole of last year, then the average quarterly rental would be 359 (1436/4). As the winter-quarter index is 124, we estimate the number of winter rentals as follows:

    359 × (124/100) = 445

In this example, 359 is the average quarterly rental, 124 is the winter-quarter index, and 445 the seasonalised winter-quarter rental.
This method is also called the percentage moving average method. In this method, the original data values in the time series are expressed as percentages of moving averages. The steps and the tabulations are given below.
Steps
1. Find the centred 12-monthly (or 4-quarterly) moving averages of the original data values in the time series.
2. Express each original data value of the time series as a percentage of the corresponding centred moving average value obtained in step (1). In other words, in a multiplicative time series model we get

    (Original data values)/(Trend values) × 100 = (T × C × S × I)/(T × C) × 100 = (S × I) × 100.

This implies that the ratio-to-moving-average represents the seasonal and irregular components.
3. Arrange these percentages according to the months or quarters of the given years. Find the averages over all months or quarters of the given years.
4. If the sum of these indices is not 1200 (or 400 for quarterly figures), multiply them by a correction factor = 1200/(sum of monthly indices). Otherwise, the 12 monthly averages will be considered as the seasonal indices.

Let us calculate the seasonal index by the ratio-to-moving-average method from the following data:

Table data
Year/Quarter     I    II   III    IV
2006            75    60    53    59
2007            86    65    53    59
2008            90    72    66    85
2009           100    78    72    93

Now the calculations for the 4-quarterly moving averages and the ratio-to-moving averages are shown in the table below.
Let Q = Quarter, MA = Moving Average, CMA = Centred Moving Average; T denotes the trend values, i.e. the CMA. We complete the following table:

Year  Q    y    4 MA total                 4 MA             4 CMA (T)                    (y/T) × 100
2006  1   75
      2   60
      3   53   (75+60+53+59) = 247   247/4 = 61.75   (61.75+64.50)/2 = 63.125   (53/63.125) × 100 = 83.96
      4   59   (60+53+59+86) = 258   258/4 = 64.50   (64.50+65.75)/2 = 65.125   (59/65.125) × 100 = 90.60
2007  1   86   263                   65.75           65.125                     132.05
      2   65   263                   65.75           65.125                      99.81
      3   53   263                   65.75           66.245                      80.01
      4   59   267                   66.74           67.62                       87.25
2008  1   90   274                   68.50           70.125                     128.34
      2   72   287                   71.75           75.00                       96.00
      3   66   313                   78.25           79.50                       83.02
      4   85   323                   80.75           81.50                      104.29
2009  1  100   329                   82.25           83.00                      120.48
      2   78   335                   83.75           84.75                       92.04
      3   72   343                   85.75
      4   93

Calculation of the seasonal index


Year                      Quarter I   Quarter II   Quarter III   Quarter IV
2006                                                  83.96         90.60
2007                       132.05       99.81         80.01         87.25
2008                       128.34       96.00         83.02        104.29
2009                       120.48       92.04
Total                      380.87      287.81        246.99        282.14
Mean (seasonal index)      126.96       95.94         82.33         94.05

The total for the seasonal indices is 126.96 + 95.94 + 82.33 + 94.05 = 399.28.

Adjusted seasonal index:

Quarter    Value
I          (400/399.28) × 126.96 = 127.19
II         (400/399.28) × 95.94  = 96.11
III        (400/399.28) × 82.33  = 82.48
IV         (400/399.28) × 94.05  = 94.22

Now the total of the seasonal averages is 399.28, so the corresponding correction factor is 400/399.28 = 1.0018. Each seasonal average is multiplied by the correction factor 1.0018 to obtain the adjusted seasonal indices shown in the table above.
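A numpy sketch of the whole ratio-to-moving-average calculation on the quarterly data above. Because it carries full precision at every step, small discrepancies from the rounded hand calculation above can arise.

import numpy as np

y = np.array([75, 60, 53, 59, 86, 65, 53, 59,
              90, 72, 66, 85, 100, 78, 72, 93], dtype=float)

ma4 = np.convolve(y, np.ones(4) / 4, mode="valid")   # 4-quarter moving averages
cma = (ma4[:-1] + ma4[1:]) / 2                       # centred moving averages
ratios = y[2:-2] / cma * 100                         # (S x I) x 100, quarters 3..14

# average the ratios by quarter (ratios[0] belongs to quarter III of 2006)
quarter_of = np.arange(2, len(y) - 2) % 4            # 0 = I, 1 = II, 2 = III, 3 = IV
means = np.array([ratios[quarter_of == q].mean() for q in range(4)])
adjusted = means * 400 / means.sum()                 # force the indices to sum to 400
print(adjusted)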
Remarks

1. In an additive time series model, the seasonal component is estimated as

    S = Y - (T + C + I)

where S stands for the seasonal values, Y for the actual data values of the time series, T for the trend values, C for the cyclical values and I for the irregular values.

2. In a multiplicative time series model, the seasonal component is expressed in terms of ratios and percentages as

    Seasonal effect = (T × S × C × I)/(T × C × I) × 100 = Y/(T × C × I) × 100.

In practice, however, the detrending of the time series is done to arrive at S × C × I. This is done by dividing both sides of Y = T × S × C × I by the trend values T, so that Y/T = S × C × I.

3. The deseasonalised time series data will have only the trend (T), cyclical (C) and irregular (I) components, and are expressed as:
(i) Multiplicative model: Y/S × 100 = (T × S × C × I)/S × 100 = (T × C × I) × 100.
(ii) Additive model: Y - S = (T + S + C + I) - S = T + C + I

3.4.1 Constant and increasing seasonal variation


If the magnitude of the seasonal swing does not depend on the level of the time series, the time series is said to exhibit constant seasonal variation. Figure 6.13 in Bowerman displays such a time series. Increasing seasonal variation is displayed in Figure 6.14, because there the magnitude of the seasonal swing increases. Clearly, the magnitude of the seasonal fluctuation increases with the level of the time series: the peaks and troughs are farther apart on the right side of the display than on the left.
This type of time series is more difficult to handle. We usually attempt to make it easier by using transformation methods to make the seasonal variation constant. That is, we apply transformation methods to increasing seasonal variations to make them behave like constant seasonal variations.

ACTIVITY 3.10
Identify the types of seasonal variation for the time series:
(a) that takes the shape of an increasing linear trend
(b) described by y = ax² + bx + c with a > 0, where the y-intercept is y = 1 and one root is x = 1.

DISCUSSION OF ACTIVITY 3.10
(a) If a time series with an increasing linear trend has seasonality, and the magnitude of the swing stays the same as the level increases, it shows constant seasonal variation. Look closely at Figure 6.13, p. 295 of the textbook, which is an example with this trend.
(b) You may plot a graph with these features. The description given corresponds to increasing seasonal variation. Figure 6.14 of the textbook is an example.

3.5 Use of dummy variables and trigonometric functions


Study Section 6.4, Bowerman. We involve dummy variables and trigonometric functions to approximate some seasonal time series. Earlier we said that a time series with constant seasonal variation is easy to work with. In this section we introduce dummy variables and discuss time series analysis where we model seasonal variation by using dummy variables and trigonometric functions. Trigonometric functions include the cosine, sine, tangent, secant, cosecant and cotangent functions. Our exposition is limited to sine and cosine functions and, in addition, they are not discussed in great depth.

3.5.1 Time series with constant seasonal variation


Every time series has a trend of some kind: increasing, decreasing or none. If in addition there is seasonality, we determine whether it is constant or increasing seasonal variation. For representing a time series with constant seasonal variation we use a model of the form:

    yt = TRt + SNt + εt

where
    yt = the observed value of the time series at time t
    TRt = the trend at time t
    SNt = the seasonal factor at time t
    εt = the error term (irregular factor) at time t

ACTIVITY 3.11
What is the value of T Rt , the trend for a time series with no trend? Write down the above equation
when there is no trend.


DISCUSSION OF ACTIVITY 3.11


We recall that an increasing trend would be represented by positive values and a decreasing trend by negative values. No trend means neutral, and that is zero: TRt = 0. Hence the above model collapses to yt = SNt + εt.
DISCUSSION OF THE MODEL
The model implies that a time series with constant seasonal variation can be written as an average level μt together with random fluctuations (the error term εt). These random fluctuations cause the observations to deviate from the average level. The average level changes over time according to the equation:

    μt = TRt + SNt

Now, the error term εt is a random variable. The assumption made about the error terms is that they satisfy the usual regression assumptions: they have a constant variance and are identically and independently distributed (IID) with a normal distribution. There is also a further implication that the magnitude of the seasonal swing is independent of the trend. Let trt and snt be estimates of TRt and SNt, respectively. Then the estimate of yt is:

    ŷt = trt + snt

Seasonality is a somewhat complex part of a time series. In the next section we use dummy variables to model seasonality.

3.5.2 Use of dummy variables


The seasonality of a time series defines the seasons to be used. It is possible that a time series is studied from observations that are collected at different times of the day. An example is a pancake vendor who confirms that sales are very high in the morning, low during the day and slightly higher in the afternoon. Here the seasons are the times of the day, and there are three of them in this case. If we study a time series collecting data over a five-day week, the seasons are five. For some activities we may use a seven-day week, allowing seasons to be seven. If we use quarters of a year, there are four seasons. A common tendency is to use months, in which case there will be twelve seasons. This simply means that the number of seasons will differ from situation to situation, mainly depending on the data collection pattern. In order to define dummy variables, we denote the number of seasons by L.
Study the second rectangular box on p. 299 of the textbook. We consider the seasonal factor SNt. We express this factor using dummy variables as:

    SNt = βs1 xs1,t + βs2 xs2,t + ... + βs(L-1) xs(L-1),t

where the constants βs1, βs2, ..., βs(L-1) are called the seasonal parameters and xs1,t, xs2,t, ..., xs(L-1),t are dummy variables defined as:

    xs1,t = 1 if time period t is season 1; 0 otherwise
    xs2,t = 1 if time period t is season 2; 0 otherwise
    ...
    xs(L-1),t = 1 if time period t is season (L - 1); 0 otherwise

A sketch of such a regression follows.
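A minimal sketch of a dummy-variable regression for a series with L = 4 seasons and a linear trend, fitted by least squares with numpy; the data are illustrative only.

import numpy as np

y = np.array([75, 60, 53, 59, 86, 65, 53, 59,
              90, 72, 66, 85, 100, 78, 72, 93], dtype=float)
n, L = y.size, 4
t = np.arange(1, n + 1)

season = (t - 1) % L                  # 0, 1, 2, 3 repeating
dummies = np.zeros((n, L - 1))        # x_{s1,t}, ..., x_{s(L-1),t}
for s in range(L - 1):
    dummies[:, s] = (season == s).astype(float)

X = np.column_stack([np.ones(n), t, dummies])   # intercept, trend, dummies
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [b0, b1, bs1, ..., bs(L-1)]; the last season's parameter is 0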

3.5.3 High season and low season


This topic is not in the textbook. When seasonal data are studied, there is always a set of values that are high and others that are low. The seasonal parameter of the last season has arbitrarily been assigned the value zero. The other parameters are defined with respect to the one that we set at 0; that is, we take the last season to be the one against which all others are gauged.
It is important that when we interpret seasonality, we exclude any contribution made by the trend. Thus, in simple terms, when we obtain a negative βsi, we know that the level of the time series in the ith season is lower than the level of the time series in the last season. Such a season is categorised as a low season. Similarly, if we obtain a positive βsj, we know that the level of the time series in the jth season is higher than the level of the time series in the last season. Such a season is called a high season. To identify high seasons and low seasons on graphs we look for troughs and peaks: troughs indicate low seasons and peaks indicate high seasons.

NOTEWORTHY POINTS
One of the seasonal parameters has to be set at 0, and not necessarily the last one; however, it is often more convenient to set the last one, as we did. If we fail to set one of them to zero, least squares estimation may prove to be complex or require an unusual approach.

The dummy variable model is based on a time series that displays constant seasonal variation. It is also common to refer to constant seasonal variation as additive seasonal variation. We often apply transformation methods to a time series that shows increasing seasonal variation, to equalise the seasonal variation before using dummy variables.

Let us discuss Example 6.7 briefly.


DISCUSSION OF EXERCISE 6.7, p. 300


Since the model is seasonal with 12 seasons, the values of t are related to the various months. For example, when September is mentioned, we set the September dummy equal to 1 in the formula. We note also that, for a given month, all the other dummy terms vanish. Why?
If you noted, the example used the logarithm transformation, that is, yt* = ln yt. In the example the seasonal parameter for January is β2 while for November it is β12. The example wanted a forecast for January of the fifteenth year. Once we have decided on the month, we know that we will use the model:

    yt* = β0 + β1t + β2M1 + β3M2 + ... + β12M11 + εt
        = β0 + β1t + β2(1) + β3(0) + ... + β12(0) + εt
        = β0 + β1t + β2 + εt

With the estimates having been computed, the model we will use to forecast the averages for January is:

    ŷt* = b0 + b1t + b2 = 6.28756 + 0.00276t - 0.04161

Now, for January of the 15th year we note that 14 years run from January of year 1 up to December of year 14, which makes t = 168 months (14 × 12 months). Thus, for January of year 15 we have t = 169. The forecast required is therefore:

    ŷ*169 = 6.28756 + 0.00276(169) - 0.04161 = 6.71239

Since these were transformed data, the required value is ŷ169 = e^6.71239 = 822.534. It is as simple as that.

FURTHER DISCUSSION ON THE EXAMPLE


If you were to predict the monthly hotel average for July of the seventh year, then t = [(6 × 12) + 7] = 79. With M7 = 1, the equation to be used for July is:

    ŷt* = b0 + b1t + b8 = 6.28756 + 0.00273t - 0.08446

and for July of the seventh year:

    ŷ*79 = 6.28756 + 0.00273(79) - 0.08446 = 6.41877

The actual prediction required is therefore ŷ79 = e^6.41877 = 613.248.

ACTIVITY 3.12
Consider the model:

    yt = 5 - 2M1 + 4M2 + 3M3 - 14M4 + M5 + εt

(a) Identify the:
(i) number of seasons (provide any necessary explanation)
(ii) trend term (and its nature)
(iii) low seasons (justify your choice)
(iv) high seasons (justify your choice)
(b) Present models for each season.
(c) For simplicity and practicality, let us assume that the above model is based on a six-day week. Prepare forecasts for:
(i) the Saturday of the first week
(ii) the Thursday of the fifth week
(iii) the Friday of the third week
(iv) the Monday of the tenth week

DISCUSSION OF ACTIVITY 3.12


(a) (i) We see five seasonal parameters displayed in the model. This means that the sixth seasonal parameter has been set to 0. Thus, six seasons are involved.
(ii) The trend term comes only from the 5, which is a constant term. Therefore there is no trend.
(iii) The first and the fourth seasons are low seasons because they have negative seasonal parameters.
(iv) The second, third and fifth seasons are high seasons because they have positive seasonal parameters.
(b) The models for the various seasons are:

    First season:   yt = 5 - 2M1 + 4(0) + 3(0) - 14(0) + (0) + εt = 5 - 2M1 + εt
    Second season:  yt = 5 - 2(0) + 4M2 + 3(0) - 14(0) + (0) + εt = 5 + 4M2 + εt
    Third season:   yt = 5 - 2(0) + 4(0) + 3M3 - 14(0) + (0) + εt = 5 + 3M3 + εt
    Fourth season:  yt = 5 - 2(0) + 4(0) + 3(0) - 14M4 + (0) + εt = 5 - 14M4 + εt
    Fifth season:   yt = 5 - 2(0) + 4(0) + 3(0) - 14(0) + M5 + εt = 5 + M5 + εt
    Sixth season:   yt = 5 - 2(0) + 4(0) + 3(0) - 14(0) + (0) + εt = 5 + εt

(c) (i) The Saturday of the first week
Saturday coincides with the sixth season. In the first week this is t = 6.

    ŷ6 = 5

(ii) The Thursday of the fifth week
Thursday is the fourth season in the statements given. For the Thursday of the fifth week, t = [(4 × 6) + 4] = 28. This means M4 = 1. Hence,

    ŷ28 = 5 - 14 = -9

(iii) The Friday of the third week
Friday is the fifth season, and in the third week the Friday is t = [(2 × 6) + 5] = 17. This means M5 = 1.

    ŷ17 = 5 + 1 = 6

(iv) The Monday of the tenth week
Monday is the first season. The Monday of the tenth week is t = [(9 × 6) + 1] = 55. The forecast for this Monday is:

    ŷ55 = 5 - 2 = 3

Note that the trend was given by TRt = 5. This is a constant, which effectively implies that there is no trend. We next look at the use of trigonometric functions.


3.5.4 Use of trigonometry in a model with a linear trend


Study the box on p. 302 of the textbook. It is common for trigonometric terms to be incorporated in a time series regression model that shows either constant or increasing seasonal variation. The general form of such incorporation is:

    yt = TRt + f(t) + εt

where f(t) is an expression in trigonometric functions of time t.

Let us assume a linear trend and suppose that:

    f(t) = β2 sin(2πt/L) + β3 cos(2πt/L)

Then one application of the trigonometric model for constant variation is:

    yt = β0 + β1t + β2 sin(2πt/L) + β3 cos(2πt/L) + εt.

You may experiment with various values of t, but attend to the next exercise.

ACTIVITY 3.13
Simplify the model when:
(i) t = L
(ii) t = L/2

DISCUSSION OF ACTIVITY 3.13


(i) yt = β0 + β1t + β3 + εt
(ii) yt = β0 + β1t - β3 + εt

On the other hand, if we have:

    f(t) = β2 sin(2πt/L) + β3 cos(2πt/L) + β4 sin(4πt/L) + β5 cos(4πt/L)

then we end up with:

    yt = β0 + β1t + β2 sin(2πt/L) + β3 cos(2πt/L) + β4 sin(4πt/L) + β5 cos(4πt/L) + εt


ACTIVITY 3.14
Simplify the model when:
(i) t = L
(ii) t = L/2
(iii) t = L/4

DISCUSSION OF ACTIVITY 3.14

Appropriate substitutions give:
(i) yt = β0 + β1 t + β3 + β5 + εt (both sine terms vanish and both cosines equal 1)
(ii) yt = β0 + β1 t - β3 + β5 + εt (cos(π) = -1 and cos(2π) = 1)
(iii) yt = β0 + β1 t + β2 - β5 + εt (sin(π/2) = 1, cos(π/2) = 0, sin(π) = 0 and cos(π) = -1)
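To make the use of trigonometric terms concrete, the following sketch fits yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) by ordinary least squares with numpy. The monthly series y and its true coefficients are invented purely for illustration:

```python
import numpy as np

# Hypothetical monthly series with a linear trend plus one seasonal cycle (L = 12)
rng = np.random.default_rng(0)
L = 12
t = np.arange(1, 73)                       # six years of monthly data
y = 50 + 0.3 * t + 8 * np.sin(2 * np.pi * t / L) + rng.normal(0, 2, t.size)

# Design matrix for y_t = b0 + b1*t + b2*sin(2*pi*t/L) + b3*cos(2*pi*t/L)
X = np.column_stack([np.ones_like(t, dtype=float),
                     t,
                     np.sin(2 * np.pi * t / L),
                     np.cos(2 * np.pi * t / L)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Point forecast for the next period (t = 73)
t_new = 73
x_new = np.array([1.0, t_new,
                  np.sin(2 * np.pi * t_new / L),
                  np.cos(2 * np.pi * t_new / L)])
print(beta)           # estimates of b0, b1, b2, b3
print(x_new @ beta)   # forecast for period 73
```

The same design-matrix approach extends directly to the four-term f(t) above: simply add the sin(4πt/L) and cos(4πt/L) columns.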

3.6 Growth curve models

The models we discussed so far described trend and seasonal effects using deterministic functions of time that are linear in the parameters. In this section we discuss the growth curve model, which is one of the models that are not linear in the parameters. Using the usual notation, the growth curve model is:

yt = β0 (β1)^t εt

This model is awkward to estimate directly in its nonlinear form. However, since the decomposition of a time series may be multiplicative, we may end up with a time series that resembles a growth curve. The question is then how to handle it. Since linear forms are easier to handle, we transform the model to a linear form. If we assume that the parameters are positive, we can take logarithms and obtain:

ln yt = ln β0 + (ln β1) t + ln εt

which can be written as:

yt* = α0 + α1 t + ut

where yt* = ln yt, α0 = ln β0, α1 = ln β1 and ut = ln εt. This is a familiar form once you understand how the transformation is done. We are allowed to work on the transformed data and reverse the answers using the inverse of the transformation used.
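A minimal sketch of the transformation idea, assuming an illustrative multiplicative growth series: fit a straight line to ln yt, then reverse the transformation with exp:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 41)
y = 100 * 1.05 ** t * np.exp(rng.normal(0, 0.02, t.size))  # hypothetical growth data

# Fit ln(y_t) = a0 + a1*t, where a0 = ln(beta0) and a1 = ln(beta1)
a1, a0 = np.polyfit(t, np.log(y), 1)
b0, b1 = np.exp(a0), np.exp(a1)
print(b0, b1)                  # estimates of beta0 and beta1

# Forecast for t = 41 on the log scale, then back-transform
print(np.exp(a0 + a1 * 41))
```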


3.7 AR(1) and AR(p)

AR(1), the first-order autoregressive model, is discussed in Section 6.6 of Bowerman. Our first contact with first-order autoregressive models was with the Durbin-Watson (DW) statistic. When the error terms of a regression model are autocorrelated, the model is not adequate and the autocorrelation should be modelled. The AR(1) model is a special case of the general autoregressive process of order p given by:

εt = φ1 εt-1 + φ2 εt-2 + ... + φp εt-p + at

where at is a random shock.

ACTIVITY 3.15
Write a model for AR(2), a second-order autoregressive model.

DISCUSSION OF ACTIVITY 3.15

The activity requires AR(p) when p = 2, which is:

εt = φ1 εt-1 + φ2 εt-2 + at
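As an illustration (not a textbook procedure), the AR(2) coefficients can be estimated by regressing the error series on its first two lags; the series e below is simulated so that the true values are known:

```python
import numpy as np

rng = np.random.default_rng(2)
n, phi1, phi2 = 500, 0.6, 0.2
e = np.zeros(n)
for t in range(2, n):                 # simulate a stationary AR(2) error process
    e[t] = phi1 * e[t - 1] + phi2 * e[t - 2] + rng.normal()

# Regress e_t on e_{t-1} and e_{t-2} (least squares, no intercept)
X = np.column_stack([e[1:-1], e[:-2]])
coef, *_ = np.linalg.lstsq(X, e[2:], rcond=None)
print(coef)   # should be close to (0.6, 0.2)
```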

3.8 Use of trend and seasonality in forecast development

Make sure that your mathematical model for forecasting reflects these components if they are supposed to be there. You will also have to modify the model appropriately once you have decided on the exact period for which the forecast is needed. For example, if you know that it is the fourth quarter, or the ninth month, write the model to suit that period before making any substitutions. To make sure that you have the right period, your polynomial as a function of t should suit the given time period. As an example, starting from January of the current year, evaluating y20 when time is measured monthly would refer to August of the following year. This means that the equation used should be suitable for August. On the other hand, if quarters are used, starting from the current year, y20 would mean the fourth quarter of the fifth year. These are the values of t in the equations for such predictions. If you deal with months and your equation is given as a function of t, any future time should be converted into months. For example, if you are required to predict a value for February of the fourth year from the current year, you should be able to determine that t = 38. Or, if you are to find a prediction for the third quarter of the seventh year, you should be able to evaluate that t = 27 on the quarterly model. The appropriate forecasts are developed by simply substituting into and solving the appropriate equations for those predictions.
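A small helper of the kind described above, assuming t = 1 corresponds to the first period (January or quarter 1) of the current year:

```python
def time_index(year, period, periods_per_year):
    """Convert (year, period) to t, with year 1 = the current year.

    time_index(4, 2, 12) -> 38  (February of the fourth year, monthly data)
    time_index(5, 4, 4)  -> 20  (fourth quarter of the fifth year, quarterly data)
    """
    return (year - 1) * periods_per_year + period

print(time_index(4, 2, 12), time_index(5, 4, 4), time_index(7, 3, 4))
```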


3.9 Conclusion
This unit introduced important aspects of time series. It used graphical plots to demonstrate some patterns and incorporated some applications of estimation. Trend and seasonality were discussed. The AR(1) process was introduced, and the DW statistic was used to detect autocorrelation of the AR(1) form. Two types of seasonal variation, constant and increasing, were discussed, as were dummy variables and trigonometric terms. Growth curve models were introduced. We are now ready for the next unit.


UNIT 4: Decomposition of a time series


Outcome table for the study unit

Outcomes - At the end of the module you should be able to:
- identify time series components
- compute the trend
- deseasonalise a time series
- forecast future values of a time series

Assessment:
- describe, or unpack, a series
- determine moving averages
- determine seasonal indices
- develop forecasts

Content:
- decompose a time series
- MA, trend analysis
- MA, seasonal indices
- centred MA

Activities:
- work out exercises
- describe the data trend
- isolate trend and seasonality
- incorporate the trend in the forecast

Feedback:
- discuss the activities

4.1 Introduction
This is a continuation of concepts introduced in earlier chapters. Components of a time series should now be at your fingertips. Are they? This unit deals with the decomposition of a time series, which aims to isolate the influence of each of the components on the actual time series. It is presented as Chapter 7 in the prescribed textbook.
Decomposition is an important technique for all types of time series, especially for seasonal adjustment. It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original time series by addition or multiplication), where each of these has a certain characteristic or type of behaviour.
ACTIVITY 4.1
State the components of a time series.
DISCUSSION OF ACTIVITY 4.1
You can find the answer on page 325 of Bowerman, in case you have forgotten.
The components into which a time series can be decomposed are:
- the trend component Tt, which reflects the long-term progression of the series
- the cyclical component Ct, which describes repeated but non-periodic fluctuations, possibly caused by the economic cycle
- the seasonal component St, reflecting seasonality (seasonal variation)
- the irregular component It (or noise), which describes random, irregular influences; compared to the other components it represents the residuals of the time series
These points answer the activity above.

4.2 Multiplicative decomposition

Multiplicative decomposition is applicable when a time series displays an increase in the amplitude of both the seasonal and irregular variations as the level of the trend rises.
Study Section 7.1, p. 326 of the textbook. Multiplicative decomposition assumes that the actual values of a time series yt can be presented as the product of a trend component TRt, a seasonal index SNt, a cyclical index CLt and an irregular measure IRt. The trend component is measured in actual units, the cyclical index is expressed relative to the trend, and the seasonal index is expressed relative to the trend and the cyclical index. Thus, the multiplicative decomposition model is:

yt = TRt × SNt × CLt × IRt

When a time series exhibits increasing seasonal variation, it is represented in this form. Statistical analysis is useful for effective isolation and analysis of the trend and the seasonal components. Hence, we will examine statistical approaches to quantify trend and seasonal variations. These are the components that usually account for a significant proportion of the actual values in a time series. Isolating them is an opportunity to explain the actual time series values.

4.2.1 Trend analysis

Trend is discussed briefly on p. 326 of the textbook. This discussion uses moving averages to analyse trend. When the short-term fluctuations in a time series are averaged out, the trend is identified: either a smooth curve or a straight line emerges. Earlier we discussed time series regression, which is one method used to isolate trend. The other is the use of moving averages (MAs), which we discuss in this section.

The term trend may be seen as a tendency, or resulting behaviour, of something observed over the long term. In a nutshell, trend analysis refers to the concept of collecting information and attempting to spot a pattern, or trend, in that information. In some fields of study, the term has a more formally defined meaning.

For example, in project management, trend analysis is a mathematical technique that uses historical results to predict future outcomes by tracking variances in cost and schedule performance. In this context, it is a project management quality control tool.
Although trend analysis is often used to predict future events, it can also be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average number of years that other known kings reigned.
Moving Average (MA)
A MA is a successive averaging of groups of observations, as explained in Bowerman. The number n of observations averaged in each group must be the same throughout. It is determined by the number of periods that span the short-term fluctuations. A MA removes the short-term fluctuations in a time series, thereby smoothing it.
We explain the MA for a 3-period MA. The following steps are involved (a short computational sketch follows below):
- Add the observations for the first three periods and find their average. Place the answer opposite the middle time period, i.e. opposite the second measurement.
- Remove the observation for the earliest period and replace it with the fourth measurement. Obtain the new average and place it opposite the third measurement.
- Repeat the process until you do not have enough observations to produce a MA of three periods.
Note that the above illustration used a case where the MA can be placed next to a middle observation. The same is easy when a 5-period or a 7-period MA is needed. That is, for an odd-order MA we do not struggle to place the MA in the middle. There will be practical cases where we need to use a 2-period MA, a 4-period MA, and so on. Study the examples in Bowerman.
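The steps above can be verified in a few lines of Python (a sketch of ours, not a textbook procedure); np.convolve with equal weights produces the successive 3-period averages. The nine observations used here are those of Activity 4.2 below:

```python
import numpy as np

y = np.array([170, 140, 230, 176, 152, 233, 182, 161, 242])
ma3 = np.convolve(y, np.ones(3) / 3, mode="valid")   # successive 3-period averages
print(ma3)   # [180. 182. 186. 187. 189. 192. 195.]
```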

ACTIVITY 4.2
Consider the following data:

170, 140, 230, 176, 152, 233, 182, 161, 242

They were collected for three days over the regular time periods 8-12 noon, 12-4 p.m. and 4-8 p.m. Calculate appropriate moving averages and explain the trend of the data.

DISCUSSION OF ACTIVITY 4.2

We can call these times morning, afternoon and evening for convenience. It seems obvious to use 3-period MAs. According to the guideline given, we start with averages per day:

Day   Period      Value   Average
1     Morning     170
      Afternoon   140     540/3 = 180
      Evening     230
2     Morning     176
      Afternoon   152     561/3 = 187
      Evening     233
3     Morning     182
      Afternoon   161     585/3 = 195
      Evening     242

The average for each day has been placed opposite the midpoint of that day, i.e. the afternoon period.

We need a trend figure for every period, not just for the afternoons; this is not yet clearly a moving average. We make the averages move by removing the oldest observation and replacing it with the newest one. The table becomes:

Day   Period      Value   Moving average = trend
1     Morning     170
      Afternoon   140     180
      Evening     230     182
2     Morning     176     186
      Afternoon   152     187
      Evening     233     189
3     Morning     182     192
      Afternoon   161     195
      Evening     242

For example:
(170 + 140 + 230)/3 = 180
(140 + 230 + 176)/3 = 182

Now, we answer the question about the trend. We note that the MAs are clearly increasing. This simply informs us that, on average, the above observations are increasing. Hence, we have an increasing trend.

4.2.2 Seasonal analysis

Seasonal analysis isolates the influence of forces that are due to seasonality in a time series. Useful methods in time series analysis for dealing with seasonality include the ratio-to-moving-average method, which measures seasonal influences using index numbers: percentage deviations of the actual values of the series from base values. In this module we use seasonal variation analysis to accomplish the task.
The steps for quantifying the seasonal variation are:
- Calculate seasonal deviations by subtracting MAs from the corresponding observations (Actual less Trend).
- Group the deviations according to season.
- Average the deviations in the groups.

ACTIVITY 4.3
Use the data in the previous activity to estimate seasonal deviations.

DISCUSSION OF ACTIVITY 4.3

We use the table above and calculate according to the first directive.

Step 1: Actual minus trend

Day   Period      Actual minus trend
1     Morning
      Afternoon   -40
      Evening     48
2     Morning     -10
      Afternoon   -35
      Evening     44
3     Morning     -10
      Afternoon   -34
      Evening

Due to random influences, values for the same periods differ. But it is clear that there is a common pattern. For example, the afternoon values of Actual - Trend are similar in size and sign (-40, -35, -34). The same is true of the evening figures of 48 and 44; and with luck (probably because the sample size is very small) the morning figures are both -10. If in our analysis this pattern fails to be visible, we either check our calculations, or we may have picked the wrong number of periods over which to average in the first place.
In order to eliminate the random effect, we follow the instructions in the second and third bullets. That is, we collect together the Actual - Trend values corresponding to each period of the day and find the average variation for each period. See the next table:

Step 2
          Morning   Afternoon   Evening
Day 1               -40         48
Day 2     -10       -35         44
Day 3     -10       -34
Total     -20       -109        92
Average   -10       -36         46

The figures -10, -36 and 46 are called the seasonal variations for the morning, afternoon and evening periods. We have now isolated the trend and seasonal effects present in the time series. Knowledge of seasonal effects is important for forecasting, as well as for removing strong seasonal effects that may conceal other important features or movements in a data set.


Knowing that there are random variations is of no direct use in forecasting, but, being essentially unpredictable, their size serves as a guide to the reliability of a forecast. When random influences are very small, a process is likely to produce reliable forecasts, while large fluctuations may completely upset even carefully calculated forecasts.

4.2.3 Analysis of random variations in a time series

Our next step in analysing the time series is to extract the random variations from the figures. We use the above example to illustrate the approach. A random variation is anything that is not accounted for either by trend or by seasonal effects. The starting point is to work out what the figures would have been if the trend and seasonal effects had operated in the absence of random influences.
We note that for the afternoon of day 1, we had a trend figure of 180. The afternoon values, however, are on the whole 36 lower than the trend. This reduces the day 1 afternoon figure to 144. What actually occurred was 140, so some random effect reduced what we might have expected to occur by 4 units. Therefore the random variation in this period was -4.
Proceeding in a similar manner, we obtain:

Actual                        140   230   176   152   233   182   161
Expected (trend + seasonal)   144   228   176   151   235   182   159
Random (actual - expected)    -4    2     0     1     -2    0     2

The random variations are very small, the largest in magnitude being -4, which is 4/180 or about 2 percent of the corresponding trend figure. This means that any forecast obtained from this analysis may be expected to be reasonably reliable. The data fit the pattern of trend plus seasonal variation reasonably well. Once these issues are in order, we need to develop forecasts. The next section is dedicated to forecasting.

4.2.4 Obtaining a forecast

The central purpose of time series analysis is to develop forecasts. In this case the trend and seasonal effects are used for this purpose. The broad principle in developing a forecast is to use:

ŷt+k = TRt+k + SNadj

where
ŷt+k = required forecast
TRt+k = trend for the period in question
SNadj = appropriate seasonal adjustment

Thus, if a forecast is required for the afternoon of day 4, it would consist of two parts, namely:

Forecast = trend for day 4 afternoon + seasonal adjustment for the afternoon period

Inspection of the trend figures shows that between the day 1 afternoon and the day 3 afternoon the trend stretched from 180 to 195. This is an increase of 15 over the course of 6 periods, i.e. an average increase of 2.5 units per period. We will assume that this rate continues to apply, at least for the next few periods.
This increase gives 197.5 as the trend figure for the evening of day 3, 200 for the morning of day 4 and 202.5 for the afternoon of day 4. Adjusting this downwards by the afternoon seasonal variation of -36 produces a forecast of 166.5, or 167 to the nearest unit.
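The same forecast logic in a few lines, using the numbers from the worked example (trend rising 2.5 units per period from 195 at the day 3 afternoon; afternoon seasonal variation -36):

```python
trend_day3_afternoon = 195.0
increase_per_period = 2.5            # (195 - 180) / 6 periods
seasonal_afternoon = -36.0

# Day 4 afternoon is 3 periods after the day 3 afternoon
trend_forecast = trend_day3_afternoon + 3 * increase_per_period   # 202.5
forecast = trend_forecast + seasonal_afternoon                    # 166.5
print(round(forecast))                                            # 167
```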

ACTIVITY 4.4
The following data represent sales of pies, in thousands, in the various quarters of three years. Analyse the data to isolate trend and seasonality, and develop some forecasts for illustration.

Quarter   1    2   3    4    1    2   3    4    1    2   3    4
Value     142  54  162  206  130  50  174  198  126  42  162  186

DISCUSSION OF ACTIVITY 4.4

All the procedures for analysing time series with seasonal patterns have been explained. The appropriate number of periods for the moving average is clearly four, so the first moving average is:

(142 + 54 + 162 + 206)/4 = 141

This figure belongs to the middle of the first year, which falls halfway between the second and third quarters, and the same will be true of the other moving averages. Therefore we need an additional step called centring the moving averages. The results of this centring are the centred moving averages (CMAs).
The moving average 141 applies to a point halfway between the second and third quarters of 2001, while the figure 138 applies midway between the third and fourth quarters. We can obtain a moving average directly comparable with the third quarter by taking the average of 141 and 138, which is 139.5. Doing the same for all moving averages, we obtain:

Quarter   Actual   MA       CMA (trend)   Actual minus trend
2001 Q1   142.00
2001 Q2    54.00
2001 Q3   162.00   141.00   139.50         22.50
2001 Q4   206.00   138.00   137.50         68.50
2002 Q1   130.00   137.00   138.50         -8.50
2002 Q2    50.00   140.00   139.00         -89.00
2002 Q3   174.00   138.00   137.50         36.50
2002 Q4   198.00   137.00   136.00         62.00
2003 Q1   126.00   135.00   133.50         -7.50
2003 Q2    42.00   132.00   130.50         -88.50
2003 Q3   162.00   129.00
2003 Q4   186.00

(Each 4-quarter MA actually falls between two quarters; adjacent MAs are averaged to give the CMA, which lines up with an actual quarter.)

We average the quarterly variations and round them so that they add to zero.

                     Quarter 1   Quarter 2   Quarter 3   Quarter 4
2001                                         22.5        68.5
2002                 -8.5        -89.0       36.5        62.0
2003                 -7.5        -88.5
Total                -16         -177.5      59          130.5
Average              -8          -88.75      29.5        65.25
Rounded (sum to 0)   -8          -88         30          66

Let Q = quarter, MA = moving average and CMA = centred MA. We complete the table thus:

Q         Actual   MA       CMA (trend)   Actual - trend   Expected   Random
2001 Q1   142.00
2001 Q2    54.00
2001 Q3   162.00   141.00   139.50         22.50           169.5      -7.5
2001 Q4   206.00   138.00   137.50         68.50           203.5       2.5
2002 Q1   130.00   137.00   138.50         -8.50           130.5      -0.5
2002 Q2    50.00   140.00   139.00         -89.00           51.0      -1.0
2002 Q3   174.00   138.00   137.50         36.50           167.5       6.5
2002 Q4   198.00   137.00   136.00         62.00           202.0      -4.0
2003 Q1   126.00   135.00   133.50         -7.50           125.5       0.5
2003 Q2    42.00   132.00   130.50         -88.50           42.5      -0.5
2003 Q3   162.00   129.00
2003 Q4   186.00

The greatest random variation is -7.5 in magnitude, only about 5% of the corresponding trend value of 139.5. Hence the data fit the model well, and the analysis should provide reliable forecasts.
From the second quarter of 2002 there has been a steady downward trend. Over the year from then to the latest comparable trend figure (second quarter 2003) the trend declined from 139.0 to 130.5; that is, over four quarters its decrease was 8.5. In the first quarter of 2003 the trend value was 133.5. If we assume that the annual decrease is going to persist at least for a while, then we would expect the trend in the first quarter of 2004 to be 133.5 - 8.5, or 125.

We now adjust this to allow for the fact that the first quarter is, on average, 8 below trend, giving a final forecast of 125 - 8 = 117 for the first quarter of 2004.
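The centred moving averages and deviations in the tables above can be reproduced programmatically. A minimal sketch using numpy (our own illustration):

```python
import numpy as np

y = np.array([142, 54, 162, 206, 130, 50, 174, 198, 126, 42, 162, 186], dtype=float)
ma4 = np.convolve(y, np.ones(4) / 4, mode="valid")   # 141, 138, 137, 140, ...
cma = (ma4[:-1] + ma4[1:]) / 2                       # centred: 139.5, 137.5, ...
deviations = y[2:-2] - cma                           # actual minus trend
print(cma)
print(deviations)                                    # 22.5, 68.5, -8.5, -89.0, ...
```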

EXERCISES
(a) The following figures show the weekly demand at an electrical repair workshop for a certain type of connector over a 10-week period:

Week number       1   2   3   4   5   6   7   8    9   10
Number demanded   27  23  23  25  26  29  25  221  22  24

(i) Obtain a forecast for week 11.
(ii) Comment on the seasonality and trend of the above data.
(iii) Assume now that these data were collected in month 1, month 2 and the first two weeks of month 3. Perform appropriate calculations using moving averages to estimate a forecast for week 11. Compare it with the forecast you derived in (i).

(b) Consider the average number of calls received per day at the Computer Club Warehouse (CCW) call centre for the past three years, given in the next table. The pattern of the call volumes can be of help in the analysis. What can one observe by merely looking at these figures? Perform an appropriate time series analysis.

Year   Quarter   Call volume
1      1         6809
1      2         6465
1      3         6569
1      4         8266
2      1         7257
2      2         7064
2      3         7784
2      4         8724
3      1         6992
3      2         6822
3      3         7949
3      4         9650

4.3 Additive decomposition

Study from p. 338 of the textbook. A time series that exhibits constant seasonal variation is represented by the additive decomposition model:

yt = TRt + SNt + CLt + IRt

with the usual notation. The book explains that the centred moving averages are estimates of TRt + CLt. In the last section we calculated some moving averages; we now make an additional illustration of this point.

Additive decomposition is applicable to a time series that displays constant (i.e. non-changing) amplitude of both the seasonal and irregular variations.
ACTIVITY 4.5
Explore the following time series and develop a few forecasts.

Year   Season   Number sold
2001   Spring   142
2001   Summer    54
2001   Autumn   162
2001   Winter   206
2002   Spring   130
2002   Summer    50
2002   Autumn   174
2002   Winter   198
2003   Spring   126
2003   Summer    42
2003   Autumn   162
2003   Winter   186

Estimate TRt + CLt.

DISCUSSION OF ACTIVITY 4.5


Attempt the exercise following the same procedure as in Activity 4.4.

4.4 Conclusion
Decomposition was carried out in multiplicative and additive forms, and we then experimented with various data sets. As we saw, it is straightforward use of the concepts as defined. Some additional notes covering the explanations in the book have also been provided; we hope they make the work easier. We encourage you to do more exercises in the prescribed book to get used to the methods.


UNIT 5: Exponential smoothing


Outcome table for the study unit

Outcomes - At the end of the module you should be able to:
- explain methods of smoothing
- perform simple exponential smoothing
- monitor the forecasting system
- know various smoothing approaches
- forecast future values of a time series

Assessment:
- analyse data
- explore data with various smoothing constants
- measure the strength of forecasts
- determine the aptness of various methods
- develop forecast values

Content:
- exponential smoothing and smoothing constants
- simple exponential smoothing
- tracking signals
- Holt's trend corrected smoothing
- Holt-Winters methods
- damped trend method

Activities:
- perform appropriate calculations for each method
- interpret the data and the calculated statistics

Feedback:
- discuss likely errors
- explain alternative methods
- discuss the solutions

5.1 Introduction
Changes in the trend and seasonality of a time series over time make forecasting difficult to undertake. This is when exponential smoothing becomes useful. Exponential smoothing is presented in Chapter 8 of Bowerman. Smoothing constants are used to smooth a rough time series. In this module we study various smoothing methods, and a tracking method to monitor the process. The methods are simple exponential smoothing, Holt's trend corrected exponential smoothing, the Holt-Winters methods, and damped trend exponential smoothing.
A common way to characterise exponential smoothing is as a technique that can be applied to time series data, either to produce smoothed data for presentation or to develop forecasts. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Different smoothing techniques are available, as presented in this unit, each for a specific purpose. For example, a simple moving average is one in which the past observations are weighted equally, while exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements.


5.2 Simple exponential smoothing

This method is used when the data pattern is horizontal (i.e. there is neither cyclical variation nor trend in the historical data). Let us first explore the following model. The model

yt = β0 + εt

is used for forecasting when there is no trend or seasonal pattern and the mean of the time series remains constant. In some statistics textbooks this equation is written as yt = μ + εt.
The least squares point estimate of the mean β0 is b0 = ȳ, where:

ȳ = (1/n) Σ (t = 1 to n) yt

Do you remember this formula? Equal weights of 1/n are given to each observation.

When the mean changes slowly over time, we require a model that describes the data more suitably, with estimates of the mean that may change from one time period to the next. Simple exponential smoothing (SES) is one such method; it does not use equal weights. Instead, more recent observations are given more weight.

ACTIVITY 5.1
Indicate True or False for each of the following statements about SES. In the case of False, correct the statement. Justify the correct statements.
(1) The estimate of the mean is constant.
(2) The estimate of the mean changes over time.
(3) The oldest observations receive the most weight.
(4) The newest observations receive the average weight.

DISCUSSION OF ACTIVITY 5.1
(1) The estimate of the mean is constant.
False. SES is used precisely because its estimate of the mean changes, to suit a no-trend model whose mean changes slowly over time.
(2) The estimate of the mean changes over time.
True. The formulation of SES is such that it caters for a mean that changes over time.
(3) The oldest observations receive the most weight.
False. The oldest observations receive the least weight.
(4) The newest observations receive the average weight.
False. They receive the largest weights.

We now release you from suspense and define SES formally. Let y1, y2, ..., yn be a time series with a mean that changes slowly over time but has neither a trend nor a seasonal pattern. Then the estimate of the level (or mean) of the time series in period T is:

ℓT = α yT + (1 - α) ℓT-1

where
α = smoothing constant between 0 and 1
ℓT-1 = estimate of the level (or mean) of the time series at time T - 1

The value of α determines the degree of smoothing and how responsive the model is to fluctuations in the time series data. This value is arbitrary and is determined both by the nature of the data and by the sensitivity of the forecaster as to what constitutes a good response rate. A smoothing constant close to zero leads to a stable model, while a constant close to one is highly reactive. Typically, constant values between 0.01 and 0.3 are used.
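A minimal sketch of the updating equation in Python (our illustration, not textbook code); the usage line reproduces the first two cod catch levels computed in Example 8.1 below:

```python
def ses(y, alpha, level0):
    """Simple exponential smoothing: returns the sequence of levels l_1..l_n."""
    levels = []
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return levels

# First two cod catch values with l0 = 360.6667 and alpha = 0.1 (see Example 8.1)
print(ses([362, 381], 0.1, 360.6667))   # approximately [360.8000, 362.8200]
```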
Let us illustrate with the data we have seen before in order to feel comfortable at the early stage of
SES exploration.

DISCUSSION OF EXAMPLE 8.1

We discuss Example 8.1 in Bowerman, where we revisit the cod catch data from Unit 3.

Month t:     1    2    3    4    5    6    7    8    9    10   11   12
Cod catch:   362  381  317  297  399  402  375  349  386  328  389  343

Month t:     13   14   15   16   17   18   19   20   21   22   23   24
Cod catch:   276  334  394  334  384  314  344  337  345  362  314  365


If you recall, the plots of these data showed no trend and no seasonality. The initial estimate of the level is ℓ0. Why? It is easy to estimate, since we know that it is the sample mean:

ℓ0 = (1/12) Σ (t = 1 to 12) yt = (1/12)(362 + 381 + 317 + ... + 343) = (1/12)(4328) = 360.6667

To illustrate, we use α = 0.1. Does it satisfy the given restriction? We explore by determining the levels from these data:

ℓ1 = α y1 + (1 - α) ℓ0 = (0.1)(362) + (0.9)(360.6667) = 360.8000

ℓ2 = α y2 + (1 - α) ℓ1 = (0.1)(381) + (0.9)(360.8000) = 362.8200

These can be calculated further, up to ℓ24. Forecast errors can be calculated for all these mean levels. Do you remember the forecast errors? They are shown in Figure 8.1 of Bowerman et al. (2005: 348).
In SES, a point forecast made at time T of any future value yT+τ is the last estimate ℓT of the mean of the time series. Why should it be like this? A point forecast made in time period T for yT+τ is:

ŷT+τ = ℓT  (τ = 1, 2, 3, ...)

ACTIVITY 5.2
Write down the point forecast made in time period t - 1 of the value yt.
DISCUSSION OF ACTIVITY 5.2
Because there is no trend and no seasonal pattern, we have ŷt(t - 1) = ℓt-1.

We dealt with the standard error s and the sum of squared errors (SSE) in the earlier chapters. The current version is that the standard error at time T is:

s = sqrt( SSE / (T - 1) ) = sqrt( Σ (t = 1 to T) (yt - ℓt-1)² / (T - 1) )

For any τ, a 95% prediction interval computed in time period T for yT+τ is:

[ ℓT - z0.025 s sqrt(1 + (τ - 1)α²) ;  ℓT + z0.025 s sqrt(1 + (τ - 1)α²) ]

ACTIVITY 5.3
Write down the formula for a 95% prediction interval computed in time period T for yT+τ when:
(i) τ = 1
(ii) τ = 2

DISCUSSION OF ACTIVITY 5.3

Substitute the values where appropriate. Then check your answers against point number 3 on page 351 of the textbook.
We could go on and experiment with different values of α, but that is more a theoretical exercise than an application, so its worth is limited in this module. Let us go ahead with the cod catch data and explore more useful examples.

DISCUSSION OF EXAMPLE 8.2

These are the cod catch data that were discussed in Unit 3. From page 349 we saw that the value α = 0.034 is a desirable smoothing constant. Hence:

ℓ24 = α y24 + (1 - α) ℓ23 = (0.034)(365) + (0.966)(354.1719) = 354.5400

Therefore, for y25 and other future monthly cod catches,

ŷ24+τ = ℓ24 = 354.5400.

For prediction intervals we need the value of the standard error. Can you show that s = 34.95? Now verify the given 95% prediction intervals.
The example further assumes a new observation, y25 = 384. The values of ℓ25 and ŷ25+τ should then be calculated anew. These are:

ℓ25 = α y25 + (1 - α) ℓ24 = (0.034)(384) + (0.966)(354.5400) = 355.5416

The point forecast made in month 25 of the cod catch in month 26 and later months is:

ŷ25+τ = ℓ25 = 355.5416

The process of experimentation repeats itself.

ACTIVITY 5.4
Write the model ℓT = α yT + (1 - α) ℓT-1 in terms of ℓT-1 and (yT - ℓT-1).

DISCUSSION OF ACTIVITY 5.4

ℓT = α yT + (1 - α) ℓT-1
   = α yT + ℓT-1 - α ℓT-1
   = ℓT-1 + α (yT - ℓT-1)

This form is called the error correction form. We move to the next section.

5.3 Tracking signals

Sometimes when SES is used, the rate of change of the level itself changes over time, and it may be necessary to change the smoothing constant: doing so may improve the forecasts. A tracking signal is used to decide when something is wrong with a forecasting system, such as when an inappropriate smoothing constant is used. A forecasting system is not expected to produce perfect forecasts, but if the forecasts deviate more than is acceptable, a tracking signal can tell us so.
A tracking signal is thus an indicator that things are right or wrong; it is an instrument for monitoring the performance of our forecasting procedures. Basically, it indicates whether the forecasts are consistently biased high or low. Read Section 8.2, p. 355 of the textbook for this topic.
Let e1(α), e2(α), ..., eT(α) be T single-period-ahead forecast errors, where (α) denotes the particular value of α employed to obtain them. The sum of forecast errors is defined as:

Y(α, T) = Σ (t = 1 to T) et(α).

ACTIVITY 5.5
Determine the sum of forecast errors for T = 24 using Figure 8.1 on page 348 of Bowerman.

DISCUSSION OF ACTIVITY 5.5
The forecast errors are in column E.

ACTIVITY 5.6
Show that:

Y(α, T) = Y(α, T - 1) + eT(α).

DISCUSSION OF ACTIVITY 5.6

Using the definition appropriately, we have:

Y(α, T) = Σ (t = 1 to T) et(α) = Σ (t = 1 to T - 1) et(α) + eT(α) = Y(α, T - 1) + eT(α)

ACTIVITY 5.7
Verify the above equation using the data given in Figure 8.1 of Bowerman.

DISCUSSION OF ACTIVITY 5.7

Determine the two terms separately and add them.
Do you remember the mean absolute deviation (MAD)? From Unit 1 it is:

MAD = Σ (t = 1 to n) |et| / n

We update it to the current notation. That is:

MAD(α, T) = Σ (t = 1 to T) |et(α)| / T

Now we define the smoothed MAD. Using a smoothing constant γ between 0 and 1, it is:

MAD(α, T) = γ |eT(α)| + (1 - γ) MAD(α, T - 1)

The simple cusum tracking signal C(α, T) is defined as:

C(α, T) = Y(α, T) / MAD(α, T).

If C(α, T) is large, then the sum of forecast errors Y(α, T) is large relative to the mean absolute deviation MAD(α, T). This means that the forecasting system produces errors that are either consistently positive or consistently negative, i.e. forecasts that are consistently smaller or consistently larger than the actual time series values. If the forecasting system is accurate, it should produce (at least approximately) an equal number of negative and positive errors. Thus, a large C(α, T) indicates that the forecasting system does not perform accurately. Note that we have still not quantified what a large value of C(α, T) means. There are no hard and fast rules for it; a threshold will be given with every situation.
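As a sketch, the simple cusum tracking signal can be computed from a list of single-period-ahead forecast errors as follows; the error values here are invented purely for illustration:

```python
def cusum_tracking_signal(errors):
    """C(alpha, T) = sum of forecast errors / mean absolute deviation."""
    total = sum(errors)
    mad = sum(abs(e) for e in errors) / len(errors)
    return total / mad

errors = [4.2, -1.5, 3.8, 2.9, -0.7, 5.1]   # hypothetical e_t(alpha) values
print(cusum_tracking_signal(errors))         # large values signal consistent bias
```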

ACTIVITY 5.8
Determine the simple cusum tracking signal using the data in Figure 8.1 of Bowerman et al. (2005: 348). Suppose that the forecasting system will be considered accurate if the value of the simple cusum tracking signal is below 255 in absolute value. Do you think that the forecasting system needs to be improved?

DISCUSSION OF ACTIVITY 5.8

Determine the MAD and the sum of forecast errors, substitute into the formula, and then compare the answer with 255. Make a decision.
We will not pursue tracking signals any further in this module. In theory we need to study them; in practice, however, modern software can determine the best smoothing constant automatically, which reduces the need to monitor and adjust the forecasting system by hand.

5.4 Holt's trend corrected exponential smoothing

SES cannot handle a time series that displays a trend. Study Section 8.3 from page 357 of the textbook. If the time series is increasing or decreasing at a fixed rate, it may be described by the linear trend model:

yt = β0 + β1 t + εt.

The level (or mean) at time T is β0 + β1 T and that at time T - 1 is β0 + β1 (T - 1).

ACTIVITY 5.9
Show that the change in the level of the time series from time period T - 1 to time period T is β1.
DISCUSSION OF ACTIVITY 5.9
The change is simply the difference between the two levels: (β0 + β1 T) - (β0 + β1 (T - 1)) = β1. If you cannot do it yourself, refer to page 357.

Growth rate
Regardless of whether the change β1 is an increase or a decrease, it is called the growth rate. Holt's trend corrected exponential smoothing is appropriate when both the level and the growth rate are changing; in that case a fixed linear trend model is not useful. For Holt's trend corrected exponential smoothing, let ℓT-1 be the estimate of the level of the time series in time period T - 1 and bT-1 the corresponding estimate of the growth rate. If we observe a new time series value yT in time period T, these two estimates are updated by two smoothing equations.

The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = α yT + (1 - α)(ℓT-1 + bT-1)

The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ (ℓT - ℓT-1) + (1 - γ) bT-1

A point forecast made in time period T for yT+τ is:

ŷT+τ = ℓT + τ bT  (τ = 1, 2, 3, ...)

The standard error s is given by:

s = sqrt( SSE / (T - 2) ) = sqrt( Σ (t = 1 to T) [yt - (ℓt-1 + bt-1)]² / (T - 2) )

If τ = 1, then a 95% prediction interval computed in time period T for yT+1 is:

[ (ℓT + bT) - z0.025 s ;  (ℓT + bT) + z0.025 s ]

In general, for τ ≥ 2, a 95% prediction interval computed in time period T for yT+τ is:

[ (ℓT + τ bT) - z0.025 s sqrt(1 + Σ (j = 1 to τ-1) α²(1 + jγ)²) ;  (ℓT + τ bT) + z0.025 s sqrt(1 + Σ (j = 1 to τ-1) α²(1 + jγ)²) ]

ACTIVITY 5.10
Write down the formula for a 95% prediction interval computed in time period T for yT+τ when:
(i) τ = 2
(ii) τ = 3

DISCUSSION OF ACTIVITY 5.10

Substitute as appropriate, noting that a sum over one term is just the first term. In the second exercise, expand the summation to make the expression clear. If they are not easy for you, check page 358.
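A compact sketch of Holt's two updating equations and the point forecast (our own illustration; the series, initial values and smoothing constants are assumed, not taken from the textbook):

```python
def holt(y, alpha, gamma, level0, growth0):
    """Holt's trend corrected exponential smoothing.

    Returns the final level and growth rate after processing the series y.
    """
    level, growth = level0, growth0
    for obs in y:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + growth)
        growth = gamma * (level - prev_level) + (1 - gamma) * growth
    return level, growth

y = [10.2, 11.1, 12.3, 12.9, 14.2, 15.0]   # hypothetical trending series
level, growth = holt(y, alpha=0.3, gamma=0.1, level0=10.0, growth0=1.0)
tau = 2
print(level + tau * growth)                # forecast made at time T for y_{T+2}
```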

5.5 Holt-Winters methods

The Holt-Winters methods are designed for time series that show a linear trend, either locally or over the entire range of the time series. In this section, two methods are presented: the additive Holt-Winters method and the multiplicative Holt-Winters method. The topic is presented from Section 8.4 of Bowerman et al. (2005: 366). Study it.

5.5.1 Additive Holt-Winters method

The additive Holt-Winters method is used for time series with constant, or additive, seasonal variation. We discussed this type of variation in Unit 3. The method is linear, since all the components are added, which makes it the easier of the two methods presented. It deals with a time series that has a linear trend with a fixed growth rate, β1, and a fixed seasonal term, SNt, with constant additive variation, described by the model:

yt = (β0 + β1 t) + SNt + εt

In order to handle this model, it is easier to analyse the trend and the seasonal component separately. The seasonal component can also be handled using dummy variables if necessary. The method is appropriate when a time series has a linear trend with an additive seasonal pattern for which the level, the growth rate and the seasonal pattern may be changing. Implementation of the additive Holt-Winters method starts with estimates of the level, the growth rate and the seasonal factors. Let ℓT-1 denote the estimate of the level in time period T - 1, and bT-1 the estimate of the growth rate in time period T - 1. Suppose that we observe a new observation yT in time period T, and let snT-L be the latest estimate of the seasonal factor for the season corresponding to time period T, where, as before, L is the number of seasons. The subscript T - L of snT-L reflects that the time series value in time period T - L is the most recent value observed in the season being analysed; this most recent value is used in determining snT-L.

The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = α (yT - snT-L) + (1 - α)(ℓT-1 + bT-1)

where (yT - snT-L) is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ (ℓT - ℓT-1) + (1 - γ) bT-1

The new estimate of the seasonal factor SNT in time period T uses the smoothing constant δ and is:

snT = δ (yT - ℓT) + (1 - δ) snT-L

where (yT - ℓT) is an estimate of the newly observed seasonal variation.

A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = ℓT + τ bT + snT+τ-L  (τ = 1, 2, 3, ...)

where snT+τ-L is the most recent estimate of the seasonal factor for the season corresponding to time period T + τ.

A 95% prediction interval computed in time period T is:

[ ŷT+τ(T) - z0.025 s sqrt(cτ) ;  ŷT+τ(T) + z0.025 s sqrt(cτ) ]

where

cτ = 1  for τ = 1
cτ = 1 + Σ (j = 1 to τ-1) α²(1 + jγ)²  for τ = 2, 3, ..., L
cτ = 1 + Σ (j = 1 to τ-1) [α(1 + jγ) + dj,L (1 - α) δ]²  for τ = L, L + 1, L + 2, ...

where
dj,L = 1 if j is a multiple of L
     = 0 otherwise

ACTIVITY 5.11
Suppose that a well-known commodity, transported from a foreign country by the largest international shipping and transportation company, is seasonal over the quarters of a year.
(a) Determine the appropriate cτ.
(b) Evaluate dj,L when:
(i) j = 2
(ii) j = 12

DISCUSSION OF ACTIVITY 5.11

(a) The quarters of a year give L = 4, the number of seasons. Hence:

cτ = 1  for τ = 1
cτ = 1 + Σ (j = 1 to τ-1) α²(1 + jγ)²  for τ = 2, 3, 4
cτ = 1 + Σ (j = 1 to τ-1) [α(1 + jγ) + dj,4 (1 - α) δ]²  for τ = 4, 5, 6, ...

A few illustrations show that:

c1 = 1
c2 = 1 + α²(1 + γ)²
c3 = 1 + α²(1 + γ)² + α²(1 + 2γ)²
c4 = 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)²

(b) Just using the definition of dj,L:

(i) d2,4 = 0, since 2 is not a multiple of 4
(ii) d12,4 = 1, since 12 = 3 × 4
The standard error s computed in time period T is:

s = sqrt( SSE / (T - 3) ) = sqrt( Σ (t = 1 to T) [yt - (ℓt-1 + bt-1 + snt-L)]² / (T - 3) )

The error correction form of the smoothing equations in the additive Holt-Winters method is made up of:

ℓT = ℓT-1 + bT-1 + α [yT - (ℓT-1 + bT-1 + snT-L)]

bT = bT-1 + αγ [yT - (ℓT-1 + bT-1 + snT-L)]

snT = snT-L + (1 - α)δ [yT - (ℓT-1 + bT-1 + snT-L)]
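To make the three additive updates concrete, here is a minimal sketch; the quarterly series, initial estimates and smoothing constants are all illustrative assumptions, not textbook values:

```python
def additive_holt_winters(y, L, alpha, gamma, delta, level0, growth0, seasonals0):
    """One pass of the additive Holt-Winters updating equations.

    seasonals0: list of L initial seasonal factors, in the order in which the
    seasons occur in y. Returns the final level, growth and seasonal factors.
    """
    level, growth = level0, growth0
    sn = list(seasonals0)                  # sn[t % L] plays the role of sn_{T-L}
    for t, obs in enumerate(y):
        sn_old = sn[t % L]
        prev_level = level
        level = alpha * (obs - sn_old) + (1 - alpha) * (level + growth)
        growth = gamma * (level - prev_level) + (1 - gamma) * growth
        sn[t % L] = delta * (obs - level) + (1 - delta) * sn_old
    return level, growth, sn

# Hypothetical quarterly data with an additive seasonal pattern
y = [120, 80, 140, 180, 125, 84, 146, 186]
level, growth, sn = additive_holt_winters(
    y, L=4, alpha=0.2, gamma=0.1, delta=0.1,
    level0=130.0, growth0=1.0, seasonals0=[-10.0, -50.0, 10.0, 50.0])
tau = 1
print(level + tau * growth + sn[len(y) % 4])   # point forecast for the next quarter
```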


5.5.2 Multiplicative Holt-Winters method

The multiplicative Holt-Winters method is used for a time series that has a linear trend with a fixed growth rate, β1, and a fixed seasonal pattern, SNt, with increasing or multiplicative variation. It is appropriate when the level, growth rate and seasonal pattern may be changing rather than fixed. This type of time series may be described using the multiplicative model:

yt = (β0 + β1 t) × SNt × IRt

In Unit 4 we showed how to estimate fixed seasonal factors, SNt, by using centred moving averages. The level at time period T - 1 for this model is given by β0 + β1 (T - 1), and the level at time period T is given by β0 + β1 T, so the growth rate of the level is β1.

Implementation of the multiplicative Holt-Winters method starts with estimates of the level, the growth rate and the seasonal factors. Let ℓT-1 denote the estimate of the level in time period T - 1, and bT-1 the estimate of the growth rate in time period T - 1. Then, suppose that we observe a new observation yT in time period T, and let snT-L be the latest estimate of the seasonal factor for the season corresponding to time period T, where, as before, L is the number of seasons. The subscript T - L of snT-L reflects that the time series value in time period T - L is the most recent value observed in the season being analysed; this most recent value is used in determining snT-L.

The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = α (yT / snT-L) + (1 - α)(ℓT-1 + bT-1)

where yT / snT-L is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ (ℓT - ℓT-1) + (1 - γ) bT-1

The new estimate of the seasonal factor SNT in time period T uses the smoothing constant δ and is:

snT = δ (yT / ℓT) + (1 - δ) snT-L

where yT / ℓT is an estimate of the newly observed seasonal variation.

A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = (ℓT + τ bT) × snT+τ-L  (τ = 1, 2, 3, ...)

where snT+τ-L is the most recent estimate of the seasonal factor for the season corresponding to time period T + τ.


A 95% prediction interval computed in time period T is:

[ ŷT+τ(T) - z0.025 sr sqrt(cτ) (snT+τ-L) ;  ŷT+τ(T) + z0.025 sr sqrt(cτ) (snT+τ-L) ]

where

c1 = (ℓT + bT)²
c2 = α²(1 + γ)² (ℓT + bT)² + (ℓT + 2bT)²
c3 = α²(1 + 2γ)² (ℓT + bT)² + α²(1 + γ)² (ℓT + 2bT)² + (ℓT + 3bT)²

The relative standard error sr computed in time period T is:

sr = sqrt( Σ (t = 1 to T) [ (yt - ŷt(t - 1)) / ŷt(t - 1) ]² / (T - 3) )

where ŷt(t - 1) = (ℓt-1 + bt-1) snt-L is the single-period-ahead forecast of yt.

The error correction form of the smoothing equations in the multiplicative Holt-Winters method is made up of:

ℓT = ℓT-1 + bT-1 + α [yT - (ℓT-1 + bT-1) snT-L] / snT-L

bT = bT-1 + αγ [yT - (ℓT-1 + bT-1) snT-L] / snT-L

snT = snT-L + (1 - α)δ [yT - (ℓT-1 + bT-1) snT-L] / (ℓT-1 + bT-1)
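For comparison with the additive sketch above, one multiplicative update step differs only in dividing by, and multiplying back, the seasonal factor (all values illustrative):

```python
def multiplicative_hw_step(obs, level, growth, sn_old, alpha, gamma, delta):
    """One multiplicative Holt-Winters update; returns new level, growth, seasonal."""
    prev_level = level
    level = alpha * (obs / sn_old) + (1 - alpha) * (level + growth)
    growth = gamma * (level - prev_level) + (1 - gamma) * growth
    sn_new = delta * (obs / level) + (1 - delta) * sn_old
    return level, growth, sn_new

# One hypothetical update: observation 150 in a season whose factor was 1.2
print(multiplicative_hw_step(150, 120.0, 2.0, 1.2, 0.2, 0.1, 0.1))
```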

5.6 Damped trend exponential smoothing

It is possible for a time series to have a growth rate that will not be sustained into the future and whose effect needs to be dampened. Gardner and McKenzie's damped trend exponential smoothing may be used for this purpose. See page 387 of the textbook for the equations of the damped trend method.

ACTIVITY 5.12
Which values of the damping factor are associated with:
(a) Meagre (or weak) dampening?
(b) Substantial dampening?

DISCUSSION OF ACTIVITY 5.12
We know that the value of the damping factor lies between 0 and 1. Values near 1 have less dampening effect than values near 0, since at 1 the trend is not damped at all, as the next activity shows. Hence:
(a) Meagre (or weak) dampening is effected by values near 1.
(b) Substantial dampening is effected by values near 0.
One may ask what happens at the boundary values 0 and 1.

ACTIVITY 5.13
What happens if the damping factor is set equal:
(a) to 0?
(b) to 1?

DISCUSSION OF ACTIVITY 5.13

(a) The best way to find out is to substitute and observe. You should discover simple exponential smoothing: the trend is damped away entirely.
(b) Here you should obtain Holt's trend corrected exponential smoothing: the trend is not damped at all.
It is also possible to need to dampen the growth rate in a time series that requires a Holt-Winters method. In this case we use the Holt-Winters method with damped trend. The Holt-Winters methods were discussed above; the equations on pages 388-390 give their versions when there is a damped trend.
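A sketch of the damped point forecast under the common Gardner-McKenzie parameterisation, ŷT+τ = ℓT + (φ + φ² + ... + φ^τ) bT (our illustration; see the textbook for the full updating equations). It reduces to the SES forecast at φ = 0 and to Holt's forecast at φ = 1:

```python
def damped_forecast(level, growth, phi, tau):
    """Damped trend point forecast made at time T for y_{T+tau}."""
    damping = sum(phi ** i for i in range(1, tau + 1))
    return level + damping * growth

level, growth = 200.0, 5.0
print(damped_forecast(level, growth, phi=0.0, tau=3))   # 200.0 -> SES-style flat forecast
print(damped_forecast(level, growth, phi=1.0, tau=3))   # 215.0 -> Holt's linear forecast
print(damped_forecast(level, growth, phi=0.8, tau=3))   # in between the two
```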

5.7 Conclusion
This unit discussed various forecasting models that are used under specific conditions. Important conditions, such as the various forms of seasonal variation, were given with each method, and you should familiarise yourself with them. The discussions are based on the textbook.
