Department of Statistics
STA2604
Forecasting
STA2604/1
Table of contents

UNIT 1: An Introduction to Forecasting
1.1 Introduction
1.1.1 Forecasting
1.1.2 Data
1.6 Conclusion
UNIT 2: Model Building and Residual Analysis
2.1 Introduction
2.2 Multicollinearity
2.4.2 Residuals
2.5 Conclusion
3.2.1 No trend
3.9 Conclusion
UNIT 4: Decomposition of a Time Series
4.1 Introduction
4.5 Conclusion
5.7 Conclusion
This module presents fundamental aspects of time series analysis used in forecasting. The prescribed textbook for this module is Bowerman, O'Connell and Koehler (2005). We will not study all the chapters in the book for this module, but will focus on Chapters 1, 5, 6, 7 and 8.
The module is done in one semester. Make sure that you are registered for the right semester and
the material you receive is the correct one.
About the book
The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards
the practice of forecasting. The authors are experienced practitioners in the field of time series. The
book will assist you in understanding concepts and methodology, and in applying these in practice
(i.e. in real-life situations).
It is
imperative to have your own calculator in the examination. It is important, although not compulsory,
to have access to a computer in order to undertake the tasks in this module. You may visit a Regional
Centre to use a computer. The text contains output from Excel, MINITAB, JMP IN and SAS. However,
we encourage the use of any software to which you may have access. The above list of computer
software/packages may be used, as well as R, SPSS, Stata, S-Plus and EViews. Your ability to use
such software will increase your marketability in the workplace. You are encouraged to experiment
with the packages at your disposal.
REFERENCES
The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a
number of user-friendly textbooks on Time Series that are available in the Unisa library. You do not
need to buy the recommended books for this module.
PRESCRIBED BOOK
Bowerman, B. L., O'Connell, R. T. & Koehler, A. B. (2005). Forecasting, time series and regression: an applied approach, 4th edition. Singapore: Thomson Brooks/Cole.
Module position in the curriculum
We have been offering a postgraduate module on Time Series at Unisa, but have become aware of
the need to introduce the module at undergraduate level due to its necessity in the workplace and in
order to fill the gap that is evident when students attempt the postgraduate time series module.
This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure
is as follows:
1st year: STA1501, STA1502, STA1503
2nd year: STA2601, STA2602, STA2603, STA2604 (FORECASTING: we are here), STA2610
3rd year: STA3701, STA3702, STA3703, STA3704, STA3705, STA3710
You should already be familiar with some of the modules mentioned above. Knowledge from
STA2604 will help you in STA3704 (Forecasting III).
ASSIGNMENTS
There are two assignments for this module, which are intended to help you learn through various
activities. They also serve as tests to prepare you for the examination. As you do the assignments,
study the reading texts, consult other resources, discuss the work with fellow students or tutors or
do research, you are actively engaged in learning. Looking at the assessment criteria given for
each assignment will help you to understand better what is required of you. The two assignments
per semester prescribed for this module form part of the learning process. The typical assignment
question is a reflection of a typical examination question. There are fixed submission dates for the
assignments and each assignment is based on specific chapters (or sections) in the prescribed book.
You have to adhere to these dates as assignments are only marked if they are received on or before
the due dates.
Both assignments are compulsory because:
- they are the sole contributors towards your year mark; and
- they form an integral part of the learning process and indicate the form and nature of the examination.
Please note that the submission of Assignment 01 is the guarantee for examination entry. If you do not submit Assignment 01, UNISA (not the Department of Statistics) will deny you examination entry.
You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this
module. Do not wait until the assignment due date or the examination to make contact with lecturers.
It is helpful to be ready long in advance. You are also encouraged to work with your own peers,
colleagues, friends, etc. Details about the assignments will be given in Tutorial Letter 101.
Time series has its own useful terminology that should be understood. In order to familiarise yourself
with it, let us start with an easy activity. Activities help in the creation of a mind map of the module.
The more you attempt these activities, the better you will understand the work.
GLOSSARY OF TERMS
ACTIVITY 0.1
(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the
prescribed book. They serve as your glossary.
(b) Attempt meanings of these concepts before you deal with the various sections so that you have
an idea before we get there.
PREREQUISITES
The ability to use a scientific calculator.
Access to a computer package and the ability to use it are highly recommended.
First-year statistics. These topics appear below and there will be a quick reminder whenever we encounter them.
When you draw plots required for statistical analysis, these plots should be accurate. Hence, use
a ruler and a lead pencil (not a pen) to construct plots. If you have access to a computer, you are
also encouraged to practise using any statistical package of your choice. Assignments may also be
prepared by means of a computer. Just make sure that you use the correct notation. Avoid using a
computer if you cannot write the correct notation. Remember that you are always welcome to contact
the lecturers whenever you have problems with any aspect of the module.
OUTCOMES
At the end of the module you should be able to do the following:
Define and apply components of time series.
Apply time series methods to develop forecasts.
Specify a prototype forecast model, estimate its parameters and then validate it.
Use the specified model to derive forecasts.
TABLE OF OUTCOMES

At the end of the module you should be able to:

- Explain and expose time series components (Assessment: analyse data, plot graphs; Content: trend, seasonality, cycles, irregularity, choosing a technique; Activities: examine data visually, plot graphs; Feedback: discuss likely errors)
- Select a model (analyse errors, plot graphs, scrutinise models, balance factors)
- Develop a model (forming an equation, regression, exponential smoothing, small build-up exercises, emphasise aptness)
- Estimate parameters (perform estimations, estimation methods, perform calculations, discuss alternatives)
- Validate a model (statistical tests, hypothesis testing, test hypotheses, peruse the various tests)
- Develop forecasts (demonstrate patterns, model building, form equations, visit various alternatives)
You will know that you understand this module once you understand the above issues.
Feedback is not just a follow-up of the preceding concepts. It is an opportunity to reinforce some
concepts and revise others. Make use of this opportunity. Feedback is given after every activity,
sometimes with some discussion after the activity, but in many instances, it follows immediately after
the activity.
OVERVIEW
Two of the five study units comprising this module are presented in this study guide.
Unit 1: Narration of the forecasting domain and support elements
(Chapter 1 of Bowerman et al.)
In this unit we will learn more about
- Situations requiring forecasts and forecasting
- Issues about useful data and use of data in developing forecasts
- Basic types of data and approaches (quantitative and qualitative methods)
- Errors, problems and pitfalls in forecasting, as well as depiction of good forecasts
- Factors useful in choosing a forecast technique
- More about quantitative methods
Do the above issues raise some response from you? Do you have any idea of what they mean or
imply? Think and chat with your colleagues, peers or family members. Remember that learning
becomes real and effective only when sharing is involved.
Unit 2: Building a forecast model and examining / verifying its strength
(Chapter 5 of Bowerman et al.)
In this study unit we will learn about
Multicollinearity of variables:
- residual plots
- the constant variance assumption
- assumption of correct functional form
- normality assumption
- the independence assumption
Outliers and influential observations:
- outliers
- influential data
- diagnostic methods to detect outliers and influential observations
- leverage points
- residuals
- Cook's distance measure
The measures dealt with in this unit ensure that the model built for use in forecasting has desirable properties of limited error and is influenced as little as possible, if at all. It is also necessary to distinguish between outliers and seasonal variations: sometimes the effect of seasonality is mistakenly interpreted as an outlier.
We hope you have come across some of the concepts or issues above. Discuss these with your
colleagues, peers, friends or family members.
ETHICS IN FORECASTING
Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours?
Note that the desire for control is implicit in all forecasts. Decisions made today are based on
forecasts, which may or may not come to pass. The forecast is a way to control today's decisions.
The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting
is that the forecasts will be used by policy-makers to make decisions. It is therefore important to
discuss the ethics of forecasting. Since forecasts can and often do take on a creative role, no one has the absolute right to make forecasts that involve other people's futures.
Nearly everyone would agree that we have the right to create our own future. Goal setting is a form
of personal forecasting. It is one way to organize and invent our personal future. Each person has
the right to create their own future. On the other hand, a social forecast might alter the course of an
entire society. Such power can only be accompanied by equivalent responsibility.
There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting,
i.e. the idea that social forecasting must involve physical, cultural and societal values. However,
forecasters cannot leave their own personal biases out of the forecasting process. Even the most
mathematically rigorous techniques involve judgmental inputs that can dramatically alter the forecast.
Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a
socially desirable future for one person might be another person's nightmare. For example, modern
ecological theory says that we should think of our planet in terms of sustainable futures. The finite
supply of natural resources forces us to reconsider the desirability of unlimited growth. An optimistic
forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the
idea of zero growth, is a catastrophic nightmare for the corporate and financial institutions of the free
world. The system of profit depends on continual growth for the well-being of individuals, groups,
and institutions.
"Desirable futures" is a subjective concept. It can only be understood relative to other information.
The ethics of forecasting certainly involves the obligation to create desirable futures for the person(s)
that might be affected by the forecast. If a goal of forecasting is to create desirable futures, then the
forecaster must ask the ethical question: desirable for whom?
To embrace the idea of liberty is to recognise that each person has the right to create their own
future. Forecasters can promote libertarian beliefs by empowering people that might be affected by
the forecast. Involving these people in the forecasting process gives them the power to become
co-creators in their futures.
BENEFITS OF FORECASTING
Forecasting can help you make the right decisions, and earn/save money. Here are a few examples.
How? By forecasting!
Forecasting is designed to help decision making and planning in the present. Forecasts empower
people because their use implies that we can modify variables now to alter (or be prepared for)
the future. A prediction is an invitation to introduce change into a system. There are several
assumptions about forecasting:
There is no way to state what the future will be with complete certainty. Regardless of the
methods that we use there will always be an element of uncertainty until the forecast horizon
has come to pass.
There will always be blind spots in forecasts. We cannot, for example, forecast completely new technologies for which there are no existing paradigms.
Providing forecasts to policy-makers will help them formulate social policy. The new social
policy, in turn, will affect the future, thus changing the accuracy of the forecast.
Assessment: decompose time series (graph, visual); various calculations
Content: time series word list; time series components; errors in forecasting
Activities: experiment with data; plot graphs; stepwise exercises
Feedback: discuss each activity; critique the graphs
If you understand the above outcomes, it will be an indication that you understand this study unit. It
is based on Chapter 1 of the prescribed book.
Forecasting is the scientific process of estimating some aspects of the future in usually unknown situations. Prediction is a similar, but more general, term. Both can refer to estimation of time series, cross-sectional or longitudinal data. Usage can differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period. Note the emphasis in this module that forecasting is scientific. This ensures that we do not consider subjective predictions and spiritual prophecies as part of our scope for this forecasting module. Risk and uncertainty are central to forecasting and prediction. Forecasting is used in the practice of customer demand planning in everyday business forecasting for manufacturing companies. The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces both statistical forecasting and a consensus process. Forecasting is commonly used in discussion of time series data. In this module the terms are fairly straightforward and follow the prescribed book.
In retailing, for example, forecasting helps ensure that the right product is at the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help them meet consumer demand. Forecasting is applied in many areas, including:
Weather forecasting, Flood forecasting, and Meteorology
Transport planning and Transport forecasting
Economic forecasting
Egain forecasting
Technology forecasting
Earthquake forecasting
Land use forecasting
Product forecasting
Player and team performance in sports
Telecommunications forecasting
Political forecasting
Sales forecasting
ACTIVITY 1.1
Consider the terms forecasting, cross-sectional data and time series, which are the main focus
of this study unit.
(a) Attempt to define these terms.
(b) Check the definitions in the book and compare your answers in (a).
Before we discuss the above activity, start by reading slowly through the following discussion. Make
sure you follow the discussion.
1.1.1 Forecasting
Study section 1.1 on page 2 up to the second bullet on page 3.
The few people with whom we discussed the term "forecasting" seemed to have an understanding of the concept only in a nutshell. Many of them made reference to the weather forecast that was presented on radio, television and the internet. A gap existed in their understanding of forecasting.
Various backgrounds exist that show that at every point in time when people lived, they were always
interested in the future. There are stories from history that inform us that when people dreamed,
there were experts to explain the meanings of these dreams in terms of the future. When signs of
future drought arose, the implications of the drought were noted and plans were made to offset the
impacts that were anticipated. Drought led to hunger. Thus, when predictions were made that there
was drought coming, preparations were made that at the time of the drought, there would be enough
food for every member of the community during the duration of the drought. Predicting the future
even as it was done during those days can be referred to as forecasting. The predicted future was
then used to plan for the future as explained above.
Modern practice has encouraged that the "anticipation of the future" practice be conceptualised.
It was then formally termed forecasting. The current approaches are scientific in order to ensure
that forecasting is practised systematically. The predictions made are now called forecasts. In other
terms, forecasts are future expectations based on scientific guidelines.
DISCUSSION OF ACTIVITY 1.1
The first term we listed in Activity 1.1 was forecasting. Did you get it right? Forecasting is a natural operation. We have always done it, sometimes unconsciously. As was explained, predicting has always been practised, even in ancient times. For self-evaluation in terms of the time series concept, did you define the term forecasting in line with predicting the future?
Forecasting indicates more or less what to expect in the future. Once the future is known, preparation
for equitable allocation of resources can be made. Wastages can thus be reduced or eliminated and
gains can be enhanced (or increased).
FURTHER DISCUSSION ON FORECASTING
Forecasting is applied in various real-life situations. Six examples of applications are listed on pages
2 and 3 of the prescribed book. We are close to them at different levels. But what about something
that we as students of the University of South Africa can appreciate?
The number of student enrolments at Unisa is the starting point. The trend pattern will give an
indication of whether there has been a decline or growth in the student numbers over the years. If
you are observant, you will realise that there has been an increase in student numbers over the past
few years. Our forecast for next year (2013) is that there will be more students than in 2012.
ACTIVITY 1.2
Weather forecasting was mentioned as a known example where forecasting is used abundantly.
There are many others.
(a) Provide an easy example of a situation where forecasting is needed.
(b) Attempt to explain the details of the example you provided in (a).
DISCUSSION OF ACTIVITY 1.2
An easy example is forecasting the results of the next election. We might anticipate extreme growth of one party (MDC) and decline of others in Zimbabwe, based on the trends in the previous elections and developments that prevail. Therefore,
(a) one can for example predict how the political parties will perform in the next election; and
(b) recent performance of the various parties in previous elections may be revisited and analysed,
the current activities of the parties may be analysed closely and one may interact with people to
determine their impressions about various parties.
N.B.: Here we assume normal election conditions where no intimidation and harassments take place.
1.1.2 Data
For this topic you need to study from the middle paragraph of page 3 to the end of page 4.
Data are important for forecasting. Quality data, which loosely refer to reliable and valid data, are the
ones needed for forecasting. We may be misled if we use data of poor quality because results are
likely to be poor as well, even if best methods are used by a proficient analyst. The term data refers
to groups of information that represent the qualitative or quantitative attributes of a variable or set of
variables. Data (plural of "datum", which is seldom used) are typically the results of measurements
and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed
as the lowest level of abstraction from which information and knowledge are derived. Raw data refers to a collection of unprocessed numbers, characters, images or other outputs from devices that collect information by converting physical quantities into symbols.
Without data there will not be forecasting. However, it is important that data be correct (reliable, valid,
realistic, etc). Data need to be both valid for the exercise, and be reliable. If one of these is missed,
then be warned that your forecasts may mislead you or any user. Also, collection of data may
be inadequate to help in supporting the reasoning behind some findings. Experience shows that
when data are collected under certain contexts, explanations and contexts become clearer when
findings are associated with those contexts. Thus, if you assist in data collection of time series or
any statistical data, whenever possible, advise on the inclusion of details of the occurrences of the
data. Giving details around happenings assists in reducing the extent of making assumptions which
may sometimes be incorrect.
The type of information used in forecasting determines the quality of the forecasts. Not all of us like
boxing, but let us discuss the next scenario. Imagine that two boxers were going to fight on the next
Saturday. We were required to make a prediction in order to win a million rand competition. Many
participants looked at the past records of these boxers. They were informed that in the previous
seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won
22 of the 30 fights he had in the same period. Gumbu was known for winning well while Blood had
lost dismally in a recent fight. Let us pause and enjoy the predictions (forecasts) made, just to make a good point.
ACTIVITY 1.3
Either as a person interested in boxing or someone hoping to win the money, you may be tempted to
take a chance at the answer. Make a prediction of the outcome of the fight based on the explanation
given.
assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases
where information can be found to limit the use of assumptions, this should be done. However, many
cases provide inadequate information, leaving us with no choice but to depend on assumptions.
Analysis should depend on reasonable assumptions. If in actual practice assumptions are made
for the sake of doing something, decisions and results reached may lead to improper actions. The
analyst should learn the art of making appropriate or reasonable assumptions.
In the case of the example/scenario given, the details were missing, such as that the two boxers were
of different weights. If we knew, this would have helped in our analysis. Sometimes in predicting
about forthcoming games, one needs to also know the quality of opposition that the two opponents
have met in the accumulation of their records. This was also missing in the example. We will insist on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give inaccurate predictions. The paragraph after the last bullet on page 3 of the prescribed book explains possible repercussions that come with wrong assumptions (Bowerman et al., 2005: 3).
Types of data that are common in real life are cross-sectional data and time series data. Study
the definition of cross-sectional data in the rectangle on page 3. Cross-sectional data refers to data
collected by observing many subjects (such as individuals, firms or countries/regions) at the same
point of time, or without regard to differences in time. Analysis of cross-sectional data usually consists
of comparing the differences among the subjects. For example, we want to measure current obesity
levels in a population. We could draw a sample of 1,000 people randomly from that population (also
known as a cross section of that population), measure their weight and height, and calculate what
percentage of that sample is categorized as obese. Even though we may analyse cross-sectional
data for quality forecasts, in this module we use time series data.
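The obesity example above can be sketched in a few lines of code. The sample values below are invented for illustration, and the cut-off is an assumption following the common convention that a BMI of 30 or more is classified as obese; neither comes from the prescribed book.

```python
# Sketch of the cross-sectional obesity example: each person is observed
# once (no time ordering), and we compare subjects at one point in time.

def bmi(weight_kg, height_m):
    """Body mass index = weight / height squared."""
    return weight_kg / height_m ** 2

# A tiny illustrative cross-section (weight in kg, height in m); invented data.
sample = [(95, 1.70), (60, 1.65), (110, 1.80), (70, 1.75), (88, 1.60)]

# Common convention (assumed here): BMI >= 30 counts as obese.
obese_count = sum(1 for w, h in sample if bmi(w, h) >= 30)
percentage_obese = 100 * obese_count / len(sample)
print(f"{percentage_obese:.0f}% of the sample is obese")  # prints "60% of the sample is obese"
```

Note that nothing in this calculation depends on when the people were measured; that is exactly what distinguishes cross-sectional data from the time series data used in the rest of this module.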
ACTIVITY 1.4
You have done some first-year statistics modules/courses and some of you did mathematics modules
as well. Let us consider the following data sets and look at them quite closely.
Data set 1.1

16 18 21 24
14 15 15 17
19 21 20 24
26 24 27 31
11 12 13 14
24 21 25 27
10  9 11 13

Data set 1.2

16 14 19 26 11 24 10
18 15 21 24 12 21  9
21 15 20 27 13 25 11
24 17 24 31 14 27 13
(a) The two data sets have exactly the same numbers. There is something strange about their
appearances though. Compare the two data sets.
(b) Can these two data sets be classified as time series data sets? Explain.
            Week
Day       1    2    3    4
 1       16   18   21   24
 2       14   15   15   17
 3       19   21   20   24
 4       26   24   27   31
 5       11   12   13   14
 6       24   21   25   27
 7       10    9   11   13
We emphasise that in the initial presentation there was simply no information to explain or
demonstrate the chronological sequence with respect to time and that the data were therefore not
time series data.
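To make the chronological requirement concrete, here is a short sketch using the week-by-day milk figures from the activity; only once the order is fixed do the 28 numbers form a time series.

```python
# The same 28 numbers only become a time series once their chronological
# order is fixed. Here the week-by-day table (days 1 to 7 within each of
# weeks 1 to 4) is flattened into one ordered sequence.

weeks = [
    [16, 14, 19, 26, 11, 24, 10],  # week 1, days 1..7
    [18, 15, 21, 24, 12, 21,  9],  # week 2
    [21, 15, 20, 27, 13, 25, 11],  # week 3
    [24, 17, 24, 31, 14, 27, 13],  # week 4
]

# Chronological ordering: day 1 of week 1 through day 7 of week 4.
series = [value for week in weeks for value in week]
print(len(series), series[:7])  # prints "28 [16, 14, 19, 26, 11, 24, 10]"
```

An unordered bag of the same 28 numbers carries no time information; the list `series` does, because position in the list encodes the observation time.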
ACTIVITY 1.5
You are required to use graphs in addition to other methods to detect patterns in time series data.
Graphical plots reveal information visually, but cannot always be done with ease. The example
that follows, is one of the easy cases where we can draw graphical plots. Analyse the data about
Jabulanis business by answering the following questions. Make any comments that you believe are
relevant.
(a) Are they time series data? Justify your answer.
(b) Plot the data to reveal the pattern using the following approaches:
(i) Plot the data for each week separately.
(ii) Plot the data of all the weeks in one graphical display.
(iii) Compare the shapes of the graphs.
(c) Which plot provides us with a better idea of comparison?
DISCUSSION OF ACTIVITY 1.5
Whether data sets form a time series or not depends entirely on the form, that is, the chronological order in which the various data points are presented. Did you answer "yes" in question (a)? If not, what did you answer, and why?
(b) Graphs of the activity
(i) Graphs for separate weeks
[Four separate plots of litres of milk against days (1 to 7), one for each of Weeks 1 to 4.]

(ii) Graph of all the weeks in one display

[A single plot of litres of milk against days, with one line for each of Weeks 1 to 4.]
(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays
and Wednesdays (in order from highest to lowest). The lowest sales were revealed for
Sundays, Fridays, Tuesdays and Mondays (in the order from lowest to highest).
(c) The graphs can be difficult to compare when they are on separate systems of axes. The last
graph makes comparison very easy, revealing that the patterns for all four weeks are similar.
The patterns of the highest activity and lowest activity about a phenomenon are important in time
series. Jabulani will easily know when he does more business, when he does least business and he
can plan to find better ways to improve business. Let us start formalising these patterns.
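These highest and lowest patterns can be computed directly from the milk data. The sketch below assumes that days 1 to 7 correspond to Monday to Sunday, which is consistent with the discussion of the graphs above.

```python
# Average the milk sales for each day across the four weeks to quantify
# the pattern the graphs show (days 1 to 7 assumed to be Mon to Sun).

weeks = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21,  9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Mean sales per day of the week, averaged over the four weeks.
day_means = [sum(week[d] for week in weeks) / len(weeks) for d in range(7)]

# Rank the days from highest to lowest average sales.
ranked = sorted(zip(day_names, day_means), key=lambda p: p[1], reverse=True)
print(ranked)  # Thu (27.0), Sat (24.25) and Wed (21.0) lead; Sun (10.75) is lowest
```

The ranking reproduces what the combined graph reveals: Thursdays, Saturdays and Wednesdays are the busiest days, and Sundays the quietest.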
Time series data may show an upward trend or a downward trend over a period of years. This may be due to factors such as an increase in population, technological progress, large-scale shifts in consumers' demands, and so on. For example, the population increases over a period of time, prices increase over a period of years, and the production of goods on the capital market of the country increases over a period of years. These are examples of an upward trend. The sales of a commodity may decrease over a period of time because better products come to the market. This is an example of a declining or downward trend. The long-term increase or decrease in the movements of a time series is called the trend.
Usually one would not be able to determine from looking at the data whether there is a decreasing
or increasing trend. There are times (but rarely) when we can see the pattern by inspection. Often a
graphical plot clearly shows the trend. The trend may be given in shapes such as linear, exponential,
logarithmic, polynomial, power function, quadratic, and other forms. In general, we use the graphical
displays to find out if there is a decline or increase in the activity. Some examples of trend applications
that we must look at are given on page 5 of Bowerman et al. (2005). Study them.
- Technological changes in the industry
Currently, companies increase ICT usage in their activities for competitive edge over those that do
not incorporate it. Institutions of higher learning have aggressively incorporated ICT in facilitating
learning, especially the distance education ones.
- Changes in consumer tastes
Housing is very expensive and scarce, but for obvious reasons remains a priority for households.
Recently, cities such as Cape Town, Durban, East London, Johannesburg, Port Elizabeth and
Pretoria have experienced a high influx of people from other areas, and employment is biased
towards the youth. As a result housing in these cities is biased towards townhouses and flats.
- Increases in total population
There is an increase since there are more births than deaths. In SA, there is also an influx of people from other countries. In other countries, natural deaths and deaths resulting from holocausts, wars, terrorism and natural disasters such as tsunamis have been many, but still far fewer than the births that have occurred over the years. That is why there is an increase in the world's population.
- Market growth
In Gauteng, the market for umbrellas shrinks in the period April to July. During the rainy season, which in Gauteng happens to be the summer season, the sales of umbrellas increase.
- Inflation or deflation (price changes)
If we consider one item for simplicity, maize is produced in the period October to May, approximately. At the start of this period, the price of maize is high because there are more people looking for a less available commodity. During the period November to January, maize is in abundance and the prices drop. As the production level declines, the prices start increasing again.
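Before moving on, the idea of a linear trend can be illustrated numerically. The sketch below fits a straight line y = b0 + b1*t by ordinary least squares to a small invented series; the numbers are not from the prescribed book.

```python
# Fit a linear trend y_t = b0 + b1 * t by ordinary least squares.
# The series below is invented purely to illustrate an upward trend.

y = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]   # observations
t = list(range(1, len(y) + 1))             # time index 1..n

n = len(y)
t_bar = sum(t) / n
y_bar = sum(y) / n

# Least-squares slope: sum of cross-deviations over sum of squared deviations.
b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / \
     sum((ti - t_bar) ** 2 for ti in t)
b0 = y_bar - b1 * t_bar                    # intercept through the means

print(f"trend: y = {b0:.2f} + {b1:.2f} t")
```

A positive slope b1 indicates an increasing trend, a negative slope a declining one; the same calculation underlies the trend lines drawn by statistical packages.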
ACTIVITY 1.6
Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical
variations, and irregular effects.
DISCUSSION OF ACTIVITY 1.6
You should mention a sequence of observations of a variable presented in chronological form
when you describe a time series. Trend should imply a long-term tendency of that time series.
Seasonality should include a periodic pattern in the data. Describing cycles should imply up and
down movements of observations around trend levels. Irregular pattern is the portion of the time
series which cannot be accounted for by the three patterns discussed above.
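As a toy illustration of these components, one can build a series as trend plus a repeating seasonal pattern and then recover the seasonal effects by averaging the detrended values for each season. All numbers below are invented, and for simplicity the trend is taken as known rather than estimated.

```python
# A toy additive series: trend + seasonal pattern, with the seasonal
# effects recovered by averaging detrended values per season.

period = 4                                   # e.g. quarterly data
n_obs = 12                                   # three full seasonal cycles
trend = [0.5 * t for t in range(n_obs)]      # known linear trend
seasonal = [3.0, -1.0, -2.0, 0.0]            # repeating seasonal effects

# Build the observed series as trend + seasonal component.
y = [trend[t] + seasonal[t % period] for t in range(n_obs)]

# Remove the (known) trend, then average by season to recover the pattern.
detrended = [y[t] - trend[t] for t in range(n_obs)]
recovered = [sum(detrended[s::period]) / (n_obs // period) for s in range(period)]
print(recovered)  # prints "[3.0, -1.0, -2.0, 0.0]"
```

In real data the trend must be estimated (for example by a moving average) and an irregular component remains after removing trend and seasonality, so the recovered effects would only approximate the true pattern.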
Exploration data set
The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were
demanded from Jabulani. Whether there was stock or not is not an issue here. The data set will be
revisited time and again.
Data set 1.4: Litres of milk demanded from Jabulani

        Day:   1   2   3   4   5   6   7
Week 1:       16  14  19  26  11  24  10
Week 2:       18  15  21  24  12  21   9
Week 3:       21  15  20  27  13  25  11
Week 4:       24  17  24  31  14  27  13
In general, methods of forecasting that depend on non-numeric information are qualitative forecasting
methods. (Do you remember this from first-year Statistics?) Qualitative data are nominal (word) data.
Quantitative forecasting methods, on the other hand, depend on numerical data.
Bowerman et al. (2005: 7) present Figure 1.1 (a) to display an example of a trend
in a time series. There is no trend line to describe the trend, but can you explain whether there is a
decreasing or increasing trend in the plot to which we are referring?
Cycle
The next component of time series that we discuss is cycle. When trends have been identified, there
may be some recurring up and down movements visible around trend levels. These movements are
called cycles. Cycles occur over long and medium terms. Page 5 of Bowerman et al. (2005) presents
this component.
Some interesting explanation is presented by Bowerman et al. (2005: 5) about business cycles.
Study it in detail. Bowerman et al. (2005: 7) present Figure 1.1 (c) to display an example of a cycle
STA2604/1
in a time series. We need to note that generally, natural occurrences have shown some cyclical
patterns over the years.
The impact of cycles on a time series is either to stimulate or depress its activity, but in general,
their causes are difficult to identify and explain. Certain actions by institutions such as government,
trade unions, world organisations, and so on, can induce levels of pessimism and optimism into the
economy which are reflected in changes in the time series levels. Economic indices are usually used
to describe cyclical fluctuations.
Cyclical variations are recurrent upward or downward movements in a time series, but the period of
a cycle is greater than a year. This restriction makes it different from trend. Also, cyclical variations
are not as regular as seasonal variations. There are different types of cycles, varying in length and
size. The ups and downs in business activities are the effects of cyclical variation. A business
cycle showing these oscillatory movements has to pass through four phases: prosperity, recession,
depression and recovery. In a business, these four phases are completed by passing from one to
another in this order. Together, they form a cycle.
Cycles are useful in long-term forecasting, even over horizons of centuries and millennia. Our
capabilities and interest in this module do not require us to look beyond a decade. Hence, methods
for developing forecasts that include cycles (or cyclical components) are not in the scope of this
module. However, you still need to understand when cycles are discussed or implied in a forecasting
situation.
Seasonality
The example about milk is given over weekly periods. The definition given by Bowerman et al. (2005:
6) is somewhat misleading! The impression it gives is that observations being investigated, must run
over a year. This is simply not the case. Even the values occurring within a day can be seen to be
seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality,
which will be used in the module. The one given in Bowerman et al. works when the periods are
yearly. Let us define the concept in the next line:
Seasonal variations are systematic variations that occur within a period and which are tied to some
properties of that period. They are repeated within the period. They are indeed periodic patterns in a
time series that complete themselves within a calendar period and are repeated on the basis of that
period.
Seasonal variations are short-term fluctuations in a time series which occur periodically in a period,
such as a year. In this case it would continue to be repeated year after year. The major factors that
are responsible for the repetitive pattern of seasonal variations are weather conditions and the
customs of people. More woollen clothes are sold in winter than in summer. Regardless of the
trend, we can observe that each year more ice cream is sold in summer and very little in winter.
Sales in department stores are higher during festive seasons than on normal days.
Irregular fluctuations
We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business.
Now we are giving you bad news.
Irregular fluctuations are variations in time series that are short in duration, erratic in nature and
follow no regularity in the occurrence pattern. These variations are also referred to as residual
variations since by definition they represent what is left out in a time series after trend, cyclical and
seasonal variations have been accounted for. Irregular fluctuations result from the occurrence of
unforeseen events like floods, earthquakes, wars, famines, and so on.
Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue
each morning he left for work. One Tuesday afternoon after he had counted what he thought was his
revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged
to continue with his business. It was happening for the first time. Could he have anticipated being
robbed on that day? We also could not have predicted that event.
The point is, that irregular event changed what could have been the revenue and/or profit for that
day. In time series, irregular fluctuations, which are also called irregular variations, refer to random
fluctuations that are attributed to unpredictable occurrences. Bowerman et al. (2005: 6) appropriately
define them as erratic movements in a time series that follow no recognisable or regular pattern. The
presentation about this concept simply implies that these patterns cannot be accounted for. They
are once-off events. Examples are natural disasters (such as fires, droughts, floods) or man-made
disasters (strikes, boycotts, accidents, acts of violence and so on).
Note that all the components of a time series influence the time series and can occur in any
combination.
Forecasting helps retailers ensure that the right product is at the right place at the right time.
Accurate forecasting will help retailers reduce excess inventory and therefore increase profit margin.
Accurate forecasting will also help them meet consumer demand.
Judgmental forecasting methods incorporate intuitive judgements, opinions and subjective
probability estimates. They include:
- Composite forecasts
- Surveys
- Delphi method
- Scenario building
- Technology forecasting
- Forecast by analogy
You do not need to learn more about these for the requirements of this module. However, you
may come across them in applications. Hence, your encounter with them may be of help in future
applications.
Causal forecasting models start by identifying variables that are related to the one to be predicted.
This is followed by forming a statistical model that describes the relationship between these
variables and the variable to be forecasted. The common ones are regression models and ordinary
polynomials. Study this topic on page 11 of the prescribed book.
In the causal forecasting method, the variable of interest, which is the one whose forecasts are
required, depends on other variables. It is thus the dependent variable. The ones on which the
variable of interest depends are known as the independent variables.
Discussion about dependence/independence
Note that Jabulani's customers are mostly people who receive wages on a weekly basis. Some are
paid on Saturday afternoon, but an overwhelming majority is paid on Friday afternoon. In addition,
on Saturday afternoon, there is an item P that is also liked by many milk buyers. If item P is available
before milk arrives, then this item is bought in large quantities, leaving limited disposable income for
the milk purchases. Fortunately for Jabulani, he has in the past four weeks, managed to deliver milk
before item P was delivered. However, most of the buyers who are paid on Saturday tend to meet
the P seller before their milk purchases on Sunday morning.
It is necessary to understand dependencies and correlations when dealing with forecasting. If you
fail to understand them, you may fall into the trap of making wrong assumptions: overlooked
influences on your forecasts, and the constraints that come with correlated variables, may lead to
inaccurate models and thus to wrong forecasts.
Useful common examples are time series and causal methods. There are others as well, but the
following may be of help in your development.
- Rolling forecast: a projection into the future based on past performance, routinely updated
- Moving average
- Extrapolation
- Trend estimation
- Exponential smoothing
- Linear prediction
- Growth curve
- Econometrics

Other methods:
- Simulation
- Prediction market
These methods are given to you so that when you make references from other forecasting sources,
you will be able to understand where they belong in your module. However, they are not necessarily
required to the extent that is presented in those other sources.
ACTIVITY 1.7
Do you see any dependence of the variables?
DISCUSSION OF ACTIVITY 1.7
Keeping to the hint, the purchase of an item that is in high demand depends on the availability of
disposable income.
ACTIVITY 1.8
(a) Classify the milk sales in the latest scenario as a dependent or independent variable.
(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable
income.
(c) Identify the dependent variable and the independent variable.
The
most common ones you should expect to encounter (draw and interpret) are scatter diagram (or
scatterplot) and time plot. Revise them if you have already forgotten how they are drawn.
Further, you are soon going to engage in a number of calculations. Thus, ensure that you are
ready to perform them, and that you remember the descriptive statistics you learnt in first-year
Statistics. It is also very important to know why the calculations are necessary in any
exercise of building a forecast model.
Bowerman et al. (2005: 12) name two types of forecasts, the point forecast and the prediction
interval. A point forecast is a single number that estimates the actual observation. A prediction
interval is a range of values that gives us some confidence that the actual value is contained in the
interval.
The forecast error as defined in Bowerman et al. (2005: 13) requires that the estimate be found and
be paired with the actual observation.
In statistics, a forecast error is the difference between the actual or real and the predicted or forecast
value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared
with an outcome at a single time-point and a summary of forecast errors is constructed over a
collection of such time-points. Here the forecast may be assessed using the difference or using
a proportional error. By convention, the error is defined using the value of the outcome minus the
value of the forecast. In other cases, a forecast may consist of predicted values over a number of
lead-times; in this case an assessment of forecast error may need to consider more general ways of
assessing the match between the time-profiles of the forecast and the outcome. If a main application
of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing
the forecast is to use the timing error: the difference in time between when the outcome crosses
the threshold and when the forecast does so. When there is interest in the maximum value being
reached, forecasts can be assessed using, for example,
the difference between the peak value of the outcome and the value forecast for that time point.
Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to
summarize the forecast error over a group of units. If we observe the average forecast error for a
time-series of forecasts for the same product or phenomenon, then we call this a calendar forecast
error or time-series forecast error. If we observe this for multiple products for the same period, then
this is a cross-sectional performance error.
To calculate the forecast errors we subtract the estimates (ŷi) from the actual observations (yi).
The difference ei = yi − ŷi is the forecast error. Can you tell what the values of the forecast
errors imply? For example, some may be smaller than others, some negative and others positive!
When Jabulani plans his sales, he makes some estimation of the litres of milk that he hopes to sell.
In Week 3, prior to getting to the market, he had made the following estimations (ŷi):

Week 3    Day:   1   2   3   4   5   6   7
          ŷi:   27  11  20  26  14  22   9
Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi ).
ACTIVITY 1.9
(a) On which days were there overestimation?
(b) On which days were there underestimation?
(c) Calculate the forecast errors for these estimates.
(d) Identify the day on which the milk sales were most disappointing! Explain.
(e) On which day did he make the best prediction? Why?
DISCUSSION OF ACTIVITY 1.9
Pairing the observed values with the estimates for Week 3 gives:

Day:   1   2   3   4   5   6   7
yi:   21  15  20  27  13  25  11
ŷi:   27  11  20  26  14  22   9
(a) Overestimations are visible after pairing by observing the pairs in which the actual observations
are lower than the estimates. These were on Day 1 and Day 5.
(b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.
(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.
(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only
sold 21 litres. It is the day he made the biggest loss, that is with the largest negative error.
(e) He made the best prediction on Day 3, where the sales were equal to the estimates.
If there was no day when the sales and estimates were equal, then the day with the smallest forecast
error in absolute value would have been the one on which the best prediction was made. This means
that Day 4 and Day 5 are the days on which good predictions were made. However, we note that
Day 5 was not a happy day for the seller because some stock was left unsold whereas on Day 4, all
stock was sold and one customer did not get milk.
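If you have access to a computer, the pairing above can be checked with a short Python sketch (the values are those of Week 3 of Data set 1.4):

```python
# Week 3 actual demand (yi) from Data set 1.4 and Jabulani's estimates (yhat_i).
actual = [21, 15, 20, 27, 13, 25, 11]
estimate = [27, 11, 20, 26, 14, 22, 9]

# By convention the forecast error is actual minus forecast: e_i = y_i - yhat_i.
errors = [y - f for y, f in zip(actual, estimate)]
print(errors)  # [-6, 4, 0, 1, -1, 3, 2]

# A negative error is an overestimate, a positive error an underestimate.
over = [day for day, e in enumerate(errors, start=1) if e < 0]
under = [day for day, e in enumerate(errors, start=1) if e > 0]
print(over, under)  # [1, 5] [2, 4, 6, 7]
```

Any software from the list in the orientation (Excel, R, SAS, and so on) would do the same job; the point is simply that the signs of the errors identify the over- and underestimated days.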
Examining the forecast errors over time provides some information on the accuracy of the estimates.
- Random forecast errors demonstrate that patterns that existed in the data were considered when
the estimates were made (Figure 1.5 (a), Bowerman et al., 2005: 14).
- If there is an increasing (or decreasing) trend, and in making an estimation this trend was not
taken care of, then the scatter plot of forecast errors would reveal an increasing (or decreasing)
trend. On Figure 1.5 (b) of Bowerman et al., (2005: 14) an example is shown of a forecast error
plot that did not account for an increasing trend.
- If estimates of seasonal data did not account for seasonality, the scatter plot of forecast errors
would reveal the seasonal pattern that was not taken care of (Figure 1.5 (c), Bowerman et al.,
2005: 14).
- Similar arguments hold for cyclical data. In Bowerman et al. (2005: 14) Figure 1.5 (d) shows a
forecast error plot that did not account for cycles.
ACTIVITY 1.10
(a) Plot the forecast errors calculated in Activity 1.9.
(b) Do the data reveal any pattern that was not accounted for?
(a) [Figure: plot of the seven forecast errors from Activity 1.9, with values ranging between −6 and 6.]
(b) The plot looks almost random. This means that the forecasting technique provides a good fit to
the data.
The absolute deviations are the absolute values of the forecast errors, a concept we can recall from
our high-school days. The absolute deviations (|ei|) are thus 6, 4, 0, 1, 1, 3 and 2.
ACTIVITY 1.12
Calculate the MAD for the estimates in Activity 1.9.
MAD = (1/n) Σ|ei| = 17/7 = 2.42857.
ACTIVITY 1.13
Calculate the squared errors for the estimates in Activity 1.9.
The squared errors are 36, 16, 0, 1, 1, 9 and 4.
ACTIVITY 1.14
Calculate the MSE for the estimates in Activity 1.9.
MSE = (1/n) Σei² = 67/7 = 9.5714.
Now, let us pause a little. We have done a few useful calculations. We have also answered a few
questions about errors.
Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also
see what is meant by a poor estimate? Now can you say what is meant by a good estimate? You
will recall that the errors need to be as small as possible. So far it is not absolutely clear what small
entails.
The MAD and MSE are the measures that we will use to determine whether the errors are small,
which will indicate a good model. The objective is to select a good forecast model. The model that will be
selected must produce forecasts that are close to the actual observations. The MAD and the MSE
will serve as our tools to select a forecast model.
We need to understand the MAD and the MSE as they relate to the forecast model. The steps are
as follows:
MAD steps
Calculate forecast errors
Determine absolute deviations
Add the absolute deviations
Divide by their number
MSE steps
Calculate forecast error
Determine squared errors
Add the squared errors
Divide by their number
MAD is not in any way mad. It is an objective route to good forecasting. The MSE serves the same
purpose.
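The MAD and MSE steps above can be sketched in Python, again using the Week 3 forecast errors (note that the squared errors 36, 16, 0, 1, 1, 9 and 4 sum to 67):

```python
errors = [-6, 4, 0, 1, -1, 3, 2]  # Week 3 forecast errors
n = len(errors)

# MAD: add the absolute deviations, then divide by their number.
mad = sum(abs(e) for e in errors) / n

# MSE: add the squared errors, then divide by their number.
mse = sum(e ** 2 for e in errors) / n

print(round(mad, 5), round(mse, 4))  # 2.42857 9.5714
```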
Sometimes the effectiveness of a model is measured in percentages. Such measures are the
absolute percentage error (APE) and the mean absolute percentage error (MAPE) (Bowerman et
al., 2005: 18).
ACTIVITY 1.15
Calculate the APE for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.15
The absolute errors, actual values and APEs are:

Day:     1        2        3      4       5       6      7
|ei|:    6        4        0      1       1       3      2
yi:     21       15       20     27      13      25     11
APEi:   28.5714  26.6667  0.00   3.7037  7.6923  12.00  18.1818

Each APEi = 100 × |ei| / yi.
MAPE = (1/n) Σ APEi.
ACTIVITY 1.16
Calculate the MAPE corresponding to the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.16
To calculate the MAPE we need the APEs, which are 28.5714, 26.6667, 0.00, 3.7037, 7.6923,
12.00 and 18.1818.
We obtain Σ APEi = 96.8159, so that

MAPE = 96.8159 / 7 = 13.8308.
The intention when measuring the error is to monitor and control it, and so to increase the accuracy
of the forecasting methods.
Here et is the forecast error at period t, yt is the actual value at period t, and Ft is the forecast for
period t. A summary of the measures is given in the next table.
Measures of aggregate error:

Mean absolute deviation:         MAD  = (1/n) Σ|et|
Mean absolute percentage error:  MAPE = (100/n) Σ|et / yt|
Mean squared error:              MSE  = (1/n) Σet²
Root mean squared error:         RMSE = √((1/n) Σet²)
Please note that business forecasters and practitioners sometimes use different terminology in
industry. They refer to the PMAD as the MAPE, although they compute it as a volume-weighted
MAPE. Please stick to the textbook notation.
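The four aggregate measures in the table can be collected into one small Python helper; the function name aggregate_errors is our own choice for illustration:

```python
from math import sqrt

def aggregate_errors(actual, forecast):
    """Return (MAD, MAPE in %, MSE, RMSE) for paired actuals and forecasts."""
    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = sqrt(mse)  # the RMSE is simply the square root of the MSE
    return mad, mape, mse, rmse

# Week 3 of Data set 1.4 again:
mad, mape, mse, rmse = aggregate_errors([21, 15, 20, 27, 13, 25, 11],
                                        [27, 11, 20, 26, 14, 22, 9])
print(round(mape, 4))  # 13.8308
```

Note that the MAPE requires all actual values yt to be non-zero, which is the case for the milk data.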
- Cost
If the cost of developing forecasts is higher than the benefits, a cheaper method must be used or
forecasts should not be developed. Also, the more complex forecasting methods are more expensive
to develop, while simple ones are usually less expensive.
- Desired accuracy
Obviously, it is ideal that forecasts be perfectly accurate. Some situations require the best possible
accuracy level because of their high sensitivity. As an example, life-threatening situations such
as HIV/AIDS, typhoid and cholera require, due to the risk of loss of life, the best possible
forecasts with superior accuracy.
- Data availability
When there are no numeric data or no detail, we cannot develop quantitative forecasts. Some
situations though, may have limited data, or data of a form that is not required. The forecaster
will have to accommodate the data and choose an appropriate method that will suit the data even
though it is not ideal for the problem. We are warned that forecasting methods give inaccurate
forecasts if inaccurate, outdated or irrelevant data are used to develop the forecasts.
- Convenience
Convenience in this case means the ease of use by the forecaster as well as his understanding of
the method. If the forecaster lacks understanding of the methods he or she uses, then there will
not be much confidence assigned to the forecasts.
ACTIVITY 1.17
Suppose that you are to develop forecasts for the number of tourists using the services of a tourism
organisation in the country. You are given data of the number of tourists using these services for
the years 2002 to 2007, and they have been increasing annually. You also realise from the graphs
provided that in the months of January, March, June and December the tourists used this company
even more.
(a) As a time series specialist you are requested to develop forecasts and the marketing manager
insists on a specific method. How would you react?
(b) Is the pattern of the data clear? Explain.
DISCUSSION OF ACTIVITY 1.17
(a) I would explain that the choice of method should be driven by the data rather than by the
manager's preference, in keeping with the time series methodology. The method must be able to
account for the high tourism numbers in January, March, June and December. It must also be
able to show the increasing numbers.
(b) The patterns are clear. The four months with high tourist numbers indicate seasonality while the
increasing numbers indicate an increasing trend.
[Figure: time plot of the litres of milk demanded over Days 1 to 28.]
Here the data for the different weeks were combined so that the trend can be examined. There is an
increasing trend that is demonstrated by the trend line.
Can we determine the rate of increase? Here, the rate of increase is given by the equation of the
trend line. You must be able to show that the equation of the trend line is
y = 0.1571x + 16.365.
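You can verify the trend-line equation yourself. A least-squares sketch in Python, using the 28 observations of Data set 1.4 laid end to end:

```python
# Litres of milk demanded, Weeks 1 to 4 of Data set 1.4 laid end to end.
y = [16, 14, 19, 26, 11, 24, 10,   # Week 1
     18, 15, 21, 24, 12, 21, 9,    # Week 2
     21, 15, 20, 27, 13, 25, 11,   # Week 3
     24, 17, 24, 31, 14, 27, 13]   # Week 4
x = list(range(1, 29))             # Day 1 to Day 28
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(round(slope, 4), round(intercept, 3))  # 0.1571 16.365
```

This reproduces the slope and intercept of the trend-line equation, so the demand increases by about 0.157 litres per day on average.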
[Figure: time plot of the litres of milk demanded, grouped by week (Week 1 to Week 4).]
Depending on the application needed to address a problem, a regression model can use quantitative
independent variables (that assume numbers) or qualitative independent variables (that assume non-numerical values). Make sure that you understand the exposition on p. 21 of Bowerman et al. (2005).
This module requires full manipulation of simple linear regression models and some applications of
multiple regression models. In addition to the regression models, the scope of the module covers
time series, decomposition methods and exponential smoothing.
1.6 Conclusion
We have acquired useful introductory knowledge for the module. We defined forecasting, explained
its necessity, and explained qualitative and quantitative forecasting methods. Time series data were
discussed and their components explained; errors in forecasting were defined, as well as measures to
detect them. Factors for choosing a forecasting technique were discussed, and the use of regression
analysis in forecasting was discussed briefly. The next exercises are intended to make you fit for the
tasks ahead.
Self-evaluation exercises
Do exercises 1.1 up to 1.6 on page 25 of Bowerman et al. (2005).
If you encounter any problems with these exercises, do not hesitate to contact your lecturer. Just
indicate what is difficult for you.
You are welcome to discuss your solutions with the lecturer, and you are encouraged to do so by
sending these solutions directly to the lecturer(s) of the module.
Assessment

Content: multicollinearity (covariance matrix, correlations, variance inflation)
Activities: calculate; test hypotheses
Feedback: discuss each activity

Content: comparison of regression models (R², adjusted R², s, C-statistic)
Activities: use selected measures; perform calculations
Feedback: explain calculations

Content: residual analysis (residual plots, assumed functional forms)
Activities: use plots; calculations; graph plotting
Feedback: link with the patterns

Content: diagnostics (leverage points, residuals)
Activities: compare measures with limits; calculate measures; plot graphs
Feedback: calculate and discuss measures
Where there are concepts that are necessary for us to learn a skill, we will look for the skills wherever
they are in the book. As an example, R2 appears in earlier chapters before Chapter 5. Many of these
concepts were dealt with in first-year Statistics. Fortunately they are all in the prescribed book.
This study unit deals with parts of chapter 5 of Bowerman et al. (2005: 221-278). We will highlight
some of the concepts:
Section 5.1
multicollinearity (pp. 222-226) with reference to the variance inflation factor on p. 224
R2 (pp. 226-227)
adjusted R2 (p. 228)
the standard error s (p. 227)
the C -statistic (p.230)
stepwise regression and backward elimination (pp. 231-235): read for interest's sake only, not for
examination purposes
Section 5.2
residual plots (pp. 236-238)
the constant variance assumption (pp. 238-239)
the assumption of correct functional form (pp. 239-240)
the normality assumption (pp. 240-243)
the independence assumption (pp. 242-245)
Section 5.3
can be omitted
Section 5.4
the leverage values (pp. 255-256)
all kinds of residuals (pp. 257-258)
Cook's distance (pp. 258-259)
outlying and influential values (pp. 259-260)
Some explanations
Time series data in this study unit shall consist predominantly of numeric data collected over regular
intervals. Similar to building a house on a good solid foundation, with intact walls and roof, in
forecasting you also need an appropriate framework to use your data wisely and then develop useful
(and not misleading) forecasts.
The four basic steps for this are as follows:
Step 1: Specify a tentative model.
Step 2: Estimate any unknown parameters.
Step 3: Validate the model.
Step 4: Develop the required forecasts.
In forecasting using time series, model building is the foundation. The model is an equation with
unknown parameters. If the parameters are wrong, the model would not provide correct predictions.
In addition, when a statistical analysis of a time series has been completed, we will often find that
there exist relationships between the variables of interest. It is important to know what to do with
these relationships, otherwise we may build models that do not represent the actual pattern of the
activity. The next topic explains this aspect of relationships.
2.2 Multicollinearity
We learnt about the correlation coefficient in first-year Statistics. When more than two variables
are considered, the correlation coefficient is generalised to the correlation matrix. Bowerman et al.
(2005: 223) presents an example of a correlation matrix. We also came across the coefficient
of determination when we studied regression. The correlation coefficient and the coefficient of
determination are useful in measuring multicollinearity.
We know from regression analysis that we may express a variable of interest (dependent variable) as
a function of other variables (independent variables). When two independent variables are related,
there is collinearity. If more than two independent variables are related, there is multicollinearity. An
extreme case of multicollinearity is singularity, in which an independent variable is perfectly predicted
by another independent variable (or more than one). Do you recall the value of the correlation
measure under perfect correlation? Justify your answer.
34
ACTIVITY 2.1
Provide an example of a real-life case where multicollinearity can exist.
y=
x1 =
x2 =
Surely, y depends on x1 and x2 . It is put to you that there are no grounds to believe that x1 and
x2 can be correlated. Do you have any counter reflection regarding this assertion? Think of other
examples. Your examples need not be in the form of mathematical equations. They should just get
you thinking.
ACTIVITY 2.2
Calculate the VIF for the Wednesday data.
Recall that in Data set 1.4 in Unit 1 we had the following data for Week 3:
Day:   1   2   3   4   5   6   7
yi:   21  15  20  27  13  25  11
ŷi:   27  11  20  26  14  22   9
Hint: Recall the multiple coefficient of determination (p. 156 of the prescribed book).
VIFj = 1 / (1 − Rj²),  where Rj² = explained variation / total variation.
When we look for possible relationships among the independent variables, independent variables
take turns to assume the role of a dependent variable regressed on the rest of the independent
variables. Then the coefficient of determination is calculated for each independent variable.
In the example under discussion (data tables 4.2 and 5.1 in the textbook), y was regressed on
x1 , x2 , ..., x8 . This means that the focus is on x1 , x2 , ..., x8 .
Let us inspect Table 5.2, page 224 of Bowerman et al. (2005). The eight variables of interest are
displayed on page 222, in the paragraph before Table 5.1. The correlation matrix and the SAS output
where the VIFs appear in the last column are given on pp. 223-224.
ACTIVITY 2.3
Suppose that you are given the following data together with the corresponding estimates.
y:            39    41    33    45    29    42    21
y-estimates:  36.1  33.9  37.3  40.2  31.7  38.9  34.8
The required squares are

i     (yi − ȳ)²    (ŷi − ȳ)²
1     10.796       0.1488
2     27.939       3.2916
3     7.3673       2.5145
4     86.224       20.122
5     45.082       16.114
6     39.51        10.149
7     216.51       0.8359
Sum   433.4286     53.17571
R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 53.1757 / 433.4286 = 0.122686.
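A short Python check of the calculation above:

```python
y = [39, 41, 33, 45, 29, 42, 21]
y_hat = [36.1, 33.9, 37.3, 40.2, 31.7, 38.9, 34.8]

y_bar = sum(y) / len(y)  # mean of the actual observations

explained = sum((f - y_bar) ** 2 for f in y_hat)  # explained variation
total = sum((v - y_bar) ** 2 for v in y)          # total variation

r_squared = explained / total
print(round(r_squared, 6))  # 0.122686
```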
This is how we would calculate the coefficient of determination. The value of R2 is needed for VIF. In
calculating VIF though, only the independent variables are used. We alternate each one of them to
be regressed on the others.
NB: Rj2 = 0 implies that xj is not related to the other independent variables.
ACTIVITY 2.4
What is the value of V IFj when Rj2 = 0?
The last case is used to explain the extent of multicollinearity. If the coefficient of determination of
one independent variable on others is very large (i.e., close to 1), the corresponding VIF is very large.
These two situations lead us to the guidelines for interpreting multicollinearity. To decide about the
severity of multicollinearity, we focus on the maximum VIF and the average of the VIFs. The guide
from Bowerman et al. (2005: 224) is to consider multicollinearity as severe if one of the following is
true:
- The largest VIF > 10.
- The mean of the VIFs is substantially greater than 1.
This means that if one of the above conditions is met, we can conclude that there is severe
multicollinearity between the independent variable that was regressed on and the others. However,
it is not easy to say what substantially greater than 1 means. We have to make it definite for the
sake of this module.
We rephrase the rule to be:
Consider multicollinearity as severe if one of the following is true:
- The largest VIF > 10.
- The mean VIF > 5.
ACTIVITY 2.5
Consider the sales territory performance data (p.222). Determine if we can conclude that there is
severe multicollinearity among the independent variables.
DISCUSSION OF ACTIVITY 2.5
The VIFs (last column of the SAS output on p. 224) are 3.34262, 1.97762, 1.91021, 3.23576,
1.60173, 5.63932, 1.81835 and 1.80856.

We find that the maximum VIF = 5.63932. This value is clearly not larger than 10, and we cannot
decide until the second condition has been checked. Upon calculating the mean, we find that
the mean VIF = 2.6668, which is much less than 5. We conclude that the independent variables are not
severely multicollinear.
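The severity check can be done in a few lines of Python (the VIFs are those listed above):

```python
# VIFs for the eight independent variables (SAS output, p. 224).
vifs = [3.34262, 1.97762, 1.91021, 3.23576,
        1.60173, 5.63932, 1.81835, 1.80856]

largest = max(vifs)
mean_vif = sum(vifs) / len(vifs)

# Module rule: severe multicollinearity if the largest VIF > 10 or mean VIF > 5.
severe = largest > 10 or mean_vif > 5
print(round(largest, 5), round(mean_vif, 3), severe)  # 5.63932 2.667 False
```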
We have used the coefficient of determination to calculate the VIF in order to test for multicollinearity.
You may ask: Do we need this measure for other purposes? Yes, it does have other uses as well.
This measure was dealt with to some extent earlier. It is explored further in this section. When
we add an independent variable to a regression model, it decreases the unexplained variation
and increases the explained variation, thus increasing R². This is true even when the added
independent variable is unimportant.
ACTIVITY 2.6
Make sure that you understand the behaviour of the measure R2 when an additional independent
variable is added to the regression model.
Adjusted R2
ACTIVITY 2.7
How does this measure behave when an additional independent variable is included in the regression
model?
If an added independent variable increases R², it will also tend to increase the adjusted R². Since these two measures do not seem to provide adequate assistance, we need another criterion.
Consider the known notation used earlier. The sum of squared forecast errors (SSE) is defined
as

SSE = Σ(yi − ŷi)².

One criterion considered better than R² and adjusted R² for measuring the value of including an
additional independent variable is

s = √( SSE / (n − k − 1) ).
The guideline is that if s increases when we add another independent variable, then that
independent variable should not be added. It is desirable to have a small s. A large s is equivalent
to a long confidence interval. If we were to use the predicted interval length, short confidence
intervals are then indicators of a desired model. We will only use s in this module, but note that in
practice you may be required to use confidence intervals. Note the equivalence.
The next measure for comparing regression models that will be discussed is the C-statistic.

The C-statistic

The C-statistic, also called the Cp-statistic, is another valuable measure useful in comparing regression models. Let s_p² denote the mean square error based on a model using all p potential independent variables. If SSE denotes the unexplained variation for another particular model that has k independent variables, then the C-statistic for this model is

C = SSE/s_p² − [n − 2(k + 1)].
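The calculation itself is a one-liner; the sketch below uses made-up values of SSE, s_p², n and k purely for illustration, together with the rearranged form discussed in Activity 2.8:

```python
# Sketch of the C-statistic calculation (all numeric inputs here are
# hypothetical, not taken from the textbook data).
def c_statistic(sse, s2_p, n, k):
    """C for a k-variable model: C = SSE/s_p^2 - [n - 2(k + 1)]."""
    return sse / s2_p - (n - 2 * (k + 1))

def c_statistic_alt(sse, s2_p, n, k):
    """Equivalent rearranged form: C = SSE/s_p^2 + 2k + 2 - n."""
    return sse / s2_p + 2 * k + 2 - n

print(c_statistic(100.0, 4.0, 20, 3))      # 13.0
print(c_statistic_alt(100.0, 4.0, 20, 3))  # 13.0
```

Both forms give the same value, which is the algebraic point of Activity 2.8.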
ACTIVITY 2.8
Show that the C -statistic may be rewritten as
C = SSE/s_p² + 2k + 2 − n.
ACTIVITY 2.9
It says in the description of SSE that we want SSE to be small. Explain why we want this measure
to be small.
s² = SSE/(n − k − 1) = Σ(y_i − ŷ_i)²/(n − k − 1).
In isolation we analyse
SSE = (yi yi )2 .
This is the sum of the squared differences between the actual values and the estimates. Ideally, if
the estimates are perfect predictions, they will replicate the actual values. Then the differences will
be zero. This will therefore result in SSE = 0, the smallest possible value of SSE. Therefore, if the
model used predicts the actual values satisfactorily, then the differences will be small and SSE will
be small.
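The calculation of SSE and the standard error s can be sketched as follows. The y and ŷ values here are made up for illustration, and a single predictor (k = 1) is assumed:

```python
import math

# Hypothetical actual values y and fitted values y_hat from a model
# with k = 1 predictor.
y     = [10.0, 12.0, 15.0, 11.0, 14.0]
y_hat = [10.5, 11.5, 14.0, 12.0, 13.5]

# SSE = sum of squared differences between actual and fitted values.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

n, k = len(y), 1
s = math.sqrt(sse / (n - k - 1))  # s = sqrt(SSE / (n - k - 1))
print(round(sse, 2), round(s, 4))  # 2.75 0.9574
```

Perfect predictions would give SSE = 0, the smallest possible value, exactly as described above.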
Look at Example 5.1 (Bowerman et al. 2005: 228).
The output from MINITAB and SAS that appears on page 229 resulted from calculating R², adjusted R², s and the Cp-statistic.
The MINITAB output gives the two best models of each size in terms of s, R² and the C-statistic.
Thus, we find the two best one-variable models, the two best two-variable models, . . ., the two best
eight-variable models. Note that the adjusted R2 increases considerably when a second variable
is added. There is no problem with the inclusion of ACCTS because it is a good predictor of the
dependent variable.
ACTIVITY 2.10
Use the output on p. 229 to answer the following.
(a) If a model with only two variables is to be used, which variables would you use?
(b) A model using five variables is the best. Do you agree? Justify your answer.
We now move on to residual analysis. If you have an interest in regression analysis you may study
stepwise regression on page 232 and backward elimination on page 235, but these two topics do not
form part of our syllabus.
Discussion
We know that most of the time series models we will develop in future as forecasters will not be 100%
accurate.
The error is e = y − ŷ, the deviation between the actual value and the estimate. In statistics there are methods to deal with these deviations so that our predictions remain useful regardless of the presence of the errors. We refer to these methods as residual analysis.
ACTIVITY 2.11
Indicate whether each of the following measures involves residuals (Yes/No), and explain your answer:

Chapter 1: forecast error; absolute deviation; MAD; squared error; MSE; APE; MAPE.
Chapter 2: mean; standard deviation; VIF; R²; adjusted R²; standard error; mean square error; SSE; C-statistic.
This is very interesting. There are links among these measures. Do you see the links? This activity
also ensures that we revise previous work. Can you see how much we have learnt so far?
If you answered "yes", it is an indication of the importance of residuals. The vehicle we will utilise in this module to show this importance is residual analysis.
Residual analysis assists us in the prediction task. It helps us to detect errors in the model we
develop, and gives us an indication of whether we are on the right track.
For this we use graphical plots of residuals. We call them residual plots.
ACTIVITY 2.12
From Unit 1, Data set 1.4 Week 3 was as follows:
Day:   1   2   3   4   5   6   7
y_i:  21  15  20  27  13  25  11
ŷ_i:  27  11  20  26  14  22   9
e_i:  -6   4   0   1  -1   3   2

[Residual plot: the residual values e_i plotted against Days, with the vertical axis running from -8 to 4.]
Remember that we are using residual plots to test the assumption that e = y − ŷ has a normal distribution with mean 0 and variance σ². We use the above plot to test the constant variance assumption. If the residuals are randomly distributed around the zero mean, we can assume constant error variance. If, however, the residual plot "fans out" or "funnels in" (see Figure 5.7 in the textbook), we have an increasing or decreasing error variance, which implies that the assumption of constant error variance is violated.
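The residuals in the plot above can be reproduced directly from the definition e_i = y_i − ŷ_i. A minimal sketch (a plot would normally be drawn from these values, e.g. with matplotlib):

```python
# Week 3 milk data of Activity 2.12: residuals e_i = y_i - y_hat_i.
y     = [21, 15, 20, 27, 13, 25, 11]   # actual values
y_hat = [27, 11, 20, 26, 14, 22, 9]    # forecasts

residuals = [yi - fi for yi, fi in zip(y, y_hat)]
print(residuals)  # [-6, 4, 0, 1, -1, 3, 2]
```

Plotting these against the day numbers gives the residual plot used for the constant-variance check.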
Let us share something with you about the residual plot for the milk data.
If you visually place the residual plot in the box below and use lines to explain its shape, it cannot be
appropriately explained by a parallel band of the following form:
Also, it does not look like it can be appropriately explained by a fan shape of the form
Instead, it looks very much like it can be appropriately explained by a funnel shape of the form
Thus the residuals for the milk data violate the assumption of constant variance. You are urged to
view these shapes as presented in Figures 5.6 and 5.7 on p. 238 of Bowerman et al. (2005).
[Plot of the residual values against Days, with the vertical axis running from -8 to 6.]
The above plot shows no evidence of a bell shape. The normality assumption is violated.
We can also employ a normal plot of the residuals to determine normality. The procedure for the
normal plot is explained on p. 240 of the textbook.
ACTIVITY 2.13
Use a normal plot for the data of Activity 2.12 to determine whether the data come from a normal
distribution or not.
The plotting positions (3i − 1)/(3n + 1), i = 1, 2, ..., 7, and the corresponding standard normal quantiles z(i) are:

i:               1         2         3         4         5         6         7
(3i−1)/(3n+1):  0.090909  0.227273  0.363636  0.500000  0.636364  0.772727  0.909091
z(i):          -1.336    -0.747    -0.345     0.000     0.345     0.747     1.336
Therefore, we plot the z(i) values (−1.336, −0.747, −0.345, 0.000, 0.345, 0.747, 1.336) against the ordered residuals e(i).
Two normal plots appear in Bowerman et al. (2005: 241). The discussion on p. 242 seems to
suggest that the straight line shape is not evident. What is your observation from the graph? That is,
do you agree with the authors?
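The normal plot points can be computed with the standard library. This sketch assumes the plotting positions (3i − 1)/(3n + 1) used above; because it uses exact normal quantiles, the z values differ very slightly from the rounded table values:

```python
from statistics import NormalDist

# Normal plot points for n = 7 residuals: the i-th plotting position is
# (3i - 1)/(3n + 1), and z(i) is the corresponding standard normal
# quantile, later plotted against the ordered residuals e(i).
n = 7
positions = [(3 * i - 1) / (3 * n + 1) for i in range(1, n + 1)]
z = [NormalDist().inv_cdf(p) for p in positions]

print([round(p, 6) for p in positions])
print([round(v, 3) for v in z])
```

A roughly straight line through the plotted (z(i), e(i)) points supports the normality assumption.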
Outliers are not necessarily errors, as we may be led to believe. They are often very high or very low values that occur because of conditions that existed at the time they were observed. Some of them may signal good fortune while others may indicate hardship. When high successes are experienced, analysts may examine the factors that contributed to them. Rather than simply eliminating an outlier, it is better to take note of the conditions that produced it!
Be warned also that sometimes low values and high values may occur because of seasonality, not because they are outliers. Out of the time series context they may be judged as bad or good, while within the time series scope they may be normal values with a useful implication.
ACTIVITY 2.14
Are there outliers in the following data set? Identify them.
x:  40  36  49  1207   23  38  27  44  45  30
y:  90  77  87    46  290  79  58  66  87  66
ACTIVITY 2.15
Calculate the means and the standard deviations of the data in Activity 2.14.
                        x       y
Mean:               153.9    94.6
Standard deviation: 370.11   70.09
ACTIVITY 2.16
Remove the values which you said were outliers in Activity 2.14. Calculate the means and standard
deviations. Were these data points influential?
DISCUSSION OF ACTIVITY 2.16
The new data sets are
x:  40  36  49  38  27  44  45  30
y:  90  77  87  79  58  66  87  66
If you did not get the correct answers in Activity 2.14, this is the time to update your answers to that
question.
                        x       y
Mean:               38.65   76.25
Standard deviation:  7.520  11.780
Are there substantial differences from these measures based on the original data? Well, this is
obvious. What do you conclude?
ACTIVITY 2.17
Explain if outliers and influential observations are the same.
A leverage value is considered to be large if it is greater than twice the average of all the leverage values, which can be calculated as 2(k + 1)/n, where k is the number of predictors and n the sample size.
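A sketch of this leverage rule applied to the x-values of Activity 2.14, assuming simple regression (k = 1):

```python
# Leverage for simple regression: D_i = 1/n + (x_i - x_bar)^2 / SS_xx,
# flagged as large when D_i > 2(k + 1)/n.
x = [40, 36, 49, 1207, 23, 38, 27, 44, 45, 30]
n, k = len(x), 1
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

leverage = [1 / n + (xi - x_bar) ** 2 / ss_xx for xi in x]
cutoff = 2 * (k + 1) / n  # = 0.4 for these data
flagged = [xi for xi, d in zip(x, leverage) if d > cutoff]
print(round(ss_xx, 1), round(leverage[3], 6), flagged)  # 1232856.9 0.999553 [1207]
```

Only x = 1207 is flagged, matching the value identified as an outlier with respect to x earlier.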
2.4.2 Residuals
In order to identify outliers with respect to their y-values, we can use residuals as before. The rule of thumb is that any residual that is substantially different from the others is suspect. This topic is presented in Bowerman et al. (2005: 257). Before going any deeper, we should experiment with our data and calculate the residuals.
ACTIVITY 2.18
This activity is included to give you a feel for the calculations done when analysing residuals. The data are unrealistic, chosen just to prove the point. In real life this analysis will be done by a computer. Make sure that you understand the computer output given for the exercises (pp. 262–277).
The following data are given:
x:  40  36  49  1207   23  38  27  44  45  30
y:  90  77  87    46  290  79  58  66  87  66
(a) Find the regression equation y = a + bx using the method of least squares.
(b) Calculate the residuals.
(c) Identify residuals that are suspect.
b = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) = −593954/12328569 = −0.048
and

a = (Σy − bΣx)/n = 102.
(b) To calculate the residuals, we estimate the y-values using the equation above. The fitted values and residuals are:

x       y     ŷ = 102 − 0.048x    e = y − ŷ
40      90        100.08           -10.08
36      77        100.272          -23.272
49      87         99.648          -12.648
1207    46         44.064            1.936
23     290        100.896          189.104
38      79        100.176          -21.176
27      58        100.704          -42.704
44      66         99.888          -33.888
45      87         99.84           -12.84
30      66        100.56           -34.56

(c) The residuals that are suspect are the fourth and the fifth ones, namely e4 = 1.936 and e5 = 189.104.
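The fit and the residuals of Activity 2.18 can be verified in a few lines; because the arithmetic is not rounded mid-way, the values differ slightly from the hand-rounded ones above:

```python
# Least squares fit y = a + bx and residuals for the Activity 2.18 data.
x = [40, 36, 49, 1207, 23, 38, 27, 44, 45, 30]
y = [90, 77, 87, 46, 290, 79, 58, 66, 87, 66]
n = len(x)

sxy = sum(xi * yi for xi, yi in zip(x, y))
sx, sy, sxx = sum(x), sum(y), sum(xi ** 2 for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope, about -0.048
a = (sy - b * sx) / n                            # intercept, about 102
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(b, 4), round(a, 2))  # -0.0482 102.01
```

The fifth residual (about 189.1) again stands out as suspect.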
ACTIVITY 2.19
Use the data of Activity 2.14 to calculate the studentised residuals.
Then

s = √(SSE/(n − 2)) = √(41346.94554/8) = 71.8914

and

s_i = s√(1 + D_i),

where

D_i = 1/n + (x_i − x̄)²/SS_xx,

with

SS_xx = Σ(x_i − x̄)² = 1232856.9  and  x̄ = Σx_i/n = 1539/10 = 153.9.
i:     1          2         3         4         5         6         7         8         9         10
D_i:  0.1105229  0.111275  0.108926  0.999553  0.113898  0.110896  0.113062  0.109797  0.109619  0.112452
Now we want s_i = s√(1 + D_i). They are:

i:     1         2         3         4         5         6         7         8         9         10
s_i:  200.6332  200.7011  200.4889  269.2188  200.9379  200.6669  200.8624  200.5676  200.5516  200.8074
The studentised residuals e_i/s_i are:

i:        1          2         3         4         5         6         7         8         9         10
e_i/s_i:  0.2046521  0.147832  0.17183   1.748726  1.233615  0.153787  0.071044  0.077081  0.179804  0.104926
Since no value is greater than 2, studentised residuals do not suggest any outliers with respect to y .
How can an obvious outlier not be flagged by our measure? Let us find an additional guideline. Studentised deleted residuals may also be used, and thereafter we will consult Cook's distance (Bowerman, 2005: 257–258).
A studentised deleted residual for observation i is computed from least squares point estimates based on all observations except observation i. This is done because if y_i is an outlier with respect to its y-value, using this observation to compute the usual least squares point estimates might draw the usual point prediction ŷ_i towards y_i and thus cause the resulting usual residual to be small. This would falsely imply that observation i is not an outlier with respect to its y-value. Studentised deleted residuals are computed by most software packages and denoted by RStudent (SAS) and TRES1 (MINITAB) on p. 256.
ACTIVITY 2.20
Inspect the output on p. 256 of the textbook.
CD_i = [(y_i − ŷ_i)² / ((k + 1)s²)] × [D_i / (1 − D_i)²], where s² = SSE/(n − k − 1).
Cook's distance can be compared to F critical values to see if it is significant. To guide us further we shall use the following rule of thumb:
A value of CD_i > 1.0 would generally be considered large.
ACTIVITY 2.21
Study the output on p. 256. Which observation has a significant Cook's D? Which critical values did you use to make your decision?
2.5 Conclusion
The study unit explained model building, and checking the model for usefulness by checking how far it deviates from the real observations. Some useful statistics were introduced, and we experimented with them in order to appreciate them. These statistics are important and should be understood, but you are not required to memorise them. You are also not expected to derive them. However, you need to be able to interpret computer output on this topic.
EXERCISES
Consider the values of the pair (X, Y) given below:

i:  1    2    3    4    5   6
X:  2   15   11  100   25   9
Y: 18  129   90  805  210  88
Calculate
(a) SS_xx
(b) SS_xy, where SS_xy = Σx_i y_i − (Σx_i)(Σy_i)/n.
Open questions
(a) Why do we, as forecasters, have to study residuals, outliers, influential observations and the
underlying measures?
(b) What is the role of residuals and of deleted residuals? Clarify your answer. Do residuals also
explain deleted residuals?
(c) Why do we need to identify influential observations?
Textbook exercises
Exercise 5.4
Exercise 5.5
Exercise 5.7
Exercise 5.16
UNIT 3: Time Series Regression

Content: modelling trend using polynomial functions; autocorrelation detection and the Durbin–Watson (DW) statistic; modelling seasonality using dummy variables.
Activities: plot graphs, experiment with data and interpret data; perform exercises with the DW statistic; find lengths of seasonality and develop forecasts.
Assessment: data plots, parameter estimation and measures; the Durbin–Watson test and graphs; regression of seasonality using dummy variables.
Feedback: discussion of the activities.
3.1 Introduction
This unit is based on Chapter 6 of Bowerman et al. (2005), which is Time Series Regression. It does not require full fluency in regression; your basic knowledge of polynomials will suffice. We discussed regression models roughly in the past study units. There we stated that the variable of interest (y), which is the dependent variable, is regressed on the variables (factors) on which it depends. These factors vary freely, and the manner in which they vary drives the manner in which the dependent variable behaves. Since these factors vary randomly, they are random variables. In the past two units we plotted and interpreted some graphs. Did you find them useful? Quadratic equations were also dealt with at school. Do you remember the parabola? This is the graph of a quadratic equation. You are welcome to refer to school textbooks for these graphs.
These topics, together with the ones we learnt in study units 1 and 2 such as the components of time
series, will be integrated in this study unit. Do you still remember the components of time series?
Attempt to name them.
We defined trend, seasonality and cyclic patterns in the earlier study units. We will treat trend as
it may occur in a linear pattern, a quadratic pattern and where there is no trend. The linear and
quadratic patterns will include decreasing and increasing trends.
One of the elements we dealt with in the previous study units is independence. Residuals are useful
in detecting if the data are independent or not. Time series data are observations of the same
phenomenon recorded over consecutive time periods. Hence, they cannot be fully independent. The
usual relationship in time series data is autocorrelation. When adjacent residuals have roughly the same value and are correlated with each other, we say they are autocorrelated.
Autocorrelation can be negative or positive. Positive autocorrelation exists when, over time, a positive error term is followed by another positive error term and a negative error term is followed by another negative error term. On the other hand, negative autocorrelation exists when, over time, a positive error term is followed by a negative error term and a negative error term is followed by a positive error term. We will explore this idea further. Residual plots and the Durbin–Watson statistic will be involved.
Do you remember that some data do not have a seasonal pattern? Analysing data will reveal the
presence or absence of seasonality and when present, we should be able to determine the pattern.
We will show how dummy variables and trigonometric functions may be used to deal with seasonality.
Growth curve models will also be studied. The unit will also show how to deal with autocorrelated errors using a first-order autoregressive process.
The trend models considered here have the form

y_t = TR_t + ε_t,

where
y_t = the value of the time series in period t
TR_t = the trend in time period t
ε_t = the error term in time period t.

The time series y_t can be represented by an average level μ_t, which changes over time according to the equation μ_t = TR_t, and by the error term ε_t. As we recall that random fluctuations often occur in a process, the error term represents random fluctuations that cause the y_t values to deviate from the average level μ_t. The three trends that we are going to study in this module are no trend, linear trend, and quadratic trend.
ACTIVITY 3.1
What do you think no trend means?
3.2.1 No trend
See point number 1 in the second rectangular box on page 280 of the textbook. In qualitative terms one may describe the condition as stable. This is a case of no deterioration and no improvement, therefore a case of no trend. Here a generally constant pattern is displayed, with no long-run growth or decline over time. In this case the trend takes some constant value β₀ and is modelled as TR_t = β₀. For a depiction of the shape of a process that shows no trend, see Figure 6.1 (a) on page 281 of the textbook. Generally the case of no trend is undesirable, but it may happen. Who would not want to see change?
Note that the case of no trend does not necessarily mean absolutely no change. If the changes are shown by fluctuations (the ups and downs) in such a way that the average seems constant in the long run, then we have no trend.
For a linear trend, TR_t = β₀ + β₁t. The values β₀ and β₁ of this equation provide us with the shape of the line graph. Try to recall the values that lead to various shapes.

ACTIVITY 3.2
Discuss the implications of the parameters β₀ and β₁ on the shape of the linear graph.
In Figure 6.1 (b) of Bowerman an increase in the values on the horizontal axis is accompanied by an
increase in the values of the vertical axis.
Knowledge of these graphs helps us understand the behaviour of the dependent variable. The knowledge we acquired in our school years also comes in handy here!
The quadratic trend may show either an increase or a decrease in the dependent variable. It is now
time to separate these two so that more details of each can be revealed.
Trend showing growth
The graphs showing growth are given in Figures 6.1 (d) and (e) on page 281 of the textbook. Growth
may occur at an increasing rate, which is shown in Figure 6.1 (d). It may also occur at a decreasing
rate, which is shown in Figure 6.1 (e).
Trend showing decline
The graphs showing decline are given in Figures 6.1 (f) and (g) on page 281 of the textbook. Decline
may occur at an increasing rate, which is shown in Figure 6.1 (f). It may also occur at a decreasing
rate, which is shown in Figure 6.1 (g).
A more general model is the pth-order polynomial trend, given by:

y_t = TR_t + ε_t = β₀ + β₁t + β₂t² + ... + β_p t^p + ε_t
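The trend forms discussed so far can be sketched as simple functions of t; the coefficient values below are made up purely for illustration:

```python
# Sketches of the trend functions TR_t (hypothetical coefficients).
def no_trend(t, b0=5.0):
    """No trend: TR_t = beta_0, a constant level."""
    return b0

def linear_trend(t, b0=5.0, b1=2.0):
    """Linear trend: TR_t = beta_0 + beta_1 * t."""
    return b0 + b1 * t

def quadratic_trend(t, b0=5.0, b1=2.0, b2=0.3):
    """Quadratic trend: TR_t = beta_0 + beta_1 * t + beta_2 * t^2."""
    return b0 + b1 * t + b2 * t ** 2

print([round(linear_trend(t), 2) for t in range(1, 4)])     # [7.0, 9.0, 11.0]
print([round(quadratic_trend(t), 2) for t in range(1, 4)])  # [7.3, 10.2, 13.7]
```

Higher-order polynomial trends extend the same pattern with additional powers of t.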
ACTIVITY 3.3
Write down the equation for the 3rd -order polynomial trend model.
ACTIVITY 3.4
How would you identify the violations of the assumptions?
DISCUSSION OF ACTIVITY 3.4
We know that the behaviour of the residuals indicates what we missed in the estimation. A horizontal band in the residual plot confirms the constant variance assumption. Fanning out indicates increasing variance and funnelling in shows decreasing variance. The normality assumption can be checked using normal plots. Apart from these, histograms and stem-and-leaf diagrams can reveal the normality pattern as well. We leave the discussion of the independence assumption for Section 6.2.
For the no-trend model, a 100(1 − α)% prediction interval for a future value is

[ ȳ − t^[n−1]_{α/2} s √(1 + 1/n),  ȳ + t^[n−1]_{α/2} s √(1 + 1/n) ],

where n is the sample size and t^[n−1]_{α/2} is found in a t-table (also available in other books at your disposal). Also,

s = √( Σ(y_t − ȳ)² / (n − 1) ).

We want the 95% prediction interval. What is the value of α? It is 0.05. Immediately we write n = 24 and read t^[23]_{0.025} = 2.069. Here

ȳ = 351.2917  and  s = √(26314.96/23) = 33.82497,

so the interval is

( 351.2917 − 2.069(33.82497)√(1 + 1/24),  351.2917 + 2.069(33.82497)√(1 + 1/24) ) = (279.8647, 422.7187).
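This worked 95% prediction interval can be reproduced as follows; the critical value 2.069 is taken from the t-table with 23 degrees of freedom, as above:

```python
import math

# No-trend 95% prediction interval: y_bar +/- t * s * sqrt(1 + 1/n).
n = 24
y_bar = 351.2917
s = math.sqrt(26314.96 / (n - 1))   # about 33.82497
t_crit = 2.069                       # t_{0.025} with n - 1 = 23 df

half_width = t_crit * s * math.sqrt(1 + 1 / n)
interval = (y_bar - half_width, y_bar + half_width)
print(tuple(round(v, 4) for v in interval))  # (279.8647, 422.7187)
```

The endpoints match the hand computation above to four decimals.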
In general, since t has taken the role of x, we can show that the least squares point estimates of β₁ and β₀, respectively, are:

b₁ = (nΣty_t − ΣtΣy_t)/(nΣt² − (Σt)²)  and  b₀ = (Σy_t − b₁Σt)/n.
Can you show these? Do you notice the equivalence with the equations on p.285 of Bowerman et
al. (2005)? In performing these calculations in detail we have:
Month t:     1    2    3    4    5    6    7    8    9   10   11   12
t²:          1    4    9   16   25   36   49   64   81  100  121  144
Cod catch: 197  211  203  247  239  269  308  262  258  256  261  288
t·y_t:     197  422  609  988 1195 1614 2156 2096 2322 2560 2871 3456

Month t:    13   14   15   16   17   18   19   20   21   22   23   24
t²:        169  196  225  256  289  324  361  400  441  484  529  576
Cod catch: 296  276  305  308  356  393  363  386  443  308  358  384
t·y_t:    3848 3864 4575 4928 6052 7074 6897 7720 9303 6776 8234 9216
The required sums are:

Σt = 300;  Σt² = 4900;  Σy_t = 7175;  Σty_t = 98973.

Then

b₁ = (nΣty_t − ΣtΣy_t)/(nΣt² − (Σt)²) = (24(98973) − (300)(7175))/(24(4900) − 300²) = 8.0743

and

b₀ = (Σy_t − b₁Σt)/n = (7175 − 8.0743(300))/24 = 198.0296.
We can forecast the cod catch for any future month in year 3, year 4 and so on. For example, the forecast for January of year 3 is the same as month t = 25 of the entire setup. Hence:

ŷ₂₅ = 198.0296 + 8.0743(25) = 399.8871.

Suppose that we want a forecast for May of the seventh year. This is the 77th month (t = 77) in the current model. Hence, the forecast is:

ŷ₇₇ = 198.0296 + 8.0743(77) = 819.7507.

The point forecasts for January (i.e. ŷ₂₅) and February (i.e. ŷ₂₆) of year 3 have been calculated on page 285 of Bowerman et al. (2005). Are you happy with the manner in which they are presented? For linear trend, quadratic trend and polynomials of higher order, point estimation is adequate.
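The slope, intercept and forecasts for the cod catch data can be checked as follows; because the arithmetic is carried at full precision, the results differ in the last decimals from the hand calculation above:

```python
# Linear trend fit for the 24 months of cod catch data, then forecasts.
y = [197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
     296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384]
t = list(range(1, 25))
n = len(y)

st, sy = sum(t), sum(y)
stt = sum(ti ** 2 for ti in t)
sty = sum(ti * yi for ti, yi in zip(t, y))

b1 = (n * sty - st * sy) / (n * stt - st ** 2)  # slope, about 8.0743
b0 = (sy - b1 * st) / n                          # intercept, about 198.03

forecast_25 = b0 + b1 * 25   # January of year 3
forecast_77 = b0 + b1 * 77   # May of year 7
print(round(b1, 4), round(b0, 4), round(forecast_25, 4))
```

The forecasts agree with the worked values to within rounding.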
For the loan request data, the columns t·y and (y − ŷ)² are:

t·y:  297  498  1020  1624  2320  2886  3843  4424  5004  6420  7370  8544
      10504  11326  13005  13680  16405  16578  18164  19800  21399  22462  23759  27048

with Σty = 258380.

(y − ŷ)²:  50606.1  31314.1  71801.49  111527.9  153631.1  167246.6  227488.9  231320.6
           234215.3  324852.1  357553.8  409546.2  541634.2  543107.1  631958.2  613023.2
           797374  720729.7  781381.7  842646.9  896729.5  900521.3  923440.3  1112936

with Σ(y − ŷ)² = 11676587. The other required sums are Σt = 300, Σt² = 4900 and Σy_i = 1729.
ACTIVITY 3.5
Determine forecasts for the loan requests for April of year 7.
The forecasts for January and February of year 3 have also been calculated in Bowerman. Are you
comfortable with the respective values given for the subscripts 25 and 26, i.e. y25 and y26 ?
First, being observations of the same variable, it is common for the time-ordered error terms to be autocorrelated. When this happens, it violates the regression assumption that error terms must be independent. Interestingly, there is an easy way to determine whether the error terms are autocorrelated and to determine the direction (negative or positive) of the autocorrelation. The pattern of autocorrelation has been discussed earlier.
[Three residual plots against time, illustrating different autocorrelation patterns.]
residuals (a): 2 7 4 3 9 14 4 1 5 3 1 2 5 3
residuals (b): 2 7 4 3 9 0 4 1 5 3 1 1 5 3
residuals (c): 2 7 4 3 9 14 4 1 5 3 1 2 5 3
Do these residuals confirm the verdicts in the above discussion? Later we will give a formula for the calculation so that you do not rely only on eye inspection. But before we get there, try the use of runs on page 290 of Bowerman et al. (2005).
A run is simply a sequence of residuals with the same sign. If the signs of successive residuals form long runs, we have positive autocorrelation. If the signs alternate, we have negative autocorrelation. Where neither pattern appears, there is a random pattern. This is the case where the assumption of independent errors is confirmed. The two cases of autocorrelation are undesirable since they violate the assumption.
Error-term behaviour of this kind is called first-order autocorrelation. We write AR(1) for this case and represent it by the equation ε_t = φ₁ε_{t−1} + a_t. Here we assume that:
φ₁ is the correlation coefficient between error terms separated by one time period; and
a₁, a₂, ... are values randomly and independently selected from a normal distribution having mean 0 and constant variance.
The Durbin–Watson statistic is

d = Σ_{i=2}^{n} (e_i − e_{i−1})² / Σ_{i=1}^{n} e_i².
Positive autocorrelation is the first of the three versions that we look at in the use of the DW statistic.
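The Durbin–Watson statistic is simple to compute. The sketch below applies it to the sixteen error terms of Activity 3.7, with the signs inferred from the squared successive differences shown in the worked table:

```python
# Durbin-Watson statistic:
# d = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2.
def durbin_watson(e):
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

# Error terms of Activity 3.7 (signs inferred from the worked table).
e = [2, 7, 4, -3, -9, -14, -4, 1, 5, 3, 1, -2, -5, -3, 1, 4]
print(round(durbin_watson(e), 4))  # 0.7359
```

Values of d near 2 suggest no autocorrelation; small values suggest positive autocorrelation and values near 4 suggest negative autocorrelation.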
ACTIVITY 3.7
Use the DW test to determine if the following residuals are positively AR(1). Assume that the model for the residuals was of the fourth power.

Error terms: 2, 7, 4, −3, −9, −14, −4, 1, 5, 3, 1, −2, −5, −3, 1, 4

i:                 1   2   3   4   5    6    7   8   9  10  11  12  13  14  15  16  Total
e_i:               2   7   4  -3  -9  -14   -4   1   5   3   1  -2  -5  -3   1   4
e_i²:              4  49  16   9  81  196   16   1  25   9   1   4  25   9   1  16   462
(e_i − e_{i−1})²:     25   9  49  36   25  100   25  16   4   4   9   9   4  16   9   340
DISCUSSION OF ACTIVITY 3.7
1. H₀: the error terms are not autocorrelated.
   Hₐ: the error terms are positively autocorrelated.
2. d = Σ(e_i − e_{i−1})²/Σe_i² = 340/462 = 0.7359.
3. We choose α = 0.01.
4. Since we assume that the model used was of the fourth power, and using α = 0.01, from Table A6 of Bowerman (page 599) we read off the values corresponding to row k = 4 and n = 16. These values are d_{L,0.01} = 0.53 and d_{U,0.01} = 1.66.
5. Since d_{L,0.01} = 0.53 < d = 0.7359 < d_{U,0.01} = 1.66, the test is inconclusive at the 1% level.
While we are discussing this activity, one realises that the choice of α can be an important factor. To illustrate this point, suppose in the above activity we chose α = 0.05. We would then have d_{L,0.05} = 0.74 and d_{U,0.05} = 1.93. The decision is to reject H₀ since d < d_{L,α}. Interesting! What do you think? In order to address the activity fully, the decision reached implies that at the 5% level of significance we conclude that the error terms are positively autocorrelated.
DISCUSSION OF EXAMPLE 6.5, BOWERMAN (page 292)
At the beginning the example is based on a linear trend model. Hence, for that part, k = 1. For the
second part a quadratic trend is assumed, hence there k = 2. Do you realise why for the two models
we have different values of DW? We note that a wrong decision may be made if the error terms are
calculated from an incorrect model. We proceed to the DW test for negative AR(1).
ACTIVITY 3.8
Use the DW test to determine if the following residuals are negatively AR(1).
Error terms: -2 -7 4 -3 9 0 -4 -1 5 -3 1 -1 5 3 -4 9 -4
DISCUSSION OF ACTIVITY 3.8
The five steps are followed as before. Here Σe_i² = 359 and Σ(e_i − e_{i−1})² = 992, so

d = Σ(e_i − e_{i−1})²/Σe_i² = 992/359 = 2.7632,

and since d > 2 we test for negative autocorrelation by comparing 4 − d = 1.2368 with the critical values d_{L,α} and d_{U,α}.
When you are required to test any hypothesis, show the steps you follow. This is the reason we formulated the steps formally for this test, to make it easy. There is a tendency for students to start with the statistics, then read off the table values and make a decision about a hypothesis that they did not state. When this happens, note that it is meaningless. It is a serious academic offence. No marks are awarded for it. You have been advised. We now move to the box on p. 294.
The pattern is similar for the three tests: the steps are the same for all three statistical hypothesis tests. There is one possibility where the test does not give a clue; we say that the test is inconclusive, or that the test fails.
ACTIVITY 3.9
Use the DW test to determine if the following residuals are positively or negatively AR(1). For argument's sake, assume that the model from which these residuals were derived was to the fifth power. Use α = 0.10.

Error terms: -2 7 -4 3 -9 14 -4 1 -5 3 -1 2 -5 3 -9 5 -2 7 -1
DISCUSSION OF ACTIVITY 3.9
1. H₀: the error terms are not autocorrelated.
   Hₐ: the error terms are autocorrelated (positively or negatively).
2. Here Σe_i² = 605 and Σ(e_i − e_{i−1})² = 2045, so d = 2045/605 = 3.3802.
3. α = 0.10. Because the test is two-sided, we use the table values for α/2 = 0.05.
4. With k = 5 and n = 19 we read off d_{L,0.05} and d_{U,0.05} from Table A6.
5. (a) Since d > d_{U,0.05}, there is no evidence of positive autocorrelation. Must we stop here?
   (b) Clearly, 4 − d = 0.6198 < d_{L,0.05}, so we reject H₀.

Conclusion: We reject H₀ and conclude that the error terms are negatively autocorrelated.
Apart from these, organisations need to know whether the seasonal variation they experience is more or less than the average rate.
Reasons for studying seasonal variation
There are four main reasons for studying seasonal variation.
1. The description of the seasonal effect provides a better understanding of the impact this component has upon a particular series.
2. After establishing the seasonal pattern, methods can be implemented to eliminate it from the time series in order to study the effect of other components, such as cyclical and irregular variations. This elimination of the seasonal effect is referred to as deseasonalising, or seasonal adjustment, of data.
3. To project past patterns into the future, knowledge of the seasonal variations is a must.
4. Prediction of the future trend.
Assumptions
A decision maker or analyst must select one of the following assumptions when treating the seasonal
component:
1. The impact of the seasonal component is constant from year to year.
2. The seasonal effect is changing slightly from year to year.
3. The impact of the seasonal influence is changing dramatically.
Seasonal Index
Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that indicates the percentage of an actual observation relative to what it would be if no seasonal variation were present in a particular period. It is attached to each period of the time series within a year. This implies that if monthly data are considered there are 12 separate seasonal indexes, one for each month, and 4 separate indexes for quarterly data. The following methods are used to calculate seasonal indices to measure the seasonal variation of time series data.
1. Method of simple averages
2. Ratio to trend method
3. Ratio-to-moving average method
4. Link relatives method
In this module you will be required to develop forecasts by focusing on only two of these methods, namely the method of simple averages and the ratio-to-moving-average method.
An example
Now let us try to understand the measurement of seasonal variation by using the Ratio-to-Moving
Average method. This technique provides an index to measure the degree of the seasonal variation
in a time series. The index is based on a mean of 100, with the degree of seasonality measured by
variations away from the base. For example if we observe the hotel rentals in a winter resort, we find
that the winter quarter index is 124. The value 124 indicates that 124 percent of the average quarterly
rental occurs in winter. If the hotel management records 1436 rentals for the whole of last year, then
the average quarterly rental would be 359(1436/4). As the winter-quarter index is 124, we estimate
the number of winter rentals as follows:
359 (124/100) = 445
In this example, 359 is the average quarterly rental, 124 is the winter-quarter index, and 445 the
seasonalised spring-quarter rental.
This method is also called the percentage moving average method. In this method, the original data values in the time series are expressed as percentages of moving averages. The steps and the tabulations are given below.
Steps
1. Find the centred 12-monthly (or 4-quarterly) moving averages of the original data values in the time series.
2. Express each original data value of the time series as a percentage of the corresponding centred moving average obtained in step (1). In other words, in a multiplicative time series model we get

(T × S × C × I)/(T × C) × 100 = (S × I) × 100,

which implies that the ratio-to-moving-average represents the seasonal and irregular components.
3. Arrange these percentages according to the months or quarters of the given years, and find the averages over all months or quarters of the given years.
4. If the sum of these indices is not 1200 (or 400 for quarterly figures), multiply them by a correction factor = 1200/(sum of monthly indices). Otherwise, the 12 monthly averages will be considered as seasonal indices.
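The four steps can be sketched in code, using the 2006–2009 quarterly series from the worked example that follows. Because this sketch carries the centred moving averages at full precision, the resulting indices may differ slightly from the rounded hand computation, though the adjusted indices still sum to 400:

```python
# Ratio-to-moving-average seasonal indices for quarterly data (steps 1-4).
y = [75, 60, 53, 59,  86, 65, 53, 59,  90, 72, 66, 85,  100, 78, 72, 93]

# Step 1: 4-quarter moving averages, then centred moving averages.
ma4 = [sum(y[i:i + 4]) / 4 for i in range(len(y) - 3)]
cma = [(ma4[i] + ma4[i + 1]) / 2 for i in range(len(ma4) - 1)]

# Step 2: original values as percentages of the centred moving averages
# (cma[i] is centred on observation y[i + 2]).
ratios = [y[i + 2] / cma[i] * 100 for i in range(len(cma))]

# Step 3: average the ratios by quarter; observation y[i + 2] falls in
# quarter (i + 2) % 4 (0 = Q1, ..., 3 = Q4).
by_quarter = [[], [], [], []]
for i, r in enumerate(ratios):
    by_quarter[(i + 2) % 4].append(r)
index = [sum(q) / len(q) for q in by_quarter]

# Step 4: rescale so the four indices sum to 400.
factor = 400 / sum(index)
adjusted = [s * factor for s in index]
print([round(v, 2) for v in adjusted])
```

The first quarter comes out as the strongest season and the third quarter as the weakest, in line with the worked example.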
Let us calculate the seasonal index by the ratio-to-moving average method from the following data:
Year / Quarter:    I    II   III   IV
2006:             75    60    53   59
2007:             86    65    53   59
2008:             90    72    66   85
2009:            100    78    72   93

Now the calculations of the 4-quarterly moving averages and the ratio-to-moving averages are shown in the table below. Let Q = Quarter, MA = Moving Average and CMA = Centred Moving Average.
Year  Q    y    4-quarter total   4 MA    4 CMA (T)   (y/T) × 100
2006  1    75         —             —         —           —
2006  2    60         —             —         —           —
2006  3    53        247          61.75     63.125       83.96
2006  4    59        258          64.50     65.125       90.60
2007  1    86        263          65.75     65.125      132.05
2007  2    65        263          65.75     65.125       99.81
2007  3    53        263          65.75     66.245       80.01
2007  4    59        267          66.75     67.62        87.25
2008  1    90        274          68.50     70.125      128.34
2008  2    72        287          71.75     75.00        96.00
2008  3    66        313          78.25     79.50        83.03
2008  4    85        323          80.75     81.50       104.29
2009  1   100        329          82.25     83.00       120.48
2009  2    78        335          83.75     84.75        92.04
2009  3    72        343          85.75       —           —
2009  4    93         —             —         —           —

For example, 63.125 = (61.75 + 64.50)/2 and 65.125 = (64.50 + 65.75)/2, where 61.75 = 247/4 and 64.50 = 258/4.

Arranging the ratio-to-moving-average values by quarter:

Year                      I        II       III       IV
2006                      —        —       83.96     90.60
2007                   132.05    99.81     80.01     87.25
2008                   128.34    96.00     83.03    104.29
2009                   120.48    92.04       —         —
Total                  380.87   287.81    247.00    282.14
Mean (seasonal index)  126.96    95.94     82.33     94.05
The total for the seasonal index is 126.96 + 95.94 + 82.33 + 94.05 = 399.28
Quater
Value
400
126.96 = 127.19
I
399.28
400
95.94 = 96.11
Adjusted seasonal index
II
399.28
400
82.33 = 82.48
III
399.28
400
94.05 = 94.22
IV
399.28
y
T
100
53
100 = 83.96
63.125
59
100 = 90.60
65.125
132.00
99.81
80.01
87.25
128.34
96.00
83.02
104.29
120.48
92.04
75
STA2604/1
Now the total of seasonal averages is 399.28. Therefore the corresponding correction factor would
be 400/399.28 = 1.0018. Each seasonal average is multiplied by the correction factor 1.0018 to get
the adjusted seasonal indices as shown in the above table.
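The steps of the ratio-to-moving-average method can be sketched in Python. This is a minimal illustration of the calculation, not textbook code; hand-rounded table entries may differ in the last decimal.

```python
# Ratio-to-moving-average seasonal indices for the quarterly data above.
data = [75, 60, 53, 59, 86, 65, 53, 59, 90, 72, 66, 85, 100, 78, 72, 93]
L = 4  # quarters per year

# Step 1: 4-quarter moving averages, then centre them.
ma = [sum(data[i:i + L]) / L for i in range(len(data) - L + 1)]
cma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# Step 2: express each observation as a percentage of its centred moving
# average; cma[0] is centred on the third observation (index 2).
ratios = [100 * data[i + 2] / cma[i] for i in range(len(cma))]

# Step 3: group the ratios by quarter and average them.
groups = {q: [] for q in range(L)}
for i, r in enumerate(ratios):
    groups[(i + 2) % L].append(r)
means = [sum(groups[q]) / len(groups[q]) for q in range(L)]

# Step 4: rescale so the four indices sum to 400.
factor = 400 / sum(means)
adjusted = [m * factor for m in means]
```

Running this reproduces the table: the first CMA is 63.125, the first ratio 83.96, and the adjusted indices sum to 400.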
Remarks
2. In a multiplicative time-series model, the seasonal component is expressed in terms of a ratio and a percentage as
Seasonal effect = (T × S × C × I)/(T × C × I) × 100 = Y/(T × C × I) × 100.
In practice, however, the time series is detrended to arrive at S × C × I. This is done by dividing the actual values Y by the trend values T.
3. The deseasonalised time-series data will have only the trend (T), cyclical (C) and irregular (I) components, and are expressed as:
(i) Multiplicative model: Y/S × 100 = (T × S × C × I)/S × 100 = (T × C × I) × 100.
(ii) Additive model: Y − S = (T + S + C + I) − S = T + C + I.
ACTIVITY 3.10
Identify the type of seasonal variation for a time series:
(a) that takes the shape of an increasing linear trend
(b) described by y = ax² + bx + c with a > 0, where the y-intercept is y = 1 and one root is x = 1.
DISCUSSION OF ACTIVITY 3.10
(a) Nothing indicates that the size of the seasonal swing changes with the level of the series, so this is constant seasonal variation. Look closely at Figure 6.1, p. 295 of the textbook, which is an example with this trend.
(b) You may plot a graph with these features. The description given is increasing seasonal variation. Figure 6.1 (d), page 281 of the textbook is an example.
Trigonometric functions can be used to approximate some seasonal time series. Earlier we said that a time series with constant seasonal variation is easy to work with. In this section we introduce dummy variables and discuss time series analysis where seasonal variation is modelled using dummy variables and trigonometric functions. The trigonometric functions include the cosine, sine, tangent, secant, cosecant and cotangent; the exposition here is limited to the sine and cosine functions, and they are not discussed in great depth.
A time series with trend and constant seasonal variation can be modelled as
yt = TRt + SNt + εt
where
yt = the observed value of the time series in period t
TRt = the trend in time period t
SNt = the seasonal factor in time period t
εt = the error term (irregular component) in time period t
ACTIVITY 3.11
What is the value of TRt, the trend, for a time series with no trend? Write down the above equation when there is no trend.
Now, the error term εt is a random variable. We assume that it satisfies the usual regression assumptions: the error terms have constant variance and are independently and identically distributed (IID) according to a normal distribution. There is also a further implication that the magnitude of the seasonal swing is independent of the trend. Let trt and snt be estimates of TRt and SNt, respectively. Then the estimate of yt is:
ŷt = trt + snt
Seasonality is a somewhat complex part of a time series. In the next section we use dummy variables to model seasonality.
In the dummy-variable approach, the seasonal factor is written as
SNt = βs1 xs1,t + βs2 xs2,t + · · · + βs(L−1) xs(L−1),t
where the constants βs1, βs2, ..., βs(L−1) are the seasonal parameters, L is the number of seasons, and
xsj,t = 1 if time period t is season j, and 0 otherwise.
NOTEWORTHY POINTS
One of the season parameters has to be set to 0, and not necessarily the last one; it is simply most convenient to set the last one, as we did. If we fail to set one of them to zero, least squares estimation may prove complex or require an unusual approach.
The dummy-variable model is appropriate for time series that display constant seasonal variation.
With the estimates complete, the model we use to forecast the averages for January is:
ŷt = b0 + b1 t + b2 xs1,t = 6.28756 + 0.00276 t − 0.04161
Now, for January of the 15th year, note that 14 years run from January of year 1 to December of year 14, which makes t = 168 months (14 × 12 months). Thus, for January of year 15 we have t = 169. The required forecast is therefore:
ŷ169 = 6.28756 + 0.00276 (169) − 0.04161 = 6.71239
Since these were log-transformed data, the required value is y169 = e^6.71239 = 822.534. It is as simple as that.
Similarly, a forecast for time period 79, which falls in the third season, takes the form:
ŷ79 = b0 + b1 (79) + b3
ACTIVITY 3.12
Consider the model:
yt = 5 − 2M1 + 4M2 + 3M3 − 14M4 + M5 + εt
where M1, ..., M5 are seasonal dummy variables. Write down a point forecast of y17.
For t = 17 the season is the fifth, so M5 = 1 and all other dummies are 0. Hence
ŷ17 = 5 + 1 = 6
Note that the trend was given by TRt = 5. This is a constant, which effectively means there is no trend. We next look at the use of trigonometric functions.
Seasonality may also be modelled as
yt = TRt + f(t) + εt
where
f(t) = an expression of trigonometric functions at time t.
A useful form is
yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + εt.
You may experiment with various values of t, but attend to the next exercise.
ACTIVITY 3.13
Simplify the model when:
(i) t = L
(ii) t = L/2
A model that can capture a more complex seasonal pattern adds terms of higher frequency:
yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + β4 sin(4πt/L) + β5 cos(4πt/L) + εt
ACTIVITY 3.14
Simplify the model when:
(i) t = L
(ii) t = L/2
(iii) t = L/4
Since the decomposition of a time series may be multiplicative, we may end up with a form of a time series that resembles a growth curve, such as yt = β0 β1^t εt. The question is how to handle it. Since linear forms are easier to handle, we transform such models to linear forms. If we assume that the parameters are positive, we can transform the growth model by taking logarithms:
ln yt = ln β0 + (ln β1) t + ln εt
This is a familiar (linear) form once you understand how the transformation is done. We may work with the transformed data and reverse the answers using the inverse of the transformation used.
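The transformation can be checked numerically. Here a noiseless growth curve with made-up parameters is fitted by ordinary least squares on the log scale, and the original parameters are recovered by exponentiating:

```python
import numpy as np

# Growth model y_t = b0 * b1**t; taking logs makes it linear in t.
t = np.arange(1, 21)
y = 5.0 * 1.08 ** t                              # illustrative: b0 = 5, b1 = 1.08

slope, intercept = np.polyfit(t, np.log(y), 1)   # ln y = ln b0 + (ln b1) t
b1_hat, b0_hat = np.exp(slope), np.exp(intercept)
```

Because the data are exactly log-linear, the fit recovers b0 = 5 and b1 = 1.08 almost exactly; with noisy data the estimates would only approximate them.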
ACTIVITY 3.15
Write a model for AR(2), a second-order autoregressive model.
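For intuition about what such a process looks like, an AR(2) recursion can be simulated in a few lines; the φ values below are illustrative and a_t is Gaussian white noise:

```python
import random

random.seed(1)
phi1, phi2 = 0.5, 0.3          # illustrative AR(2) coefficients
v = [0.0, 0.0]                 # two starting values
for _ in range(200):
    # v_t = phi1 * v_{t-1} + phi2 * v_{t-2} + a_t
    v.append(phi1 * v[-1] + phi2 * v[-2] + random.gauss(0, 1))
```

Each value depends on the previous two values plus a random shock, which is exactly the structure the activity asks you to write down.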
3.9 Conclusion
This unit introduced important aspects of time series. It used graphical plots to demonstrate some patterns and incorporated applications of estimation in parts of the unit. Trend and seasonality were discussed. The AR(1) process was also introduced, and the Durbin-Watson (DW) statistic was used to detect autocorrelation of the AR(1) process. Two types of seasonal variation, constant and increasing, were discussed, as were dummy variables and trigonometric functions. Growth models were introduced. We are ready for the next unit.
UNIT 4: Decomposition of a Time Series

Outcomes: deseasonalise a time series; forecast future values of a time series
Assessment: describe, or unpack, a series; determine moving averages; determine seasonal indices; develop forecasts
Content: decompose a time series; MA and trend analysis; MA and seasonal indices; centred MA
Activities: work out exercises; describe the data trend; isolate trend and seasonality; incorporate the trend in the forecast
Feedback: discuss the activities
4.1 Introduction
This unit continues with concepts introduced in earlier chapters. The components of a time series should now be at your fingertips. Are they? This unit deals with the decomposition of a time series, which aims to isolate the influence of each of the components on the actual time series. It is presented as Chapter 7 in the prescribed textbook.
Decomposition is an important technique for all types of time series, especially for seasonal adjustment. It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original time series by addition or multiplication), where each of these has a certain characteristic or type of behaviour.
ACTIVITY 4.1
State the components of a time series.
DISCUSSION OF ACTIVITY 4.1
You can find the answer on page 325 of Bowerman if you have forgotten. The components into which a time series can be decomposed are:
the trend component Tt, which reflects the long-term progression of the series
the cyclical component Ct, which describes repeated but non-periodic fluctuations, possibly caused by economic or other conditions
the seasonal component St, which reflects variation that repeats over a fixed period such as a year
the irregular component It, which captures random, unsystematic fluctuations
When a time series exhibits increasing seasonal variation, it is represented in multiplicative form. Statistical analysis is useful for effective isolation and analysis of the trend and the seasonal components. Hence, we will examine statistical approaches to quantifying trend and seasonal variation. These are the components that usually account for a significant proportion of the actual values in a time series, so isolating them is an opportunity to explain the actual time series values.
The term trend may be understood as the tendency or overall behaviour of something observed over the long term. In a nutshell, trend analysis refers to collecting information and attempting to spot a pattern, or trend, in it. In some fields of study, the term trend analysis has more formally defined meanings.
For example, in project management, trend analysis is a mathematical technique that uses historical results to predict future outcomes. This is achieved by tracking variances in cost and schedule performance. In this context, it is a project-management quality-control tool.
Although trend analysis is often used to predict future events, it can also be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average number of years for which other known kings reigned.
Moving Average (MA)
A MA is a successive averaging of groups of observations, as explained in Bowerman. The number n of observations averaged in each group must be the same throughout; it is determined by the number of periods that span the short-term fluctuations. A MA removes the short-term fluctuations in a time series, and thereby smooths it.
We explain the procedure for a 3-period MA. The following steps are involved:
Add the observations for the first three periods and find their average. Place the answer opposite the middle (second) period.
Remove the observation for the earliest period and replace it by the fourth observation. Obtain the new average and place it opposite its own middle period.
Repeat the process until you do not have enough observations to produce a MA of three periods.
Note that the above illustration used a case where you can place the MA next to a middle observation. The same is easy when a 5-period MA is needed, or a 7-period one; that is, for an odd-numbered MA we will not struggle to place the MA in the middle. There are practical cases where we need to use a 2-period MA, 4-period MA, and so on. Study the examples in Bowerman.
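The steps above can be expressed directly in Python; this small helper is an illustration, applied to the data used in the activity that follows:

```python
def moving_average(series, k=3):
    """Successive k-period averages; entry i is centred on period i + k // 2."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

# Three days of observations (morning, afternoon, evening):
sales = [170, 140, 230, 176, 152, 233, 182, 161, 242]
ma3 = moving_average(sales)
# ma3 == [180.0, 182.0, 186.0, 187.0, 189.0, 192.0, 195.0]
```

Each entry averages one observation with its two neighbours, so the first and last periods have no moving average, exactly as in the hand calculation.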
ACTIVITY 4.2
Consider the following data:
170, 140, 230, 176, 152, 233, 182, 161, 242
They were collected over three days for the regular time periods 8–12 noon, 12–4 p.m. and 4–8 p.m. Calculate appropriate moving averages and explain the trend of the data.
DISCUSSION OF ACTIVITY 4.2

        Period      Value   Average
Day 1   Morning     170
        Afternoon   140     540/3 = 180
        Evening     230
Day 2   Morning     176
        Afternoon   152     561/3 = 187
        Evening     233
Day 3   Morning     182
        Afternoon   161     585/3 = 195
        Evening     242

The average for each day has been placed opposite the midpoint of that day, i.e. the afternoon period. We need a trend figure for every period, not just for the afternoons; these are not yet properly moving averages. We make them move by removing the oldest observation and replacing it with the newest. The table becomes:
        Period      Value   Moving average = trend
Day 1   Morning     170     –
        Afternoon   140     180
        Evening     230     182
Day 2   Morning     176     186
        Afternoon   152     187
        Evening     233     189
Day 3   Morning     182     192
        Afternoon   161     195
        Evening     242     –
Now, we answer the question about the trend. We note that the MAs are clearly increasing. This
simply informs us that on average, the above observations are increasing. Hence, we have an
increasing trend.
Seasonal variation (Actual less Trend)
To estimate the seasonal deviations:
Subtract the trend from each actual observation (actual less trend)
Group the deviations according to season
Average the deviations in the groups
ACTIVITY 4.3
Use the data in the previous activity to estimate seasonal deviations.
DISCUSSION OF ACTIVITY 4.3

        Period      Actual − Trend
Day 1   Morning     –
        Afternoon   −40
        Evening     +48
Day 2   Morning     −10
        Afternoon   −35
        Evening     +44
Day 3   Morning     −10
        Afternoon   −34
        Evening     –

Due to random influences, values for the same periods differ, but it is clear that there is a common pattern. For example, the afternoon values of Actual − Trend are similar in size and sign (−40, −35, −34). The same is true of the evening figures of +48 and +44; and, probably because the sample size is very small, the morning figures are both exactly −10. If in our analysis this pattern fails to be visible, we either check our calculations, or we may have picked the wrong number of periods over which to average in the first place.
In order to eliminate the random effect, we follow the instructions in the second and third bullets above. That is, we collect together the Actual − Trend deviations corresponding to each period of the day and find the average variation for each period. See the next table:
Step 2     Morning   Afternoon   Evening
Day 1      –         −40         +48
Day 2      −10       −35         +44
Day 3      −10       −34         –
Total      −20       −109        +92
Average    −10       −36         +46

The figures −10, −36 and +46 are called the seasonal variations for the morning, afternoon and evening periods. (The afternoon average, −109/3 = −36.33, has been rounded to −36.) We have now isolated the trend and seasonal effects present in the time series. Knowledge of seasonal effects is important for forecasting, as well as for removing strong seasonal effects that may conceal other important features or movements in a data set.
Random variations are essentially unpredictable and therefore of no direct use in forecasting, but their size serves as a guide to the reliability of a forecast. When the random influences are very small, a process is likely to produce reliable forecasts, while large fluctuations may completely upset even carefully calculated forecasts.
For our data, the expected value for each period is the trend plus the seasonal variation, and the random variation is the difference between the actual and expected values:

Actual   Expected (Trend + Seasonal)   Random
140      180 − 36 = 144                −4
230      182 + 46 = 228                +2
176      186 − 10 = 176                0
152      187 − 36 = 151                +1
233      189 + 46 = 235                −2
182      192 − 10 = 182                0
161      195 − 36 = 159                +2

The random variations are very small, the largest in magnitude being −4, which is 4/180 or about 2 percent of the corresponding trend figure. This means that any forecast obtained from this analysis may be expected to be reasonably reliable. The data fit the pattern of trend plus seasonal variation reasonably well. Once these issues are in order, we need to develop forecasts; the next section is dedicated to forecasting.
A forecast for period t + k is built from the trend and the seasonal adjustment:
ŷ(t+k) = TR(t+k) + SNadj
where
ŷ(t+k) = the required forecast for period t + k
TR(t+k) = the trend for the period in question
SNadj = the appropriate seasonal adjustment
Thus, if a forecast is required for the afternoon of day 4, it consists of two parts, namely:
Forecast = trend for day 4 afternoon + seasonal adjustment for the afternoon period
Inspection of the trend figures shows that between Day 1 afternoon and Day 3 afternoon the trend rose from 180 to 195. This is an increase of 15 over the lapse of 6 periods; on average, therefore, the increase has been 2.5 units per period. We will assume that this rate continues to apply, at least for the next few periods.
Extending this increase gives 197.5 as the trend figure for the evening of day 3, 200 for the morning of day 4, and 202.5 for the afternoon of day 4. Applying the afternoon seasonal variation of −36 produces a forecast of 202.5 − 36 = 166.5, or 167 to the nearest unit.
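The arithmetic of that forecast can be reproduced in a few lines, as a sketch of the trend-plus-seasonal-adjustment calculation:

```python
# Trend rose from 180 (Day 1 afternoon) to 195 (Day 3 afternoon):
# an increase of 15 over 6 periods, i.e. 2.5 units per period.
slope = (195 - 180) / 6

# Day 4 afternoon is 3 periods after Day 3 afternoon.
trend_day4_pm = 195 + 3 * slope          # 202.5
seasonal_pm = -36                        # afternoon seasonal variation
forecast = trend_day4_pm + seasonal_pm   # 166.5, i.e. 167 to the nearest unit
```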
ACTIVITY 4.4
The following data represent sales of pies (in thousands) in the various quarters of three years. Analyse the data to isolate trend and seasonality, and develop some forecasts for illustration.

Year    Quarter 1   Quarter 2   Quarter 3   Quarter 4
2001    142         54          162         206
2002    130         50          174         198
2003    126         42          162         186
DISCUSSION OF ACTIVITY 4.4
Since the data are quarterly, we use a 4-quarter moving average. The first moving average is (142 + 54 + 162 + 206)/4 = 141. This figure obviously belongs to the middle of the first year, which falls halfway between the second and third quarters, and the same is true of the other moving averages. We therefore need an additional step, called centring the moving averages; its results are the centred moving averages (CMAs).
The moving average 141 applies to a point halfway between the second and third quarters of 2001, while the figure 138 applies midway between the third and fourth quarters. We can obtain a moving average directly comparable with the third quarter by taking the average of 141 and 138, which is 139.5. Doing the same for all moving averages, we obtain:
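Centred moving averages for quarterly data can be computed as below, a minimal sketch using the pie-sales figures:

```python
def centred_ma4(series):
    """4-period moving averages, then average adjacent pairs to centre them."""
    ma = [sum(series[i:i + 4]) / 4 for i in range(len(series) - 3)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

pies = [142, 54, 162, 206, 130, 50, 174, 198, 126, 42, 162, 186]
cma = centred_ma4(pies)
# cma == [139.5, 137.5, 138.5, 139.0, 137.5, 136.0, 133.5, 130.5]
```

The first centred value, 139.5, lines up with the third quarter of 2001, matching the hand calculation.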
Year   Q   Actual    4-Q MA   CMA (trend)   Actual − trend
2001   1   142.00
       2   54.00
                     141.00
       3   162.00             139.50        +22.50
                     138.00
       4   206.00             137.50        +68.50
                     137.00
2002   1   130.00             138.50        −8.50
                     140.00
       2   50.00              139.00        −89.00
                     138.00
       3   174.00             137.50        +36.50
                     137.00
       4   198.00             136.00        +62.00
                     135.00
2003   1   126.00             133.50        −7.50
                     132.00
       2   42.00              130.50        −88.50
                     129.00
       3   162.00
       4   186.00
We average the quarterly variations and round them so that they add to zero.

Year                 Quarter 1   Quarter 2   Quarter 3   Quarter 4
2001                 –           –           22.5        68.5
2002                 −8.5        −89.0       36.5        62.0
2003                 −7.5        −88.5       –           –
Total                −16         −177.5      59          130.5
Average              −8          −88.75      29.5        65.25
Seasonal (rounded)   −8          −88         +30         +66
Let Q = quarter and CMA = centred moving average; then, using the rounded seasonal variations −8, −88, +30 and +66 (e.g. expected 2001 Q3 = 139.5 + 30 = 169.5), we complete the table thus:

Year   Q   Actual    CMA (trend)   Expected   Random
2001   1   142.00
       2   54.00
       3   162.00    139.50        169.5      −7.5
       4   206.00    137.50        203.5      +2.5
2002   1   130.00    138.50        130.5      −0.5
       2   50.00     139.00        51.0       −1.0
       3   174.00    137.50        167.5      +6.5
       4   198.00    136.00        202.0      −4.0
2003   1   126.00    133.50        125.5      +0.5
       2   42.00     130.50        42.5       −0.5
       3   162.00
       4   186.00
The greatest random variation is −7.5, only about 5% of the corresponding trend value of 139.5. Hence the data fit the model well, and the analysis should provide reliable forecasts.
From the second quarter of 2002 there has been a steady downward trend. From then to the latest trend figure available (the second quarter of 2003), the trend declined from 139.0 to 130.5; that is, over four quarters, or one year, it decreased by 8.5. In the first quarter of 2003 the trend value was 133.5. If we assume that the annual decrease is going to persist, at least for a while, then we would expect the trend in the first quarter of 2004 to be 133.5 − 8.5 = 125.
We now adjust this to allow for the fact that the first (spring) quarter is, on average, 8 below trend, giving a final forecast of 125 − 8 = 117 for the first quarter of 2004.
EXERCISES
(a) The following figures show the weekly demand at an electrical repair workshop for a certain type of connector over a 10-week period:

Week number        1    2    3    4    5    6    7    8    9    10
Number demanded    27   23   23   25   26   29   25   21   22   24

Calculate appropriate moving averages and use them to forecast the demand for week 11.
(b) Consider the average number of calls received per day at a Computer Club Warehouse (CCW) call centre for the past three years, shown in the next table. The pattern of the call volumes can be of help in the analysis. What can one observe by merely looking at these figures? Perform an appropriate time series analysis.

Year   Quarter   Call volume
1      1         6809
1      2         6465
1      3         6569
1      4         8266
2      1         7257
2      2         7064
2      3         7784
2      4         8724
3      1         6992
3      2         6822
3      3         7949
3      4         9650
In the familiar notation, the book explains that the centred moving average is an estimate of TRt + CLt, the combined trend and cyclical components. In the last section we calculated some moving averages for the pie-sales data, reproduced here by season:

Year   Season   Number sold
2001   Spring   142
       Summer   54
       Autumn   162
       Winter   206
2002   Spring   130
       Summer   50
       Autumn   174
       Winter   198
2003   Spring   126
       Summer   42
       Autumn   162
       Winter   186
4.4 Conclusion
Decomposition was done in multiplicative and additive forms, and experimentation was then done with various data sets. As we saw, it is straightforward use of the concepts as defined. Some additional notes covering explanations in the book have also been provided; we hope they make the work easier. What is left is to continue practising: we encourage you to do more exercises in the prescribed book to get used to the methods.
Assessment: analyse data; perform simple exponential smoothing; explore data with various smoothing constants; monitor the forecasting system; measure the strength of forecasts; know various smoothing approaches; determine the aptness of various methods; forecast future values of a time series; develop forecast values
Content: exponential smoothing; smoothing constants; simple exponential smoothing; tracking signals; Holt's trend corrected smoothing; the Holt-Winters methods; the damped trend method
Activities: perform appropriate calculations; interpret data; calculate the statistics and interpret them; perform apt calculations for each method
Feedback: discuss likely errors; explain alternative methods; discuss the solutions
5.1 Introduction
Changing trend and seasonality of a time series over time make forecasting difficult to undertake. This is when exponential smoothing becomes useful. Exponential smoothing is presented in Chapter 8 of Bowerman. Smoothing constants are used to smooth a rough time series. In this module we study various smoothing methods, and a tracking method to monitor the process. The methods are simple exponential smoothing, Holt's trend corrected exponential smoothing, the Holt-Winters methods, and damped trend exponential smoothing.
A common way to characterise exponential smoothing is as a technique that can be applied to time series data, either to produce smoothed data for presentation or to develop forecasts. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Different smoothing techniques are available, as presented in this unit, each for a specific purpose. For example, a simple moving average is one in which the past observations are weighted equally, while exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements.
Simple exponential smoothing (SES) is used for forecasting when there is no trend or seasonal pattern and the mean of the time series remains constant. The underlying model is then
yt = β0 + εt.
In some statistics textbooks this equation is written as yt = μ + εt. The least squares point estimate of the mean β0 is b0 = ȳ, where
ȳ = (1/n) Σ_{t=1}^{n} yt.
Do you remember this formula? Equal weights are given to each observation, since
ȳ = (1/n) Σ_{t=1}^{n} yt = Σ_{t=1}^{n} (1/n) yt.
When the mean changes over time, we require a model that describes the data more suitably, with estimates of the mean that may change from one time period to the next. SES is one such method: it does not use equal weights. Instead, more recent observations are given more weight.
ACTIVITY 5.1
Indicate True or False for each of the following statements about SES. In the case of False, correct the statement; justify the correct statements.
(1) The estimate of the mean is constant.
(2) The estimate of the mean changes over time.
(3) The oldest observations receive the most weight.
(4) The newest observations receive the average weight.
DISCUSSION OF ACTIVITY 5.1
(1) The estimate of the mean is constant.
False. SES is used precisely because its estimate of the mean changes, to suit a no-trend model whose mean changes slowly over time.
(2) The estimate of the mean changes over time.
True. The formulation of SES is such that it caters for a mean that changes over time.
(3) The oldest observations receive the most weight.
False. The oldest observations receive the least weight.
(4) The newest observations receive the average weight.
False. They receive the largest weights.
We now release you from suspense and define SES formally. Let y1, y2, ..., yn be a time series with a mean that changes slowly over time but has neither a trend nor a seasonal pattern. Then the estimate of the level (or mean) of the time series in period T is:
ℓT = α yT + (1 − α) ℓT−1
where
α = the smoothing constant, with 0 < α < 1
ℓT−1 = the estimate of the level in period T − 1.
The value of α determines the degree of smoothing and how responsive the model is to fluctuations in the time-series data. This value is arbitrary and is determined both by the nature of the data and by the sensitivity of the forecaster as to what constitutes a good response rate. A smoothing constant close to zero leads to a stable model, while a constant close to one is highly reactive. Typically, constant values between 0.01 and 0.3 are used.
Let us illustrate with data we have seen before, in order to feel comfortable at this early stage of SES exploration.

Month t:     1    2    3    4    5    6    7    8    9    10   11   12
Cod catch:   362  381  317  297  399  402  375  349  386  328  389  343

Month t:     13   14   15   16   17   18   19   20   21   22   23   24
Cod catch:   276  334  394  334  384  314  344  337  345  362  314  365
If you recall, the plots for these data showed neither trend nor seasonality. The initial estimate of the level is ℓ0, the mean of the first year's observations:
ℓ0 = (1/12) Σ_{t=1}^{12} yt = (1/12)(362 + 381 + 317 + · · · + 343) = (1/12)(4328) = 360.6667
In order to illustrate, we use α = 0.1. Does it satisfy the given restriction? We now determine the levels from these data:
ℓ1 = α y1 + (1 − α) ℓ0
ℓ2 = α y2 + (1 − α) ℓ1
and so on, up to ℓ24.
Do you remember the forecast errors? These are shown in Figure 8.1 of Bowerman et al. (2005: 348). In SES, a point forecast made at time T of any future value yT+τ is simply the last estimate ℓT of the level of the time series. Why should it be like this? Because there is no trend or seasonal pattern to project, a point forecast made in time period T for yT+τ is:
ŷT+τ = ℓT   (τ = 1, 2, 3, ...)
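The SES recursion is easy to run on the cod-catch series. This sketch uses α = 0.1 and initialises ℓ0 as the mean of the first 12 observations:

```python
catch = [362, 381, 317, 297, 399, 402, 375, 349, 386, 328, 389, 343,
         276, 334, 394, 334, 384, 314, 344, 337, 345, 362, 314, 365]
alpha = 0.1

level = sum(catch[:12]) / 12          # initial level l0 = 360.6667
levels = []
for y in catch:
    level = alpha * y + (1 - alpha) * level   # l_T = a*y_T + (1 - a)*l_{T-1}
    levels.append(level)

point_forecast = levels[-1]   # the forecast of every future value y_{T+tau}
```

Because SES projects no trend or seasonality, the single number levels[-1] serves as the point forecast for all future periods.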
ACTIVITY 5.2
Write down the point forecast made in time period t − 1 of the value yt.
DISCUSSION OF ACTIVITY 5.2
Because there is no trend and no seasonal pattern, we have ŷt(t − 1) = ℓt−1.
We dealt with standard errors and the sum of squared errors (SSE) in earlier chapters. The current version is that the standard error at time T is:
s = sqrt( SSE/(T − 1) ) = sqrt( Σ_{t=1}^{T} (yt − ℓt−1)² / (T − 1) )
For any τ, a 95% prediction interval computed in time period T for yT+τ is:
[ ℓT − z0.025 s sqrt(1 + (τ − 1)α²) ;  ℓT + z0.025 s sqrt(1 + (τ − 1)α²) ]
ACTIVITY 5.3
Write down the formula for a 95% prediction interval computed in time period T for yT+τ when:
(i) τ = 1
(ii) τ = 2.
Returning to the cod-catch example, the level in month 24 is
ℓ24 = α y24 + (1 − α) ℓ23 = 354.5400.
For prediction intervals we need the value of the standard error. Can you show that s = 34.95? Now verify the given 95% prediction intervals.
The example further assumes a new observation, y25 = 384. The calculations for ℓ25 and the forecasts ŷ25+τ must then be done anew:
ℓ25 = α y25 + (1 − α) ℓ24
The point forecast made in month 25 of the cod catch in month 26 and all later months is:
ŷ25+τ = ℓ25 = 355.5416
ACTIVITY 5.4
Write down the model
ℓT = α yT + (1 − α) ℓT−1
in terms of ℓT−1 and the forecast error yT − ℓT−1.
DISCUSSION OF ACTIVITY 5.4
ℓT = α yT + (1 − α) ℓT−1
= α yT + ℓT−1 − α ℓT−1
= ℓT−1 + α (yT − ℓT−1)
This form is called the error correction form. We move to the next section.
The simple cusum (cumulative sum) tracking signal is based on the sum of the forecast errors:
Y(α, T) = Σ_{t=1}^{T} et(α).
ACTIVITY 5.5
Determine the sum of the forecast errors for T = 24 using Figure 8.1 on page 348 of Bowerman.
DISCUSSION OF ACTIVITY 5.5
The forecast errors are in column E of that figure.
ACTIVITY 5.6
Show that:
Y(α, T) = Y(α, T − 1) + eT(α).
DISCUSSION OF ACTIVITY 5.6
Y(α, T) = Σ_{t=1}^{T} et(α) = Σ_{t=1}^{T−1} et(α) + eT(α) = Y(α, T − 1) + eT(α)
ACTIVITY 5.7
Verify the above equation using the data in Bowerman given in Figure 8.1.
The mean absolute deviation of the errors is
MAD = ( Σ_{t=1}^{T} |et| ) / T,   so that   MAD(α, T) = ( Σ_{t=1}^{T} |et(α)| ) / T.
The simple cusum tracking signal is the ratio
C(α, T) = Y(α, T) / MAD(α, T).
If C(α, T) is large, then the sum of forecast errors Y(α, T) is large relative to the mean absolute deviation MAD(α, T). This means that the forecasting system produces errors that are either consistently positive or consistently negative; that is, forecasts that are consistently smaller or consistently larger than the actual time series values. If the forecasting system is accurate, it should produce (at least approximately) an equal number of negative and positive errors. Thus, a large C(α, T) indicates that the forecasting system does not perform accurately. Note that we have still not quantified what a large value of C(α, T) means; there are no hard and fast rules for it, and a threshold will be given with every situation.
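Given a list of one-step forecast errors, the cusum, the MAD and the tracking signal are one line each; the error values here are hypothetical:

```python
errors = [3.0, -1.5, 2.0, 4.0, 1.0, -0.5, 2.5]   # hypothetical forecast errors

Y = sum(errors)                                   # cusum of the errors
MAD = sum(abs(e) for e in errors) / len(errors)   # mean absolute deviation
C = Y / MAD                                       # simple cusum tracking signal
```

Here most errors are positive, so the cusum grows and C is well above zero, signalling forecasts that are consistently too low.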
ACTIVITY 5.8
Determine the simple cusum tracking signal using the data in Figure 8.1 in Bowerman et al. (2005:
348). Suppose that the forecasting system will be considered accurate if the value of the simple
cusum tracking signal is below 255 in absolute value. Do you think that the forecasting system
needs to be improved?
Growth rate
For a time series whose level follows the linear trend β0 + β1 t, the change β1 per period, regardless of whether it is an increase or a decrease, is called the growth rate. Holt's trend corrected exponential smoothing is appropriate when both the level and the growth rate are changing; in that case a linear trend model is not useful. For Holt's trend corrected exponential smoothing, let ℓT−1 denote the estimate of the level in time period T − 1 and bT−1 the corresponding estimate of the growth rate. If we observe a new time series value yT in time period T, these two estimates are updated by two smoothing equations.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = α yT + (1 − α)(ℓT−1 + bT−1)
The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:
bT = γ (ℓT − ℓT−1) + (1 − γ) bT−1
A point forecast made in time period T of yT+τ is:
ŷT+τ = ℓT + τ bT   (τ = 1, 2, 3, ...)
The standard error at time T is:
s = sqrt( SSE/(T − 2) ) = sqrt( Σ_{t=1}^{T} [yt − (ℓt−1 + bt−1)]² / (T − 2) )
For τ = 1, a 95% prediction interval computed in time period T for yT+1 is:
[ (ℓT + bT) − z0.025 s ;  (ℓT + bT) + z0.025 s ]
In general, for τ ≥ 2, a 95% prediction interval computed in time period T for yT+τ is:
[ (ℓT + τ bT) − z0.025 s sqrt(1 + Σ_{j=1}^{τ−1} α²(1 + jγ)²) ;  (ℓT + τ bT) + z0.025 s sqrt(1 + Σ_{j=1}^{τ−1} α²(1 + jγ)²) ]
ACTIVITY 5.10
Write down the formula for a 95% prediction interval computed in time period T for yT+τ when:
(i) τ = 2
(ii) τ = 3.
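One update of Holt's equations, with made-up starting values and smoothing constants, looks like this:

```python
alpha, gamma = 0.2, 0.1        # illustrative smoothing constants
level, growth = 100.0, 2.0     # previous estimates l_{T-1} and b_{T-1}
y_new = 105.0                  # newly observed value y_T

new_level = alpha * y_new + (1 - alpha) * (level + growth)       # 102.6
new_growth = gamma * (new_level - level) + (1 - gamma) * growth  # 2.06
forecast_2_ahead = new_level + 2 * new_growth                    # l_T + 2*b_T
```

Unlike SES, the point forecast now grows by the estimated growth rate for each additional period ahead.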
In order to handle a model with both trend and seasonality, it is easier to analyse the trend and the seasonal component separately. The seasonal component can also be handled using dummy variables if necessary. The additive Holt-Winters method is appropriate when a time series has a linear trend with an additive seasonal pattern for which the level, the growth rate and the seasonal pattern may be changing. Implementation of the method starts with estimates of the level, the growth rate and the seasonal factor. Let ℓT−1 denote the estimate of the level in time period T − 1 and bT−1 the estimate of the growth rate in time period T − 1. Suppose that we observe a new observation yT in time period T, and let snT−L be the latest estimate of the seasonal factor for the season corresponding to time period T. As before, L is the number of seasons. The subscript T − L of snT−L reflects that the time series value in time period T − L is the most recent value observed in the season being analysed; thus, this most recent value was used in determining snT−L.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = α (yT − snT−L) + (1 − α)(ℓT−1 + bT−1)
where (yT − snT−L) is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:
bT = γ (ℓT − ℓT−1) + (1 − γ) bT−1
The new estimate snT of the seasonal factor in time period T uses the smoothing constant δ and is:
snT = δ (yT − ℓT) + (1 − δ) snT−L
where (yT − ℓT) is the detrended observation. A point forecast made in time period T of yT+τ is:
ŷT+τ = ℓT + τ bT + snT+τ−L   (τ = 1, 2, 3, ...)
where snT+τ−L is the most recent estimate of the seasonal factor for the season corresponding to time period T + τ.
A 95% prediction interval computed in time period T for yT+τ is:
[ ŷT+τ(T) − z0.025 s sqrt(cτ) ;  ŷT+τ(T) + z0.025 s sqrt(cτ) ]
where
cτ = 1                                                     for τ = 1
cτ = 1 + Σ_{j=1}^{τ−1} α²(1 + jγ)²                         for τ = 2, 3, ..., L
cτ = 1 + Σ_{j=1}^{τ−1} [α(1 + jγ) + dj,L (1 − α) δ]²       for τ > L
where
dj,L = 1 if j is a multiple of L, and 0 otherwise.
ACTIVITY 5.11
Suppose that a well-known commodity, transported from a foreign country by a large international shipping and transportation company, is seasonal over the quarters of a year.
(a) Determine the appropriate cτ.
(b) Evaluate dj,L when:
(i) j = 2
(ii) j = 12
DISCUSSION OF ACTIVITY 5.11
(a) Quarterly data give L = 4, so
cτ = 1                                        for τ = 1
cτ = 1 + Σ_{j=1}^{τ−1} α²(1 + jγ)²            for τ = 2, 3, 4
That is:
c1 = 1
c2 = 1 + α²(1 + γ)²
c3 = 1 + α²(1 + γ)² + α²(1 + 2γ)²
c4 = 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)²
(b) With L = 4: d2,4 = 0, since 2 is not a multiple of 4, and d12,4 = 1, since 12 is a multiple of 4.
The standard error for the additive Holt-Winters method at time T is:
s = sqrt( SSE/(T − 3) ) = sqrt( Σ_{t=1}^{T} [yt − (ℓt−1 + bt−1 + snt−L)]² / (T − 3) )
The error correction form of the smoothing equations in the additive Holt-Winters method is made up of:
ℓT = ℓT−1 + bT−1 + α [yT − (ℓT−1 + bT−1 + snT−L)]
bT = bT−1 + α γ [yT − (ℓT−1 + bT−1 + snT−L)]
snT = snT−L + (1 − α) δ [yT − (ℓT−1 + bT−1 + snT−L)]
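A single additive Holt-Winters update for quarterly data can be sketched as follows; all starting values, smoothing constants and data are hypothetical:

```python
alpha, gamma, delta = 0.2, 0.1, 0.3   # illustrative smoothing constants
L = 4                                  # quarterly data
level, growth = 200.0, 3.0             # l_{T-1} and b_{T-1}
sn = [-8.0, -88.0, 30.0, 66.0]         # latest seasonal factors, seasons 1..4

y_new, season = 240.0, 2               # new observation, falling in season 3

new_level = alpha * (y_new - sn[season]) + (1 - alpha) * (level + growth)
new_growth = gamma * (new_level - level) + (1 - gamma) * growth
new_sn = delta * (y_new - new_level) + (1 - delta) * sn[season]

# Point forecast one period ahead, using the latest factor for the next season:
forecast_1_ahead = new_level + new_growth + sn[(season + 1) % L]
```

The new observation is first deseasonalised to update the level, then the growth rate and the seasonal factor are smoothed in turn, mirroring the three equations above.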
In Unit 4 we showed how to estimate the fixed seasonal factors SNt by using centred moving averages. The level at time period T − 1 for the linear-trend model is given by β0 + β1 (T − 1), and the level at time period T is given by β0 + β1 T; this shows that the growth rate of the level is β1.
Implementation of the multiplicative Holt-Winters method starts with estimates of the level, the growth rate and the seasonal factor. Let ℓT−1 denote the estimate of the level in time period T − 1 and bT−1 the estimate of the growth rate in time period T − 1. Then, suppose that we observe a new observation yT in time period T, and let snT−L be the latest estimate of the seasonal factor for the season corresponding to time period T. As before, L is the number of seasons, and the subscript T − L of snT−L reflects that the time series value in time period T − L is the most recent value observed in the season being analysed.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = α (yT / snT−L) + (1 − α)(ℓT−1 + bT−1)
where yT / snT−L is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:
bT = γ (ℓT − ℓT−1) + (1 − γ) bT−1
The new estimate snT of the seasonal factor in time period T uses the smoothing constant δ and is:
snT = δ (yT / ℓT) + (1 − δ) snT−L
where yT / ℓT is the detrended ratio. A point forecast made in time period T of yT+τ is:
ŷT+τ = (ℓT + τ bT) snT+τ−L   (τ = 1, 2, 3, ...)
where snT+τ−L is the most recent estimate of the seasonal factor for the season corresponding to time period T + τ.
The prediction intervals for the multiplicative method use
c1 = (ℓT + bT)²
c2 = α²(1 + γ)² (ℓT + bT)² + (ℓT + 2bT)²
c3 = α²(1 + 2γ)² (ℓT + bT)² + α²(1 + γ)² (ℓT + 2bT)² + (ℓT + 3bT)²
together with the relative standard error
sT = sqrt( Σ_{t=1}^{T} [ (yt − ŷt(t − 1)) / ŷt(t − 1) ]² / (T − 3) )
where ŷt(t − 1) = (ℓt−1 + bt−1) snt−L is the forecast of yt made at time t − 1.
The error correction form of the smoothing equations in the multiplicative Holt-Winters method, writing eT = yT − (ℓT−1 + bT−1) snT−L for the forecast error, is made up of:
ℓT = ℓT−1 + bT−1 + α (eT / snT−L)
bT = bT−1 + α γ (eT / snT−L)
snT = snT−L + (1 − α) δ (eT / (ℓT−1 + bT−1))
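For comparison with the additive case, one multiplicative update, again with hypothetical starting values and constants:

```python
alpha, gamma, delta = 0.2, 0.1, 0.3   # illustrative smoothing constants
level, growth = 200.0, 3.0             # l_{T-1} and b_{T-1}
sn_old = 1.25                          # latest seasonal factor, current season
y_new = 260.0                          # new observation y_T

new_level = alpha * (y_new / sn_old) + (1 - alpha) * (level + growth)   # 204.0
new_growth = gamma * (new_level - level) + (1 - gamma) * growth         # 3.1
new_sn = delta * (y_new / new_level) + (1 - delta) * sn_old
```

The only structural change from the additive method is that the observation is deseasonalised and detrended by division rather than subtraction, so the seasonal factors are dimensionless ratios around 1.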
ACTIVITY 5.12
Which values of the damping factor are associated with:
(a) meagre (or weak) dampening?
(b) substantial dampening?
DISCUSSION OF ACTIVITY 5.12
We know that the value of the dampening factor lies between 0 and 1. Values near 0 have less dampening effect than those near 1. Hence:
(a) Meagre (or weak) dampening is effected by values near 0.
(b) Substantial dampening is effected by values near 1.
One may wonder why the values 0 and 1 themselves are excluded as possibilities.
ACTIVITY 5.13
What would happen if the dampening factor were set equal:
(a) to 0?
(b) to 1?
5.7 Conclusion
This unit discussed various forecasting models that are used under specific conditions. Important conditions, such as the various forms of variation, were given with each method, and you should familiarise yourself with them. The discussions are based on the textbook.