
Sales forecasting techniques allow companies to predict sales. Once a company has what it
believes is an accurate estimate of future sales, it can quantify its financing needs, estimate the
quantity and cost of raw-material purchases, and determine production levels.
A fundamental problem in sales forecasting is predicting whether sales will increase or not. This is
particularly difficult when the sales of a given product exhibit seasonality. In time-series data,
seasonality is the presence of variations that repeat at regular intervals.
For example, the sales of ice cream will be higher in the warmer seasons and lower in the colder
seasons.
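One common way to formalise this, given here only as an illustrative assumption (the symbols g, s, P and ε do not appear in the assignment text), is an additive decomposition:

```latex
y(t) = g(t) + s(t) + \varepsilon(t), \qquad s(t + P) = s(t) \ \text{for all } t
```

Here g would be the smooth non-seasonal component, s the seasonal component with period P, and ε the noise. In the ice-cream example, s peaks in the warmer seasons, while the task is concerned with whether g is increasing.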
The provided dataset contains past sales of a multinational ice-cream company that sells ice cream
in 120 different Countries. Each row in the dataset is a time series: the input x is time (scaled and
transformed into the interval [0,1]) and the output y represents the sales in one of the Countries (this
was also scaled and transformed). There are 120 rows, one per Country.
Goal
Construct an algorithm that, for each Country:
1. removes the seasonal component in the time-series (if a seasonal component is present);
2. predicts if the non-seasonal component of sales is increasing or not.
That is, the prediction is binary: value=1 if the non-seasonal component is increasing; value=0 if the
non-seasonal component is not increasing.
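The two steps above can be sketched with ordinary least squares. This is a minimal illustration, not the baseline notebook's method. It assumes that the seasonal frequency `freq` is already known, that the non-seasonal component is a straight line, and that "increasing" means a positive fitted slope. The function names and the synthetic series are hypothetical.

```python
import numpy as np

def design_matrix(t, freq):
    """Columns: intercept, linear trend, and one sine/cosine pair at `freq` cycles per unit time."""
    return np.column_stack([
        np.ones_like(t),
        t,
        np.sin(2 * np.pi * freq * t),
        np.cos(2 * np.pi * freq * t),
    ])

def predict_increasing(t, y, freq):
    """Fit trend and seasonal terms jointly; return 1 if the fitted slope is positive, else 0."""
    X = design_matrix(t, freq)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # w[1] is the slope of the linear (non-seasonal) term; the sine/cosine
    # columns absorb the seasonal variation so it does not bias the slope.
    return int(w[1] > 0)

# Synthetic check (assumed values, not taken from the real dataset).
t = np.linspace(0, 1, 100)
rng = np.random.default_rng(0)
season = 0.3 * np.sin(2 * np.pi * 4 * t)
y_up = 0.5 * t + season + 0.05 * rng.standard_normal(t.size)
y_down = -0.5 * t + season + 0.05 * rng.standard_normal(t.size)

label_up = predict_increasing(t, y_up, freq=4)
label_down = predict_increasing(t, y_down, freq=4)
print(label_up, label_down)
```

Fitting the seasonal and trend terms together, rather than subtracting one before estimating the other, is one simple design choice. It breaks down when the frequency is wrong or the trend is strongly non-linear.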
Remarks
1. Some time series may not have a seasonal component. At low latitudes, there is no significant
seasonal change.
2. The original time-series duration is unknown. Time is assumed to be normalised into the interval
[0,1] (100 evenly spaced numbers in the interval [0,1]).
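Because the original duration is unknown, the seasonal frequency on the normalised time axis is unknown too. One possible way to estimate it, shown here as a sketch under assumptions, is to remove a straight-line fit and look for the peak of the periodogram. The function name `dominant_frequency` and the synthetic series are hypothetical. The returned power share is only a rough indicator of whether a seasonal component is present at all.

```python
import numpy as np

def dominant_frequency(t, y):
    """Return the strongest non-zero frequency (cycles per unit time) and its share of the power."""
    # Remove a straight-line fit so the trend does not dominate the low frequencies.
    residual = y - np.polyval(np.polyfit(t, y, deg=1), t)
    power = np.abs(np.fft.rfft(residual)) ** 2
    freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
    k = 1 + np.argmax(power[1:])  # skip the zero-frequency bin
    return freqs[k], power[k] / power[1:].sum()

# 100 evenly spaced time points in [0, 1], as stated in the remarks.
t = np.linspace(0, 1, 100)
rng = np.random.default_rng(1)
y = 0.5 * t + 0.3 * np.sin(2 * np.pi * 4 * t) + 0.05 * rng.standard_normal(t.size)

freq, share = dominant_frequency(t, y)
print(freq, share)
```

A low power share would suggest that no single frequency stands out, which is what one might expect for a Country without seasonality. Any threshold on that share is a modelling choice that would need to be justified.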

Task
We have provided a notebook that implements a baseline model to solve the sales-forecast
problem. The goal is to show you an initial path towards the solution. The performance of that
baseline algorithm is not good, so your goal is first to understand:
* why it is not working well
* how to improve it

We have listed a set of possible ideas to improve the algorithm, but you can follow other directions.
The goal of this ML competition is modelling: you need to think about how to build a general model
that can address the sales-forecast problem. The problem is one-dimensional (for each Country), so
it is not computationally hard, and you can plot both the data and the output of your model. The real
challenge here is to think about a general probabilistic model (prior and likelihood), including
basis functions (linear, non-linear, periodic) and model selection.
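As one possible illustration of "prior and likelihood" with basis functions, the sketch below uses Bayesian linear regression: a Gaussian prior on the weights, a Gaussian likelihood, the posterior probability that the slope is positive, and the log marginal likelihood to choose between a trend-only and a trend-plus-seasonal basis. The prior precision `alpha`, the noise level `sigma`, the frequency 4 and all function names are assumptions made for the example, not values from the assignment.

```python
import math
import numpy as np

def trend_basis(t):
    return np.column_stack([np.ones_like(t), t])

def seasonal_basis(t, freq):
    return np.column_stack([np.ones_like(t), t,
                            np.sin(2 * np.pi * freq * t),
                            np.cos(2 * np.pi * freq * t)])

def posterior(X, y, alpha, sigma):
    """Posterior mean and covariance for prior w ~ N(0, I/alpha), likelihood y ~ N(Xw, sigma^2 I)."""
    S = np.linalg.inv(alpha * np.eye(X.shape[1]) + X.T @ X / sigma**2)
    m = S @ X.T @ y / sigma**2
    return m, S

def log_evidence(X, y, alpha, sigma):
    """Log marginal likelihood: y ~ N(0, sigma^2 I + X X^T / alpha)."""
    n = y.size
    C = sigma**2 * np.eye(n) + X @ X.T / alpha
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + y @ np.linalg.solve(C, y))

def prob_increasing(m, S):
    """Posterior probability that the slope weight (index 1) is positive."""
    z = m[1] / math.sqrt(S[1, 1])
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Synthetic series with an upward trend and a seasonal term (assumed values).
t = np.linspace(0, 1, 100)
rng = np.random.default_rng(2)
y = 0.5 * t + 0.3 * np.sin(2 * np.pi * 4 * t) + 0.05 * rng.standard_normal(t.size)

alpha, sigma = 1.0, 0.05
log_ev_trend = log_evidence(trend_basis(t), y, alpha, sigma)
log_ev_seasonal = log_evidence(seasonal_basis(t, 4), y, alpha, sigma)

m, S = posterior(seasonal_basis(t, 4), y, alpha, sigma)
p_up = prob_increasing(m, S)
print(log_ev_trend, log_ev_seasonal, p_up)
```

The decision rule would then be to predict 1 when the probability exceeds 0.5 under the model with the higher evidence. A linear trend is a strong simplification, since the assignment states that sales trends are non-linear but smooth.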

How can you verify if your model is good or not?


We have provided a training dataset with the true labels. However, you can also generate some data
and test your model by assuming a hypothesis about the sales trend (increasing, decreasing, linear,
non-linear) and about the presence and type of seasonality. Sales have non-linear trends, but we
expect them to be quite smooth functions.
This is what we did in the baseline model: we generated data that are easy or difficult for our
model and checked its predictions. Probabilistic ML is about building a generative model of the data
(prior and likelihood), so when you write down your model you know your assumptions and you
know the limits of your model, which you can test by generating artificial data.
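A minimal sketch of this kind of check is shown below. It generates artificial series with known labels and scores a classifier on them. The trend shape (a monotone quadratic), the noise level, the frequency range and the deliberately naive `endpoint_rule` are all assumptions made for illustration. They are not the generator used in the baseline notebook.

```python
import numpy as np

def make_series(rng, t, increasing, seasonal):
    """One synthetic series: smooth monotone trend, optional sinusoidal season, Gaussian noise."""
    sign = 1.0 if increasing else -1.0
    a, b = rng.uniform(0.2, 1.0, size=2)
    trend = sign * (a * t + b * t**2)  # monotone on [0, 1] because a, b > 0
    season = 0.0
    if seasonal:
        freq = rng.integers(2, 8)
        phase = rng.uniform(0, 2 * np.pi)
        season = rng.uniform(0.2, 0.6) * np.sin(2 * np.pi * freq * t + phase)
    return trend + season + 0.05 * rng.standard_normal(t.size)

def make_dataset(n_series, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, 100)
    labels = rng.integers(0, 2, size=n_series)
    has_season = rng.integers(0, 2, size=n_series).astype(bool)
    Y = np.stack([make_series(rng, t, bool(l), bool(s))
                  for l, s in zip(labels, has_season)])
    return t, Y, labels

def accuracy(classifier, t, Y, labels):
    """Fraction of series for which classifier(t, y) matches the known label."""
    preds = np.array([classifier(t, y) for y in Y])
    return float(np.mean(preds == labels))

def endpoint_rule(t, y):
    """Naive placeholder: compare the mean of the last 10 points with the first 10."""
    return int(y[-10:].mean() > y[:10].mean())

t, Y, labels = make_dataset(120)
acc = accuracy(endpoint_rule, t, Y, labels)
print(acc)
```

Replacing `endpoint_rule` with your own model, and making the generator harder (larger seasonal amplitude, flatter trends, more noise), shows where the model's assumptions stop holding.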

Output
The goal of the Kaggle competition is to have fun.
Your final output for Assignment 4 will be a notebook, and we will mark that notebook. Each group
has to upload a single notebook to Sulis (the group leader is responsible for uploading it)
that describes your model, its performance and its weak points. We must be able to
understand the details of your model and its performance without reading the Python code, so it is
important that you add markdown cells describing the details of the model and its performance.
Write the notebook (and comments) for a reader who does not know the
model you derived, so please explain what you are doing. You don't need to be too verbose, but you
must comment on your model and your modelling decisions.
Please use the filename Assignment4_Groupname.ipynb (or a zip file) and, at the top of
your notebook, report the names and student IDs of all members of your group.
Add a Summary: the first section in your notebook should be a summary of the content of the
notebook and it should include a comment about the results of your model(s). This section cannot
exceed 300 words.

Progress-chart
We will use the following progress-chart to assess task completion. Your goal is to implement a
model that solves the task. One way to do that is to explore some of the ideas we listed at the end of the
baseline model notebook and possibly propose new ideas.
* Beginning (30%): the new model is different from (or an extension of) baseline, but it still has several weak points.
* Developing (75%): the probabilistic model is better than baseline, but it is not very reliable. Its performance strongly depends on modelling choices, tuning of prior parameters, thresholds etc.
* Accomplished (95%): the algorithm works well, it is reliable and general, and achieves a high accuracy in the test dataset.
* Exemplary (100%): the algorithm includes some original modelling ideas and original ways of making decisions.
In the above progress-chart, we mentioned as a criterion "improving performance of the baseline
model". It may be that your model is potentially better than baseline but it does not really increase
the accuracy. It may also be the case that you explore different solutions and at the end they don't
perform well. If this is the case, we encourage you to write down the different models you tried (a
description of the model, prior, likelihood etc. and a comment on the model performance). In other
words, accuracy is not the main criterion we use to evaluate your work. We are more interested in
the model(s) and in the way you use that model to make decisions. You can accomplish the task
even if your model does not perform well, provided you show that you tried different solutions.
We won't grade the "elegance" of the code, but please keep the code clean and add comments (in
the code and in markdown cells) so we can understand what you do.
