
EXTRAPOLATION HANDBOOK

Table of contents
1. The functional principle of a retail panel in contrast to consumer research
2. Universe
2.1. Definition
2.2. Information
2.3. Assignment of the shops to the distribution channels one to one
2.4. Channel fusions
2.5. Exceptions
3. Sample construction
3.1. Accuracy of samples
3.1.1. The term inaccuracy
3.1.2. The total error
3.1.2.1. The systematic error
3.1.2.2. The sampling error
3.2. Shop profiles
3.3. Methods
3.3.1. Practicable approach
3.3.2. Statistical approach
3.4. Practical aspects
3.4.1. Refusal of panel participation
3.4.2. Changing sample consistency and structure
3.5. Difference between target and actual sample size
4. Extrapolation
4.1. Extrapolation channels
4.2. Extrapolation cells
4.2.1. Cell definition
4.2.2. Assignment of the shops to the extrapolation cells
4.2.3. Computation of the extrapolation factors
4.2.4. Carrying and not carrying outlets
4.2.5. Modification of the cell definition
4.2.6. Identical extrapolation for more than one product sector / product group
4.2.7. Handling of outliers and atypical outlets
4.3. Extrapolation for individual retail companies
4.4. Handling of aggregated data
4.5. Sample data versus census data
4.5.1. Sample-based extrapolation
4.5.2. Census data
4.5.3. Conclusion
4.6. Handling of entries and leavings
4.6.1. Entries
4.6.2. Leavings
4.7. Proceeding in case of problems with the data
4.7.1. Missing delivery periods
4.7.2. Late data delivery
4.7.3. Incomplete data
4.7.4. Incorrect data
4.7.5. Extreme changes in the data
4.7.6. Unreproducible changes in the data
4.8. Test of the extrapolated data
4.9. Coverage
4.10. Representativeness of the extrapolation with respect to the universe
4.11. Maintenance and updating of extrapolations

CHAPTER 1
1. The functional principle of a retail panel in contrast to consumer research
This chapter describes the functional principle of a retail panel in contrast to consumer research, with the extrapolation method specific to retail panels as the main focus. Since only three market research institutions operate a retail panel (ACN and IRI with a food panel, GfK with a nonfood panel), there is little literature on this subject. This extrapolation handbook was created for that reason.
The subject of a research project determines whether it is to be regarded as demographic (people-based) research or as research on market statistics. Demographic projects concern the people involved in the market and consider their behaviour (buying behaviour, opinions etc.). Research on market statistics concerns market developments such as market size, sales units, sales value, sales prices, and the number and structure of defined companies operating in the market. Panels can be used to generate either demographic data (e.g. household panel) or market statistics (retail panel).
In a retail panel, data (e.g. sales units, sales value, sales prices, stocks) from a sample of retailers are collected and analysed in order to produce reporting on the market under consideration. This reporting supplies decision-makers at manufacturers and retail companies with the information necessary for planning and monitoring market decisions.
The retail panel of GfK Marketing Services monitors the sales of product groups at the level of single articles. Its functional principle is the panel methodology, which is characterised by three main terms:
- universe,
- sample,
- extrapolation.

The aim of the panel methodology is to build, from an appropriate sample of retail outlets, an extrapolation to the universe that allows statistically representative statements about the market situation and the market development, such as:
- the market size by sales units and sales value,
- the market structure by technical features,
- the situation of the different distribution channels,
- prices, average prices and price categories,
- brand shares,
- hitlists of the most sold models,
- distributions of models, brands, features etc.

For the panel to deliver satisfactory reporting quality, some preconditions have to be fulfilled.

The universe must be well-defined. As a rule, universe is a synonym for one distribution channel. As much information as possible should be available about this channel. Studies of the universe therefore have to be carried out from time to time in order to obtain all the required information:
- number of shops per region,
- address list of the considered shops, and for every shop:
  - percentage of retailer and wholesaler turnover,
  - percentage of selling turnover and of turnover with service, installation and repair,
  - assortment structure,
  - total turnover and turnover by assortment part or product sector,
  - sales area,
  - organisation type,
  - distribution type (brick&mortar, click&mortar, pure player).

The sample should be constructed in such a way that a representative extrapolation to the universe is possible. This does not mean that the sample itself has to be representative of the universe. That is a widespread misconception: in most cases the sample is not representative of the universe, and making it so would even be a mistake. The sample has to be appropriate for a representative extrapolation to the universe.
Normally the outlets within a distribution channel are very heterogeneous: there are lots of outlets with a small turnover, many with a medium turnover, but only a few with a large turnover. In most cases a stratified sample is used, with the turnover classes as the strata. Within a turnover class the outlets are more homogeneous.
Because of the heterogeneity of the outlets in the total universe, the variance in the upper turnover classes is considerably larger than in the lower ones. Statistical accuracy depends to a great extent on these variances, and it should be high in all turnover classes, especially in the upper ones, which are highly relevant for the market. The sampling fractions in the upper turnover classes therefore have to be larger than in the lower ones. In nearly all cases this leads to a disproportional sample structure.
For the extrapolation to the universe, every outlet in the sample gets an extrapolation factor. Every sample outlet thus stands for a certain number of other outlets which are similar to it in the statistical sense.
Consequently, in turnover classes with a small turnover share (and a small variance) only a small sampling fraction is necessary, which leads to a large extrapolation factor per shop. In turnover classes with a large turnover share (and a large variance) a large sampling fraction is necessary, which leads to a small extrapolation factor per shop.
This is demonstrated by the following example:
strata boundaries       universe   sampling   sample   extrapolation
(turnover classes)                 fraction   size     factor
 0.05 -  0.5 million       7,500       1 %       75        100
 0.5  -  1.0 million       2,500       2 %       50         50
 1.0  -  2.5 million       1,200       5 %       60         20
 2.5  -  5.0 million         500       8 %       40         12.5
 5.0  - 10.0 million         300      10 %       30         10
10.0  - 25.0 million         200      25 %       50          4
      > 25.0 million         100      40 %       40          2.5
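The arithmetic behind the example can be sketched in a few lines of Python. This is only an illustration: the class and attribute names are hypothetical, and the sketch assumes the simplest relationship consistent with the figures in the example, namely sample size = universe × sampling fraction and extrapolation factor = universe / sample size.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One turnover class of the example (names are illustrative only)."""
    label: str        # turnover class boundaries
    universe: int     # number of outlets in the universe
    fraction: float   # sampling fraction, e.g. 0.01 for 1 %

    @property
    def sample_size(self) -> int:
        # Number of outlets drawn from this turnover class.
        return round(self.universe * self.fraction)

    @property
    def extrapolation_factor(self) -> float:
        # Each sample outlet stands for this many outlets of the universe.
        return self.universe / self.sample_size


STRATA = [
    Stratum("0.05 - 0.5 million", 7500, 0.01),
    Stratum("0.5 - 1.0 million", 2500, 0.02),
    Stratum("1.0 - 2.5 million", 1200, 0.05),
    Stratum("2.5 - 5.0 million", 500, 0.08),
    Stratum("5.0 - 10.0 million", 300, 0.10),
    Stratum("10.0 - 25.0 million", 200, 0.25),
    Stratum("> 25.0 million", 100, 0.40),
]

for s in STRATA:
    print(f"{s.label:>20}  n={s.sample_size:>3}  factor={s.extrapolation_factor}")
```

Multiplying each sample size by its factor returns the universe of the class, which is a quick consistency check on any such table.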

However, these are only theoretical figures. In practice there are outliers and atypical outlets to which the computed extrapolation factor cannot be applied, only a smaller one. This reduces the number of degrees of freedom, i.e. the sample size in the corresponding extrapolation cell has to be enlarged if the computed extrapolation factor is to be kept for the other outlets (see 4.2.7.).
On the other hand, chain stores or several stores belonging to one and the same company often provide GfK with census data. In these cases the extrapolation factor per outlet is 1.
In every case the extrapolated data have to be tested to make sure that the specified extrapolation is correct and the required accuracy has been achieved. If there is a definite shortfall, the chosen method cannot be used because it would produce too many inaccuracies. In this case the extrapolation or even the sample design has to be modified.
Extrapolation is done channel by channel. The aggregation of all retail channels forms the audited panel market.

CHAPTER 2
2. Universe
As a rule, universe is a synonym for distribution channel. To fix a precise definition of a distribution channel it is necessary to know what is actually wanted from it. As much information as possible should therefore be available about the channel and the outlets belonging to it. Once the channel definition is fixed, the shops can be assigned one to one. Shops may also be excluded from a channel because of the specified definition. In the reporting the distribution channels can be shown separately or as a fusion of two or more channels.
The universe is also delimited geographically, e.g. to a certain country or to GfK regions within the country. Furthermore, it is delimited in time. In most cases the universe is fixed for one year and is then adjusted to the new situation. The information about the universe has to be updated continually. If the retail scene is changing very fast, the universe is updated every quarter or even every month.
2.1. Definition
The universe, i.e. the distribution channel, must be well-defined. GfK Marketing Services has fixed precise definitions for all channels of the retail panel, tailored to their specific requirements. These definitions are valid for all countries, which harmonises the international reporting. For the electrical retailers, for example, each shop has to fulfil the following conditions:
- the turnover per shop is more than 50,000,
- more than 50 % of the turnover is realised with electrical products,
- more than 50 % of the turnover is realised with sales and less than 50 % with service, installation and repairs.
Electrical retailers are then divided into three shop types:
- buying groups: retailers who are members of buying groups such as Euronics, ElectronicPartner, Expert, E.D.A.
- independents: retailers with at most 6 outlets who are not members of buying groups
- chains: retailers with more than 6 outlets who are not members of buying groups (Media Markt, Saturn, Dixons, Currys, But, Comet, Darty etc.)
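The three shop types can be expressed as a small decision rule. The following Python sketch is an illustration under the definitions above; the function name and its parameters are hypothetical.

```python
def shop_type(outlet_count: int, in_buying_group: bool) -> str:
    """Classify an electrical retailer into one of the three shop types.

    outlet_count and in_buying_group are illustrative inputs; the threshold
    of 6 outlets is taken from the definition in the text.
    """
    if in_buying_group:
        # Membership in a buying group takes precedence over the outlet count.
        return "buying group"
    if outlet_count <= 6:
        return "independent"
    return "chain"


print(shop_type(3, True))    # member of a buying group
print(shop_type(6, False))   # at most 6 outlets, no buying group
print(shop_type(7, False))   # more than 6 outlets, no buying group
```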

The definitions of the distribution channels are specified in such a way that they do not overlap. This is an inflexible rule: it must be guaranteed that a shop can be assigned to only one channel. For instance, a shop may not belong to the electrical specialists for consumer electronics and to the photo specialists for photo.

2.2. Information
In order to draw an appropriate sample it is necessary to collect as much information as possible about the corresponding distribution channel:
- number of shops per region,
- address list of the considered shops, and for every shop:
  - percentage of retailer and wholesaler turnover,
  - percentage of selling turnover and of turnover with service, installation and repair,
  - assortment structure,
  - total turnover and turnover by assortment part or product sector,
  - sales area,
  - organisation type,
  - distribution type (brick&mortar, click&mortar, pure player).

In distribution channels with frequent changes in the retail structure, studies of the universe have to be carried out every year to keep all the required information up to date. In distribution channels with rare changes, the interval between studies can be longer. For new distribution channels, studies of the universe are a must.
Membership registers of buying groups and client lists from wholesalers and manufacturers are helpful for this, as are structural data which chain stores send to the executives of the GfK retail service. In most cases information from the previous study can be used, too. In addition, surveys have to be carried out, and the best way of interviewing has to be found:
- questioning by field workers:
  used if the questions are long and complicated and explanations are needed;
  disadvantage: this method is expensive and takes a lot of time
- questioning by letter:
  used if the questions are long but not too complicated and do not need any special explanations;
  disadvantage: low response
- questioning by telephone:
  used if the number of questions is not too large and the questions themselves are short and clear;
  advantage: this method is the cheapest and achieves the highest response
2.3. Assignment of the shops to the distribution channels one to one
When the channel definition is fixed, the shops can be assigned to the channels one to one. As mentioned above, the channel definitions have to be fixed in such a way that no overlap is possible and each shop can be assigned to only one channel. It is not allowed to assign a shop to channel X in panel 1 and to channel Y in panel 2. The assignment of the shops to the distribution channels does not depend on the panel or other criteria. It depends exclusively on the specified channel definition.
A shop is assigned to a channel only if it fulfils the conditions of the fixed channel definition. Otherwise it is excluded. For example, a shop with an annual turnover of less than 50,000 is not included in the electrical retailers channel. Shops which realise more than 50 % of their turnover with service, installation and repair are likewise not assigned to electrical retailers. They are classified as service shops.
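As an illustration, the assignment rule for the electrical retailers could be sketched as follows. The data class, field names and return labels are assumptions made for this example; the thresholds come from the definition in 2.1., and the currency of the turnover threshold is not stated in this handbook.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ANNUAL_TURNOVER = 50_000  # threshold from 2.1.; currency not stated in the source


@dataclass
class Shop:
    """Hypothetical shop record used only for this sketch."""
    annual_turnover: float
    electrical_share: float  # share of turnover from electrical products (0..1)
    sales_share: float       # share of turnover from sales (0..1)
    service_share: float     # share from service, installation and repair (0..1)


def assign_channel(shop: Shop) -> Optional[str]:
    """Return the channel for a shop, or None if it is excluded."""
    if shop.service_share > 0.5:
        # More than half of the turnover comes from service work.
        return "service shops"
    if (shop.annual_turnover > MIN_ANNUAL_TURNOVER
            and shop.electrical_share > 0.5
            and shop.sales_share > 0.5):
        return "electrical retailers"
    return None  # does not fulfil the channel definition


print(assign_channel(Shop(120_000, 0.8, 0.7, 0.3)))
print(assign_channel(Shop(120_000, 0.8, 0.4, 0.6)))
print(assign_channel(Shop(30_000, 0.8, 0.9, 0.1)))
```

Because the function depends only on the shop's own attributes, the same shop always lands in the same channel, regardless of the panel in which it is used.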
2.4. Channel fusions
In the reporting the distribution channels can be shown separately or as a fusion of two or more
channels. The names of the current fusions are fixed and correspond to an international standard.
The following channel fusions are defined:

Fusions and the channels included in each:
- Panelmarket: all channels available in the product group
- Department Stores / Mail Order Houses: Department Stores; Mail Order Houses
- Hypermarkets / Cash & Carry: Hypermarkets; Cash & Carry
- Hypermarkets / Supermarkets / Cash & Carry: Hypermarkets; Supermarkets; Cash & Carry
- Hypermarkets / Cash & Carry / DIY Superstores: Hypermarkets; Cash & Carry; DIY-Superstores
- Hypermarkets / Supermarkets / Cash & Carry / Variety Stores: Hypermarkets; Supermarkets; Cash & Carry; Variety Stores
- Mass Merchandisers: Department Stores; Mail Order Houses; Hypermarkets; Cash & Carry; Supermarkets; Variety Stores
- Mass Merchandisers without Variety Stores (only DIY): Department Stores; Mail Order Houses; Hypermarkets; Cash & Carry; Supermarkets
- Mass Merchandisers / DIY Superstores: Department Stores; Mail Order Houses; Hypermarkets; Cash & Carry; Supermarkets; Variety Stores; DIY-Superstores
- DIY-Superstores / Variety Stores: DIY-Superstores; Variety Stores
- Electrical Retailers / Iron Mongers / Electrical Installers (only MDA) and Electrical Retailers / Iron Mongers (only SDA): Electrical Specialists; Technical Superstores; Iron Mongers; Electrical Installers
- Electrical Retailers: Electrical Specialists; Technical Superstores
- Consumer Electronic Stores: Electrical Specialists; Technical Superstores; Photo Specialists
- Photo Retailers incl. Technical Superstores: Photo Specialists; Technical Superstores; Minilabs; Binocular Specialists; Photo studios/ateliers
- Photo Retailers excl. Technical Superstores: Photo Specialists; Minilabs; Binocular Specialists; Photo studios/ateliers
- Computershops: Computer Hardware Shops; Computer Software Shops
- Computershops / Toys Specialists: Computer Hardware Shops; Computer Software Shops; Toys Specialists
- Systemhouses: IT-Resellers (formerly Systemhouses); IT Mail Order (Hardware); IT Mail Order (Software)
- Computershops / Systemhouses: Computer Hardware Shops; Computer Software Shops; IT-Resellers (formerly Systemhouses); IT Mail Order (Hardware); IT Mail Order (Software)
- Computershops / Toys Specialists / Systemhouses: Computer Hardware Shops; Computer Software Shops; Toys Specialists; IT-Resellers (formerly Systemhouses); IT Mail Order (Hardware); IT Mail Order (Software)
- Telecom Specialists: Telecom Specialists; Mobile Phone Specialists (formerly Radio Sp.)
- Car Spare Parts and Accessories Trade: Car Accessories Specialists; Car Accessories Wholesalers
- Car Accessories Shops / Electrical Retailers: Car Accessories Specialists; Car Accessories Wholesalers; Car Audio Specialists; Electrical Specialists; Technical Superstores
- Car Accessories Shops: Car Accessories Specialists; Car Accessories Wholesalers; Car Audio Specialists
- Car Accessories Shops / Dealers / Garages: Car Accessories Specialists; Car Accessories Wholesalers; Car Audio Specialists; Car Dealers; Car Garages
- Office Equipment Retailers / Telecom Specialists: Office Equipment Specialists; Copier Specialists; Telecom Specialists; Mobile Phone Specialists (formerly Radio Sp.)
- Office Equipment Retailers: Office Equipment Specialists; Copier Specialists
- Office Equipment Retailers / Telecom Specialists / Stationers: Office Equipment Specialists; Stationers; Copier Specialists; Telecom Specialists; Mobile Phone Specialists (formerly Radio Sp.)
- Office Equipment Retailers / Stationers: Office Equipment Specialists; Stationers; Copier Specialists
- Office Equipment Retailers / Computershops / Toys Specialists / Systemhouses: Office Equipment Specialists; Copier Specialists; Computer Hardware Shops; Computer Software Shops; Toys Specialists; IT-Resellers (formerly Systemhouses); IT Mail Order (Hardware); IT Mail Order (Software)
- Media: Book Stores; Record Shops; Video Shops
- Furniture / Kitchen Retailers: Furniture Specialists; Kitchen Specialists
- Sports / Shoes Retailers: Sports Shops; Shoes Shops
- Drugstores / Chemists / Pharmacies: Drugstores; Chemists; Pharmacies
- Food Retailers / Rural Trade: (Traditional) Food Stores; Discount Food Chains; Drugstores; Rural Trade
- Food Retailers: (Traditional) Food Stores; Discount Food Chains; Drugstores; Supermarkets
- Tyre Accessories Retailers: Car Accessories Specialists; Car Accessories Wholesalers; Tyre Specialists
- Power Tool Specialists: Iron Mongers; Car Accessories Wholesalers; Electro Wholesalers; Industrial Suppliers
- Iron Mongers / Motorists: Iron Mongers / DIY-Shops; Motorists
- Household Retailers: Household Specialists; Iron Mongers / DIY-Shops
- Business Channels: Office Equipment Specialists; Stationers; Stationers Wholesalers; Copier Specialists; IT-Resellers (formerly Systemhouses); IT Mail Order (Hardware); IT Mail Order (Software); Telecom Specialists; Mobile Phone Specialists (formerly Radio Sp.); Car Accessories Specialists; Car Accessories Wholesalers; Car Audio Specialists; Car Dealers; Car Garages
- Consumer Channels: Department Stores; Mail Order Houses; Hypermarkets; Cash & Carry; Supermarkets; Variety Stores; DIY-Superstores; Electrical Specialists; Technical Superstores; Photo Specialists; Computer Hardware Shops; Computer Software Shops

It is also possible to define further channel fusions, but the name should be agreed with the international methodology department of GfK Marketing Services in Germany so that the same name is used in all countries.
A distribution channel cannot be shown separately if it includes fewer than three retailers, because in this case these retailers become transparent. Of course, retailers may also become transparent in a channel with three, four or even more retailers. The threshold is therefore a guideline rather than a strict rule, because the outcome always depends on the specific situation in each country. If the turnover share of a retailer within a channel exceeds 50 %, this retailer should be asked whether it agrees to the channel being shown separately in the reporting.
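The two guideline checks can be sketched as a small helper. This is a minimal illustration, not a prescribed procedure: the function name, its parameters and the example figures are hypothetical, and the text itself stresses that the thresholds are guidelines that depend on the situation in each country.

```python
def separate_reporting_check(turnover_by_retailer, min_retailers=3, max_share=0.5):
    """Return (can_show, retailers_to_ask) for one distribution channel.

    can_show is False if the channel has fewer than min_retailers retailers.
    retailers_to_ask lists retailers whose turnover share exceeds max_share
    and who should therefore be asked for their consent.
    """
    total = sum(turnover_by_retailer.values())
    can_show = len(turnover_by_retailer) >= min_retailers
    to_ask = sorted(
        name
        for name, turnover in turnover_by_retailer.items()
        if total > 0 and turnover / total > max_share
    )
    return can_show, to_ask


# Hypothetical channels with turnover per retailer:
print(separate_reporting_check({"A": 10, "B": 5}))
print(separate_reporting_check({"A": 60, "B": 25, "C": 15}))
print(separate_reporting_check({"A": 40, "B": 35, "C": 25}))
```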
2.5. Exceptions
There are some exceptions to the assignment of the shops to the distribution channels. In some cases the outlets of a retailer would be assigned to a certain channel according to the definition, but this is not possible because it would be the only retailer in this channel and would therefore become transparent.
An example is El Corte Ingles in Spain. Its outlets are department stores, but El Corte Ingles is the only company in Spain which operates department stores. El Corte Ingles is therefore assigned not to department stores but to the electrical retailers.

CHAPTER 3
3. Sample construction

Considerations of cost and time usually restrict the scope of the survey, so that only parts of a product sector (product groups) are audited at a limited number of retailers according to certain guidelines. An appropriate sample of retail outlets has to be drawn so that a representative extrapolation to the universe is possible. The sample should be constructed in such a way that representative statements can be made for the universe about the market situation and the market development, such as:
- the market size by sales units and sales value,
- the market structure by technical features,
- the situation of the different distribution channels,
- prices, average prices and price categories,
- brand shares,
- hitlists of the most sold models,
- distributions of models, brands, features etc.
Since a retail panel will be used over a long period of time, a good sample design is fundamental.
Sometimes a retail panel is created according to the request of a specific manufacturer and its
design strongly reflects the wishes of this client, which may differ from the needs of other future
clients. For this reason the sample design should meet a large variety of different needs. This
means some criteria have to be satisfied: sample size, selection of the retail outlets, sample
structure, data accuracy, reporting quality, costs etc. Of course there is an interrelationship between
these criteria.
3.1. Accuracy of samples
The subscribers of the reporting based on a retail panel place great importance on the quality of the data, i.e. the accuracy, precision and representativeness of the results. Of course, the target of a retail panel should be a high degree of accuracy, so that the market situation and the market development are shown correctly in the reporting. Since the panel data are based on samples, one hundred per cent correct results cannot be achieved, nor are they necessary.
For example, if panel data are used to review the future business policy of a company, a certain degree of inaccuracy is no disadvantage, since only past data are being considered for decisions concerning the future. As a rule the actual market situation has already changed by the time the decision is taken. What matters is that the market situation and the market development are not shown in a distorted way that leads to wrong decisions. In most cases it does not matter whether the market share of a retail company is 15, 16 or 17 per cent. It is more meaningful to see whether the market share increases or decreases.
The degree of inaccuracy to be accepted cannot be defined in general, but aiming for the highest precision is not only uneconomic, it can also distract from the relevant problems.
3.1.1. The term inaccuracy
Accuracy, or rather inaccuracy, can be defined as the difference between the sample results and the true data of the universe. In statistical terminology this difference is also called the error.
The difference between the computed sample-based value and the actual value of the universe is the absolute error, and the percentage difference of these two values is the relative error.
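The two definitions can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that the relative error is expressed as a percentage of the true universe value; the figures are purely illustrative.

```python
def absolute_error(sample_value: float, true_value: float) -> float:
    """Difference between the sample-based value and the true universe value."""
    return sample_value - true_value


def relative_error(sample_value: float, true_value: float) -> float:
    """Absolute error as a percentage of the true value (assumed convention)."""
    return 100.0 * absolute_error(sample_value, true_value) / true_value


# Illustrative figures: the sample-based estimate is 65,000 units,
# while the (normally unknown) true value is assumed to be 65,167 units.
estimate, truth = 65_000, 65_167
print(absolute_error(estimate, truth))            # in sales units
print(round(relative_error(estimate, truth), 2))  # in per cent
```

In practice the true value is unknown, so both quantities can only be estimated, not computed directly.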
The term error can be misunderstood by people who are not familiar with statistical theory. Statistical data that contain an error are not to be regarded as wrong and useless. There is a wide range between completely correct and completely wrong data.
The panel data cannot be precisely correct, but for the purpose for which the data are intended the error can be irrelevant, and any attempt to reduce it would not bring any real benefit. This is especially the case when precise information about the true situation is not required and only broad information about market size and trends is wanted. Normally it would be irrelevant, for example, whether sales units of a particular product amounted to 65,167 or 65,000. It is also sufficient to know that sales units increased by 5% and not by 5.004%.
If the inaccuracies of the data are larger, the usability of the data may be reduced, but the data need not be worthless or misleading. In such cases the data should be used with caution, i.e. possible errors should be considered when interpreting them. In general, "incorrect" data or, as they are called in practice, approximate data are still better than no information at all. The situation is comparable with driving a car in fog: being able to recognise even the shadowy image of the road markings at the centre or the side of the road is a great help.
Data are only useless if they show a distorted picture of reality and lead to wrong decisions. This risk can largely be avoided if the research is carefully planned and carefully carried out.
The fundamental problem in assessing the accuracy of the data is that in most cases the true data are unknown. Otherwise there would be no reason to do the survey. Since the true data are unknown, the error level cannot be specified in concrete individual cases. The best that can be done is to estimate potential errors on the basis of experience or, if certain requirements are met, to calculate how large the error is likely to be on average for such surveys.
Nevertheless, plausibility checks should always be carried out. This means not only checking whether the results are internally consistent, i.e. free of discrepancies, but also comparing them with one's own ideas and experience. Larger differences are a reason to clear up the discrepancies. But it should be self-evident that the panel results must not be mistrusted or rejected merely because they run contrary to expectations.
3.1.2. The total error
The inaccuracy of the data, i.e. the total error, consists of two different components: the systematic error and the sampling error. Census data have no sampling error, only the systematic error. Sample surveys have both the systematic error and the sampling error. But this does not mean that census data are always more precise than sample data, because the different error components can partly or totally offset each other and the size of the systematic error can be completely different for census data and for sample data.
3.1.2.1. The systematic error
The systematic error is largely ignored in the scientific literature and in practice, although it is the more important one, since it usually has a larger effect on the data quality than the sampling error. The reason for this neglect may be that possible sources of error can be identified, but their effect cannot be measured. In contrast, the sampling error can largely be kept under control with a competent procedure.
The most important causes of the systematic error are described in the following.
One cause is an imprecise definition of the relevant universe. The survey should only cover the outlets which are relevant to the objective. This requirement is easily met only in theory. In practice there are already difficulties in classifying the distribution channels which are relevant for a certain product category. For example, the universe of shops where a private individual can buy a photo film consists of
- photo specialists,
- photo studios,
- department stores,
- supermarkets,
- hypermarkets,
- cash&carry markets,
- petrol stations,
- mail order,
- drugstores,
- chemists,
- kiosks,
- etc.

However, defining the universe is only the first step. It is also necessary to assess the number and importance of the outlets in the universe. Directories from official or other sources are usually incomplete and are based on different classifications. Much work therefore has to be done to complete the data base. Nevertheless it must be recognised that some part of the universe, large or small, cannot be considered because its outlets cannot be identified. This leads to an inevitable under-coverage of the universe.
On the other hand, over-coverage can be just as much a source of systematic error as under-coverage. Over-coverage can result from double counting or from the inclusion of outlets which do not, or only partly, belong to the universe. This would be the case, for example, if the retail turnover of a certain retailer cannot be separated from its wholesale turnover.
Another big problem in market research is so-called non-response, which can cause significant distortions in the data. There are always retailers or retail companies which refuse to co-operate. In other cases no data are delivered in one or more periods because of trouble with the electronic data transfer or with the merchandise management system itself (see also 4.7.1.). If these retailers cannot or can only partly be considered, the sample results may be distorted.
With census data, non-response is a much bigger problem. If a retailer refuses to co-operate or stops data delivery, the result is a corresponding gap, i.e. under-coverage. In the case of key accounts the quality of the data can be put at risk. To compensate, the outlets of this retailer have to be modelled from selected outlets of the other retailers. A problem arises here if this retailer differs significantly from the others, for instance a key retailer with strong private labels in the assortment. With regard to the systematic error, sample data therefore generally have an advantage over census data.
Another source of systematic error is incorrect data, such as
- wrong sales prices,
- negative figures,
- differences between the delivered turnover and the product of sales units and prices,
- wrong article numbers (error of posting),
- wrong assignment of products to product groups,
- unrealistic sales units, purchase units, stocks or sales value.
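Some of the checks in the list above lend themselves to simple automated plausibility tests. The following Python sketch covers only two of them (negative figures, and turnover versus units times price); the record layout, the field names and the 1 % tolerance are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical delivered data record for one article and period."""
    article: str
    sales_units: int
    price: float
    turnover: float


def check_record(record: Record, tolerance: float = 0.01) -> list:
    """Return a list of detected problems (empty if the record looks plausible)."""
    problems = []
    if record.sales_units < 0 or record.price < 0 or record.turnover < 0:
        problems.append("negative figure")
    expected = record.sales_units * record.price
    # Allow a small relative deviation, e.g. for rounding (assumed tolerance).
    if abs(record.turnover - expected) > tolerance * max(abs(expected), 1.0):
        problems.append("turnover differs from units x price")
    return problems


print(check_record(Record("A", 10, 19.99, 199.90)))
print(check_record(Record("B", -2, 5.00, -10.00)))
print(check_record(Record("C", 3, 100.00, 250.00)))
```

Checks such as wrong article numbers or a wrong product group assignment need reference data and are not covered by this sketch.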

Inaccuracies can also occur when the data are evaluated and when the reporting is set up. These
inaccuracies are further sources of systematic error.
3.1.2.2. The sampling error
Sample surveys are also subject to the sampling error. Possible distortions arise when the results of the sample are extrapolated to the total market, i.e. when the sample results are generalised.
A particular sample is only one of very many possible samples, each of which may be more or less appropriate for a representative extrapolation to the universe. Therefore, when the sample results are extrapolated to the universe, inaccuracies and oversimplifications can occur.
In the extreme case the sampling error is so large that the extrapolated results are useless. Since this cannot be recognised without plausibility checks, one would run the risk of making decisions based on wrong information.
There are two fundamental ways of reducing the risk of getting such an extreme sample: increasing the sample size, or stratification. Increasing the sample size may lead to a better sample and a reduction of the sampling error, but this is not assured. Moreover, increasing the sample size can increase the systematic error.
Consequently, stratification is the better method and is applied to the retail panel (see 3.3.). For example, the universe is stratified by the annual turnover of the shops, and a sample is drawn from every turnover class. This helps to prevent highly skewed samples in which very small or very large outlets are over-represented. Stratification of the universe leads to a significant reduction of the sampling error.
The sampling error depends on the sample size and on the heterogeneity of the outlets in the universe, which is measured by the standard deviation. The larger the standard deviation, the larger the sampling error. If all outlets in the universe were identical, it would be sufficient to select only one outlet for the sample, and the results would be completely correct. But in practice the universe is normally very heterogeneous. There are lots of shops with a small turnover, many shops with a medium turnover and only a few with a large turnover. Stratification creates more homogeneity within the turnover classes and therefore helps to reduce the sampling error significantly.
The sampling error is only an average error: it shows how large the difference between the
sample result and the true value in the universe is on average over all possible samples
that can be drawn from the universe. In the case of the retail panel the number of possible
samples is astronomically large. The sampling error does not state anything about the individual
case; it is simply a global measure of accuracy. In the individual case the inaccuracy can be
smaller or significantly larger than the sampling error.

Besides the sampling error there is the maximum sampling error, which is much more concrete.
The maximum sampling error is the largest difference that occurs in a given percentage of all
samples, for instance 95%. Stating the maximum sampling error increases the certainty about the
size of the possible inaccuracy, but the statement becomes less precise, since the maximum
sampling error that is not exceeded in 95% of the samples is roughly twice as large as the
average sampling error. Furthermore there is no guarantee that the concrete sample does not
belong to the 5% of all samples in which the sampling error is larger, possibly much larger,
than the maximum sampling error.
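The factor of roughly two can be traced back to the quantile of the normal distribution. The following is a minimal sketch, assuming that the sample estimate is approximately normally distributed and that the average sampling error corresponds to the standard error of the estimate. The function name and the input value are illustrative and not part of the panel methodology.

```python
from statistics import NormalDist

def max_sampling_error(sampling_error: float, confidence: float = 0.95) -> float:
    """Scale an average (standard) sampling error to the bound that is not
    exceeded in the given share of all samples (normal approximation)."""
    alpha = 1.0 - confidence
    # Two-sided quantile q(1 - alpha/2) of the standard normal distribution.
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return q * sampling_error

# Hypothetical average sampling error of 1.0 %.
bound = max_sampling_error(1.0, confidence=0.95)
print(f"{bound:.2f}")  # 1.96
```

Raising the confidence level in this sketch widens the bound, which mirrors the trade-off between certainty and precision described above.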
Unfortunately there are no general rules as to the optimum sample size. It can only be estimated
approximately, and only in exceptional cases in which the target significance levels, the
structure of the universe, the costs of the survey and the extent of the systematic error are known.
The computation of the sampling error requires the standard deviation in the universe, which is
unknown in most cases. The standard deviation therefore has to be estimated from the sample.
It has to be kept in mind that different samples yield different estimates of the standard
deviation. As a rule, the estimated standard deviations of all possible samples drawn from the
universe average out to approximately the standard deviation in the universe.
In practice only one value for the standard deviation is calculated. This value can be smaller
or larger than the true value, and the calculated sampling error is then too small or too large.
Only the average over all possible samples leads to the true sampling error. Thus the
informational value of the sampling error should not be overestimated.
3.2. Shop profiles
For each shop in the sample a shop profile has to be drawn up with all relevant characteristics,
such as
- shop type,
- total turnover or turnover class,
- assortment structure / composition of the carried product sectors,
- turnover per product sector,
- membership in a buying group,
- membership in a franchise organisation,
- outlet of a key account,
- headquarter with subsidiaries,
- percentage of retailer and wholesaler turnover,
- percentage of selling turnover and turnover with installation, service and repairs,
- percentage of e-commerce turnover,
- sales area.

This is necessary so that the shop can be assigned to
- the correct distribution channel (e.g. electrical specialists),
- the correct stratum (e.g. turnover class 2,5 - 5 million, region north, independent),
- the correct extrapolation cell (see 4.2.2.).

The shop profile also shows whether the shop is atypical or an outlier. This is important for the
extrapolation because an atypical shop or an outlier cannot stand for a large number of other
shops, so that the computed extrapolation factor cannot be applied to it. In the extreme case
such a shop can only stand for itself. The handling of outliers and atypical outlets is
described in chapter 4.2.7.
Shop profiles should be updated, if necessary, since there are sometimes changes. Shops can be
enlarged or downsized. They can also change the assortment structure by including new product
groups or new product sectors in their sales program and by excluding others.

3.3. Methods
In principle various methods of sample construction exist:
- quota procedure,
- cut-off sampling,
- focused sampling,
- simple random sampling,
- stratified random sampling,
- clustered sampling.

These methods are described in the panel guide retail and technology. Here two established
approaches are introduced.
3.3.1. Practicable approach
As a rule a stratified sample with at least three dimensions is used for the retail panel of GfK MS.
The dimensions can be distribution channels, organisation types, regions, turnover classes, sales
area classes etc. The result is a certain number of cells which are characterised by these features.
An example of such a cell is
electrical specialists / independents / north / 0,5 - 1 million.
For all cells resulting from the stratification the number of outlets and the turnover of these
outlets have to be estimated. Once the total sample size has been fixed depending on the costs, the
number of sample shops per cell can be calculated according to the following formula:
% n(i) = f1 * (% N(i)) + f2 * (% X(i)) ; f1 + f2 = 1
n(i):      sample size in cell i
% n(i):    percentage of the sample size in cell i
N(i):      number of shops in the universe of cell i
% N(i):    percentage of the number of shops in the universe of cell i
X(i):      turnover of the shops in the universe of cell i
% X(i):    percentage of the turnover of the shops in the universe of cell i
f1:        factor 1
f2:        factor 2

This is a very simple formula, but because the factors can be chosen in various ways,
satisfactory results can be achieved in all situations. If the sample size of a cell should depend
mainly on the number of outlets in the universe of this cell, factor 1 must be larger than factor
2. If it should depend mainly on the turnover of the shops in the universe of this cell, factor 1
must be smaller than factor 2. Factor 1 and factor 2 may also be identical if the same weight is
given to the number of outlets and the turnover. In every case the sum of factor 1 and factor 2
must be 1.
The following example clarifies the calculation of the sample sizes in the cells (f1 = f2 = 0,5).
Assume the universe consists of 15.000 outlets.
cell no.   turnover classes   number of shops     percentage of the number
                              in the universe     of shops in the universe
1          < 0,5 mio.          8.400               56 %
2          0,5 - 1 mio.        3.600               24 %
3          1 - 2,5 mio.        1.950               13 %
4          2,5 - 5 mio.          600                4 %
5          > 5 mio.              450                3 %
           total              15.000              100 %

cell no.   turnover classes   percentage of the turnover      percentage of
                              of the shops in the universe    the sample size
1          < 0,5 mio.           8 %                           32 % (= (56 +  8) / 2)
2          0,5 - 1 mio.        10 %                           17 % (= (24 + 10) / 2)
3          1 - 2,5 mio.        17 %                           15 % (= (13 + 17) / 2)
4          2,5 - 5 mio.        12 %                            8 % (= ( 4 + 12) / 2)
5          > 5 mio.            53 %                           28 % (= ( 3 + 53) / 2)
           total              100 %                          100 %

If the total sample size is fixed at 500 outlets, the sample sizes in the cells are
n(1) = 500 * 0,32 = 160
n(2) = 500 * 0,17 = 85
n(3) = 500 * 0,15 = 75
n(4) = 500 * 0,08 = 40
n(5) = 500 * 0,28 = 140
and the sampling fractions sf(i) = n(i)/N(i) (i = 1,2,3,4,5) are
sf(1) = n(1)/N(1) = 160/8.400 = 1,9 %
sf(2) = n(2)/N(2) = 85/3.600 = 2,4 %
sf(3) = n(3)/N(3) = 75/1.950 = 3,8 %
sf(4) = n(4)/N(4) = 40/600 = 6,7 %
sf(5) = n(5)/N(5) = 140/450 = 31,1 %
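The calculation above can be retraced with a short sketch. It applies the formula % n(i) = f1 * (% N(i)) + f2 * (% X(i)) to the figures of the example; the function name and the rounding to whole outlets are illustrative assumptions, not part of a production system.

```python
def allocate_sample(shops, turnover_pct, total_sample, f1=0.5, f2=0.5):
    """Return (sample percentage, sample size, sampling fraction) per cell.

    shops:        number of shops in the universe per cell, N(i)
    turnover_pct: percentage of the universe turnover per cell, % X(i)
    """
    if abs(f1 + f2 - 1.0) > 1e-9:
        raise ValueError("f1 and f2 must sum to 1")
    total_shops = sum(shops)
    result = []
    for n_shops, x_pct in zip(shops, turnover_pct):
        shops_pct = 100.0 * n_shops / total_shops      # % N(i)
        sample_pct = f1 * shops_pct + f2 * x_pct       # % n(i)
        n_i = round(total_sample * sample_pct / 100.0)  # n(i), whole outlets
        result.append((sample_pct, n_i, n_i / n_shops))
    return result

# Figures from the example: five turnover classes, 15.000 outlets in total.
shops = [8400, 3600, 1950, 600, 450]
turnover_pct = [8, 10, 17, 12, 53]

for cell, (pct, n_i, sf) in enumerate(allocate_sample(shops, turnover_pct, 500), start=1):
    print(f"cell {cell}: {pct:.0f} % of the sample, n = {n_i}, sampling fraction = {sf:.1%}")
```

Choosing f1 larger than f2 in this sketch shifts sample outlets towards the cells with many shops, while f2 larger than f1 shifts them towards the cells with a high turnover share.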
The example shows that this procedure leads to a disproportional sample structure: the sampling
fraction is small in the small turnover classes and large in the large ones. This disproportional
structure is intended, because the variance of the outlets in the large turnover classes is very
much larger than in the smaller ones. Moreover, the outlets in the larger turnover classes are
highly relevant for the market, so in order to obtain the same statistical accuracy the sampling
fractions in the larger classes have to be larger than in the smaller ones. In practice this means
that the percentage sample size in the larger turnover classes, and especially in the largest one,
has to be large, and comparatively many shops have to be recruited there. In most cases this does
not pose a problem, since these turnover classes contain chain stores that deliver census data.
But not only the number of shops in the sample is important. It is also necessary to have shops of
different companies in the sample. Otherwise the degree of heterogeneity will be underestimated.
On the other hand this does not mean that the sample should include shops of every retail company.
This is a widespread misconception. If there are no shops of a certain company in the sample, they
can be taken into account in the extrapolation by applying corresponding extrapolation factors to
the other sample outlets.

3.3.2. Statistical approach


Another practical method of constructing a sample is based on a statistical approach, the so-called
Neyman principle. In a first step this method is used to calculate the required total sample
size for a fixed degree of accuracy. In a second step it is used to calculate the sample
sizes within the strata (e.g. turnover classes, sales area classes etc.). The method can also be
used to calculate the sample sizes within the strata if the total sample size is fixed.
In principle two different procedures exist in statistical theory. In the first, the budget is
fixed, so that the total sample size is predetermined and the statistical accuracy is
maximised as far as possible. In this case the target is to minimise the sampling error and to
maximise the quality of the results. In the second, a specified accuracy is to be achieved.
The total sample size is then not fixed, and the minimum total sample size is
calculated for the specified level of accuracy.
In order to make statements in statistical theory, a statistical distribution of a significant
parameter value of the units under consideration has to be assumed. This is also necessary if the
minimum total sample size is to be calculated for a fixed level of accuracy. The
parameter value in terms of which the sample is constructed should have a high, ideally the
highest possible, correlation with the characteristic to be researched. In the retail
panel the units under consideration are the shops, and the parameter value is usually the turnover.
According to the central limit theorem of statistics the Gaussian normal distribution can be
assumed if the universe is large. As a rule, however, the degree of heterogeneity of the universe,
i.e. of the shops in the distribution channel, is extremely high. In most cases the empirical
density function of the turnover rises steeply on the left side and falls off slowly on the right,
as there are lots of shops with a small turnover, many shops with a medium turnover, and only a
few with a large turnover. This means it is not similar to the density function of the Gaussian
normal distribution.
For this reason a logarithmic transformation has to be carried out. This procedure maps the band
between 0 and 1 onto the band between -∞ and 0. So especially the band between 0 and 1, i.e. the
turnover class where the percentage number of outlets is the largest, is lengthened.
By means of the logarithmic transformation the empirical density function becomes similar to the
density function of the Gaussian normal distribution. The arithmetic mean of the sample (i.e. the
weighted average in a stratified sample) can then be used as an unbiased estimate of the universe
mean.
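The effect of the logarithmic transformation can be illustrated with synthetic data. The sketch below assumes, purely for illustration, that turnovers (in millions) follow a lognormal distribution; the parameters are invented and do not describe a real universe. It compares the skewness of the raw and the transformed values.

```python
import math
import random
from statistics import mean

random.seed(0)

# Synthetic, right-skewed turnovers in millions (illustrative parameters only).
turnover = [random.lognormvariate(-0.5, 1.2) for _ in range(10_000)]

# Logarithmic transformation: values between 0 and 1 become negative,
# values above 1 become positive and are compressed.
log_turnover = [math.log(x) for x in turnover]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

print(f"skewness before transformation: {skewness(turnover):.2f}")
print(f"skewness after transformation:  {skewness(log_turnover):.2f}")
```

With data of this shape the skewness of the transformed values is close to zero, which is what makes the normal distribution a usable approximation after the transformation.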
Once the confidence level and the maximum sampling error have been fixed, the minimum total
sample size n(Neyman) can be calculated with the following formula:
$$ n(\mathrm{Neyman}) = \frac{\left(\sum_i N(i)\,S(i)\right)^2}{\left(\dfrac{e \cdot X}{q(1-\alpha/2)}\right)^2 + \sum_i N(i)\,S(i)^2} $$
The symbols in the Neyman formula are as follows:
e:           maximum sampling error
1-α:         confidence level
X:           total turnover of all outlets in the universe
N(i):        number of outlets in the universe of stratum i
S(i):        standard deviation in stratum i
q(1-α/2):    percentile of the normal distribution for the confidence level 1-α
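The sample size formula can be sketched in a few lines. The sketch assumes the formula is read as n = (Σ N(i)·S(i))² / ((e·X / q(1-α/2))² + Σ N(i)·S(i)²), with e as a relative error on the total turnover X. All stratum figures below are hypothetical and chosen only to make the example runnable; they are not the figures behind the handbook's results.

```python
import math
from statistics import NormalDist

def neyman_total_sample_size(N, S, X, e, confidence=0.95):
    """Minimum total sample size for a relative maximum sampling error e
    on the total turnover X, assuming Neyman allocation across strata."""
    q = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # q(1 - alpha/2)
    sum_ns = sum(n * s for n, s in zip(N, S))                  # sum of N(i) * S(i)
    sum_ns2 = sum(n * s ** 2 for n, s in zip(N, S))            # sum of N(i) * S(i)^2
    target_variance = (e * X / q) ** 2
    return sum_ns ** 2 / (target_variance + sum_ns2)

# Hypothetical strata: outlets per stratum, standard deviation of turnover
# per stratum (millions) and total turnover of the universe (millions).
N = [8400, 3600, 1950, 600, 450]
S = [0.12, 0.14, 0.40, 0.70, 6.00]
X = 14820.0

for e in (0.05, 0.02, 0.01):
    n = neyman_total_sample_size(N, S, X, e, confidence=0.95)
    print(f"e = {e:.0%}: at least {math.ceil(n)} outlets")
```

In this sketch a smaller maximum sampling error or a higher confidence level always increases the required sample size, and the result can never exceed the number of outlets in the universe.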

The following example illustrates the Neyman principle.

Assume that an appropriate sample is to be selected from the universe of 15.000 electrical
retailers. Let the maximum sampling error allowed be 2% and the confidence level at least 95%.
These figures are common in the business world. If the sample size were calculated for a
sample without stratification, more than 3.000 outlets would be needed in order to
achieve the required statistical accuracy. This means that more than 20% of the outlets in the
universe would have to be included in the sample. This is a result of the extremely heterogeneous
universe: there are shops with an annual turnover of 50.000, whereas others realise a turnover of
more than 50 mio. Since the calculation of the sample size depends on the variance of the
arithmetic mean, such a large sample size results.
Therefore the universe has to be divided into strata, here turnover classes. Then a homogenisation
of the universe will be achieved (see 3.3.1.). In the example the following turnover classes were
fixed:
0,05 million - 0,5 million
0,5 million  - 1 million
1 million    - 2,5 million
2,5 million  - 5 million
5 million    - 10 million
10 million   - 25 million
> 25 million

The minimum total sample size according to the Neyman principle, given a maximum sampling error
of 2% and a confidence level of 95%, is 372 outlets. Other results are possible if the values of
the maximum sampling error and the confidence level are modified.
If the confidence level is fixed at 95% and the maximum sampling error is varied, extremely
different sample sizes result. If a maximum sampling error of 1% is required, the sample
size increases from 372 to 806 shops. If on the other hand a maximum sampling error of 5% is
sufficient, the sample size can be reduced to 78 shops.
1-α = 95%, e = 5%     n = 78
1-α = 95%, e = 4%     n = 118
1-α = 95%, e = 3%     n = 196
1-α = 95%, e = 2%     n = 372
1-α = 95%, e = 1%     n = 806

If the maximum sampling error is fixed at 2% and the confidence level is varied, the following
sample sizes are calculated:
e = 2%, 1-α = 95%     n = 372
e = 2%, 1-α = 96%     n = 399
e = 2%, 1-α = 97%     n = 431
e = 2%, 1-α = 98%     n = 471
e = 2%, 1-α = 99%     n = 539

In the following table more results of the calculation are shown.


sample size   1-α = 95%   1-α = 96%   1-α = 97%   1-α = 98%   1-α = 99%
e = 5%            78          86          95         108         131
e = 4%           118         129         143         161         194
e = 3%           196         213         234         261         310
e = 2%           372         399         431         471         539
e = 1%           806         837         871         910         969

All these results are correct in the statistical sense. In practice it has to be weighed which
values to use for the maximum sampling error and the confidence level. Generally the values
e = 2% and 1-α = 95% are used.
The sample sizes within the strata, i.e. the turnover classes, are calculated according to the
formula
$$ n(i, \mathrm{Neyman}) = n(\mathrm{Neyman}) \cdot \frac{N(i)\,S(i)}{\sum_j N(j)\,S(j)} $$
In the example this leads to the following sampling fractions (n(Neyman) = 372):
turnover classes               sampling fractions
0,05 million - 0,5 million      1%
0,5 million  - 1 million        2%
1 million    - 2,5 million      5%
2,5 million  - 5 million        8%
5 million    - 10 million      10%
10 million   - 25 million      34%
> 25 million                   48%

This example shows once more that the Neyman principle also leads to a disproportional sample
structure. In small turnover classes the sampling fraction is small and in large turnover classes
it is large, which was the target (see 3.3.1.).
But the Neyman principle has one weak point. If the degree of heterogeneity is very high,
the Neyman formula can require a sample size in a turnover class which is larger than the universe
of this class. This can happen in the largest turnover class and is a typical phenomenon of the
Neyman principle. In practice the problem is solved in the following way. In the turnover class
where the sample size would exceed the universe, all outlets are taken into the sample. The
difference between the required and the actual number of outlets in this turnover class can be
added to the required sample size in the next largest turnover class. It can also be spread over
several turnover classes. In order to keep the required accuracy, the total sample size and the
sample sizes of the other turnover classes should be recalculated using the Neyman principle.
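The allocation to the strata and the handling of a stratum whose required sample size exceeds its universe can be sketched as follows. This is a simplified illustration: the total sample size is held fixed and the surplus is redistributed proportionally to N(i)·S(i) among the remaining strata, whereas the procedure described above also recalculates the total. The function name and all figures are hypothetical.

```python
def neyman_allocation(n_total, N, S):
    """Allocate n_total across strata proportionally to N(i) * S(i).

    A stratum whose share would reach or exceed its universe is taken
    completely (census); the remaining sample size is then re-allocated
    among the other strata. Assumes n_total <= sum(N) and not all S(i) zero.
    """
    alloc = [0.0] * len(N)
    open_strata = set(range(len(N)))
    remaining = float(n_total)
    while open_strata:
        weight = sum(N[i] * S[i] for i in open_strata)
        share = {i: remaining * N[i] * S[i] / weight for i in open_strata}
        capped = [i for i in open_strata if share[i] >= N[i]]
        if not capped:
            for i in open_strata:
                alloc[i] = share[i]
            break
        for i in capped:
            alloc[i] = float(N[i])   # take every outlet of this stratum
            remaining -= N[i]
            open_strata.remove(i)
    return alloc

# Hypothetical strata; the last one is small but extremely heterogeneous.
N = [8400, 3600, 1950, 600, 50]
S = [0.12, 0.14, 0.40, 0.70, 40.0]

for n_univ, n_i in zip(N, neyman_allocation(372, N, S)):
    print(f"universe {n_univ:>5}: sample {n_i:6.1f}, sampling fraction {n_i / n_univ:.1%}")
```

In this example the top stratum is taken as a census of 50 outlets, and the other strata share the remaining 322 outlets.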
As to sample optimisation there are two more interesting documents on the StarTrack platform:
- 3. Methodology / Sample Optimisation / Optimum Allocation of Stratified Random Samples
- 3. Methodology / Sample Optimisation / Optimising Stratified Random Samples
3.4. Practical aspects
3.4.1. Refusal of panel participation
In practice it can happen that a retailer or a retail company refuses to participate in the retail
panel. This problem of non-response concerns sample data as well as census data.
If sample data are used and the refusing retailer is less important for the panel, its outlets are
substituted by other ones, provided that comparable outlets are available. If the sample size in
the corresponding cell is large enough, it is also possible to modify the extrapolation factors of
the other outlets in such a way that the loss of data is compensated.
If census data are used and the refusing retailer is less important for the panel, the outlets of
this retailer have to be simulated from selected outlets of the other retailers. But this method
can only be applied if data per outlet are available. Otherwise the data of the remaining retailers
have to be weighted correspondingly.
If the refusing retailer or retail company is important for the panel, the quality of the data in
the reporting can be put at risk, no matter whether sample data or census data are used. The
results can be distorted, especially if the lost retailer differs significantly from the other
ones, for example a key retailer with strong private labels in the assortment.
In general it is not necessary to get outlets of all retailers or retail companies into the panel,
since within the distribution channels there are lots of similar outlets which can stand for a
number of other ones. This should be considered when constructing the sample in order to keep the
costs under control.
3.4.2. Changing sample consistency and structure


In most cases the composition of the sample changes from period to period. Sometimes retailers
stop the data delivery because they no longer want to provide GfK MS with their data, or GfK MS
ends the co-operation with a retailer because of bad data quality or because the data are delivered
too late. Retailers can close all or some of their shops, which are then missing in the sample.
New shops are included in the sample. Shops can be enlarged or downsized.
They can also change the assortment structure by including new product groups or new product
sectors in their sales program and by excluding others.
All these activities lead to changes within the sample, which in turn can lead to variations of the
extrapolated results for the universe. On the other hand the universe itself is permanently
changing over time. So it is not possible to differentiate between the true variations and the
variations caused by changes in the sample.
Panel samples such as the retail panel have the advantage that the problem of non-response or
refusal of panel participation essentially occurs in the initial phase. The variations caused by
changes in the sample also decrease in panel samples. The handling of these changes in the
extrapolation is described in chapter 4.6.
3.5. Difference between target and actual sample size
Normally there is a (possibly large) difference between the target and the actual sample size.
There are two main reasons for this: the existence of outliers and the delivery of census data.
In both cases the actual sample size will be larger than the target sample size.
It often occurs that retail companies, especially chain stores, deliver census data. With regard to
the sample it would not be necessary to consider all outlets of such a retailer in the evaluation.
However, census data are processed if this retailer is provided with its exclusive performance
in the reporting. Then there is no sampling error (see also 3.1.2.), and the actual sample size is
larger than the calculated one. But the processing of census data is a question of costs. Therefore
a sample is preferred for retailers who are not supplied with the exclusive segment.
The other reason why the actual sample size is normally larger than the target sample size is the
existence of outliers. In practice it happens again and again that there are shops which cannot
represent the number of other shops implied by the computed extrapolation factor because their
sales structure is too different. Then only a smaller factor than the computed extrapolation factor
can be applied to such an outlier or atypical outlet. In the extreme case such a shop can only
stand for itself with extrapolation factor 1. But this means a loss in the number of degrees of
freedom, i.e. the sample size in this cell has to be enlarged if the computed extrapolation factor
is to be kept for the other outlets. Therefore the sample size in practice is mostly larger than in
the theoretical model (see also 4.2.7.).
It can also occur that in a particular extrapolation cell the actual sample size is smaller than
the target sample size. For instance, if a retailer with several outlets or a chain store stops
data delivery, the sample size in this cell can become too small. Then new shops have to be
recruited. The handling of such cases is described in chapter 4.6.2.

CHAPTER 4
4. Extrapolation
The target is to set up an extrapolation that is representative of the universe. In all cases the
extrapolation takes place either for distribution channels or for retailers, who are mostly
supplied with an exclusive segment in the special reporting. In the extrapolation for distribution
channels the channels are estimated completely, that means the coverage rate per channel is 100%.
Taking electrical retailers as an example, the sales units and sales value of this channel
are estimated at one hundred per cent. Likewise, when extrapolating for a retailer,
the sales data of this retailer are estimated completely.
4.1. Extrapolation channels

The extrapolation channels can be identical to the country channels, but it is also possible to put
all country channels which belong to the same production project into the same extrapolation
channel. For instance the two country channels electrical specialists and technical superstores
may be put together into the extrapolation channel electrical retailers. In order to be able to
recognise the extrapolations in the DWH at a later date, the extrapolation channels should be given
meaningful names. The name should be structured as follows:
- country identification code, e.g. DE for Germany,
- sector identification code, e.g. CE for consumer electronics,
- name of the channel, e.g. electrical retailers.

Then the name of the extrapolation channel in the mentioned example would be
DE CE electrical retailers.
In many cases there are changes in the retail scene or in the sample during the year. Shops are
closed, new shops open, retailers in the panel stop data delivery, new retailers join the panel,
other retailers in the panel deliver data of new shops etc. These changes have to be taken into
account, so updates of the extrapolation are necessary. In practice a first extrapolation version
is set up at the beginning of the year. If there are then changes in the universe or in the sample,
they are taken into account in a second extrapolation version and so on.
A new extrapolation version always has to be set up if an extrapolation variant is built up
for the first time or if the extrapolation changes because of a change in the universe or in
the sample. This means a new extrapolation version is mandatory even if there is only one
change.
The versions of an extrapolation should be labelled with meaningful names. It is advisable to use
the name of the reporting period, e.g. January 2005. If extrapolation versions do not change for
several periods, it is better to choose another name, e.g. 1-2005.
4.2. Extrapolation cells
Since in most cases the outlets within a distribution channel are very heterogeneous and the
variance is very large, the sample is stratified in order to get more homogeneity within the
strata (as mentioned in chapter 3.). Taking electrical retailers as an example, the spectrum of the
turnover of the outlets is very large and there is a high degree of heterogeneity, so turnover
classes should be established.
For the extrapolation the channels are also segmented, and this segmentation need not be
identical to the stratification structure. Each channel is separated into several segments, which
are called extrapolation cells. These cells can be completely different for different channels
and product sectors.
4.2.1. Cell definition
The extrapolation cells are constructed on the basis of features such as region, turnover
class, sales area or organisation type. A combination of features is also possible. For
example an extrapolation cell can be defined by a particular region, a particular turnover class
and a particular organisation type (e.g. north / 1 - 2 million / buying group). In order to
construct extrapolation cells according to the desired features it is an essential condition that
these features are listed in MDM. Only the listed features can be used for the cell construction.
An extrapolation cell is then defined by the following criteria:
- cell name,
- cell features (turnover class, region, sales area, organisation types etc.),
- number of outlets in the sample,
- table of the itemised outlets in the sample with shop number,
- extrapolation universe,
- distribution universe,
- extrapolation factors,
- distribution factors.

The cell name should correspond to the features which describe the extrapolation cell, e.g. a
combination of a particular region and a particular turnover class, such as
electrical specialists / independents / north / 0,5 - 1 million.
The extrapolation cells have to be defined in such a way that they do not overlap. The cell
definitions must be unambiguous, so that a given shop can only be assigned to one
extrapolation cell.
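The requirement that the cells must not overlap can be checked mechanically. The following sketch uses a hypothetical, simplified cell definition with only a region and a turnover class; it is not the MDM data model. Using half-open turnover intervals (lower bound included, upper bound excluded) is one way to make sure that a shop on a class boundary matches exactly one cell.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str
    region: str
    turnover_min: float  # inclusive, in millions
    turnover_max: float  # exclusive, in millions

    def matches(self, region: str, turnover: float) -> bool:
        return self.region == region and self.turnover_min <= turnover < self.turnover_max

def assign_cell(cells, region, turnover):
    """Return the single cell a shop belongs to; fail if none or several match."""
    hits = [c for c in cells if c.matches(region, turnover)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one matching cell, found {len(hits)}")
    return hits[0]

# Hypothetical cell definitions.
cells = [
    Cell("north / 0,5 - 1 million", "north", 0.5, 1.0),
    Cell("north / 1 - 2,5 million", "north", 1.0, 2.5),
]

print(assign_cell(cells, "north", 1.0).name)  # north / 1 - 2,5 million
```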
4.2.2. Assignment of the shops to the extrapolation cells
Once the extrapolation cells are defined, the shops can be assigned to the cells. In each cell the
cell outlets are selected from the table of the itemised outlets in the sample with shop number.
The assignment of the shops to the extrapolation cells may differ from product sector to product
sector or from product group to product group if the cell definitions themselves differ. The
characteristics of the shops such as turnover, sales area etc. naturally remain the same. This is
clarified by an example.
Assume an electrical specialist which belongs to the buying group Euronics, is situated in the
north, realises an annual turnover of 1,5 million and has a sales area of 250 m². Then the
following assignments are possible.
As to consumer electronics this shop may belong to an extrapolation cell characterised by
electrical specialists / Euronics / north / 1 - 2 million / 200 - 500 m².
As to photo the same shop may belong to an extrapolation cell characterised by
electrical specialists / buying groups / north / 1 - 5 million / 200 - 300 m².
But the assignment of the shops to the distribution channel does not depend on the panel or other
criteria. It depends exclusively on the channel definition. This means, for example, that a shop
must not be assigned to electrical specialists for consumer electronics and to photo
specialists for photo.

Since overlapping of cells must be excluded, each shop can only be assigned to one extrapolation
cell. A shop is assigned to a cell only if it fulfils the conditions of the cell definition.
An extrapolation cell need not necessarily consist of many outlets. It is also possible to create
a cell with only one outlet. This is often done if newly recruited shops are to be integrated into
the sample in the new reporting period, but not into an existing cell. Then a new extrapolation
cell can be set up with its own extrapolation factor.

4.2.3. Computation of the extrapolation factors


Theoretically the extrapolation factor in each extrapolation cell is the reciprocal of the
sampling fraction in this cell (e.g. a sampling fraction of 5 % gives an extrapolation factor of
20), and all sample shops in this cell get the same extrapolation factor. In practice, however,
there are usually some cases in which an individual factor is attached to a shop, which differs
from the factors of the other outlets. Among other things this applies to atypical outlets, to
which only a smaller factor than the computed extrapolation factor can be applied. In some cases
there are shops which cannot represent a large number of other shops because of their sales
structure. In an extreme case such a shop can only represent itself, i.e. the extrapolation factor
is 1.
The extrapolation universe and the distribution universe may differ, as may the extrapolation
factor and the distribution factor. For example, if a retail company with 50 department
stores transmits the data for all 50 outlets in aggregated form, i.e. not per outlet, the
extrapolation factor is 1, the extrapolation universe is 1, the distribution factor is 50 and the
distribution universe is 50. Thus a distinction has to be made between the extrapolation universe
(factor) and the distribution universe (factor).
In an extrapolation cell either the universe is fixed (fixed universe) or the factors are fixed
(fixed factors). This applies to the extrapolation universe and the extrapolation factors as well
as to the distribution universe and the distribution factors. It often occurs that distribution
universes or factors for a product sector or for certain product groups are predetermined within a
channel. That depends on the number of outlets in this channel which carry the corresponding
product sector or product group.
Regarding the first method (fixed universes) the extrapolation factors and the distribution factors
are calculated automatically. For example, if a cell universe consists of 100 outlets and there are 5
sample outlets, the system calculates 20 as extrapolation factor for each shop. However, it is also
possible to attach an individual factor to a shop. If in the example above factor 4 is attached to one
outlet, the system would calculate automatically factor 24 for the other shops.
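The automatic calculation for a fixed universe can be sketched as follows. The function name and its interface are illustrative assumptions and do not describe the actual production system; the sketch only reproduces the arithmetic of the example.

```python
def factors_for_fixed_universe(universe, n_sample, individual=None):
    """Return one extrapolation factor per sample outlet for a fixed cell universe.

    individual maps the index of an outlet to a manually attached factor;
    the rest of the universe is spread evenly over the remaining outlets.
    """
    individual = individual or {}
    free_outlets = n_sample - len(individual)
    rest_universe = universe - sum(individual.values())
    common = rest_universe / free_outlets if free_outlets else 0.0
    return [float(individual.get(i, common)) for i in range(n_sample)]

print(factors_for_fixed_universe(100, 5))          # [20.0, 20.0, 20.0, 20.0, 20.0]
print(factors_for_fixed_universe(100, 5, {0: 4}))  # [4.0, 24.0, 24.0, 24.0, 24.0]
```

In both cases the factors add up to the fixed cell universe of 100 outlets.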
The universes per cell and per cell definition are fixed once at the beginning of the reporting year.
As a rule the universes are not modified within this year unless there are relevant changes. On the
other hand there are distribution channels with frequent changes in the retail scene, so that the
extrapolation has to be updated during the year.
With the second method (fixed factors) the extrapolation universe and the distribution
universe are calculated automatically. For instance, if there are 5 sample outlets in an extrapolation
cell and factor 20 is assigned to each of them, the system calculates a universe of 100 outlets. If
there are four outlets with factor 24 and one outlet with factor 4, the system also calculates a
universe of 100 outlets.
4.2.4. Carrying and not carrying outlets
If only a part of the outlets in a distribution channel carries a certain product category (product
sector, product group, brand, feature etc.), a distinction can be made between carrying and
non-carrying outlets. The extrapolation then takes place for the carrying outlets. The non-carrying
outlets still have to be considered, however, because extrapolation is always done for the
complete channel with a coverage rate of 100 %. For the non-carrying outlets one or more
additional simulated extrapolation cells are therefore created. The following example illustrates
this procedure.
Imagine a universe of 5.000 outlets in the distribution channel hypermarkets. 80% of the outlets
carry products of the product sector consumer electronics. Accordingly the universe of the carrying
outlets in this channel amounts to 4.000 outlets. These 4.000 carrying hypermarkets are assigned to
extrapolation cells, e.g. according to sales area groups. The 1.000 non-carrying hypermarkets are
assigned to simulated extrapolation cells. The reason for this procedure is to obtain a correct
distribution. If the simulated extrapolation cells for the non-carrying hypermarkets were not
created, the system would automatically fix the universe of all hypermarkets at 4.000 outlets
(= 100%), and the distribution of the carrying hypermarkets would be reported as 100% instead of
the correct 80%. Without the simulated extrapolation cells the system cannot recognise that the
universe of all hypermarkets is 5.000.
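The effect of the simulated cells on the distribution can be sketched as follows. This is only an illustration: the function `channel_distribution` is hypothetical, and the split of the 4.000 carrying hypermarkets into two sales area cells of 2.500 and 1.500 outlets is an assumption made for the example, not a figure from this handbook.

```python
def channel_distribution(cells):
    """Share of carrying outlets in the channel universe.

    `cells` is a list of (cell universe, carries product) pairs,
    one pair per extrapolation cell.
    """
    total = sum(universe for universe, _ in cells)
    carrying = sum(universe for universe, carries in cells if carries)
    return carrying / total


# Assumed split of the 4.000 carrying outlets into two sales area cells.
carrying_cells = [(2500, True), (1500, True)]
simulated_cells = [(1000, False)]  # cell for the outlets not carrying the sector

with_simulated = channel_distribution(carrying_cells + simulated_cells)  # 4000 / 5000
without_simulated = channel_distribution(carrying_cells)                 # 4000 / 4000
```

Without the simulated cell the channel universe collapses to the 4.000 carrying outlets, so the computed distribution is overstated.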
4.2.5. Modification of the cell definition
A cell definition can remain unchanged during the whole reporting year, but it can also be modified
in one, several or all reporting periods. There are several reasons for modifying cell definitions.
For example, if two regions which were extrapolated separately in the past are now combined,
the cell definition has to be modified. Likewise, if two turnover classes which were combined in
the past are separated, the cell definition has to be modified, too.
4.2.6. Identical extrapolation for more than one product sector / product group
For a particular distribution channel or a particular part of a distribution channel the extrapolation
cells are often identical for more than one product sector / product group. The extrapolation is then
simplified. There are also extrapolations which are applied to all product sectors. This
occurs primarily in distribution channels which carry products of various product sectors, e.g.
hypermarkets or department stores.
To keep extrapolations manageable, the aim should be to use the same extrapolation
or the same extrapolation cells for as many product groups as possible. This saves a lot of work
in the maintenance and updating of extrapolations.
4.2.7. Handling of outliers and atypical outlets
In practice it happens again and again that shops cannot represent as many other shops as the
computed extrapolation factor implies because their sales structure is too different. The computed
extrapolation factor then cannot be applied to such an outlier or atypical outlet; only a smaller one
can. In the extreme case such a shop can only stand for itself with extrapolation factor 1. This
means a loss of degrees of freedom, i.e. the sample size in this cell has to be enlarged if the
computed extrapolation factor is to be kept for the other outlets. Therefore the sample size in
practice is mostly larger than in the theoretical model (see also 3.5.).
The problem of the reduced number of degrees of freedom can usually be solved more easily in cells
with the more important outlets (large turnover, large sales area etc.), since in these cells there are
nearly always retail chains which deliver census data, so that the number of sample outlets exceeds
that required by the sample design. In cells with the less important outlets the sample size
should be enlarged in order to keep the required accuracy. As a rule the number of less
important outlets in the universe is very large, so that recruiting some more shops is not a
problem. Moreover, cells with less important outlets are more homogeneous and
the variance is not very large. Here trivial inaccuracies may be neglected.
For example, assume that in the turnover class 0,5 - 1 Mio. the extrapolation factor is 50 and there
are 36 shops in the sample. Two shops can only stand for themselves, i.e. extrapolation factor 1,
two shops can only represent 10 others, i.e. maximum extrapolation factor 11, and two further
shops can only represent 20 others, i.e. maximum extrapolation factor 21. The result is a factor sum
of 66 for these 6 outlets. Since the cell universe is 1.800 (= 36 * 50), the other 30 sample outlets
would get factor 57,8 (= 1.734 / 30). In order to keep extrapolation factor 50 for these outlets, at
least 5 more shops are needed in this cell (1.734 / 50 = 34,68, i.e. 35 = 30 + 5 shops).
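The arithmetic of this example can be retraced with a short sketch. The function name `outlier_adjustment` and its parameters are hypothetical; the sketch assumes a fixed cell universe and that all outlets without a capped factor share the remaining universe evenly.

```python
import math


def outlier_adjustment(universe, sample_size, capped_factors, target_factor):
    """Return (factor for the uncapped outlets, additional shops needed).

    `capped_factors` lists the maximum factors of the atypical outlets.
    """
    free_outlets = sample_size - len(capped_factors)
    remaining = universe - sum(capped_factors)
    # Factor the uncapped outlets would get with the current sample size.
    residual_factor = remaining / free_outlets
    # Number of uncapped outlets needed to keep the target factor.
    needed_outlets = math.ceil(remaining / target_factor)
    return residual_factor, max(0, needed_outlets - free_outlets)


caps = [1, 1, 11, 11, 21, 21]  # factor sum 66 for the 6 atypical outlets
residual, extra_shops = outlier_adjustment(36 * 50, 36, caps, target_factor=50)
```

Here `residual` is the factor of about 57,8 from the text, and `extra_shops` is the number of shops that would have to be recruited to return to factor 50.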
4.3. Extrapolation for individual retail companies
Key accounts are often supplied with an exclusive segment in a special reporting, so that they can
see their market share in the different product sectors, product groups and feature categories and
can compare their assortment structure directly with that of the corresponding distribution channel
and the panel market. In such a case the sample of outlets of the corresponding retail company has
to be drawn so that it is representative of this company, not of the distribution channel. This means
an appropriate sample of outlets is required in order to get a good estimate of the retailer's own
data.
If the outlets of a retail company are largely homogeneous, i.e. the turnover and the assortment
structure are nearly the same for all outlets, the choice of outlets is rather easy. The more
heterogeneous the outlets are, the more difficult the choice becomes, because one outlet
then cannot be representative of a large number of other ones.
Another problem with the reporting quality can appear if such a retailer closes existing outlets or
opens new outlets. The extrapolation has to be modified immediately, so prompt information
from the retailer is absolutely necessary. The same applies if an outlet is enlarged or downsized.
In conclusion, a sample-based extrapolation of the data of a retail company will
never ensure absolute accuracy. Completely correct data for the exclusive segment can only be
guaranteed by census data, assuming complete and correct data delivery and processing (see also
chapter 4.5.). If in the case of census data a retailer closes existing shops or opens new ones, this
change is taken into account automatically.
For the extrapolation for individual retailers the outlets of such a retailer are assigned to one
or more separate extrapolation cells. If there is a split in regions, turnover classes, sales areas etc. in
the reporting for the corresponding distribution channel, an extrapolation cell for each segment is
required. For example, if there is a reporting for 3 turnover classes and 2 regions, the
outlets of the corresponding retailer have to be assigned to 6 extrapolation cells. If no split takes
place, one extrapolation cell may be sufficient.
4.4. Handling of aggregated data
Sometimes a retailer with more than one outlet delivers the data not per outlet but in aggregated
form. The reason may be that this retailer is not able to separate the data or that this is too expensive
for him. In other cases data are delivered per outlet and GfK aggregates them in order to save costs.
This procedure is permissible if there is no split (e.g. regional) in the reporting for the corresponding
distribution channel.
Then the extrapolation factor is 1 and the extrapolation universe is also 1. The distribution factor
and the distribution universe correspond to the number of outlets of this retailer. For example, if a
certain retail company with 50 department stores delivers the data of all 50 outlets in aggregated
form, the extrapolation factor is 1, the extrapolation universe is 1, the distribution factor is 50, and
the distribution universe is 50. If only the data of 10 of the 50 department stores is delivered in
aggregated form, the extrapolation factor is 5, the extrapolation universe is 5, the distribution factor
is 50, and the distribution universe is 50. When data are delivered in aggregated form, single
outlets of this retailer cannot be used in the extrapolation as representatives for outlets of other
retailers.
Even if there is a split in the reporting for a distribution channel, an aggregation of the data is
allowed, but only within this segmentation, i.e. within the corresponding extrapolation cells. The
data of those outlets of the retailer which belong to the same segment can then be aggregated.
Assume for example a distribution channel with a reporting for 3 turnover classes and 2 regions
and a retail company with 50 department stores which delivers the data in aggregated form
corresponding to these 6 extrapolation cells:
turnover class   region   # outlets   extrapolation factor   distribution factor
< 5 Mio.         North     5          1                       5
5 - 20 Mio.      North    12          1                      12
> 20 Mio.        North     8          1                       8
< 5 Mio.         South     4          1                       4
5 - 20 Mio.      South    15          1                      15
> 20 Mio.        South     6          1                       6
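The table can be reproduced with a small sketch. The data structure and the names `outlets_per_cell` and `factors` are hypothetical; the sketch only restates the rule from the text that an aggregated delivery per cell gets extrapolation factor 1 and a distribution factor equal to the number of outlets it contains.

```python
# Number of outlets of the retailer per extrapolation cell (turnover class, region).
outlets_per_cell = {
    ("< 5 Mio.", "North"): 5,
    ("5 - 20 Mio.", "North"): 12,
    ("> 20 Mio.", "North"): 8,
    ("< 5 Mio.", "South"): 4,
    ("5 - 20 Mio.", "South"): 15,
    ("> 20 Mio.", "South"): 6,
}

# One aggregated data delivery per cell: it stands for itself in the
# extrapolation but for all of its outlets in the distribution.
factors = {
    cell: {"extrapolation_factor": 1, "distribution_factor": outlets}
    for cell, outlets in outlets_per_cell.items()
}

distribution_universe = sum(f["distribution_factor"] for f in factors.values())
extrapolation_universe = sum(f["extrapolation_factor"] for f in factors.values())
```

The distribution factors add up to the 50 department stores of the retailer, while the extrapolation universe is one unit per cell.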

4.5. Sample data versus census data
4.5.1. Sample-based extrapolation
In the case of sample data every outlet in the sample gets an extrapolation factor in order to carry out
a representative extrapolation with respect to the universe. This means that every sample outlet is
representative of other outlets which are similar in the statistical sense. There is only one
exception: in the extreme case an outlet is so atypical because of its different sales structure,
turnover etc. that it cannot stand for any other outlet and the extrapolation factor can only be 1.
It is important to distinguish between the extrapolation for a distribution channel and the
extrapolation for a retail company which is provided with an exclusive segment in the
special reporting. If there is an exclusive segment for a retail company, the sample of outlets of this
company has to be drawn so that it is representative of this company (see also
chapter 4.3.). If there is no exclusive segment, representativeness for the
distribution channel is decisive. As a rule the two cases result in different extrapolation factors.
Advantages
The sample-based extrapolation for a distribution channel makes it possible to fully include
retailers who do not supply GfK with data. This means that the distribution channel has to be
interpreted with a coverage rate of 100%, since all retailers are taken into account, whether they
deliver data or not. If a retailer decides to stop the data delivery, the
extrapolation factors of the other outlets can be modified in such a way that the loss of data is
compensated rapidly. If an additional shop is integrated into the sample and the extrapolation
factors of the other outlets are modified proportionately, the results of the extrapolation will be
correct, too.
Consequently leavings and entries of retailers in the sample can be compensated by
modifying the extrapolation factors of the other outlets. The same holds in a comparable manner
if retail companies close existing outlets or open new outlets. Thus the anonymity of retailers in
the reporting is guaranteed. The following example illustrates this.
Assume an extrapolation universe of 600 outlets and a sample of 20 outlets. Then the
extrapolation factor is 30 per outlet in the first period. In the second period there
are 5 new shops in the sample (sample size 25) and the extrapolation factor per shop will be 24. In
the third period there are 3 leavings (sample size 22) and the extrapolation factor per shop will be
27,3. In the next period there are 2 leavings and 4 new shops (sample size 24) and the extrapolation
factor per shop will be 25.
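The sequence of factors in this example can be recomputed with a few lines. The function `factors_per_period` and the representation of each period as an (entries, leavings) pair are assumptions made for this sketch; it presumes a fixed universe and equal factors for all sample outlets.

```python
def factors_per_period(universe, initial_sample, changes):
    """Return (sample size, factor per outlet) for every period.

    `changes` holds one (entries, leavings) pair per period after the first.
    """
    sample = initial_sample
    result = [(sample, universe / sample)]
    for entries, leavings in changes:
        sample += entries - leavings
        result.append((sample, universe / sample))
    return result


# Period 2: 5 entries; period 3: 3 leavings; period 4: 4 entries and 2 leavings.
periods = factors_per_period(600, 20, [(5, 0), (0, 3), (4, 2)])
rounded = [round(factor, 1) for _, factor in periods]
```

The sample sizes are 20, 25, 22 and 24, and the rounded factors match the values given in the text.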
Disadvantages
For retail companies which are supplied with an exclusive segment in their special reporting,
an appropriate sample of outlets has to be chosen in order to get a good estimate of the retailer's
own data. The choice of outlets becomes more difficult the more heterogeneous
the outlets of the retail company are. Furthermore it has to be taken into account if such a retailer
closes existing outlets or opens new outlets. The reaction has to take place immediately, and this
requires prompt information from the retailer.
Only processing with census data guarantees completely correct data for the exclusive segment,
assuming complete and correct data delivery and processing. If all outlets of a retail company
realise nearly the same turnover and carry nearly the same assortment, i.e. there is a high degree of
homogeneity, the sample-based extrapolation for this retail company may still lead to usable
results.
4.5.2. Census data
If all retailers within a distribution channel deliver census data, these data are aggregated. An
extrapolation is not necessary. The data of a certain retailer can be used directly for the exclusive
segment in the reporting. If not all retailers within a distribution channel deliver census data, a
sample has to be drawn in parallel.
Advantages
Since in cases of complete census data within a distribution channel there is no extrapolation and
no sampling error, an optimal quality of the reporting can be achieved. If a retail company closes
existing shops or opens up new ones, this change is taken into account automatically. This means
that extensive extrapolation tests with modified extrapolation factors are not necessary. The
distribution channel will always be represented with a coverage rate of 100%.
Disadvantages
If a retailer stops the data delivery, the result is a corresponding gap, i.e. a coverage rate of less
than 100%, depending on the importance (turnover) of this retailer. In the case of key accounts the
quality of the data in the reporting can be put at risk. To remedy this, the outlets of this
retailer have to be simulated from selected outlets of the other retailers. But this method can only be
applied if data per outlet are available. Otherwise the data of the remaining retailers have to be
weighted correspondingly. Here a problem can appear if the lost retailer differs significantly from
the other ones, for example a key retailer with strong private labels in the assortment.
Another problem appears if a new retailer who has to be assigned to a
distribution channel with complete census data enters the market. This retailer will then be
completely transparent in the reporting. Consequently, complete census data within a distribution
channel offer basic advantages with respect to the reporting quality, assuming complete and correct
data delivery and processing, but there is also a considerable risk, since one is dependent on every
single retailer in this channel.
4.5.3. Conclusion
The maximum of quality and confidence in reports with exclusive segments can only be achieved
if the corresponding retail companies transmit census data. For the retailers who are not
supplied with an exclusive segment in the reporting, census data would also guarantee the highest
quality and confidence, but the processing of census data is a question of costs. Therefore in these
cases a sample will be preferred. In short: if possible, use census data for retailers with an
exclusive segment and sample data for retailers without an exclusive segment in the reporting.
4.6. Handling of entries and leavings
Sometimes retailers stop the data delivery because they do not want to provide GfK with their data
any longer, or GfK stops the co-operation with a retailer because of the bad quality of the data or
because the data delivery is too late, or new retailers could be recruited for the panel. These
activities lead to changes within the sample.

On the other hand retailers close their shop(s) or some of their shops, new retailers come into the
market and other retailers open up new shops. Some shops are enlarged, others are downsized.
Some shops change the assortment structure. They include new product groups or new product
sectors in their sales program and exclude others. So there are always changes in the universe.
Since a panel has to be adjusted permanently to the (changing) universe, there will always be
changes in the sample size and the cell populations, so that extrapolation cells have to be modified
in order to keep the extrapolation representative. Normally the cell population is fixed once
at the beginning of the reporting year and is not modified within this year unless there are relevant
changes. But there are a number of cases where a modification is necessary.
4.6.1. Entries
New shops of existing country channels, which are already included into MDM shop master, are
listed in the extrapolation system with a proposal for every new shop, which extrapolation cell
according to the cell definition it would be assigned to and which extrapolation factor it would get.
The decision concerning the extrapolation factor is based on the cell population and the universe
description for the corresponding cell. For the calculation of the proposed extrapolation factor it
may be specified on cell level whether the existing shop factors or the cell universe should remain
unchanged.
In extrapolation cells with a fixed universe the factors would change automatically, i.e. they would
decrease. For example, if 5 new shops join an extrapolation cell with a fixed universe of 600 shops
and a population of 25 shops, the extrapolation factor per shop will decrease from 24 to 20.
If the extrapolation factors remain unchanged, the new shop increases the cell universe. For
example, if 5 new shops join an extrapolation cell with a population of 25 shops and a fixed
extrapolation factor of 24, the universe increases from 600 to 720.
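Both variants can be sketched with one small helper. The class `Cell` and the function `change_population` are hypothetical names for this illustration, and the sketch assumes that all shops in the cell have the same factor. The same arithmetic applies to leavings (see 4.6.2.) when the change is negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    universe: float   # number of outlets the cell stands for
    population: int   # number of sample shops in the cell

    @property
    def factor(self) -> float:
        return self.universe / self.population


def change_population(cell: Cell, delta: int, keep: str = "universe") -> Cell:
    """Add (delta > 0) or remove (delta < 0) shops, keeping universe or factor fixed."""
    population = cell.population + delta
    if population <= 0:
        raise ValueError("cell would be left without sample shops")
    if keep == "universe":
        return Cell(cell.universe, population)
    if keep == "factor":
        return Cell(cell.factor * population, population)
    raise ValueError("keep must be 'universe' or 'factor'")


cell = Cell(universe=600, population=25)                        # factor 24
fixed_universe = change_population(cell, +5, keep="universe")   # factor drops to 20
fixed_factor = change_population(cell, +5, keep="factor")       # universe grows to 720
```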
Thus, a new shop can be included into the extrapolation model with a single mouse click by
accepting the proposals for this shop. Exceptions are atypical shops which are not representative
of as many shops as the extrapolation factor specifies. These shops need special treatment.
Their extrapolation factor has to be smaller than those of the other shops, i.e. atypical shops get
fixed extrapolation factors. An atypical shop can also be assigned to a separate extrapolation cell of
its own. As a rule, however, atypical shops do not appear very often.
Most of the new shops are included automatically into the extrapolation model and the exceptions
are specified explicitly. This procedure limits user interaction to a minimum but still guarantees
that extrapolation models are modified under user control.
A precondition for deciding whether a shop is atypical or not is that new shops are
examined thoroughly. For each new shop a so-called shop profile has to be drawn up with all
relevant characteristics, such as
- shop type,
- total turnover or turnover class,
- assortment structure / composition of the carried product sectors,
- turnover per product sector,
- membership in a buying group,
- membership in a franchise organisation,
- outlet of a key account,
- headquarter with subsidiaries,
- percentage of retailer and wholesaler turnover,
- percentage of selling turnover and turnover with installation, service and repairs,
- percentage of e-commerce turnover,
- sales area.
With the help of these data it should be possible to decide whether the shop is
atypical or not and, if not, which extrapolation cell it has to be assigned to.
In the normal case the decision on whether and how to include a new shop into the extrapolation
model is made before data of this shop are available for the first time. But it happens that the
delivered data of new (and also some old) shops are not yet usable. For these cases the system is
able to re-compute automatically the cells according to the cell population after the amendments in
the FactTool are made.
For example, if the extrapolation model expects and orders from IDAS data for 10 shops for a cell
with a universe of 70 outlets, resulting in an extrapolation factor of 7 per shop, and only 5 of the
data deliveries arrived or are usable, some of the missing data deliveries may be compensated in
the FactTool, e.g. by copying the data from the previous period. Assuming in this example that this
is done for 2 of the 5 missing shops this function would re-calculate the extrapolation factors based
on 7 shops, resulting in a shop factor of 10. Of course, this is based on the assumption that there
are no shops with a fixed factor in this cell.
4.6.2. Leavings
If a retailer stops the data delivery or GfK stops the co-operation with a retailer, the missing data
can be compensated by modification of the other extrapolation factors. This method is useful in
cases of large cell populations where the missing shop can be set aside.
In extrapolation cells with a fixed universe the factors would change automatically, i.e. they would
increase. For example, if a shop is missing in an extrapolation cell with a fixed universe of 600
shops and a population of 25 shops, the extrapolation factor per shop will increase from 24 to 25.
If the extrapolation factors remain unchanged, the missing shop decreases the cell universe. For
example, if a shop is missing in an extrapolation cell with a population of 25 shops and a fixed
extrapolation factor of 24, the universe decreases from 600 to 576.
It is also possible to copy the data of the previous period, but this cannot be done for many periods,
because the data then becomes too old, particularly for features and models. This method only
makes sense, if a new shop as a substitute for the lost shop is expected soon.
If the cell population is small, the reporting quality and consistency are put at risk. Incorrect data
and extreme values, such as special offers of products with large sales units or products with an
above-average price and only few sales units, will be multiplied by an increased extrapolation
factor. In this case a new shop must be found as a substitute for the missing one as fast as possible.
If this is not possible, it has to be considered whether the corresponding extrapolation cell can be
merged with another one. It is also possible to create dummy outlets in this cell. For each
dummy outlet a shop number is then defined, and the complete data of another really existing shop
are copied permanently into this shop number. This procedure is carried out until the cell
population has been enlarged.
4.7. Proceeding in case of problems with the data
4.7.1. Missing delivery periods
In most of the delivery periods the data of one or more retailers are missing because the data
transfer could not be carried out. There are various reasons for that:
- There are company holidays and no contact person can be reached.
- The responsible person is ill or on holiday etc. and another person is not able to transmit
  the data.
- There is some trouble with the electronic data transfer or the merchandise management
  system itself, which cannot be solved in time.
- There is some trouble with the update, which cannot be solved in time.
- A new merchandise management system is being installed at the time.

In these cases the data of the corresponding retailer and delivery period are lost. It then has to be
decided how to proceed, i.e. whether
- the data of the previous delivery period can be duplicated, perhaps with a weighting,
- the data of two or more previous periods can be taken into account,
- the period can be compensated by the average of the previous and the following period (if
  possible),
- the data of another retailer (or other retailers) should be applied alternatively, or
- the extrapolation factors of the other outlets in the corresponding extrapolation cells should be
  modified correspondingly (as described in 4.6.2.).
The smaller the turnover of this retailer, the smaller this problem is. For a retailer who is relevant
for the corresponding distribution channel, problems with the reporting quality may appear.
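Three of the substitution options listed above can be sketched for a single figure such as the sales units of one article. This is an illustration only and does not describe the FactTool; the function `impute_missing_period`, its method names and the default of two previous periods are assumptions.

```python
def impute_missing_period(history, following=None, method="copy", weight=1.0, periods=2):
    """Estimate the value of a missing delivery period.

    `history` holds the values of the previous periods, oldest first.
    """
    if not history:
        raise ValueError("at least one previous period is required")
    if method == "copy":
        # Duplicate the previous period, optionally with a weighting.
        return history[-1] * weight
    if method == "previous_mean":
        # Average of the last `periods` previous periods.
        recent = history[-periods:]
        return sum(recent) / len(recent)
    if method == "neighbours":
        # Average of the previous and the following period, if available.
        if following is None:
            raise ValueError("the following period is not available yet")
        return (history[-1] + following) / 2
    raise ValueError(f"unknown method: {method}")


sales_units = [100, 120]  # two previous periods
copied = impute_missing_period(sales_units)
weighted = impute_missing_period(sales_units, weight=0.9)
averaged = impute_missing_period(sales_units, method="previous_mean")
bridged = impute_missing_period(sales_units, following=140, method="neighbours")
```

Which option is appropriate depends on the importance of the retailer and on how quickly the assortment changes; the sketch does not make that decision.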
As to this problem there is another document on the StarTrack platform:
2. Manuals
4. DWH DataWarehouse
DWH FactTool (English)
4.7.2. Late data delivery
If the data of a retailer have not been transmitted by the agreed latest delivery date,
he will be asked for the reason on the next day and a new delivery date will be fixed. The new
delivery date should be set so that the data of this retailer can still be processed in
time. If the data transfer by this new delivery date is not possible for one of the above-mentioned
reasons or for another reason, the data of this retailer for this delivery period
are lost. In this case the procedure is as described in 4.7.1.
4.7.3. Incomplete data
In some cases the data of a retailer are not complete, i.e.
- the data of one or more outlets are missing,
- the data of one or more product groups are missing,
- facts such as stocks or sales prices are missing,
- data sets are missing,
- the data do not cover the total delivery period (for instance only 2 weeks instead of a
  month).
The retailer will then be asked to deliver the complete data. If this is not possible, the
further procedure has to be decided.
In the case of missing outlets the data of other outlets of this retailer could be duplicated, but only if
their assortment structure and turnover are similar to those of the missing outlets. Otherwise it is
better to duplicate the data of the previous delivery period, perhaps with a weighting. If the sample
in the extrapolation cells of the missing outlets is large enough and the other outlets are not too
different, the extrapolation factors of the other outlets could be modified correspondingly (as
described in 4.6.2.).
In the case of missing facts the calculation rules may help to compute substitute figures. If the data
do not cover the total delivery period, one can try to weight the available data.
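One simple way to weight data that cover only part of the delivery period is a proportional scale-up. This is an assumption for illustration, not a rule stated in this handbook: the hypothetical function `scale_to_full_period` presumes that sales are spread evenly over the period, and it is only meaningful for flow figures such as sales units or turnover, not for stocks or prices.

```python
def scale_to_full_period(value, days_covered, days_in_period):
    """Scale a flow figure from the covered days up to the full delivery period."""
    if not 0 < days_covered <= days_in_period:
        raise ValueError("days_covered must lie between 1 and days_in_period")
    return value * days_in_period / days_covered


# 700 sales units delivered for 2 weeks of a 4-week delivery period.
estimated_units = scale_to_full_period(700, days_covered=14, days_in_period=28)
```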
4.7.4. Incorrect data
Sometimes there are obvious mistakes in the data of a retailer, e.g.
- wrong sales prices,
- negative figures,
- differences between the delivered turnover and the product of sales units and prices,
- wrong article numbers (posting errors),
- wrong assignment of products to product groups,
- unrealistic sales units, purchase units, stocks or sales value.
The first step in these cases is to ask the retailer for correct data. If he is not able to deliver correct
data in time, the calculation rules have to be applied, if possible. Wrong sales
prices can be corrected by the price check algorithm under certain circumstances. Posting errors
and a wrong assignment of products to product groups can only be corrected manually. Here it is
the task of the retailer to put things straight.
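Two of the checks named above, negative figures and a mismatch between the delivered turnover and the product of sales units and price, can be sketched as follows. The function `check_record`, the field names and the tolerance of 1 % are hypothetical; the sketch does not reproduce the calculation rules or the price check algorithm mentioned in the text.

```python
def check_record(record, tolerance=0.01):
    """Return a list of plausibility issues found in one delivered data set."""
    issues = []
    # Negative figures in any of the delivered facts.
    for field in ("sales_units", "price", "turnover", "stock"):
        if record.get(field, 0) < 0:
            issues.append(f"negative {field}")
    # Delivered turnover versus sales units times price.
    expected = record["sales_units"] * record["price"]
    if abs(record["turnover"] - expected) > tolerance * max(abs(expected), 1.0):
        issues.append("turnover mismatch")
    return issues


ok = check_record({"sales_units": 10, "price": 5.0, "turnover": 50.0, "stock": 3})
mismatch = check_record({"sales_units": 10, "price": 5.0, "turnover": 80.0, "stock": 3})
negative = check_record({"sales_units": -2, "price": 5.0, "turnover": -10.0, "stock": 3})
```

A record flagged in this way would still have to be clarified with the retailer, as described above.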
4.7.5. Extreme changes in the data
Data are frequently transmitted which show extreme changes in comparison with other delivery
periods, e.g.
- the number of data sets jumps up or decreases extremely,
- the sales units increase or decrease extremely,
- the stocks are upsized or downsized.

The retailer then has to be asked whether the data are correct or not. In some cases the data really
are correct and there are reasons for the changes, e.g.
- the assortment was changed,
- purchasing was reduced because one or more relevant manufacturers could not
  provide the retailer with goods,
- the shop was enlarged or downsized,
- the shop was closed because of reconstruction, renovation or company holidays,
- a new and larger shop opened nearby and affected the sales of the corresponding shop,
- the best salesman was ill or on holiday.
If the data are extreme but correct, they have to be processed. If the changes in the data
are due to mistakes, the retailer will be asked for correct data. If he is not able to deliver correct
data in time, the calculation rules have to be applied.
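A first screening for extreme changes could compare each figure with the previous delivery period. The function `flag_extreme_changes` and the threshold of 50 % are assumptions for this sketch; the handbook does not define when a change counts as extreme.

```python
def flag_extreme_changes(previous, current, threshold=0.5):
    """Return the relative change of every figure that moved by more than `threshold`."""
    flagged = {}
    for key, old in previous.items():
        new = current.get(key)
        if new is None or old == 0:
            continue  # no comparison possible
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[key] = change
    return flagged


previous_period = {"data_sets": 1000, "sales_units": 400, "stock": 250}
current_period = {"data_sets": 1020, "sales_units": 900, "stock": 100}
flagged = flag_extreme_changes(previous_period, current_period)
```

Flagged figures are only candidates for a query to the retailer; as the text notes, extreme data can be correct.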
4.7.6. Irreproducible changes in the data
Changes often appear in the data which cannot be traced, such as
- the delivery of new, so far unknown outlets of a retailer,
- the delivery of new, so far unknown product groups,
- modifications of the assignment within the product groups and product categories,
- modifications of the assignment of the outlet numbers,
- the pooling of product groups.
In all these cases the only way to get the needed information is to ask the retailer.
4.8. Test of the extrapolated data
In each case the extrapolated data have to be tested in order to be sure that the specified
extrapolation is correct and the required accuracy has been achieved. If there is a definite shortfall,
the chosen method cannot be used because there would be too many inaccuracies. In this case the
extrapolation or even the sample design has to be modified.
4.9. Coverage
The retail panel is not in a position to represent the total market. Small and very small businesses
are not included, since they are not economic to survey. For example electrical businesses with an
annual turnover of less than 50.000 do not belong to the electrical specialists according to GfK
definition and are excluded. There are also large companies which refuse data delivery because of
their data confidentiality. If these companies belong to a distribution channel with a sufficient
number of similar outlets in the sample, the extrapolation can take place with corresponding
extrapolation factors and 100 % coverage in this channel, the more so as it is not necessary that
outlets of all companies in a distribution channel are in the sample. If such a company dominates a
distribution channel, as a rule this channel will not be audited.
But in each case of extrapolation the coverage is 100 % with respect to the extrapolated
distribution channel or the extrapolated retail company. The aggregation of these channels results
in the panel market.
Where it makes sense, the gap between the panel market and the total market can partly be
filled with data of the household panel. Here data are collected on purchases by households in
distribution channels which are not covered by the retail panel, for example electrical businesses
with an annual turnover of less than 50.000, petrol stations, kiosks etc.
4.10. Representativeness of the extrapolation with respect to the universe
It is a widespread misconception that the sample has to be representative of the universe. A sample
representative of the universe is the exceptional case. In order to estimate
universes with heterogeneous outlets, which is mostly the case for retail channels, a sample is
needed which is appropriate for a representative extrapolation. In most cases there will be a
disproportional sample structure which depends on the parameter values of the outlets, i.e.
turnover, sales area etc.
4.11. Maintenance and updating of extrapolations
The retail scene changes over time and these changes have to be considered in the
extrapolation (as described in 4.6.). This means that if there is a change in the universe, in the sample,
in the MDM features etc., the extrapolation has to be readjusted. This is done by setting up a
new extrapolation version for the corresponding country channel. The changes are not carried out
automatically by the system; the user has to enter them.
As to the setting up of extrapolations there are two more documents on the StarTrack platform:
2. Manuals
4. DWH DataWarehouse
DWH Extrapolation (English)
2. Manuals
4. DWH DataWarehouse
How-To-Do-Documents
How-To-Do DWH Extrapolation
