
WHITE PAPER

FROM BIG DATA TO BIG BUSINE$$

PAPER 1: Fad or Performance Enhancer?

Table of Contents

CHAPTER 01
Big Data, background and principles
    Why this paper?
    A shift in management thinking
    Big Data definition
    Disruption or evolution?
    Big Data history
    Form follows function
    Processing weak signals
    Big Data projects

CHAPTER 02
Data
    Big Data's 3Vs
    Beyond the 3Vs: the 5Vs
    Beyond the 5Vs: the 3Ps
    Data, information, knowledge and wisdom: what is the difference, what is their value?
    The accumulation of data seemed meaningless but use has changed all that.

CHAPTER 03
The new uses created by Big Data
    A closer look at existing, untapped information repositories
    Proliferation of raw, internal data available
    Proliferation of public, external and purchasable data
    Open Data
    Monetizing data
    Full-scale data cross-referencing

CHAPTER 04
Architectures and algorithms
    Big Data hardware architecture specifications
    Big Data software architecture specifications
    Big Data database specifications
    No ready-made database
    Many tools, each specialized in one field
    Big Data architecture, specific to Big Data?

CHAPTER 05
The Big Data jobs
    The return of EIM (Enterprise Information Management)
    How to get started?
    Valuing data by dedicating it to business
    The new Big Data jobs
    From punishment to career prospect
    The relationship between MDM and Big Data

CHAPTER 06
Big Data or Big Brother?
    Heads: the hope for a dynamic sector that will boost the whole economy
    Tails: the privacy debate

CHAPTER 07
How to convert Big Data into Big Busine$$
    What to remember about Big Data?
    The 10 key points

CHAPTER 01
Big Data, background
and principles

Why this paper?

Big Data literature abounds; this is a sure sign that Big Data's importance is strongly felt by the market as a whole and across the world. However, even where the documentation is of high quality, it is usually descriptive in nature and focused on exaggerated, near-apocalyptic concerns associated with the exponential growth in data and data sources. These approaches do not help readers understand the real challenges posed by Big Data or how companies can exploit them.

Even though, at this point, it is hard to predict what the future holds, we are convinced that Big Data [1] will have a heavy impact on companies and civil society in a growing number of ways. By rapidly cutting their teeth on Big Data, which is still in its infancy, businesses will not only master the phenomenon, but also learn how to make the most of it.

The aim of this white paper is to provide companies with key insights that will help them approach Big Data, not as a mythology, but as a powerful performance optimization tool that can be adapted to their specific contexts.

We hope that this will help readers lay, or maybe even validate, the foundations for a smooth, controlled integration of Big Data into their company's ecosystem.

[1] Especially since Big Data involves new forms of reasoning, such as forms of inductive reasoning (see page 8). We can quite safely refer to Big Data as a new philosophy and, as a whole, a revolutionary approach to marketing.
A shift in management thinking

The Big Data evolution is extremely important to companies. More than a simple trend, it is a revolution in thinking, a crucial contribution to the management arsenal that will fundamentally change the world of business. Its obvious impact on marketing should not obscure what a perfect command of the subject can bring to other fields such as analysis, management, production, supply chain management and R&D, to name but a few.

This major revolution in thinking is, however, ill-served by an abundant literature that tends either to censor the new field's overly technical vocabulary in order to conceal its complexity, sometimes to the very detriment of comprehension; or, on the contrary, to go too deeply into this complexity and its four even more technical components (business function, technology, databases and statistics) and lose the reader. The risk then is that readers wrongly conclude that Big Data is either too generic to constitute a real innovation or too innovative to be usefully applied to day-to-day business.

As a result, Big Data [2] is perceived as a problem by companies when it is in fact a solution. The aim of this paper, the first of a series of 6, is to describe the impact and uses of Big Data in a simple and clear manner. Instead of impoverishing the narrative, we will explain the jargon in order to make the advantages of these new tools accessible to all.

This first paper is dedicated to the Big Data phenomenon in general, with each chapter detailing aspects of the subject a little further. The subjects dealt with in the subsequent papers are:

- Paper 2 on data, Big Data's essential fuel
- Paper 3 on Big Data uses
- Paper 4 on Big Data architectures & algorithms
- Paper 5 on Big Data professions
- Paper 6 on data confidentiality, user protection and ethics

We will provide both a global (in this paper) and detailed (in subsequent papers) view of Big Data in order to make it accessible to all professionals who want their business to benefit from this new approach and its tools.

Big Data [3] definition

The term Big Data refers to a new discipline that is at the crossroads of several others: statistics, technology, databases and business functions (marketing, finance, HR, etc.).

This new discipline owes its existence to technological power, which has made possible things that until now remained in the realm of the theoretical. These things are mainly associated with two challenges: data volume and data complexity.

[2] For more information go to "Tendances" Trends.be
[3] Although both the singular and the plural are often used to qualify data, Big Data is mostly referred to as a mass noun, i.e. a singular phenomenon of mass information. Consequently, Big Data will always be referred to as a mass noun in the singular in this document.

The objective of Big Data is to tap into exponentially increasing volumes of data that have become nearly impossible to process using traditional database management and information management tools [4], and to handle complex data in a timely manner.

According to the works and words of the 451 group and Gartner, the aim of Big Data is to achieve competitive advantage through data collection, analysis and use methods that, until now, could not be used because of the economic, functional or technical constraints associated with the volumes, processing velocity and variety of data involved.

Disruption or evolution?

Big Data is sometimes presented as a disruptive phenomenon that challenges the very foundation of everything done in the past in terms of decision support or, on the contrary, simply as the next evolutionary step in business intelligence organizations and systems. Though this may seem like a mere debate over semantics, it is not. Depending on their opinion on the matter, companies will put very different scenarios in place.

Big Data history

Big Data's history clearly highlights the field's specific nature. Although the term Big Data was first used by analyst firm Gartner in 2008, Big Data's origins can be traced much further back. In a way, it is concurrent with the rise of information technology; the concept, like all other innovations, simply took time to become widespread and to refine itself.

Gil Press sets this new discipline's first appearance in an even more distant past [5] (1944). But without going that far, and even if somewhat technical discussions persist on who actually coined the term Big Data [6], we can trace the earliest documentation on the famous 3Vs (Volume, Velocity and Variety), predicting exploding amounts of data and the creation of a new form of data processing, back to the beginning of 2001, according to analysis firm Gartner.

It should also be mentioned that Big Data is the culmination of the Data Mining approach, popular during the years 1995-2000, which was itself born out of the association of two relatively old schools of thought, i.e. statistics and artificial intelligence.

Form follows function

Regarding the debate on the discipline's origins, the excellent O'Reilly Big Data glossary could help reach a consensus, as it focuses on the e-merchants and other collaborative web players who actually were part of the phenomenon rather than on the analysts who simply described it.

In other words, it would seem that, much like O'Reilly's famous Web 2.0 in 2004, the phenomenon's descriptive term appeared after the phenomenon itself, as is often the case in the digital sphere.

Cloud Computing, for example, stemmed from the excess hosting capacity of major websites (Amazon, Google, e-Bay, Microsoft, etc.), a result of the original architectures they had set up to support visitor traffic as well as meet hosting elasticity needs. Similarly, Big Data materialized from the overdose of data generated by Internet users' activities on sites like Amazon, Yahoo! and, much later, Google. The main players have not changed, which has led to unforeseen, innovative marketing applications [7].

A mechanism has been made available to influential companies. They can now invest in technologies in order to find ways to satisfy user needs in an extremely short period of time. The Internet bubble made it possible to experiment in real time and with unlimited resources: the ultimate researcher's dream!

Logic has been overthrown: data comes before use, business initiative before research; function, in a sense, precedes form. The uses created by these technological advances are numerous, as described in the chapter on uses (see Chapter 3: page 5).

Before thinking about whether Big Data is a disruption or an evolution, let us present the new developments triggered by the phenomenon in the following 4 categories: data, uses, work methods and tools.

Processing weak signals (P. Cahen)

The fact that Big Data helps take advantage of weak signals does not mean that weak signal detection is a new subject. Philippe Cahen, author of "Uncertainty Marketing", explains how weak signals impact marketing and why they are so important.

What is a weak signal? Philippe Cahen gives the following definition in his book "Everything there is to know about uncertainty marketing":

A weak signal is a lateral thought-provoking piece of information [...]. A weak signal is not a small fact that gives information about the future. It would be too easy, somewhat naive, to think that information about the future would be so freely available. It is how you mentally approach it that helps you decipher the future. Intuition plays a major role in detecting and interpreting weak signals.

The future is a leap into the unknown, an unknown that can be either reassuringly friendly or so unbearable that all we want to do is run away. Actually living life the way it was planned is rare. Usually, it is quite the opposite.

Big Data fuels the thinking process and actions associated with weak signals; it helps derive hypotheses that can then be tested by cross-referencing data and behaviors.

Source: Marketing Uncertain by Philippe Cahen, Kawa publishing, 2012

[4] Please see also the complete definition in Wikipedia's open encyclopedia, which inspired ours: http://en.wikipedia.org/wiki/Big_data
[5] http://www.forbes.com/sites/gilpress/2013/05/09/a-very-
[6] See article on the New York Times blog: http://bits.blogs.nytimes.com/2013/02/01/the-origins-
[7] Lise Gasnier on Solucom Insight

Big Data projects

Big Data projects typically have four components.

First, they involve Big Data technologies, both hardware and software. Second, they require a specific methodological approach that will be briefly mentioned in this document and further detailed in following publications.

The third component is a legal one, since a perfect command of the legal framework associated with the data handled and its intended uses is important.

Last but not least, every Big Data project includes a social component that must be taken into consideration. How willing are our societies, and each group of people or individuals, to accept the circulation and use of their personal data? To avoid exposing one's project and the whole area of application to risks, it will be up to companies to self-regulate and to legislators to adapt to these new technology-driven contexts and possibilities.

CHAPTER 02
Data

As its name suggests, data is well and truly the foundation and raw material of the Big Data phenomenon; it is therefore the only natural starting point.

In a matter of years, structured data, managed by traditional IT applications (ERP, CRM, SCM, etc.), was joined by a huge volume of other data, often referred to as unstructured or semi-structured data [8]. Examples of this type of unstructured data include:

- Electronic mail (e-mails and an increasing amount of instant messages), the digitization of all contractual documents, data entry and digital footprints on websites, conversations with call centers

- Mobility data: IDs (IMEI identifiers, SIM cards, ULDs, etc.), browser histories, geolocations and even user preferences

- The exponentially increasing data generated by connected objects: smart machines, cars and meters, set-top boxes (operators' Internet gateways, cable operators' decoders, etc.), sensors, data generated by home automation systems and personal biometric systems

- And finally, data created and exchanged outside of traditional business communication channels, through the social Web

Data is considered unstructured when it requires a complex transformation before delivering meaning.

Whether you are working with an image or a sound, a sentiment or a text in a given language, a geolocation or sensor information, it is easy to understand that real-time processing will require powerful algorithms.

[8] Examples of semi-structured data: messages, mails, logs, etc.; and of unstructured data: photos, videos, sounds.

These new types of data can be used to enrich other data but they can also, in some cases, constitute the very heart of the information to process. This will depend on the particular sector and the impacted process, as will be discussed later in the chapter on Big Data uses.

Since it is clear that the new data will have to be linked to existing repositories and data at some point, companies should, before jumping on the Big Data bandwagon, ensure that the bulk of their traditional data is well structured and processed, according to application subject: transactional data, receipts, campaign data, browsing data, sensor, probe, measuring tool and statistical analysis tool data, or visit, meter and notification data, etc.

We should point out that Big Data is not meant to systematically process all the data available in an area. Trying to do so would be counterproductive and would lead to risky, extremely complex and, to be honest, pointless projects.

In the past, we have often failed to keep in mind the age-old adage that trees do not grow to the sky. In spite of the current enthusiasm, assuming that there are no constraints or limits would be unreasonable.

So, if we take as a basic principle that we will not systematically process all of a given area's data, but rather focus on and process the data we need in a logical manner, the next question is: where to begin?

Is there a priority when processing data; should one or more sources of data be given higher priority? The most accurate answer to this question will depend partly on the objective one wishes to achieve and partly on the data available.

Let us look at an example of Big Data applied to marketing: how to choose between a marketing message customization and targeting initiative, and a brand e-reputation or awareness measurement or improvement effort? Depending on the answer, the order in which data sources are processed will differ.

In the first case, customers' purchase history will play a major role in the analysis. Indeed, past purchases will provide valuable information on buying behavior (to whoever knows how to analyze them) and also on brand preferences, uses, needs, etc. Next, you will try to factor in brand interaction data, Web navigation data, marketing campaign data and so on.

In the second case, if your aim is the analysis of brand reputation, you will primarily seek to interpret information obtained from social networks, forums and, more generally, anything that is being said on the Web about the brand.

This illustrates why setting an initial goal for every Big Data project is important: it determines the order of priority during data processing.

Field observation has led us to the following conclusion: the less structured the data, the more complex the processing required to transform it into actionable knowledge, notably through an intelligent reformatting process. The first step is to examine available structured data to ensure that it is being properly used. Then, it can be gradually enriched with unstructured data and associated intelligent algorithms.

It should be noted that it is not really the volume of data available that will indicate how complex processing will need to be. Richness, degree of reliability, and structure (or lack thereof) will be the more determining factors.

Big Data's 3Vs

These new data sources are characterized by what is commonly known as the 3Vs [9]:

V for Volume: with a growth rate of 50% a year, the volume of data available is growing exponentially. Since data cross-referencing is the key to determining the relevance of generated information, this means that the amount of data to handle is exploding.

V for Variety: the various format types (text, photo, video, sound, technical log, etc.) must now be added to a wide variety of internal and external providers, whether objects or people. Variety also encompasses all the possible uses associated with raw data. A sound file generated by a call center can be used to create a text file [speech-to-text application] or for voice sampling and, later, vocal recognition.

V for Velocity: given the rapid obsolescence of some real-time and social media data (behavioral or user opinion data [10]), their integration with other data to generate timely insight must be done as quickly as possible.

We would like to make the following clear: there are two major types of Big Data projects. One processes data in real time, and the other operates without this constraint. These two types of project involve different approaches, technical architectures, tools and data.

It seems evident that a real-time product recommendation project for an e-commerce website cannot have the same objectives and resource requirements as an in-store buying behavior analysis project.

To fully grasp the nature of these issues, we will adopt a two-step approach: testing and standardization.

Testing or "build" corresponds to the use case validation phase during which data is specified, formatted and analyzed for the first time by a Data Scientist [11]. Based on this initial analysis, several predictive models will be developed with a view to implementation and model automation.

In this phase, we recommend that, instead of investing in an architecture, you opt for a platform as a service, which will adapt to requirements as testing progresses, i.e. as needs evolve.

[9] For more information on the originator of the 3V-concept and the numerous people who have laid claim to its invention, please see Doug Laney's article "Deja VVVu: Others Claiming Gartner's Construct for Big Data": http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data
[10] Known as sentiment analysis.
[11] See chapter on the new Big Data jobs on page 21 for more information on this new profession.
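
To make the figure quoted under "V for Volume" concrete: a constant growth rate of 50% a year compounds, so the volume after t years is the starting volume multiplied by 1.5 to the power t. The short Python sketch below works through that arithmetic. The starting volume of 100 TB and the function names are illustrative assumptions, not figures from this paper.

```python
import math

ANNUAL_GROWTH = 0.50  # the 50% yearly growth rate quoted for Volume


def projected_volume(initial_volume, years, growth=ANNUAL_GROWTH):
    """Volume after `years` of compound growth at a constant yearly rate."""
    return initial_volume * (1.0 + growth) ** years


def doubling_time(growth=ANNUAL_GROWTH):
    """Number of years needed for the volume to double at a constant rate."""
    return math.log(2.0) / math.log(1.0 + growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, in terabytes
    for year in range(0, 6):
        print(f"year {year}: {projected_volume(start_tb, year):.1f} TB")
    print(f"doubling time: {doubling_time():.2f} years")
```

At this rate the volume doubles roughly every 1.7 years and is multiplied by about 7.6 over five years, which is what "growing exponentially" means in practice.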

The predictive model challenge is a complex one and we will tackle the subject in further detail in following chapters. At this stage, we should simply note that there are two types of models: self-learning models using artificial intelligence algorithms and traditional predictive models based on statistical algorithms.

Although self-learning models require less preparation time and initial data analysis, they do not remove the need for a Data Scientist or Data Miner.

As far as we are concerned, the testing phase is a key element of Big Data implementation methodology. This is notably due to the maturity level of its 4 previously mentioned components: business function, technology, algorithm, and data.

Once the approach is stable, we can launch the next phase, i.e. standardization or "run", during which Big Data models are used in an automated format that is linked to the IS (company repositories and data) in a seamless and scalable manner.

When the build phase, which represents 80% of the work effort, is set in a real-time environment, we work with samples and still need to collaborate with one or more Data Scientists to process the data offline ("cold") using one or several Data Mining software technologies (regardless of type).

When the real-time dimension is absent, testing can be done in environments that are almost identical to real life.

The standardization or "run" phase also requires some fine-tuning between real-time and delayed data exploitation. A decision will have to be made regarding what to process interactively with user data in real-time mode, and what to put through a delayed calculation process in batch [12] mode.

Processing cannot be done entirely in real time since, usually, responses must be immediate and there is no time to search a whole database for a more comprehensive analysis.

However, hot data can be processed in real time if the operation is sufficiently backed up by aggregates and results obtained in earlier batch calculations. This is known as predictive model adaptation to real time.

Cases that truly require full real-time processing are actually extremely rare.

Beyond the 3Vs: the 5Vs

The classic characterization has now been enriched with 2 more Vs which we think are of importance:

V for Veracity: data obtained from the IS's central applications is limited in volume but controlled in terms of coherence and quality. Public data associated with opinion, expression or behavior, on the other hand, even though abundant, could have been filtered or distorted. Its use therefore hinges on the ability to neutralize these weaknesses without modifying the original information. Managing data veracity criteria is an integral part of Big Data projects. Data reliability has become an essential criterion, as the GIGO [13] principle applies more than ever to Big Data. So much so, in fact, that the term "Right Data" has now popped up in opposition to Big Data, which is too "Big" and not "Right" enough.

V for Value: even though assessing the value of a data item on the spot is difficult, it makes sense to strive to integrate data sources that are likely to generate information whose added value is proven. Beware of falling into a restrictive pattern: a data source that is internally useless can have a monetizable value for a partner. Another data source can seem to have no value but, when associated with others, can generate a discriminating signal.

Beyond the 5Vs, the 3Ps

Above and beyond this 3V-approach now extended to 5, which is useful but extremely descriptive, we can shed additional light on the subject of Big Data and its destination. We have named the 3Ps: Prediction, Personalization and Prevention, which highlight, in a unique way, the role played by Big Data in some particularly relevant use cases.

[Figure 1 labels: Uses (Prediction, Prevention, Personalization); Knowledge; Data (Volume, Velocity, Variety, Veracity, Value)]

Figure 1. Big Data is about more than just the very descriptive 3Vs. Two more Vs can be added to qualify data and, more importantly, there are the 3Ps that describe where Big Data is heading.

[12] In contrast to real-time mode, batch processing is the processing of data that has been previously transferred to a storage space.
[13] GIGO: Garbage In, Garbage Out. For more information visit: http://en.wikipedia.org/wiki/Garbage_in,_garbage_out
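
Several of the use cases that follow rely on the split, described earlier in this chapter for the run phase, between delayed batch calculations and real-time processing of hot data. The Python sketch below is one minimal way to picture it: a batch job precomputes per-customer aggregates from the full history, and the real-time path only combines those stored aggregates with the incoming event, so it never scans the whole database. The event fields, the `batch_aggregate` and `score_hot_event` names and the scoring rule are all hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def batch_aggregate(purchase_history):
    """Delayed (batch) step: scan the full history once and store,
    per customer, the number of purchases and the total amount spent."""
    aggregates = defaultdict(lambda: {"count": 0, "total": 0.0})
    for event in purchase_history:
        entry = aggregates[event["customer_id"]]
        entry["count"] += 1
        entry["total"] += event["amount"]
    return dict(aggregates)


def score_hot_event(event, aggregates):
    """Real-time step: compare one incoming event with the customer's
    precomputed average basket. Only a dictionary lookup is needed.
    Returns amount / average basket, or None for an unknown customer."""
    entry = aggregates.get(event["customer_id"])
    if entry is None or entry["total"] == 0:
        return None
    average_basket = entry["total"] / entry["count"]
    return event["amount"] / average_basket


if __name__ == "__main__":
    history = [
        {"customer_id": "c1", "amount": 20.0},
        {"customer_id": "c1", "amount": 40.0},
        {"customer_id": "c2", "amount": 10.0},
    ]
    aggregates = batch_aggregate(history)        # e.g. run as a nightly batch
    hot = {"customer_id": "c1", "amount": 60.0}  # arrives in real time
    print(score_hot_event(hot, aggregates))      # 60 / 30 = 2.0
```

The design choice is the one the text describes: the expensive pass over the data happens ahead of time, and the immediate response depends only on results that already exist.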

P for Prediction
With a view to anticipation, a big part of Big Data uses is dedicated to prediction. How can data be used to anticipate better? How can we acquire sufficient knowledge to forecast demand and predict problems, behaviors, tastes, etc.? Going too far would of course raise ethical issues, as will be discussed later. However, being excessively cautious could cause the company to lose market share to competitors.

For example, Big Data helps better understand customers and their expectations by cross-referencing data from Business Intelligence or analytical CRM (which describes the customer's behavior on traditional channels) with browsing data (which describes the customer's behavior on Web and mobile digital channels) and data captured on social networks. So, through Big Data, I can collect, group and reconcile all captured data, apply analysis and predictive models to them and obtain, for each customer, a conversion and churn rate of unrivalled precision as well as relevant and personalized recommendations.

P for Personalization
The second category of uses focuses on the ability to personalize, at the most detailed level, the interface provided by the solution. Beyond the predictive component, it is more about in-depth knowledge of the environment that can be used to configure the whole system so that it suits a specific group of people or even a single individual. Let us take another example. Big Data makes it possible to personalize a website completely according to the customer or the connection context. As a result, the very moment a person logs on, the system understands, with a varying degree of certainty, who is there and, very quickly, why the person is there, again with an allowed element of uncertainty. Based on this information, the page automatically builds itself, content is adapted and processes differentiate themselves. The system analyzes interactions with the customer in real time and cross-references them with organized information from the base (see above). The solution then uses these analyses and interactions to output real-time personalization or a customized offering.

P for Prevention
The third field of application is Prevention. The point is to use Big Data to identify risks and dangers and prevent them. Above and beyond the predictive notion, the goal is to define what is considered a risk or a potential danger. Let us consider a few examples. Big Data helps identify fraudulent behaviors and apply, in real time, an adapted processing strategy. In such a case, the Big Data solution will apply analysis and predictive models to collected data to define detailed, potentially fraudulent behavior patterns. Another solution would be to set up a rules engine that will detect, in real time, behaviors that match the defined pattern and trigger an adapted processing workflow. Security is not the only area that can benefit from this mode of operation; it can be extended to applications associated with the health sector and risk prevention, for example.

Data, information, knowledge and wisdom... What is the difference, what is their value?

[Figure 2 labels: Info, Data]

Figure 2. Data is different from information; it must be refined before it can be considered, if possible, as true information.

The difference between data, information, knowledge and wisdom was well established by Russell Ackoff [14], a systems theorist and professor of organizational change; he classified content as interpreted by the human mind in five different categories:

1. Data: which are equivalent to symbols

2. Information: data that are processed to be useful; provides answers to "who", "what", "where" and "when" questions

3. Knowledge: associated with information and the processing of data by the human mind. Knowledge answers the "how" question

4. Understanding: i.e. appreciation of "why"

5. Wisdom: the ultimate step, which is evaluated understanding

[Figure 3 labels. Vertical axis: competitive advantage. Horizontal axis: raw data, processed data; data, information, knowledge, intelligence. BI levels: predefined report (what?), on-demand report / dashboard (where/when/how?), detailed query (problem identification?), notifications (action). Data Mining levels: descriptive analysis (explanation of the phenomenon), forecast (trend simulation), predictive analysis (impact of phenomenon), optimization (improve results).]

Figure 3. Big Data is perfectly adapted to Ackoff's theory on data and information.

[14] Freely inspired by the Bellinger, Castro and Mills article at: http://www.systems-thinking.org/dikw/dikw.htm

Ackoff indicates that the first four cat-


egories relate to the present and the
past; they deal with what is already
known. The fifth category deals with
the future because it incorporates
vision and design. With wisdom, peo-
ple can create the future rather than
just grasp the present and past. But
achieving wisdom isnt easy; people
must move successively through the
other categories.

This sequential and gradual linear ap-


proach to knowledge is challenged by
Big Data, even if it can be argued that
Edgar Morin15, through his introduc-
tion to complex thinking, has already
debunked this deterministic vision of
knowledge. With Big Data, the Meth-
od thought up by Edgar Morin is al-
most within reach.

The accumulation of data seemed meaningless but use has changed all that.

The accumulation of data, notably on the Internet and social networks, has no intrinsic value in the traditional operating mode. Unstructured data is not considered information. Such raw, unrefined, non cross-referenced data cannot be leveraged without a considerable amount of prior work.

The Big Data phenomenon changes the traditional way of looking at data, without actually making Master Data Management (MDM [16]) obsolete, by introducing a notion of imperfection and uncertainty (see text box on uncertainty marketing).

Figure 4. The traditional deductive approach, on top (refinement, cross-referencing, deduction), and the inference method, at the bottom, typical of Big Data processing, where the starting point is a hypothesis that leads to the cross-referencing and association of data to form other hypotheses that end up generating information.

Big Data relies heavily on statistics and artificial intelligence, one of the biggest assets of statistics being that it naturally adapts to the notion of data uncertainty. However, this uncertainty does not mean that it is possible to work with data of too poor quality.

But a certain measure of uncertainty regarding data and results is tolerated. It would then be fitting to characterize Big Data results using a robustness index.

[15] See a summary of Edgar Morin's Method at: http://fr.wikipedia.org/wiki/La_Méthode_(Edgar_Morin)#La_connaissance_de_la_connaissance
[16] http://www.piloter.org/business-intelligence/mdm.htm

Field observation shows that a 95% robustness (i.e. a 95% chance that the result is correct) is extremely satisfactory, and that such data can be deemed reliable. It is essential to take this uncertainty element of data and results into consideration when setting up a Big Data project.

Uncertainty can be measured using statistical tools such as the widely popular p-value.

While raw data in itself is of little importance, by cross-referencing and validating it statistically we can gradually increase its value, not through deduction, but through induction and statistical inference. Without going so far as to speak of artificial intelligence (where the machine decides which data is valid), with Big Data, it is the statistical cross-referencing of data that creates value and meaning. So it would appear that data can lead to information that is counterintuitive.
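As an illustration of how a p-value can quantify the uncertainty of a result, here is a minimal Python sketch. It assumes a scenario that is not in the paper: comparing the click rate of two groups of visitors with a two-proportion z-test (normal approximation). The function name and the figures are hypothetical, and a p-value below 0.05 is only a conventional counterpart of the 95% level mentioned above, not a literal probability that the result is correct.

```python
import math


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for the difference between two observed rates.

    Uses the pooled two-proportion z-test (normal approximation), which is
    an assumption of this sketch rather than a method named in the paper.
    """
    rate_a = hits_a / n_a
    rate_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both groups show 0% or 100%: no measurable difference.
        return 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical figures: 120 clicks out of 1,000 personalized messages
# versus 80 clicks out of 1,000 generic messages.
p_value = two_proportion_p_value(120, 1000, 80, 1000)
is_robust = p_value < 0.05
```

With these made-up figures the difference passes the 0.05 threshold, whereas two identical rates would give a p-value of 1.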

Big Data: driving uncertainty marketing.

Big Data is characterized, amongst other things, by the fact that the data it provides is not always certain. However, the huge volume of information and wide variety of data sources involved can be leveraged downstream to improve on this level of certainty. In other words, weak signals can gradually grow in strength and, once they are strong enough, the data can finally be fed to a real-time system or machine (or used for studies, error identification, customer case studies or opportunity evaluation) and help determine the most appropriate corporate levers or processes to activate.

Read the rest of Patrick Bensabat's interview on "Uncertainty Marketing and Big Data" on Business & Decision's Big Data blog at: www.blog.businessdecision.com

CHAPTER 03
The new uses created by Big Data

Big Data's aim is to search all data internally available in a company or its ecosystem for a means to generate innovative uses and performance levers.

A closer look at existing, untapped information repositories

Traditionally, companies have always given priority to structured data from relational databases, associated with management applications: ERP, CRM, etc. The other types of data, such as office automation data, e-mails, audio recordings, etc., are neither shared nor cross-referenced with structured data to enrich decision support information.

Data use is often limited to one application: today, corporate data is mostly restricted to a single use in the silo in which it was created and is rarely used for a purpose other than the one for which it was generated. For instance:

- Meter data for meter-reading
- Log data for IT operators
- Geolocation data to push marketing offerings

And yet, cross-referencing these data items with other IS data could produce valuable information for a large number of users.

Consider this example: websites generate technical data (logs, tags, etc.) meant to be used for technical administration or website optimization. This data includes information on visitor clicks on these sites; by combining this technical data with customer knowledge data stored in the base, we can quickly derive the following winning equation:

CUSTOMER KNOWLEDGE DATA (PROFILE) = WHO IS MY CUSTOMER?
+
GENERAL BROWSING DATA, THROUGH CLICKS (CONTEXT-SPECIFIC) = WHAT DOES MY CUSTOMER WANT?
=
OFFERING ADAPTED TO CUSTOMER

Figure 5. How to leverage corporate data.

By smartly reconciling these two data sources (at customer level, not later than D+1), we obtain enriched information that can be used to recommend an adapted offering to the customer at the very moment the need is expressed.

Can such a project be implemented without considering Big Data? Admittedly it can, but only at the cost of increased complexity and development expenses, and with the need to adapt features to each new hypothesis that makes rethinking the customer path unavoidable.

It is clear that the breaking down of data silos started with the construction of cross-functional data warehouses. In that sense, Business Intelligence helped prepare for the advent of Big Data. It made companies grasp the importance of grouping data around key repositories like the customer or the product.

The approach itself was of course totally different due to the nature and volume of data involved, and to the underlying solution implementation philosophy. Indeed, for reasons associated with technical and financial constraints, Business Intelligence best practices advocated the organization of data according to intended use. The ultimate purpose had to be understood before structuring the database for optimum operation. But data is evolving at breathtaking speed and, as a result, each new data warehouse upgrade calls the information organization strategy further into question.

Big Data has created a paradigm shift.

Proliferation of raw, internal data available

We mentioned earlier that the emergence of the Internet led to the generation of a huge amount of additional data: visitors' IP addresses, cookie data (or data from its substitute, fingerprinting), technical data on how the site is running (logs), Web statistics data (Web analytics/tags) to keep track of visited pages, clicks and all events, including response tests where customers' reaction to a browsing scenario is evaluated (A/B testing).

More recently, the development of smartphones and mobile Internet further added to the data deluge: call numbers, geographic locations, activity timestamps, etc.

Example 1: How ACCOR used Big Data to boost its sales.
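The customer-level reconciliation described in this chapter (profile data plus browsing data giving an adapted offering) can be sketched as a simple join in Python. Everything named here is hypothetical and chosen for illustration only: the `segment` field, the click categories, the offer catalog and the rule "most clicked category wins" are assumptions, not the method used in the projects the paper describes.

```python
from collections import Counter


def recommend_offers(profiles, clicks, catalog):
    """Cross-reference who the customer is (profile) with what the customer
    wants (browsing clicks) to pick an adapted offering per customer."""
    # Group browsing data at customer level.
    clicks_by_customer = {}
    for customer_id, category in clicks:
        clicks_by_customer.setdefault(customer_id, Counter())[category] += 1

    offers = {}
    for customer_id, profile in profiles.items():
        counts = clicks_by_customer.get(customer_id)
        if not counts:
            continue  # no browsing context for this customer yet
        top_category, _ = counts.most_common(1)[0]
        # The offer depends on both sides of the equation.
        offer = catalog.get((profile["segment"], top_category))
        if offer is not None:
            offers[customer_id] = offer
    return offers


# Hypothetical sample data.
PROFILES = {
    "c1": {"segment": "business"},
    "c2": {"segment": "family"},
    "c3": {"segment": "family"},
}
CLICKS = [
    ("c1", "city-hotel"), ("c1", "city-hotel"), ("c1", "spa"),
    ("c2", "resort"),
    ("c9", "spa"),  # visitor with no known profile
]
CATALOG = {
    ("business", "city-hotel"): "weekday city package",
    ("family", "resort"): "kids-stay-free resort week",
}

offers = recommend_offers(PROFILES, CLICKS, CATALOG)
```

Customers without clicks, and clicks without a known profile, simply produce no recommendation here; a real system would need an explicit rule for both cases.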

How to build loyalty to increase hotels' occupancy rate

Accor is an international hospitality Group present in 92 countries. It manages 3,500 hotels, operating under 15 different brand names, and some 230 million people visit its websites every year; the company's business club for regular customers boasts 10 million members.

The Challenge: Self-directed customers with extremely personal and complex paths

50% of Accor's activity is carried out directly, through its central distribution channels. This activity is based on a wide range of offerings whose growth is driven by online travel agencies.

The tourism market is now a fully digitized one: people shop around, compare and choose whatever suits them best, following a completely self-directed digital path. Each customer lives a unique experience which involves several channels, and this experience is what determines whether a customer will return to a given hotel or not.

Corporate sales performance thus depends on how well the company knows its customers and their behavior, and on its consequent ability to recommend the offerings most adapted to their wishes, interests and past experience. The aim of this approach is to significantly increase the efficiency of marketing systems as regards offering content and timeliness.

To do so, the hotel group had to acquire deep, cross-channel customer knowledge.

This information helped the company recommend offerings in real time, based on customer experience and preferences, and for all of Accor Group's brands. Any customer insight gained is also shared with all customer relationship players in order to better meet expectations.

This was made possible by implementing, in just 9 months, a 1 to 1 CRM tool that builds a knowledge base for each customer, and linking this tool to a real-time marketing solution to optimize digital channel and call center sales, both locally in hotels and centrally for all of Accor Group's brands and chains.

Customer information is enriched with data generated by the real-time offering recommendation tool. Accor is gradually deploying this solution to all of the Group's units.

From Big Data to Big Business

Performance indicators have helped validate the results and ROI of this 1 to 1 marketing solution.

- Thanks to this personalized marketing system, Accor makes 1,200,000 personalized product recommendations per day

- The number of contacts in the customer database has increased from 20 to 35 million
- The click rate on invitations advertised on the website's pages has doubled thanks to message personalization
- Click rates, conversion rates, customer lifecycle data and many other indicators are measured and recorded in dashboards

This marketing-oriented Big Data project has helped launch a true 1 to 1 marketing offering on a mass market by combining software intelligence with human action.

This is one of the most ambitious projects undertaken in Europe in the field of tourism, a textbook case when it comes to digital transformation. And it was carried out by Business & Decision.

Since the use of SIM cards, Bluetooth connections or any other protocol is not restricted to individuals, the Internet of Things will also generate a considerable amount of data in many sectors: automotive, health, energy supply, etc.

According to Michel Lévy-Provençal, the estimated number of connected objects in the world (mainly computers, telephones and tablets) is 5 billion; a number that is expected to rise to 15 billion in 2015 and 50 billion in 2020 [17]. We should also mention the contactless payment surge and the arrival of iBeacon that will further boost the trend.

Proliferation of public, external and purchasable data

Social networks (Twitter, Facebook, LinkedIn, etc.) also generate a considerable amount of data. Openly accessible through APIs (Application Program Interfaces [18]), interface programs that help siphon off an application's data to reinject it into another, this social data can constitute a source of information for companies whose reputation is made public on these platforms. Sectors such as consumer goods, food-processing, luxury goods, retail and insurance thus have access to an abundant wealth of external data.

Considered as unstructured, this data is mostly text data, with some video and photo content. It is abundant but, by nature (comments, opinions, etc.), has limited intrinsic value. The weak signals that it generates can only gain in relevance through cross-referencing with other data [19].

The problem that companies face is that harnessing these new types of data will broaden their field of application. Let us take for example facial recognition, which is a top priority for many Internet giants. The value creation potential of applications associated with this data justifies the huge investments currently being made in the field. As soon as this data is mastered, the companies concerned will have to update all their repositories and many of their marketing applications if they do not want to lose ground to new entrants or competitors who have been quicker to invest in the area.

[17] See article on SFR's website and Michel Lévy-Provençal's interview.
[18] http://encyclopedia2.thefreedictionary.com/Application+Program+Interface
[19] It should however be noted that there is still much room for improvement in the field of multimedia data use.

Example 2: Data collection on a fleet of vehicles for a major European provider of personal services.

A major European provider of personal services uses Big Data to better manage its fleet of vehicles.

It all starts with data collected on the fleet and its rounds: end-of-month odometer readings, driver fuel card statements, data from garage servicing and, more recently, from sensors built into electric vehicles. This data had until now remained untapped, due to approximate reporting and poor reliability.

The customer called upon Business & Decision to design and implement a Hadoop [20]-based prototype. The data mentioned above is processed on the Hadoop platform to enhance reliability, and then fed to the Business Intelligence tool Qlikview [21] to generate eco-driving indicators that will help improve fleet management.

Pleased with the results of this first attempt, the operator's IT Dept. is thinking about recommending this technology to other Group entities.

Open Data

Open Data is a set of free, public data made available by public organizations such as the State or local authorities; it can be a strategic choice for companies like the RATP (Public Transportation company), which uses it to gradually communicate part of its ridership data [22].

Figure 6. data.ratp.fr, the RATP's Open Data Platform.

Monetizing data

Whenever we talk about internal and external data, it is important to keep in mind that this new data can be monetized: for instance, is geolocation data captured by a telephone operator valuable to sectors that have a close interest in customer mobility (insurance, tour operators, etc.)?

Data generated by a home Internet router, for example, can provide personally identifiable information on who is watching what advertisement at a given moment; data that is extremely precious to advertisers.

[20] For more information on Hadoop and Map Reduce, see Chapter 04 (Architectures and algorithms).
[21] See website: www.qlik.com/fr
[22] On the gradual conversion of the RATP to Open Data, you can visit Le Monde's blog for more examples of what can be achieved using this data.

The question is how this emerging data monetization market will self-regulate. Over time, every entity can become a supplier and consumer of data. As in the energy sector, sophisticated systems will have to be created to circulate and sell data amongst all these potential players wearing several hats.

Full-scale data cross-referencing

From the above, we can conclude two things.

On the one hand, that data variety and volume are a reality that will impose itself during the coming years.

And on the other, that one and the same data item can serve different information uses and needs, internally within the company or within the context of collaboration with other companies.

At this stage, awareness of two complementary facts is important.

There will be a surge in data cross-referencing. Cross-referencing relevant data for controlled use can create opportunities; this has been shown using 2 types of data: CRM and Internet browsing. But extending cross-referencing to other data sources is equally relevant as it can potentially create valuable information.

In fact, it can be argued that the moment weak signal-carrying unstructured data is integrated into your system, multiple cross-referencing becomes a must: checking the relevance of data of this type against other data of similar type multiple times will increase the quality of the information produced.

The data cross-referencing timescale is however getting considerably shorter and, for some uses, quite close to real time. Indeed, some captured data are so volatile that they must be used almost as soon as they are created.

This is the context in which the most serious information privacy issues will arise. Information is more valuable when it comprises data from different environments. For instance, when I buy a plane ticket, the data generated is precious for hotels, insurers, car rental agencies, etc. And as time passes, the information loses its value. The ultimate goal is thus to understand how to use data within best practices guidelines, legal constraints and the context of an optimal customer path in order to make the most of it. But how can the customer monitor or authorize these data exchanges?

Is it possible to harness all this data without a specific Big Data approach? (Bearing in mind that the Big Data approach is not just technological but has four components: technological, methodological, legal and social.) Yes, maybe so, but it would be at the cost of substantial hardware, software and human capital investment.

CHAPTER 04
Architectures and algorithms

Big Data hardware architecture specifications

Massively parallel computing

Massively parallel technology has been on the agenda for 20 to 25 years. Over time, computer engineers have come up with various ways to deal with technological problems: vector machines, mainframes, servers, addition of an increasing number of processors and, today, the use of processors that have an increasing number of cores or of machines grouped in clusters.

Nowadays, parallelization is the standard solution. In the past, massively parallel computing required sizeable investment (with 8 to 16 processors, you were spending roughly the equivalent of $100,000 for hardware that would only last 3 years before becoming obsolete). Customers who wanted to add new processors later were at the mercy of their suppliers once initial investments had been made.

Whenever technology evolved, a migration process had to be carried out to validate the way in which programs were run and their performance, and this had a cost: scalability was thus only relative.

Cloud Computing infrastructures (IaaS [23]) made Big Data possible

Everything changed with the advent of Cloud Computing. Now, the customer can easily scale (the Cloud's elasticity principle) since machines have become universal and extremely simple.

This infrastructure standardization has made it possible to perform multiprocessor, parallel calculations using easily accessible standard operating systems and machines. Computing power has not changed, but the scaling practice has.

[23] IaaS: Infrastructure as a Service, i.e. the ability to purchase remote infrastructure and to use it on demand, as is done for software (Software as a Service).

In the old days, if scaling was required but you did not have the necessary budget, you had to revise targets downwards.

Now, Cloud computing has made Big Data possible. With this easy access to scalability, virtual machines can be installed on systems developed and designed for parallel processing.

Big Data technology immersed in the Cloud

Facebook's Cassandra, for instance, is a standard database especially designed to be integrated into clusters managed in the Cloud, proving once again that the Cloud is indeed the catalyst for Big Data in terms of technology. If required to, Cassandra can populate additional machines, and data population can be performed by sending a simple SMS to a system machine that will trigger a server administration action.

What holds true for Cassandra also applies to Hadoop, which is a software framework [24] designed for massive parallel processing; quite logically, the platform on which it runs is a massive parallel one.

Is Hadoop the solution to Big Data?

The scaling capacity of the Hadoop model is undeniable. But does this mean that it can meet all of Big Data's challenges? We can safely contend otherwise.

Some statistical operations, for instance determining the typical profile of customers stored in a database, require that the whole base be accessed within the same algorithm and are not really compatible with the Map/Reduce [25] method. Map/Reduce can certainly be used to create a scoring model, but updating behavioral categorization, for instance, would require that data be re-centralized throughout the calculation.

This somewhat restricts Hadoop's capabilities as regards processing, notably marketing-oriented processing, but does not preclude it. Indeed, all heavy processing can be executed in batch mode in SQL bases that can be used to back up Hadoop's more real-time oriented bases.

It should be understood that various base structures will have to coexist since there is currently no storage model that can single-handedly provide the solution to Big Data.

[24] A software framework is a set of methodologies and tools associated with a programming language. See: http://fr.wikipedia.org/wiki/Framework
[25] For an explanation on Map/Reduce, see the section "Map Reduce: a two step model" of this document.

Big Data software architecture specifications

Big Data software specifications are a function of the above. There was a time when parallel computing was only accessible to a small community of highly skilled scientists, developers and experts in the field of video games.

Parallel computing is now widespread

Today, things have totally changed: Cloud Computing, and tools like Map Reduce, Hadoop and Cassandra, all massively parallel, have brought the whole IT sphere face to face with the fact that it needs to understand parallel, or indeed distributed, computing developments. We have thus stepped into a transition phase where computer engineers have no choice but to acquire these new skills.

Any developer intending to work with the Web, real-time reporting, Twitter or sentiment analysis must understand the nature of a distributed algorithm, have applied mathematics and numerical analysis skills, and know what an algorithm is. A few years back, algorithm classes were optional in school, but now they are mandatory.

Map Reduce: a two step model

Map Reduce is a programming model, an algorithm that is mainly used to process large volumes of data distributed amongst clusters or parallel machines. The process consists of two steps:

Step 1: Map to divide the problem into pieces

The problem is first broken down into big chunks. Each machine is assigned to a particular sub-problem. This machine will repeatedly split the sub-problem again into sub-sub-problems and so on, until we reach the point where each machine is processing a really small part of the problem.

Step 2: Reduce to group pieces

In the second step (Reduce), each node performs its calculation independently; the partial results are grouped (Shuffle) and sent back up to the machine which ordered the operation, until everything returns to the initial node.

Figure 7. How Map Reduce works: Map, Shuffle, Reduce (freely inspired by a diagram on the sqlauthority.com blog)

This method is used to break down problems, either into parts small enough to be processed by an algorithm, or according to machine computing capacity.

All results are finally sent back up and combined to give the final calculation. This is a common data processing algorithm principle.

The method's limitations

However, some problems cannot be solved using this method. In which case, they have to be reformulated, or another algorithm of similar type must be found to process them.
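The Map, Shuffle and Reduce steps can be imitated on a single machine with a few lines of Python. This is a didactic sketch, not Hadoop's API: `run_map_reduce` and the sample data are hypothetical, and the "machines" are just items of a list. The second use shows one common way of reformulating a problem so that it fits the model: an average is carried as (sum, count) pairs, which combine safely across chunks, instead of averaging partial averages.

```python
from collections import defaultdict


def run_map_reduce(chunks, mapper, reducer):
    """Single-machine imitation of the Map / Shuffle / Reduce steps."""
    # Map: each chunk is processed independently, as if on its own machine.
    mapped = [pair for chunk in chunks for pair in mapper(chunk)]
    # Shuffle: group the intermediate values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: collapse each group of values into one result.
    return {key: reducer(values) for key, values in grouped.items()}


def count_words(chunk):
    """Mapper: emit (word, 1) for every word of a text chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def to_sum_and_count(chunk):
    """Mapper: emit (segment, (amount, 1)) so averages can be rebuilt later."""
    return [(segment, (amount, 1)) for segment, amount in chunk]


def average(pairs):
    """Reducer: rebuild an average from (sum, count) pairs."""
    total = sum(amount for amount, _ in pairs)
    count = sum(n for _, n in pairs)
    return total / count


# Hypothetical sample data, split into chunks as if spread over machines.
WORD_CHUNKS = ["big data big business", "Big Data jobs"]
word_counts = run_map_reduce(WORD_CHUNKS, count_words, sum)

SPEND_CHUNKS = [
    [("business", 100.0), ("family", 40.0)],
    [("business", 200.0), ("family", 60.0), ("family", 80.0)],
]
average_spend = run_map_reduce(SPEND_CHUNKS, to_sum_and_count, average)
```

Operations that need to see the whole data set at once (a median, for example) do not decompose this way, which is the kind of limitation the text refers to.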

Big Data database specifications

We usually store all our data in the same way, without thinking about its relevance, in relational databases of row type.

Row-oriented databases

As the name implies, in this type of database, data is saved in rows (this is the case for instance for all Oracle, DB2, SQL, SQL Server-type databases). Generally, this is due to a lack of options, or to performance or maintainability issues. In other words, reasons that have nothing to do with the problem to solve. The row format is also called "table format" and looks somewhat like Excel spreadsheets.

    Product      Description           Price per kg (euros)   Service duration (years)
1   cabbage      green vegetable       1                      2
2   carrot       orange vegetable      10                     5
3   turnip       tasteless vegetable   15                     10

Figure 8. In a row design, every record, however specific, holds a value for each column; example here of a product database.

1   PRODUCT: cabbage   DESCRIPTION: green vegetable
2   PRODUCT: carrot    DESCRIPTION: orange vegetable
3   PRODUCT: turnip    DESCRIPTION: tasteless vegetable

Figure 9. In contrast, in a columnar database, as in the example above, the number of columns can vary for each record.

Column-oriented databases

There are also what are known as columnar databases, which do not store data in rows but allow you to select a particular column according to the algorithm that you plan to use. In Business Intelligence, a snowflake or star schema is chosen for data warehouses to expedite search operations in the database.

This is highly compatible with columnar databases (with products like SAP and Sybase IQ) or, in the Big Data sphere, with HBase [26], which is a columnar database.

Document-oriented databases

Finally, there are document-oriented databases (for example MongoDB) in which only key/value pairs are loosely stored. They are highly unstructured databases in which everything is stored unsystematically, and data pipeline search algorithms use the database's specific features to improve performance.

Twitter keyspace
  Statuses CF
    key "1": "text": "Nom nom nom", "user_id": "5"
    key "2": "text": "@evan Zzzzzz", "in_reply": "8", "user_id": "5"
  Status Audits CF
  Status Relationships CF
  Users CF
    key "5": "screen_name": "buttons cat"
  User Relationships super CF
    key "5": "user_timeline": {"2": "", "1": ""}, "home_timeline": {"8": ""}

Figure 10. Evan Weaver describes how Cassandra processes data storage in its key/value base. [27]
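The three storage layouts described in this section can be compared using plain Python structures. This is an in-memory sketch built on the product and Twitter examples of Figures 8 to 10; it says nothing about how Oracle, HBase, MongoDB or Cassandra are actually implemented, and the function and field names are hypothetical.

```python
# Row layout: one complete record per product, a value for every column.
ROWS = [
    {"product": "cabbage", "description": "green vegetable",
     "price_per_kg": 1, "service_years": 2},
    {"product": "carrot", "description": "orange vegetable",
     "price_per_kg": 10, "service_years": 5},
    {"product": "turnip", "description": "tasteless vegetable",
     "price_per_kg": 15, "service_years": 10},
]


def to_columns(rows):
    """Pivot row records into one list per column.

    Assumes every row has the same fields, as in a relational table.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def average_price_row_store(rows):
    """Row layout: every record is visited to reach a single field."""
    return sum(row["price_per_kg"] for row in rows) / len(rows)


def average_price_column_store(columns):
    """Column layout: only the column needed by the calculation is read."""
    prices = columns["price_per_kg"]
    return sum(prices) / len(prices)


# Key/value layout: each key maps to a loose set of fields, and the set of
# fields can differ from one record to the next (see "in_reply").
STATUSES = {
    "1": {"text": "Nom nom nom", "user_id": "5"},
    "2": {"text": "@evan Zzzzzz", "in_reply": "8", "user_id": "5"},
}

COLUMNS = to_columns(ROWS)
```

Both average functions return the same figure; the difference lies in how much data each layout has to touch to compute it.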

[26] For more information visit: http://www.journaldunet.com
[27] http://blog.evanweaver.com/2009/07/06/up-and-running-with-cassandra

No ready-made databases

Databases must be selected according to needs. In a relational database, a search operation or selection of data corresponding to a given field (for example, in a customer database, looking for all customers named "Michael") has a very low cost.

However, if I want to perform a lot of "insert" with "join" [28] operations between a customer database and a past purchases product database to generate statistics, then the performance is quite poor. The same task with a columnar database would give almost instantaneous results as it is designed for such operations.

Many tools, each specialized in one field

While row-oriented or column-oriented databases would struggle to deal with full-text searches, MongoDB would yield extremely fast results.

Database selection is thus dependent on intended use. However, there are databases that exist today that try to combine all of these capabilities.

Some 50 to 110 hybrid versions are in fact regularly used and selected based on the problem to solve, not to mention new formats like the graph format developed by SAP InfiniteInsight [29] that is specifically adapted to social network interaction analysis.

This is one of the complex features and unique characteristics of the Big Data world: there are many tools, but the community has decided to favor those tools that are highly specialized and simple, not to say simplistic, and the best in their category.

Big Data architecture, specific to Big Data?

The market answered that one. Companies have already started to adapt some of the components of these architectures to improve the performance of their information systems. The aim seems to be isofunctional, i.e. to do the same thing in a better way and for less. "Better" meaning faster, closer to real time, and "for less" meaning by taking advantage of the large number of innovative companies that challenge (before being bought?) the well-established leaders of the new technologies sector. The main targets are the most expensive and complex systems: ERP, CRM, BI, etc. The initiative however is purely technical.

[28] Insert (SQL): http://fr.wikipedia.org/wiki/Insert_(SQL); join: http://fr.wikipedia.org/wiki/Jointure_(informatique)
[29] Formerly KXEN: See http://en.wikipedia.org/wiki/KXEN_Inc.

CHAPTER 05
The Big Data jobs

[Figure: Venn diagram of three overlapping circles labelled Hacking skills, Maths & statistics knowledge and Substantive expertise; the pairwise overlaps are labelled Machine learning, Traditional research and Danger zone!, and the central overlap is labelled Data science.]

Figure 11. Data science (according to Drew Conway) [30] is at the intersection of substantive expertise, hacking skills and math and statistics knowledge. It is only by combining those expertise areas that the pitfalls described by Conway will be avoided. It is definitely time to start training our experts.

[30] Drew Conway's Venn diagram on Data science: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

The return of EIM (Enterprise Information Management)

EIM is back on the agenda. Diving deep into the subject of Big Data for the purposes of this white paper led us to take another look at this Gartner-created notion, and made us take a step back from Big Data. EIM, no matter how old-fashioned the term, is back on the table.

Data-centric initiatives in companies are often a byproduct of ERP implementation (SAP, JD Edwards, etc.); their aim being to deal with any arising Master Data Management (MDM) issues.

Customers are now more mature with regard to the subject. They fully understand the need to manage master data (maturity remains however highly relative since master data management software packages are more than ten years old). But even though they have started focusing on the area, they have no idea how to get started.

How to get started?

Implementing a data-centric initiative, whether MDM or Big Data, requires a rigorous approach to data. Which implies the following:

- First, that you know your data and have identified all the information repositories in a given scope.
- Then, that data is both quantitative and qualitative (MDM creating through master data a single source of truth in the information system). MDM applies to product data, customer data, HR data, etc.
- Last but not least, that governance is set up: which will translate into a statement of duties and responsibilities for the various players (inside and outside of the organization) involved at each data lifecycle stage.

As a result, the company's data (data is a corporate asset in the sense that it has intrinsic value) will increase in value.

The challenge in implementing such projects lies in the fact that the organization needs to change to reflect the new data-centric approach; most companies will find out about the quality of their data during the process and, based on the observation, set up governance. Though sponsored by top management, the initiative will be driven by operational staff.

Valuing data by dedicating it to business

Marketing aside, the value of this data will have to be proven. And this is where the vision (What?), the strategy (How?) and the metrics (How much?) will come in.

The aim of the above-mentioned governance is to restore the business functions' control over data. The problem most often encountered with customers is that they consider the data initiative to be primarily an IT concern. As companies grow in maturity, data responsibility will have to be returned to departmental functions. The true role of IT is to ensure that people who have control over data spend time analyzing it rather than preparing it.

Complementarity with Big Data

The above introduction will be helpful in understanding the new professions created by Big Data which, unlike those associated with the previously mentioned MDM, are not IT but business related. In its classification of new Big Data jobs, Gartner clearly demonstrates that they are business professions and not IT ones.

The new Big Data jobs

For the sake of simplicity, we will list 4 categories of jobs which, though sometimes not directly linked to Big Data, have become visible and appealing thanks to the data awareness created by Big Data:

- The CDO (Chief Data Officer)
- The data Steward: administers data
- The data Scientist: analyzes data using complex statistical and data mining tools
- The data Analyst: analyzes data in light of specific business needs

1. The CDO

This occupation is still quite rare, but the concept is gaining ground and is bound to spread. The CDO is a high-ranking company executive (C-Level) and sits on the executive committee. He has multiple roles:

First, contributing to corporate strategy by providing data, managing data and maintaining its quality level.

Then, sharing data insight internally (in big companies, every entity has repositories but no one has a view across the whole organization) and optimizing key business processes through data consumption.

And finally, building a multi-skilled team to achieve his objectives.

The CDO has enough power to override resistance to change and implement change strategies in the company. And that is not all: the CDO has an external dimension too, notably in cases where the company shares data with external parties. He also intervenes in highly standardized and/or regulated sectors (such as GS1 standards in retail, for example).

CDOs can play an active role regarding these standards - on behalf of a group of stakeholders in a given activity sector, for instance - by implementing the standard, sharing data, etc.

Where no CDO is appointed, the IT Department takes care of these activities.

A communication job is essential

The CDO is also tasked with setting up an operational organization and governance for the Governance Board; as this term can be somewhat daunting, it is usually replaced by "Governance council".

Communication actually plays a key role in these new jobs. Indeed, all the people holding those positions (whether the CDO or the others) will need to communicate a lot to justify any action taken based on data considerations.

There will be an inevitable evolution of the landscape in the next five years.

The one thing that remains unclear is whether the CDO is a ladder position. In fact, this will depend on the IT Department (CIO) and the way the role is fulfilled, in particular on the person's capacity (or incapacity) to move beyond infrastructure-related tasks. Nonetheless, this remains a grey area.

2. The Data Steward

The data Steward has a certain knowledge of data and works with it on a daily basis, though not necessarily full time. This is the lowest job in the hierarchy of new jobs: the data Steward is a doer. He must be accountable to the CDO, as he is part of a community and does not work alone. Big companies will typically have one data Steward per area. In a matrix organization, a business dimension will be added. Data Stewards are in charge of implementing the strategy in the field, applying data governance as set up by the CDO and enforcing it; and likewise for best practices and lifecycles.

3. The Data Scientist

The data Scientist is the one who adds value to data. He has the tools to process weak data (provided by the Data Steward). Big Data may be recent, but the challenges associated with it are not. However, it is only now that technological advances have made it possible to process huge volumes of data in real time. The data Scientist is a multi-skilled expert. He masters statistical and data mining tools and can manipulate data in whichever way he pleases; he knows the sector in which he operates sufficiently well to orient research according to its challenges; and finally, he has an intimate knowledge of processes that helps him ask the right questions and figure out possible answers.

As an example, customer turnover or retention in telecom companies is a critical issue, since competition is severe and product differentiation often too low. To understand its mechanisms, the data Scientist needs to master the sector's specific characteristics, the challenges associated with churn, and the statistics and data mining tools typically used by these businesses.

As a specialist in their field, the data Scientist is someone who has advanced training and proven professional experience.

4. The Data Analyst

The data Analyst also adds value to data. He receives part of the Scientist's data findings and cross-references them with other reports and data to which he has access within the context of his job. He uses dashboarding, information viewing and information exploration tools that are somewhat similar to those he uses for Business Intelligence, but applies them differently based on the data made available to him and the business challenges that need to be met.

The data Analyst is not a technician; he is a business professional with a keen understanding of data. Let us take the example of a Swiss customer (Givaudan) specialized in fragrances, which must reassure its customers with regard to product reliability and associated risks - which in this context is a regulatory obligation. The use of data will help stabilize and sustain the activity. To do so, Data Analysts will use existing traditional Business Intelligence tools but complement them with Big Data type solutions. They are the ones who will consolidate the various sources of data and work on method evolution to simplify and improve processes.

From punishment to career prospect

Working on data will help optimize business processes and improve business key performance indicators (KPIs).

These new jobs now offer career prospects and are no longer considered a punishment. Traditionally, assigning someone to any of these positions was synonymous with sidelining them. This is no longer the case.

Top universities (HEC, ENSAE, Essec) are now providing training courses on data-related subjects. The fact that HEC has incorporated a data dimension into its MBA program is in itself a strong signal, since this course is attended by people who are well on the way to occupying key positions in their organization.

The relationship between MDM and Big Data

At the 2013 Enterprise Information Management Summit in London, organized by Gartner, the analyst firm insisted on the need to extend the mature and structuring approach of MDM to semi-structured or unstructured data; this is what will link MDM to Big Data.

MDM and Big Data are perfectly complementary. If Big Data is synonymous with proliferation and chaos (often creative), then MDM is what will bring order to it by structuring what is unstructured, categorizing and organizing the phenomenon.

Illustrating Example

How do you develop and deliver a 360° customer view, the marketer's ultimate goal? To do so, we need several types of master data managed by the MDM: notably customer data (social security number, Siren if it is B2B, e-mail address, other contact details, etc.).

Then comes the customer's history and segmentation: this is Business Intelligence (BI). This means establishing relationships between master data and transactional (or rather factual) data, namely customer purchases.

Big Data will thus enrich the initial asset with information on the customer's behavior and contacts with various brands and networks in general, notably through social networks and media. All these different types of data are thus required for a comprehensive business view.

Big Data cannot be considered outside this reality, even if some sources (Facebook, Google, Amazon, etc.) have a separate existence. There is intelligence in the links between these data, and this is what will transform, rather than simply increase, the data's value.

Retail illustrating example

The current marketing trend has moved away from multi-channel to omni-channel. But a successful omni-channel strategy requires in-depth customer knowledge to enable personalization: MDM to know the customer better, BI and, finally, social networks.

If the retailer has a sound knowledge of their products, provides relevant information irrespective of the channel used, and has up-to-date stock information (through BI and links with ERPs in particular), they can personalize the customer offering by linking these various levels of data.

[Figure 12 diagram labels: MDM (reference data), transactional (fact data), Big Data (behavioral data)]

Figure 12. 3 types of data and the combination of the 3 areas
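As a rough illustration of the 360° customer view described above, the following Python sketch links the three types of data (master data, transactional data and behavioral data) through a shared customer identifier. It is only a minimal sketch: the sample records, the field names (customer_id, amount, channel, action) and the function build_customer_360 are assumptions made for this example and do not come from any particular MDM, BI or Big Data product.

```python
from collections import defaultdict

# Hypothetical sample records; all field names are illustrative assumptions.
master_data = {  # MDM: one reference record per customer
    "C001": {"name": "Alice Martin", "email": "alice@example.com"},
    "C002": {"name": "Bruno Keller", "email": "bruno@example.com"},
}
purchases = [  # BI: transactional (factual) data
    {"customer_id": "C001", "amount": 40.0},
    {"customer_id": "C001", "amount": 25.5},
    {"customer_id": "C002", "amount": 10.0},
]
social_events = [  # Big Data: behavioral data
    {"customer_id": "C001", "channel": "social", "action": "brand_mention"},
    {"customer_id": "C002", "channel": "web", "action": "product_view"},
    {"customer_id": "C001", "channel": "web", "action": "product_view"},
]


def build_customer_360(master, transactions, events):
    """Combine the three data types into one view per customer."""
    # Sum purchase amounts per customer (transactional layer).
    totals = defaultdict(float)
    for t in transactions:
        totals[t["customer_id"]] += t["amount"]

    # Collect observed behavior per customer (behavioral layer).
    behavior = defaultdict(list)
    for e in events:
        behavior[e["customer_id"]].append((e["channel"], e["action"]))

    # The master record is the anchor that the other layers attach to.
    view = {}
    for cid, ref in master.items():
        view[cid] = {
            **ref,
            "total_spent": totals.get(cid, 0.0),
            "behavior": behavior.get(cid, []),
        }
    return view


customer_360 = build_customer_360(master_data, purchases, social_events)
```

The design choice worth noting is that the master record acts as the key: transactional and behavioral records that cannot be matched to a master identifier are simply left out of the view, which is one reason why master data quality matters before Big Data sources are added.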

CHAPTER 06
Big Data or Big Brother?

Passions usually run high when it comes to Big Data; it seems you either hail the phenomenon as the future of computing and a new economic order, or denounce it as the Orwellian drift of a scientific society that has spun out of control.

The dispute has not been limited to the court of public opinion since the European Union ruling against the giant Google in June 2014, which was highly symbolic of the brewing unease in a society where an extremely active minority of users have launched a democratic debate - one we absolutely do not intend to avoid, or even challenge, here. After all, without going as far as sweeping away everything in its path, the advent of Big Data did change the economic landscape and emphasize the moral and ethical duties of companies, for which social responsibility has moved from choice to clear imperative.

In a way, the Big Data debate is reminiscent of the Web debate which took place at the beginning of the 1990s, at the onset of the phenomenon.

Heads: the hope for a dynamic sector that will boost the whole economy

During a recent work session on economic issues organized by the French newspaper Journal du dimanche and cosponsored by Business & Decision, Philippe Oddo, from Oddo bank, declared by way of introduction: "Big data is about collecting and processing information to better anticipate future trends in all sectors, notably in the field of financial analysis." This preamble leaves little doubt that Big Data is not only the future of high tech, but also that of numerous more traditional sectors. Such a statement from a business professional is a very strong signal indeed.

Tails: the privacy debate

The subject is not only at the heart of the economy (for the Press which has access to a huge volume of data, according to Denis Olivennes), but also at the heart of the debate on privacy. Within the context of the Open Internet Project 31, a terrifying presentation was made by Laurent Alexandre (the surgeon who founded Doctissimo) on the totalitarian goal of IT's dominant players, with a strong focus on Google.

Déjà-vu all over again

Innovation stirring up extreme reactions is nothing new - in high tech or in other fields. In fact, the first known example of this dates back to the 19th-century UK, with the Luddites' reaction to the introduction of weaving looms.

All technological leaps, no matter how small, give rise to a wave of techno-scientism and a fierce and often irrational opposing movement.

Ignoring ethical issues is, however, not an option. On the contrary, it is up to companies to take the lead and remove any ambiguity on the matter for users and for the most severe critics.

The solution to privacy issues is probably Big Data itself

Big Data did not create the privacy problem, but it certainly made it worse.

And though technology is often singled out as the prime culprit for making Big Data dangerous, it is probably also the solution to the problem.

The network, for example, provides a browser that scrambles data; today this can be considered as hacking but tomorrow, there will be a market for this.

Technology and business can surpass regulation. In a not-too-distant future, maybe we will pay to protect our data.

Indeed, it is highly possible that salvation will not come from the State, clearly unable to regulate tools with a legal paraphernalia that is too slow and rigid to adapt to the ever-changing IT sphere.

31 JDD article published on 8 February 2014: http://www.lejdd.fr/Economie/Entreprises/Laurent-Alexandre-La-strategie-secrete-de-Google-apparait-652106

32 The Open Internet Project (openly anti-Google site and association). These subjects are already extensively addressed in the public sphere and are, above all, a debate of opinion. We are thus not going to deal with them here. It should also be noted that the European Union has already taken action against giants of the Web, and notably Google, by way of its Right to be forgotten decision.

32 Another subject about which passions run high. These are mentioned here but will not be delved into in detail.

Espionage and Big Data have nothing in common

Last but not least, there is the conflation of Big Data, Big Brother and major social networks (not to mention the NSA 32) that is often evident in the general public's mind, and the sometimes hasty criticism leveled at Big Data.

The use that these players make of Big Data must not be confused with the goals of companies that only wish to use Big Data to optimize sales and performance without disrespecting the customer. This last category includes the vast majority of Big Data users and is the one which interests us here and in our daily work.

Moreover, we are faced today with a double paradox. On the one hand, users want relevant and personalized information but do not want to be watched. This is evidenced by the fact that, for some time now, consumers have been treating any message not to their liking as spam.

And on the other, users want to protect their privacy but display their whole life on social networks. This even led Mark Zuckerberg to conclude, albeit a bit hastily, that there was no longer such a thing as privacy 33.

In the midst of this debate, advertisers cannot play the neutrality card. Companies must take a visibly strong stance when it comes to ethics and, if possible, do so proactively. Above all else, a clear divide must be imposed between anonymous and statistical data use (even for personalized recommendations) and spying on people's private lives. At the end of the day, the difference between the two is not so much technical as it is human and ethical; and likewise, any solution to the problem will have to be human and ethical 34.

There is no middle ground when it comes to Big Data: either the advertiser adopts an ethical approach to it, or he intrudes, spams and oversteps his boundaries, with the obvious implications for his reputation if this is publicly revealed and comes under public criticism.

But all things considered - and once the ethical issue is resolved beyond ambiguity by advertisers - we cannot dispute the fact that customer experience improvement depends on in-depth customer knowledge; so, ultimately, all businesses will turn to Big Data.

33 See http://bit.ly/privateZuck for his speech on the subject at the end of 2009. This sentence sparked a lot of reaction on social media and elsewhere, compelling Facebook to prove time and again that they were respecting people's private lives; they never really managed to convince anyone of this.

34 See Orange Group's code of ethics, clearly in line with this: http://oran.ge/SMYku4

CHAPTER 07
How to convert Big Data into Big Busine$$

What to remember about Big Data?

To conclude, what should we remember about Big Data? First and foremost, that it is important to understand what it is; hastily and emphatically deciding that nothing has changed or, on the contrary, that everything has, is wrong. In reality, neither is the case.

The first step is to abandon some myths once and for all: analyzing the whole of Big Data is a utopian dream. Similarly, the idea that you must store everything in order to use it someday is highly unrealistic.

To begin with, storing useless data is costly, even more so for Big Data projects than for traditional Business Intelligence projects. But more importantly, the storage type must be specified right from the start, with the intended processing in mind. As previously argued, there are statistical operations that are impossible to perform if data is stored in some decentralized formats. The use of these formats would preclude certain uses.

Without going into extremes, Big Data processing sometimes takes time, a timescale that increases exponentially with the volume of data to process.

For satisfactory response times, a compromise must often be found between the amount of data to take into consideration and the processing deadline. Keeping some intermediate calculations up to date in business-oriented warehouses can also help resolve the problem and bring computing time down to acceptable levels.

If you are able to integrate all these facts, then you will be able to convert Big Data into Big Business.
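To make the idea of keeping intermediate calculations up to date more concrete, here is a minimal Python sketch. It contrasts recomputing an average by scanning every record with reading it from running totals that are updated as each record arrives. The class RunningSalesAggregate, the function average_by_full_scan and the sample records are hypothetical names and values chosen for this illustration only; an actual business-oriented warehouse would rely on its own aggregation mechanisms.

```python
class RunningSalesAggregate:
    """Keeps a per-category count and sum up to date as records arrive."""

    def __init__(self):
        self.count = {}
        self.total = {}

    def add(self, category, amount):
        # Constant-time update: no need to revisit earlier records.
        self.count[category] = self.count.get(category, 0) + 1
        self.total[category] = self.total.get(category, 0.0) + amount

    def average(self, category):
        # Answered from the stored intermediate results only.
        return self.total[category] / self.count[category]


def average_by_full_scan(records, category):
    """Baseline: re-reads every record each time the question is asked."""
    amounts = [amount for cat, amount in records if cat == category]
    return sum(amounts) / len(amounts)


# Hypothetical sample data: (category, amount) pairs.
records = [("fragrance", 10.0), ("retail", 4.0), ("fragrance", 20.0)]

agg = RunningSalesAggregate()
for category, amount in records:
    agg.add(category, amount)
```

Both approaches give the same answer, but the first only touches the stored totals at query time, whereas the second has to read the full data set again. The trade-off is that only the questions anticipated when the aggregates were designed can be answered this way, which echoes the point that intended processing should be known from the start.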

The 10 key points

What, then, are the 10 points to remember to guarantee a successful Big Data project?

1. The initial objective is important

If your objective is not clearly defined, you run the risk of selecting the wrong tools and penalizing your project in terms of time and resources consumed. Focus on use case maturity and data identification rather than investing in technical infrastructure. In the meantime, use Platform as a Service.

2. The uncertainty notion

One of the most significant developments associated with Big Data, as compared to more traditional work on data, is the management of uncertainty. This does not mean that there is no planning involved or that Big Data projects do not require preparation; quite the contrary. What it does mean - and this is particularly true for marketing - is that Big Data projects must account for this uncertainty right from the start and be based on iterative models with, if necessary, self-learning capabilities.

3. Big Data requires multi-skilling

Big Data is not for robots. It is at the intersection of automation, technology and human intelligence. To make the most of it and obtain results that will meet your expectations, new profiles must be cross-disciplinary: IT, database, statistics, artificial intelligence and, last but not least, business knowledge (marketing, finance, supply chain, etc.).

4. This is a major phenomenon, as important as CRM in its time

Almost everyone who criticizes Big Data will tell you that it is not something new but a purely incidental phenomenon, a fad that will die out as quickly as it started. This is to misunderstand history and technology marketing, as Big Data is rooted in years of effort and trial and error (marketing, finance, management) that are finally bearing fruit thanks to advances in technology and the development of its uses (infrastructures, software, widespread use, social Web predominance and omnipresent networks). All this means that fundamental business challenges are finally being met after 15 to 20 years of work and hit and miss. This is a sure sign that the field is reaching maturity - this convergence of means will no doubt pave the way for spectacular breakthroughs.

5. Big Data has a significant impact on organizations

This is not only because new jobs have been created for which there is still no truly adequate training process, but also because organizations' functions will be fundamentally redesigned: knowledge, approaches, decisions and methods are changing drastically. Marketing, for example, will no longer be done the way we are doing it now, even if the evolution will take some years. If you are part of a business organization, start taking an interest in data now, because your job is about to change.

6. The technology needed to access Big Data is now available

Big Data is not something of the future; it is already accessible here, today, even if its landscape is evolving extremely rapidly. Many technologies used for Big Data were in fact invented by the giants of the Web (Google and Yahoo are two pioneers) and are now available to anyone who can implement them.

7. Data is probably the least known and least understood raw material

The difference between an information system (all the processes and organizations between data, its creation, life, processing and archiving) and a computer system (the hardware and in particular the software that are used to run everything and process data) is a classic. Data is still today a big mystery to business leaders who think that computer systems magically and effortlessly transform business.

But data can be fickle and needs a lot of work. Its growing importance in a society where computerization has pervaded all sectors is compelling users to change their perception of data. There is, however, still a lot to do for real change to occur.

8. Managing a Big Data project is different

Big Data is neither a fashion trend, nor a buzzword for data mining. It has its own vocabulary, professions, methods, algorithms and specific project approaches.

As mentioned in this white paper, a Big Data project has specific characteristics. Above and beyond the technical approach, it demands distinctive methodologies, an appropriate legal framework and a reliable way to measure social impacts.

New knowledge and skills will have to be acquired; and since Big Data is constantly reconfiguring itself, learning will be continuous.

9. You cannot escape Big Data, and all businesses will eventually embrace it, as is the case for the internet

All true innovations have a more or less similar adoption curve, very well described by Geoffrey Moore in Crossing the Chasm, his 1990s best-seller.

Big Data is at an inflection point where adoption is starting to spread beyond the elite circle of Web giants and social media that invented it. Applying its techniques and approaches to companies in more traditional sectors is now possible. This journey is just beginning.

10. Big Data is not just about real time

Even if Hadoop is a major innovation, Big Data is not just about Hadoop or real time. Some of its uses are indeed adapted to a large volume of data and may require, depending on the case, remote processing. It is a savvy combination of various approaches and techniques that will ensure Big Data project quality and results.

It is only by integrating these various points characterizing true Big Data that future companies will be able to convert Big Data into Big Business.

Contributors to this white paper:

Patrick Bensabat, Didier Gaultier, Michael Hoarau, Bruno Laug and Yann Gourvenec

For more information, please visit our specialist blog on Big Data.

blog.businessdecision.com

A new discipline at the meeting point of statistics, technology, databases and jobs, Big Data is not a trend. It will profoundly change the world of business and break out of the closed circle of web and social media giants who invented it.

Eventually, all companies will adopt the use of Big Data, in the same way they have adopted the use of the internet. This white paper aims to identify clearly, without jargon, the impacts and uses of Big Data.

Coordinated by Business & Decision, this white paper was written by professionals in the processing and management of information, recognized for their expertise in business intelligence, marketing and e-business.
