PAPER 1: Fad or performance lever?
FROM BIG DATA TO BIG BUSINESS - PAPER 1
CHAPTER 01
CHAPTER 02 Data
CHAPTER 03 The new uses created by Big Data
CHAPTER 04 Architectures and algorithms
CHAPTER 05 The Big Data jobs
CHAPTER 06 Big Data or Big Brother?
CHAPTER 07 How to convert Big Data into Big Busine$$
Big Data literature abounds; this is a sure sign that Big Data's importance is strongly felt by the market as a whole and across the world. However, even in cases where documentation is of high quality, it is usually mostly descriptive in nature and focused on exaggerated, near-apocalyptic concerns associated with the exponential growth in data and data sources. These approaches do not help understand the real challenges posed by Big Data or how companies can exploit them.

The aim of this white paper is to provide companies with key insights that will help them approach Big Data, not as a mythology, but as a powerful performance optimization tool that can be adapted to their specific contexts.

We hope that this will help readers lay, or maybe even validate, the foundations for a smooth, controlled integration of Big Data into their company's ecosystem.
[1] Especially since Big Data involves new forms of reasoning, such as forms of inductive reasoning (see page 8). We can quite safely refer to Big Data as a new philosophy and, as a whole, a revolutionary approach to marketing.
A shift in management thinking

Paper 2 on data, Big Data's essential fuel
[2] For more information go to "Tendances" Trends.be
[3] Although both the singular and the plural are often used to qualify data, Big Data is mostly referred to as a mass noun, i.e. a singular phenomenon of mass information. Consequently, Big Data will always be referred to as a mass noun in the singular in this document.
The objective of Big Data is to tap into exponentially increasing volumes of data that have become nearly impossible to process using traditional database management and information management tools [4], and to handle complex data in a timely manner.

According to the 451 Group and Gartner, the aim of Big Data is to achieve competitive advantage through data collection, analysis and use methods that, until now, could not be used due to the economic, functional or technical constraints associated with the volumes, processing velocity and variety of data involved.

[...] coined the term Big Data [6] persist, we can however trace the earliest documentation on the famous 3Vs (Volume, Velocity and Variety), predicting exploding amounts of data and the creation of a new form of data processing, back to the beginning of 2001, according to analysis firm Gartner.

It should also be mentioned that Big Data is the culmination of the Data Mining approach, popular during the years 1995-2000, which was itself born out of the association of two relatively old schools of thought (trends), i.e. statistics and artificial intelligence.
[4] Please view also the complete definition in Wikipedia's open encyclopedia, which inspired ours: http://en.wikipedia.org/wiki/Big_data
[5] http://www.forbes.com/sites/gilpress/2013/05/09/a-very-
[6] See article on the New York Times blog: http://bits.blogs.nytimes.com/2013/02/01/the-origins-
[7] Lise Gasnier on Solucom Insight
Big Data projects

Big Data projects typically have four components.

First, they involve Big Data technologies, hardware and software. Second, they require a specific methodological approach that will be briefly mentioned in this document and further detailed in following publications.

The third component is a legal one, since a perfect command of the legal framework associated with the data handled and its intended uses is important.

Last but not least, every Big Data project includes a social component that must be taken into consideration. What is the ability of our societies, and of each group of people or individuals, to accept the circulation and use of their personal data? To avoid exposing one's project and the whole area of application to risks, it will be up to companies to self-regulate and to legislators to adapt to these new technology-driven contexts and possibilities.
CHAPTER 02
Data
[8] Examples of semi-structured data: messages, mails, logs, etc.; and of unstructured data: photos, videos, sounds.
These new types of data can be used to enrich other data but they can also, in some cases, constitute the very heart of the information to process. This will depend on the particular sector and the impacted process, as will be discussed later in the chapter on Big Data uses.

Since it is clear that the new data will have to be linked to existing repositories and data at some point, companies should, before jumping on the Big Data bandwagon, ensure that the bulk of their traditional data is well structured and processed, according to application subject: transactional data, receipts, campaign data, browsing data, sensor, probe, measuring tool and statistical analysis tool data, or visit, meter and notification data, etc.

We should point out that Big Data is not meant to systematically process all the data available in an area. Trying to do so would be counterproductive and would lead to risky, extremely complex and, to be honest, pointless projects.

In the past, we have often failed to keep in mind the age-old adage that trees do not grow to the sky. In spite of the current enthusiasm, going for a constraint-free or limitless type of hypothesis would be unreasonable.

Is there a priority when processing data; should one or more sources of data be given higher priority? The most accurate answer to this question will depend partly on the objective one wishes to achieve and partly on the data available.

Let us look at an example of Big Data applied to marketing: how to choose between a marketing message customization and targeting initiative, and a brand e-reputation or awareness measurement or improvement effort? The answer to this question in this case will differ according to the order in which items are processed.

In the first case, customers' purchase history will play a major role in analysis. Indeed, past purchases will provide valuable information on buying behavior (to whoever knows how to analyze them) and also on brand preferences, uses, needs, etc. Next, you will try to factor in brand interaction data, Web navigation data, marketing campaign data and so on.

In the second case, if your aim is the analysis of brand reputation, you will primarily seek to interpret information obtained from social networks, forums and, more generally, anything that is being said on the Web about the brand.
[9] For more information on the originator of the 3V-concept and the numerous people who have laid claim to its invention, please see Doug Laney's article, Deja VVVu: Others Claiming Gartner's Construct for Big Data: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data
[10] Known as sentiment analysis.
[11] See chapter on the new Big Data jobs on page 21 for more information on this new profession.
As far as we are concerned, the testing phase is a key element of Big Data implementation methodology. This is mainly due to the maturity level of its 4 previously mentioned components: business function, technology, algorithm, and data.

Once the approach is stable, we can launch the next phase, i.e. standardization or run, during which Big Data models are used in an automated format that is linked to the IS (company repositories and data) in a seamless and scalable manner.

When the build phase, which represents 80% of the work effort, is set in a real-time environment, we work with samples and still need to collaborate with one or more Data Scientists to process the data "cold" using one or several Data Mining software technologies (regardless of type).

When the real-time dimension is absent, testing can be done in environments that are almost identical to real life.

Cases that truly require full real-time processing are actually extremely rare.

Beyond the 3Vs: the 5Vs

The classic characterization has now been enriched with 2 more Vs which we think are of importance:

V for Veracity: data obtained from the IS's central applications is limited in volume but controlled in terms of coherence and quality. Public data associated with opinion, expression or behavior, on the other hand, even though abundant, could have been filtered or distorted. Its use therefore hinges on the ability to neutralize these weaknesses without modifying the original information. Managing data veracity criteria is an integral part of Big Data projects. Data reliability has become an essential criterion, as the GIGO [13] principle applies more than ever to Big Data. So much so, in fact, that the term Right Data has now popped up in opposition to Big Data, which is too Big and not Right enough.
[12] In contrast to real-time mode, batch processing is the processing of data that has been previously transferred to a storage space.
[13] GIGO: Garbage In, Garbage Out. For more information visit: http://en.wikipedia.org/wiki/Garbage_in,_garbage_out
[Figure: Data, characterized by Volume, Velocity, Variety, Veracity and Value, leading to Knowledge and to Uses: Prediction, Prevention, Personalization]
3. Knowledge: associated with information and the processing of data by the human mind. Knowledge answers the "how" question.
[Figure: DATA, INFORMATION, KNOWLEDGE, INTELLIGENCE: from raw data and processed data to notifications and action]
[14] Freely inspired by the Bellinger, Castro and Mills article at: http://www.systems-thinking.org/dikw/dikw.htm
The Big Data phenomenon changes the traditional way of looking at data, without actually making Master Data Management (MDM [16]) obsolete, by introducing a notion of imperfection and uncertainty (see text box on uncertainty marketing).

Big Data relies heavily on statistics and artificial intelligence, one of statistics' biggest assets being that it naturally adapts to the notion of data uncertainty. However, this uncertainty does not mean that it is possible to work with data that is of too poor quality.
[15] See a summary of Edgar Morin's Method at: http://fr.wikipedia.org/wiki/La_Mthode_(Edgar_Morin)#La_connaissance_de_la_connaissance
[16] http://www.piloter.org/business-intelligence/mdm.htm
Field observation shows that a 95% robustness (i.e. a 95% chance that the result is correct) is extremely satisfactory, and that such data can be deemed reliable. It is essential to take this uncertainty element of data and results into consideration when setting up a Big Data project.

Uncertainty can be measured using statistical tools such as the widely popular p-value.

While raw data in itself is of little importance, by cross-referencing and validating it statistically we can gradually increase its value, not through deduction, but through induction and statistical inference. Without going so far as to speak of artificial intelligence (where the machine decides which data is valid), with Big Data, it is statistical cross-referencing of data that creates value and meaning. So it would appear that data can lead to information that is counterintuitive.
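
As a rough illustration of how such uncertainty might be quantified, the sketch below computes a two-sided p-value for the difference between two conversion rates, using a two-proportion z-test. The campaign figures and the function name `two_proportion_p_value` are hypothetical, chosen only for this example; a p-value below 0.05 is the threshold conventionally paired with the 95% level mentioned above, though the two notions are not strictly equivalent.

```python
import math


def two_proportion_p_value(successes_a, total_a, successes_b, total_b):
    """Two-sided p-value for the difference between two proportions (z-test)."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical figures: 120 purchases out of 1,000 targeted customers,
# against 90 purchases out of 1,000 customers in a control group.
p_value = two_proportion_p_value(120, 1000, 90, 1000)
print(f"p-value = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant")
```

The smaller the p-value, the less plausible it is that the observed difference is due to chance alone; it says nothing, however, about the quality of the underlying data.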
CHAPTER 03
The new uses created by
Big Data
[17] See article on SFR's website and Michel Lévy-Provençal's interview.
[18] http://encyclopedia2.thefreedictionary.com/Application+Program+Interface
[19] It should however be noted that there is still much room for improvement in the field of multimedia data use.
[20] For more information on Hadoop and Map/Reduce, see page 20.
[21] See website: www.qlik.com/fr
[22] On the gradual conversion of the RATP to Open Data, you can visit Le Monde's blog for more examples of what can be achieved using this data.
The question is: how will this emerging data monetization market self-regulate? Over time, every entity can become a supplier and consumer of data. As in the energy sector, sophisticated systems will have to be created to circulate and sell data amongst all these potential players wearing several hats.

Full-scale data cross-referencing

From the above, we can conclude two things.

On the one hand, data variety and volume are a reality that will impose itself during the coming years.

On the other, one and the same data item can serve different information uses and needs, internally within the company or within the context of collaboration with other companies.

At this stage, awareness of two complementary facts is important.

There will be a surge in data cross-referencing. Cross-referencing relevant data for controlled use can create opportunities; this has been shown using 2 types of data: CRM and Internet browsing. But extending cross-referencing to other data sources is equally relevant, as it can potentially create valuable information.

[...] almost as soon as they are created.

This is the context in which the most serious information privacy issues will arise. Information is more valuable when it comprises data from different environments. For instance, when I buy a plane ticket, the data generated is precious for hotels, insurers, car rental agencies, etc. And as time passes, the information loses its value. The ultimate goal is thus to understand how to use data within best practice guidelines, legal constraints and the context of an optimal customer path in order to make the most of it. But how can the customer monitor or authorize these data exchanges?

Is it possible to harness all this data without a specific Big Data approach? (Bearing in mind that the Big Data approach is not just technological but has four components: technological, methodological, legal and social.) Yes, maybe so, but it would be at the cost of substantial hardware, software and human capital investment.
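
To make the idea of cross-referencing CRM and Internet browsing data more concrete, here is a minimal sketch. The records, field names and the `cross_reference` helper are all hypothetical and serve only as an illustration; a real project would rely on the company's own repositories and identifiers.

```python
from collections import defaultdict

# Hypothetical CRM records, keyed by customer identifier.
crm_records = [
    {"customer_id": "C1", "name": "Alice", "segment": "frequent flyer"},
    {"customer_id": "C2", "name": "Bob", "segment": "occasional"},
]

# Hypothetical Internet browsing events collected on the company website.
browsing_events = [
    {"customer_id": "C1", "page": "/flights/paris-rome"},
    {"customer_id": "C1", "page": "/hotels/rome"},
    {"customer_id": "C3", "page": "/car-rental"},
]


def cross_reference(crm, events):
    """Attach to each known customer the list of pages they visited."""
    pages_by_customer = defaultdict(list)
    for event in events:
        pages_by_customer[event["customer_id"]].append(event["page"])
    # Keep only customers present in the CRM: a controlled, left-join style use.
    return [
        {**record, "pages": pages_by_customer.get(record["customer_id"], [])}
        for record in crm
    ]


for profile in cross_reference(crm_records, browsing_events):
    print(profile["name"], profile["segment"], profile["pages"])
```

Restricting the result to customers already known in the CRM is a deliberate choice here: browsing events from unknown visitors are ignored rather than turned into new profiles.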
CHAPTER 04
Architectures
and algorithms
[23] IaaS: Infrastructure as a Service, i.e. the ability to purchase remote infrastructure and to use it on demand, as is done for software (Software as a Service).
In the old days, if scaling was required but you did not have the necessary budget, you had to revise targets downwards.

Now, Cloud computing has made Big Data possible. With this easy access to scalability, virtual machines can be installed on systems developed and designed for parallel processing.

Big Data technology immersed in the Cloud

Facebook's Cassandra, for instance, is a standard database especially designed to be integrated into clusters managed in the Cloud, proving once again that the Cloud is indeed the catalyst for Big Data in terms of technology. If required, Cassandra can populate additional machines, and data population can be performed by sending a simple SMS to a system machine that will trigger a server administration action.

[...] the same algorithm and are not really compatible with the Map/Reduce [25] method. Map/Reduce can certainly be used to create a scoring model, but updating behavioral categorization, for instance, would require that data be re-centralized throughout the calculation.

This somewhat restricts Hadoop's capabilities as regards processing, in particular marketing-oriented processing, but does not preclude it. Indeed, all heavy processing can be executed in batch mode in SQL bases that can be used to back up Hadoop's more real-time oriented bases.

It should be understood that various base structures will have to coexist since there is currently no storage model that can single-handedly provide the solution to Big Data.
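
The Map/Reduce method referred to in this chapter can be illustrated with a small, single-machine sketch. It is not Hadoop code: the purchase log, the chunking and the function names `map_chunk`, `shuffle` and `reduce_values` are hypothetical and only mimic the three steps of the method on in-memory data.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical purchase log, split into chunks as if stored on separate machines.
chunks = [
    [("alice", 30.0), ("bob", 12.5)],
    [("alice", 20.0), ("carol", 7.0)],
    [("bob", 2.5)],
]


def map_chunk(chunk):
    """Map step: emit (key, value) pairs for one chunk, independently of the others."""
    return [(customer, amount) for customer, amount in chunk]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_values(values):
    """Reduce step: combine the values of one key into a single result."""
    return reduce(lambda total, value: total + value, values, 0.0)


mapped = [map_chunk(chunk) for chunk in chunks]  # each chunk could be mapped in parallel
totals = {key: reduce_values(values) for key, values in shuffle(mapped).items()}
print(totals)
```

A per-customer total fits this pattern well because each key can be reduced without looking at the others. Calculations that need the whole data set at every iteration, such as updating a behavioral categorization, fit it less naturally, which is the limitation described above.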
[24] A software framework is a set of methodologies and tools associated with a programming language. See: http://fr.wikipedia.org/wiki/Framework
[25] For an explanation of Map/Reduce, see page 19 of this document.
[Figure: row-oriented table with PRODUCT and DESCRIPTION columns: 1: cabbage, green vegetable; 2: carrot, orange vegetable]

Figure 9. In contrast, in a columnar database, [...]

[Figure: Twitter keyspace. Statuses CF: key "1" with columns "text": "Nom nom nom", "user_id": "5"; key "2" with columns "text": "@evan Zzzzzz", "in_reply": "8", "user_id": "5". Status Audits CF. Status Relationships CF: key "5" with supercolumns "user_timeline": "2": "", "1": "" and "home_timeline": "8": ""]
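
One rough way to picture the difference between the two figures is with plain Python data structures. The sketch below is only a mental model built from the values shown in the figures; it does not use Cassandra's actual API, and the helper `columns_of` is hypothetical.

```python
# Row-oriented table: every row has the same fixed columns.
products = [
    {"id": 1, "product": "cabbage", "description": "green vegetable"},
    {"id": 2, "product": "carrot", "description": "orange vegetable"},
]

# Column-family layout, modelled here as nested dictionaries:
# keyspace -> column family -> row key -> {column name: value}.
twitter_keyspace = {
    "Statuses": {
        "1": {"text": "Nom nom nom", "user_id": "5"},
        "2": {"text": "@evan Zzzzzz", "in_reply": "8", "user_id": "5"},
    },
}


def columns_of(keyspace, column_family, row_key):
    """Return the column names actually stored for one row."""
    return sorted(keyspace[column_family][row_key])


print(columns_of(twitter_keyspace, "Statuses", "1"))
print(columns_of(twitter_keyspace, "Statuses", "2"))
```

In this model, each row of a column family carries only the columns it needs (the second status has an extra "in_reply" column), whereas every row of the product table shares the same fixed set of columns.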
[26] For more information visit: http://www.journaldunet.com
[27] http://blog.evanweaver.com/2009/07/06/up-and-running-with-cassandra
[28] Insert (SQL): http://fr.wikipedia.org/wiki/Insert_(SQL); join: http://fr.wikipedia.org/wiki/Jointure_(informatique)
[29] Formerly KXEN. See http://en.wikipedia.org/wiki/KXEN_Inc.
CHAPTER 05
The Big Data jobs
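[Figure: Venn diagram of three circles: hacking skills, maths and statistics knowledge, and substantive expertise. The pairwise overlaps are labelled machine learning, traditional research and danger zone; data science sits at the center.]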
Figure 11. Data science (according to Drew Conway [30]) is at the intersection of substantive expertise, hacking skills, and math and statistics knowledge. It is only by combining those areas of expertise that the pitfalls described by Conway will be avoided. It is definitely time to start training our experts.
[30] Drew Conway's Venn diagram on Data science: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
The new Big Data jobs

For the sake of simplicity, we will list 4 categories of jobs which, though sometimes not directly linked to Big Data, have become visible and appealing thanks to the data awareness created by Big Data:

- The CDO (Chief Data Officer)
- The data Steward: administers data
- The data Scientist: analyzes data using complex statistical and data mining tools
- The data Analyst: analyzes data in light of specific business needs

1. The CDO

This occupation is still quite rare, but the concept is gaining ground and is bound to spread. The CDO is a high-ranking company executive (C-Level) who takes part in the executive committee and has multiple roles:

[...]

Where no CDOs are appointed, the IT Department takes care of these activities.

A communication job is essential

The CDO is also tasked with setting up an operational organization and governance for the Governance Board; as this term can be somewhat daunting, it is usually replaced by Governance council.

Communication actually plays a key role in these new jobs. Indeed, all the people holding those positions (whether it be the CDO or the others) will need to communicate a lot to justify any action taken based on data considerations.

There will be an inevitable evolution of the landscape in the next five years.
As a specialist in their field, the data Scientist is someone who has advanced training and proven professio[...]

Top universities (HEC, ENSAE, Essec) are now providing training courses on data-related subjects. The fact that HEC has incorporated a data dimension into its MBA program is in itself a strong signal, since this course is attended by people who are well on the way to occupying key positions in their organizations.

Big Data cannot be considered outside this reality, even if some sources (Facebook, Google, Amazon, etc.) have a separate existence. There is intelligence in the links between these data, and this is what will transform, and not increase, the data's value.
MDM and Big Data are perfectly complementary. If Big Data is synonymous with proliferation and chaos (often creative), then MDM is what will bring order to it by structuring something that is not structured, and by categorizing and organizing the phenomenon.

Illustrating example

If the retailer has a sound knowledge of their products, provides relevant information irrespective of the channel used, and has up-to-date stock information (through BI and links, notably with ERPs), they can personalize the customer offering by linking these various levels of data.
CHAPTER 06
Big Data or Big Brother?
Passions usually run high when it comes to Big Data; it seems you either hail the phenomenon as the future of computing and a new economic order, or denounce it as the Orwellian drift of a scientific society that has spun out of control.

The dispute has not been limited to the court of public opinion since the European Union ruling against the giant Google in June 2014, which was highly symbolic of the brewing unease in a society where an extremely active minority of users have launched a democratic debate, one we absolutely do not intend to avoid, or even challenge, here. After all, without going as far as sweeping away everything in its path, the advent of Big Data did change the economic landscape and emphasize the moral and ethical duties of companies, for which social responsibility has moved from a choice to a clear imperative.

In a way, the Big Data debate is reminiscent of the Web debate which took place at the beginning of the 90s, at the onset of the phenomenon.

Heads: the hope for a dynamic sector that will boost the whole economy

During a recent work session on economic issues organized by French newspaper Journal du dimanche and cosponsored by Business & Decision, Philippe Oddo, from Oddo bank, declared by way of introduction: "Big data is about collecting and processing information to better anticipate future trends in all sectors, namely in the field of financial analysis." This preamble leaves little doubt that Big Data is not only the future of high tech, but also that of numerous more traditional sectors. Such a statement from a business professional is a very strong signal indeed.

Tails: the privacy debate

The subject is not only at the heart of the economy (for the Press, which has access to a huge volume of data, according to Denis Olivennes), but also at the heart of the debate on privacy. Within the context of the [...]
[31] JDD article published on 8 February 2014: http://www.lejdd.fr/Economie/Entreprises/Laurent-Alexandre-La-strategie-secrete-de-Google-apparait-652106
Espionage and Big Data have nothing in common

Last but not least is the conflation of Big Data, Big Brother and major social networks (not to mention the NSA [32]), often evident in the general public's mind and in the sometimes hasty criticisms leveled at Big Data.

The use that these players make of Big Data must not be confused with the goals of companies who only wish to use Big Data to optimize sales and performance without disrespecting the customer. This last category includes the vast majority of Big Data users and is the one which interests us here and in our daily work.

Moreover, we are faced today with a double paradox. On the one hand, users want relevant and personalized information but do not want to be watched. This is evidenced by the fact that, for some time now, consumers have been treating any message not to their liking as spam.

On the other, users want to protect their privacy but display their whole life on social networks. This even led Mark Zuckerberg to conclude, albeit a bit hastily, that there was no longer such a thing as privacy [33].

In the midst of this debate, advertisers cannot play the neutrality card. Companies must choose a visibly strong stance when it comes to ethics and, if possible, do so proactively. Above all else, a clear divide must be imposed between anonymous and statistical data use (even for personalized recommendations) and spying on people's private lives. At the end of the day, the difference between the two is not so much technical as it is human and ethical; likewise, any solution to the problem will have to be human and ethical [34].

There is no middle ground when it comes to Big Data: either the advertiser adopts an ethical approach to it or they intrude, spam and overstep their boundaries, with obvious implications for their reputation if this is publicly revealed and comes under public criticism.

But all things considered, and once the ethical issue is resolved beyond ambiguity by advertisers, we cannot dispute the fact that customer experience improvement depends on in-depth customer knowledge; so, ultimately, all businesses will turn to Big Data.
[33] See http://bit.ly/privateZuck for his speech on the subject at the end of 2009. This sentence sparked a lot of reaction on social media and elsewhere, compelling Facebook to prove time and again that they were respecting people's private lives; they never really managed to convince anyone of this.
[34] See Orange Group's code of ethics, clearly in line with this: http://oran.ge/SMYku4
CHAPTER 07
How to convert Big Data
into Big Busine$$
6. The technology needed to access Big Data is now available

Big Data is not something of the future; it is already accessible here, today, even if its landscape is evolving extremely rapidly. Many technologies used for Big Data have in fact been invented by the giants of the Web (Google and Yahoo are two pioneers) and are now available to anyone who can implement them.

7. Data is probably the least known and least understood raw material

The difference between an information system (all the processes and organizations between data, its creation, life, processing and archiving) and a computer system (the hardware and in particular the software that are used to run everything and process data) is a classic. Data is still today a big mystery to business leaders who think that computer systems magically and effortlessly transform business.

But data can be fickle and needs a lot of work. Its growing importance in a society where computerization has pervaded all sectors is compelling users to change their perception of data. There is however still a lot to do for real change to occur.

8. Managing a Big Data project is different

Big Data is neither a fashion trend, nor a buzzword for data mining. It has its own vocabulary, professions, methods, algorithms and specific project approaches.

New knowledge and skills will have to be acquired; and since Big Data is constantly reconfiguring itself, learning will be continuous.

9. You cannot escape Big Data, and all businesses will eventually embrace it, as is the case for the internet

All true innovations have a more or less similar adoption curve, very well described by Geoffrey Moore in Crossing the Chasm, his 90s best-seller.

Big Data is at an inflection point where adoption is starting to spread beyond the elite circle of Web giants and social media that invented it. Applying its techniques and approaches to companies in more traditional sectors is now possible. This journey is just the beginning.

10. Big Data is not just about real time

Even if Hadoop is a major innovation, Big Data is not just about Hadoop or real time. Some of its uses are indeed adapted to a large volume of data and may require, depending on the case, remote processing. It is a savvy combination of various approaches and techniques that will ensure Big Data project quality and results.

It is only by integrating these various points characterizing true Big Data that future companies will be able to convert Big Data into Big Business.
blog.businessdecision.com