
Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability
Sponsored by Endeca

Speakers: Boris Evelson, Vice President & Principal Analyst, Forrester, and Paul Sonderegger, Chief Strategist, Endeca
Moderated by Ron Powell
Ron Powell: Welcome everyone to our web event, Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability, sponsored by Endeca. Endeca is a leading provider of agile information management solutions that guide people to better decisions. I am Ron Powell, Associate Publisher and Editorial Director of the BeyeNETWORK, a part of TechTarget, and I will be your moderator for this web seminar.
Our presentation today features Boris Evelson and Paul Sonderegger. Boris is Vice President and Principal Analyst at Forrester. He is a leading expert in business intelligence, and he helps enterprises define BI strategies, governance and architectures and identify the vendors and technologies that help them put information to use in business processes. Prior to joining Forrester, he held senior positions at Citibank, JPMorgan and Pricewaterhouse. Also speaking today is Paul Sonderegger. He helps global organizations turn Big Data into better daily decisions and gain competitive advantage. Prior to joining Endeca, Paul was a principal analyst at Forrester Research focusing on search technology and user experience.
What is the value of Big Data to your organization, and what can you do about it? Those are the big questions facing companies today. In this webinar, Boris and Paul will answer those and other relevant questions and reveal how innovators have successfully tapped the data at their disposal to improve customer relationships, drive growth, achieve market leadership and unlock new revenue streams. We will also have time at the end of today's presentations for questions. Please feel free to submit your questions at any time during the event. If you would like a specific person to answer your question, don't forget to include that person's name when you submit it. And now here is Boris Evelson to begin today's presentation. Boris?

Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability
Boris Evelson, Vice President, Principal Analyst
October 27, 2011

Boris Evelson: Ron, thanks very much for the introduction, and Endeca, thank you very much for the opportunity to present together with you. Good morning and good afternoon, everyone; I know there are lots of people dialing in from all over the world, so thanks very much for taking the time to be here with us. It's a very exciting time in business intelligence, and what's interesting is that it has never been more exciting. I have personally been in the business intelligence, information management and data warehousing business for close to 30 years, and I don't think a year goes by without new applications of these technologies, new challenges and new technologies. It is never not an exciting time in business intelligence.

Firms use only 1% to 5% of available data . . . What if that number doubled?

Today we are standing on the cusp of something completely new and, once again, very exciting, because in some of our recent surveys and some of our recent anecdotal conversations with customers, we have uncovered that most firms out there are only using a single-digit percentage of their data, and I am not even talking about unstructured data. This is just traditional structured data that is buried all over the place. Firms have done a pretty good job extracting that data from their financial systems, maybe their HR systems, maybe their supply chain systems, and they are just beginning to scratch the surface of their sales and marketing systems for reporting and analytics. But if we look at all of the structured data stored across the enterprise, we are really just scratching the surface, and what we do know today is that even these solutions, solutions that only address a tiny percentage of the data you have, are complex and expensive, sometimes not flexible enough, and sometimes take very long to implement.

Innovators turn more data into more value

A hospital saves babies' lives using massive streams (100 million data points per day) of monitoring data.
A telco taps into Facebook social groups to market friends-and-family plans.
A credit card company retains customers by understanding social relationships.
A public utility company performs sophisticated analysis on smart grid data (1.5 trillion data points).
A pharma company is collaborating with a healthcare provider to identify patterns in a 360-degree view of drug costs/benefits.

So the question we are asking here is: what would happen if that number doubled? What would that do to the cost and the effort, but much more importantly, what would it do to the insights you are getting from all of that data? The possibilities are absolutely amazing. If you go back 20 or 30 years, basically all we did was report on and analyze financial data, and that may sound a bit boring, but it was pragmatic. It is absolutely night and day compared to what we are doing with data today, so just look at some of these examples.
We are literally saving lives: we can process so much data these days, and so quickly, that we can make life-saving decisions based on it. The whole power of social technology is now available for companies to understand who is friends with whom and therefore target their offers to these individuals in a much more focused way. Even if we are not using social technology, just by examining the gazillions of transactions out there in point-of-sale systems and clouds, we can understand some social relationships without even going to networks like Facebook, LinkedIn and Twitter. Public utility companies are no longer just service providers; they are analytics providers, because they can monitor on a sub-second basis what we are doing with our electricity, oil, gas and water usage, and they literally have to monitor trillions and trillions of data points every hour and every day. And then there are previously disconnected analyses: an insurance company looking at the profitability and cost of delivering healthcare, while a pharmaceutical company looks at the health benefits of a particular drug. What if they could put their information together and understand the whole end-to-end lifecycle of a drug, not just from the health-benefits point of view but from the financial-benefits point of view?

So the possibilities are just absolutely amazing and mind-boggling, and our research uncovered a lot more of these great examples. We can aggregate all of these new technologies and use cases for Big Data into five use cases. Number one is exploration and machine learning: there is obviously tons of data coming from devices such as medical devices and the smart grid utility devices I just described. There is operational prediction, where Big Data feeds operational predictive models so that those models can be optimized in real time. We are also no longer afraid to use the words "dirty data warehouse" or "dirty operational data store," because sometimes having all of the data in one logical place, even before you reconcile and cleanse it, still makes a lot of sense and still provides tons and tons of benefits. Some of the Big Data technologies allow us to do bulk loads of our data warehouses and operational data stores much faster. And last but not least, sometimes it's not the volume of data but the speed at which the data changes; the best examples are always the financial markets, where by the time you do anything in a traditional database it's already too late, so you need to react to market changes with sub-second response times so that you can execute that profitable trade when and where it has to be executed.

Despite the hype, most firms find big data technology useful to operate on data they already have

What types of data/records are you planning to analyze using big data technologies?

Most big data use cases hype its application to analysis of new raw data from social media, sensors, and web traffic, but we found that firms are being very practical, with early adopters using it for operating on enterprise data they already have.

Transactional data from enterprise applications: 72%
Sensor/machine/device data: 42%
Social media (Facebook, Twitter, etc.): 35%
Unstructured content from email, office documents, etc.: 35%
Clickstream: 27%
Locational/geospatial data: 27%
Image (large video/photographic) data: 13%
Scientific/genomic data: 12%
Other: 7%
Don't know: 5%

Base: 60 IT professionals (multiple responses accepted)
Source: June 2011 Global Big Data Online Survey

So, lots and lots of interesting use cases. But what was amazing to us in our discovery and research into Big Data is that, yes, firms are absolutely using all of the new data sources. They are leveraging sensor and machine data, social network data from Facebook and Twitter, and unstructured content from emails and other office documents. But what was also amazing to us, though after we thought about it it made a lot of sense, is that your basic, good old-fashioned, motherhood-and-apple-pie transactional data from ERP applications is still the king of what's being processed with Big Data technology. That just goes to prove the point that traditional approaches, traditional data warehousing, traditional business intelligence, traditional analytics, have been pushed to their limits in certain senses, and when the scalability requirements become extreme, not just in volume but in speed of processing and, let's say, dirtiness of the data, that's when people turn to these new Big Data technologies. A very, very interesting revelation.

So what I just mentioned is indeed something we see on a daily basis. The mission criticality of business intelligence and analytics continues to go through the roof, but on the other hand, the complexity, the rigidity (probably the better word to use here), the lack of flexibility and the expense of scaling these technologies beyond a certain point have been pulling this market apart in different directions. As you can see, there is a rift in the market, where traditional technologies are really reaching their limits. But we do have a way to close that gap, to address this opportunity and to scale, and the way to scale is first and foremost with lots and lots of best practices. That is not the subject of today's webinar, but I highly recommend that all of you reach out to us for all sorts of best practices, because the way you structure your organization and the way you organize and govern your business processes around this is infinitely more important than technology. But technology is very important too, and therefore these two converging arrows, best practices and next-generation technology, are what is absolutely critical to close this gap and address the opportunity.

Now, why are traditional business intelligence and traditional data warehousing meeting their limits? Number one, traditional BI is very complex. I am only showing you about a dozen components here, but in any real-life situation, for a large, heterogeneous, global, multi-product, multi-service-line enterprise, the process of getting from the left side of this picture, where you source raw data, to the right side, where you can actually start making decisions, involves 20, 30, sometimes even 40 components. These components sometimes come from different vendors, and even when they come from the same vendor, it is often technology that was recently acquired, so it is not really seamlessly integrated. People spend tons and tons of time just integrating these components instead of actually looking at the information and making decisions, so we need to change that.

The second reason is that, no matter what we have been saying for the last 20 to 30 years about aligning business and IT, it really hasn't worked that well, specifically for business intelligence and analytics. There are multiple reasons for that, but the main one is that the priorities, goals and objectives of a typical business person and a typical IT person are different, and this is not about right or wrong; it's just the nature of the business. The average business person cares about his or her business requirements; they need to be flexible and agile and react to what's going on in the business world today. We on the IT side try our best to support that, but we also have other priorities: we are tasked with standardizing technology and with all sorts of planning processes so that we don't run in a hundred different directions, and we need to minimize the operational risk of all these applications. So by the very nature of our roles, alignment is not there, and that has produced lots and lots of tension and lots and lots of challenges.

Capabilities necessary for limitless BI and DW

Adaptive data models
Exploration and analytics
Advanced data visualization

Source: March 31, 2011, "Trends 2011 And Beyond: Business Intelligence" Forrester report

Last but not least, traditional business intelligence and data warehousing technology has lots of limitations. One of the major ones is that the technology typically relies on some kind of fixed schema, whether you call it a schema or a data model. If you had a chance to glance at the previous slide, at the bottom there were two terms I used: one was "pre-discovery" and the other was "post-discovery." If you think about it, every traditional data warehouse, data mart or cube has a data model, and that data model means you can't really do a true what-if analysis, because the only what-if analysis you can do has to be based on everything that has been pre-discovered, pre-built and pre-modeled into that schema and that data model. So whatever you and your DBA and your business liaison and your business analysts talked about a year ago, when you built that data model, is really the only thing you can explore, analyze and predict from. If you have new types of decisions to make, guess what: you have to go back and change that model, and that is not an easy task.
We at Forrester are exploring and researching four categories of technologies that are absolutely critical to take BI into this new era of agility, flexibility and scalability. First, all sorts of technologies to make BI more automated: remember those 30 or 40 components; how do you either reduce the number of components or make them more integrated and more automated? Second, how do you make BI, data warehousing, predictive analytics and unstructured data analytics work in real time rather than in batch cycles, and make them more unified as opposed to using different technologies? Third, how do you make business intelligence and analytics more pervasive, because our research shows that in the majority of enterprises, still less than 10% of people are using enterprise-grade BI applications? And last but not least, my original point on the slide: what technologies do we need to make data models more adaptive to current business realities, not the business realities of a month, a quarter or a year ago when we built that data model, but able to respond to the business requirements happening around us today?

OLTP RDBMS are a poor fit for BI

In order to tune an OLTP RDBMS for BI, one has to:
1. Denormalize data models to optimize reporting and analysis.
2. Build indexes to optimize queries.
3. Build aggregate tables to optimize summary queries.
4. Build OLAP cubes to further optimize analytic queries.

Additionally, Forrester does not see a bright future for OLTP RDBMS to be able to handle:
1. Unstructured content.
2. Diverse data structures (unbalanced, ragged hierarchies, for example).

Just to drill into a point I made earlier: traditional, transaction-oriented relational databases are a poor fit for analytics and business intelligence. All of you who are DBAs or data architects on the phone know that you really have to jump through hoops to make them work for analytics. You go through a lengthy and complex exercise to denormalize your data models, or as we call it, to flatten them out, because we want to minimize the number of joins the database does when we run queries. You also spend a lot of time building indices, aggregate tables and OLAP cubes for optimization. We know how to do all of that; it just takes a long time, and when the requirements change, guess what, we have to go back to basics and rebuild all of it, and that is never an easy or fast task. And last but not least, no matter what we do, these relational databases are still a poor fit for unstructured content; it's almost like trying to fit a square peg into a round hole. The same goes for diverse data structures. Remember, relational databases were invented 30 years ago, primarily for financial data, where the attributes of any financial transaction are very simple: it's either a debit or a credit, it belongs to a certain chart of accounts and it has a timestamp, and that's it. That is very easy to describe in a relational structure. But in the modern world we are dealing with manufacturers, retailers, wholesalers and distributors that handle millions of products, and each product has a completely different set of attributes.
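To make the tuning steps above concrete, here is a minimal, hypothetical sketch (all table and column names are invented for illustration) of the kind of denormalization, indexing and pre-aggregation work Boris describes, using Python's built-in sqlite3:

```python
import sqlite3

# Hypothetical OLTP schema for illustration: a normalized "sales" table
# joined to "products", flattened into a reporting table plus an aggregate.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    sale_date TEXT, amount REAL);

-- Denormalize: pre-join sales and products so reports avoid runtime joins.
CREATE TABLE sales_flat AS
SELECT s.sale_id, s.sale_date, s.amount, p.category, p.name
FROM sales s JOIN products p ON p.product_id = s.product_id;

-- Index the column analysts filter on most.
CREATE INDEX idx_sales_flat_date ON sales_flat (sale_date);

-- Aggregate table: pre-computed daily totals per category for summary queries.
CREATE TABLE daily_category_totals AS
SELECT sale_date, category, SUM(amount) AS total_amount, COUNT(*) AS txn_count
FROM sales_flat GROUP BY sale_date, category;
""")
```

Every one of these derived objects has the schema baked in, which is exactly why a requirements change forces the rebuild Boris mentions.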

Forrester tracks four types of BI-specific DBMS

Source: May 27, 2011, "It's The Dawning Of The Age Of BI DBMS" Forrester report

Imagine you are Home Depot and you are selling kitchen appliances and garden supplies: the descriptions of those transactions and those products have completely different attributes. So you either have to deal with unbalanced and ragged hierarchies, or you build multiple data marts, one per data set, and again, as I said, that is just like trying to fit a square peg into a round hole. Luckily for us, the users of these applications, there is a new breed of technology out there. One category is BI-specific database management systems, and here we are presenting them along three dimensions: whether your data is very diverse or disparate, whether your requirements change very quickly, and whether you have very high scalability requirements. These are the three or four types of database technologies and architectures, versus traditional relational database technology, that are now at your disposal. And what's interesting, what we really need to underscore, is that this is not just about volume.

Big volume is the top concern, but velocity, diversity, cost, and new analytic requirements are also important

What are the main business requirements or inadequacies of earlier-generation BI/DW/ETL technologies, applications, and architecture that are causing you to consider or implement big data?

In traditional BI and DW applications, requirements come first and applications come later. In other words, requirements drive applications. Big data turns this model upside down, where free-form exploration using big data technology to prove a certain hypothesis or to find a pattern often results in specifications for a more traditional BI/DW application.

Data volume: 75%
Analysis-driven requirements (big data) versus requirements-driven analysis (traditional BI/DW): 58%
Data diversity, variety: 52%
Velocity of change and scope/requirements unpredictability: 38%
Cost: big data solutions being less expensive than traditional ETL/DW/BI solutions: 30%
Don't know: 3%
Other: 10%

Cost is also a factor in many cases, and dealing with data using big data technologies is simply cheaper and faster than other methods.

Base: 60 IT professionals (multiple responses accepted)
Source: June 2011 Global Big Data Online Survey

As you can see in the recent survey in front of you, extreme data volumes are the top priority for people using these Big Data or other alternative database technologies, but it's also data diversity and variety, the velocity of data change, and cost. And the very interesting one, the second one, is analysis-driven requirements versus requirements-driven analysis. If you think about it, in a traditional environment you have to gather the requirements first and then you build your data warehouse. In a Big Data environment it's almost a chicken-and-egg syndrome: in order to gather the requirements you need to understand what's out there, but in order to understand what's out there you need some type of model in mind, and you can't have a model until you explore the data. So you see, this is a circle that can really only be addressed by the new Big Data technology, and that's why we talk about analysis-based or exploration-based requirements definition.

With that in mind, we have updated the previous slide I showed you specifically for Big Data, and it is characterized by five Vs. You only see four Vs on this slide; I will introduce the fifth V on the next slide. It's obviously about volume, which you see on the horizontal axis, and it's about velocity. A very important point here: velocity is not just the speed at which the data changes, but also the speed at which the requirements change; that's the way to think about it. Those are your x and y axes, and as you can see, as the variety, diversity and variability of the data gets more complex, the space for traditional BI gets squeezed even more, and the opportunities for Big Data analytics become more and more important.

Now, the fifth V here is really value, or I should say time to value, because as you can see, in the traditional approach you need to do all sorts of integration and cleansing and matching of apples to oranges. In, say, 20% of cases and 20% of applications, that is absolutely a must: when we are looking at a financial application, two plus two has to equal four; there is no question about it. But in other applications, such as brand management or brand analysis, two plus two doesn't have to equal exactly four. We just need a high-level understanding of what people are saying about us out there, and we don't want to spend tons of time identifying primary and foreign keys or doing data de-duplication, because accurate enough is good enough for brand management analysis. So you can get to value much more quickly, and that is the fifth V in Big Data. And as I mentioned, the arrow that goes down from Big Data to traditional BI is the point I made earlier: sometimes the results of your Big Data exploration get manifested as requirements for a traditional business intelligence application.

BI DBMS and big data address different use cases

Source: May 27, 2011, "It's The Dawning Of The Age Of BI DBMS" Forrester report

There is unfortunately some hype out there about what these new technologies do, so here is one way to look at them. Before you plunge into evaluating different technologies, make sure you are comparing apples to apples and not apples to oranges, because depending on your data volume, on whether you want to reuse your existing traditional BI infrastructure, on whether your data is relational or non-relational, and on how much unstructured content you have out there, all of these different types of technologies have their strengths and weaknesses.

No one has the answers yet. Although some companies attempt to fit big data into standard SDLC and PMO methodologies, Forrester believes that big data requires different approaches

Do you run your big data initiatives using the same or different PMO standards versus BI/DW/ETL?
Same: 38%
Different: 27%
Don't know: 28%
Other: 7%

Do you run your big data initiatives using the same or different SDLC standards versus BI/DW/ETL?
Same: 25%
Different: 37%
Don't know: 35%
Other: 3%

Base: 60 IT professionals
Source: June 2011 Global Big Data Online Survey

So here is one simple way for you to separate fact from fiction. Probably a much more important point is that while vendors like Endeca have made great progress in helping you with these new types of solutions, no one out there really has all the answers about how to organize for this. We understand quite a bit about the technology, but what kind of organizational structure do we create around it? What kind of methodologies do we use? Are the methodologies we have created and tested over the last, say, ten years for developing business intelligence, data warehousing and ETL processes, both software development lifecycle methodologies and project management methodologies, one and the same for Big Data?

No one has the answers yet. Most firms intend to retain their big data for both reprocessing and compliance reasons

Do you intend to retain your raw big data after the exploration/analysis stage?
Yes, for reprocessing, more analysis: 33%
Don't know: 25%
Yes, for compliance and reprocessing: 23%
Yes, for compliance: 12%
No: 5%
Other: 2%

Base: 60 IT professionals
Source: June 2011 Global Big Data Online Survey

Well, I don't have the answer. What I am trying to show you is that folks are all over the place: some use the same methodologies, some use different ones, and a significant number still don't know how to do this. Some other interesting questions always come up. If you are using a non-database type of Big Data technology, let's say Hadoop-style file systems and processing, what happens when you want to reprocess your query? Hadoop has no persistence of results, so if you run your analysis and you think you have discovered something amazing, and you just want to verify it before you plunge significant investment into this new project, you want to rerun that analysis. But guess what: when you rerun it even a minute later, the results have changed, because the data has changed. How do you persist Big Data? How do you store it? How do you handle security, compliance and disaster recovery, all the things we are used to in a traditional data warehouse and business intelligence environment? How do you do this in the Big Data environment? I guess we are going to find out in the next couple of years as firms try and fail or try and succeed, but today no one really has the answers yet.
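One pragmatic way to get the repeatability Boris describes is to snapshot the raw input before each analysis run, so that a rerun reads the frozen copy rather than the live, changing data. Here is a minimal sketch; the paths and layout are hypothetical, and a real cluster would snapshot inside the distributed file system rather than on a local disk:

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical local paths for illustration only.
LIVE_DATA = Path("data/live")
SNAPSHOTS = Path("data/snapshots")

def snapshot_inputs() -> Path:
    """Freeze the current raw inputs under a timestamped directory."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = SNAPSHOTS / stamp
    shutil.copytree(LIVE_DATA, target)  # immutable copy for this run
    return target

def run_analysis(input_dir: Path) -> None:
    # Placeholder for the actual job; the point is it reads only input_dir.
    for path in sorted(input_dir.rglob("*")):
        pass

frozen = snapshot_inputs()
run_analysis(frozen)  # first run
run_analysis(frozen)  # a rerun a minute later sees identical data
```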

Big data is a big deal . . . it's not business as usual

Firms are reconsidering the single version of the truth idea.
Big data will redefine security, privacy, and compliance rules.
IT will not control big data technology environments.
One-size-fits-all technology standards will not be flexible enough.

So Big Data is indeed a big deal, and I hope I have shown you that along multiple dimensions it is indeed pushing the limits of traditional technologies, methodologies and approaches. But most importantly, it is really not business as usual. To borrow from a famous story, you are not in Kansas anymore: you look around, and the good old truths we all thought unshakable no longer hold. For example, a single version of the truth is no longer an absolute; it is now relative and context-driven. As I mentioned, to a CFO two plus two absolutely equals four, even if it takes a week-long month-end closing process that costs a million dollars every month to calculate that number four; there is no option, we have to come up with that number four. But to a marketing person who just woke up in the morning, sees a new competitive threat from a major competitor, needs to immediately roll out a new campaign to address that threat, and needs to immediately do a new customer segmentation for laser focus in that campaign, two plus two equals 3.9 or 4.1 is a perfectly acceptable answer. Even if he gets it only 80% right but sends out that campaign that morning, that is what counts; a campaign based on 100% accurate data but sent out even a day later is really meaningless. So at that point a single version of the truth becomes meaningless.
We already talked about the challenges Big Data presents when we define security and compliance. You can't really explore something without some kind of model; you need to understand some structures. But in order to provide structures you need to put security on them, because once there is a structure, people who are not authorized to access certain structures shouldn't be able to access them. Yet in order to create those structures and those authorizations, you need to understand what they are, and in order to understand what they are, you need to explore. Once again you are in that vicious circle of what comes first, the chicken or the egg: how can you secure something if you don't understand what's out there, when in order to understand what's out there you need to explore it? So chief compliance officers, chief risk officers, and all sorts of regulatory bodies and enterprises have a huge task in front of them.
IT is absolutely no longer in charge. IT should be in charge of all the traditional tasks, such as data preparation, data cleansing, disaster recovery and more stable environments like data warehouses, but in the Big Data world, where everything changes on a dime and requirements change on a dime, traditional IT approaches will not work. IT has to change its mentality, it has to let the reins out a little bit, and it has to understand and embrace that business users do need to run Big Data, this new world of agile information exploration, more on their own. We on the IT side need to either embrace that and become true partners, or we are going to be outdated very quickly. And last but not least, there is a realization that one-size-fits-all technology doesn't really work today. The extremes we are talking about absolutely call for specific tools for specific use cases.
When you are doing transaction processing on billions of rows of data, you absolutely need database technology that is optimized for transaction processing. If you need to run a fixed report on billions of rows, you need another type of database, optimized just for that. And when you need to do exploration on not just billions but trillions of data points, and it is not 100%-accurate analysis, it is indeed exploration, because you need to understand what's out there before you can even start to understand what you are going to analyze, you need a third type of technology, optimized just for that. I don't think that in the near future we will have one platform that lets us do all of that, so specific tools for specific tasks is going to be the name of the game in the near future.

Recommendations
1. Identify opportunities, and have a "what if we could" conversation with your business.
2. Clearly understand why traditional BI/DW can't solve a problem.
3. Start simple, small, and scalable.
4. Develop a set of governance policies.
5. Develop a business case with tangible ROI.

How do you get started with Big Data? You obviously need to start with very specific use cases; you absolutely don't want to go there just because everyone is doing it. So take a look at the requirements, and take a look at what you think you could do if all of the limitations of your traditional environments weren't there. I am sure you have had lots of discussions with your business counterparts over the last couple of years where your answer was "Well, we can't do it because it's too complex" or "We can't do it because it's too expensive." Go back to those use cases and see whether you can indeed do them now with the new technology. But, number two, absolutely understand why you are going there: if it's really just about volume, then maybe adding more RAM, more CPUs, more sockets and so on to your existing data warehouse is a much simpler solution to your problem.
Where we really see the fit is at the intersection of two or more of those Vs we talked about: when it's not just volume but velocity, when it's not just volume but variety of data, when it's not just velocity of data but velocity of change in the data. That's when you should ask yourself, "Is my traditional data warehouse the right platform for this?" Just like with anything else, a big-bang approach never works; you need to start simple and small. You need to identify the low-hanging fruit, because just like any traditional BI initiative, this can very easily lose momentum, enthusiasm and the support of your business stakeholders. So you need to deliver value very quickly, and the best way to do that is, as I said, start small but think big, and deliver something literally within weeks, hopefully even within days. Your stakeholders should see the value, and that should drive their decision to invest in more scalable, more integrated, enterprise-grade technology. You obviously need some kind of governance policies. Unfortunately, as I said earlier, at this point we can't advise you exactly what those policies should be, but you should put some constraints around this, because if you take just anybody and let them loose on the entire universe of your data, the operational risk complications are so huge that I can't even begin to talk about them.

Almost 50% of firms surveyed are undertaking big data projects with no business case or intangible benefits only; business performance is the most common success goal

(Slide charts: "Do you have a business case for the big data initiative in place?" and "How do you plan to measure the success of the big data initiative?")

Base: 60 IT professionals (multiple responses accepted)
Source: June 2011 Global Big Data Online Survey

So please do put some constraints around it, and please share what you find out with us, because we at Forrester definitely want to start building a knowledge base of what works out there and what doesn't. And no business person will ever approve the use of any technology without a rock-solid business case, so you absolutely need to build one. Unfortunately, lots of your peers and competitors still aren't doing a good job of supporting their Big Data initiatives with a tangible business case, and lots of people don't even know how to measure success.

Recommendations: Use Forrester's Business Intelligence DBMS Effort Estimation Model

Enter project/application parameters. Adjust Forrester estimates for % savings for all of the substeps under data preparation, data modeling, and data usage.

(Slide table: compares the initial effort of row-RDBMS-based BI, $750,000, and the ongoing yearly effort, $1,658,100, against columnar RDBMS, in-memory index, inverted index, and associative alternatives, breaking the estimated percentage savings and dollar figures down by data preparation, data modeling, and data usage.)

So you absolutely have to have that: what are the success factors, what is the actual value, and put some hard numbers around it. We do have a little bit of advice and a little bit of help. We have created this effort estimation model, where you take all of the steps you use in a traditional data warehouse, data mart or business intelligence environment, all the way from data sourcing to implementation, all of the steps you usually execute, and put a value on each step. We then compared the final results with implementing the same type of initiative using these new types of agile and Big Data technologies, and as you can see, we are definitely projecting significant savings. I don't recommend treating these as hard numbers; instead, come to us, get a copy of the model, plug in your own numbers, and hopefully you will see similar results and it will be a good input into your business case.
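The arithmetic behind such a model is simple to reproduce yourself. Here is a minimal, hypothetical sketch of plugging your own baseline costs and assumed savings into a comparison; the step names, dollar amounts and percentages below are illustrative placeholders, not Forrester's actual estimates:

```python
# Hypothetical effort-savings calculator; all figures are placeholders.
baseline_effort = {          # yearly effort with a row-RDBMS-based stack, in $
    "data preparation": 500_000,
    "data modeling": 150_000,
    "data usage": 100_000,
}

assumed_savings = {          # fraction saved per substep for one alternative
    "data preparation": 0.30,
    "data modeling": 0.50,
    "data usage": 0.25,
}

total_baseline = sum(baseline_effort.values())
total_after = sum(cost * (1 - assumed_savings[step])
                  for step, cost in baseline_effort.items())

print(f"baseline: ${total_baseline:,}")
print(f"with alternative: ${total_after:,.0f}")
print(f"projected savings: {1 - total_after / total_baseline:.0%}")
```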

Thank you
Boris Evelson
+1 617.613.6297
bevelson@forrester.com
http://blogs.forrester.com/boris_evelson
Twitter: @bevelson
www.forrester.com

So with that in mind, hopefully we have whetted your appetite about Big Data and what it can do for you, and alerted you to some potential gotchas and what you should be looking out for. Here is the information for finding me and my Forrester colleagues when you have more questions, but at this point let me turn this back to Endeca. Paul, take it away.
Paul Sonderegger: Boris, thank you very much. That was a fantastic analysis of the Big Data landscape. What I want to cover here is actually just a piece of that landscape: a couple of examples of Big Data in daily decisions and the kind of value that is created in doing that. At Endeca, the main thing we do is turn Big Data into better daily decisions, and we have actually been doing this for a long time, well before this practice had a name, well before Big Data was a buzzword. We actually worked on what is arguably the original Big Data problem: online exploration in e-commerce.
If you think back to the early 2000s and what was going on with e-commerce, you had this problem where there were great amounts of diverse data and content: product data out of the catalogue, content on products from the content management system, then reviews started coming in, and of course there is offline transactional data from enterprise systems that ought to be used to improve the user experience. So we were bringing together all of that diverse data and content and making it available to consumers so they could make better buying decisions. We were making this diversity of data, this big variety, available to people with no training in technology, people who had a big variety of questions. Every consumer thought about, say, Home Depot's product inventory slightly differently, and completely differently from the way Home Depot thought about it. So we were solving this Big Data problem in daily decisions in e-commerce a long time ago, and what we have found is that some of the innovations we created in the user experience, as well as in data integration, apply directly to solving Big Data in daily decisions at work. We have two products: Endeca InFront, a customer experience management platform, and Endeca Latitude, which is for agile BI, and it is Endeca Latitude we are going to talk about here.

Both of these products are based on the MDEX Engine, and we will talk in a little bit about what makes the MDEX Engine different, but first I want to get at why this matters. To do that, I want to talk about three examples of our customers who have solved these Big Data problems at work, and as we go through these examples, the thing I would like you to think about is that this is really BI beyond the warehouse. As Boris described, the BI industry is very, very sophisticated and has worked out enormous difficulties around merging and analyzing large volumes of data, but there is a key requirement that comes with traditional relational BI technologies: you build the model first and then fill it with conforming data. That approach is in conflict with the big variety of Big Data, not necessarily with the big volume, but certainly with the big variety and the big velocity of change of Big Data. So what we are going to talk about here is BI beyond the warehouse, where you have a great variety of information, it is constantly changing, and it needs to serve people who have no training in technology but who have questions that matter to them.
The first of these Big Data problems at work that I want to mention is some work we did with Toyota. As many of you may know, in 2010 Toyota had a very, very big product recall, actually the biggest in their history. The main problem Toyota faced was that here is a company whose brand is based on the outstanding work they have done in creating high-quality products for decades, and now that brand identity, the very soul of the company, was under attack by these claims of unintended acceleration in Toyotas. The challenge they had was to somehow figure out how to isolate the root causes behind those claims in order to restore the brand, but to do that they had to pull together a great variety of data: more than 12 different systems, each with its own schema and data model, and in some cases long-form text with no data model at all. This included extracts out of the vehicle warehouse; structured data out of enterprise applications, including an Oracle system that was a quality touch point capturing information about processes on the manufacturing floor; and data from the National Highway Traffic Safety Administration (NHTSA). These were claims, and in those claims there were some fielded, structured data, but also long-form text with the descriptions of the problems themselves, and all of this needed to be brought together somehow, and very, very quickly.
In addition to the difficulty of bringing together that diverse data, there was huge variety in the questions that had to be asked: which parts are we talking about, which parts are mentioned in those claims, which suppliers provided those parts to which factories, and they are in which vehicles? No one knew ahead of time that any of these questions would have to be answered, and no one of course knew ahead of time that these questions would matter so much. When the CIO of Toyota North American Motor Sales tells the story, the way he tells it is that in Endeca Latitude he found a technology he believed would make a difference. At a time like that, of great anxiety and great uncertainty, it is generally not a good time to introduce new technology into an organization, but he felt he had found a technology that would make a difference, because it dramatically reduced the cost, time and effort of integrating diverse data and content, and it dramatically reduced the time, cost and effort to the users, in this case design engineers, of making sense of that data so they could get fast answers to completely new questions. That Latitude app was instrumental in helping Toyota establish that there was in fact nothing demonstrably wrong with those pedals or with the electronics that control them.
The second company that has solved a Big Data problem at work is Land O'Lakes, and Land O'Lakes had a similarly large problem: they wanted to figure out how to analyze seed performance, corn especially, to feed a growing world. They know they need to help farmers increase the yield of their planted acres in order to feed a growing population, which is going to hit seven billion this October. In this case, the great variety of data they had to bring together included data from their transaction warehouses as well as information from their marketing programs, which indicated which products they had marketed to which farmers in the past. It also included information from outside the company, such as government acreage reports, which indicate how much land is actually under cultivation. And there were other sources as well, from Land O'Lakes' own test plots, plots of land at various locations in the U.S. where they test these seed variants so they can demonstrate to planters the yield you get with this kind of seed at this kind of latitude, because the users of this application were salespeople and distributors trying to persuade farmers to change the seed they use.
These farmers cannot risk a poor growing season, so the data has to tell a very clear story, and it has to do it persuasively. But of course there is a huge variety of questions: which seeds, which soils, which farms, which locations. So again you get this issue of the double uncertainty Boris described: you can't know what kinds of questions will be asked of the data, so you don't know exactly what data you will need, but you are also not sure what data you should pull together in order to answer the questions that may come up. The way to resolve that double uncertainty is to explore this information, but again, it has to be cost-effective, and that exploration has to be easy to use.
The last example I will mention here is the British Transport Police, and the problem they are trying to solve is how to identify threats faster to keep more citizens safe. They are using the technology as part of securing London for the 2012 Olympic Games, and this is a slightly different example, because the scope of the data being brought together is not as large as in the other cases, but the magnitude of the problem is immense. Here we are talking about data out of the command warehouse, stop-and-search reports from local policing forces, as well as local service gazettes that have information about particular events in neighborhoods and things like that, so that officers can get faster answers to new questions about what's going on at the street level: which people, which places, which events, which threats. Again you get immense variety in the data, which comes from different sources with all different schemas, but also big variety in the questions, and this is what characterizes Big Data problems at work: it is BI beyond the warehouse, data that does not conform to a nice, neat model, used to answer questions that cannot be fully anticipated.
So how do you implement a solution like that? Here is just a little window into it. We take an approach we call agile delivery, and it really means an iterative approach to provisioning these Latitude applications. What we mean by that is that you don't have to have all the requirements up front. Remember the problem Boris described: in a Big Data world you have double uncertainty. You can't be sure exactly which questions you are going to ask until you know which data is available to you, and you can't know exactly which data you want to make available, and how you want to link it together, until you have an understanding of what kinds of questions will be asked of it. It's the chicken-and-egg problem, and the way to resolve it is to load the data as is, start to explore it, and then add new data sources as the business realizes: "Oh, now that you show me these two sources together, I just remembered there is a database under George's desk; he has been here for 20 years, and he has been collecting notes on what's been going on in the field. Can you add that in?" And the answer needs to be "Yes," and it needs to happen really quickly. In addition, the business will say, "Ah, now that you show me this, I want to see the data in a different way; can you show it to me on a map?" And the answer needs to be "Yes," and that new visualization needs to appear very quickly.
So this is the approach we take: iterative turns, very, very rapid, adding sources and changing the visualizations, with the business and the BI team sitting together. They work hand in hand because they are in a situation where neither can specify all the requirements that matter ahead of time. Here is the effect in terms of delivery time and full-time employees (FTEs) to deploy these projects. One of the things Toyota said about their project is that they expected that with traditional relational BI technologies, the Latitude app we built for them would have taken 55 weeks, more than a year, and they said they didn't have that kind of time; with Endeca Latitude it actually took 12 weeks. And if you notice, one of the big differences between the two bars is the way the work changes, and the big change is that with the MDEX Engine technology, you don't spend time building a predetermined model, but rather pour the data in and then expose it, so that the quality engineers and the BI team together can get a look at it. Land O'Lakes estimated that their project would have taken 15 full-time equivalents with relational technologies, but with Endeca Latitude it required only seven.
I mentioned that we would talk about the technology itself and what makes this possible, so I want to touch on that for a moment. The key thing, the heart of the Latitude platform, is the MDEX Engine, and it is a post-relational innovation. It is a hybrid search-analytical database: it borrows some of the best ideas from the world of search and some of the best ideas from the world of analytical databases, and unifies them in a new architecture. Inside the MDEX Engine there is no predetermined model for the data; instead it has a dynamic schema. The engine derives its model of the data from the incoming records and documents themselves, and on top of that it provides integrated navigation, analytics and search on that highly diverse and changing data. Of course we are using the latest ideas in analytical databases to deliver really fast performance and interactive response times, because we learned in serving consumers that that is what is required to support speed-of-thought analysis. So this is in-memory performance, but not memory-bound, because we are often working with data sets that exceed the limits of memory, and this engine is wrapped in a scalable, reliable and secure platform.
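To illustrate the general idea of a dynamic schema (this is a generic sketch of schema-on-read faceting, not Endeca's actual MDEX implementation; the records are invented), here is how an engine might derive its model of the data from the incoming records themselves:

```python
from collections import defaultdict

# Generic schema-on-read sketch: heterogeneous records arrive with different
# attributes, and the "schema" (the set of facets and their observed values)
# is derived from the data rather than declared up front.
records = [
    {"name": "cordless drill", "dept": "tools", "voltage": "18V"},
    {"name": "rose fertilizer", "dept": "garden", "weight_lb": 4},
    {"name": "dishwasher", "dept": "appliances", "voltage": "120V"},
]

facets = defaultdict(set)          # attribute -> observed values
for rec in records:
    for attr, value in rec.items():
        facets[attr].add(value)    # the model grows as new attributes appear

# Navigation-style query: which records have a 'voltage' attribute at all?
powered = [r for r in records if "voltage" in r]
print(sorted(facets))              # ['dept', 'name', 'voltage', 'weight_lb']
print([r["name"] for r in powered])
```

Adding a new source, like the database under George's desk, just means feeding in more records; no model has to be rebuilt first.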
The place I want to stop is actually where we began, which is what's going on in the industry at large. Big Data is a huge opportunity, whether you work with Endeca to try to capture it or whether you work with someone else, and the reason is that with the advent of Big Data, the real world increasingly reflects the connectivity of the digital one, and the digital world increasingly reflects the diversity of the real one. In a situation like that, relational technologies, which require that you build a model first and then fill it with conforming data, are at odds with this new world, and there is a whole generation of technologies coming out now that take a post-relational approach. That is the key to reducing the time, cost and effort of integrating diverse and changing data, and to reducing the time, cost and effort to the user of making sense of it, to make better daily decisions. And with that I will hand it back to Ron. Ron?
Ron Powell: Excellent, Boris and Paul, excellent presentations. We will now move on to Q&A, and again, if you have a question, please submit it and put Boris's name or Paul's name in front if you want to direct it to one of them. So we will start with the first question. Boris, it seems that Big Data is a natural fit for a cloud-based data marketplace, perhaps increasing the data's value when third parties have an interest. Do you think this is in the cards, and if so, can you comment on how you see this evolving?
Boris Evelson: Well, that's interesting. Once again, one of the hypes out there that we uncovered in our research is that Big Data is mostly about open source technology and mostly about cloud technologies, but the majority of our respondents said, "No, we are running our Big Data initiatives in-house, and we are running them on proprietary technology." Again, I am not advocating one versus the other; I just want to make sure the listeners understand that it's not really about open source versus proprietary, and it's not about cloud versus on-premises. It is all about what your business use case is and what kinds of questions you are trying to ask, and then, and only then, what kind of technology is the best fit.
So indeed, if you are in a situation where most of the data you are going to be analyzing already resides in the cloud, such as social data for example, there is probably little reason to bring it in-house for processing, unless you are really trying to build a 360-degree view of a customer, where all of the financial, HR, supply chain and your own sales and marketing data is inside your firewall; then all of this social data is just one small component. So again, cloud is an absolutely great technology for the right use case, and I just don't want the listeners on the phone to think of cloud as a panacea. It's all about what you want to do and how you are going to do it, and only at the end of that line of questioning should you ask where you are going to do it.
Ron Powell: Great, great. Paul, this question is for you: how do you tackle data that exceeds memory bounds? You mentioned something about being memory-bound but stopped short; could you please elaborate and clarify?
Paul Sonderegger: Sure. We have intellectual property that exploits the full memory hierarchy, all the way from right next to the processor, through DRAM, down to disk, and so we are constantly optimizing what is kept in memory: recent queries, recent intermediate results, recent full sets of results. By doing that we are able to deliver interactive response times in search, navigation and analytics on data that is so large it can't be fully held in memory; some of it has to be held on disk as well. This is something we ran into in the e-commerce world, because search indexes, for example, were simply too large to hold in memory. We actually ran into this problem years ago, and we have been developing and refining the intellectual property we use to fully exploit the whole memory hierarchy, in addition to exploiting the 64-bit addressable space that comes with these new commodity supercomputers.
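As a rough illustration of the kind of tiering Paul describes (a generic sketch, not Endeca's actual caching scheme), keeping recent query results in memory while falling back to a slower disk tier might look like this:

```python
from collections import OrderedDict

class TieredResultCache:
    """Generic two-tier sketch: an in-memory LRU cache of recent query
    results in front of a slower, disk-backed store. Not Endeca's design."""

    def __init__(self, compute_fn, capacity=1024):
        self.compute_fn = compute_fn       # slow path: disk scan / index probe
        self.capacity = capacity
        self.hot = OrderedDict()           # query -> result, in LRU order

    def query(self, q):
        if q in self.hot:                  # hot path: answer from memory
            self.hot.move_to_end(q)
            return self.hot[q]
        result = self.compute_fn(q)        # cold path: hit the disk tier
        self.hot[q] = result
        if len(self.hot) > self.capacity:  # evict least-recently-used entry
            self.hot.popitem(last=False)
        return result

# Usage: wrap any expensive lookup; repeated queries return from memory.
cache = TieredResultCache(compute_fn=lambda q: f"results for {q!r}")
cache.query("voltage:18V")   # computed on the slow path
cache.query("voltage:18V")   # served from the in-memory tier
```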
Ron Powell: Great. Another question for you, Paul: would you say that Endeca is an index-based implementation of the federated, distributed model?

Paul Sonderegger: Well, there are a number of terms in there that would require definition, but let me put it this way. A good way to think of the MDEX Engine is that it acts as a very high-speed cache on top of the source systems underneath. The MDEX Engine does not become a transactional system of record at any time; rather, it updates at the same rate as the underlying transactional systems. We are indexing the data from those systems, and we are indexing the structure from those underlying systems, but that representation in the engine does not become a transactional system of record.
Ron Powell: Okay, now this question: how does Big Data management deflect untoward intervention on data sets? In other words, when is relational data decoupled or shielded to prevent infiltration of skewed data sets? I don't know which of you would like to handle that. Paul?

Paul Sonderegger: Well, that's a question for Boris.

Ron Powell: Boris, go right ahead.
Boris Evelson: Yeah, that's a loaded question, obviously. The person who asked it should definitely reach out to me via email; I would need to understand more of the details behind it. I think the question is about how you merge traditional relational data sets with Big Data-type, unstructured, un-modeled data sets, and I don't think there are easy answers there. I don't think there is any direct way to do it other than this: once you have gone through the exercise of exploration, and as an output of that exercise you have found some relational-like structures in your Big Data environment, you then turn them into relational structures, and you either federate them with your traditional data warehouse or physically load them into the data warehouse. I know that is not an exact answer to a complex question, but it is the best I can do in such a short time.
Paul Sonderegger: Yeah, let me just add one thing to that. One of our customers, a CIO, told us that his idea of how Latitude complements his existing BI infrastructure is very straightforward: if it's in the warehouse, they already have the tools, in this case OBIEE, to query that data; if it's outside the warehouse, or if it's going to be used by people with no expertise in BI technology, it's Latitude.

Ron Powell: Great, great. Well, we are just about out of time, so I want to thank Boris and Paul for such great presentations. I would also like to thank everyone for attending the web seminar today. It will be available on Bitpipe at TechTarget, and we will follow up with the rest of the questions that were not answered, but if you do have a question, please email either Boris or Paul directly. Thank you.
