
STRATEGIC GUIDE TO

Big Data Analytics


Five Analytics Trends to Exploit

Big Data 101: What You Should Know

Key Questions to Get You Started

Don’t Forget About Data Security

The Top 5 Big Data Challenges

Coping with the Talent Shortage

FROM THE EDITORS OF CIO


A STRATEGIC GUIDE FROM THE EDITORS OF CIO

Table of Contents
Adapted from articles published at CIO.com

Five Business Analytics Trends — And How to Exploit Them [page 3]
With Big Data, Semantics Really Matter [page 7]
Big Data: How to Get Started [page 10]
Don’t Forget About Securing Big Data [page 11]
Big Data 101: What CIOs Should Know [page 13]
Coping With the Big Shortage of Big Data Talent [page 16]

EDITOR’S NOTE
Competing on Analytics

IT leaders have been on a quest to obtain competitive advantage through technology for 25 years. And that quest is about to enter a new chapter: the era of big data analytics.

One survey found that 70% of respondents can envision a “killer application” for big data that would be “very useful” or “spectacular” for their business. The catch: Most chose not to disclose what that application would be because it would provide a competitive advantage.

To get started, you’ll first need to ask blue-sky questions of your business execs, such as “if only we knew…” or “if we could predict…”, so you’ll know what information or answers would qualify as “spectacular.”

Second, you’ll have to figure out what internal and external data could help. Third, you’ll need to find “data scientists” who can help you make sense of that data, which won’t be easy.

This report is intended to help guide you along the way.

Mitch Betts

Editorial
Editor: Mitch Betts, mbetts@cio.com
Contributors: David F. Carr, Joab Jackson, Thor Olavsrud, Bob Violino
Editorial Management: Brian Carlson, Maryfran Johnson, Dan Muse
Image Credits: AllyB208/Fotolia [cover]; AGSAndrew/Fotolia [page 3]; Andres Rodriguez/Fotolia [page 16]
Copyright 2012 CXO Media Inc., 492 Old Connecticut Path, P.O. Box 9280, Framingham, MA 01701

[2] cio.com Data Analytics | 2012



Five Business Analytics Trends — And How to Exploit Them

Advances in analytic technologies and business intelligence are allowing CIOs to go big, go fast, go deep, go cheap and go mobile with business data.

Current trends center as much on tackling analytics challenges as they do on taking advantage of opportunities for new business insights. For example, technologies for managing and analyzing large, diverse data sets are arriving just as many organizations are drowning in data and struggling to make sense of it. Still, many of the cost and performance trends in advanced analytics mean companies can ask more complicated questions than ever before and deliver more useful information to help run their businesses.

In interviews, CIOs consistently identified five IT trends that are having an impact on how they deliver analytics: the rise of big data, technologies for faster processing, declining costs for IT commodities, proliferating mobile devices and social media.


1. Big Data

Big data refers to very large data sets, particularly those not neatly organized to fit into a traditional data warehouse. Web crawler data, social media feeds and server logs, as well as data from supply chain, industrial, environmental and surveillance sensors, all make corporate data more complex than it used to be.

Although not every company needs techniques and technologies for handling large, unstructured data sets, Verisk Analytics CIO Perry Rotella thinks all CIOs should be looking at big data analytics tools. Verisk, which helps financial firms assess risk and works with insurance companies to identify fraud in claims data, had revenues of more than $1 billion in 2010.

Technology leaders should adopt the attitude that more data is better and embrace overwhelming quantities of it, says Rotella, whose business involves “looking for patterns and correlations between things that you don’t know up front.”

70%
of respondents can envisage a “killer application” for big data that would be “very useful” or “spectacular” for their business. The majority chose not to disclose what that application would be because it would provide a competitive advantage.
-------------------------------------
Source: AIIM survey of 345 information professionals, 2012

Big data is an “explosive” trend, according to Cynthia Nustad, CIO of HMS, a firm that helps contain healthcare costs for Medicare and Medicaid programs, as well as private businesses. Its clients include health and human services programs in more than 40 states and more than 130 Medicaid managed care plans. HMS helped its clients recover $1.8 billion in costs in 2010 and save billions more by preventing erroneous payments. “We’re getting and tracking so much material, both structured and unstructured data, and you don’t always know what you’re looking for in it,” Nustad says.

One of the most talked-about big data technologies is Hadoop, an open-source distributed data processing platform originally created for tasks such as compiling web search indexes. It’s one of several so-called “NoSQL” technologies (others include CouchDB and MongoDB) that have emerged to organize web-scale data in novel ways.
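Hadoop, mentioned above as a distributed data processing platform, originally did its processing through an engine called MapReduce. Input is split into chunks, a map step runs on each chunk independently and in parallel, and a reduce step combines the partial results. The Python below is a minimal single-machine sketch of that data flow, not Hadoop's API. Threads stand in for servers, and the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical. It uses word counting only because that is the customary first MapReduce example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines: list[str], n_chunks: int) -> list[list[str]]:
    """Assign each 'server' a subset of the input (round-robin)."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines: list[str]) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: list[Counter]) -> Counter:
    """Reduce step: collate the partial results reported back by each worker."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines: list[str], n_workers: int = 4) -> Counter:
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for servers here; they show the data flow, not a speedup.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    docs = ["big data big insight", "data warehouse", "Big DATA"]
    print(word_count(docs).most_common(3))
```

In a real Hadoop job, the reduce work is also spread across machines, with partial results grouped by key. A single reducer keeps this sketch short.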


Hadoop is capable of processing petabytes of data by assigning subsets of that data to hundreds or thousands of servers, each of which reports back its results to be collated by a master job scheduler. Hadoop can be used either to prepare data for analysis or as an analytic tool in its own right. Organizations that don’t have thousands of spare servers to play with can also purchase on-demand access to Hadoop instances from cloud vendors such as Amazon.

Nustad says HMS is exploring the use of NoSQL technologies, although not for its massive Medicare and Medicaid claims databases. These contain structured data and can be handled with traditional data warehousing techniques, and it makes little sense to depart from traditional relational database management when tackling problems for which relational technology is the tried-and-true solution, she says. However, Nustad can see Hadoop playing a role in fraud and waste analytics, perhaps analyzing records of patient visits that might be reported in a variety of formats.

Horizontal & Vertical Applications
Big Data technology can be deployed for business processes such as the following:
• Customer relationship management (sales, marketing, customer service)
• Supply chain and operations
• Administration (finance and accounting, human resources, legal)
• Research and development
• Information technology management
• Risk management
In addition, big data technology can be used for industry-specific applications such as the following:
• Logistics optimization in the transportation industry
• Price optimization in the retail industry
• Intellectual property management in the media and entertainment industry
• Natural resource exploration in the oil and gas industry
• Warranty management in the manufacturing industry
• Crime prevention and investigation in local law enforcement
• Predictive damage assessments in the insurance industry
• Fraud detection in the banking industry
• Patient treatment and fraud detection in the healthcare industry
--------------------------------------------------------------------------------------------
Source: IDC, 2012

Among the CIOs interviewed for this story, those who had practical experience with Hadoop, including Rotella and Shopzilla


CIO Jody Mulkey, are at companies that provide data services as part of their business. “We’re using Hadoop for what we used to use the data warehouse for,” Mulkey says, and, more importantly, to pursue “really interesting analytics that we could never do before.” For example, as a comparison shopping site, Shopzilla accumulates terabytes of data every day. “Before, we would have to sample data and partition data; it was so much work just to deal with the volume of data,” he says. With Hadoop, Shopzilla is able to analyze the raw data and skip the in-between steps.

Good Samaritan Hospital, a community hospital in Southwest Indiana, is at the other end of the spectrum. “We don’t have what I would classify as big data,” says CIO Chuck Christian. Nevertheless, regulatory requirements are causing him to store whole new categories of data, such as electronic medical records, in great quantities. Doubtless there is great potential to glean healthcare quality information from the data, he says, but that will probably happen through regional or national healthcare associations rather than his individual hospital. It’s unlikely he’ll invest in exotic new technologies himself.

VERBATIM
“Many organizations will struggle to deploy big data applications until they improve their current levels of information management and reduce content chaos.”
-----------------------------------------
Source: “Big Data: Extracting Value from Digital Landfills,” study by AIIM, 2012

John Ternent, CIO at Island One Resorts, says that whether his analytic challenges are driven by big data “depends on how capital your B and D are.” But he’s seriously considering using Hadoop instances in the cloud as an economical way of running complex mortgage portfolio analytics for the company, which manages eight timeshare resort properties across Florida. “That’s a potential solution to a very real problem we have now,” he says.

2. Business Analytics Get Faster

Big data technologies are one element of a larger trend toward faster analytics, says University of Kentucky CIO Vince Kellen. “What we really want is advanced analytics on a hell of a lot of data,” Kellen says. How much data one has is less critical than how efficiently it can be analyzed, “because you want it fast.”

The capacity of today’s computers to process much more data in memory allows for faster results than when searching through data on disk—even if you’re crunching only gigabytes of it.

Although databases have, for decades, improved performance with caching of frequently


accessed data, now it’s become more practical to load entire large datasets into the memory of a server or cluster of servers, with disks used only as a backup. Because retrieving data from spinning magnetic disks is partly a mechanical process, it is orders of magnitude slower than processing in memory.

Rotella says he can now “run analytics in seconds that would take us overnight five years ago.” His firm does predictive analytics on large data sets, which often involves running a query, looking for patterns, and making adjustments before running the next query. Query execution time makes a big difference in how quickly an analysis progresses. “Before, the run times would take longer than the model building, but now it takes longer to build the model than to run it,” he says.

With Big Data, Semantics Really Matter

Getting ready to enter the much-hyped world of big data analytics? Here’s some advice from David Saul, senior vice president and chief scientist at State Street, a major financial services firm.

One of the keys to extracting useful business insights from unstructured data—audio, video, images, text, tweets, wikis, forums and blogs—is to create a semantic data model as a layer that sits on top of your data and helps you make sense of everything, Saul says.

The traditional approach is to pull data from disparate sources into a single repository for analysis, but Saul says that’s too time-consuming for big data sets. To make the process more efficient, State Street established a semantic layer that allows data to stay where it is but provides descriptive information about it.

For example, if State Street needs “a risk profile for all the exposures we have to a particular entity or geography,” a semantic description of the various information sources “means we can quickly pull together a consolidated risk profile or an ad hoc request,” Saul says.

He adds that using the semantic layer means State Street doesn’t have to “go back and redo all of our legacy systems and database definitions. It lays on top of that, so it’s much less disruptive than another type of technology that would require us to go to a clean slate.”

State Street built a set of tools to help end users (generally a business person who understands the data well, rather than a programmer or database administrator) write the semantic data description. “For years we’ve talked about being able to blur the line that exists between IT and the business and having business be able to have tools where they can more clearly express requirements. This is a step in that direction,” Saul says.
--------------------------------------------------------------------------------------------
Thor Olavsrud, CIO.com, March 2012

Columnar database servers, which invert the traditional row-and-column organization of relational databases, address another category of performance


requirements. Instead of reading entire records and pulling out selected columns, a query can access only the columns of interest—dramatically improving performance for applications that group or measure a few key columns.

Ternent warns that the performance benefits of a columnar database come only with the right application and query design. “You have to ask it the right question the right way for it to make a difference,” he says. Meanwhile, he says, columnar databases only really make sense for applications that must handle over 500 gigabytes of data. “You have to get a certain scale of data before columnar makes sense because it relies on a certain level of repetition” to achieve efficiencies.

To improve analytics performance, hardware matters, too. Allan Hackney, CIO at the insurance and financial services giant John Hancock, is adding GPU chips (the same graphical processors found in engineering for gaming systems) to his arsenal. “The math that goes into visualizations is very similar to the math that goes into statistical analysis,” he says, and graphics processors can perform calculations hundreds of times faster than conventional PC and server processors. “Our analytic people really love this stuff.”

VERBATIM
“Some organizations will discover that big data solutions create new opportunities to launch additional lines of business that are focused on selling information as well as analytic services based on the data.”
-------------------------------
Source: IDC, 2012

3. Technology Costs Less

Along with increases in computing capacity, analytics are benefiting from falling prices for memory and storage, along with open-source software that provides an alternative to commercial products and puts competitive pressure on pricing.

Ternent is an open-source evangelist. Prior to joining Island One, he was vice president of Pentaho, an open-source business intelligence company, and worked as a consultant focusing on BI and open source. “To me, open source levels the playing field,” he says, because a mid-sized company such as Island One can use R, an open-source application, instead of SAS for statistical analysis. Once, open-source tools were available only for basic reporting, he says, but now they offer the most advanced predictive analytics. “There is now an open-source player across just about the entire continuum, which means there’s tooling available to whoever has the gumption to go and get it.”

HMS’ Nustad sees the changing economics of computing


altering some basic architectural choices. For example, one of the traditional reasons for building data warehouses was to bring the data together on servers with the computing horsepower to process it. When computing power was scarcer than it is today, it was important to offload analytic workloads from operational systems to avoid degrading the performance of everyday workloads. Now, that’s not always the right choice, Nustad says. “With hardware and storage so cheap today, you can afford to juice up those operational systems to handle a BI layer,” she says. By factoring out all the steps of moving, reformatting and loading data into the warehouse, analytics built directly on an operational application can often provide more immediate answers.

Hackney observes, however, that although the price/performance trends are helpful for managing costs, potential savings are often erased by increased demands for capacity. “It’s like running in place,” he says. While John Hancock’s per-unit cost for storage dropped by 2 to 3 percent this year, consumption was up 20 percent.

Big Data: Problem or Opportunity?
Only 30% of IT pros consider big data a problem. There’s no doubt that big data presents technical challenges due to its volume, variety and velocity. Data volume alone is a show-stopper for some organizations.
The vast majority (70%) considers big data an opportunity. Through exploratory, detailed analyses of big data, a user organization can discover new facts about its customers, markets, partners, costs, and operations—then use that information for business advantage.
--------------------------------------------------------------------------------------------
Source: The Data Warehousing Institute survey of 325 IT professionals, 2011

4. Everyone’s Mobile

Like nearly every other application, BI is going mobile. For Nustad, mobile BI is a priority “because everybody wants it.” Nustad herself wants access to reports on whether her organization is meeting its service level agreements “served up on my iPad when I’m very mobile and not at my desk.” She also wants to deliver mobile access to data for her firm’s customers, to help them monitor and manage healthcare expenses. It’s “a customer delight feature that was not demanded five years ago, but is demanded today,” she says.

For CIOs, addressing this trend has more to do with creating user interfaces for smartphones, tablets and touch screens than with sophisticated analytic capabilities. Maybe for that reason, Kellen dismisses it as fairly easy to address. “To me, that’s kind of trivial,” he says.

Rotella doesn’t think it’s that simple. “Mobile computing affects everyone,” he says. “The number


of people doing work off of iPads and other mobile devices is exploding. That trend will accelerate and change how we interact with our computing resources in an enterprise.” For example, Verisk has developed products to give claims adjusters access to analytics in the field, so they can run replacement cost estimates. That’s a way to “leverage our analytics and put it at the fingertips of the people that need it,” he says.

What makes this challenging is how much more quickly technology changes, Rotella says. With multiple device operating systems in play, “we’re trying to understand how to best leverage our development so we’re not writing these things three, four, five times over,” he says.

On the other hand, the requirement to create native applications for each mobile platform may be fading now that the browsers in phones and tablets are more capable, says Island One’s Ternent. “I’m not sure I’d invest in a customized mobile device application if I can just skin a Web-based application for a mobile device.”

Big Data: How to Get Started

Ask blue-sky questions of your business such as “if only we knew…” or “if we could predict…” or “if we could measure…” Consider how useful that might be to the business before thinking about how it can be done or at what cost.

Play those questions off against the data you already have, data you could collect, or data that you could get elsewhere.

Include in your thinking structured transactional data, semi-structured logs and files, and text-based or rich media content.

Incoming communications from your customers, outbound communications to your customers, and what customers (or employees) are saying about you on social sites can all be useful for monitoring sentiment, heading off issues and analyzing trends.

Consider high-volume streams such as telemetry, geolocation, voice, video, news feeds, transactions, Web clicks, or any combination of these.

If your content is currently “digital landfill” spread across disparate file shares and content systems, consider how this could be rationalized prior to any big data projects.
--------------------------------------------------------------------------------------------
Source: “Big Data: Extracting Value from Digital Landfills,” study by AIIM, 2012

5. Social Media in the Mix

With the explosion of Facebook, Twitter and other social media, more companies want to analyze the data these sites generate. New analytics applications have


emerged to support statistical techniques such as natural language processing, sentiment analysis, and network analysis that aren’t part of the typical BI toolkit.

Because they’re new, many social media analytics tools are available as services. One prominent example is Radian6, a software-as-a-service product recently purchased by Salesforce.com. Radian6 presents a dashboard of brand mentions (tagged positive, negative, or neutral) based on Twitter feeds, public Facebook posts, posts and comments on blogs, and discussion board conversations.

Don’t Forget About Securing Big Data

Collecting all this data and making it more accessible also means organizations need to be serious about securing it. And that requires thinking about security architecture from the beginning, says David Saul, chief scientist at State Street, a financial services provider that serves global institutional investors.

“I believe the biggest mistake that most people make with security is they leave thinking about it until the very end, until they’ve done everything else: architecture, design and, in some cases, development,” Saul says. “That is always a mistake.”

Saul says that State Street has implemented an enterprise security framework in which every piece of data in its stores includes with it the kind of credentials required to access that data.

“By doing that, we get better security,” he says. “We get much finer control. We have the ability to do reporting to satisfy audit requirements. Every piece of data is considered an asset. Part of that asset is who’s entitled to look at it, who’s entitled to change it, who’s entitled to delete it, etc. Combine that with encryption, and if someone does break in and has free rein throughout the organization, once they get to the data, there’s still another protection that keeps them from getting access to the data and the context.”
-----------------------------------------------------------------------------------------------------------
Thor Olavsrud, CIO.com, March 2012

When purchased by the marketing and customer service departments who use them, such tools may not require heavy IT involvement. Still, University of Kentucky’s Kellen believes he needs to pay attention to them. “My job is to identify these technologies, see what the match is for the organization in terms of competitiveness, and start educating the right people,” he says.

For example, monitoring student posts on social media could help faculty and administrators learn earlier when students are having academic trouble. IT developers should also build alerts generated by social media analytics into applications


for responding to those events, he says.

Does Big Data = Big Spender?

Big data is a powerful lure, but that lure can lead you into an expensive trap if you don’t plan carefully.

“Big data has big spending risks,” says Jeff Muscarella, IT spend management consultant with NPI Financial. Muscarella warns that big data projects can easily ring up seven-figure price tags after you finish paying for the hardware, software and services, and sometimes the glowing business cases presented by vendors lose their luster when you look closely. “A lot of times, when you pull them apart, they’re not as rosy as they seem,” he says.

That’s not to say that harnessing the power of big data is a mistake, Muscarella explains. But it does mean that organizations need to start by gathering real data on how a big data project will benefit the business.

“It’s new technology solving a business problem that we often haven’t proved,” Muscarella says. “The business is going to be coming to them with all sorts of half-baked ideas for what they can do with big data. They have to ask: Will it really drive revenue? How, and for how long? What will it take to build it? They need to make sure they have a crisp focus on the mission; that it is going to have a return on investment.”

Muscarella recommends starting with open-source tools like Apache Hadoop and building a test case. “Pick something that’s manageable. Start on a small scale to prove your hypothesis.”

“Don’t get trapped into building the infrastructure yet,” he adds. “Prove it first and then go back and architect your solution. Assume that however you solve the problem, you’re probably going to throw it away and start over. That’s OK because at least you proved the business need before you spent a lot of money.”
--------------------------------------------------------------------------------------------
Thor Olavsrud, CIO.com, March 2012

While John Hancock’s efforts in this area are “nascent,” according to Hackney, he envisions a role for IT in correlating the data provided by a social analytics service with corporate data. For example, if the social media data shows comments about the company in the Midwest are becoming more negative, he would want to see if the company has made price or policy changes in that region that might explain the trend.

Finding such correlations could make a big difference in getting company leaders to believe in the return on investment of social media, Hackney says. “In my industry, everybody’s an actuary, everyone’s looking for the numbers—they don’t take anything on belief.”
-------------------------------------
David F. Carr, CIO.com, March 2012
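Hackney’s Midwest example suggests a simple pipeline. Tag each brand mention, aggregate the tags by region, and then check corporate records for regions where sentiment has turned. The Python below is a minimal sketch of that idea under stated assumptions. A naive word-list tagger stands in for the natural language processing that commercial services such as Radian6 perform. The word lists, region names, function names and 0.1 threshold are all hypothetical. A flagged region is a lead for an analyst to investigate. It is not proof that a price or policy change caused the shift.

```python
from collections import Counter, defaultdict

# Hypothetical word lists for illustration; real tools use far richer models.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "hate", "expensive", "rude"}


def tag_sentiment(text: str) -> str:
    """Tag one mention positive, negative or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def negative_share_by_region(mentions):
    """Fraction of each region's mentions tagged negative; mentions are (region, text)."""
    tallies = defaultdict(Counter)
    for region, text in mentions:
        tallies[region][tag_sentiment(text)] += 1
    return {region: counts["negative"] / sum(counts.values())
            for region, counts in tallies.items()}


def regions_to_investigate(before, after, policy_changes, threshold=0.1):
    """Regions whose negative share rose by more than `threshold`, each mapped to
    whether corporate records show a price or policy change there."""
    return {region: region in policy_changes
            for region, share in after.items()
            if share - before.get(region, 0.0) > threshold}


if __name__ == "__main__":
    q1 = negative_share_by_region([
        ("Midwest", "Love the fast claims service"),
        ("Northeast", "Helpful agent"),
    ])
    q2 = negative_share_by_region([
        ("Midwest", "Premiums are expensive now"),
        ("Midwest", "Great app"),
        ("Northeast", "Helpful agent"),
    ])
    print(regions_to_investigate(q1, q2, policy_changes={"Midwest"}))
```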


Big Data 101: What CIOs Should Know

1. You will need to think about big data.

Big data analysis got its start from the large Web service providers such as Google, Yahoo and Twitter, which all needed to make the most of their user-generated data. But enterprises will need big data analysis to stay competitive, and relevant, as well.

You could be a really small company and have a lot of data. A small hedge fund may have terabytes of data, says Jo Maitland, GigaOm research director for big data. In the next couple of years, a wide range of industries—including health care, public sector, retail, and manufacturing—will all financially benefit by analyzing more of their data, consulting firm McKinsey and Company anticipated in a recent report.

There is an air of inevitability with Hadoop and big data implementations, says Eric Baldeschwieler, chief technology officer of Hortonworks, a Yahoo spinoff company that offers a Hadoop distribution. It’s applicable to a huge variety of customers. Collecting and analyzing transactional data will give organizations more insight into their customers’ preferences. It can be used to better inform the creation of new products and services, and allow organizations to remedy emerging problems more quickly.

DEFINITIONS
“Big data analytics is the application of advanced analytic techniques to very big data sets.”
--------------------------------------------
Source: “Big Data Analytics” study by The Data Warehousing Institute, 2011

“Big data is a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.”
--------------------------------------------
Source: IDC, 2012

2. Useful data can come from anywhere (and everywhere).

You may not think you have petabytes of data worth analyzing, but you will, if you don’t already. Big data is collected data that used to be “dropped on the floor,” Baldeschwieler says.
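Web server access logs are a typical example of data that is routinely collected and then ignored. Server logs are also listed earlier in this guide as a big data source. The Python below is a minimal sketch, assuming the logs use the Common Log Format. It turns raw lines into page-view counts and a per-visitor list of pages requested. The function and variable names are hypothetical, and the sample IP addresses come from ranges reserved for documentation.

```python
import re
from collections import Counter, defaultdict

# Assumed format: the Common Log Format many web servers can emit, e.g.
# 203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET /pricing HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarize_access_log(lines):
    """Count successful page views and list the pages each visitor requested."""
    page_views = Counter()
    pages_by_visitor = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or not match["status"].startswith("2"):
            continue  # skip malformed lines and non-2xx responses
        page_views[match["path"]] += 1
        pages_by_visitor[match["host"]].append(match["path"])
    return page_views, dict(pages_by_visitor)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET /pricing HTTP/1.1" 200 2326',
        '203.0.113.7 - - [10/Oct/2012:13:56:02 -0700] "GET /signup HTTP/1.1" 200 1045',
        '198.51.100.4 - - [10/Oct/2012:13:57:10 -0700] "GET /pricing HTTP/1.1" 200 2326',
        '198.51.100.4 - - [10/Oct/2012:13:57:15 -0700] "GET /missing HTTP/1.1" 404 0',
        'not a log line',
    ]
    views, visitors = summarize_access_log(sample)
    print(views.most_common())
    print(visitors)
```

The same per-visitor lists could feed finer-grained questions, such as which page sequences precede a signup, without any change to how the logs are collected.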


Big data could be your server’s log files, for instance. A server keeps track of everyone who checks into a site, and what pages they visit when they are there. Tracking this data can offer insights into what your customers are looking for. While log data analysis is nothing new, it can be done down to dizzying new levels of granularity.

Another source of data will be sensor data. For years now, analysts have been speaking of the Internet of Things, in which cheap sensors are connected to the Internet, offering continual streams of data about their usage. They could come from cars, or bridges, or soda machines. “The real value around the devices is their ability to capture the data, analyze that information and drive business efficiencies,” says Microsoft Windows Embedded General Manager Kevin Dallas.

Big Data Drivers
Analysis of…
1. Operational data
2. Online customer data
3. Sales transactions data
4. Machine or device data
And…
5. Service innovation
--------
Source: IDC survey of 2,699 data professionals, 2012

3. You will need new expertise for big data.

When setting up a big data analysis system, your biggest hurdle will be finding the right talent who knows how to work the tools to analyze the data, according to former Forrester Research analyst James Kobielus. Big data relies on solid data modeling. Organizations will have to focus on data science, Kobielus says. They have to hire statistical modelers, text mining professionals, people who specialize in sentiment analysis. This may not be the same skill set that today’s analysts versed in business intelligence tools readily know.

Such people may be in short supply. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions, McKinsey and Company estimated.

Another skill you will need to have on hand is the ability to wrangle the large amounts of hardware needed to store and parse the data. Managing 100 servers is a fundamentally different problem than handling 10 servers, Maitland pointed out. You may need to hire a few supercomputer administrators from the local university or research lab.

4. Big data doesn’t require organization beforehand.

CIOs who are used to rigorously planning out every sort of data


that would go into an Enterprise Data Warehouse (EDW) can breathe a little easier with big data setups. Here, the rule is: collect the data first, and then worry about how you will use it later.

With a data warehouse, you have to lay out the data schema before you can start loading the data itself. “This basically means you have to know what you are looking for beforehand,” says Jack Norris, vice president of marketing for MapR. As a result, “you are flattening the data and losing some of the granularity,” he says. “Later on, if you change your mind, or want to do a historical analysis, you’ve limited yourself.”

“You can use a [big data repository] as a dumping ground, and run the analysis on top of it, and discover the relationships later,” Norris says. Many organizations may not know what they are looking for until after they’ve culled the data, so this kind of freedom “is kind of a big deal,” he says.

Top 5 Big Data Challenges
1. Deciding what data is relevant
2. Cost of technology infrastructure
3. Lack of skills to analyze the data
4. Lack of skills to manage big data projects
5. Lack of business support
---------------------------------------------------------
Source: IDC survey of 2,699 data professionals

5. Big data is not only about Hadoop.
When people talk about big data, most of the time they are referring to the Hadoop data analysis platform. “Hadoop is a hot-button initiative, with budgets and people being assigned to it” in many organizations, Kobielus pointed out. Ultimately, however, you may go with other software.

Recently, legal research giant LexisNexis, no slouch at big data analysis itself, open-sourced its own platform for analysis, HPCC Systems. MarkLogic has also outfitted its own database for unstructured data, the MarkLogic Server, for big data style jobs. Another tool gaining favor is the Splunk search engine, which can be used to search and analyze data generated by machines, such as the log files from a server. “Whatever data you can extract from your logs, there is a good chance that Splunk can help,” notes Curt Monash of Monash Research.
-----------------
Joab Jackson, CIO.com, May 2012

VERBATIM
“Big data is not just big. It’s also diverse data types and streaming data.”
-------------------------------------
Source: “Big Data Analytics” study by The Data Warehousing Institute, 2011
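To make Norris’s point about flattening concrete, here is a minimal Python sketch. The click-event records, field names and functions are hypothetical, invented for illustration; this is not how MapR, Hadoop or any data warehouse product works internally. One function follows the schema-first approach, deciding at load time that only daily page counts matter. The other two follow the collect-first approach (often called schema-on-read): they keep the raw records and interpret them only when a new question comes up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw click events, stored exactly as they arrived.
# Only one record carries a "referrer" field.
RAW_EVENTS = [
    {"ts": "2012-05-01T09:15:00", "user": "a", "page": "/home"},
    {"ts": "2012-05-01T09:47:00", "user": "b", "page": "/pricing"},
    {"ts": "2012-05-01T14:05:00", "user": "a", "page": "/pricing", "referrer": "search"},
    {"ts": "2012-05-02T09:30:00", "user": "c", "page": "/home"},
]


def load_daily_summary(events):
    """Schema-first: decide up front that only (day, page) counts matter.

    Time of day, users and referrers are discarded at load time, so
    questions about them can no longer be answered from the result.
    """
    summary = Counter()
    for event in events:
        day = event["ts"][:10]  # keep only the YYYY-MM-DD part
        summary[(day, event["page"])] += 1
    return summary


def hourly_counts(events):
    """Collect-first: derive an hour-of-day breakdown from the raw records later."""
    return Counter(datetime.fromisoformat(e["ts"]).hour for e in events)


def referrer_counts(events):
    """Collect-first: tolerate fields that only some records have."""
    return Counter(e.get("referrer", "unknown") for e in events)


if __name__ == "__main__":
    summary = load_daily_summary(RAW_EVENTS)
    print(summary[("2012-05-01", "/pricing")])  # 2
    print(hourly_counts(RAW_EVENTS)[9])  # 3
    print(referrer_counts(RAW_EVENTS)["search"])  # 1
```

Once the daily summary has been built, neither the hour-of-day breakdown nor the referrer breakdown can be recovered from it; both stay answerable only because the raw records were kept, which is the freedom Norris describes.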


Coping with the Big Shortage of Big Data Talent

Just as historical reports alone aren’t sufficient for making corporate decisions (executives want business intelligence to identify current and future trends), IT staffers need to know more about BI than how to run a data warehouse or build a dashboard.

That puts CIOs in a bind, according to industry experts who have raised alarms about a data analytics skills deficit. For example, a report released last spring by the McKinsey Global Institute predicts that by 2018, the United States could lack 140,000 to 190,000 workers with deep analytical skills and another 1.5 million managers and analysts who know how to use analysis of large data sets to make effective decisions.

“We see our BI leader as being the catalyst to drive our organization away from pure historical reporting to true inferential analysis,” says Greg Meyers, vice president of global IT at Biogen Idec, a $5 billion biotech company. “This is both a technical and change management challenge.”

Yet despite continued high U.S. unemployment rates, there’s a BI talent shortage, says Boris Evelson, analyst at Forrester Research. “Every single client I talk to tells me they are struggling with finding and retaining BI talent.”

To fill the gap, CIOs are competing for workers with strong math skills, proficiency working with massive databases and emerging database technology, and expertise in search, data integration and other areas such as business knowledge, Evelson says. In fact, he says, business
knowledge, such as understanding processes, customers and products, “is at least equally as important as the tech skills.”

IT leaders are thinking about how to get the needed analytics talent now as well as how to develop a pipeline of technologists with the right skills for the future.

Business Analytics Education Gap
Foote Partners, a research and advisory firm that tracks IT skills demand and pay levels, pins the gloomy outlook for BI talent on a low supply of young workers in roles such as architects, modelers, integrators, analysts and developers. The finding is preliminary, says David Foote, the company’s co-founder, CEO and chief research officer, but he suggests that one problem is that many colleges and universities haven’t yet risen to the challenge of teaching the skills that are potentially needed for analytics jobs.

He cites the need for government and industry partnerships with academia, such as the U.S. Cybersecurity Challenge, that use online competitions and incentives to attract students to possible careers in information security. “The same sort of thing needs to happen for analytics/statistics/BI,” Foote says. “Fill the pipeline with students eager to enter the field and focus on careers.”

Unfortunately, academic credentials, like a class or even a related degree, go only so far. Qualified workers require several years of experience to understand how to deal with “real world” BI challenges. “One can learn the technical skills needed for BI in a six-month class; that’s not a big deal,” Evelson says. “The big deal is accumulating knowledge of best practices and lessons learned from successful and failed implementations.”

Wanted: Real-World Analytics Experience
So IT executives are scouring the country to find people with the data analytics skills they need now. Douglas Menefee is CIO at Schumacher Group, a privately held company that provides emergency room management services to hospitals nationwide. He wants to hire database extract,
transform and load (ETL) developers, as well as presentation analysts “who know how to tell a story with data.”

“Both need to have very strong critical thinking skills and need to be able to draw on asking the hard questions,” Menefee says. The ETL developers need “strong math and logic matching skills,” he says, while the presentation developers “need to be able to use right and left brain thinking”; in other words, they must be both logical and creative. “We want them to use creativity to tell a graphical story.”

Because projects change very quickly depending on the “fire of the day,” Menefee says, Schumacher looks for individuals who also are experienced with agile development and can adapt to change easily.

The hiring cycle takes three to six months, sometimes preventing the company from moving as quickly as it would like on projects, Menefee says. If necessary, consultants fill the gap, such as when the company was building a BI center of excellence and needed architecture and design expertise. Hitachi Consulting “worked with us a couple of years until we were [internally] staffed,” Menefee says.

Adding to Menefee’s challenge is the company’s Lafayette, La., location. It’s hard to convince people who aren’t already familiar with Lafayette to relocate. He concentrates on hiring locals, as well as people who want to move back to Louisiana. Schumacher Group has also taken advantage of job recruitment, hiring and training services provided through the Louisiana state government’s FastStart workforce program.

The Most Difficult Big Data Skills to Find:
1. Advanced analytics, predictive analytics
2. Complex event processing
3. Rules management
4. Business intelligence tools
5. Data integration
----------------------------------------
Source: IDC/Computerworld, 2012

Like Menefee, Meyers at Biogen Idec wants to hire staff with data warehousing, ETL and reporting experience. That’s relatively easy, he says. But he also wants workers who know how to elicit details from users about the metrics they use to make decisions and about the unanswered questions they have that data can help them answer. Identifying those workers is more challenging. He thinks he can recruit them from other companies, where they’ve done similar work.

Planning for the Future
Meyers says he can find people to staff current projects. “These skill sets might have been fine for the
past several years, but to truly use BI as a competitive advantage you have to focus decision support on predicting the future, not simply reporting the past,” he says.

Menefee adds: “Our next generation of skills will be blended heavily on the business side with statistical modeling and quantitative analysis.” However, he adds, “these skills probably won’t live within IT.”

He is making an effort to develop a compatible IT workforce in part by working with the University of Louisiana, which graduates 70 students annually with computer science master’s degrees. He serves on a course curriculum committee at the university’s Lafayette campus, which gives “feedback on gaps between what is being taught and what our business needs are,” he says. He’s also working with the school to develop an internship program.

Pacific Coast Companies provides business and IT services to a dozen subsidiaries that supply building products and related services. CIO Mike O’Dell needs workers with statistical and analytical skills as well as knowledge of economics “and the wisdom to understand causal relationships in the data,” he says. “The types of projects we are working on are focused on making the people we have more effective, from the executive to the salesman to the shop floor.”

The need for business skills, in fact, is driving O’Dell’s staffing and recruitment strategy. He taps business people internally and teaches them technology. He also recruits local college students with technology skills and teaches them about the business. “Understanding the business is the more complex side of the equation, so the best way to develop those skills is to expose technology-oriented people to the business and put them in the field and let them learn,” O’Dell says.
--------------------------------------------
Bob Violino, CIO.com, March 2012

VERBATIM
“The application of big data technology will fall into two primary categories: doing more efficiently tasks that have been done for years; and doing completely new things that were never before possible, driving up long-term strategic organizational value. Identify opportunities to apply big data to both.”
-------------------------------
Source: IDC, 2012