Introduction
The most successful businesses understand that data is the currency of their digital transformation initiatives. At the core of many of these initiatives is a focus on using data and analytics to deliver business insights that provide better customer service, business results, and competitive differentiation. More often than not, success depends on first investing in a data management architecture as the basis of your new business initiatives.

“Notching small analytics victories may not be enough. For leaders with their eyes on the prize, it’s all about connecting analytics capabilities across the enterprise,” reports Deloitte in its “Analytics Trends 2016” research. [1] “Leaders are beginning to take serious steps toward connecting these successes to create something bigger.”

The 2018 Gartner CIO Agenda: Mastering the New Job of the CIO shows “BI/Analytics” as the #1 technology “expected to help businesses differentiate from their competitors,” based on a survey of over 3,000 CIOs. [2] BI/Analytics has been a top priority for many years, which tells us that the job is as yet unfinished. And data management is often the bottleneck in business value delivery.
But while businesses are applying such strategic importance to analytics, a full 86 percent of executives consider their organizations to be “at best only somewhat effective at meeting the primary objective of their data and analytics programs,” according to McKinsey. [3] More than one-quarter say they’ve been ineffective.

What’s more concerning is that this report also identified data management as the #1 technical bottleneck to delivering value.

To ensure businesses get better at using data and analytics to drive business transformation, many companies are appointing a dedicated, accountable C-suite executive to lead their initiatives: the Chief Analytics Officer (CAO).

While analytics leaders in some companies may not have that exact title, they may also be known as the Chief Data Officer, VP of Business Intelligence, or VP/Director of Analytics. For simplicity’s sake, in this eBook we’ll refer to these individuals as CAOs.

The details of the CAO’s role are still being fully defined at many organizations. But by and large, CAOs are charged with aligning the analytics strategy across the enterprise with the business strategy. They’re responsible for making sure priorities are matched and the analytics initiatives directly deliver business value. They do this by:

• Leading multidisciplinary teams with senior executive peers across business functions and IT to make data and analytics a competitive advantage for the organization.
• Leveraging new analytics technologies such as data catalogs, Hadoop, predictive analytics, AI, machine learning, or streaming data to accelerate business value delivery.

Key to any analytics success (and ultimately driving successful data-driven digital transformation) is the deployment of an intelligent enterprise-wide data management strategy. This encompasses technology and processes to ensure business decision makers, data scientists, and analysts have access to shared data that is drawn from internal and external sources, while also making the data trustworthy, timely, and easy to discover and access for analytics.

86% of organizations say they have been only somewhat effective with their data and analytics initiatives. More than one-quarter say their organizations have been ineffective.
This eBook is about using these tools and processes to build three foundational pillars of intelligent, enterprise-wide analytics:

Enterprise data management
To make all data available to every analytics user, even data that is traditionally locked in application silos. This includes external data, partner data, streaming data, or data that is in the cloud or new analytics applications. The most interesting analytics insights often come from combining data from widely disparate sources. An enterprise data catalog that enables the discovery of data across your entire organization is a great place to start.

Data governance
To manage data as an asset and ensure that every analytics user has trusted data that is “fit for purpose” for their analytics needs. This includes managing data’s meaning, business context, and quality over time.

Data self-service
New business initiatives require fast time to market, and one of the critical requirements is to empower a new class of business users, analysts, and data scientists to access and manage data without IT assistance.

In this eBook, we won’t touch on many of the organizational, political, and cultural challenges that are no doubt on your mind.

It’s not that these concepts are unimportant, but our focus will be to show you how a logical, collaborative, and scalable approach to data management can help you overcome some of the most important business analytics challenges and take advantage of new analytical possibilities.

Let’s dive in.
Part One
Your organization is practically swimming in data. Much has been written about the growth in data volume, variety (complexity), and velocity. Not only is your data growing, but industry analysts estimate that 50 percent is coming from external sources, [4] which means that you may not know the data structure, quality level, or meaning and business context, making data management that much more challenging.

But if business users need to find answers to questions like “which customer should a sales rep call on and what should they say” (questions that touch multiple lines of business and teams), they’re going to need to access data sources scattered across silos within your company, as well as data sources from outside. This is arguably the defining challenge of the CAO role: to look beyond existing organizational boundaries and “integrate” the enterprise and outside data.

It’s why it makes sense for your analytics initiative to approach data and data management from an enterprise-wide perspective.

To be sure, this doesn’t mean implementing processes to manage all your data at once. Instead it’s about starting with the right analytics projects and logically building toward a state where you have a consistent, repeatable, and trusted way to deliver data for all of your analytics initiatives and users.

Turning Integration Competencies into Information Competencies
Many organizations have created an Integration Competency Center (ICC), which is focused on creating common standards and practices for data management.

The Next-Generation Integration Competency Center
To better support the increasingly complex needs of the business, the traditional Integration Competency Center must shift to become an Enterprise Data Competency Center. The key is to connect technical data competency to business value. By broadening “Integration” to “Information” (the data that drives business) and focusing on true business transformation competency, you can better position and manage the business context and value of information for competitive advantage.

Read our blog series about the Next-Generation ICC to find out more.
Your current ICC, if you have one, is most likely focused on best practices for IT. The challenge going forward will be to extend this to solve data-related issues that affect the whole business, and to engage business experts in the management of the data.

Take the following steps to get started:
• Inventory your data assets to find out what data you have today.
• Conduct a data maturity assessment to find out how standardized and effective your data management is across the organization.
• Prioritize your analytics initiatives so you start with the ones likeliest to have the greatest business impact.
• Find out what internal and external data is required but missing for these initiatives.

Inventorying Your Data Assets
In large organizations that have many application and analytics systems, it’s often hard to know what data you have today and where it all resides, let alone how it’s used. In such cases, the first challenge is to do a comprehensive data discovery of all the data available to the enterprise, whether internally or externally sourced. This calls for an enterprise-ready data catalog.

By standardizing the management of business and technical metadata on a single data management platform, you can then start to see how data moves between systems and what that means for different lines of business.

Aiming for Hybrid and Multi-Cloud Data Management
As you plan your analytics and data management strategy, one very important option to consider is cloud deployment. The advantages include:
• Faster time to value: For example, you can get results from new analytics initiatives faster when you stand up a new cloud data warehouse in hours rather than quarters.
• Flexibility: The cloud offers the ability to quickly scale up and down as your business requires.
• Cost: Cloud options typically provide the advantages of pay-as-you-go and OPEX versus CAPEX.
• Innovation: The rate of analytics technology innovation is greater in the cloud, and new technology is easier to “try out.”
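The inventory and gap-finding steps listed in this part can be pictured with a small sketch. This is only an illustration, assuming a catalog is kept as a simple list of entries; the `CatalogEntry` fields, the sample data sets, and the hypothetical sales initiative are all invented for the example and do not describe any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One data asset recorded during the inventory (all fields are illustrative)."""
    name: str            # business-friendly name of the data set
    system: str          # where the data physically resides
    source: str          # "internal" or "external"
    owner: str           # accountable business owner or steward
    tags: frozenset = field(default_factory=frozenset)


def find_assets(catalog, tag):
    """Return the names of catalog entries carrying the given tag, sorted."""
    return sorted(e.name for e in catalog if tag in e.tags)


def missing_data(catalog, required_names):
    """Return required data sets that the inventory did not find, sorted."""
    known = {e.name for e in catalog}
    return sorted(set(required_names) - known)


catalog = [
    CatalogEntry("customer_master", "CRM", "internal", "Sales Ops", frozenset({"customer"})),
    CatalogEntry("order_history", "ERP", "internal", "Finance", frozenset({"customer", "sales"})),
    CatalogEntry("credit_scores", "Partner feed", "external", "Risk", frozenset({"customer"})),
]

# Hypothetical initiative: recommendations for which customer a sales rep should call.
required = ["customer_master", "order_history", "web_clickstream"]

print(find_assets(catalog, "customer"))   # assets discoverable by tag
print(missing_data(catalog, required))    # data the initiative needs but the inventory lacks
```

Even a list this simple separates the two questions in the steps above: what data exists today, and what a prioritized initiative still lacks.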
As you consider your multi-cloud and hybrid options, you will also have data management challenges. And it will pay to plan for them early in the process. You’ll need to plan for migrating data to new cloud analytics systems and applications, while also maintaining the quality of data in the new system.

You’ll also need to ensure the new system’s data is synchronized with other on-premises and cloud systems so that all users have access to the best and most current data.

Most larger organizations have a significant investment in on-premises applications, data warehouses, and analytics that will mostly stay in place and be augmented by new cloud systems. So when it comes to selecting a data management platform, it will be important to choose a platform that spans on-premises, cloud, and big data anywhere.

There are three important options to consider:
• On-premises data management capabilities with connectivity to cloud data sources.
• On-premises data management capabilities that are hosted in the cloud. This provides an “exact-same” experience for IT developers while delivering some of the advantages of cloud for data management.
• An Integration Platform as a Service (iPaaS). This provides integrated, cloud-based data management services with an easy-to-use interface for “citizen integrators.” iPaaS is rapidly growing beyond simple data integration solutions to become end-to-end data management solutions.

Whichever options you choose, you should look for an environment where skills, code, and best practices can be shared across the environments for maximum flexibility and productivity.

[Figure: skills, code, and best practices shared across environments for maximum flexibility and productivity]
What is Integration Platform as a Service (iPaaS)?
Today, iPaaS is a set of cloud services that provides a single solution to manage data integration, application integration, and process integration with a very user-friendly interface. It powers development, execution, and governance of these integration patterns between on-premises, public cloud, and private cloud applications, databases, and other data sources.

The traditional definition of an iPaaS is that it delivers cloud integration services (including data integration and application integration services for batch and real-time scenarios), native connectivity, and a robust API integration framework. A newer generation of iPaaS is emerging. It includes the data integration capabilities of the current iPaaS and expands into a broader array of data management services that provide an integrated, end-to-end solution. Some of the new capabilities include master data management, big data management, data quality, test data management, and data security. This enables a broader range of users to productively engage in the full lifecycle of data management.

Leading iPaaS solutions are built to serve the diverse needs of new types of users, particularly business users. For example, they provide simple governance for IT, reusable logic for line-of-business developers and mobile application development teams, and ease of use for business users.

Put simply, iPaaS is a cloud-based solution that enables enterprises to rapidly execute any integration pattern, logically manage any data, and address the new use cases that are emerging in the multi-cloud and hybrid data management world.
Technological Imperatives for Enterprise Data Management
As you scale up processes for data management to meet the requirements of enterprise information management, you and your team need to be able to get more done with minimal budget increases. So, it is vital that your platform has intelligence to automate tasks as much as possible.

From a data management point of view, consider the following technologies:
• AI / machine learning: So that the platform can make intelligent recommendations and automate data management tasks.
• Metadata management: So that the platform collects and manages technical metadata, business metadata, operational metadata, and usage metadata for maximum intelligence and data visibility. This metadata is also the “fuel” for automation via machine learning.
• Data integration: So you can connect disparate data sources with faster batch loads or in real time.
• Data quality: So you can automate rules for data quality at scale and deal with exceptions and anomalies as they arise.
• Master data management: So that data from across many systems and data sources can be reconciled and managed to provide a single, trusted, 360-degree view of a business entity, such as a customer or product.
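To make the idea of automated data quality rules more concrete, here is a minimal sketch in which rules are declared once and applied to every record, with failures routed to an exceptions list. The rule names, record fields, and patterns are assumptions made for illustration; a real platform would manage rules as shared, governed assets.

```python
import re

# Each rule is a (name, predicate) pair. Names and fields are illustrative only.
RULES = [
    ("customer_id present",
     lambda r: bool(r.get("customer_id"))),
    ("email well-formed",
     lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or "") is not None),
    ("country is two uppercase letters",
     lambda r: re.fullmatch(r"[A-Z]{2}", r.get("country") or "") is not None),
]


def check(records, rules=RULES):
    """Apply every rule to every record; return (clean_records, exceptions)."""
    clean, exceptions = [], []
    for index, record in enumerate(records):
        failed = [name for name, predicate in rules if not predicate(record)]
        if failed:
            # Exceptions keep the row position and the rules that failed,
            # so a data steward can review them later.
            exceptions.append({"row": index, "failed_rules": failed})
        else:
            clean.append(record)
    return clean, exceptions


records = [
    {"customer_id": "C001", "email": "ana@example.com", "country": "US"},
    {"customer_id": "", "email": "bob@example.com", "country": "US"},
    {"customer_id": "C003", "email": "not-an-email", "country": "usa"},
]

clean, exceptions = check(records)
pass_rate = len(clean) / len(records)  # a simple metric that can be tracked over time
print(f"pass rate: {pass_rate:.0%}")
for item in exceptions:
    print(item)
```

Because the rules live in one list rather than in each analyst's spreadsheet, the same checks can be rerun on every load with no manual effort.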
Part Two
We’ve all been in meetings where people show dashboards with conflicting data. Or, even worse, a compelling analysis is delivered but management does not trust the data enough to act on it. That is where the importance of data governance comes in.

Data governance is all about managing data as an asset. IT is part of the solution, but it can’t do it alone. In fact, data governance must be approached as a collaboration between business experts and IT technologists. Success will come from enabling this collaboration from the beginning of your data governance initiative. IT has the technical knowledge, but only the business knows the critical meaning of data, the business context, the relative priority of data to be managed, and how to define data quality metrics to determine if the data is trustworthy or not.

Driving business meaning and context: In order for data to be usable for any analytics user, it needs business meaning and context. This includes metadata such as terms, definitions, data stewards/owners, data domain, data policies, etc.

Engaging different departments: To be successful as a CAO, you need to operate across many departments in both business and IT. One of the critical interfaces is with the data governance body within the organization. This process is fundamental to making data useful for analytics purposes.
We’re often asked how to go about implementing data governance for analytics projects that are well underway. A great example of an organization that has done this is the Cleveland Clinic, featured in the sidebar.

Cleveland Clinic: Data Governance as a Foundation for Predictive Analytics
Cleveland Clinic is a non-profit healthcare leader that specializes in heart and brain healthcare. Cleveland Clinic wanted to make the transition from traditional business intelligence reporting to predictive analytics.

To accomplish this, Cleveland Clinic began an Enterprise Information Management and Analytics (EIM&A) initiative focusing not just on technology but also around four pillars: data, people, process, and technology. A key component of the data domain is data governance.

The focus on governance led to the establishment of a council comprised of executive leaders, senior stakeholders, clinical representation, and a newly created Senior Director of Data Governance position.

In addition, they formed an advisory council that collected input, feedback, and concerns from a large cross-section of the organization. Insight gained from the multidisciplinary advisory council was incorporated into the data governance council, where decision-making occurred. This ensured quicker decision making while maintaining close alignment to the opinions and voice of the customers. Win-win.

The result: EIM&A laid the foundation of data governance, which is increasingly enabling the delivery of timely, trusted data. Cleveland Clinic can now expand on its advanced analytics, such as forecasting operating room activity eight weeks out for over 100 operating rooms, increasing efficiency and enabling better resource planning.

Read all about Cleveland Clinic’s impressive journey toward predictive analytics.

Maintaining procedural hygiene and data quality: Data quality erodes, on average, between 1 and 1.5 percent every month [6] when it isn’t actively managed. Data governance includes assigning the data stewards and owners, processes, and policies needed to ensure that data is ready for any analytics use case. This does not mean that the data always has to be perfect. For rapid exploration and faster innovation, having data quickly may be more important than having perfect data. But for critical business processes and decisions, it is important that the data is trustworthy.

It’s important to note that a completely manual approach to data quality is costly to scale. So it pays to invest in automating data quality rules that are shareable and repeatable with minimal human intervention.
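To put the monthly erosion figure cited in this part in perspective, the short calculation below projects it over a year. It assumes the 1 to 1.5 percent monthly rate compounds, which the source does not state; under that assumption roughly 11 to 17 percent of the data would have eroded after twelve unmanaged months.

```python
def remaining_quality(monthly_erosion, months):
    """Fraction of data still accurate after `months` without active management,
    assuming the monthly erosion rate compounds (an assumption, not a sourced fact)."""
    return (1.0 - monthly_erosion) ** months


for rate in (0.01, 0.015):
    lost = 1.0 - remaining_quality(rate, 12)
    print(f"{rate:.1%} per month -> {lost:.1%} eroded after 12 months")
```

The same function can be used to estimate how often a data set needs to be re-validated before its quality falls below whatever threshold a given use case tolerates.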
In the same way that analytics initiatives fail when they try to encompass more than they were designed for, data governance should also follow a “start small, prove value, scale fast” approach. Depending on the data you need to kick-start your analytics program, focus on one area that needs governance and then prove its value before you scale it.

To read more on data governance for a world of next-generation use cases arising from data-driven digital transformation, see the eBook “Just Enough Data Governance.”

Technological Imperatives for Data Governance
A critical starting point for choosing the technology for data governance is that it must include built-in business-IT collaboration as a fundamental capability. Some other capabilities include:
• Data quality: So your governance team can set metrics and track data quality trends on an ongoing basis, remediating anomalies and data quality problems as they arise.
• Metadata management: So IT stakeholders can have an understanding of the data integration environment, the data flows, and how data is being transformed.
• Business glossary: So business subject matter experts can create, manage, and share the business meaning and context of data.
• Data security: So you can ensure a policy-based approach to data access, with data masking to obfuscate sensitive information, application security to propagate policies within applications, and encryption to protect data where it lives.
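The policy-based data masking mentioned in this part can be sketched in a few lines. This is a simplified illustration, assuming a policy table that maps field names to masking methods; the field names, the methods, and the fixed salt are invented for the example, and a salted hash of this kind is pseudonymization rather than strong anonymization.

```python
import hashlib

# Which fields are sensitive, and how each is masked, would come from policy.
# The field names and methods below are illustrative assumptions.
MASKING_POLICY = {
    "email": "hash",    # replace with a stable pseudonym so joins still work
    "ssn": "redact",    # hide all but the last four characters
}


def mask_value(value, method, salt="demo-salt"):
    """Mask a single string value according to the named method."""
    if method == "hash":
        digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
        return digest[:12]
    if method == "redact":
        return "*" * max(len(value) - 4, 0) + value[-4:]
    raise ValueError(f"unknown masking method: {method}")


def mask_record(record, policy=MASKING_POLICY):
    """Return a copy of the record with policy-listed fields masked."""
    return {
        key: mask_value(val, policy[key]) if key in policy and val else val
        for key, val in record.items()
    }


row = {"name": "Ana", "email": "ana@example.com", "ssn": "123-45-6789", "segment": "gold"}
masked = mask_record(row)
print(masked)
```

Because the hash is deterministic, the same email always maps to the same pseudonym, so masked data sets can still be joined on that column. Fields not named in the policy, such as `name` here, pass through unchanged, which is why the policy itself needs governance.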
Part Three
Providing business analysts with self-service business data is one of the newer requirements for CAOs. However, due to the data complexities that we’ve described in this eBook, IT has struggled to deliver data at the quality level and in the time frame required by the business.

As a result, businesses are increasingly moving toward a data-driven self-service model where IT makes data available to business analysts and data scientists who can do their own data discovery and data prep. That’s led to far less IT involvement (and cost) and much greater business agility, particularly in the area of analytics.

But self-service isn’t dumping a load of raw data into a repository such as a data lake and hoping for the best.

There needs to be a plan for providing data that is appropriate for the business use case, making that data easily available for rapid iteration and innovation, and ensuring the data itself is reliable enough given the types of business decisions it will be used for.

In some cases, raw data might be just what your scientists and analysts need for exploration. But more often than not, you’ll need some data management to make sure users spend less time in data prep, a process that can take as much as 80 percent of their time. [7]

For instance, everybody will need some data cleansing, but they will also need ways to join data sets to provide interesting and useful insights. That will take some pre-planning in terms of data structure, tags, and keys.

You will probably also need to do some data-centric security work so you can identify sensitive data. Also, you may be creating new sensitive data by joining different data types, in which case you may need data masking to obscure personally identifiable information.

For additional reading on the subject of data self-service for Tableau users, see “Developing a Governed Self-Service BI Strategy.”

80% of an analyst’s time can be taken up by data prep.
This isn’t to say you need to make sure everyone has access to fully conformed, penny-perfect data. In fact, when it comes to self-service, the priority should be on delivering data that is “fit for purpose.”

• For innovation, perfect data is often not required. Often the data only has to be good enough to see if the question is worth exploring in more detail. The priority here is to ask questions quickly, iterate, and find the useful questions as quickly as possible.
• For critical business decisions and business processes, the data will typically have to be more trustworthy, in direct relation to the business impact of not having good data.

[...] enough data quality depending on the use case: for instance, light data preparation for experiments and strong data preparation for operationalized queries.

This may require some data preparation work by IT, but the idea is to do the minimal and appropriate level required for the intended use. This will reduce IT costs and speed up analytics delivery.

What Self-Service Users Need
A TDWI report [8] found that business-side users prioritized the following four tasks as things they need to be able to do on their own:
• Discover data: To use a data catalog that provides visibility into all enterprise data wherever it may reside.
• Prepare data: To make changes to the data without permanently changing IT’s core data assets.
For Business Users
A crucial consideration when selecting data visualization, data catalog/discovery, data preparation, and data integration tools for self-service is user experience. In particular, less technical users need intuitive interfaces that make it easier to interact with the data.

For instance, the most popular data prep tool used by business analysts today is Excel. The better new data prep tools offer an Excel-like interface that feels familiar to these users. Also, cloud-based integration tools like “Integration Platform as a Service” or “iPaaS” make it easy enough for moderately technical “citizen integrators” to manage data.

Another important factor to consider is how repeatable your users’ experiments and explorations are. Data management platforms should enable users to discover, share, and reuse data prep “recipes” created by others before them. The best way to get your users to reuse existing work is to intelligently recommend relevant “recipes” that will get the job done for them.

For IT Developers
Many studies have shown that a good GUI-based, no-coding development environment is at least 5X more productive than hand coding. Look for tools that:
• Are easy to learn and use.
• Have a large base of trained practitioners who can be hired.
• Enable code reuse, skills reuse, and sharing.
• Work across the widest variety of analytics use cases.
• Shield the organization from technology change.

The last point is crucial: data types are changing and so is analytics technology. Your data management tools must protect your developers from these changes by abstracting rules away from the underlying technology.

That way, new data and tools won’t break your environment and cause expensive and time-consuming rework.
A large North American automotive conglomerate with 24 business units needed to overcome its siloed approach to data so different business units could run their own experiments based on a common view of customer relationships.

So it pioneered a big data platform capable of supporting both a “big data lab” and a “big data factory.”

Using Hadoop, the team built four separate data staging environments or “layers” for its data based on the amount of data preparation work that had been done. In this way the environment provides “fit for purpose” data: less prepped data for rapid iteration and fully prepped data for critical business uses.

The four different layers are:

Raw
For data that’s pulled in right from the source, unchanged. In many cases, the business can use this raw data as-is, directly from the source.

Published
For lightly modified data that’s streamlined for use by the business. The data in this layer is cleansed and, where appropriate, masked, while also reflecting the latest changes from source systems.

Core
Here, business units can build new metrics, assets, and business rules to apply to the data. For instance, the business units can build reusable metrics that join customer data with inventory data.

Projects
The most tightly focused of the four layers; data is stored and managed here with specific projects and use cases in mind.

By dividing the platform up into these four layers of data quality, the company has ensured developers and analysts can focus on different users and use cases, while IT has the common foundation it needs to then quickly operationalize successful experiments. It’s a great solution to the trade-off between speed of delivery and quality of data.

Read “From Lab to Factory: The Big Data Workbook” to learn how to power experimentation and rapid operationalization with a single platform for big data management.
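The four-layer approach in this case study (Raw, Published, Core, Projects) can be pictured as a chain of small promotion steps, each adding a little more preparation. The sketch below is not the company's implementation: the record fields, the cleansing rules, and the units-per-customer metric are assumptions chosen only to show data moving from one layer to the next.

```python
# Layer names follow the case study; fields and rules are illustrative.

def to_published(raw_rows):
    """Raw -> Published: light cleansing plus masking where appropriate."""
    published = []
    for row in raw_rows:
        if not row.get("customer_id"):
            continue  # drop rows that cannot be tied to a customer
        published.append({
            "customer_id": row["customer_id"].strip().upper(),
            "email": "***masked***",
            "units": int(row["units"]),
        })
    return published


def to_core(published_rows):
    """Published -> Core: build a reusable metric (total units per customer)."""
    totals = {}
    for row in published_rows:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0) + row["units"]
    return totals


def to_project(core_totals, min_units):
    """Core -> Projects: a narrow slice prepared for one specific use case."""
    return sorted(cid for cid, units in core_totals.items() if units >= min_units)


raw = [
    {"customer_id": " c1 ", "email": "a@example.com", "units": "3"},
    {"customer_id": "C1", "email": "a@example.com", "units": "4"},
    {"customer_id": "", "email": "x@example.com", "units": "9"},
    {"customer_id": "c2", "email": "b@example.com", "units": "1"},
]

published = to_published(raw)
core = to_core(published)
project = to_project(core, min_units=5)
print(published, core, project, sep="\n")
```

An analyst exploring an idea can read from `raw` directly, while an operational report would read only from `core` or a project slice, which mirrors the speed-versus-quality trade-off described above.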
• Reusable transformations: So all users can leverage a common set of transformation patterns and create “recipes” that make their data prep work more reproducible.

Read our book “The Marketing Data Lake.”
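A reusable data prep “recipe” can be thought of as an ordered list of named steps that is recorded once and replayed on new data. The following is a minimal sketch under that assumption; the `Recipe` class and the three cleaning steps are hypothetical and are not drawn from any specific data preparation tool.

```python
class Recipe:
    """An ordered, named list of data prep steps that can be replayed on new data."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # list of (description, function) pairs

    def step(self, description, func):
        """Record a step and return self so calls can be chained."""
        self.steps.append((description, func))
        return self

    def run(self, rows):
        """Apply every recorded step, in order, to a list of row dicts."""
        for _description, func in self.steps:
            rows = func(rows)
        return rows

    def describe(self):
        """Human-readable summary, useful when sharing the recipe with others."""
        return [description for description, _func in self.steps]


clean_customers = (
    Recipe("clean_customers")
    .step("trim and lowercase email",
          lambda rows: [{**r, "email": r["email"].strip().lower()} for r in rows])
    .step("drop rows without email",
          lambda rows: [r for r in rows if r["email"]])
    .step("deduplicate on email, keeping the last row seen",
          lambda rows: list({r["email"]: r for r in rows}.values()))
)

batch = [
    {"id": 1, "email": " Ana@Example.com "},
    {"id": 2, "email": "ana@example.com"},
    {"id": 3, "email": "  "},
]
result = clean_customers.run(batch)
print(clean_customers.describe())
print(result)
```

Because the steps are stored as data instead of being performed by hand, a colleague can inspect the recipe with `describe()` and rerun it on next month's file and get the same treatment.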
When it comes to building out an intelligent data management architecture that can support your business initiative’s goals, it’s important that you decide where to start based on your company’s specific needs and capabilities.

But whatever you decide, it’s crucial that you build an architecture that allows you to do the following:

Drive increased data management productivity
The goal here is to make the data management processes more responsive to the needs of the business, for both IT and business users. This means less time doing data prep, and more time sharing, automating, and delivering actionable insights.

And crucially, your data management platform needs the ability to capture and store the steps (“recipes”) that users create as they are doing data preparation work. The platform should make these steps discoverable and shareable by other users to accelerate productivity. It should also take these steps back into IT and easily and quickly operationalize them into repeatable processes that IT can run for the business.

These types of capabilities will give business users the flexibility and speed they need while at the same time increasing the overall productivity of the organization.

Abstract data management away from underlying technology
Data formats (or the lack thereof), analytics technology, and applications will continue to change at a rapid pace. It is essential that your development environment protects your developers from changes in the underlying technology.

For instance, a change from MapReduce to Spark should not break your data integration, data quality, or data security logic.

Abstraction allows you to maintain your data mappings and transformations, so you can leverage new analytics platforms and technologies as they emerge without having to reinvent the wheel. Your data management platform must connect to anything so your architecture can evolve and grow over time without creating new “islands of data.”
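The principle of abstracting data management logic away from the underlying technology can be shown with a small sketch: the business mapping is written once against an interface, and the execution engine behind it can be swapped. The two engines here are plain-Python stand-ins invented for illustration; they do not use MapReduce or Spark APIs, and the `standardize` mapping is a hypothetical example.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Execution back end; the mapping logic below never touches it directly."""

    @abstractmethod
    def map_rows(self, func, rows):
        ...


class LocalEngine(Engine):
    """Runs the mapping in a plain Python loop."""

    def map_rows(self, func, rows):
        return [func(r) for r in rows]


class ChunkedEngine(Engine):
    """Stand-in for a different runtime: processes rows in fixed-size chunks."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def map_rows(self, func, rows):
        out = []
        for start in range(0, len(rows), self.chunk_size):
            out.extend(func(r) for r in rows[start:start + self.chunk_size])
        return out


def standardize(row):
    """Business mapping, written once and independent of the engine."""
    return {"sku": row["sku"].upper(), "revenue": round(row["qty"] * row["price"], 2)}


def run_mapping(engine, rows):
    """Execute the same mapping on whichever engine is supplied."""
    return engine.map_rows(standardize, rows)


rows = [
    {"sku": "a1", "qty": 2, "price": 9.99},
    {"sku": "b2", "qty": 1, "price": 5.0},
    {"sku": "c3", "qty": 3, "price": 1.5},
]

local_result = run_mapping(LocalEngine(), rows)
chunked_result = run_mapping(ChunkedEngine(chunk_size=2), rows)
print(local_result == chunked_result)
```

Moving to a new runtime then means adding one more `Engine` subclass, while `standardize` and every other mapping stay untouched.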
Leverage all available skills
When technologies such as MapReduce and Spark first emerge, it can be difficult and expensive to find analysts and developers who have expertise with them. By abstracting the data management away from those technologies, you ensure that your developers can always use their existing data management assets and skills to keep working on the data. Even better, nothing will break when moving data management “code” to a new technology.

As a result, you aren’t constantly searching for new hard-to-find and expensive talent. And you aren’t forcing your analysts to spend too much of their time on data preparation instead of analysis.

Centrally manage metadata: When your data management platform uses a common repository of technical, business, operational, and usage metadata, you ensure IT and business analysts know where the data came from and what it means. It also provides the training sets to enable AI-driven intelligence in the platform that will result in intelligent recommendations and automation of data management tasks.

Think hybrid: You need your data management platform to span on-premises, cloud, and big data anywhere in order to make sure you can work with any data and support any analytics use case.
Conclusion
Advanced analytics will change the way enterprises make decisions; one might justly argue that they already have. Analytics spending remains the top CIO spending priority for the sixth year in a row. [9]

Before the vision of a dashboard on every desk and data-driven decision-making at scale can become a reality, enterprises have to lay the right data management foundations.

It’s why the role of the Chief Analytics Officer is so crucial. And it’s why the decisions you make around data are critical to the success of your organization. By laying the groundwork for enterprise data management, data governance, and self-service, you’re empowering everyone in your organization to take advantage of its institutional knowledge.

But as McKinsey found, most businesses’ data and analytics programs are still only “somewhat effective.” Clearly there is still a lot of work to be done. And a lot of that work, as the McKinsey study has found, is in the area of data management.

Data management plays a crucial role in delivering that vision. So we hope the best practices we’ve shared here show you how to take advantage of emerging technologies, current enterprise realities, and, most importantly, the analytical talent distributed across your enterprise.

Improve the productivity of your people, the reproducibility of their work, and the reliability of the data they use, and there’s no telling how much they’ll be able to innovate.
Sources
[1] “Analytics Trends 2016: The Next Evolution,” Deloitte, 2016.
[2] Gartner, Mastering the New Business Executive Job of CIO. Insights from the 2018 Gartner CIO Agenda Report. 29 September 2017.
[3] “The need to lead in data and analytics,” McKinsey & Company. April 2016.
[4] “CIOs are underprepared for data-driven business,” Computer Weekly. March 12, 2015.
[5] “IT budgets 2016: Surveys, software and services,” ZDNet. October 1, 2015.
[6] “Data Migration Customer Survey,” Bloor. February 24, 2014.
[7] “For Big-Data Scientists, ‘Janitor Work’ is Key Hurdle to Insights,” New York Times. August 17, 2014.
[8] “TDWI Best Practices Report: Emerging Technologies For Business Intelligence, Analytics, and Data Warehousing,” TDWI. October 1, 2015.
[9] “BI & Analytics Top Priority for CIOs in 2016,” Business Intelligence Solutions Review. March 11, 2016.
About Informatica
We invite you to explore all that Informatica has to offer and unleash the power of data to drive your next intelligent disruption.
IN18-0318-3255
© Copyright Informatica LLC 2018. Informatica and the Informatica logo are trademarks
or registered trademarks of Informatica LLC in the United States and other countries.