OVUM VIEW
Summary
The highlights of Informatica's recent 9.1 platform release target Big Data integration, self-service, upgraded data quality, master data management (MDM), and data services capabilities. The release provides solid functional updates to what is already a rich and ever-broadening data integration platform. The Informatica platform already supported data movement to and from Hadoop through partnerships with Cloudera and EMC, but the new release adds direct, bidirectional connectivity between Informatica and Hadoop, tapping an emergent use case for customers seeking the raw power of this NoSQL target. The 9.1 release also adds new connectors to social networks, supporting the increasingly popular use case of social media analytics.
behaviors from social media data. That, of course, presents Informatica with an opportunity to apply its data integration, profiling, and quality know-how directly to Big Data sets and processing environments to enrich data sets as well as master data. Not surprisingly, Informatica is calling Big Data the next big growth opportunity for its business, with 9.1 the first stab of many. However, Ovum believes the focus on Big Data is a natural corollary to the company's last stated big growth opportunity: the Informatica cloud, as both a data source and target and a platform on which to host its products. A big part of Big Data will be driven by enterprises seeking to build hybrid architectures that store and integrate data residing in on-premise systems and in the cloud.
Informatica PowerExchange provides the technical foundation for 9.1's Big Data play
Informatica supports Big Data in two ways, backing both Hadoop and non-Hadoop processing platforms, and it is doing so largely through its PowerExchange family of data access products. In May 2011 the company announced support for EMC Greenplum's distribution of the Hadoop file system. The 9.1 release builds on this by adding a new PowerExchange for Hadoop Distributed File System (HDFS) connectivity tool, which augments Big Data processing by moving enterprise data into Hadoop clustered environments for highly scalable parallel processing, and out to targets (such as data warehouses) for consumption and analysis. The benefit is being able to reuse existing Informatica development skills in Hadoop environments.

This addresses a major gap identified in the Ovum report "What is Big Data: The Big Architecture": the lack of skills for Hadoop, MapReduce, and related technologies is currently one of the biggest impediments to adoption of NoSQL platforms. In the next release, Informatica plans to build a more robust offering that includes a graphical integrated development environment (IDE) for Hadoop; codeless, metadata-driven development; the ability to prepare and integrate data directly inside Hadoop environments; and end-to-end metadata lineage across the Informatica, Hadoop, and target environments.

The 9.1 platform also includes a new set of connectors to various Big Data transactional systems to make it easier to meld structured transactional data with largely unstructured interaction data (including social media). Informatica already offers connectors to popular databases such as Oracle, DB2, Teradata, and IBM Netezza, and is planning to add purpose-built advanced SQL analytic databases to its price list, including Teradata/Aster Data, EMC Greenplum, and HP Vertica. Informatica has taken the logical first step in supporting social network integration by adding connectors for the published Twitter, LinkedIn, and Facebook APIs.
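The bidirectional pattern described above — stage enterprise data into a Hadoop cluster, process it in parallel across nodes, then extract the results to a warehouse target — can be sketched generically. The sketch below is purely illustrative (the function names and the three-node partitioning are hypothetical stand-ins, not Informatica's or Hadoop's API):

```python
# Illustrative sketch of the load -> parallel process -> extract pattern that
# PowerExchange for HDFS automates. All names here are hypothetical.
from collections import defaultdict

def load_to_cluster(records, nodes=3):
    """Stand-in for staging rows into HDFS: partition input across nodes."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[hash(rec["customer"]) % nodes].append(rec)
    return partitions

def process_partition(rows):
    """Per-node parallel work: aggregate spend per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return totals

def extract_to_warehouse(partials):
    """Merge per-node results back out to a warehouse-style table."""
    merged = defaultdict(float)
    for part in partials:
        for cust, amt in part.items():
            merged[cust] += amt
    return dict(merged)

orders = [{"customer": "acme", "amount": 100.0},
          {"customer": "acme", "amount": 50.0},
          {"customer": "globex", "amount": 75.0}]
staged = load_to_cluster(orders)
warehouse = extract_to_warehouse(process_partition(p) for p in staged.values())
```

The point of the connectivity tool is that developers specify only the mapping logic; the partitioning, movement, and merge steps are handled by the platform.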
Big Data integration is the big deal in Informatica 9.1 (OI00141-026) Ovum (Published 06/2011) This report is a licensed product and is not to be photocopied Page 2
Informatica has also enhanced its B2B Data Exchange and B2B Data Transformation products to make it easier to connect to other interaction data gleaned from call detail records (CDRs), device/sensor data, scientific data (genomic and pharmaceutical), and large image files (through managed file transfer). Although the initial set of social media adapters is specific to certain sites, Ovum expects Informatica eventually to offer a software development kit (SDK) approach that provides flexible connectivity to a broader range of social media data sources.

Informatica is not alone in providing support for loading data into, and accessing data from, Hadoop. The race is on to provide a standardized set of visual, Hadoop-focused tools built around pillars such as MapReduce and access and transformation languages such as Hive and Pig. The leader will be the one that makes the NoSQL environment comfortable enough for the SQL developer mainstream.
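The skills gap these tools target is concrete: a one-line SQL aggregation becomes explicit map, shuffle, and reduce phases in Hadoop's programming model. A minimal pure-Python illustration of the pattern (not tied to any vendor's tooling):

```python
# SQL equivalent: SELECT word, COUNT(*) FROM docs GROUP BY word
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Emit (key, 1) pairs, as a Hadoop mapper would
    return [(word, 1) for word in line.split()]

def reduce_phase(key, counts):
    # Sum all counts for one key, as a Hadoop reducer would
    return (key, sum(counts))

lines = ["big data big deal", "big platform"]
mapped = [pair for line in lines for pair in map_phase(line)]
mapped.sort(key=itemgetter(0))  # the shuffle/sort step between map and reduce
result = dict(reduce_phase(k, (c for _, c in g))
              for k, g in groupby(mapped, key=itemgetter(0)))
```

Languages such as Hive and Pig, and the visual tools the report anticipates, exist precisely so that SQL-trained developers do not have to write the map and reduce phases by hand.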
this research note. However, one common thread that stands out across many of these additional enhancements in 9.1 is a continued focus on self-service provisioning of (in Informatica parlance) "authoritative and trustworthy" data. Informatica has worked hard to make its core business more accessible to a broader, non-technical audience. This is a challenge, as data integration is a complicated IT task that has traditionally been the almost exclusive preserve of skilled DBAs and developers.

Notable functionality to support this accessibility initiative includes the introduction of so-called "proactive data quality assurance" services to identify data exceptions more quickly. This is based on a complex event processing (CEP)-like model, which allows ETL developers to perform comparative profiling analysis, mapping data quality rules and logic against data profiles at early stages of the transformation pipeline in order to prevent costly errors from surfacing downstream. The model works by dynamically generating and comparing profiles of data as it flows through the mapping pipeline. It also enables "top-down" validation of actual versus expected data in data integration projects, which is particularly useful when upgrading applications.

There is also a new interactive, self-service Data Integration Analyst workbench for data analysts and data stewards, which extends a similar capability introduced for data quality analysts in the 9.0 release. This workbench aims to empower non-technical users who are close to the business, and arguably have a better business understanding of the data, to define their own data integration mappings and routines without constantly handing work back to IT developers.
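The comparative-profiling idea — generate a lightweight profile of the data at successive pipeline stages and flag divergence before it propagates downstream — can be sketched generically. This is an illustration of the pattern, not Informatica's implementation; the profile metrics and threshold are arbitrary choices:

```python
# Illustrative sketch of comparative profiling between pipeline stages.
def profile(rows, column):
    """Lightweight column profile: row count, null rate, distinct values."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
    }

def compare_profiles(before, after, max_null_increase=0.05):
    """Flag a transformation step that degraded the data, before it flows on."""
    issues = []
    if after["null_rate"] - before["null_rate"] > max_null_increase:
        issues.append("null rate jumped beyond threshold")
    if after["rows"] != before["rows"]:
        issues.append("row count changed unexpectedly")
    return issues

source = [{"email": "a@x.com"}, {"email": "b@x.com"}, {"email": "c@x.com"}]
# A buggy transformation step that silently drops a value:
transformed = [{"email": "a@x.com"}, {"email": None}, {"email": "c@x.com"}]
issues = compare_profiles(profile(source, "email"),
                          profile(transformed, "email"))
```

Catching the null-rate jump at this stage is exactly the "prevent costly errors from surfacing downstream" benefit the report describes.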
The creation and validation of source-to-target mappings is handled through a browser-based, guided interface that enables business analysts and data stewards to pinpoint data using business terms, define source-to-target mappings, selectively apply transform rules (including ETL and data quality) from a predefined inventory, validate the rules on the fly, and preview the results of their specifications. For example, analysts can find and navigate data sources and targets using metadata such as a business glossary or data lineage trails; specify, save, and share their own transformation logic with other analysts, projects, or both; and embed existing ETL mapping logic and data quality rules into their specification. The Data Integration Analyst tool then automatically generates the relevant PowerCenter or Informatica Data Services (IDS) transformation mapping logic, which can be deployed as virtualized SQL views, published web services, or batch ETL routines.
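A guided, metadata-driven specification of this kind typically compiles down to ordinary SQL or ETL code behind the scenes. A toy sketch of that generation step — the spec format, function name, and column names are all hypothetical, not the IDS format:

```python
# Hypothetical sketch: compile a source-to-target mapping spec into a SQL view.
def generate_sql_view(spec):
    cols = ", ".join(
        f"{rule}({src}) AS {tgt}" if rule else f"{src} AS {tgt}"
        for src, tgt, rule in spec["mappings"]
    )
    return f"CREATE VIEW {spec['target']} AS SELECT {cols} FROM {spec['source']}"

spec = {
    "source": "raw_customers",
    "target": "dim_customer",
    "mappings": [
        # (source column, target column, optional transformation rule)
        ("cust_id", "customer_key", None),
        ("cust_name", "customer_name", "UPPER"),  # a simple quality rule
    ],
}
view_sql = generate_sql_view(spec)
```

The analyst works only at the level of the spec; whether the output is deployed as a virtualized view, a web service, or a batch routine is a deployment decision, not a re-authoring exercise.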
Informatica leverages this data virtualization solution as part of the overall platform to enable physical and virtual data integration, depending on business needs. Informatica calls this "multiprotocol data provisioning." It is technically an extension of Informatica's core data services architecture, exposing data through SQL endpoints via ODBC or JDBC, as web services, or through PowerCenter as batch processes. The key benefit is governance, since the multiprotocol provisioning is based on a common logical data object and common policy definitions.
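The governance argument rests on one definition feeding every delivery channel. A minimal sketch of that dispatch, with entirely hypothetical names and output shapes:

```python
# Hypothetical sketch of multiprotocol provisioning: one governed logical
# data object rendered through different delivery protocols.
def provision(logical_object, protocol):
    name, query = logical_object["name"], logical_object["query"]
    if protocol == "sql_view":       # virtualized access via ODBC/JDBC
        return f"CREATE VIEW {name} AS {query}"
    if protocol == "web_service":    # published service endpoint
        return {"endpoint": f"/data/{name}", "backing_query": query}
    if protocol == "batch":          # batch materialization
        return f"INSERT INTO {name}_snapshot {query}"
    raise ValueError(f"unknown protocol: {protocol}")

customer = {"name": "customer_v",
            "query": "SELECT id, name FROM customers WHERE active = 1"}
view = provision(customer, "sql_view")
svc = provision(customer, "web_service")
```

Because every protocol renders the same logical definition, a policy change made once applies consistently across all consumption paths — which is the governance benefit the report highlights.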
APPENDIX
Disclaimer
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (a subsidiary company of Datamonitor plc). The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.