
KAVE: core platform for the Data Lake
How the KAVE offers the relevant technology to cover the use cases of a Data Lake solution
Table of contents
• Open Source for Analytics: an established yet evolving trend
• Closed-source D&A: drawbacks & risks
• Data Lakes
• What is the KAVE?
• KAVE & the fulfillment of the Data Lake evolution
  • Data Warehouse & Business Intelligence functionalities
  • Controlling the access and usage of the data
  • From experiments to production
  • The modern Cloud experience

© 2017 KPMG N.V., a Dutch limited liability company, is a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (‘KPMG International’), a Swiss entity. All rights reserved. 2
Open Source for Analytics: an established yet evolving trend
The most successful Fortune 500 companies run and grow on open source Data & Analytics software

Open source technology is empowering:
• Processing 510,000 comment postings, 293,000 status updates and 136,000 photo uploads per minute at Facebook
• An estimated 40K+ node cluster storing 500+ PB of data at Yahoo!
• A repository of the biometric data of almost 1B citizens at Aadhaar, India
Walmart’s investment in open source is as big as the company: $6M+ (2016, estimate)

Google has released over 900 open source projects, totaling over 20M lines of code. Developer time spent on open source amounts to about $1B worth of salaries per year.
Barclays claimed to have cut costs by up to 90% in the last five years by adopting open source for its cloud strategy
Closed-source D&A: drawbacks & risks
Less efficiency & flexibility for data exploration: analysts’ tools & techniques vary too widely

Data analytics software usage & relevance growth: open source vs proprietary
[Chart: growth in usage & relevance (scale 0% to 300%) of open source vs proprietary (also partially) software. Source: KDnuggets 2016 Software Poll]

Lock-in: solutions cannot easily be integrated, customized or migrated

US Federal Source Code Policy: at least 20% of newly developed custom code must be released as open source, to encourage reuse and prevent lock-in

Falling behind: state-of-the-art techniques cannot be introduced and systems cannot be redesigned

In 2008 Nokia successfully open-sourced its Symbian OS handset system, at a cost of about $300M.
It was too late: most hardware vendors had already moved to Android, Google’s open source mobile system.
No agility: rigid licensing and scaling models make it difficult to react to the market and evolve
“In prior eras industry players lacking technical competence outsourced the job […], game changes determined innovation was not coming from there, and even if it did, licensing would be a non-starter in scale-out environments” S. O’Grady

Data Lakes
“A Data Lake is a centralized, integrated and large-scale data repository for the organization. The Data Lake empowers a pan-organizational and holistic view on the information. It collects all of the relevant organizational data assets with a structure-oblivious approach.”
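The “structure-oblivious approach” in this definition is commonly known as schema-on-read: records are stored exactly as they arrive, and a structure is applied only when the data is read. The minimal Python sketch below illustrates the idea under simple assumptions: an in-memory list stands in for lake storage, and the record fields and the ingest / read_with_schema helpers are hypothetical, not part of KAVE.

```python
import json
from typing import Any, Callable

# Hypothetical raw landing zone: records are kept exactly as received,
# with no schema enforced at write time.
raw_zone: list[str] = []


def ingest(raw_line: str) -> None:
    """Store a raw record without validating or reshaping it."""
    raw_zone.append(raw_line)


def read_with_schema(fields: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """Apply a schema at read time: select and cast the requested fields,
    use None for missing ones, and skip records that are not JSON objects."""
    rows = []
    for line in raw_zone:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input stays in the lake; this reader ignores it
        if not isinstance(record, dict):
            continue
        rows.append(
            {name: (cast(record[name]) if record.get(name) is not None else None)
             for name, cast in fields.items()}
        )
    return rows


ingest('{"customer": "A12", "amount": "19.90", "channel": "web"}')
ingest('{"customer": "B07", "amount": 5}')
ingest("not json at all")

# Two readers impose two different schemas on the same raw data.
print(read_with_schema({"customer": str, "amount": float}))
print(read_with_schema({"customer": str, "channel": str}))
```

The design choice is that ingestion never fails because of the shape of the data; the cost moves to read time, where each consumer decides which fields it needs and how to interpret them.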
The Data Cycle
[Diagram: the data cycle across business units such as Sales and R&D]

The Data Lake: driving the analytics evolution
[Diagram: Transformation target; People; Technology & Collaboration; Impact]

The Data Lake: analytics processes & new strategies
• Predictive maintenance
• Risk analysis
• Data-driven decision making
• Pricing models
• Inventory & chain intelligence
• Optimized marketing
• Data integration

Data: not a by-product but a source of value

Enterprise Data Lake: analytics-driven organization

Data analytics: create value from data
• Collect & integrate
• Data-on-demand, agile access
• Frameworks for data exploration, proofs of concept and production

Focus: people
• Not just a tech trend: real value for CIOs
• Main reference for CDOs
• Enhanced customer experience, ad-hoc scenarios
• Comply with the organization’s structure with respect to data
• Make the most of your analyst team, attract new talent

What is the KAVE?
KAVE: extension of the Hortonworks Hadoop distribution
[Diagram: the KAVE extension layered on top of the Hortonworks Data Platform distribution, which is built on the Hadoop core software]

Data Lakes’ established technology ecosystem: Hadoop

• De-facto industry standard for Big Data
• Modern architecture & service

Open source:
• Free, no license cost
• Commercial use allowed
• Customizable, no lock-in
• Professional support

KAVE: extension of the Hortonworks Hadoop distribution
Hortonworks Data Platform distribution:
• Standard installation, partially automated
• Additional software (management, monitoring,…)
• Vendor solution: global tech support

KAVE: extension of the Hortonworks Hadoop distribution
• Data exploration & analysis
• Collaboration & Development
• BI/visualization
• Web interfaces integration
• Integrated security layer
• Automated installation on Microsoft Azure

KAVE: extension of the Hortonworks Hadoop distribution
[Diagram: Development and BI components]

KAVE: extension of the Hortonworks Hadoop distribution
Continuous improvement, up to date with Data Lake & Analytics technology

KAVE & the fulfillment of the Data Lake evolution
Enterprise Data Lake: topics & directions
• Security & Compliance
• ETL
• DWH/BI functionality
• Modern cloud deployment
• Agile PoC & development

Data Warehouse & Business Intelligence functionalities
The traditional DWH/BI stack
[Diagram: the traditional DWH/BI stack, fed by ETL]

Traditional DWH/BI stack: capacity scale
• Costs?
• Performance?
• SQL-only?

Enterprise Data Lake: ELT scaling in KAVE

[Diagram: Load (L) steps into the lake and the DWH, with many Transform (T) steps distributed across the cluster]
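One common way to scale the Transform (T) steps of ELT is to split the loaded data into partitions, transform each partition independently, and merge the partial results. The minimal Python sketch below shows that split-transform-merge pattern; the partitions, the row format and the transform / merge helpers are hypothetical, and a thread pool merely stands in for cluster workers (a real deployment would use a distributed processing engine).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw data already loaded into the lake, split into partitions.
# On a real cluster each partition would sit on (and be processed by) a different node.
partitions = [
    ["2017-01-03,NL,120", "2017-01-03,DE,80"],
    ["2017-01-04,NL,95", "bad-row"],
    ["2017-01-05,BE,40"],
]


def transform(partition: list[str]) -> dict[str, int]:
    """Transform one partition independently: parse rows, drop malformed ones,
    and total the amount per country."""
    totals: dict[str, int] = {}
    for row in partition:
        parts = row.split(",")
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # malformed row
        _, country, amount = parts
        totals[country] = totals.get(country, 0) + int(amount)
    return totals


def merge(partials: list[dict[str, int]]) -> dict[str, int]:
    """Combine the per-partition totals into one result."""
    merged: dict[str, int] = {}
    for partial in partials:
        for country, amount in partial.items():
            merged[country] = merged.get(country, 0) + amount
    return merged


# Each partition is transformed concurrently; threads stand in for cluster workers.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(transform, partitions))

print(merge(partials))  # {'NL': 215, 'DE': 80, 'BE': 40}
```

Because each transform call only sees its own partition, adding partitions (and workers) increases throughput without changing the transformation logic.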

Enterprise Data Lake: evolution of the DWH/BI stack

• ELT, Extract-Load-Transform
• Streaming & Realtime
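The difference between ETL and ELT is the order of the last two steps: in ELT the raw data is loaded first, unchanged, and transformed afterwards inside the target platform. The Python sketch below shows that order, with the standard-library sqlite3 module standing in for the Data Lake’s SQL engine; the table names and CSV content are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system, kept as raw CSV text.
SOURCE_CSV = """order_id,country,amount
1,NL,120.50
2,DE,80.00
3,NL,
4,BE,40.25
"""

conn = sqlite3.connect(":memory:")  # stands in for the Data Lake's SQL engine

# Extract + Load: copy the rows unchanged into a raw table (everything as TEXT).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount TEXT)")
rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))
conn.executemany("INSERT INTO raw_orders VALUES (:order_id, :country, :amount)", rows)

# Transform: runs afterwards, inside the engine, on the data already loaded.
# Incomplete rows are kept in raw_orders and only filtered out here.
conn.execute(
    """
    CREATE TABLE revenue_by_country AS
    SELECT country, ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
    FROM raw_orders
    WHERE amount <> ''
    GROUP BY country
    """
)

for country, revenue in conn.execute(
    "SELECT country, revenue FROM revenue_by_country ORDER BY country"
):
    print(country, revenue)
```

Keeping the untouched raw_orders table is the point of ELT: a later, different transformation can run on the same loaded data without extracting it from the source again.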

KAVE: fully automated ETL facilities


• Define, schedule and manage ETL pipelines in a graphical way
• Integrated and automatic metadata creation and management
• Ad-hoc RDBMS import (Oracle, Postgres, MySQL…)
• Build pipelines of any complexity for the best transformation strategy
• Seamless import of heterogeneous data sources (logs, queues, files, webpages…)
• Advanced and optimized Hadoop storage formats
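A pipeline “of any complexity” is usually modelled as a directed acyclic graph (DAG) of tasks, where each task runs only after its dependencies have finished. A minimal sketch with Python’s standard graphlib module (Python 3.9+) follows; the task names are hypothetical and do not reflect KAVE’s actual ETL tooling. Tasks returned in the same batch do not depend on one another, so a scheduler could run them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "import_rdbms": set(),                  # e.g. tables from an RDBMS
    "import_logs": set(),                   # e.g. raw log files
    "clean_logs": {"import_logs"},
    "join_customers": {"import_rdbms", "clean_logs"},
    "write_columnar": {"join_customers"},   # e.g. an optimized storage format
    "update_metadata": {"write_columnar"},  # register schema and lineage
}


def run(task: str) -> None:
    # Placeholder for the real work; this sketch only logs the task name.
    print("running", task)


sorter = TopologicalSorter(pipeline)
sorter.prepare()  # raises graphlib.CycleError if the pipeline contains a cycle
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies have all finished
    batches.append(ready)
    for task in ready:
        run(task)
        sorter.done(task)

print(batches)
```

Here the two imports form the first batch, and every later step waits for its inputs, ending with the metadata update.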

Enterprise Data Lake: OLAP & OLTP workloads

[Diagram: OLAP and OLTP workloads on the Data Lake, accessed via JDBC/ODBC and wrappers]
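The two workload types differ in shape: OLTP issues many short transactions that touch single rows, while OLAP runs read-only queries that scan and aggregate many rows. The sketch below contrasts them using Python’s built-in sqlite3 module, whose standard DB-API interface plays a role similar to the JDBC/ODBC layer in the diagram; the schema and data are hypothetical, and a real Data Lake would route each workload to an engine suited to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for a Data Lake SQL endpoint
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount REAL)"
)


def place_order(customer: str, country: str, amount: float) -> int:
    """OLTP-style: one short transaction inserting a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, country, amount) VALUES (?, ?, ?)",
            (customer, country, amount),
        )
    return cur.lastrowid


def correct_amount(order_id: int, amount: float) -> None:
    """OLTP-style: one short transaction updating a single row by key."""
    with conn:
        conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (amount, order_id))


first = place_order("A12", "NL", 120.0)
place_order("B07", "DE", 80.0)
place_order("C33", "NL", 95.0)
correct_amount(first, 110.0)

# OLAP-style: one read-only query that scans and aggregates many rows.
report = conn.execute(
    "SELECT country, COUNT(*) AS n_orders, SUM(amount) AS revenue "
    "FROM orders GROUP BY country ORDER BY revenue DESC"
).fetchall()
print(report)  # [('NL', 2, 205.0), ('DE', 1, 80.0)]
```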

KAVE: OLAP & OLTP workloads

BI & Big Data: are we there yet?
[Diagram: R&D, CRM and Logistics data sources]

KAVE: reports & visualization

[Diagram: BI platform connected to KAVE via JDBC/ODBC]

KAVE: reports & visualizations

Controlling the access and usage of the data
Enterprise Data Lake: security & governance

Enterprise Data Lake: security & governance

Sales and R&D departments’ data: the Finance department has no access to it

Enterprise Data Lake: security & governance

The Finance department cannot run Spark on the test cluster
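Both example rules can be expressed as a default-deny policy: a group may perform an action on a resource only if an explicit rule allows it. A minimal Python sketch follows; the group names, the “data:department” and “service:name@cluster” resource naming and the Rule / is_allowed helpers are hypothetical and do not describe how KAVE’s security layer is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes rules hashable, so they can live in a set
class Rule:
    group: str     # who, e.g. a department
    resource: str  # what, e.g. "data:sales" or "service:spark@test"
    action: str    # how, e.g. "read" or "submit"


# Hypothetical allow-list: anything not listed here is denied.
ALLOW = {
    Rule("sales", "data:sales", "read"),
    Rule("rnd", "data:rnd", "read"),
    Rule("rnd", "service:spark@test", "submit"),
    Rule("finance", "data:finance", "read"),
}


def is_allowed(group: str, resource: str, action: str) -> bool:
    """Default-deny check: True only if an explicit rule matches exactly."""
    return Rule(group, resource, action) in ALLOW


print(is_allowed("finance", "data:sales", "read"))            # False: no rule grants it
print(is_allowed("finance", "service:spark@test", "submit"))  # False: Spark on test is not granted
print(is_allowed("rnd", "service:spark@test", "submit"))      # True
```

Default-deny keeps the policy auditable: access exists only where a rule was deliberately written, so a new department or cluster starts with no rights at all.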

KAVE: full data management on secured infrastructure

From experiments to production
Data-centric software products with cycle optimization
• Brainstorming & exploration on data
• Definition & consolidation of needed datasets
• Business-significant proof-of-concept
• Productization & deployment
• Solution monitoring and value measurement
• Corrections & improvements
KAVE & data-centric development model: at a glance

Prototype data product deployment for the web
[Diagram: deployed prototype serving web traffic]

The modern Cloud experience
The many unknowns of low-automation infrastructure
• IT support: slow & old infrastructure, “open a ticket” model
• Sustained investment
• Quality of service, service-level agreements
• Process bottleneck: wait for IT
• On-premises vs Cloud dichotomy
• Security: compliance & regulations
• Uncertainty in operational costs
• Rollout & releases
• Modularity & isolation

Satisfying the data product cycle: continuous delivery
Deployment-oblivious solution → Deployment blueprint definition → Deployment resources allocation → On-demand solution deployment (AUTOMATED)
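The chain above separates what to deploy (a blueprint that does not depend on where it runs) from where it runs (resource allocation). The Python sketch below illustrates that split under simple assumptions: the Blueprint and Node types, the slot-based capacity model and the “most free slots first” placement rule are all hypothetical, and the format is not that of any real deployment tool.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """Deployment-oblivious description: which services, how many instances."""
    services: dict[str, int]  # service name -> number of instances


@dataclass
class Node:
    name: str
    slots: int  # how many service instances this machine can host
    assigned: list[str] = field(default_factory=list)

    def free(self) -> int:
        return self.slots - len(self.assigned)


def allocate(blueprint: Blueprint, nodes: list[Node]) -> dict[str, list[str]]:
    """Place every service instance on the node with the most free slots.
    Raises RuntimeError when capacity runs out; an automated platform would
    react to this by requesting more resources."""
    for service, count in blueprint.services.items():
        for _ in range(count):
            node = max(nodes, key=Node.free)  # the first node wins ties
            if node.free() <= 0:
                raise RuntimeError(f"not enough capacity for {service}")
            node.assigned.append(service)
    return {node.name: node.assigned for node in nodes}


blueprint = Blueprint({"namenode": 1, "datanode": 3, "gateway": 1})
cluster = [Node("node-1", 2), Node("node-2", 2), Node("node-3", 2)]
print(allocate(blueprint, cluster))
```

Because the blueprint never mentions machines, the same blueprint can be replayed on a larger or smaller cluster, on premises or in the cloud; only the allocation step changes.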

The need for an integrated, dynamic and automated infrastructure
• Seamless across premises & cloud
• API interface automation
• Scale-as-you-go
• Guaranteed service support
• Failure escalation
• Turn-key solution
• Direct control but intelligent self-healing
• Accurate cost control
• Modularity

The preferred infrastructure model for KAVE: Azure

• Datacenters: tens of locations worldwide, compliance & localization
• PaaS (platform-as-a-service): extensive support for web applications
• Security: enterprise customizable security levels
• IaaS (infrastructure-as-a-service): fully automated virtual infrastructure management
• Service & Availability: per-service guarantees, virtually 100%
• Costs & Billing: ad-hoc minute-precision billing schemes; suspendable services
• Marketplace: basic Microsoft services and vendor offerings; vast offer, direct vendor contact
• Modularity & Coverage: dozens of independent and integrated services
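The benefit of minute-precision billing and suspendable services is simple arithmetic. The sketch below compares three ways of paying for the same small PoC cluster; the hourly rate, cluster size and usage pattern are hypothetical, not Azure prices, and suspended machines are assumed to incur no compute charge.

```python
import math

HOURLY_RATE = 0.60  # hypothetical price per VM-hour (EUR), not an actual Azure price


def billed_per_minute(minutes: int, rate: float = HOURLY_RATE) -> float:
    """Pay exactly for the minutes used."""
    return minutes * rate / 60


def billed_per_started_hour(minutes: int, rate: float = HOURLY_RATE) -> float:
    """Pay every started hour in full (for comparison)."""
    return math.ceil(minutes / 60) * rate


# Hypothetical PoC: 4 VMs, used 95 minutes per day for 20 working days,
# suspended for the rest of the time.
vms, minutes_per_day, days = 4, 95, 20
vm_days = vms * days

per_minute = vm_days * billed_per_minute(minutes_per_day)
per_hour = vm_days * billed_per_started_hour(minutes_per_day)
always_on = vm_days * billed_per_minute(24 * 60)  # never suspended

print(f"minute billing: {per_minute:.2f}")
print(f"hourly billing: {per_hour:.2f}")
print(f"always on:      {always_on:.2f}")
```

Under these assumptions the script prints 76.00, 96.00 and 1152.00: rounding up to full hours costs about a quarter more, while leaving the cluster running costs roughly fifteen times as much as suspending it.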
Azure: modern web user experience

Azure: preferred infrastructure for KAVE
A full Data Lake core deployment with just a wizard!

The idea of Data Lake as a service with KAVE
• Deploy mixed on-prem / remote solutions
• Scale the solution in no time with a few clicks
• Agile for solution PoC, enterprise-class for production
• Save budget with exact billing and direct systems access

Try a fully working KAVE instance on the Azure marketplace!

Open source: contribute & extend to fit your needs!

© 2016 KPMG Advisory N.V., registered with the trade register in the Netherlands under number 33263682, is a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (‘KPMG International’), a Swiss entity. All rights reserved. The KPMG name, logo and ‘cutting through complexity’ are registered trademarks of KPMG International.

Thanks!
