
Applying a three-tiered data strategy for success with Tableau and Hadoop
Dan Kogan, Director, Product Marketing, Tableau
David Spezia, Strategic Sales Consultant, Tableau
Big Data Focus

Connectivity: Access to all data
Performance: Fast interaction with all data
Discovery: Finding the right data
Tableau Data Architecture
A combination of in-memory and live query engines

Designed for adaptability and flexibility


Unified user experience for full range of data stores

+
Live Query Engine
Supports the world's largest databases

Attaches directly to enterprise data stores
Compatible with major data platforms
Leverages enterprise data models and security

Data stores shown: data warehouses, data marts and cubes, fast databases
In-Memory Engine
Extremely fast computational ability

Breakthrough in-memory database


Column storage, architecture-aware
Fast performance with massive data
APIs available to support the extended data ecosystem:
Data Extract API & Web Data Connector
Big Data Connectivity Roadmap
[Timeline figure, 2010 to 2016 ("Today"), of Tableau releases adding big data connectors.
Release labels on the timeline: v5.2, v6.1.4, v7.0.7, v7.0.10, v8.0, v8.0.1, v8.1.4, v8.2.3, v8.3.2, v9.0.
Connectors shown: Pivotal Greenplum & HAWQ, Cloudera Hadoop, MapR Hadoop, Hortonworks Hadoop, Datastax Enterprise & Cassandra, Cloudera Impala, Google BigQuery, Amazon Redshift, Amazon EMR, Splunk, MarkLogic, IBM BigInsights, Spark SQL.]
What is Big Data?
Modern data drives need for a new generation of
databases

Relational Databases
Application independent
Scale-up architecture
Structured data only
Schema-on-write
Limited data processing
High cost

Hadoop & NoSQL Databases


Structured & Unstructured
Massive scale scale-out
Schema-on-read
Storage with Compute
Low cost
Analytics for all of your data

Tableau empowers people throughout the organization to answer questions of their data, large or small, in real time. The more questions they ask, the more value they extract from the data, leading to smarter business decisions every day.

Tableau works with best-of-breed technologies and works seamlessly with Big Data databases in addition to more traditional databases, so you can have one interface into all of your data. This makes Tableau itself the best "Tableau for Big Data" tool.

[Chart: Volume of Data axis running from Small Data to Big Data, with a "Too close to call" region]
DATA WAREHOUSE vs. DATA LAKE

            DATA WAREHOUSE                    DATA LAKE
DATA        structured, processed             structured, semi-structured, unstructured, raw
PROCESSING  schema-on-write                   schema-on-read
STORAGE     expensive for large data volumes  designed for low-cost storage
AGILITY     less agile, fixed configuration   highly agile, configure and reconfigure as needed
SECURITY    mature                            rapidly maturing
USERS       business professionals            data scientists, business professionals
Challenges: Query speed

Minutes: Hive on MapReduce
Sub-minute: Hive on Tez or Hive on Spark
Sub-second
Modern Analytics (aka Big Data) Stack

There is more to this stack than Tableau + the database (i.e. Hadoop)
Three areas to consider for successful analytics projects:
1. Data ingestion & preparation in the lake: moving from landing to production
2. Hadoop: the core components, from storage to security, resource management, etc.
3. The Hot Tier: making queries run fast
Data ingestion and prep in the Data Lake
Storage & processing
Query Acceleration
  Query design best practice
  Tableau Data Engine
  Fast analytical DBs
  Is OLAP back?
Cold Warm Hot Framework

Cold: The Data Lake
+ Store Everything and Anything
+ Unknown Questions with Unknown Answers
+ Unstructured / Data Mining / Data Science

Warm: Data Warehouses
+ Data marts prepared for entity analytics
+ Known questions with unknown answers
+ Regularly refreshed business concepts

Hot: In-memory computing
+ Precomputed aggregates to answer specific questions
+ Known questions with known answers
+ Dashboards
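
The framework's question/answer rule can be sketched in a few lines of Python. This is only an illustration of the slide's logic, not a Tableau feature: the `Tier` enum and the `choose_tier` function are hypothetical names introduced here.

```python
from enum import Enum


class Tier(Enum):
    COLD = "cold"  # the data lake
    WARM = "warm"  # data warehouses and data marts
    HOT = "hot"    # in-memory computing, precomputed aggregates


def choose_tier(question_known: bool, answer_known: bool) -> Tier:
    """Map a workload to a tier using the question/answer rule of the framework."""
    if not question_known:
        # Unknown questions with unknown answers: exploration in the lake.
        return Tier.COLD
    if not answer_known:
        # Known questions with unknown answers: prepared data marts.
        return Tier.WARM
    # Known questions with known answers: dashboards on aggregates.
    return Tier.HOT
```

For example, a recurring dashboard is a known question with a known answer, so `choose_tier(True, True)` returns `Tier.HOT`.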
Cold, Warm, Hot Strategy
[Chart: Data Size against Performance across three stages: large data (raw or prepared), prepared data, aggregated data. Annotations: "Hadoop Technology Creep" and "Tableau Hyper Creep".]


Cold: Hadoop Hive

Use Case
  Data Exploration/Mining
  Ad-Hoc Report Conceptual Modeling
  Explore Concepts to Migrate to Analytically Optimized Data Stores

Technologies
  Hive on MapReduce
  Cloudera, Hortonworks, MapR, Amazon EMR, and others

[Diagram: Hadoop (HDFS, Hive)]
Cold Use Cases

Store All of the Data, Analyze Some of the Data


Financial Transaction Records
1970s Style Fixed Width Records
Airline Booking Records
Very Similar to Financial Transactions
eCommerce
Browsing and Click Activity
AB Testing and Campaign Optimization
TelCo Call Detail Records & Web Detail Records
We Know Where You Are
We Know Who You Are, as Your Browsing History Would Suggest
Warm: Relational DB

Use Case
  Data/Reports Migrated from Hadoop
  Regularly Analyzed Data
  Production Reporting
  Compliance/Regulatory Reporting
  Ad-Hoc Data Discovery on Data Entities
  Data/Reports that migrated from the Cold Storage

Technologies
  Microsoft SQL Server
  Oracle
  MySQL
  PostgreSQL
  Impala
  Teradata

[Diagram: Hadoop (HDFS, Hive), Relational DB, In-Memory]
Hot: TDEs & Analytical DBs

Use Case
  BI Applications
  Speed up specific elements in a dashboard
  Analytical Query Acceleration Layer
  Query Isolation from overburdened Warehouse

Technologies
  Tableau Data Extract
  HP Vertica
  Teradata Aster
  Exasol
  Pivotal Greenplum
  AtScale
  Jethro

[Diagram: Hadoop (HDFS, Hive) + TDE]
Tableau Data Extracts: When to use them?

Extracts Recommended             Live Connection Recommended
Slow SQL-on-Hadoop execution     Fast SQL-on-Hadoop engine available
Smaller dataset sizes needed     Larger dataset sizes needed
Offline analysis required        Real-time analysis required
Reduce load on Hadoop cluster
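
As a rough illustration, the criteria above can be turned into a small decision helper. The `Workload` fields, the `recommend_connection` name, and the simple vote counting (ties go to a live connection) are all assumptions made for this sketch; the slide itself does not weight the criteria.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    fast_sql_engine_available: bool  # a fast SQL-on-Hadoop engine is deployed
    needs_large_dataset: bool
    needs_offline_access: bool
    needs_real_time: bool
    must_reduce_cluster_load: bool


def recommend_connection(w: Workload) -> str:
    """Return "extract" or "live" by counting which column's criteria apply."""
    extract_votes = sum([
        not w.fast_sql_engine_available,  # slow SQL-on-Hadoop execution
        not w.needs_large_dataset,        # smaller dataset sizes needed
        w.needs_offline_access,           # offline analysis required
        w.must_reduce_cluster_load,       # reduce load on Hadoop cluster
    ])
    live_votes = sum([
        w.fast_sql_engine_available,
        w.needs_large_dataset,
        w.needs_real_time,
    ])
    # Assumption of this sketch: a tie falls back to a live connection.
    return "extract" if extract_votes > live_votes else "live"
```

In practice a single hard requirement, such as offline analysis, may decide the matter on its own; the vote count is only a starting point.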


Optimize your Tableau Data Extracts

Extract Sampling Techniques

Filters
  Keep only well-known dimensions and measures
  Use short date ranges
Aggregates
  Aggregate dimensions and measures when possible
  Roll-up dates when possible
Samples
  Utilize Custom SQL with sample function
Top N
  May be skewed since non-random sampling
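
The filter, aggregate, and sample techniques can be combined in one Custom SQL statement. The Python helper below is a minimal sketch that builds such a statement as a string; `build_extract_sql`, the table, and the column names are hypothetical, and the sampling predicate assumes the engine provides a `rand()` function returning a uniform value in [0, 1), as Hive does.

```python
def build_extract_sql(table, dimensions, measures, date_column,
                      start_date, end_date, sample_fraction=None):
    """Build a Custom SQL string that filters, aggregates, and optionally samples."""
    select_dims = ", ".join(dimensions)
    select_measures = ", ".join(f"SUM({m}) AS {m}" for m in measures)
    # Filter: keep a short date range.
    where = [f"{date_column} >= '{start_date}'", f"{date_column} < '{end_date}'"]
    if sample_fraction is not None:
        if not 0 < sample_fraction <= 1:
            raise ValueError("sample_fraction must be in (0, 1]")
        # Sample: random row sampling, assuming a Hive-style rand() function.
        where.append(f"rand() < {sample_fraction}")
    # Aggregate: one row per combination of the kept dimensions.
    return (
        f"SELECT {select_dims}, {select_measures} "
        f"FROM {table} "
        f"WHERE {' AND '.join(where)} "
        f"GROUP BY {select_dims}"
    )
```

Two caveats apply: sums computed over a sample need to be scaled by the sampling fraction to estimate full totals, and the string interpolation here is only suitable for trusted, hand-written inputs.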
General Techniques for Improvement (Hadoop)

Partition field as filter

Storage file format

Initial SQL & Custom SQL

Single denormalized table

Monitor for long running queries

Execution engines
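
To show why filtering on the partition field matters, here is a toy Python model of a partitioned table. It is not how Hive is implemented; `PartitionedTable` and its row counter are hypothetical and exist only to make the scan cost visible.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table whose rows are stored in one bucket per partition value."""

    def __init__(self, partition_field):
        self.partition_field = partition_field
        self.partitions = defaultdict(list)
        self.rows_scanned = 0

    def insert(self, row):
        self.partitions[row[self.partition_field]].append(row)

    def query(self, predicate, partition_value=None):
        # With a partition filter only one bucket is read; without it, all are.
        if partition_value is not None:
            buckets = [self.partitions.get(partition_value, [])]
        else:
            buckets = list(self.partitions.values())
        result = []
        for bucket in buckets:
            for row in bucket:
                self.rows_scanned += 1
                if predicate(row):
                    result.append(row)
        return result
```

Both query styles return the same rows, but the one that names the partition value touches only that partition's rows, which is the effect a partition filter is meant to achieve on a real cluster.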
Certain pitfalls must be avoided (Hadoop)

Data blending large datasets
  Avoid blending a large Hadoop dataset with a second dataset
  Executed on the Tableau client side
Unnecessary joins
  Imperfectly implemented on many big data systems
Connections with huge numbers of schemas and columns
Inefficient formulas
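
The blending pitfall is easier to see with a simplified model. The sketch below assumes that each source is first aggregated to the linking field and that the two result sets are then combined on the client; `aggregate` and `blend` are hypothetical helpers, not Tableau APIs.

```python
from collections import defaultdict


def aggregate(rows, key, measure):
    """Sum a measure per linking-field value, as each source would do on its own."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def blend(primary, secondary):
    """Left-join two already-aggregated result sets on their shared key, client side."""
    return {k: (v, secondary.get(k)) for k, v in primary.items()}
```

Under this model the client holds one entry per distinct linking value from each source, so blending on a high-cardinality field from a large Hadoop dataset moves a lot of data to the client before any combining happens.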
Leverage a multi-tiered approach based on your data

Raw data (large): Hadoop (HDFS, Hive) + Impala, Spark SQL, Hive LLAP, Presto, Drill
Prepared data: Fast analytical database
Aggregated data: TDE
Human Scale of Data
[Figure: Aggregation Level against Consumable Data. Labels: "Single Consumable Chunk of Data" and "Chunks of Human Consumable Data at the Human Scale (Dashboard)".]
Aggregation of Data Tiers

Year to Quarter to Month to Week to Day to Records
Region to Country to State to County to Zip Code
Drill Down to Raw Data with Context
Use Aggregates for Guided Drilling
Use Action Filters to Navigate the Pyramid

[Pyramid figure, top to bottom: Year (4), Month (48), Week (105), Day (90), Raw Data ("In the Weeds"). Navigation labels: Filter Year, Filter Month, Filter Week, Filter Day, Select Dimension, Select Month, Select Week.]
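
Navigating the pyramid amounts to re-aggregating at a finer level while carrying the selection made at the level above. The Python sketch below illustrates that idea on in-memory records; `rollup` and the tuple-based filter context are assumptions of this example and do not describe how Tableau action filters are implemented.

```python
from collections import defaultdict
from datetime import date


def rollup(records, level, context=None):
    """Sum amounts at one date level, restricted to the filter context."""
    keys = {
        "year": lambda d: (d.year,),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }
    key_of = keys[level]
    totals = defaultdict(float)
    for day, amount in records:
        key = key_of(day)
        # Keep only rows under the selected parent, e.g. context=(2016,) for a year.
        if context is not None and key[:len(context)] != context:
            continue
        totals[key] += amount
    return dict(totals)
```

A user would start from `rollup(records, "year")`, pick a year, then call `rollup(records, "month", (2016,))`, and so on down to the day level, so that each view stays small enough to read.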
Action Filters: Big Data Secret Weapon

Use Action Filters to Jump from Tier to Tier with a filter context
Drill Down to the Details
Leave the Data in the Appropriate Data Architecture
  Hot - Analytical Query
  Warm - Entity Query
  Cold - Data Discovery

HOT: Dashboard or Document Acceleration, High Performance, Aggregations, Persistence
WARM: Row Level Security, Live Connections, Core Report Development
COLD: Data Mining, Detailed Data, Raw Data, Machine Learning
Tableau Big Data Customer Use Cases

1. Wargaming: MMO Gaming Focus

2. King: Mobile Gaming Focus

3. GoPro: Cameras for Today's YOLO Adventure-Obsessed Youth


Wargaming

1. Use Case
I. Move from Legacy System & Excel to Agile Visual System
II. Game Health: Are games being played as designed?
III. Business Health: Metrics and measures of regional revenue objectives
IV. Market Health: Global competitive, economic and other external factors

2. High Level Architecture


I. Tableau for Analysis and Self Service
II. AtScale for Business Friendly Data
III. Cloudera Impala for Raw Document Storage
IV. Ingest Raw Data from 8,000+ game servers
Wargaming Lessons Learned

1. What did we Learn?


I. Tableau with AtScale Cubes on Hadoop are effective for providing self-service concepts at
scale
II. AtScale cubes can store business logic, hierarchies and measures
III. AtScale can use logical cube structures, without forcing time to be spent on data processing
IV. Cloudera is a leader with 30%+ of market share
V. Users can both fish and persist the data
VI. AtScale is a technology partner changing the game with big data analytics clients
King

1. Use Case
I. 1.5 Billion Game Plays per day with 149 million daily active users
II. Generate 1 Petabyte+ of data per year; 25 Billion Events per Day!
III. Striking the right balance between making games fun while also making them challenging to
players
IV. Understanding how, when and why players make in-game purchases
V. Overcoming the limitations of Hadoop as an open-source solution for data management and
analytics
2. High Level Architecture

Data Scientist
Game Server
Logs/Activity
Tableau Desktop
Business Analyst
King Lessons Learned

1. What did we Learn?


I. 64 Node Exasol Clusters are Crushingly Fast for Analytical Query
II. Exasol (a rapid fire analytics database from Germany) loves Tableau
III. Exasol created a Tableau Turbo product line
IV. AtScale was preferred by the Data Scientists and Data Miners at King and complements Exasol
V. Like for like on billions of rows, AtScale took 13 seconds and Exasol 5 seconds on average
during King's testing
VI. Business Benefits:
1. Self-optimizing analytic database helps keep the focus on gaming
2. Queries that used to take hours now run in seconds
3. Business insights are available at the right time
4. Flexible solution means staying responsive to rapidly changing business requirements
GoPro

1. Use Case
I. Created Data Science and Engineering Team inside Product Org
II. Cloudera Data Lake Platform called TPS (The Philosopher's Stone)
III. Combining Data from Email Campaign Data + Google Analytics (CSV & JSON)
2. High Level Architecture

Tableau Desktop
GoPro Lessons Learned

1. Tableau Democratizes the Analysis of Data


2. Trifacta Democratizes the Data Wrangling
3. Impala/Parquet is the Exploratory Data Source for Tableau
4. Extracts for production workbooks
5. Combination of Hadoop/Tableau/Trifacta gives Agility and reduces Time
to Insight
Lesson Learned Summary

Tableau is a Leader in Big Data Analytics


Self-service Big Data Analytics requires applying some logic
to an appropriate data architecture
Tableau's live connectors to big data platforms are a
differentiator
The Data Engine is useful as a document and
dashboard acceleration layer on top of Big Data
Tableau works with the leaders in the Big Data
ecosystem to develop tight product integrations to
improve your experience
Tableau deployment and data architecture flexibility
allow for agile deployments driving business value
Please complete
the session survey
from the Session
Details screen in
your TC16 app