Hadoop Summit Socialize & Splunk

Copyright
2013 Splunk Inc.
Big Data at the Speed of Business

Isaac Mosquera
Director of Mobile, ShareThis
Clint Sharp
Principal Big Data Product Manager, Splunk
What We'll Talk About

- Our quest for visibility
- Analyzing at scale
- Splunk and Big Data
- Where do you start?
- Q&A
About Splunk
Company (NASDAQ: SPLK)
- Founded 2004, first software release in 2006
- HQ: San Francisco
- Industry-leading machine data platform
- On-premise, in the cloud and SaaS
- 63 of the Fortune 100
- Largest license: 100 Terabytes per day
Business Model / Products
- 5,600+ Customers
- #1 Big Data Innovator*

* Fast Company's Most Innovative Companies Issue (March 2013)
About ShareThis and Socialize

- ShareThis makes the world more connected, trusted and valuable through sharing
- Powers the social web, touching the lives of 95 percent of U.S.
- Acquires Socialize, which makes mobile and social more engaging
- Socialize integrated into thousands of iOS and Android apps
- Installed on 80M+ devices
Evaluating 20 Billion
Ad Impressions Monthly
Little Bit About Real-Time Bidding

[Diagram: real-time bidding (RTB) flow. Labels: Ad Request, Bid Response, Socialize Bidder, Winning Bidder's Ad, Ad Impression, Ad Click]
All this needs to happen in less than 100 milliseconds!
So What Are Some of the Problems?

Operational
- Ingesting more than 10,000 queries per second
- Which bids are > 100 ms
- Quickly finding any errors within the system
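The two operational questions above (which bids took longer than 100 ms, and where the errors are) can be sketched in a few lines of Python. This is only an illustration: the `BidRecord` type and its field names are hypothetical and are not taken from the Socialize bidder.

```python
from dataclasses import dataclass
from typing import Optional

# Latency budget quoted on the slide: the whole exchange must finish in < 100 ms.
LATENCY_BUDGET_MS = 100


@dataclass
class BidRecord:
    """Hypothetical log record for one bid (field names are assumptions)."""
    bid_id: str
    latency_ms: float
    error: Optional[str] = None  # None means the bid completed without error


def slow_bids(records, budget_ms=LATENCY_BUDGET_MS):
    """Return the bids whose latency exceeded the budget."""
    return [r for r in records if r.latency_ms > budget_ms]


def failed_bids(records):
    """Return the bids that recorded an error."""
    return [r for r in records if r.error is not None]
```

At 10,000 queries per second a scan like this only works on a small sample, which is the motivation for an indexed store in the sections that follow.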
Decision Making (Bid Algorithms)

- Campaign spending
- Campaign efficiency
- Dissect data by: apps, users, devices
Analyzing Big Data Efficiently

1. Collection
2. Storage
3. Analysis/Aggregation
4. Retrieval
Some Options
RDBMS: SQL functions like count() present problems at scale
RDBMS: Write operations too high for a single DB, as well as a single point of failure
NoSQL: Would work well for high inserts and queries; however, we would need to build alerting, charting and reporting dashboards
Hadoop: Easy to set up and query using Hive; however, we would have to set up a new environment and learn new technology
Splunk Fits the Bill

Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off we hit a script which shuts off the bidder.
Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms.
Application Reporting: Instantly know campaign metrics for us and our clients.
Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
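The alert-driven shutoff mentioned under operational reporting might look roughly like the sketch below. Everything here is hypothetical: the `Bidder` class, the spend threshold and the function name are stand-ins for illustration, not the actual script or Splunk's alert interface.

```python
class Bidder:
    """Stand-in for the bidding service; it only tracks an on/off flag."""

    def __init__(self):
        self.enabled = True

    def shut_off(self):
        self.enabled = False


def handle_spend_alert(bidder, spent_dollars, limit_dollars):
    """Shut the bidder off when spend exceeds the limit.

    Returns True if the shutoff was triggered, False otherwise.
    """
    if spent_dollars > limit_dollars:
        bidder.shut_off()
        return True
    return False
```

The design point is that the alert action is a hard stop rather than a notification, so erroneous spending ends without waiting for a person to react.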
Analysis/Aggregation
index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(price/1000) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
[Diagram: three Indexers, Search Head, RDBMS (Generated Reports)]
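For readers who do not know the search language, the per-minute rollup in the query above can be approximated in plain Python. This is a sketch under assumptions: events are taken to be dicts with `time` (epoch seconds), `campaign_id` and `price` keys, and `price` is read as a CPM value, which is why it is divided by 1,000 to get dollars spent per display.

```python
from collections import defaultdict


def aggregate_displays(events):
    """Roll displayed-ad events up by (campaign_id, minute).

    Each event is assumed to be a dict with 'campaign_id', 'time'
    (epoch seconds) and 'price' (CPM) keys.
    """
    buckets = defaultdict(list)
    for event in events:
        # Floor the timestamp to the start of its minute (like span=1m).
        minute = event["time"] - (event["time"] % 60)
        buckets[(event["campaign_id"], minute)].append(event["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute,
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

Each returned row has the same columns that the query writes to the reporting table.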
Using Splunk to Analyze Operational Data

Interactive analysis with Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime
Easily digest information through charts
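The three statistics in that search (average, 95th percentile and standard deviation of response time) can be reproduced in Python for a quick sanity check. This is a sketch, not Splunk's implementation: the percentile here uses the simple nearest-rank method, which may differ slightly from the value Splunk reports, and the function name is made up for the example.

```python
import statistics


def response_time_stats(times):
    """Return the average, nearest-rank p95 and sample standard deviation.

    Needs at least two values, because the sample standard deviation
    is undefined for a single observation.
    """
    ordered = sorted(times)
    # Nearest-rank percentile: rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg_rtime": statistics.mean(ordered),
        "p95_rtime": ordered[rank - 1],
        "stdev_rtime": statistics.stdev(ordered),
    }
```

Reporting p95 next to the average matters for a latency budget, since a healthy mean can hide a slow tail of requests.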
Final Architecture
[Architecture diagram: Socialize Bidder; Splunk (three Indexers, Search Head); Cache Cluster (three Memcache nodes); S3 Snapshots; RDBMS (Generated Reports)]
So, What is Splunk?
Expanding Universe of Data Sources

2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=2012-12-05 07:06:44 Email.jdoe@gmail.com Email_Opt_In_c Customer_Street _Address_c=123 Main St. purchased_product_id= product_i BD-01 twitter_username john_t_doe
Business Application Data

Highly Structured
Machine-generated Data
Human-generated Data
Arbitrarily Structured
Industry Leading Platform for Machine Data

Any Machine Data
Operational Intelligence
- Ad hoc search
- Monitor and alert
- Report and analyze
- Custom dashboards
- Developer Platform
HA Indexes and Storage
Commodity Servers
Analyzing Heterogeneous Data

Universal Index
- No data normalization
- Automatically handles timestamps
- Parsers not required
- Index every term & pattern blindly
- No attempt to understand up front

Schema-on-the-fly
- Structure applied at search-time
- No brittle schema to work around
- Automatically find transactions, patterns and trends

Flexibility and Fast Time to Value
- Normalization as it's needed
- Faster implementation
- Easy search language
- Multiple views into the same data
Gain Critical Insights in Real-time

Sources
Order Processing
Customer ID
Order ID
Product ID
Order ID
Middleware Error
Customer ID
Time Waiting On Hold

Care IVR
Customer ID, Twitter ID, Customer's Tweet
Twitter
Company's Name
Deep Visibility and Insight for IT and Business

- IT Operations Management
- Application Management
- Security and Compliance
- Web Intelligence
- Business Analytics
- Industrial Data / Internet of Things
Over 5,600 organizations using Splunk across IT and business users
Driving Insights from Big Data
The ShareThis Insights Platform

On Father's Day: What were the most shared-about topics? What types of beer do people drink?
Hadoop API ETL

Pre-aggregation Analytics
Finding the Optimal Approach

What should be the core focus or competency of your team?
- Hadoop and MapReduce are great for complex data science on data at rest. The previous architecture took 9 months with a team of engineers, data architects, etc.
- The Splunk platform delivers real-time, interactive analysis. We can build many of the same insights within 1 hour.
- Conclusion: find the optimal approach for the business
What About
Ad Hoc Analysis?
PR Insights Example
- What was the situation? (e.g. fast moving business, needed real-time insights)
- What was the PR team struggling with? Difficult to find useful data to build interesting use cases
- What did they want? A flexible real-time reporting environment to extract insights useful for the market
- How did my team help? Delivered a single dashboard that contained real-time data on sharing behaviors across our network
PR Insights Dashboard
Let's not forget

The low-hanging fruit
Operational Analytics for an Online World

Driving Superior Customer Experience
How many 500 errors have I had over time?
Look for anomalies and spikes!
Zone in directly to the customer!
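Counting 500 errors over time and looking for spikes can be sketched as below. The input shape (pairs of epoch seconds and HTTP status) and the spike rule (a minute whose count is more than twice the mean) are assumptions chosen for illustration, not something prescribed by the talk.

```python
from collections import Counter


def errors_per_minute(requests):
    """Count HTTP 500 responses per minute.

    `requests` is an iterable of (epoch_seconds, status) pairs.
    """
    counts = Counter()
    for timestamp, status in requests:
        if status == 500:
            # Floor the timestamp to the start of its minute.
            counts[timestamp - timestamp % 60] += 1
    return dict(counts)


def spikes(counts, factor=2.0):
    """Return the minutes whose error count exceeds factor times the mean."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(m for m, c in counts.items() if c > factor * mean)
```

A flagged minute is the starting point for drilling down to the affected customer.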
Online Device Notifications
Notifications Systems
[Diagram labels: website, API, Notification, Apple (APNS), Feedback Processor, Google (GCM)]
One More Thing
Announcing Hunk Beta

New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop
Splunk Analytics for Hadoop
Derive Actionable Insights from Raw Data

1. Point Splunk at Hadoop Cluster
2. Immediately start exploring, analyzing and visualizing raw data in Hadoop
[Diagram labels: Explore, Analyze, Visualize, Dashboards, Share; Hadoop Storage]
Learn More
splunk.com/bigdata
Questions?
