Clint Sharp
Our quest for visibility
Analyzing at scale
Splunk and Big Data
Where do you start?
Q&A
About Splunk
Company (NASDAQ: SPLK)
Founded 2004, first software release in 2006
HQ: San Francisco
Industry-leading machine data platform
On-premise, in the cloud and SaaS
63 of the Fortune 100
Largest license: 100 terabytes per day
5,600+ customers
ShareThis makes the world more connected, trusted and valuable through sharing
Powers the social web, touching the lives of 95 percent of U.S.
Acquires Socialize, which makes mobile and social more engaging
Socialize integrated into thousands of iOS and Android apps
Installed on 80M+ devices
Evaluating 20 Billion Ad Impressions Monthly
Socialize Bidder
Collection
Storage
Analysis/Aggregation
Retrieval
Some Options
RDBMS: SQL functions like count() present problems at scale. Write operations are too high for a single DB, which would also be a single point of failure.
NoSQL: Would work well for high insert and query rates; however, we would need to build alerting, charting and reporting dashboards.
Hadoop: Easy to set up and query using Hive; however, we would have to set up a new environment and learn a new technology.
Analysis/Aggregation
index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(eval(price/1000)) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
Diagram: three indexers, a search head, and an RDBMS holding the generated reports
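To make the search above concrete, here is a minimal Python sketch of the same per-minute aggregation. It is not Splunk's implementation: the sample events, the `aggregate` and `store` helpers, and the use of `sqlite3` as a stand-in for the MySQL `ads_analytics` table are assumptions for illustration. The SPL counts events carrying `meta.displayed_ad`; the sketch simply assumes every sample event is a displayed ad.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample events: (epoch seconds, campaign_id, CPM price in dollars).
# Assumption: every event is a displayed ad, so each one counts as a display.
EVENTS = [
    (5, "c1", 2.00),
    (30, "c1", 4.00),
    (70, "c1", 3.00),
    (10, "c2", 1.50),
]


def aggregate(events, span=60):
    """Rough equivalent of `bin _time span=1m | stats ... by campaign_id _time`."""
    buckets = defaultdict(list)
    for ts, campaign_id, price in events:
        bucket = ts - ts % span  # floor the timestamp to the start of its minute
        buckets[(campaign_id, bucket)].append(price)
    rows = []
    for (campaign_id, bucket), prices in sorted(buckets.items()):
        displays = len(prices)
        dollars_spent = sum(p / 1000 for p in prices)  # CPM = price per 1,000 impressions
        avg_cpm_price = sum(prices) / displays
        rows.append((campaign_id, bucket, displays, dollars_spent, avg_cpm_price))
    return rows


def store(rows, conn):
    """Stand-in for `mysqloutput`: append the summary rows to a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ads_analytics ("
        "campaign_id TEXT, stat_date INTEGER, displays INTEGER, "
        "dollars_spent REAL, avg_cpm_price REAL)"
    )
    conn.executemany("INSERT INTO ads_analytics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(aggregate(EVENTS), conn)
    for row in conn.execute("SELECT * FROM ads_analytics ORDER BY campaign_id, stat_date"):
        print(row)
```

Running it yields one row per campaign per minute: campaign c1 gets two displays in the first minute and one in the second, which is the shape of report the RDBMS receives.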
Final Architecture
Diagram: Socialize Bidder; Splunk indexers (three); Memcache cache cluster; S3 snapshots; search head
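The slide does not say how the Memcache cluster is used. One common pattern for a cache in front of slower analytics is cache-aside with a time-to-live, and the Python sketch below assumes that pattern. The `TTLCache` class is an in-process stand-in for a real memcache client, and the `loader` callback is a hypothetical fetch from the search tier; neither comes from the original architecture.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcache node: entries expire after `ttl` seconds."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # stale: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def get_or_load(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: return the cached value, or call `loader` and cache its result."""
    value = cache.get(key)
    if value is None:  # note: a cached None is indistinguishable from a miss
        value = loader()  # hypothetical: e.g. fetch the latest campaign aggregate
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = TTLCache()
    calls = []

    def load():
        calls.append(1)
        return {"displays": 2}

    get_or_load(cache, "campaign:c1", load)
    get_or_load(cache, "campaign:c1", load)
    print(len(calls))  # 1: the second read is served from the cache
```

The TTL bounds how stale a cached aggregate can get, which matters for a bidder that reads far more often than the aggregates change.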
Machine-generated Data
Human-generated Data
Arbitrarily Structured
Ad hoc Search
Developer Platform
Commodity Servers
No data normalization
Automatically handles timestamps
Parsers not required
Index every term and pattern blindly
No attempt to understand up front
Structure applied at search time
No brittle schema to work around
Automatically find transactions, patterns and trends
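A toy Python sketch of the "index everything, apply structure at search time" idea above. It only illustrates the concept and is not how Splunk's indexer works: the raw events, the tokenizer that splits on whitespace and `=`, and the `build_index`, `search` and `extract` helpers are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical raw events, kept exactly as received: no parsing, no schema.
RAW_EVENTS = [
    "2014-05-01T10:00:01Z level=ERROR service=middleware order_id=1001 msg=timeout",
    "2014-05-01T10:00:02Z level=INFO service=web customer_id=42 order_id=1001 msg=checkout",
    "2014-05-01T10:00:05Z level=INFO service=web customer_id=7 order_id=1002 msg=checkout",
]


def build_index(events):
    """Index every term blindly: split on whitespace and '=' and map term -> event ids."""
    index = defaultdict(set)
    for event_id, raw in enumerate(events):
        for term in re.split(r"[\s=]+", raw.lower()):
            if term:
                index[term].add(event_id)
    return index


def search(index, events, *terms):
    """Return the raw events that contain every term (AND semantics)."""
    if not terms:
        return list(events)
    ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [events[i] for i in sorted(ids)]


def extract(raw, field):
    """Apply structure at search time: pull a key=value field out of the raw text."""
    match = re.search(rf"\b{re.escape(field)}=(\S+)", raw)
    return match.group(1) if match else None


if __name__ == "__main__":
    idx = build_index(RAW_EVENTS)
    for raw in search(idx, RAW_EVENTS, "checkout", "1001"):
        print(extract(raw, "customer_id"))  # prints 42
```

Nothing about `customer_id` is declared up front; the field only exists once a search asks for it, so a new log format needs no schema change.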
Diagram: related events from different sources sharing fields such as Customer ID, Order ID and Product ID, including a middleware error and Twitter mentions of the company's name
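The point of this slide is that events from unrelated sources can be tied together through fields they share. A minimal Python sketch of that grouping, assuming the fields have already been extracted into dicts; the sample events and the `correlate` and `customers_hit_by_errors` helpers are hypothetical, and this is far simpler than Splunk's own `transaction` command.

```python
from collections import defaultdict

# Hypothetical events from three sources, with fields already extracted.
EVENTS = [
    {"source": "web", "customer_id": "42", "order_id": "1001"},
    {"source": "middleware", "order_id": "1001", "error": "timeout"},
    {"source": "orders_db", "order_id": "1001", "product_id": "P-9"},
    {"source": "web", "customer_id": "7", "order_id": "1002"},
]


def correlate(events, key):
    """Group events from any source that share the same value for `key`."""
    groups = defaultdict(list)
    for event in events:
        value = event.get(key)
        if value is not None:  # events lacking the field are left out
            groups[value].append(event)
    return dict(groups)


def customers_hit_by_errors(events):
    """Customer IDs whose orders share an order_id with an error event."""
    affected = set()
    for group in correlate(events, "order_id").values():
        if any("error" in e for e in group):
            affected.update(e["customer_id"] for e in group if "customer_id" in e)
    return affected


if __name__ == "__main__":
    print(customers_hit_by_errors(EVENTS))  # prints {'42'}
```

Grouping on `order_id` puts the web checkout, the middleware error and the order record for order 1001 into one group, so the affected customer can be read off directly.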
Driving Insights
Hadoop and MapReduce are great for complex data science on data at rest; the previous architecture took 9 months with a team of engineers, data architects, etc.
The Splunk platform delivers real-time, interactive analysis; we can build many of the same insights within 1 hour.
Conclusion: find the optimal approach for the business.
What About Ad Hoc Analysis?
PR Insights Example
What was the situation? A fast-moving business that needed real-time insights.
What was the PR team struggling with? It was difficult to find useful data to build interesting use cases.
What did they want? A flexible, real-time reporting environment to extract insights useful for the market.
How did my team help? We delivered a single dashboard with real-time data on sharing behavior across our network.
PR Insights Dashboard
Notification Systems
Diagram components: website, API, Notification, Apple (APNS), Feedback Processor, Google (GCM)
Explore
Analyze
Visualize
Dashboards
Share
Hadoop Storage
Learn More
splunk.com/bigdata
Questions?