
Capturing & Analyzing High Velocity High Volume Machine Data

December 3, 2013
Jason Lobel
CEO
@jasonlobel

Internet of Endpoints
THINGS (IOT)

Everything (IOE)
Data & Machines
Data is

Primarily sensor-based

50B

12.5B

Machine readable (API)
Accessible on-demand
Possibly even open (Public)
Includes non-machine generated data or streaming data (catalogs, locations, historical data, etc.)

Collect > Unify > Transform > Report > Predict
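
The five stages above can be sketched as a chain of small functions. This is a minimal illustration only: the sensor readings, field names, location lookup and the naive "last change repeats" forecast are all assumptions made for the example, not anything taken from the slides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings; field names are illustrative assumptions.
readings = [
    {"sensor_id": "s1", "temp_c": 4.0},
    {"sensor_id": "s1", "temp_c": 4.5},
    {"sensor_id": "s2", "temp_c": 7.0},
    {"sensor_id": "s2", "temp_c": 8.0},
]
# Hypothetical non-streaming reference data keyed by sensor id.
locations = {"s1": "dairy cooler", "s2": "produce case"}

def collect(stream):
    """Collect: materialize incoming events (here, just a list copy)."""
    return list(stream)

def unify(events, reference):
    """Unify: attach reference data to each event."""
    return [{**e, "location": reference.get(e["sensor_id"], "unknown")}
            for e in events]

def transform(events):
    """Transform: group temperatures by location, preserving arrival order."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e["location"]].append(e["temp_c"])
    return dict(grouped)

def report(grouped):
    """Report: average temperature per location."""
    return {loc: round(mean(vals), 2) for loc, vals in grouped.items()}

def predict(grouped):
    """Predict: naive forecast, last value plus the last observed change."""
    return {
        loc: vals[-1] + (vals[-1] - vals[-2]) if len(vals) > 1 else vals[-1]
        for loc, vals in grouped.items()
    }

grouped = transform(unify(collect(readings), locations))
summary = report(grouped)
forecast = predict(grouped)
```

Keeping each stage a plain function makes it easy to swap one out later, for example replacing the naive forecast with a trained model.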

Capturing Streaming Data Considerations


Smart storage / backend setup is a key catalyst for downstream analysis

Backend Architecture
NoSQL datastore

Why Important
Long-term scale with data volume



High availability

Ideal for unpredictable demand
No joins for queries in reporting

Auto scaling cloud hosting (AppEngine, AWS)

Spend less time on server tuning


Enable REST APIs

Enable JavaScript & mobile applications

Writeable and Retrievable

Real-time data

JSON over XML

Power dashboards or visualizations

APIs for history, real-time, query (SQL), and even predictive
Tracking / How is data consumed
Unify with other sources
OAuth2.0 Security

API management
Multi-party (internet/external) access

Dedicated caching

Faster data retrieval speed
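
As a rough sketch of the "writeable and retrievable", "JSON" and "dedicated caching" rows above, the following puts a small time-limited read cache in front of a stand-in datastore. The class name `CachedStore`, the key format and the 30-second lifetime are assumptions for illustration; a real deployment would use an actual datastore and cache service.

```python
import json
import time

class CachedStore:
    """In-memory stand-in for a datastore with a read cache in front of it."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._data = {}       # stand-in for the NoSQL datastore
        self._cache = {}      # key -> (expires_at, json_text)
        self._ttl = ttl_seconds
        self._clock = clock
        self.backend_reads = 0  # counts reads that reached the datastore

    def write(self, key, record):
        """Write path: store the record and drop any stale cached copy."""
        self._data[key] = record
        self._cache.pop(key, None)

    def read_json(self, key):
        """Read path: serve from cache while fresh, else hit the datastore."""
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        self.backend_reads += 1
        body = json.dumps(self._data.get(key), sort_keys=True)
        self._cache[key] = (now + self._ttl, body)
        return body

store = CachedStore(ttl_seconds=30.0)
store.write("order:1001", {"store": 12, "total": 18.5})
first = store.read_json("order:1001")
second = store.read_json("order:1001")  # served from the cache
```

The second read never touches the datastore, which is the retrieval speed-up the caching row refers to; a write invalidates the cached copy so readers do not see stale data.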

APIs Fuel Any Channel & Big Data Analytics


Public vs. Private: Estimate 10x more private APIs
Open: Gartner predicts that 75% of the Fortune 500 will have open APIs by 2014
Competition: By 2015, APIs will be default, like websites in 2000 (Kin Lane, ex White House Fellow)

Growth In Public APIs

Unify IOT Data with Other Sources

APIs Fuel Interactive Visualizations


D3.js (d3js.org)
JavaScript library for manipulating documents using HTML, SVG and CSS

APIs => Programmable => Smart Controls

Make Apps Smarter with Machine Learning


Recommendation:
Analyzes users' preferences and finds items users might like
Frequent Pattern Mining:
Discovers unique frequently co-occurring items in a transaction list



Classification:
Learns from existing categorized data and assigns a category to uncategorized data

Clustering:
Organizes items from a large volume of data into groups of similar items and features
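
To make one of the four techniques above concrete, here is a minimal sketch of frequent pattern mining that counts item pairs bought together. The baskets and the `min_support` threshold are made-up assumptions, and plain pair counting stands in for the more scalable algorithms a production system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction list; each inner list is one basket.
transactions = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and fixes the order within a pair
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(transactions, min_support=2)
```

With these baskets only the pair `("beer", "diapers")` clears the threshold, appearing in three of the four transactions.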

Machine Learning Algorithm APIs?

Hard
Human: Finding a data scientist
Technical: Database selection; algorithm(s) selection; model training & iteration; embedding predictions into applications; security; query speed / caching; scaling; on-demand access

Eas{ier}
Human: Finding an engineer that can use an API; training (if needed)
Technical

Common ML Applications for Retail


Item Recommendation: observes what the user likes and finds similar items (I like the Chicago Bulls, I may like the Chicago Bears)
User Recommendation: finds similar users and recommends what they like (e.g., Kin and I are friends. He likes IPAs. I may like IPAs)
Item/Action Affinity: if a user wants X, what else (Y) is that user likely to want, based on the relationship between X and Y (men who buy diapers also buy beer)
Predict Inventory: based on history, predict future sales (next 7, 30 days, etc.)
Discover Customer Segments: examine purchasing habits to identify clusters of shopper segments
Prevent Fraud: identify anomalies in cashier activity, such as voids (is this likely fraud? yes/no)
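
The fraud item above amounts to a yes/no anomaly check. One simple way to sketch it, assuming made-up cashier counts and an arbitrary threshold, is to flag cashiers whose void rate sits well above the group average (a z-score test). This is an illustrative stand-in, not the method used by any particular product.

```python
from statistics import mean, pstdev

# Hypothetical per-cashier counts: (voided transactions, total transactions).
activity = {
    "c1": (2, 200),
    "c2": (3, 250),
    "c3": (1, 180),
    "c4": (20, 210),
    "c5": (2, 190),
}

def flag_void_outliers(activity, z_threshold=1.5):
    """Return {cashier: True/False}; True means an unusually high void rate."""
    rates = {c: voids / total for c, (voids, total) in activity.items()}
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone has the same rate, so nobody stands out.
        return {c: False for c in rates}
    return {c: (r - mu) / sigma > z_threshold for c, r in rates.items()}

flags = flag_void_outliers(activity)
suspects = [c for c, is_outlier in flags.items() if is_outlier]
```

A flag here is only a prompt for review; with few cashiers a single extreme value also inflates the standard deviation, so the threshold needs tuning against real history.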

What We Do with Streaming Data


Focus = transform at least one massive data source into many insights that were not possible before, at a fraction of the cost of legacy tools
Supermarkets: point-of-sale data, product catalog, sensors, etc.
eCommerce: web behavior, point-of-sale data, product catalog, etc.

Supermarket / C-Store

Before SwiftIQ
Unable to store POS order and cashier history
After SwiftIQ
Detailed transaction history available on-demand
Able to pursue real-time supply chain initiatives
Able to analyze product affinity to plan merchandising strategies and promotions and to optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements
Better optimize hiring
Identify cashier fraud

Retail/eCommerce
Before SwiftIQ
Unable to unify disparate data (POS, web, mobile, CRM)
Unlikely to store web behavior
After SwiftIQ
Enable relevant, personalized digital experiences
Know specific customer segments vs. using intuition
Analyze product affinity to plan merchandising strategies and promotions and to optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements
