
Data Products Deep Dive


Pete Skomoroch
@peteskomoroch
3/31/14
Berkeley CS194-16: Intro to Data Science

Some Background

Physics/Math BS Undergrad
Analyst / Software Engineer @ ProfitLogic - 3.5 years
Biodefense Engineer / ML Student @ MIT - 3.5 years
Sr. Research Engineer @ AOL Search - 1 year
Director @ Juice Analytics - 1 year
Consulting @ Cloudera, Amazon etc - 1 year
Principal Data Scientist @ LinkedIn - 4 years

Four types of data scientist (at least)

source: "Analyzing the Analyzers" O'Reilly


Media

Data Scientists create data products

The data product process

Verify you are solving the right problem


Theory + model design
Measurement: data collection and cleaning
Feature engineering & model development
Error analysis and investigation
Iterate and improve each step in the process
Leverage derived data to build new products

Data factories & flywheels

Source: http://www.linkedin.com/channels/disrupt2013
Steve Jennings/Getty Images Entertainment

Data Product Example: LinkedIn Skills

Skill Extraction and Standardization Pipeline


Skill Pages
Skills Section on Member Profiles
Suggested Skills Algorithm and Email
Skill Endorsements

Skill Discovery: Unsupervised Topics from Profile Specialties Section

Extract


Topic Clustering & Phrase Sense Disambiguation


Deduplication Signals from Mechanical Turk


Sample Task for Mechanical Turk Workers


Mechanical Turk Standardization

Skill Phrase Deduplication


Tagging Skill Phrases

Document
(ex: Profile)

Tagging: Extract potential skill phrases from text


Lead designer and engineer for the implementation of a user-centric,
fully-configurable UI for data aggregation and reporting.

Developed over 20 SaaS custom applications using Python, Javascript
and RoR.

JavaScript

RoR

Python

SaaS

Standardize unambiguous phrase variants


ror
rubyonrails
ruby on rails development
ruby rails
ruby on rail

Ruby on Rails
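
The variant list above can be read as a lookup table from normalized phrases to one canonical skill name. A minimal sketch, assuming a hand-built table (the name CANONICAL_SKILLS and both helper functions are hypothetical, not from the talk; only the five variants shown on the slide are included):

```python
import re
from typing import Optional

# Hypothetical variant table: keys are normalized phrase variants,
# values are the canonical skill name shown on the slide.
CANONICAL_SKILLS = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
}


def normalize_phrase(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def standardize(phrase: str) -> Optional[str]:
    """Return the canonical skill name, or None for an unknown phrase."""
    return CANONICAL_SKILLS.get(normalize_phrase(phrase))


print(standardize("RoR"))
print(standardize("  Ruby   on Rails development "))
print(standardize("cobol"))
```

A plain dictionary only covers unambiguous variants, which matches the slide's wording; ambiguous phrases would need the disambiguation step covered earlier in the deck.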

Tokenization
Phrases
(up to 6 words)

Skills Tagger
Skills
(unordered)

Skills Classifier

Skills
(ranked by relevance)
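
The first two stages of this flow (tokenization into phrases of up to 6 words, then dictionary tagging into an unordered skill set) might look roughly like the sketch below. The dictionary SKILL_VARIANTS, the tokenizer regex, and all function names are assumptions for illustration; the relevance-ranking classifier is not shown because the slides do not describe it.

```python
import re

MAX_PHRASE_WORDS = 6  # the slide caps candidate phrases at six words

# Hypothetical dictionary from lowercased phrase variants to standardized skills.
SKILL_VARIANTS = {
    "python": "Python",
    "javascript": "JavaScript",
    "ror": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "saas": "SaaS",
}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def candidate_phrases(tokens, max_words=MAX_PHRASE_WORDS):
    """Yield every contiguous phrase of 1..max_words tokens."""
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            yield " ".join(tokens[start:start + length])


def tag_skills(text):
    """Return the unordered set of standardized skills found in the text."""
    phrases = candidate_phrases(tokenize(text))
    return {SKILL_VARIANTS[p] for p in phrases if p in SKILL_VARIANTS}


profile_text = (
    "Developed over 20 SaaS custom applications using Python, "
    "Javascript and RoR."
)
print(sorted(tag_skills(profile_text)))
```

Returning a set mirrors the "(unordered)" label on the tagger output; ordering is left to the downstream classifier.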


Skills Related to Big Data


Skills Correlated with the Job Title Data Scientist


SkillRank: Algorithm for Top People


How do we get more people into the skill graphs?

Profile

Suggested Skills Inference

How suggested/inferred skills work:


Extract attributes

The skill likelihood is a conditional model

Probabilities are combined using a Naive Bayes Classifier



If you are an engineer at Apple, you probably know about iPhone Development.

Feature
Vectors

- Company ID
- Title ID
- Groups ID
- Industry ID
-

Skills Classifier



Skills
(ranked by likelihood)
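
One way to read the Naive Bayes step: score each skill by P(skill) times the product of P(attribute | skill) over the profile's attributes, then rank. The sketch below is an illustration under stated assumptions, not LinkedIn's implementation: the class name, the Laplace smoothing, and the toy training profiles are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSkillSuggester:
    """Toy Naive Bayes ranker over categorical profile attributes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (an assumption)
        self.skill_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # skill -> (field, value) counts
        self.field_values = defaultdict(set)  # field -> distinct values seen
        self.total = 0

    def fit(self, profiles):
        """profiles: iterable of (attributes dict, set of skills)."""
        for attributes, skills in profiles:
            for skill in skills:
                self.skill_counts[skill] += 1
                self.total += 1
                for field, value in attributes.items():
                    self.feature_counts[skill][(field, value)] += 1
                    self.field_values[field].add(value)
        return self

    def log_score(self, skill, attributes):
        """log P(skill) + sum of log P(attribute | skill), smoothed."""
        score = math.log(self.skill_counts[skill] / self.total)
        for field, value in attributes.items():
            count = self.feature_counts[skill][(field, value)]
            vocab = len(self.field_values[field]) + 1  # +1 slot for unseen values
            denom = self.skill_counts[skill] + self.alpha * vocab
            score += math.log((count + self.alpha) / denom)
        return score

    def suggest(self, attributes, top_k=3):
        """Return the top_k skills ranked by likelihood for these attributes."""
        ranked = sorted(
            self.skill_counts,
            key=lambda skill: self.log_score(skill, attributes),
            reverse=True,
        )
        return ranked[:top_k]


# Hypothetical training data, made up for illustration only.
profiles = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Objective-C"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Designer"}, {"Graphic Design"}),
    ({"company": "Acme", "title": "Engineer"}, {"Java"}),
]
model = NaiveBayesSkillSuggester().fit(profiles)
print(model.suggest({"company": "Apple", "title": "Engineer"}, top_k=2))
```

Working in log space avoids underflow when many attributes (company, title, groups, industry) are multiplied together, and smoothing keeps an unseen attribute value from zeroing out a skill.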


Skill Recommendations for Your LinkedIn Profile
4% Conversion

49% Conversion


Reputation: Build Endorsements Product to Collect More Graph Edges


PYMK + Suggested Skills


Viral Growth: 1 Billion Endorsements in 5 Months


Social Viral Tagging = Lots of Data


Skill marketing
Skill recommendations
Virality only
Suggested endorsements

How Did We Gather this Data?


1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms


Recap: Data Product Evolution

Skill Extraction and Standardization Pipeline


Skill Pages
Skills Section on Member Profiles
Suggested Skills Algorithm and Email > 20M members
Skill Endorsements > 60M members, 3B+ Edges
Big product wins in engagement, recall, relevance
SkillRank & Reputation integration
Sets stage for next generation of products

Questions?
@peteskomoroch
http://datawrangling.com
http://www.linkedin.com/in/peterskomoroch
