
Building

Rich Social Network Data Schema to aid designing, collecting and evaluating social network data

Eamonn O'Loughlin, University College Dublin

Diane Payne, University College Dublin

Why Social Networks


Social interactions and social networks are an enduring component of our everyday lives. Social networks are (among other things):
- The basis upon which information and behaviours diffuse through a population
- A cornerstone for trade and cooperation
- A key component in determining the languages we speak, the goals we aspire to, and the values we hold

Background & Motivation


Eamonn O'Loughlin: Early-stage PhD researcher in the Dynamics Lab with an interest in predictive modeling of social behavior using social position and structure. Also interested in large network visualisations and policy design that recognises / leverages network effects.

Motivation: Social Network Analysis techniques to uncover patterns and relationships between network structure/activity and micro network outcomes (individual actions or decisions).

[Figure: research timeline. Motivation + Intuition + Problem + Hypothesis, then Data Strategy Design & Collection, then Hypothesis Evaluation, then Conclusion]

Background & Motivation


Intended Audience: Researchers who are / will be creating a social network dataset

(1) Precautionary: Nobody wants to realise that they didn't consider some easy-to-collect yet suddenly vital-for-analysis feature after their data has already been collected

(2) Not Straightforward: Social network data & data design is complex compared to traditional multi-dimensional data; there are many different assumptions that must be made and (as we will see) quite a few trade-offs

[Figure: research timeline (Motivation + Intuition + Problem + Hypothesis, Data Strategy Design & Collection, Hypothesis Evaluation, Conclusion), with today's focus marked on Data Strategy Design & Collection (not covered today: domain of analysis & specific domain challenges)]

What is Social Network Data


Social network analysis views social relationships in terms of network theory, consisting of nodes (representing individual actors within the network) and ties (which represent relationships between the individuals).
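The node-and-tie view above can be sketched in a few lines of Python. This is only an illustrative sketch: the actor names and the `neighbours` helper are hypothetical and do not come from any dataset discussed in the talk.

```python
# Hypothetical toy data: three actors (nodes) and two ties between them.
nodes = {"alice", "bob", "carol"}
ties = [("alice", "bob"), ("bob", "carol")]


def neighbours(node, ties):
    """Return the set of actors that share an (undirected) tie with `node`."""
    result = set()
    for a, b in ties:
        if a == node:
            result.add(b)
        elif b == node:
            result.add(a)
    return result


bob_neighbours = neighbours("bob", ties)
```

Everything else in a social network dataset (types, weights, attributes, time) is layered on top of this basic pair of collections.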

Brief (Subjective) History of Social Network Analysis

People: Martin Everett, David Krackhardt, Nicholas Mullins, S.D. Berkowitz, Ronald Burt, Barry Wellman, Anatol Rapoport, Stanley Wasserman, J. A. Barnes, Katherine Faust, Nan Lin, Peter Marsden, Tom A. B. Snijders, Mark Granovetter, Stephen Borgatti, Garry Robins, Linton Freeman, David Knoke, Kathleen Carley, Karen Cook, Harrison White, Douglas R. White

Topics: Diffusion of Innovation; Social Networks as a Science; Communication, co-authorship, and colleagueship; Social Capital (& structural holes); Network Theory of Social Capital; Statistical Models for Social Networks; UCINet; Social Network Visualisation; Social Structures; Comparative Network Methods; Multilevel Analysis & SIENA; Social Networks & the Internet; Social Structure & Cognition; Exchange & Trust; ERGMs; Network Realism; Inter-organisational political networks & Terrorist Networks; Dynamic Network Analysis; Social Constructs / Persistent Social Formations; Formal Organisations & Social Networks; Consensus Analysis; The Strength of Weak Ties (economic networks)

Why is this a Problem


Many design decisions

Different Practitioners

Difficult to sample network data

Privacy Concerns

Increase in ability to analyse data

Network Data Collection is Expensive

Reduced cost of data storage

Rapid Sensor Tech. Advancement

Dimensional Data -vs- Network Data

Dimensional data: Cross-Sectional, Time Series, Panel Data

Network data: ?? No Standard Representation ??

What is the Solution


A schema allows us to represent, in a particular way, the structure and features of a particular object.

A schema is a mechanism that allows us to define the design, content, and to some extent, the semantics of a dataset.


Approach Taken
1. Searched for publicly available social network datasets (20-30 different datasets)
2. Accessed datasets & related publications. Reviewed structure and collection approach
3. Created draft schema
4. Added 110 more datasets to analysis. Refined / iterated schema design
5. Published dataset wiki / solicited input from social network analysis community (INSNA)
6. Completed schema design (TBC)

Dataset Wiki: http://dl.ucd.ie

Schema Overview: Structure


[Figure: blank schema diagram; its fields are filled in on the following slides]

Schema Overview: Minimal Representation


Overview:
- What does a node represent? (Individuals? Employees? Researchers? Firms? Organisations? Countries? Political positions?)
- What does an edge represent? (Friendship? Communication? Interaction?)

[Schema diagram, fields added on this slide: Node Represents; Edge Represents]

Examples:
- UK MPs on Twitter. Nodes: Personal Twitter accounts. Edges: Mentions
- Co-authorship in network science. Nodes: Academic journal authors. Edges: Co-authorship
- Infectious SocioPatterns. Nodes: Visitors to the Science Gallery. Edges: Face-to-face proximity
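As a rough illustration, the minimal representation can be recorded as one small record per dataset. The class name `MinimalSchema`, its field names, and the `is_complete` helper are assumptions made for this sketch; only the three example datasets come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalSchema:
    """Hypothetical record holding the two mandatory schema answers."""
    dataset: str
    node_represents: str
    edge_represents: str


examples = [
    MinimalSchema("UK MPs on Twitter", "Personal Twitter accounts", "Mentions"),
    MinimalSchema("Co-authorship in network science",
                  "Academic journal authors", "Co-authorship"),
    MinimalSchema("Infectious SocioPatterns",
                  "Visitors to the Science Gallery", "Face-to-face proximity"),
]


def is_complete(entry):
    """A minimal description needs a non-empty answer to both questions."""
    return bool(entry.node_represents.strip()) and bool(entry.edge_represents.strip())
```

Checking `is_complete` before collection starts is one cheap way to force the two basic questions to be answered explicitly.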

Schema Overview: Node Types


Overview:
- Does the network contain > 1 node types?
- Bipartite networks are a particular class of complex networks, whose nodes are divided into two sets X and Y, and only connections between two nodes in different sets are allowed.

[Schema diagram, fields added on this slide: Multiple Node Types?; Is bipartite?]

Examples:
- Terrorist Network. Node Types: Terrorist, Leader, Politician, Citizen
- Primary School Cumulative Network. Node Types: Teacher, Student. Edge Type: Physical interaction between student and teacher
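The bipartite definition above (two node sets, edges only between sets) can be checked mechanically once node types are recorded. The following is a minimal sketch with hypothetical toy data; the function name `is_bipartite_by_type` is an assumption, and it tests bipartiteness with respect to the recorded node types rather than searching for any possible two-colouring.

```python
def is_bipartite_by_type(node_type, edges):
    """True if there are exactly two node types and every edge joins
    two nodes of different types."""
    if len(set(node_type.values())) != 2:
        return False
    return all(node_type[a] != node_type[b] for a, b in edges)


# Hypothetical toy data, loosely modelled on a teacher/student network.
node_type = {"t1": "teacher", "s1": "student", "s2": "student"}
teacher_student_edges = [("t1", "s1"), ("t1", "s2")]
with_student_tie = teacher_student_edges + [("s1", "s2")]
```

A single student-to-student tie is enough to make the network non-bipartite, which is why this question is worth settling at design time.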

Schema Overview: Edge Types


Overview:
- Does the network contain > 1 edge types?
- Are these edges: directed? undirected? weighted (e.g. strength / frequency)? signed (e.g. positive / negative)?

[Schema diagram, fields added on this slide: Multiple Edge Types? (directed, undirected, weighted, signed)]

Examples:
- The Policy Network of Toxic Chemicals Regulation in Germany in the 1980s. Edge Types: Shared Committee Membership, Information Exchange
- Students data sets (van de Bunt). Edge Types: Unknown, best friend, friend, friendly relation, neutral, troubled relation, item non-response, actor non-response

Schema Overview: Edge Types


Examples:
- Enron Email Dataset. Nodes: Senior Enron Employees. Edge Types: Email Sent, Email Received. Weight: # of emails sent
- Dining-table partners in a girls' dormitory at a New York State training school. Nodes: Girls in a New York State dormitory. Edge Types: Preferred dining partner. Weight: Order of preference
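To make the directed / undirected / weighted distinction concrete, here is a small sketch that turns a log of individual messages into weighted edges, in the spirit of the "# of emails sent" weight above. The message log is invented toy data, not taken from the Enron dataset.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per email.
emails = [("ann", "bo"), ("ann", "bo"), ("bo", "ann"), ("ann", "cy")]

# Directed, weighted edges: (sender, recipient) -> number of emails sent.
directed_weights = Counter(emails)

# Undirected, weighted edges: both directions are merged into one tie.
undirected_weights = Counter()
for (sender, recipient), count in directed_weights.items():
    undirected_weights[frozenset((sender, recipient))] += count
```

The same raw log supports either representation, but only if the direction was recorded at collection time; the undirected view cannot be turned back into the directed one.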

Schema Overview: Node Attributes / Communities


Overview:
- Do nodes have attributes?
- Are these attributes static (e.g. gender) or dynamic (e.g. smoking preference)?
- Do the nodes belong to some known community?

[Schema diagram, fields added on this slide: Communities; Node Attributes]

Examples:
- Lawyers data (Lazega). Node Attributes: seniority, formal status, office in which they work, gender, law school attended, individual performance measurements (hours worked, fees brought in), attitudes concerning management policy
- Irish Politicians & Organisations on Twitter. Communities: Political Affiliation (Fine Gael, Fianna Fáil, Labour, Sinn Féin, ...)
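One possible way to keep static attributes, dynamic attributes, and community membership apart is sketched below. The `Node` class, its field names, and the rule "use the most recent observation at or before the requested wave" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record separating static and dynamic attributes."""
    node_id: str
    static: dict = field(default_factory=dict)    # attribute -> value
    dynamic: dict = field(default_factory=dict)   # attribute -> {wave: value}
    community: Optional[str] = None


def attribute_at(node, name, wave):
    """Static attributes ignore the wave; dynamic ones return the most
    recent observation at or before `wave`, or None if none exists."""
    if name in node.static:
        return node.static[name]
    history = node.dynamic.get(name, {})
    known = [w for w in history if w <= wave]
    return history[max(known)] if known else None


person = Node("n1", static={"gender": "f"},
              dynamic={"smokes": {1: False, 3: True}}, community="A")
```

Deciding up front which attributes may change over time determines whether they need to be re-collected at every wave.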

Schema Overview: Dynamic Data


Overview:
- Is the network dataset dynamic?
- If dynamic, is the type of temporal data: event driven? continuous / real-time? periodic snapshots?

[Schema diagram, field added on this slide: Dynamic]

Examples:
- Kapferer Tailor Shop. Interactions recorded at two different time points seven months apart; a strike happened in between (snapshot)
- Southern Women Network. It contains the observed attendance at 14 social events by 18 Southern women (event driven)
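With snapshot data such as two observations of the same group, a first analysis step is often to compare the tie sets. The sketch below assumes ties are stored as tuples with a consistent ordering; the data and the `snapshot_changes` name are hypothetical.

```python
def snapshot_changes(before, after):
    """Compare two snapshots, each given as a set of ties."""
    return {
        "added": after - before,
        "removed": before - after,
        "kept": before & after,
    }


# Hypothetical ties observed at two time points.
time_1 = {("a", "b"), ("b", "c")}
time_2 = {("a", "b"), ("a", "c")}
changes = snapshot_changes(time_1, time_2)
```

Snapshots only show the net change; ties that formed and dissolved between the two time points are invisible, which is one of the trade-offs of this collection design.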

Schema Overview: Dynamic Data


Examples:
- Norwegian Boards (Aug09). Board membership evolution from 1999 to 2009 (continuous or real-time)
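The three temporal types are not independent: event-driven or continuous records can be collapsed into periodic snapshots, but not the other way round. A minimal sketch of that conversion, assuming integer timestamps and a hypothetical `to_snapshots` helper:

```python
from collections import defaultdict


def to_snapshots(events, period):
    """Group timestamped events (t, a, b) into consecutive windows of
    length `period`; each window becomes one snapshot (a set of ties)."""
    snapshots = defaultdict(set)
    for t, a, b in events:
        snapshots[t // period].add((a, b))
    return dict(snapshots)


# Hypothetical event log: (timestamp, actor, actor).
events = [(0, "a", "b"), (3, "b", "c"), (7, "a", "b"), (8, "a", "c")]
snapshots = to_snapshots(events, period=5)
```

Because the conversion loses timing detail, collecting at the finest granularity that is practical keeps more analysis options open later.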

Schema Overview: Parallel Data


Overview:
- Does the network come with parallel data?
- Is this parallel data time-series?
- What is the relationship of this parallel data to the network data?

[Schema diagram, field added on this slide: Parallel]

Examples:
- Wiki-Vote. Nodes: Wikipedia Editors. Edges: Voting Behaviour. Parallel Data: Vote outcome
- MathSciNet: Co-authorship network. Nodes: Journal Article Authors. Edges: Co-authorship. Parallel Data: Detailed information about MathSciNet papers: numerical IDs of papers, authors, and categories

Schema Overview: Parallel Data


Examples:
- Extended Epinions dataset. Nodes: Consumers on trust site Epinions.com. Edges: Trust / Distrust. Parallel Data: Details of all product reviews hosted on the Epinions website
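The question "what is the relationship of this parallel data to the network data?" usually comes down to a shared key. The sketch below assumes parallel records carry an `author` field that matches node identifiers; the records, field names, and helper functions are hypothetical and not drawn from the Epinions data.

```python
from collections import defaultdict

# Hypothetical signed trust edges: (source, target, +1 trust / -1 distrust).
trust_edges = [("u1", "u2", +1), ("u3", "u2", -1)]

# Hypothetical parallel data: product reviews keyed by author id.
reviews = [
    {"author": "u2", "product": "p9", "rating": 4},
    {"author": "u2", "product": "p3", "rating": 2},
    {"author": "u7", "product": "p9", "rating": 5},
]


def reviews_by_author(reviews):
    """Index the parallel data by the key it shares with the network."""
    index = defaultdict(list)
    for review in reviews:
        index[review["author"]].append(review)
    return dict(index)


def unlinked_authors(reviews, edges):
    """Authors in the parallel data that never appear as a network node."""
    network_nodes = {n for source, target, _ in edges for n in (source, target)}
    return {r["author"] for r in reviews} - network_nodes


by_author = reviews_by_author(reviews)
orphans = unlinked_authors(reviews, trust_edges)
```

Records that cannot be linked back to a node are a quick indicator of how well the parallel data and the network boundary line up.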

Schema Overview: Metadata


Overview:
- What are the network boundary conditions?
- Does the network have missing data? Does this missing data have a pattern?
- Was the data sampled / sub-selected from a larger dataset? What was the process for sampling?

[Schema diagram, field added on this slide: Collection Metadata]

Examples:
- Newcomb Fraternity. 15 weekly sociometric preference rankings from 17 men attending the University of Michigan in the fall of 1956; data from week 9 are missing
- Enron Email Dataset (Boundary Conditions)
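Missing collection waves, like the missing week in the fraternity example, are easy to record if the expected schedule is written down alongside the data. A small sketch with hypothetical wave numbers (not the actual Newcomb week numbering):

```python
def missing_waves(expected, observed):
    """Return the collection waves that were planned but not observed."""
    return sorted(set(expected) - set(observed))


# Hypothetical schedule: five planned waves, wave 3 was never collected.
planned = range(1, 6)
collected = [1, 2, 4, 5]
gaps = missing_waves(planned, collected)
```

Storing the planned schedule as metadata lets later users distinguish "no ties observed" from "no observation made".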

Schema Overview: Metadata


Examples:
- Yahoo! Messenger User Communication Pattern. Dataset contains a small sample of the Yahoo! Messenger community's communication (IM) log at a high level for a period of 4 weeks. Specifically, this dataset only records the first communication event from one user to another on a particular day, and generates such records for a period of 28 days.
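The "first communication event per user pair per day" rule described above is a sub-selection process that can be stated precisely in code. The sketch below is an illustration of that kind of rule on invented data; the tuple layout `(day, time, sender, receiver)` is an assumption, not the format of the Yahoo! dataset.

```python
def first_contact_per_day(events):
    """Keep only the earliest event for each (day, sender, receiver).

    `events` is an iterable of (day, time, sender, receiver) tuples.
    """
    first = {}
    for day, time, sender, receiver in events:
        key = (day, sender, receiver)
        if key not in first or time < first[key]:
            first[key] = time
    return sorted((d, t, s, r) for (d, s, r), t in first.items())


# Hypothetical raw log.
raw = [
    (1, 900, "a", "b"),
    (1, 1000, "a", "b"),   # dropped: not the first a->b event on day 1
    (1, 930, "b", "a"),    # kept: direction matters
    (2, 800, "a", "b"),
]
sampled = first_contact_per_day(raw)
```

A rule like this removes all information about communication frequency within a day, so it needs to be documented in the collection metadata for anyone who later treats edge counts as weights.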

Social Network Data Schema


(1 page overview)

A schema is a way to define the structure, content, and to some extent, the semantics of a dataset.

- What does a node represent? (Individuals? Employees? Researchers? Firms? Organisations? Countries? Political positions?)
- What does an edge represent? (Friendship? Communication? Interaction?)
- Does the network contain > 1 node types? Is the network bipartite, where ties can only exist between nodes of two different groups?
- Does the network contain > 1 edge types? Are these edges directed / undirected? Weighted (e.g. strength / frequency) or signed (e.g. pos. / neg.)?
- Do nodes have attributes? Are these attributes static or dynamic? Do the nodes belong to some known community?
- Is the network dataset dynamic? If dynamic, is the type of temporal data: event driven / continuous / real-time / periodic snapshots?
- Boundary conditions? Missing data? Sampled from a larger dataset? Sampling process?

[Schema diagram: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types? (directed, undirected, weighted, signed); Communities; Node Attributes; Dynamic; Parallel; Collection Metadata]

Eamonn O'Loughlin, Dynamics Lab, UCD (eamonn.oloughlin@ucdconnect.ie)
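The one-page overview can be read as a checklist, and a checklist is straightforward to encode. The sketch below is one hypothetical encoding: the class name `NetworkSchema`, the field names, and the convention that `None` means "not yet answered" are assumptions for illustration, not the authors' published schema format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NetworkSchema:
    """Hypothetical checklist; None means the question is unanswered."""
    node_represents: Optional[str] = None
    edge_represents: Optional[str] = None
    multiple_node_types: Optional[bool] = None
    is_bipartite: Optional[bool] = None
    multiple_edge_types: Optional[bool] = None
    directed: Optional[bool] = None
    weighted: Optional[bool] = None
    signed: Optional[bool] = None
    node_attributes: Optional[str] = None
    communities: Optional[str] = None
    dynamic: Optional[str] = None          # e.g. "static", "periodic snapshots"
    parallel_data: Optional[str] = None
    boundary_conditions: Optional[str] = None
    missing_data: Optional[str] = None
    sampling: Optional[str] = None


def unanswered(schema):
    """List the schema questions that still have no answer."""
    return [f.name for f in fields(schema) if getattr(schema, f.name) is None]


draft = NetworkSchema(node_represents="Personal Twitter accounts",
                      edge_represents="Mentions", directed=True)
open_questions = unanswered(draft)
```

Running `unanswered` on a draft design before collection begins gives a concrete list of decisions that have not yet been made.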

Proposed Use of Schema

- Direct Observation / Survey
- Retrieving Data (subset) from an existing system
- Identifying / Assessing publicly available data

Thank You

Questions