
SOCIAL NETWORK ANALYSIS

USING STATA
Thomas Grund
University College Dublin
thomas.u.grund@gmail.com

Rome, 2016
http://nwcommands.org -> Tutorials and Slides
Dialog boxes and Stata menus
Commands in the command line
Own simple .do files
Own advanced .do files
Own .ado files
Code in Mata
Object-oriented programming
Plugins
Networks? Never heard of it
Isn't this something about Facebook?
Some basic knowledge about networks
Am using it somehow in my work
Sunbelt hospitality suite!
Experienced network scholar
I am a network guru
NETWORK ANALYSIS
- Simple description/characterization of networks
- Calculation of node-level characteristics (e.g. centrality)
- Components, blocks, cliques, equivalences
- Visualization of networks
- Statistical modeling of networks, network dynamics
- ...
Purpose-built

Excel/R extensions

C++/Python libraries
NWCOMMANDS
Software package for Stata. Almost 100 new Stata commands
for handling, manipulating, plotting and analyzing networks.
Ideal for existing Stata users. Corresponds to the R packages
network, sna, igraph, networkDynamic.
Designed for small to medium-sized networks (< 10000).
Almost all commands have menus. Can be used like Ucinet
or Pajek. Ideal for beginners and teaching.
Not just specialized commands, but whole infrastructure for
handling/dealing with networks in Stata.
Writing own network commands that build on the
nwcommands is very easy.
BOOK
Grund, T. and Hedström, P. (in preparation) Social
Network Analysis Using Stata. Stata Press.

http://nwcommands.org -> Tutorials and Slides


password: nwcommands
http://nwcommands.org
GoogleGroup: nwcommands

Twitter: nwcommands

Search nwcommands to find a channel with video tutorials.
GITHUB
HTTPS://GITHUB.COM/THOMASGRUND/NWCOMMANDS
Prelude
Stata introduction and basics
Command line
Result window
Variables in data
Opens do-file editor
GETTING DATA
Use existing Stata dataset
. use C:\...\mydata.dta, clear
. use mydata.dta, clear

Example datasets
. sysuse auto, clear
. help dta_examples
GETTING DATA

. import
. insheet
MANAGING VARIABLES
In Stata a variable is a column of data. Later on we will use
macros (what programmers normally call a variable).

Setting observations
. set obs 100

Generate variable:
. gen myvar = _n

Replace variable:
. replace myvar = 50 if _n > 50
MANAGING VARIABLES
Recode variables
. recode myvar (1/10=1) (11/20=2) (21/49=3)

Tabulate variable
. tab myvar

Summarize variable
. sum myvar
. help command
. help adopath
. help nwcommands
Data editor
. edit
RETURN VECTOR
Many commands, e.g. summarize, display output, but also leave
some information in the so-called return vector. When you write
sophisticated programs, it makes sense to return your result in the
return vector as well.

In later analysis one might want to use this value.
RETURN VECTOR
Whatever is in the return vector, you can access it now in another
program or just display it.
Session VII
Installation
Theoretical motivation
Social networks
INSTALLATION
. findit nwcommands
=> (manually install the package nwcommands-ado)

Or
. net from http://nwcommands.org
. net install nwcommands-ado

. nwinstall, all
UPDATE
Latest version of the software is: 1.6
In case you have a previous version you can update with:

. adoupdate, update all

Or by removing the package first and installing it again as outlined on the previous slide.

. ado uninstall nwcommands-ado


..
INSTALLATION
After installation/update of the package nwcommands-ado, just
run the following command to install other dependencies,
documentation and dialog boxes.

. nwinstall, all
. help nwcommands
THEORETICAL
MOTIVATION
How can we explain
something?
We show how it was brought about ... and then,
the usual thing, Pinky.
[Figure: macro-micro diagram. A situational mechanism links macro to micro, a behavioral mechanism links micro to micro, and a transformational mechanism links micro to macro.]
ANALYTICAL SOCIOLOGY

Analytical sociology emphasizes the importance of making explicit the processes through which whatever it is that we seek to explain is likely to have been brought about.

From the perspective of analytical sociology it is essential to base explanations of social or macro-level outcomes on clear and precise theories of individual action and interaction.
DEPENDENCIES ARE
CRUCIAL
We know that for many settings observations are simply not independent!

In fact, dependencies are responsible for a lot of individual behavior and consequently many social outcomes:

Adoption of innovation
Smoking behavior
Divorce
Suicide
...
Crime
International art fairs

NETWORK DYNAMICS
Changes 2005 - 2006

Yogev, T. and Grund, T. (2012) Structural Dynamics and the Market for Contemporary Art: The Case
of International Art Fairs. Sociological Focus, 54(1), 23-40.
CO-OFFENDING
IN YOUTH GANG

Caribbean East Africa UK West Africa

Grund, T. and Densley, J. (2012) Ethnic Heterogeneity in the Activity and Structure of a Black Street
Gang. European Journal of Criminology, 9(3), 388-406.

Grund, T. and Densley, J. (2015). Ethnic homophily and triad closure: Mapping internal gang structure
using exponential random graph models. Journal of Contemporary Criminal Justice, 31(3), 354-370.
MANCHESTER UTD 9/9/2006, Old Trafford

TOTTENHAM

Grund, T. (2012) Network Structure and Team Performance: The Case of English
Premier League Soccer Teams. Social Networks, 34(4), 682-690.
SOCIAL NETWORKS
Social
Friendship, kinship, romantic relationships
Government
Political alliances, government agencies
Markets
Trade: flow of goods, supply chains, auctions
Labor markets: vacancy chains, getting jobs
Organizations and teams
Interlocking directorates
Within-team communication, email exchange
NETWORKS ARE EVEN
MORE UNIVERSAL
Food webs
Internet
Power grids, airline networks
Metabolic networks
Neural networks
Economics networks

NETWORK PARADIGM
Not just composition of elements of a system that matters, but
also how the elements are arranged and related with each
other.
An individual's position in a network (social structure)
determines the opportunities and constraints this individual will
encounter.
Individuals change the social world of others. Individuals are
dependent and embedded in a web of relations.

unit of analysis = dyad


DEFINITION
Mathematically, a (binary) network is defined as $G = (V, E)$,
where $V = \{1, 2, \dots, n\}$ is a set of vertices (or nodes) and
$E \subseteq \{(i, j) \mid i, j \in V\}$ is a set of edges (or ties, arcs). Edges are
simply pairs of vertices, e.g. $E = \{(1, 2), (2, 5)\}$.
We write $y_{ij} = 1$ if actors $i$ and $j$ are related to each other (i.e.,
if $(i, j) \in E$), and $y_{ij} = 0$ otherwise.
In digraphs (or directed networks) it is possible that $y_{ij} \neq y_{ji}$.
ADJACENCY MATRIX
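We write $y_{ij} = 1$ if actors $i$ and $j$ are related to each other (i.e.,
if $(i, j) \in E$), and $y_{ij} = 0$ otherwise.
The matrix $Y$ is called the adjacency matrix and is a convenient
representation of a network.

$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix}$$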
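A minimal Python sketch (not nwcommands code) of how an edge set maps onto an adjacency matrix. The edge set is the example from the definition; the network size of 5 and the helper name `adjacency_matrix` are assumptions made for illustration.

```python
def adjacency_matrix(n, edges, directed=True):
    """Return an n x n list of lists Y with Y[i-1][j-1] = 1 for every edge (i, j)."""
    Y = [[0] * n for _ in range(n)]
    for i, j in edges:
        Y[i - 1][j - 1] = 1
        if not directed:
            Y[j - 1][i - 1] = 1  # undirected ties are stored symmetrically
    return Y


edges = [(1, 2), (2, 5)]          # example edge set from the definition
Y = adjacency_matrix(5, edges)    # n = 5 is an assumption
for row in Y:
    print(" ".join(str(v) for v in row))
```

With `directed=False` the same call returns a symmetric matrix, which corresponds to the undirected case.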
ADJACENCY MATRIX

[Figure: drawing of an example network with nodes 1 to 7]

    1 2 3 4 5 6 7
1   0 1 0 0 0 0 0
2   0 0 0 1 0 1 1
3   1 0 0 0 0 0 0
4   1 1 0 0 1 1 0
5   0 0 0 1 0 0 0
6   1 0 0 0 0 0 0
7   0 0 0 0 0 0 0
ADJACENCY LIST

[Figure: adjacency list of the example network]
NETWORK ANALYSIS
- Simple description/characterization of networks
- Calculation of node-level characteristics (e.g. centrality)
- Components, blocks, cliques, equivalences
- Visualization of networks
- Statistical modeling of networks, network dynamics
- ...
Use network analysis for the right reasons. Not
because it is cool, but because you think it is likely to
help you answer your research question in a better
way.

Network analysis is not always the thing to do


Be careful as you are likely to see networks everywhere
KNOW YOUR TIES
Social network analysis is not always about friendship. Many
different forms of relations might matter. Probably they are even
influencing each other.

Be very clear on the type of relation you are looking at and do not
throw obviously different types of relationships in the same pot and
treat them equally.
PITFALLS
Networks by themselves do not do anything. Distinguish
between pure structure and what might happen on the
structure.

Networks might be there, but they might not matter for what
you want to explain.

Be super clear on the actual mechanisms (dissect, dissect, dissect).
DON'T

Never start an analysis without knowing your setting and your data properly.
Never just go and apply some concept, measure or method to networks without having thought for a while about what it tells you.
Never be guided by the method. Fancy models are completely useless if you just apply them without them making sense in your context.
DO ASK YOURSELF

What do your ties actually mean?
Can your mechanisms really work like you suggest?
Is the theory really applicable to your setting?
What are you measuring and what are you studying?
NETWORK FINGERPRINTS
Probably you are already familiar with exploring networks and
describing/characterizing them, e.g. calculate degree,
centrality and so on.
We do this for a slightly different reason. The aim is to
characterize networks so that we can classify them and assign
probabilities to networks with certain characteristics (later on).
It is a bit like getting a fingerprint of a network
so that later on we can say how likely it is
that a fingerprint comes from a certain
network.
Session VIII
Getting started
Data management
Network transformation
Node attributes
. help nwcommands
INTUITION
Software introduces netname and netlist.
Networks are dealt with like normal variables.
Many normal Stata commands have their network counterpart
that accept a netname, e.g. nwdrop, nwkeep, nwclear,
nwtabulate, nwcorrelate, nwcollapse, nwexpand, nwreplace,
nwrecode, nwunab and more.
Stata intuition just works.
NETWORK NAMES
AND LISTS
SETTING NETWORKS
Setting a network creates a network quasi-object that has a
netname.
After that you can refer to the network simply by its netname,
just like when you refer to a variable by its varname.

Syntax:
var4

var1 var3

var2
1 5

4
3
LIST ALL NETWORKS
These are the names of the networks in memory. You can refer to these networks by their name.

Check out the return vector. Both commands populate it as well.

Simply the last network that you set or generated
CURRENT NETWORK
Many nwcommands ask for a netname.
When a command allows for a netname to be optional, you do
not have to provide a network name and can just leave it blank.
In this case, the command automatically applies to the current
network.
CURRENT NETWORK

Just return the current network Change the current network


nwset
nwds
nwcurrent
Set a new network with 5 nodes (labelled A, B, C, D, E), with the
following undirected ties (use nwset):

A-B, A-C, A-E, C-D, C-E

List all networks in memory (use nwds).


LOAD NETWORK
FROM THE INTERNET

. help netexample
MARRIAGE TIES BETWEEN
FLORENTINE FAMILIES
BUSINESS TIES BETWEEN
FLORENTINE FAMILIES
IMPORT NETWORK
A wide array of popular network file-formats are supported, e.g.
Pajek, Ucinet, by nwimport.
Files can be imported directly from the internet as well.
Similarly, networks can be exported to other formats with
nwexport.
SAVE/USE NETWORKS
You can save network data (networks plus all normal Stata
variables in your dataset) in almost exactly the same way as
normal data.
Instead of save, the relevant command is nwsave.
Instead of use, the relevant command is nwuse.
DROP/KEEP NETWORKS
Dropping and keeping networks works almost exactly like
dropping and keeping variables.
DROP/KEEP NODES
You can also drop/keep nodes of a specific network.
nwdrop webnwuse
nwkeep nwimport
nwclear nwexport
nwuse
nwsave
Import Sampson's monastery networks from this website:
http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm
Drop the first four nodes from the networks: SAMPLK1, SAMPLK2,
SAMPLK3
Check the network size with nwset
Import any other Ucinet dataset from this website.
Edgelist format

Matrix format

nwtoedge
Edgelist format

Matrix format

nwfromedge
nwtoedge
nwfromedge
NODE ATTRIBUTES

Every node of a network has a nodeid, which is matched with the observation number in a normal dataset.
In this case, the node with nodeid == 1 is the acciaiuoli family and they
have a wealth of 10.
DROP/KEEP NODES
When you drop/keep nodes, by default, attributes are not
included in the change. But with the option attributes()
you can include attribute variables in the drop/keep.
Load the gang data from the internet with webnwuse.
Transform the data in the network gang_valued into an edge list.
Tabulate the values of this edgelist with tab.
Session IX
Schemes
Network visualization
Animation of network dynamics
. webnwuse gang
. nwplot gang, color(Birthplace)
nwplot gang, color(Birthplace) symbol(Prison) size(Arrests)
[Figure: network plot with the Florentine families as labelled nodes]
. webnwuse florentine
. nwplot flomarriage, lab
. nwplotmatrix flomarriage, lab
. nwplotmatrix flomarriage, sortby(wealth) label(wealth)
ANIMATION
. webnwuse klas12
. nwmovie klas12_wave1-klas12_wave4
. nwmovie _all, colors(col_t*) sizes(siz_t*) edgecolors(edge_t*)
nwplot
nwplotmatrix
nwmovie
[Figure: flomarriage network; legend: seat = 0 or 1, wealth = 3 to 146; nodes labelled with their wealth]

Load the florentine network data with webnwuse.

Plot the flomarriage network, so that the color of the nodes indicates if a family had a seat in the civic council (green = no, pink = yes), the size of the nodes indicates their wealth, and the label of the nodes indicates their wealth as well.
[Figure: sociomatrix of gang_valued with a color legend for the tie values 0 to 4]

Load the gang network data with webnwuse.

Plot the gang_valued network as a sociomatrix where the color codes represent the tie values.
[Figure: plot of klas12b_wave1; legend: sex = 1 or 2, delinq1 = 0 to 4]

Load the klas12b dataset with webnwuse and reproduce the plot of the network klas12b_wave1 above using nwplot.
SOCIAL NETWORK ANALYSIS
USING STATA
Thomas Grund
University College Dublin
thomas.u.grund@gmail.com

10/11 December 2015


University of Cologne

_day2
Session X
Examine networks
Dyads
Triads
Simmelian ties
Components
SUMMARIZE
DENSITY
The density of a network is defined as the proportion of actually
observed ties among the potentially observable ones.
Remember, in a directed, binary network with $n$ actors there
could be $n(n-1)$ ties. In an undirected network, there could
be $n(n-1)/2$ ties.

$$\text{density} = \frac{5}{4(4-1)} = \frac{5}{12} \approx 0.417$$
DENSITY
We could also calculate density from the dyad census. Remember,
M = mutual dyads, A = asymmetric dyads.
Actually observed ties are $2M + A$.
Potential ties are $n(n-1)$.

$$\text{density} = \frac{2M + A}{n(n-1)}$$

$$\text{density} = \frac{5}{4(4-1)} = \frac{5}{12} \approx 0.417$$
RECIPROCITY
The reciprocity of a network is defined as the proportion of actually
reciprocated ties among the potentially reciprocable ones.
Remember, M = mutual dyads, A = asymmetric dyads.
Actually reciprocated ties are $2M$.
Potentially reciprocated ties are $2M + A$.

$$\text{reciprocity} = \frac{2M}{2M + A}$$

$$\text{reciprocity} = \frac{2}{5} = 0.4$$
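The density and reciprocity formulas can be checked with a short Python sketch (Python rather than Stata, to avoid guessing at nwcommands syntax). The four-node tie list is an assumption: it is one network with M = 1 and A = 3, which reproduces the values 5/12 and 2/5 used above.

```python
from itertools import combinations


def dyad_census(Y):
    """Count mutual (M), asymmetric (A) and null (N) dyads of a directed binary network."""
    M = A = N = 0
    for i, j in combinations(range(len(Y)), 2):
        ties = Y[i][j] + Y[j][i]
        if ties == 2:
            M += 1
        elif ties == 1:
            A += 1
        else:
            N += 1
    return M, A, N


# Assumed example: ties 1->2, 1->3, 3->1, 2->4, 4->1 (stored with 0-based indices)
Y = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
n = len(Y)
M, A, N = dyad_census(Y)
density = (2 * M + A) / (n * (n - 1))      # observed ties / potential ties
reciprocity = 2 * M / (2 * M + A)          # reciprocated ties / observed ties
print(M, A, N)
print(round(density, 3), reciprocity)
```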
OBTAIN TIE VALUES
TABULATE NETWORK
TABULATE TWO NETWORKS
TABULATE NETWORK
AND ATTRIBUTE

seat = 0 seat = 1
nwsummarize
nwtabulate
Clear all data from current memory with nwclear.
Load the gang data from the nwcommands-Server using webnwuse.
List the networks in this file (either nwds or nwset will do).
Summarize all networks in this dataset with nwsummarize.
Clear all data from current memory with nwclear.
Load the gang data from the nwcommands-Server using webnwuse.
Show how many ties in the gang network are between nodes with
different Birthplace.
DYAD
A dyad is a pair of actors $(i, j)$ in the network, plus the
configuration of the tie variables $(y_{ij}, y_{ji})$ between them.
In a directed, binary network, there are $n(n-1)$ tie variables
located in $n(n-1)/2$ dyads.
Dyads can be of three types:

M: mutual

A: asymmetric

N: null
DYAD
A dyad is a pair of actors $(i, j)$ in the network, plus the
configuration of the tie variables $(y_{ij}, y_{ji})$ between them.
In an undirected, binary network, there are $n(n-1)/2$ tie
variables located in $n(n-1)/2$ dyads.
Dyads can be of two types:

M: mutual

N: null
DYAD CENSUS
We can describe a network by counting the number of mutual,
asymmetric and null dyads. It is like taking a fingerprint of a
network.

MAN = 132 MAN = 213


ISOMORPHISM
Two networks are isomorphic when they do not differ according to
their fingerprint.

isomorph

MAN = 213 MAN = 213


nwdyads
Load the hpotter data from the nwcommands-Server using webnwuse
List the networks in this file (either nwds or nwset will do)
Summarize the network hpbook1
Calculate the dyad census for network hpbook1
TRIAD
A triad is a set of three actors $(i, j, k)$ plus the configuration of all tie
variables $(y_{ij}, y_{ik}, y_{ji}, y_{jk}, y_{ki}, y_{kj})$ between them.
TRIAD
A triad is transitive when there is a tie $(i, j)$, another one $(i, k)$ and
a third one $(k, j)$. Transitivity shows hierarchy.

[Figure: a transitive and a non-transitive triad on the nodes i, j, k]
TRIAD
A triad is cyclical when there is a tie $(i, j)$, another one $(j, k)$ and a
third one back $(k, i)$. Cyclicity indicates the absence of hierarchy!

[Figure: a non-cyclical (transitive) and a cyclical (non-transitive) triad on the nodes i, j, k]
TRIAD
We can describe a triad as before by counting the number of
mutual, asymmetric, and null dyads plus (where necessary) a
distinguishing letter.

C = cyclical, T = transitive, U = up, D = down,

There are 16 possible triad configurations.


TRIAD CENSUS
We can now describe a whole network according to its triad
census (similar to what we did before with the dyad census). Simply
count the number of times each of the 16 possible triad
configurations appears in the network.
TRIAD CENSUS

[Figure: example network whose triad census counts one triad each of type 111U, 012, 120U and 021D]
TRANSITIVITY
The transitivity of a network gives you an idea of how
locally connected the network is. It is defined as the proportion of
actually observed transitively closed triples $(i, j, k)$ of nodes among
the observed potentially closed paths of length 2 from $i$ to $j$ via $k$.

Think of it in this way: Given that two nodes i and j are indirectly connected (via k), what is the probability that there is a direct link from i to j?

[Figure: nodes i and j connected via k; the direct link from i to j is marked with ??]
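A minimal Python sketch of this definition, counting two-paths from i to j via k and checking which are closed by a direct tie from i to j. The function name and the four-node example are assumptions for illustration; nwcommands may handle details (for example undirected networks) differently.

```python
def transitivity(Y):
    """Share of two-paths i -> k -> j (i, j, k distinct) closed by a direct tie i -> j."""
    n = len(Y)
    paths = closed = 0
    for i in range(n):
        for k in range(n):
            for j in range(n):
                if len({i, j, k}) == 3 and Y[i][k] and Y[k][j]:
                    paths += 1
                    closed += Y[i][j]
    return closed / paths if paths else float("nan")


# Assumed example with ties 0->1, 1->2, 0->2, 2->3
Y = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
t = transitivity(Y)
print(t)
```

In this example there are three two-paths (0->1->2, 0->2->3, 1->2->3) and only the first is closed, so the value is one third.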
nwtriads
What is the description of the triads below?

[Figure: two triads]

M = 1, A = 2, N = 0, Down: 120D
M = 1, A = 1, N = 1, Down: 111D
Load the glasgow data from the nwcommands-Server using
webnwuse
Calculate the triad census for all networks
SIMMELIAN TIE
A Simmelian tie is a reciprocally connected pair with mutual
ties to third parties and hence it is an edge embedded in a
clique or triple (see Krackhardt 1998).
Tie between A and B is Simmelian when it is reciprocated (tie
from B to A) and then both A and B have a reciprocal
relationship to a third actor D.
[Figure: example network with nodes A to F; ties are marked as _simmelian = 0 or _simmelian = 1]
nwsimmelian
COMPONENTS
A component of a network is a subgraph of connected nodes
(that does not mean they all need to be directly connected with each
other).
Isolated nodes form their own components.
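A short Python sketch of how components can be found with a breadth-first search. It ignores tie direction (weak connectivity), which is an assumption; the six-node example is hypothetical and includes one isolate to show that it gets its own component.

```python
from collections import Counter, deque


def components(Y):
    """Return one component id per node, ignoring tie direction."""
    n = len(Y)
    label = [None] * n
    current = 0
    for start in range(n):
        if label[start] is not None:
            continue
        label[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # i and j belong together if a tie exists in either direction
                if (Y[i][j] or Y[j][i]) and label[j] is None:
                    label[j] = current
                    queue.append(j)
        current += 1
    return label


# Hypothetical network: ties 0->1, 1->2, 3->4; node 5 is an isolate
Y = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (3, 4)]:
    Y[i][j] = 1

label = components(Y)
largest_id, largest_size = Counter(label).most_common(1)[0]
print(label, largest_id, largest_size)
```

Keeping only the nodes whose label equals `largest_id` gives the largest component.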
EXTRACT LARGEST
COMPONENT

[Figure: network plot with the largest component marked]
nwcomponents
nwgen
Load the florentine data from the nwcommands-Server using webnwuse
Extract the largest component of the network flomarriage.
Plot the largest component only.
Session XI
Distance and paths
Distance distribution
Shortest paths
Bridges
Kevin Bacon?

http://oracleofbacon.org/
Paul Erdős?

http://academic.research.microsoft.com/VisualExplorer
DISTANCE
Length of a shortest connecting path defines the (geodesic)
distance between two nodes.
DISTANCE
How can we calculate the distance?

Matrix $Y$ indicates which row actor is directly connected to which column actor.
The squared matrix $Y^2$ indicates which row actor can reach which column actor in two steps.
The matrix $Y^l$ indicates who reaches whom in $l$ steps.
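The matrix-power idea can be tried out with a few lines of plain Python. The four-node adjacency matrix is an assumed example; an entry of the squared matrix counts the walks of exactly two steps from the row actor to the column actor, so any positive entry means "reachable in two steps".

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Assumed example: ties 1->2, 1->3, 3->1, 2->4, 4->1 (0-based indices)
Y = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
Y2 = matmul(Y, Y)
for row in Y2:
    print(row)
```

Note that the diagonal of the squared matrix can be positive (here the walk 1->3->1), so diagonal entries are usually ignored when reading off distances.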
DISTANCE
When we take the average of the shortest paths between all
nodes (if all are connected) we get the average shortest path
length of the network.

Intuition: If we were to select two nodes at random, how many steps would it take on average to connect them?

For a random graph one can show that the average shortest path length is approximately

$$\frac{\ln n}{\ln k}$$

where $n$ = number of nodes and $k$ = average degree of nodes.
DISTANCE

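[Figure: example network with nodes 1 to 5]

$$\text{distances} = \begin{pmatrix} 0 & 1 & 1 & 2 & 2 \\ 1 & 0 & 2 & 1 & 1 \\ 1 & 2 & 0 & 3 & 3 \\ 1 & 2 & 2 & 0 & 3 \\ 2 & 1 & 3 & 1 & 0 \end{pmatrix}$$

average shortest path length = 1.8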


DISTANCE DISTRIBUTION
Networks can have the same average shortest path length,
but still be vastly different from each other.

Better, look at the distribution of shortest paths instead of the average.
Calculate how often each distance occurs.

0 1 1 2 2
1 0 2 1 1
1 2 0 3 3
1 2 3 0 3
2 1 3 1 0

[Figure: example network with nodes 1 to 5]
DISTANCE DISTRIBUTION

[Figure: histogram of the distance distribution for the distance matrix above (x-axis: distance, 1 to 4; y-axis: freq, 0 to 10)]
DISTANCE
DISTANCE
PATHS
How can one get from the peruzzi to the medici?

[Figure: Florentine families network with paths between the peruzzi and the medici marked (mypath_1 = 0 or 1)]
PATHS OF SPECIFIC
LENGTH
nwgeodesic
nwpath
nwplot
What is the average shortest path length and
what is the distance distribution?

[Figure: example network with nodes 1 to 4]

From 1 to 2,3,4: 1,1,2
From 2 to 1,3,4: 2,3,1
From 3 to 1,2,4: 1,2,3
From 4 to 1,2,3: 1,2,2

Distance distribution: 5 x 1 step, 5 x 2 steps, 2 x 3 steps

$$\text{average shortest path length} = \frac{21}{12} = 1.75$$
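The answer above can be reproduced with a breadth-first search in Python. The tie list is an inference, not taken from the slide figure: it contains exactly the pairs that the solution lists at distance 1.

```python
from collections import Counter, deque


def geodesics(Y):
    """All-pairs shortest path lengths by breadth-first search (None = unreachable)."""
    n = len(Y)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if Y[i][j] and dist[s][j] is None:
                    dist[s][j] = dist[s][i] + 1
                    queue.append(j)
    return dist


# Inferred ties: 1->2, 1->3, 2->4, 3->1, 4->1 (0-based indices)
Y = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
dist = geodesics(Y)
lengths = [dist[i][j] for i in range(4) for j in range(4) if i != j]
distribution = Counter(lengths)
average = sum(lengths) / len(lengths)
print(dict(distribution), average)
```

The sketch assumes every node can reach every other node; with unreachable pairs the `None` entries would have to be dropped before averaging.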
Load the florentine data from the nwcommands-Server
using webnwuse
What are the shortest paths between the albizzi and the
strozzi family?
What are the paths between these two families that have
exactly length 4?
Load the florentine data from the nwcommands-Server
using webnwuse
Calculate all shortest paths between nodes.
Make a histogram of the distribution of path lengths (hint:
use nwtoedge first and then hist)
BRIDGES

In general, a bridge is a direct tie between nodes that would otherwise be in disconnected components of the graph.
BRIDGES

[Figure: Florentine families network illustrating bridges]
LOCAL BRIDGES
Local bridges are ties between two nodes in a network
that are the shortest route by which information might
travel from those connected to one to those connected to
the other.
If removed, a local bridge (between nodes A and B) would
increase the distance between two nodes A and B by at
least 2.
The length by which the removal of a local bridge
increases the distance between two nodes is called the
span of the local bridge.
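A hedged Python sketch of the span as defined above: remove the tie between A and B from an undirected network and measure by how much the distance between A and B grows. The function names and the three toy networks are assumptions; the output of nwbridge may be defined differently.

```python
from collections import deque


def distance(neighbors, source, target):
    """Breadth-first search distance in an undirected graph (None = unreachable)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return seen[u]
        for v in neighbors[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return None


def span_of_tie(edges, a, b):
    """Increase in the distance between a and b once the tie a-b is removed."""
    neighbors = {a: set(), b: set()}
    for u, v in edges:
        if {u, v} == {a, b}:
            continue  # drop the tie under inspection
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    d = distance(neighbors, a, b)
    return None if d is None else d - 1  # the distance was 1 while the tie existed


cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(span_of_tie(cycle, "A", "B"))         # local bridge: distance grows from 1 to 3
print(span_of_tie(triangle, "A", "B"))      # no local bridge: distance grows from 1 to 2
print(span_of_tie([("A", "B")], "A", "B"))  # None: a bridge in the general sense
```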
LOCAL BRIDGES

[Figure: Florentine families network illustrating local bridges]
nwbridge
Session XII
Network neighbors
Attributes of neighbors
FLORENTINE FAMILIES

Who are the neighbors?
NEIGHBORS
CONTEXT

[Figure: Florentine families network; legend: wealth = 3 to wealth = 146]

What is the average wealth of the albizzi's network neighbors?
nwneighbor
nwcontext
Load the klas12b data from the nwcommands-Server using webnwuse
Calculate for each node i in the network klas12b_wave1 the average
age of the nodes who nominate node i as friend.
Calculate for each node i in the network klas12b_wave1 the maximum
age of the nodes who get nominated by node i as a friend.
Calculate for each node i in the network klas12b_wave1 the maximum
age of the nodes who get nominated by node i as a friend and node i
himself.
Session XIII
Centrality
Centralization
CENTRALITY

Well connected actors are in a structurally advantageous position.

- Getting jobs
- Better informed
- Higher status

[Figure: star network with node a in the center and nodes b, c, d, e around it]

What is well-connected?
DEGREE CENTRALITY
Degree centrality
We already know this. Simply the number of incoming/outgoing
ties => indegree centrality, outdegree centrality
How many ties does an individual have?

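$$C_{\text{outdegree}}(i) = \sum_{j=1}^{n} y_{ij} \qquad C_{\text{indegree}}(i) = \sum_{j=1}^{n} y_{ji}$$

DEGREE CENTRALITY

$$C_{\text{degree}}(i) = \sum_{j=1}^{n} y_{ij}$$

[Figure: star network with node a in the center and nodes b, c, d, e around it]

$C_{\text{degree}}(a) = 4$
$C_{\text{degree}}(b) = 1$
$C_{\text{degree}}(c) = 1$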
DEGREE
The indegree $\text{indeg}(v)$ of node $v$ is simply the number of ties
that point towards $v$.
The outdegree $\text{outdeg}(v)$ of node $v$ is simply the number of
ties that point away from $v$.

[Figure: example network with nodes 1 to 4]

indeg(1) = 3, outdeg(1) = 2
indeg(2) = 1, outdeg(2) = 2
indeg(3) = 1, outdeg(3) = 1
indeg(4) = 1, outdeg(4) = 1
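In Python, indegree and outdegree are simply the column and row sums of the adjacency matrix. The tie list below is hypothetical: it is one four-node network that is consistent with the degree values listed above, not necessarily the one drawn on the slide.

```python
def degrees(Y):
    """Return (indegree, outdegree) lists: column sums and row sums of Y."""
    n = len(Y)
    outdeg = [sum(Y[i]) for i in range(n)]
    indeg = [sum(Y[i][j] for i in range(n)) for j in range(n)]
    return indeg, outdeg


# Hypothetical ties: 1->2, 1->3, 2->1, 2->4, 3->1, 4->1 (0-based indices)
Y = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
indeg, outdeg = degrees(Y)
for node in range(4):
    print(f"indeg({node + 1}) = {indeg[node]}, outdeg({node + 1}) = {outdeg[node]}")
```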
DEGREE DISTRIBUTION
[Figure: histograms of the indegree and outdegree distributions (x-axis: degree, 1 to 3; y-axis: freq, 0 to 4) for the example network with nodes 1 to 4, where indeg(1) = 3, outdeg(1) = 2, indeg(2) = 1, outdeg(2) = 2, indeg(3) = 1, outdeg(3) = 1, indeg(4) = 1, outdeg(4) = 1]
CLOSENESS CENTRALITY
Closeness centrality
How close is an individual (on average) to all other individuals?

Farness
How many steps (on average) does it take an individual to reach all
other individuals?

$$\text{Farness}(i) = \frac{1}{n-1} \sum_{j=1}^{n} l_{ij}$$

where $l_{ij}$ = shortest path between $i$ and $j$.
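FARNESS

$$\text{Farness}(i) = \frac{1}{n-1} \sum_{j=1}^{n} l_{ij}$$

[Figure: star network with node a in the center and nodes b, c, d, e around it]

$$\text{Farness}(a) = \frac{1}{4}(1 + 1 + 1 + 1) = 1$$

$$\text{Farness}(b) = \frac{1}{4}(1 + 2 + 2 + 2) = \frac{7}{4}$$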
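CLOSENESS CENTRALITY

$$C_{\text{closeness}}(i) = \frac{1}{\text{Farness}(i)}$$

[Figure: star network with node a in the center and nodes b, c, d, e around it]

$$C_{\text{closeness}}(a) = 1 \Big/ \tfrac{1}{4}(1 + 1 + 1 + 1) = 1$$

$$C_{\text{closeness}}(b) = 1 \Big/ \tfrac{1}{4}(1 + 2 + 2 + 2) = \frac{4}{7}$$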
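The farness and closeness values for the star network can be verified with a small Python sketch. The dictionary encoding of the network and the function names are assumptions for illustration, and the code presumes a connected, undirected network.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Shortest path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def farness(neighbors, i):
    """Average distance from i to the n - 1 other nodes."""
    dist = bfs_distances(neighbors, i)
    return sum(d for node, d in dist.items() if node != i) / (len(neighbors) - 1)


def closeness(neighbors, i):
    return 1 / farness(neighbors, i)


# Star network: a in the center, b, c, d, e around it
star = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
print(farness(star, "a"), closeness(star, "a"))
print(farness(star, "b"), closeness(star, "b"))
```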
BETWEENNESS CENTRALITY
Betweenness centrality
How many shortest paths go through an individual?

[Figure: star network with node a in the center and nodes b, c, d, e around it]

$C_{\text{betweenness}}(a) = 6$
$C_{\text{betweenness}}(b) = 0$
BETWEENNESS CENTRALITY
Betweenness centrality
How many shortest paths go through an individual?

What about multiple shortest paths?

E.g. there are two shortest paths from c to d (one via a and another one via e).

[Figure: example network with nodes a, b, c, d, e, f]

Give each shortest path a weight inverse to how many shortest paths there are between two nodes.
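A minimal Python sketch of the weighting idea for undirected networks: for every pair of nodes, each intermediate node receives the fraction of shortest paths that pass through it. Pairs are counted once (unordered), which reproduces the value 6 for the center of the star. The second network is a hypothetical four-node cycle in which c and d are linked by two shortest paths, one via a and one via e; it is not the network drawn on the slide.

```python
from collections import deque
from itertools import combinations


def count_shortest_paths(neighbors, source):
    """Return (dist, sigma): distances from source and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(neighbors):
    info = {s: count_shortest_paths(neighbors, s) for s in neighbors}
    score = {v: 0.0 for v in neighbors}
    for s, t in combinations(neighbors, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in neighbors:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:  # v lies on a shortest s-t path
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


star = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
square = {"c": ["a", "e"], "a": ["c", "d"], "d": ["a", "e"], "e": ["c", "d"]}
print(betweenness(star))
print(betweenness(square))
```

In the cycle every node lies on one of the two shortest paths of exactly one pair, so each node scores 0.5.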
nwdegree
nwbetween
nwevcent
nwcloseness
nwkatz
What is the closeness centrality of node a?

[Figure: example network with node a marked]

$$\frac{1}{\frac{1}{3}(1 + 2 + 2)} = \frac{3}{5}$$
CENTRALIZATION
How equally/unequally distributed are the
centrality scores of all individuals?

A network is highly centralized when one individual is very central and all others are not.
A network is not centralized when all individuals have the same
centrality score.
CENTRALIZATION
The general definition of centralization for non-weighted networks
was proposed by Linton Freeman (1979).
1. Calculate the sum of differences in centrality between the
most central node in a network and all other nodes; and
2. Divide this quantity by the theoretically largest such sum of
differences in any network of the same size.

$$C_x = \frac{\sum_i \left[ C_x^{\max} - C_x(i) \right]}{\max \text{sum}}$$
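The two steps can be sketched in Python for degree centrality in an undirected network. The theoretical maximum used here, (n - 1)(n - 2), is the sum of differences in a star network and is stated as an assumption of this sketch; other centrality measures need their own maximum, and nwcommands may normalize differently.

```python
def centralization(scores, max_sum):
    """Freeman centralization: sum of (max score - score) divided by the largest possible sum."""
    c_max = max(scores)
    return sum(c_max - c for c in scores) / max_sum


def degree_centralization(neighbors):
    """Degree centralization of an undirected network given as {node: [neighbors]}."""
    n = len(neighbors)
    degree = [len(neighbors[v]) for v in neighbors]
    return centralization(degree, (n - 1) * (n - 2))  # maximum is reached by a star


star = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(degree_centralization(star))  # one very central node
print(degree_centralization(ring))  # all nodes equally central
```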
CENTRALIZATION
[Figure: three scatter plots of Goals against weight centralization, out-strength centralization and in-strength centralization, each with fitted line and 95% CI]

Grund, T. (2012) Network Structure and Team Performance: The Case of English
Premier League Soccer Teams. Social Networks, Vol. 34, Issue 4, pp. 682-690.
nwdegree
nwbetween
nwcloseness
nwsummarize
Load the florentine data with webnwuse
Calculate betweenness centrality for the networks flomarriage and
flobusiness.
Calculate betweenness centralization for the flomarriage network.
Load the glasgow data from the nwcommands-Server using webnwuse
Calculate indegree centralization for network glasgow1
Tabulate and visualize the indegree distribution (check out tab and hist)
Calculate betweenness centralization for network glasgow2
Session XIV

Change networks
Symmetrize networks
GANG NETWORK
TABULATE NETWORK
RECODE TIE VALUES
FLORENTINE FAMILIES

[Figure: two plots of the Florentine families network: marriage ties and business ties]


REPLACE TIE VALUES
. help nwreplace
nwrecode
nwreplace
nwsync
nwtranspose
nwsym
nwgen
Load the gang data from the nwcommands-Server using webnwuse
Symmetrize the gang network with nwsym.
Load the gang data again.
Symmetrize the gang network with nwtranspose and network
expressions.
Session XV
Multiplying networks
Adding networks
Network generators
Network expressions
GENERATE NETWORKS
. help nwgen
Load florentine data from the nwcommands-Server using webnwuse
Use nwgeodesic and a network expression to generate a new network
dist2, where a tie between node i and j means that these two nodes have the
distance 2 in the original flomarriage network.
Session XVI
Two-mode and one-mode networks
Dissimilarities
TWO MODE NETWORKS
Many network datasets are by definition two-mode networks
(also known as affiliation or bipartite networks). These are a
particular type of network with two sets of nodes, where ties are
only established between nodes belonging to different sets.
One of the first two-mode datasets to be analysed was the
Davis Southern Club Women dataset (Davis et al., 1941),
which recorded the attendance of a group of women (node set
1) at a series of events (set 2). A woman would be linked to an
event if she attended it.
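A small Python sketch of how a two-mode affiliation matrix can be projected onto one-mode networks: two row nodes are tied as often as they share a column node, and vice versa. The names and affiliations below are hypothetical toy data, and which projection nwcommands calls level 1 or level 2 is not assumed here.

```python
def project_rows(B):
    """One-mode projection of the row nodes: entry (i, j) counts shared column nodes."""
    n = len(B)
    return [[sum(a * b for a, b in zip(B[i], B[j])) if i != j else 0
             for j in range(n)] for i in range(n)]


def transpose(B):
    return [list(col) for col in zip(*B)]


people = ["Peter", "Tim", "Andreas"]   # hypothetical row nodes
places = ["Oxford", "Cologne"]         # hypothetical column nodes
B = [
    [1, 0],  # Peter:   Oxford
    [1, 1],  # Tim:     Oxford and Cologne
    [0, 1],  # Andreas: Cologne
]
people_net = project_rows(B)             # people tied by shared places
places_net = project_rows(transpose(B))  # places tied by shared people
print(people_net)
print(places_net)
```

The entries of the projected matrices are counts of shared affiliations, which is one natural choice of projection weight.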

AFFILIATION DATA
SETTING AFFILIATION DATA
Level 1
Level 1
AFFILIATION DATA

[Figure: two-mode network linking people (Peter, Tim, Andreas, Richard, Clemens, Thomas) to universities (Oxford, Humboldt, Cologne, UCD)]

ONE-MODE PROJECTION
LEVEL 1 PROJECTION

[Figure: one-mode network of the universities Humboldt, Oxford, Cologne and UCD]

LEVEL 2 PROJECTION

[Figure: one-mode network of the people Tim, Andreas, Clemens, Richard, Thomas and Peter]
PROJECTION WEIGHTS
PROJECTION WEIGHTS
(DIS)SIMILARITIES
The dissimilarity between two nodes reflects how dissimilar these
nodes are regarding the ties they have to other nodes (tie
vectors). Different distance measures can be used.

Euclidean distance
The Euclidean distance between two tie vectors is equal to the
square root of the sum of the squared differences between them.
That is, the strength of actor A's tie to C is subtracted from the
strength of actor B's tie to C, and the difference is squared. This
is then repeated across all the other actors (D, E, F, etc.), and
summed. The square root of the sum is then taken.
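The verbal definition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the nwdissimilar implementation: the function name and the 4-node example network are hypothetical, and the sketch assumes that both outgoing and incoming ties are compared and that the two nodes themselves are skipped.

```python
import math

def euclidean_dissimilarity(adj, i, j):
    """Euclidean distance between the tie vectors of nodes i and j.

    adj is an adjacency matrix (list of lists). Ties to and from every
    other node k are compared; k == i and k == j are skipped.
    """
    total = 0.0
    for k in range(len(adj)):
        if k in (i, j):
            continue
        total += (adj[i][k] - adj[j][k]) ** 2  # outgoing ties to k
        total += (adj[k][i] - adj[k][j]) ** 2  # incoming ties from k
    return math.sqrt(total)

# Hypothetical directed network: node 0 sends a tie to node 2,
# node 1 sends a tie to node 3.
adj = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
d01 = euclidean_dissimilarity(adj, 0, 1)
print(d01)
```

Nodes 0 and 1 differ in their ties to node 2 and to node 3, so the sum of squared differences is 2 and the distance is its square root.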
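EUCLIDEAN DISTANCE

d_{ij} = \sqrt{ \sum_k [ (y_{ik} - y_{jk})^2 + (y_{ki} - y_{kj})^2 ] },  k \neq i, k \neq j

How (dis)similar are nodes 4 and 5 in their tie vectors?

[Figure: example network; the element-wise differences between the two tie vectors are -1, 0, -1 and 0, 0, 0]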
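EUCLIDEAN DISTANCE

d_{ij} = \sqrt{ \sum_k [ (y_{ik} - y_{jk})^2 + (y_{ki} - y_{kj})^2 ] },  k \neq i, k \neq j

[Figure: example network; the element-wise differences between the two tie vectors are -1, 0, -1 and 0, 0, 0]

= \sqrt{ (-1)^2 + (-1)^2 } = \sqrt{2}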
What is the Euclidean dissimilarity between nodes 1 and 2?

[Figure: network with four nodes 1, 2, 3 and 4]

= \sqrt{ (y_{13} - y_{23})^2 + (y_{14} - y_{24})^2 }
= \sqrt{ (1 - 0)^2 + (0 - 1)^2 } = \sqrt{2}
MANHATTAN DISTANCE

d_{ij} = \sum_k [ |y_{ik} - y_{jk}| + |y_{ki} - y_{kj}| ],  k \neq i, k \neq j

[Figure: example network; the absolute differences between the two tie vectors are 1, 0, 1 and 0, 0, 0]

= 1 + 0 + 1 + 0 + 0 + 0 = 2
JACCARD DISTANCE

d_{ij} = 1 - \frac{ \sum_k [ \min(y_{ik}, y_{jk}) + \min(y_{ki}, y_{kj}) ] }{ \sum_k [ \max(y_{ik}, y_{jk}) + \max(y_{ki}, y_{kj}) ] },  k \neq i, k \neq j

[Figure: example network with the values 0, 0, 0 and 1, 1, 0]

= 1 - 2/4 = 1/2
nwdissimilar
nwdsimilar
Load the gang data with webnwuse
Calculate the dissimilarities between nodes using the Manhattan distance.
Transform the dissimilarities you just generated into a long edgelist and make
a histogram of the dissimilarities (see graph).

[Figure: histogram of _dissimilar (x-axis 0 to 50; y-axis Density, 0 to .1)]
Session XVII
Expand networks
Homophily
EXPAND VARIABLE TO NETWORK

Variable -> Expanded network

y_{ij} = (var_i == var_j)

mode = same: y_{ij} = 1 if i and j are the same, y_{ij} = 0 if i and j are different
EXPAND VARIABLE TO NETWORK

[Figure: the Florentine families network drawn twice, once for seat = 0 and once for seat = 1]

mode(absdist)

= abs(20 - 10) = 10
Load the florentine data from the nwcommands-Server using
webnwuse
Calculate for each node in the flomarriage network the average
wealth of those network neighbors who have the same value on
the variable seat (use nwexpand and then nwcontext).
Question: Are co-offending ties between gang members
from the same ethnicity more likely than ties between gang
members from different ethnicities?

Caribbean East Africa UK West Africa


HOMOPHILY
Homophily = tendency of similar people to associate with each
other.
For example, ties are more likely to form between people with the same age,
education, occupation, religion, income and so on (McPherson, Smith-Lovin, and
Cook 2001).
One of the most striking empirical regularities in social life.
HOMOPHILY

Empirically observed homophily statistics


Observed count

Match(ethnicity) 63
Match(British) 23
Match(Jamaican) 14
Match(Somali) 6
Match(West African) 20

[Figure legend: Caribbean, East Africa, UK, West Africa]
ADJACENCY MATRIX

0 1 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0

0 1 0 0 0 0 0 1 0

1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0

0 0 1 0 1 0 1 0 0
0 0 0 0 0 1 0 1 0

0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

Caribbean East Africa UK West Africa


GROUP SIZE MATTERS

[Figure: two populations, one split 50% / 50% between two groups and one split 75% / 25%]

Groups of 50% and 50%: if ties are assigned at random, 50% of the ties
should be between similar individuals (0.5 * 0.5 + 0.5 * 0.5 = 0.5).

Groups of 75% and 25%: if ties are assigned at random, 62.5% of the ties
should be between similar individuals (0.75 * 0.75 + 0.25 * 0.25 = 0.625).
E-I INDEX
E-I index is the number of ties external to the groups minus the
number of ties that are internal to the groups divided by the
total number of ties.
This value can range from 1 to -1.

E-I index = (E - I) / (E + I)

where E is the number of external ties and I is the number of internal ties.
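The E-I index as defined above can be sketched in a few lines of Python. The function name, the edge list and the group labels are hypothetical and only serve as an illustration.

```python
def ei_index(edges, group):
    """E-I index: (external ties - internal ties) / total ties.

    edges is a list of (i, j) pairs; group maps each node to its group.
    """
    external = sum(1 for i, j in edges if group[i] != group[j])
    internal = len(edges) - external
    return (external - internal) / len(edges)

# Hypothetical example: two red and two green nodes, four ties.
group = {"a": "red", "b": "red", "c": "green", "d": "green"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
ei = ei_index(edges, group)
print(ei)
```

Here three ties cross group boundaries and one stays within a group, so the index is (3 - 1) / 4 = 0.5. A network with only internal ties would give -1, one with only external ties would give 1.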

Let's say we have 20 red, 10 green and 10
yellow individuals. Furthermore, there are 100
directed network ties. How many of these ties
will be between similarly colored individuals
(when ties are assumed to be assigned
randomly)?
Ties between two red individuals:
100 * 20/40 * 20/40 = 25

Ties between two green individuals:


100 * 10/40 * 10/40 = 6.25

Ties between two yellow individuals:


100 * 10/40 * 10/40 = 6.25

Total = 37.5
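The same calculation as a short Python sketch. The function name is hypothetical; like the slide, the sketch treats the probability that both ends of a randomly assigned tie fall into a group as the squared share of that group.

```python
def expected_same_group_ties(group_sizes, n_ties):
    """Expected number of within-group ties per group when n_ties
    ties are assigned at random."""
    total = sum(group_sizes.values())
    return {g: n_ties * (size / total) ** 2 for g, size in group_sizes.items()}

expected = expected_same_group_ties({"red": 20, "green": 10, "yellow": 10}, 100)
total_expected = sum(expected.values())
print(expected, total_expected)
```

Calling the function with other group sizes and tie counts gives the corresponding random baseline for any categorical attribute.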
ETHNICITY DISTRIBUTION

Ethnicity distribution
Observed count

Total 54
British 24
Jamaican 12
Somali 6
West African 12
Caribbean East Africa UK West Africa

There are 133 edges in the network. How


many of them would be between similar
individuals if they were assigned at random?
ETHNICITY IN DYADS

               Total   British           Jamaican          Somali          West African

Total          54      24                12                6               12

British        24      (24/54)*(24/54)
Jamaican       12      (12/54)*(24/54)   (12/54)*(12/54)
Somali         6       (6/54)*(24/54)                      (6/54)*(6/54)
West African   12      (12/54)*(24/54)   .                                 (12/54)*(12/54)

Expected ties between two British individuals

= 133*(24/54)*(24/54) = 26.27
HOMOPHILY

Observed and expected (for networks with same density) match statistics
Observed Expected

Match(ethnicity) 63 41.05
Match(British) 23 26.27
Match(Jamaican) 14 6.57
Match(Somali) 6 1.64
Match(West African) 20 6.57
nwtabulate
nwcorrelate
nwpermute
nwexpand
Load the florentine data from the nwcommands-Server
using webnwuse
Calculate the transitivity of the flomarriage network
(nwsummarize, detail)
Generate 100 undirected random networks with exactly the
same density as the flomarriage network.
Calculate the transitivity scores for the random networks
and plot their distribution.
Compare the observed score against the simulated scores.
Session XVIII
Simulating networks
RANDOM NETWORK

nwrandom 15, prob(.1) nwrandom 15, prob(.5)

Each tie has the same probability to exist, regardless of any other ties.
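A minimal Python sketch of this idea: every possible directed tie is switched on independently with the same probability. This illustrates the principle only and is not the nwrandom implementation; the function name and the seed argument are assumptions.

```python
import random

def random_network(n, prob, seed=None):
    """Directed random network: each off-diagonal tie exists with
    probability prob, independently of all other ties."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < prob:
                adj[i][j] = 1
    return adj

net = random_network(15, 0.1, seed=42)
n_ties = sum(map(sum, net))
print(n_ties, "ties out of", 15 * 14, "possible")
```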
LATTICE RING LATTICE

nwlattice 5 5 nwring 15, k(2) undirected


SMALL WORLD NETWORK

nwsmall 10, k(2) shortcuts(3) undirected


PREFERENTIAL ATTACHMENT MECHANISMS

Start: m0 = 2
Step 1: m = 2, prob = 1
Step 2: m = 2, prob = 1

[Figure: the network starts with nodes 1 and 2; node 3 enters in step 1 and node 4 in step 2]

Each step one new node enters and forms m = 2 new ties to the existing
nodes. With prob = 1, these two new ties are uniformly sampled, i.e.
each existing node has the same probability to become friends with the
new node.

When prob < 1, some new ties are more likely than other new ties. The
weights are proportional to the current indegree of the established nodes.
PREFERENTIAL ATTACHMENT NETWORK

[Figure: network generated by preferential attachment, with a histogram of _in_degree (x-axis 0 to 8; y-axis Density, 0 to .4)]

nwpref 10, prob(.5)


HOMOPHILY NETWORK
homophily = 5 homophily = -5

male female male female

nwhomophily gender, density(0.05) homophily(5)


nwrandom nwlattice
nwsmall nwpref
nwring
nwhomophily
nwdyadprob
Generate a random network with 50 nodes and exactly 245 ties.
Summarize this network to check that you succeeded.
Simulate 100 random (directed) networks with 50 nodes, 100
mutual dyads and 80 asymmetric dyads.
Generate transitivity scores for all 100 networks.
Make a histogram of the transitivity scores.
Session XIX
Hypothesis testing
Conditional uniform graphs
Quadratic assignment procedure
Permutation tests
HYPOTHESIS TESTING
Classical SNA (mainly descriptive)
Proximity, similarity, centrality, brokerage
Positional measures, equivalence

Hypothesis testing on networks (requires an inferential-statistical approach)
Crucial are meaningful distributions of test statistics, on which
p-values for hypothesis tests can be based
It is not trivial to construct such meaningful distributions for
complete network data
Is a particular
network pattern
more (or less)
prominent than
expected?
PROBLEM
Even complete randomness produces certain network patterns in
networks, e.g. because of the size of sub-groups.

General strategy:
1. Calculate a network statistic that you are interested in.
2. Think about the properties of the network that you want to
conserve.
3. Generate many random networks that have the same
properties as the observed network.
4. Calculate the network statistic on these conditional random
networks and compare this baseline distribution against the
actually observed network statistic in the observed network.
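The four steps of the general strategy can be sketched in Python as follows. This is a sketch under assumptions, not the nwcommands workflow: the statistic is transitivity, the conserved property is the number of undirected ties, and the example network, function names and seed are hypothetical.

```python
import itertools
import random

def transitivity(edges, n):
    """Share of connected triples (two-paths) that are closed into triangles."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    closed = 0
    triples = 0
    for v in range(n):
        for a, b in itertools.combinations(sorted(nbrs[v]), 2):
            triples += 1
            if b in nbrs[a]:
                closed += 1
    return closed / triples if triples else 0.0

def conditional_uniform_test(edges, n, draws=500, seed=1):
    """Compare the observed transitivity with random networks that
    have the same number of nodes and ties."""
    rng = random.Random(seed)
    all_dyads = list(itertools.combinations(range(n), 2))
    observed = transitivity(edges, n)                        # step 1
    simulated = [transitivity(rng.sample(all_dyads, len(edges)), n)
                 for _ in range(draws)]                      # steps 2 and 3
    share = sum(s >= observed for s in simulated) / draws    # step 4
    return observed, simulated, share

# Hypothetical observed network: 6 nodes, 7 undirected ties.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
observed, simulated, share = conditional_uniform_test(edges, 6)
print(observed, share)
```

The returned share is the proportion of random networks whose transitivity is at least as large as the observed one, which can be read as a one-sided p-value under this baseline.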
1 Test-statistic
e.g. number of triads,
number of reciprocal ties,
number of ties between
similar individuals

2 Distribution of test-
statistic under null
hypothesis
e.g. distribution of triads
we can expect when there
is no clustering
Question: Is there more or less
clustering (triads) than expected?
transitivity score is 0.36
Is this a lot?

Problem: We do not know how much


transitivity we should expect by chance given a
certain number of ties in the network.
1 Test-statistic
transitivity_obs = 0.36

2 Distribution of test-statistic under null hypothesis
transitivity_random = ??
(NON-)RANDOMNESS

Even random networks have non-random features. What is random anyway?

Question: How many triads is one likely to observe in a network with 3 nodes and 2 undirected ties?
= 0

Question: How many triads is one likely to observe in a network with 3 nodes and 3 undirected ties?
= 1
CONDITIONAL UNIFORM
GRAPHS
Generate random networks with the same size, density, or dyad census
as the observed network and then calculate the test-statistic (transitivity)
on these conditional uniform graphs.

Force the network to have certain properties, e.g. density = condition on


density.
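TIE PROBABILITIES

Density of the network = (2M + A) / (n(n - 1))

Reciprocity of the network = 2M / (2M + A)

[Figure: small directed example network, labelled M=1, A=2, N=1 (121)]

Density = (2*1 + 3) / (4(4 - 1)) = 5/12 = 0.416

Reciprocity = (2*1) / (2*1 + 3) = 2/5 = 0.4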
TIE PROBABILITIES

[Figure: the same example network, labelled M=1, A=2, N=1 (121), with the tie from i to j marked "?"]

Now, assume two nodes i, j are randomly sampled from this little network and
look at tie y_{ij}.

What is the probability Pr(y_{ij} = 1)?

There are 12 possible ties, and 5 of these 12 are realized. That means:

Pr(y_{ij} = 1) = 5/12 = 0.416
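A small Python sketch of the density and reciprocity calculation from a dyad census. The adjacency matrix is a hypothetical 4-node network chosen so that 5 of the 12 possible ties are realized (one mutual and three asymmetric dyads, an assumption based on the slide's arithmetic); the function names are illustrative.

```python
def dyad_census(adj):
    """Count mutual, asymmetric and null dyads in a directed network."""
    n = len(adj)
    mutual = asymmetric = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
            else:
                null += 1
    return mutual, asymmetric, null

def density(adj):
    n = len(adj)
    m, a, _ = dyad_census(adj)
    return (2 * m + a) / (n * (n - 1))

def reciprocity(adj):
    m, a, _ = dyad_census(adj)
    return 2 * m / (2 * m + a)

# Hypothetical network: 0 <-> 1, 0 -> 2, 1 -> 3, 2 -> 3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(dyad_census(adj), density(adj), reciprocity(adj))
```

The density equals the probability that a randomly chosen ordered pair of nodes is connected by a tie.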
CONDITIONAL UNIFORM GRAPHS

[Figure: kernel density estimate (kernel = epanechnikov, bandwidth = 0.0115) of the transitivity of random networks that have the same density as the gang network (x-axis: transitivity, .05 to .2; y-axis: Density, 0 to 15), compared with the transitivity of the gang network]
. webnwuse gang2
. nwsummarize gang, detail

[Output screenshot: the detailed summary reports the transitivity of the gang network and the density of the observed network]
. nwclear
. nwrandom 54, prob(.092) undirected ntimes(20)
. nwsummarize _all, detail save(myfile)

...

. use myfile, clear


. kdensity transitivity, xline(.363) xscale(range(0 .4))
nwrandom
nwsummarize
Load the florentine data from the nwcommands-Server
using webnwuse
Calculate the transitivity of the flomarriage network
(nwsummarize, detail)
Generate 100 undirected random networks with exactly the
same density as the flomarriage network.
Calculate the transitivity scores for the random networks
and plot their distribution.
Compare the observed score against the simulated scores.
Question: Is there more or less correlation
between these two networks than expected?

Padgett, J. and Ansell, C. (1993) Robust Action and the Rise of the Medici, 1400-1434.
American Journal of Sociology 98: 1259-1319
GRAPH CORRELATION

Network 1          Network 2

    a  b  c            a  b  c
a   0  1  0        a   0  0  0
b   0  0  1        b   0  0  1
c   1  0  0        c   1  1  0

Transform each adjacency matrix into a dataset of dyads.

row  col  net1  net2
a    b    1     0
a    c    0     0
b    a    0     0
b    c    1     1
c    a    1     1
c    b    0     1

corr(net1, net2) = 0.333
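The dyad-level correlation for the two three-node networks a, b, c can be reproduced with a short Python sketch. The function names are illustrative and the correlation is a plain Pearson correlation over the off-diagonal cells; this is not the nwcorrelate code.

```python
import math

def to_dyads(adj):
    """Flatten an adjacency matrix into a list of off-diagonal cells."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]

def graph_correlation(net1, net2):
    """Pearson correlation between the dyad values of two networks."""
    x, y = to_dyads(net1), to_dyads(net2)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Rows and columns are ordered a, b, c.
net1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
net2 = [[0, 0, 0], [0, 0, 1], [1, 1, 0]]
corr = graph_correlation(net1, net2)
print(round(corr, 3))
```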
GRAPH CORRELATION

corr_obs = 0.372
Is this a lot?

Problem: We do not know how much correlation we should expect by chance given
the marriage and the business network!

1 Test-statistic
corr_obs = 0.372

2 Distribution of test-statistic under null hypothesis
corr_random = ??
QUADRATIC ASSIGNMENT
PROCEDURE

Scramble the network by permuting the actors


(randomly re-label the nodes), i.e. the actual
network does not change, however, the position
each node takes does.
Re-calculate the test statistic on the
permuted networks and compare
it with the test statistic on the
unscrambled network.

Network structure is
controlled for. Keeps
dependencies.
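A minimal Python sketch of the quadratic assignment procedure for a graph correlation. It is an illustration under assumptions, not the nwcorrelate implementation: function names, the seed and the two small example networks are hypothetical, and the p-value is simply the share of permutations with a correlation at least as large as the observed one.

```python
import math
import random

def graph_correlation(net1, net2):
    """Pearson correlation over the off-diagonal cells of two networks."""
    n = len(net1)
    x = [net1[i][j] for i in range(n) for j in range(n) if i != j]
    y = [net2[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def permute(net, order):
    """Re-label the nodes: rows and columns are reordered together,
    so the structure of the network stays the same."""
    n = len(net)
    return [[net[order[i]][order[j]] for j in range(n)] for i in range(n)]

def qap_correlation(net1, net2, permutations=100, seed=1):
    rng = random.Random(seed)
    observed = graph_correlation(net1, net2)
    order = list(range(len(net1)))
    simulated = []
    for _ in range(permutations):
        rng.shuffle(order)
        simulated.append(graph_correlation(permute(net1, order), net2))
    p_value = sum(s >= observed for s in simulated) / permutations
    return observed, simulated, p_value

# Hypothetical 4-node directed networks.
net1 = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
net2 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
observed, simulated, p_value = qap_correlation(net1, net2)
print(round(observed, 3), p_value)
```

Because only the node labels are shuffled, every permuted network has the same number of ties and the same structure as the original, which is how the procedure keeps the dependencies intact.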
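PERMUTATION TEST

[Figure: a network with nodes 1 to 4 and the same network after a permutation that relabels node 1 as 3, node 2 as 1, node 3 as 4 and node 4 as 2]

Original           Permuted

-  1  0  1         -  1  1  1
1  -  1  1         0  -  0  0
0  0  -  0         1  1  -  0
0  0  0  -         0  0  0  -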
GRAPH CORRELATION

Observed networks: corr_obs = 0.372
After one permutation: corr = 0.034
After another permutation: corr = 0.101
GRAPH CORRELATION

[Figure: density of Corr(flobusiness, flomarriage), based on 100 QAP permutations of network flobusiness (x-axis: correlation, -.2 to .4; y-axis: density, 0 to 4)]

nwcorrelate flobusiness flomarriage, permutations(100)


Question: Are co-offending ties between gang members
from the same ethnicity more likely than ties between gang
members from different ethnicities?

Caribbean East Africa UK West Africa


ARE THERE MORE TIES BETWEEN MEMBERS WITH THE SAME ETHNICITY?

EXPAND VARIABLE TO NETWORK

Variable -> Expanded network

y_{ij} = (var_i == var_j)

mode = same: y_{ij} = 1 if i and j are the same, y_{ij} = 0 if i and j are different

corr(gang, Birthplace) = 0.124

corr(perm(gang), Birthplace)

[Figure: density of Corr(gang, same_Birthplace), based on 100 QAP permutations of network gang (x-axis: correlation, -.1 to .15; y-axis: density, 0 to 10)]

nwcorrelate gang, attribute(Birthplace) permutations(100)


nwcorrelate
nwpermute
Load the klas12b data from the nwcommands-Server using
webnwuse
-sex pers
Use nwtabulate to show the support ties by sex in the
klas12b_wave1 network.
Calculate expected cell values (hint: option expected) and a chi-square
statistic (hint: option chi)
Load the klas12b data from the nwcommands-Server using
webnwuse
-sex pers
Use nwcorrelate to show if support ties in the network
klas12b_wave1 are more likely between individuals with the same
sex.
Use the option permutations(100) to obtain a p-value.
Session XX
Logistic regression
Dyad-level regression
LINEAR REGRESSION

y = \alpha + \beta x

The OLS (ordinary least squares) estimator for \alpha, \beta
minimizes the error sum of squares.

Predicted values can range from -\infty to +\infty.
DICHOTOMOUS OUTCOMES

There are many important research topics for which the
dependent variable is "limited".
In the extreme case a dependent variable can only be 1 or 0.
Examples?
ADJACENCY MATRIX

[Figure: directed network with seven nodes, 1 to 7]

Y =
1    0  1  0  0  0  0  0
2    0  0  0  1  0  1  1
3    1  0  0  0  0  0  0
4    1  1  0  0  1  1  0
5    0  0  0  1  0  0  0
6    1  0  0  0  0  0  0
7    0  0  0  0  0  0  0
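ADJACENCY MATRIX
We write y_{ij} = 1 if actors i and j are related to each other (i.e.,
if the pair (i, j) is in the set of ties), and y_{ij} = 0 otherwise.
The matrix Y is called the adjacency matrix and is a convenient
representation of a network.

Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix}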
LOGISTIC REGRESSION
Dependent variable = binary (1 or 0)
Logistic regression: Pr(y_i = 1) = p_i

logit(p_i) = \ln( p_i / (1 - p_i) ) = \alpha + \sum_k \beta_k x_{ki}

Logistic function: logistic(x) = 1 / (1 + e^{-x})

Odds-ratio: e^{\beta}
By which factor do the odds increase?

[Figure: plots of the exponential function and the logistic function]
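A short Python sketch of the relationship between the logit, the logistic function and the odds ratio. The coefficient values alpha and beta are hypothetical; the point is that a one-unit increase in x multiplies the odds by exp(beta).

```python
import math

def logistic(x):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability; the inverse of logistic()."""
    return math.log(p / (1.0 - p))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
alpha, beta = -2.0, 0.7
p0 = logistic(alpha)          # probability when x = 0
p1 = logistic(alpha + beta)   # probability when x = 1
odds_ratio = odds(p1) / odds(p0)
print(p0, p1, odds_ratio)
```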
QAP REGRESSION

We can use the QAP principle to run

1. Dyad-level logistic regression on dyadic dataset
2. Permute network many times
3. Run dyad-level logistic regression on permuted networks
4. Compare regression estimate from unscrambled network with regression
   estimates obtained with permuted networks to derive standard errors.

For example: Grund, T. and Densley, J. (2012) Ethnic Heterogeneity in the
Activity and Structure of a Black Street Gang. European Journal of
Criminology, 9(3), 388-406.
DYAD-LEVEL REGRESSION

Sender and
receiver ID
DYAD-LEVEL REGRESSION

Dyad values
DYAD-LEVEL REGRESSION

Independent
variables
nwqap
Load the klas12b data from the nwcommands-Server using
webnwuse
Use nwqap to show if ties in the klas12b_wave1 network are
more likely between individuals with the same sex.
-sex pers
Now, consider not only similarity in sex, but also absolute
difference in age. Are ties between individuals who are similar
when it comes to age more likely?
Encore
EXAMPLE: OUTDEGREE
Simply the number of outgoing ties for each node.
How many friends does an individual nominate?

outdegree_i = \sum_{j=1}^{n} y_{ij}

[Figure: friendship network among John, Peter, Tim and Susan]
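As a language-neutral illustration of the calculation (not the Stata/Mata program shown in the screenshots), outdegree is simply the row sum of the adjacency matrix. The friendship nominations below are hypothetical.

```python
def outdegree(adj):
    """Number of outgoing ties per node: the row sums of the adjacency matrix."""
    return [sum(row) for row in adj]

names = ["John", "Peter", "Tim", "Susan"]
# Hypothetical nominations: adj[i][j] = 1 if person i nominates person j.
adj = [
    [0, 1, 1, 1],  # John nominates Peter, Tim and Susan
    [1, 0, 0, 0],  # Peter nominates John
    [0, 0, 0, 1],  # Tim nominates Susan
    [0, 0, 0, 0],  # Susan nominates nobody
]
degrees = dict(zip(names, outdegree(adj)))
print(degrees)
```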
most nwcommands

nwname, nwset, nwtomata, _nwsyntax, nwunab

quasi-objects (Mata matrix + globals)


THREE STEPS IN
PROGRAMS

1. Parse network

2. Obtain adjacency matrix and meta-information

3. Perform some calculation with the adjacency matrix


EXAMPLE: OUTDEGREE
EXAMPLE: OUTDEGREE

Parse networks.
Populate local
netname.
EXAMPLE: OUTDEGREE

Obtain
adjacency matrix
net
EXAMPLE: OUTDEGREE

Functionality
_NWSYNTAX
Parse networks (and obtain some meta-information)
_NWSYNTAX
_NWSYNTAX

Unabbreviate
network list
_NWSYNTAX

Obtain network
meta-information
_NWSYNTAX

Populate locals
with meta-
information and
parsed network
list
NWNAME
Obtain meta-information
NWNAME
NWNAME
NWNAME


Get ID of a
network
NWNAME

Get meta-
information


NWTOMATA
Obtain adjacency matrix
NWTOMATA
NWTOMATA

Parse network
and populate
local id
NWTOMATA

Make copy of
adjacency matrix
_nwsyntax
nwname
nwtomata
mata
INDEPENDENCE
Independence means that whatever you observe as outcome
variable in one case does not depend on the value the
outcome variable has in other cases.

Statistical independence is one of the most crucial and


common assumptions in statistics (and practically you make it
all the time).

P(A \cap B) = P(A) \cdot P(B)

Somehow defeats the purpose of sociology (in my view)


LOOKING BACK
How do the different approaches we looked at

a) allow us to test hypotheses on network data?

Typically, the actual hypotheses are not so much
about structure at all, but about who is central, who
links up with whom and so on.

b) address the issue of interdependent data?

Interdependence is seen as a nuisance, something that
needs to be taken into account, but not as something of
focal interest.
CONDITIONAL
UNIFORM MODELS
(e.g. tie independence given the dyad census)

a) Hypotheses are tested by working with a network distribution
that enforces a selection of structural constraints.
b) Interdependence is taken into account as far as the structural
constraints already imply it.

Here, structure is not treated as


endogenous (part of the dependent
variable, to be explained), but as
exogenous (here even: enforced). It is not
always clear what to enforce. Approach is
not very flexible.
PERMUTATION-BASED
MODELLING
a) Hypotheses are tested (like in conditional uniform tests) by
working with a network distribution that enforces structural
constraints, here even the total structure (actual structure
remains the same).
b) Interdependence is taken into account by completely fixing
the network structure.

Also here, structure is treated as


exogenous. This means it is difficult to study
how dependencies due to structure and
dependencies due to explanatory variables
interact.
P2 MODEL
a) Hypotheses are tested based on parameter estimation/model
fitting. The distribution of networks is not fixed (as in previous
approaches) but modelled by a parametric family of models.
b) Within-dyad dependence is modelled through correlation,
between-dyad dependence through random effects (common
sender and/or receiver effects)

Here, structure is treated as endogenous


but in a limited sense. Triad level
dependencies that are not due to common
sender or common receiver effects cannot
be expressed: transitivity, social balance,
preferential attachment. ERGMs allow for
this!
What is the probability to observe our network?
In the modelling of
networks, we want to
know how likely is it that
we draw the observed
network from a random
distribution of networks.
For that we think of the
space of all possible
networks!
The space of possible
networks/worlds is
huge! If you would
count all possible
networks for the
people in this room
you would be busy
until the end of time.
Build a model that is
good in producing the
network features that
we observed.

Instead of thinking about


the probability of a
particular network, we
think of the probability of
networks having certain
features.
ERGM
Y = a randomly selected network from the pool of all potential networks
y = the network observed here
\theta = parameters, to be estimated
s(y) = prevalence of micro structures in the network, e.g. number of ties,
number of reciprocal ties, number of triads

P(Y = y | \theta) = e^{\theta^T s(y)} / c

Left-hand side: probability to draw our observed network y from all potential networks.
Numerator: a score given to our network y using some parameters \theta and the network features s of y.
Denominator: a score given to all other networks we could have observed.
ERGM: POSSIBLE MICROSTRUCTURES
s(y) = prevalence of micro structures in the network, e.g. number of ties,
number of reciprocal ties, number of triads

tie count statistic      s_k(y) = \sum y_{ij}

reciprocity statistic    s_k(y) = \sum y_{ij} y_{ji}

transitive triplets      s_k(y) = \sum y_{ij} y_{jk} y_{ik}
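The three statistics can be counted from an adjacency matrix with a short Python sketch. The summation ranges are an assumption, since the slide does not state them: ties are summed over ordered pairs, reciprocity over unordered pairs (i < j), and transitive triplets over ordered triples of distinct nodes. The function name and the example network are hypothetical.

```python
def ergm_statistics(adj):
    """Tie count, reciprocity and transitive-triplet statistics
    of a directed network given as an adjacency matrix."""
    n = len(adj)
    nodes = range(n)
    ties = sum(adj[i][j] for i in nodes for j in nodes if i != j)
    reciprocity = sum(adj[i][j] * adj[j][i]
                      for i in nodes for j in nodes if i < j)
    triplets = sum(adj[i][j] * adj[j][k] * adj[i][k]
                   for i in nodes for j in nodes for k in nodes
                   if len({i, j, k}) == 3)
    return {"ties": ties, "reciprocity": reciprocity, "triplets": triplets}

# Hypothetical network: 0 <-> 1, 0 -> 2, 1 -> 2.
adj = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
stats = ergm_statistics(adj)
print(stats)
```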


ERGM: LINK FEATURES AND PROBABILITIES
Think of the space of all possible isomorph networks (networks
with unique features).

P(Y = y | \theta) = e^{\theta^T s(y)} / c

If the parameter vector \theta is set to [0,0,0], then all unique classes of isomorph
networks get the same score. All possible combinations of network features
have the same probability.

If the parameter vector \theta is not [0,0,0], then some networks get a higher
score than others, which means we assume that they are more likely to be
drawn.

[Figures: probability distributions over all possible networks]

ERGM
By changing the parameter vector \theta we can alter the link between
all possible networks and the chances of these networks to be
drawn in a stochastic process (depending on the network features
the possible networks have).

In the estimation we try to find a vector \theta which changes the
probability distribution for all potential networks to be drawn in
such a way that the one network y that we actually did draw is the
most likely one.

Find \theta so that: \max_{\theta} P(Y = y | \theta)



ERGM: REFORMULATION

P(Y = y | \theta) = e^{\theta^T s(y)} / c

Although we only look at classes of networks defined by their
features, there are still too many of them to calculate this.

P(Y = y | \theta) \propto e^{\theta^T s(y)}

\propto means "proportional to". The actual proportionality constant
cannot be calculated.

logit( P(Y_{ij} = 1 | n actors, Y_{ij}^c) ) = \sum_{k=1}^{K} \theta_k \delta s_k(y)

Y_{ij}^c = all dyads other than Y_{ij}
\delta s_k(y) = amount by which the feature s_k(y) changes when y_{ij} is
toggled from 0 to 1.

The left-hand side is the probability that there is a tie from i to j, given
n actors AND the rest of the network, excluding the dyad in question!
ERGM: INTERPRETATION
ERGMs ultimately give you an estimate for various
parameters \theta_k, which mean:

If a potential tie Y_{ij} = 1 (between i and j) would change the network
statistic s_k by one unit, this changes the log-odds for the tie Y_{ij} to
actually exist by \theta_k.

When \theta_k is significant, it means that there is a tendency in the
network for the underlying micro-structure defined by s_k to be important
in generating the observed network (more precisely, in generating networks
with similar features to the one that we observe).
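EXAMPLE
Consider an ERGM for an undirected network with parameters for these
three statistics:

1) number of edges      s_{edges}(y) = \sum y_{ij}

2) number of 2-stars    s_{2stars}(y) = \sum y_{ij} y_{ik}

3) number of triangles  s_{triangles}(y) = \sum y_{ij} y_{jk} y_{ik}

Then the 3-parameter ERG distribution function is:

P(Y = y | \theta) \propto e^{ \theta_{edges} s_{edges}(y) + \theta_{2stars} s_{2stars}(y) + \theta_{triangles} s_{triangles}(y) }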
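EXAMPLE

P(Y = y | \theta) \propto e^{ \theta_{edges} s_{edges}(y) + \theta_{2stars} s_{2stars}(y) + \theta_{triangles} s_{triangles}(y) }

[Figures: two probability distributions over all possible networks]

The ERG distribution function (the combination of the s_k and the \theta_k)
defines what the hypothetically assumed underlying distribution
of all possible networks to be drawn looks like. It says which
networks are more likely to be randomly drawn.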
EXAMPLE
and consider the following two 4-node networks and their statistics:

                 y    y'

s_{edges}        4    3
s_{2stars}       5    3
s_{triangles}    1    0
EXAMPLE
Although we do not know the proportionality constant, we can calculate
the ratio between the two probabilities!

P(y) \propto e^{ 4\theta_{edges} + 5\theta_{2stars} + 1\theta_{triangles} }

P(y') \propto e^{ 3\theta_{edges} + 3\theta_{2stars} }

How much more likely is y in contrast to y'?

P(y) / P(y') = e^{ \theta_{edges} + 2\theta_{2stars} + \theta_{triangles} }
EXAMPLE
So, suppose in a larger network the estimation gave the
following parameters:

low density: \theta_{edges} = -1.5
positive degree variance: \theta_{2stars} = 0.1
redundant ties are avoided: \theta_{triangles} = -0.4

P(y) / P(y') = e^{ \theta_{edges} + 2\theta_{2stars} + \theta_{triangles} }
             = e^{ -1.5 + 2*0.1 - 0.4 } = e^{-1.7} \approx 1/5.5

i.e. the middle tie is about 5.5 times
as likely NOT to exist as to exist (given
the rest of the network)
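The ratio can be checked with a few lines of Python. The statistics are the ones given for the two 4-node networks; the signs of the parameters (-1.5, 0.1, -0.4) are an assumption inferred from the verbal labels (low density, redundant ties avoided) and from the stated result of about 1/5.5. Variable names are illustrative.

```python
import math

# Assumed parameter values (signs inferred, see lead-in).
theta = {"edges": -1.5, "twostars": 0.1, "triangles": -0.4}

# Statistics of the two 4-node networks from the example.
stats_y = {"edges": 4, "twostars": 5, "triangles": 1}
stats_y_prime = {"edges": 3, "twostars": 3, "triangles": 0}

def score(stats, theta):
    """Weighted sum of network statistics (the exponent in the ERGM)."""
    return sum(theta[k] * stats[k] for k in theta)

# The unknown proportionality constant cancels in the ratio.
ratio = math.exp(score(stats_y, theta) - score(stats_y_prime, theta))
print(ratio, 1 / ratio)
```

Only the difference in statistics between the two networks enters the ratio, which is why the change statistics are enough to interpret the parameters.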
ERGM FEATURES
Think of ERG models as a probability distribution on a (huge)
space of all possible networks.

The observed network is modelled as if it has been drawn from


this distribution.

The model parameters are


Attached to network statistics s
These statistics in general correspond to subgraph counts
(local patterns, motifs)
The parameters describe the relative prevalence of the
corresponding subgraph in generating the total graph.

The parameters are estimated in such a way that each change


of a tie (during the process of generating a network) is
considered for the next ties that could change. Structure is
endogenous => dyadic dependence model
ERGM FEATURES
High flexibility due to the many possibilities of choosing
statistic and controlling effects for each other.
Estimation, model specification and interpretation can be
difficult!
nwergm
