
Social Network Analysis

Historical View
• Social network analysis has emerged as a key technique in
modern sociology.

• It has also gained a significant following
in anthropology, biology, demography, communication
studies, economics, geography, history, information
science, organizational studies, political science, social
psychology, development studies, sociolinguistics, and computer
science.
Overview
• Networks can consist of anything from families, project
teams, classrooms, sports teams, legislatures, nation-states, disease
vectors, membership on networking websites like Twitter or
Facebook, or even the Internet.

• Networks can consist of direct linkages between nodes or indirect
linkages based upon shared attributes, shared attendance at events, or
common affiliations.

• Network features can be at the level of
individual nodes, dyads, triads, ties and/or edges, or the entire
network.
Continue….
• For example, node-level features can include network phenomena such
as betweenness and centrality, or individual attributes such as age, sex, or
income.

• SNA software generates these features from raw network data formatted
in an edge list, adjacency list, or adjacency matrix (also called
sociomatrix), often combined with (individual/node-level) attribute data.

• Though the majority of network analysis software uses a plain text
ASCII data format, some software packages contain the capability to
utilize relational databases to import and/or store network features.
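The three raw data formats named above can be illustrated with a minimal sketch. The edge list, the actor names, and the helper names `to_adjacency_list` and `to_adjacency_matrix` are hypothetical and only serve the example; real SNA packages have their own import routines.

```python
from collections import defaultdict

# Hypothetical raw data: an undirected edge list, one (source, target) pair per tie.
edge_list = [("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")]

def to_adjacency_list(edges):
    """Map each node to the sorted list of its neighbours."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: sorted(adj) for node, adj in sorted(neighbours.items())}

def to_adjacency_matrix(edges):
    """Return (ordered node labels, 0/1 sociomatrix as a list of rows)."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        # Undirected tie: mark both cells so the matrix is symmetric.
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return nodes, matrix

adjacency_list = to_adjacency_list(edge_list)
nodes, sociomatrix = to_adjacency_matrix(edge_list)
```

All three structures carry the same ties; the matrix form is convenient for algebraic measures, while the list forms are more compact for sparse networks.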
Definition

• Social network analysis (SNA) is the process of
investigating social structures through the use of
networks and graph theory.

• These networks are often visualized through
sociograms in which nodes are represented as points
and ties are represented as lines.
• It characterizes networked structures in terms of nodes (individual actors, people,
or things within the network) and the ties/edges/links (relationships or
interactions) that connect them.
• Examples of social structures commonly visualized through social network
analysis include
• social media networks,
• meme spread,
• friendship and acquaintance networks,
• collaboration graphs,
• kinship,
• disease transmission.
Feature
• Visual representations of social networks are important for understanding
network data and conveying the results of the analysis.

• Visualization often also facilitates qualitative interpretation of network
data.

• With respect to visualization, network analysis tools are used to change
the layout, colors, size and other properties of the network
representation.
Continue….
• Some SNA software can perform predictive analysis.

• This includes using network phenomena such as a tie to predict
individual-level outcomes (often called peer influence or contagion
modeling), using individual-level phenomena to predict network
outcomes such as the formation of a tie/edge (often called homophily
models) or a particular type of triad, or using network phenomena to
predict other network phenomena, such as using a triad formation at time
0 to predict tie formation at time 1.
Software
Network analysis software generally consists of either packages based
on graphical user interfaces (GUIs) or packages built for
scripting/programming languages.
• NetMiner
• R
• Gephi
• NodeXL
• Weka
• MATLAB
• RapidMiner
• etc.
Graph Theory
• In mathematics, graph theory is the study of graphs, which are mathematical
structures used to model pairwise relations between objects.

• A graph in this context is made up of vertices, nodes, or points which are
connected by edges, arcs, or lines.

• A graph may be undirected, meaning that there is no distinction between the two
vertices associated with each edge, or its edges may be directed from one vertex
to another.

• Graphs are one of the prime objects of study in discrete mathematics, as well as
in data structures, algorithms, etc.

• Graph theory is a tool that helps to solve a variety of problems in discrete mathematics.


Contd…
• Graph theory is also widely used in sociology as a way, for example,
to measure actors' prestige or to explore rumor spreading, notably through
the use of social network analysis software.

• Under the umbrella of social networks are many different types of graphs.

• Acquaintanceship and friendship graphs describe whether people know each
other. Influence graphs model whether certain people can influence the
behavior of others.

• Finally, collaboration graphs model whether two people work together in a
particular way, such as acting in a movie together.
Graph Formulation
• In the most common sense of the term, a graph is an ordered pair G =
(V, E) comprising a set V of vertices or nodes or points together with a
set E of edges or arcs or lines, which are 2-element subsets of V (i.e. an
edge is associated with two vertices, and that association takes the form of
the unordered pair comprising those two vertices).
• To avoid ambiguity, this type of graph may be described precisely
as undirected and simple.
• Other senses of graph stem from different conceptions of the edge set. In
one more generalized notion, E is a set together with a relation
of incidence that associates with each edge two vertices.
• In another generalized notion, E is a multiset of unordered pairs of (not
necessarily distinct) vertices. Many authors call this type of object a
multigraph or pseudograph.
Continue…
• All of these variants and others are described more fully below.
• The vertices belonging to an edge are called the ends or end vertices of the
edge. A vertex may exist in a graph and not belong to an edge.
• V and E are usually taken to be finite, and many of the well-known results are
not true (or are rather different) for infinite graphs because many of the
arguments fail in the infinite case. The order of a graph is |V|, its number of
vertices. The size of a graph is |E|, its number of edges.
• The degree or valency of a vertex is the number of edges that connect to it,
where an edge that connects a vertex to itself (a loop) is counted twice.
• For an edge {x, y}, graph theorists usually use the somewhat shorter
notation xy.
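The definitions of order, size, and degree can be checked on a small example. This is a minimal sketch; the vertex set `V`, the edge list `E`, and the helper name `graph_summary` are illustrative assumptions, not part of the slides.

```python
def graph_summary(vertices, edges):
    """Return (order, size, degree per vertex) for an undirected graph.

    `edges` is a list of (x, y) pairs; a pair (x, x) is a loop.
    """
    degree = {v: 0 for v in vertices}
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1  # for a loop x == y, so that vertex gains 2 in total
    return len(vertices), len(edges), degree

# Hypothetical graph: vertex 5 belongs to no edge, and (3, 3) is a loop.
V = {1, 2, 3, 4, 5}
E = [(1, 2), (2, 3), (3, 3), (3, 4)]
order, size, degree = graph_summary(V, E)
```

Because every edge contributes two endpoint counts, the degrees always sum to twice the size of the graph.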
Application
• Graphs can be used to model many types of relations and
processes in physical, biological, social and information
systems.
• Many practical problems can be represented by graphs.
• Emphasizing their application to real-world systems, the
term network is sometimes defined to mean a graph in which
attributes (e.g. names) are associated with the nodes and/or
edges.
Computer Science Perspective
• In computer science, graphs are used to represent networks of communication, data
organization, computational devices, the flow of computation, etc.
• For instance, the link structure of a website can be represented by a directed graph, in
which the vertices represent web pages and directed edges represent links from one page
to another.
• A similar approach can be taken to problems in social media, travel, biology, computer
chip design, mapping the progression of neuro-degenerative diseases, and many other
fields.
• The development of algorithms to handle graphs is therefore of major interest in
computer science.
• The transformation of graphs is often formalized and represented by graph rewrite
systems. Complementary to graph transformation systems focusing on rule-based
in-memory manipulation of graphs are graph databases geared towards
transaction-safe, persistent storing and querying of graph-structured data.
Basics of Graph Theory
• A graph is a structure defined as G = (V, E)
– where V is a set of vertices, V = {v1, v2, …, vn}
– and E is a set of edges, E = {e1, e2, …, em}

Edge (Vi, Vj)   Directed Graph                      Undirected Graph
a               (1,2); (2,1) is a different edge    (1,2); (2,1) is also ok
b               (2,3)                               (2,3)
c               (4,3)                               (4,3)
Undirected Graph
• An undirected graph is a graph, i.e., a set of objects (called vertices or
nodes) that are connected together, where all the edges are bidirectional.
• An undirected graph is sometimes called an undirected network.
Directed Graph
A directed graph (or digraph) is a graph, that is, a set of vertices
connected by edges, where the edges have a direction associated
with them.
Basics of GT
• Adjacent Vertex
– Two vertices are adjacent if they are joined by the same edge
• Adjacent Edge
– Two edges are adjacent if they are incident on the same vertex (they must share a common vertex)
– Edges a and b are adjacent because vertex 1 is common
– Edges a and c are adjacent because vertex 2 is common
– Edges a and e are not adjacent because no vertex is common
• Self Loop
– An edge whose two end vertices are the same vertex
• Parallel Edge
– When more than one edge is associated with a given pair of vertices
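These three notions are easy to test mechanically. The sketch below treats edges as undirected pairs; the edge list and the helper names `edges_adjacent`, `self_loops`, and `parallel_edges` are hypothetical and do not correspond to the figure on the slide.

```python
from collections import Counter

def edges_adjacent(e1, e2):
    """Two edges are adjacent when they share at least one end vertex."""
    return bool(set(e1) & set(e2))

def self_loops(edges):
    """Edges whose two end vertices are the same vertex."""
    return [e for e in edges if e[0] == e[1]]

def parallel_edges(edges):
    """Vertex pairs joined by more than one edge (direction ignored)."""
    counts = Counter(tuple(sorted(e)) for e in edges)
    return sorted(pair for pair, n in counts.items() if n > 1)

# Hypothetical multigraph: (1, 2) and (2, 1) are parallel, (5, 5) is a self loop.
edges = [(1, 2), (2, 3), (4, 3), (2, 1), (5, 5)]
```

Sorting each pair before counting is what makes (1, 2) and (2, 1) count as the same undirected connection.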
Basics of GT
• Null Graph
– A graph whose vertex set is non-empty but whose edge set is empty

• Trivial Graph
– A graph whose vertex set contains only one vertex and whose edge set is empty

• Complete Graph
– A graph is said to be complete when there is an edge between every pair
of vertices. It is denoted by Kn, where |V| = n.
– Total edges: n(n-1)/2; e.g., for n = 4 this gives (4 × 3)/2 = 6.

• Finite Graph
– A graph with a finite number of vertices as well as a finite number of edges
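The edge count of a complete graph can be confirmed by enumerating every unordered pair of vertices. This is a small sketch; the helper name `complete_graph_edges` and the vertex labels 1..n are assumptions made for the example.

```python
from itertools import combinations

def complete_graph_edges(n):
    """List the edges of K_n on vertices 1..n as unordered pairs."""
    return list(combinations(range(1, n + 1), 2))

edges_k4 = complete_graph_edges(4)
# K_4 has one edge for each of the 4 * 3 / 2 unordered vertex pairs.
```

The same enumeration works for any n, and its length always equals n(n-1)/2.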
Graph Categories
Directed Acyclic Graph
A directed graph with no directed path of one or more edges that starts
and ends at the same vertex; equivalently, a finite directed graph with
no directed cycles.
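One way to test whether a directed graph is acyclic is Kahn's algorithm, which repeatedly removes vertices that have no incoming edges; the graph is a DAG exactly when every vertex can be removed. The slides do not name an algorithm, so this choice and the helper name `is_dag` are assumptions.

```python
from collections import deque

def is_dag(vertices, edges):
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    successors = {v: [] for v in vertices}
    in_degree = {v: 0 for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1
    # Start from the vertices that have no incoming edges.
    queue = deque(v for v in successors if in_degree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    # Vertices on a cycle never reach in-degree 0, so they are never removed.
    return removed == len(successors)
```

For example, `is_dag([1, 2, 3, 4], [(1, 2), (2, 3), (4, 3)])` holds, while adding a back edge such as (3, 1) creates a cycle.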
Weighted Graph
• A weighted graph is a graph in which each edge is given a numerical weight.
• The nodes might be neurons, individuals, groups, organizations, airports, or even
countries, whereas ties can take the form of friendship, communication,
collaboration, alliance, flow, or trade, to name a few.
• A weighted graph is therefore a special type of labeled graph in which the labels
are numbers (which are usually taken to be positive).
• In a number of real-world networks, not all ties in a network have the same
capacity.
• In fact, ties are often associated with weights that differentiate them in terms of
their strength, intensity, or capacity.
• A number of social network analysis software packages can analyze weighted
networks.
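As a small illustration of how weights change node-level features, the sketch below sums the weights of the ties attached to each node, a quantity often called node strength. The weighted edge list and the helper name `node_strength` are hypothetical.

```python
def node_strength(weighted_edges):
    """Sum the weights of all ties incident to each node (undirected)."""
    strength = {}
    for u, v, weight in weighted_edges:
        strength[u] = strength.get(u, 0) + weight
        strength[v] = strength.get(v, 0) + weight
    return strength

# Hypothetical ties: (node, node, weight), e.g. number of messages exchanged.
ties = [("A", "B", 3), ("B", "C", 1), ("A", "C", 2)]
strength = node_strength(ties)
```

In this example every node has two ties, so plain degree cannot tell the nodes apart, whereas strength can.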
Graphical View of Weighted Graph
Connected Graph/Graph Connectivity
• A graph in which there is a path between each and every pair of
vertices is called a Connected Graph
Types of Connected Graph
– Strongly Connected Components
– Weakly Connected Components
– Recursively Connected Components
– Bi-Connected Components
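Strongly and Weakly Connected Component
Weakly Connected Components

• A directed graph is weakly connected if every pair of vertices Vi and Vj
is joined by a path once edge directions are ignored. (The stricter
condition that for each pair there is either a path P(Vi,Vj) or a path
P(Vj,Vi) is usually called unilateral connectivity.)
Strongly Connected Components

• A directed graph is strongly connected if there is a directed path from
each vertex to every other vertex.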
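Both kinds of connectivity can be checked with breadth-first search. In this sketch, weak connectivity is taken in the usual sense of ignoring edge directions, and strong connectivity is tested by searching forwards and backwards from one start vertex. The helper names are assumptions, and the sketch tests a whole graph rather than extracting its components.

```python
from collections import deque

def reachable(start, successors):
    """Set of vertices reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in successors[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(vertices, edges):
    """True if every vertex reaches, and is reached by, the start vertex."""
    forward = {v: [] for v in vertices}
    backward = {v: [] for v in vertices}
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)
    start = next(iter(forward))
    everything = set(forward)
    return (reachable(start, forward) == everything
            and reachable(start, backward) == everything)

def is_weakly_connected(vertices, edges):
    """True if the graph is connected once edge directions are ignored."""
    undirected = {v: [] for v in vertices}
    for u, v in edges:
        undirected[u].append(v)
        undirected[v].append(u)
    start = next(iter(undirected))
    return reachable(start, undirected) == set(undirected)
```

A directed cycle 1 → 2 → 3 → 1 passes both tests, while the edges (1, 2) and (3, 2) give a graph that is weakly but not strongly connected.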


Representing Graphs
Adjacency Matrix
Social Network
• A social network is a social structure made up of a set of social actors
(such as individuals or organizations), sets of dyadic ties, and other social
interactions between actors.
• The social network perspective provides a set of methods for analyzing
the structure of whole social entities as well as a variety of theories
explaining the patterns observed in these structures.
• The study of these structures uses social network analysis to identify
local and global patterns, locate influential entities, and examine network
dynamics.
• The study of social networks is an inherently
interdisciplinary academic field which emerged from social psychology,
sociology, statistics, and graph theory.
Social Network Services (SNS)
• A social networking service (SNS or social media) is an online platform that
people use to build social networks or social relations with other people who share
similar personal or career interests, activities, backgrounds or real-life connections.
• The variety of stand-alone and built-in social networking services currently
available online introduces challenges of definition; however, some common
features exist.
• SNSs are Internet-based applications such as
– Facebook
– LinkedIn
– Twitter
– Sina Weibo
– Instagram
Structure based SNA
SNA Measures
• SNA measures are a vital tool for understanding the behavior of networks and
graphs.

• These algorithms use graph theory to calculate the importance of any given node
in a network.
• In graph theory and network analysis, indicators of centrality identify the most
important vertices within a graph. Applications include identifying the
– influential people (key nodes or actors) in a social network,
– key infrastructure nodes in the Internet
– urban networks
– super-spreaders of disease
– etc
Centrality Measures
• To identify which actors are important in social networks:
– Degree
– Closeness
– Betweenness
– Eigenvector
– PageRank (PR)
– etc.
Centrality Measures
Learning Objectives
• Differentiate between basic centrality measures
• Calculate the Degree centrality by hand.
What do the measures tell me
• Degree: Exposure to the network and opportunity to directly influence others
(central node, important node, key node).

• Betweenness: A node with high betweenness has the ability to broker
resources, information, or knowledge from one side of the network to another.

• Closeness: Short distance to all other nodes in the network. Important for
diffusion processes (how information diffuses); rumors spread
more rapidly through people high in closeness centrality.

• Eigenvector: Connected to influential nodes of high degree (not what you
know but who you know).
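The "who you know" idea behind eigenvector centrality can be sketched with power iteration: each node's score is repeatedly replaced by its own score plus the sum of its neighbours' scores, then rescaled. Adding the node's own score (a shift by the identity matrix) is an assumption made here to keep the iteration stable on any connected graph; the network and the helper name `eigenvector_centrality` are hypothetical.

```python
def eigenvector_centrality(neighbours, iterations=100):
    """Approximate eigenvector centrality, scaled so the top node scores 1."""
    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # A node's new score is its own score plus its neighbours' scores.
        new_score = {
            node: score[node] + sum(score[nb] for nb in neighbours[node])
            for node in neighbours
        }
        largest = max(new_score.values())
        score = {node: value / largest for node, value in new_score.items()}
    return score

# Hypothetical network: a triangle A-B-C with one extra node D attached to C.
network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
scores = eigenvector_centrality(network)
```

Here C ranks first, A and B tie, and D ranks last: D has a single tie, but that tie is to the best-connected node, so its score is still well above zero.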
Graphical View
Degree
• Degree:
– Degree centrality assigns an importance score based purely on the number of links held by each
node OR the number of edges connected to a node
• What it tells us:
– How many direct, ‘one hop’ connections each node has to other nodes within the network.
• When to use it:
– For finding very connected individuals, popular individuals, individuals who are likely to hold
most information or individuals who can quickly connect with the wider network.
• If the network is directed, we have two versions of the measure:
– in-degree is the number of in-coming links, or the number of predecessor nodes;
– out-degree is the number of out-going links, or the number of successor nodes.
Typically, we are interested in in-degree, since in-links are given by other nodes in the network, while
out-links are determined by the node itself.
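For a directed network, in-degree and out-degree can be counted directly from the edge list. This is a minimal sketch; the "who links to whom" data and the helper name `degree_centrality` are hypothetical, and dividing by n - 1 is the common normalisation rather than something stated on the slide.

```python
def degree_centrality(nodes, directed_edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    in_degree = {n: 0 for n in nodes}
    out_degree = {n: 0 for n in nodes}
    for source, target in directed_edges:
        out_degree[source] += 1   # link leaves `source`
        in_degree[target] += 1    # link arrives at `target`
    return in_degree, out_degree

# Hypothetical network: an edge (x, y) means x links to y.
nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")]
in_degree, out_degree = degree_centrality(nodes, links)

# Normalised in-degree: share of the other n - 1 nodes that link to each node.
normalised_in = {n: in_degree[n] / (len(nodes) - 1) for n in nodes}
```

Every node sends exactly one link here, so out-degree cannot separate them, while in-degree singles out B as the node the others point to.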
Graphical View
Mathematical Notation
Calculation Process
Self Practice
Application
Between-ness Centrality
• Definition:
– Betweenness centrality measures the number of times a node lies on the shortest
path between other nodes.
• What it tells us:
– This measure shows which nodes act as ‘bridges’ between nodes in a network. It
does this by identifying all the shortest paths and then counting how many times
each node falls on one.
• When to use it:
– For finding the individuals who influence the flow around a system.
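The counting procedure can be sketched for a small undirected, unweighted network. For every pair of other nodes, the code below adds the fraction of that pair's shortest paths that pass through the node being scored. The network and the helper names `bfs_counts` and `betweenness` are hypothetical, and this brute-force version is meant for hand-sized examples only.

```python
from collections import deque
from itertools import combinations

def bfs_counts(source, neighbours):
    """Distances from `source` and the number of shortest paths to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]  # every shortest path to u extends to v
    return dist, paths

def betweenness(neighbours):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    info = {s: bfs_counts(s, neighbours) for s in neighbours}
    score = {v: 0.0 for v in neighbours}
    for s, t in combinations(neighbours, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # no path between s and t
        dist_t, paths_t = info[t]
        for v in neighbours:
            if v in (s, t) or v not in dist_s:
                continue
            # v lies on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score

# Hypothetical network: A-B, B-C, B-D, C-D, D-E.
network = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
scores = betweenness(network)
```

In this network B and D each lie on the shortest paths of three pairs, while A, C, and E bridge nothing.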
Calculating Between-ness
Possible Pairs
No. of pairs excluding node you are evaluating
Self Practice
Closeness
• Definition:
– This measure scores each node based on its ‘closeness’ to all other
nodes within the network.
• What it tells us:
– This measure calculates the shortest paths between all nodes, then
assigns each node a score based on its sum of shortest paths.
• When to use it:
– For finding the individuals who are best placed to influence the entire
network most quickly.
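The steps above can be sketched with breadth-first search on an unweighted, connected network. The score used here is (n - 1) divided by the sum of shortest-path distances, a common normalisation that is assumed rather than taken from the slides; the network and the helper name `closeness` are hypothetical.

```python
from collections import deque

def closeness(neighbours):
    """Closeness per node: (reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in neighbours:
        # Breadth-first search gives the number of steps to every other node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

# Hypothetical path network: A - B - C.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = closeness(network)
```

B is one step from everyone and gets the maximum score of 1, while A and C each need three steps in total to reach the other two nodes and score 2/3.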
Closeness Centrality
Distance metric: how many steps it takes to move
from one node to another
Calculating Closeness
Self Practice
PageRank
• PageRank (PR) is an algorithm used by Google Search to rank websites
in their search engine results.
• PageRank was named after Larry Page, one of the founders of Google.
• PageRank is a way of measuring the importance of website pages.
According to Google:
– PageRank works by counting the number and quality of links to a page
to determine a rough estimate of how important the website is.
– The underlying assumption is that more important websites are likely
to receive more links from other websites.
PR Graphical View
Description
• PageRank is a link analysis algorithm and it assigns a numerical
weighting to each element of a hyperlinked set of documents, such as the
World Wide Web, with the purpose of "measuring" its relative importance
within the set.
• The algorithm may be applied to any collection of entities with reciprocal
quotations and references.
• The numerical weight that it assigns to any given element E is referred to
as the PageRank of E and denoted by PR(E). Other factors like Author
Rank can contribute to the importance of an entity.
Continue…
• The rank value indicates the importance of a particular page.
• A hyperlink to a page counts as a vote of support.
• The PageRank of a page is defined recursively and depends on the
number and PageRank metric of all pages that link to it (incoming links).
• A page that is linked to by many pages with high PageRank receives a
high rank itself.
• Other link-based ranking algorithms for Web pages include the HITS
algorithm.
Algorithm
• The PageRank algorithm outputs a probability distribution used to
represent the likelihood that a person randomly clicking on links will
arrive at any particular page.
• PageRank can be calculated for collections of documents of any size.
• It is assumed in several research papers that the distribution is evenly
divided among all documents in the collection at the beginning of the
computational process.
• The PageRank computations require several passes, called "iterations",
through the collection to adjust approximate PageRank values to more
closely reflect the theoretical true value.
Continue…
• A probability is expressed as a numeric value between 0 and 1. A 0.5
probability is commonly expressed as a "50% chance" of something
happening.
• Hence, a PageRank of 0.5 means there is a 50% chance that a person
clicking on a random link will be directed to the document with the 0.5
PageRank.
Simplified PR Formula
In the general case, the PageRank value for any page u can be
expressed as:

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

That is, the PageRank value for a page u depends on the PageRank values
for each page v contained in the set B_u (the set containing all pages
linking to page u), divided by the number L(v) of links from page v.
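The simplified rule can be run as a short iteration: start with the rank spread evenly over all pages, then let every page v pass PR(v) / L(v) to each page it links to. This is a sketch under stated assumptions: the three-page link structure and the helper name `simplified_pagerank` are hypothetical, every page is assumed to have at least one outgoing link, and no damping factor is used.

```python
def simplified_pagerank(links, iterations=50):
    """Iterate PR(u) = sum of PR(v) / L(v) over all pages v that link to u.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}  # even start (iteration 0)
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for v, targets in links.items():
            share = rank[v] / len(targets)  # PR(v) divided by L(v)
            for u in targets:
                new_rank[u] += share
        rank = new_rank
    return rank

# Hypothetical web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = simplified_pagerank(web)
```

On this example the values settle near 0.4 for A, 0.2 for B, and 0.4 for C, and they continue to sum to 1, matching the reading of PageRank as a probability distribution.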
Random Surfer Model
PR Contd…
PR Calculation
PR Algorithm (Iteration 0)
PR Algorithm (Iteration 1)
https://www.youtube.com/watch?v=P8Kt6Abq_rM
PR Algorithm (Iteration 1)
Finally PR
PR Practice (Random Surfer Model)
