Historical View
• Social network analysis has emerged as a key technique in
modern sociology.
• SNA software generates network features (numerical measures and visual
representations) from raw network data formatted as an edge list, adjacency
list, or adjacency matrix (also called a sociomatrix), often combined with
individual (node-level) attribute data.
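To make the three input formats concrete, here is a minimal sketch that converts an edge list into an adjacency list and an adjacency matrix (sociomatrix). The node labels and edges are hypothetical; the function names `to_adjacency_list` and `to_adjacency_matrix` are illustrative helpers, not part of any SNA package.

```python
def to_adjacency_list(nodes, edges, directed=False):
    """Map each node to the list of its neighbours."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected: record the tie in both directions
    return adj


def to_adjacency_matrix(nodes, edges, directed=False):
    """Return an n x n 0/1 matrix whose row/column order follows `nodes`."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical 4-node network given as an edge list.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (4, 3)]
print(to_adjacency_list(nodes, edges))    # {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(to_adjacency_matrix(nodes, edges))  # [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
```

The adjacency list stays compact for sparse social networks, while the matrix makes "is there a tie between i and j?" a single lookup.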
• A graph may be undirected, meaning that there is no distinction between the two
vertices associated with each edge, or its edges may be directed from one vertex
to another.
• Graphs are one of the prime objects of study in Discrete Mathematics, as well as
in Data Structures, Algorithms, etc.
• Under the umbrella of social networks are many different types of graphs.
Example edge list, E(Vi, Vj):
a (1,2) (in an undirected graph, (2,1) denotes the same edge)
b (2,3)
c (4,3)
Undirected Graph
• An undirected graph is a graph, i.e., a set of objects (called vertices or
nodes) that are connected together, where all the edges are bidirectional.
• An undirected graph is sometimes called an undirected network.
Directed Graph
A directed graph (or digraph) is a graph consisting of a set of vertices
connected by edges, where each edge has a direction associated
with it.
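The difference between the two kinds of edge can be shown with a small sketch. As an assumed representation (one common choice, not the only one), an undirected edge is stored as a `frozenset`, so the order of its endpoints is ignored, and a directed edge as a tuple, so order matters. The arcs and vertex labels are hypothetical.

```python
def undirected_edge(u, v):
    # frozenset ignores order: {1, 2} and {2, 1} are equal
    return frozenset((u, v))


def directed_edge(u, v):
    # a tuple keeps order: (1, 2) is an arc from 1 to 2
    return (u, v)


print(undirected_edge(1, 2) == undirected_edge(2, 1))  # True: the same edge
print(directed_edge(1, 2) == directed_edge(2, 1))      # False: opposite arcs

# In a digraph, in-degree and out-degree are counted separately.
arcs = [(1, 2), (2, 3), (4, 3)]
out_degree = {v: 0 for v in (1, 2, 3, 4)}
in_degree = {v: 0 for v in (1, 2, 3, 4)}
for u, v in arcs:
    out_degree[u] += 1  # arc leaves u
    in_degree[v] += 1   # arc enters v
print(out_degree)  # {1: 1, 2: 1, 3: 0, 4: 1}
print(in_degree)   # {1: 0, 2: 1, 3: 2, 4: 0}
```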
Basics of GT
• Adjacent Vertex
– Two vertices are adjacent if they are joined by the same edge
• Adjacent Edge
– Two edges are adjacent if they are incident on the same vertex (i.e., they share a common vertex)
– Edges a & b are adjacent because vertex 1 is common
– Edges a & c are adjacent because vertex 2 is common
– Edges a & e are not adjacent because no vertex is common
• Self Loop
– An edge whose two end vertices are the same vertex
• Parallel Edge
– Two or more edges that join the same pair of vertices
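The definitions above can be checked mechanically on a labelled edge list. The sketch below uses a hypothetical multigraph chosen so that edges a, b, c, and e relate as in the examples (a & b share vertex 1, a & c share vertex 2, a & e share nothing); the original figure is not reproduced, and the helper names are illustrative.

```python
from collections import Counter

# Hypothetical labelled multigraph: edge label -> (end vertex, end vertex).
edges = {
    "a": (1, 2),
    "b": (1, 3),
    "c": (2, 3),
    "d": (3, 3),  # self loop
    "e": (3, 4),
    "f": (3, 4),  # parallel to e
}


def vertices_adjacent(u, v):
    """Two vertices are adjacent if some edge joins them."""
    return any({u, v} == set(ends) for ends in edges.values())


def edges_adjacent(x, y):
    """Two distinct edges are adjacent if they share at least one end vertex."""
    return x != y and bool(set(edges[x]) & set(edges[y]))


# Self loop: both ends are the same vertex.
self_loops = [name for name, (u, v) in edges.items() if u == v]

# Parallel edges: more than one edge on the same unordered pair of vertices.
pair_counts = Counter(frozenset(ends) for ends in edges.values())
parallel = [name for name, ends in edges.items() if pair_counts[frozenset(ends)] > 1]

print(vertices_adjacent(1, 2))   # True
print(edges_adjacent("a", "b"))  # True: vertex 1 is common
print(edges_adjacent("a", "c"))  # True: vertex 2 is common
print(edges_adjacent("a", "e"))  # False: no common vertex
print(self_loops)                # ['d']
print(parallel)                  # ['e', 'f']
```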
Basics of GT
• Null Graph
– A graph with vertices but no edges, e.g., vertices 1 and 2 with no edge between them
• Complete Graph
– A graph is said to be a complete graph if there is an edge between every pair
of distinct vertices. It is denoted by Kn, where |V| = n; e.g., for n = 4,
Total Edges: n(n−1)/2 = (4·3)/2 = 6
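The edge count n(n−1)/2 follows because Kn has exactly one edge per unordered pair of distinct vertices. A short sketch (vertex labels 1..n are an assumption) generates those pairs with `itertools.combinations` and checks the formula for small n.

```python
from itertools import combinations


def complete_graph_edges(n):
    """Edge list of Kn on vertices 1..n: one edge per unordered pair."""
    return list(combinations(range(1, n + 1), 2))


# The number of unordered pairs of n items is n(n-1)/2.
for n in range(1, 8):
    assert len(complete_graph_edges(n)) == n * (n - 1) // 2

print(complete_graph_edges(4))       # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(len(complete_graph_edges(4)))  # 6
```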
• Finite Graph
– A graph with finite number of vertices as well as finite number of edges
Graph Categories
Directed Acyclic Graph
A directed graph with no directed path that starts and ends at the
same vertex;
equivalently,
a finite directed graph with no directed cycles.
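One way to test whether a directed graph is acyclic is Kahn's algorithm, a standard topological-sort method chosen here for illustration: repeatedly remove vertices with no incoming edges; if every vertex can be removed, there is no directed cycle. The graphs below are hypothetical.

```python
from collections import deque


def is_dag(nodes, arcs):
    """Kahn's algorithm: True iff the digraph has no directed cycle."""
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    # Start with every vertex that has no incoming arc.
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1  # "delete" the arc u -> v
            if indeg[v] == 0:
                queue.append(v)
    # Vertices on a cycle never reach in-degree 0, so they are never removed.
    return removed == len(nodes)


print(is_dag([1, 2, 3, 4], [(1, 2), (2, 3), (4, 3)]))  # True
print(is_dag([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))     # False: 1 -> 2 -> 3 -> 1
```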
Weighted Graph
• A weighted graph is a graph in which each edge is given a numerical weight.
• The nodes might be neurons, individuals, groups, organizations, airports, or even
countries, whereas ties can take the form of friendship, communication,
collaboration, alliance, flow, or trade, to name a few.
• A weighted graph is therefore a special type of labeled graph in which the labels
are numbers (which are usually taken to be positive).
• In many real-world networks, not all ties have the same
capacity.
• In fact, ties are often associated with weights that differentiate them in terms of
their strength, intensity, or capacity.
• A number of social network analysis software packages can analyze weighted
networks.
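A minimal sketch of storing a weighted, undirected network, assuming hypothetical ties whose weights could be, for example, the number of messages exchanged. It contrasts a node's degree (number of ties) with its strength, the sum of its tie weights, which is a common weighted counterpart of degree (this measure is not named in the slides).

```python
# Hypothetical weighted, undirected ties: (node, node, weight).
ties = [("A", "B", 3.0), ("A", "C", 1.0), ("B", "C", 5.0), ("C", "D", 2.0)]


def weighted_adjacency(ties):
    """Nested dict: adj[u][v] is the weight of the tie between u and v."""
    adj = {}
    for u, v, w in ties:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w  # undirected: store both directions
    return adj


adj = weighted_adjacency(ties)
degree = {v: len(nbrs) for v, nbrs in adj.items()}             # number of ties
strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}  # total tie weight
print(degree)    # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(strength)  # {'A': 4.0, 'B': 8.0, 'C': 8.0, 'D': 2.0}
```

Note that B and C end up with equal strength even though C has more ties: weights can change which node looks most central.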
Graphical View of Weighted Graph
Connected Graph/Graph Connectivity
• A graph is called a Connected Graph if there is a path between every pair
of vertices
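Connectivity of an undirected graph can be checked with breadth-first search (BFS): start from any vertex and see whether every vertex is reached. This is a sketch assuming the graph is stored as a Python adjacency dict; both example graphs are hypothetical.

```python
from collections import deque


def is_connected(adj):
    """BFS from an arbitrary vertex; connected iff every vertex is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


path = {1: [2], 2: [1, 3], 3: [2]}        # a path 1 - 2 - 3: connected, no cycle
split = {1: [2], 2: [1], 3: [4], 4: [3]}  # two separate pieces
print(is_connected(path), is_connected(split))  # True False
```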
Types of Connected Graph
– Strongly Connected Components
– Weakly Connected Components
– Recursively Connected Components
– Bi-Connected Components
Strongly and Weakly Connected Component
Weakly Connected Components
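In a directed graph, a strongly connected component is a maximal set of vertices in which every vertex can reach every other along directed edges; a weakly connected component is a connected component of the graph obtained by ignoring edge directions. The sketch below computes both for a small hypothetical digraph. It uses Kosaraju's algorithm for the strong components, one standard choice and not necessarily the method the course uses.

```python
def strongly_connected_components(nodes, arcs):
    """Kosaraju: record DFS finish order, then sweep the reversed graph."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: iterative DFS on the original graph, recording finish order.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            u, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(u)  # u is finished
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    # Pass 2: in reverse finish order, collect vertices reachable in the reversed graph.
    components, assigned = [], set()
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = [], [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in pred[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components


def weakly_connected_components(nodes, arcs):
    """Ignore arc direction and collect ordinary connected components."""
    nbrs = {v: set() for v in nodes}
    for u, v in arcs:
        nbrs[u].add(v)
        nbrs[v].add(u)
    components, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components


# Hypothetical digraph: 1 -> 2 -> 3 -> 1 is a cycle; 4 and 5 hang off it.
nodes = [1, 2, 3, 4, 5]
arcs = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 4)]
print(strongly_connected_components(nodes, arcs))  # [[5], [1, 2, 3], [4]]
print(weakly_connected_components(nodes, arcs))    # [[1, 2, 3, 4, 5]]
```

The whole graph is one weak component, but only {1, 2, 3} are mutually reachable, so 4 and 5 form strong components on their own.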
• Centrality algorithms use graph theory to calculate the importance of any given node
in a network.
• In graph theory and network analysis, indicators of centrality identify the most
important vertices within a graph. Applications include identifying:
– influential (key) people, nodes, or actors in a social network
– key infrastructure nodes in the Internet
– key nodes in urban networks
– super-spreaders of disease
– etc.
Centrality Measures
• Used to identify the most important actors in social networks (SNs):
– Degree
– Closeness
– Betweenness
– Eigenvector
– PageRank (PR)
– etc.
Centrality Measures
Learning Objectives
• Differentiate between basic centrality measures
• Calculate degree centrality by hand.
What do the measures tell us?
• Degree: exposure to the network; opportunity to directly influence others
(identifies central, important, or key nodes).
• Closeness: short distance to all other nodes in the network. Important for
diffusion processes (how information spreads): rumors spread
more rapidly from people with high closeness centrality.
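Degree centrality can be calculated by hand by counting each node's ties; the sketch below does the same count and also normalises it by n − 1, the largest degree possible in a simple graph with n nodes (a common convention, assumed here). The star-like network is hypothetical.

```python
def degree_centrality(adj):
    """Return {node: (raw degree, degree / (n - 1))} for an undirected graph."""
    n = len(adj)
    return {v: (len(nbrs), len(nbrs) / (n - 1)) for v, nbrs in adj.items()}


# Hypothetical network: "A" is tied to everyone; "B" and "C" also know each other.
adj = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["A"],
}
for node, (raw, norm) in degree_centrality(adj).items():
    print(node, raw, round(norm, 2))
# A 4 1.0
# B 2 0.5
# C 2 0.5
# D 1 0.25
# E 1 0.25
```

A normalised score of 1.0 means the node is directly tied to every other node, which is the "exposure to the network" reading of degree.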
Possible Pairs
Calculating Betweenness
No. of pairs, excluding the node you are evaluating: (n−1)(n−2)/2 in an undirected graph with n nodes
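Betweenness of a node v adds up, over every pair of other nodes s and t, the fraction of shortest s–t paths that pass through v, and can then be divided by the number of pairs excluding v. The sketch below is a straightforward version for small undirected graphs (faster algorithms such as Brandes' exist); it relies on the fact that the number of shortest s–t paths through v is σ(s,v)·σ(v,t) whenever d(s,v) + d(v,t) = d(s,t). The network is hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and sigma[w] = number of shortest s-w paths."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def betweenness(adj, normalized=True):
    """Betweenness for an undirected graph, summed over unordered pairs."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    scores = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in nodes:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the two legs add up exactly.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    n = len(nodes)
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2  # pairs excluding the node evaluated
        scores = {v: c / pairs for v, c in scores.items()}
    return scores


adj = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["A"],
}
print(betweenness(adj, normalized=False))  # A sits on 5 of the 6 pairs' shortest paths
print({v: round(c, 3) for v, c in betweenness(adj).items()})
```

Here A lies on the only shortest path for every pair except B–C (which are tied directly), so its raw score is 5 and its normalised score is 5/6.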
Self Practice
Closeness
• Definition:
– This measure scores each node based on its ‘closeness’ to all other
nodes within the network.
• What it tells us:
– This measure calculates the shortest paths between all nodes, then
assigns each node a score based on the sum of its shortest-path distances to all other nodes.
• When to use it:
– For finding the individuals who are best placed to influence the entire
network most quickly.
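A minimal sketch of closeness centrality, assuming the common normalised form (n − 1) / (sum of shortest-path distances) on a connected, unweighted graph; distances (number of steps) come from breadth-first search. The five-person line network is hypothetical.

```python
from collections import deque


def distances_from(adj, s):
    """BFS: number of steps from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """(n - 1) / sum of distances; assumes the graph is connected."""
    n = len(adj)
    scores = {}
    for v in adj:
        total = sum(distances_from(adj, v).values())
        scores[v] = (n - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical line of five people: 1 - 2 - 3 - 4 - 5
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
for v, c in closeness(adj).items():
    print(v, round(c, 3))
# 1 0.4
# 2 0.571
# 3 0.667
# 4 0.571
# 5 0.4
```

The middle person (3) has the smallest total distance (1 + 1 + 2 + 2 = 6), so information starting there reaches everyone fastest.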
Closeness Centrality
Distance metric: the number of steps it takes to move
from one node to another.
Calculating Closeness
Self Practice
Page Rank
• PageRank (PR) is an algorithm used by Google Search to rank web pages
in its search engine results.
• PageRank was named after Larry Page, one of the founders of Google.
• PageRank is a way of measuring the importance of web pages.
According to Google:
– PageRank works by counting the number and quality of links to a page
to determine a rough estimate of how important the website is.
– The underlying assumption is that more important websites are likely
to receive more links from other websites.
PR Graphical View
Description
• PageRank is a link analysis algorithm and it assigns a numerical
weighting to each element of a hyperlinked set of documents, such as the
World Wide Web, with the purpose of "measuring" its relative importance
within the set.
• The algorithm may be applied to any collection of entities with reciprocal
quotations and references.
• The numerical weight that it assigns to any given element E is referred to
as the PageRank of E and denoted by PR(E). Other factors like Author
Rank can contribute to the importance of an entity.
Continue…
• The rank value indicates the importance of a particular page.
• A hyperlink to a page counts as a vote of support.
• The PageRank of a page is defined recursively and depends on the
number and PageRank metric of all pages that link to it (incoming links).
• A page that is linked to by many pages with high PageRank receives a
high rank itself.
• Other link-based ranking algorithms for Web pages include the HITS
algorithm.
Algorithm
• The PageRank algorithm outputs a probability distribution used to
represent the likelihood that a person randomly clicking on links will
arrive at any particular page.
• PageRank can be calculated for collections of documents of any size.
• It is assumed in several research papers that the distribution is evenly
divided among all documents in the collection at the beginning of the
computational process.
• The PageRank computations require several passes, called "iterations",
through the collection to adjust approximate PageRank values to more
closely reflect the theoretical true value.
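The iterative process described above can be sketched as a power iteration. This version makes assumptions not stated in the slides: a damping factor d = 0.85 (the value commonly quoted for the original algorithm), a fixed number of iterations instead of a convergence test, and pages with no out-links spreading their rank evenly over all pages. It starts from an evenly divided distribution, and the total stays 1, so the result is a probability distribution. The four-page link structure is hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power iteration; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # evenly divided at the start
    for _ in range(iterations):
        # (1 - d) / n: the "random jump" share every page receives.
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                share = d * pr[p] / len(out)  # each out-link is a vote
                for q in out:
                    new[q] += share
            else:
                # dangling page (no out-links): spread its rank over all pages
                for q in pages:
                    new[q] += d * pr[p] / n
        pr = new
    return pr


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 3))
print(round(sum(ranks.values()), 6))  # 1.0: still a probability distribution
```

C collects votes from three pages and ends up ranked highest; D has no incoming links, so it keeps only the random-jump share (1 − d)/n.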
Continue…
• A probability is expressed as a numeric value between 0 and 1. A 0.5
probability is commonly expressed as a "50% chance" of something
happening.
• Hence, a PageRank of 0.5 means there is a 50% chance that a person
clicking on a random link will be directed to the document with the 0.5
PageRank.
Simplified PR Formula
In the general case, the PageRank value for any page u can be
expressed as: