
CP5074 Social Network Analysis

UNIT II
MODELING AND VISUALIZATION
BY
JAUSMIN KJ, ME
ASSISTANT PROFESSOR
COMPUTER SCIENCE AND ENGINEERING
RMD ENGINEERING COLLEGE
SYLLABUS-UNIT II

• Visualizing Online Social Networks - A Taxonomy of Visualizations - Graph
Representation - Centrality - Clustering - Node-Edge Diagrams - Visualizing
Social Networks with Matrix-Based Representations - Node-Link Diagrams -
Hybrid Representations - Modelling and aggregating social network data -
Random Walks and their Applications - Use of Hadoop and MapReduce -
Ontological representation of social individuals and relationships.
REFERENCE BOOKS-UNIT II

• Borko Furht, “Handbook of Social Network Technologies and Applications”, Springer, 1st edition, 2011.
• Charu C. Aggarwal, “Social Network Data Analytics”, Springer, 2014.
• Peter Mika, “Social Networks and the Semantic Web”, Springer, 1st edition, 2007.
1. Visualizing Online Social Networks
• Visualization is a powerful technique for exploring the relationships within social networks.
• Various visualization techniques and metaphors have been proposed to improve the analysis of social networks
and to enhance human-computer interaction.
• For example, in the 1950s, computational methods such as factor analysis and multidimensional scaling
(MDS) were proposed to lay out the nodes of social networks.
• Factor analysis was used to reduce the number of nodes by mapping similar nodes into “factors”.
MDS was further used to lay out nodes in 2D or 3D so that the distances between pairs of nodes on the
display correspond to the distances between individuals in the data.
• With the evolution of computer technologies and visualization techniques, machine-drawn images and
screen-oriented graphics were developed to visualize social networks with richer visual components and
interactions.
• Although many visualization techniques have focused on issues such as producing good graph layouts,
coloring, and presenting clear node-edge relations, visualizing complex relations remains a challenge for
social network visualization.
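The MDS idea above (display distances that match the distances in the data) can be sketched with classical (Torgerson) MDS. This is a minimal stdlib-only sketch, assuming a symmetric matrix of pairwise distances that is close to Euclidean, so the leading eigenvalues are positive; the function name `classical_mds` is hypothetical, and power iteration with deflation stands in for a full eigensolver.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: coordinates whose distances approximate `dist`."""
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J D^2 J
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration finds the largest remaining eigenvalue/eigenvector of B
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        # Axis k = eigenvector scaled by sqrt(eigenvalue)
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate B so the next pass finds the next eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

With a matrix of shortest-path distances between individuals, the returned coordinates could serve as 2D node positions; graph distances are usually not exactly Euclidean, so the layout then only approximates them.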
2. A Taxonomy of Visualizations
Visualizations can be grouped into four categories: structural (the most commonly used), semantic (meaning), temporal,
and statistical (network statistics).
2.1 Structural Visualization: focuses on the structure of the network, i.e., the topology of the graph that represents the
actors and relationships in a social network. There are two predominant approaches to structural visualization: node-link
diagrams and matrix-oriented methods.
2.1.1 The value of network layout in visualization.
 A good layout supports the high-level properties of readability, clusterability, and trustworthiness.
2.1.2 Node-link Diagrams. One of the current challenges in social network visualization is the placement, or
layout, of the nodes in an effective way.
• Property-based Layout: assigns the value of a node property as a location in a coordinate system, which helps
to discover patterns.
 In a radial layout, nodes are placed on a circle and edges are drawn between them; on its own, this does not convey the structure of the network well.
 To improve the readability of radial layouts, target sociograms place nodes on concentric circles according to a centrality measure, with the most central nodes closest to the center.
• Force-directed and Energy-based Layouts: draw an analogy between a network and a physical structure of rods or springs
(the links) connecting spheres (the nodes). Forces are designed to satisfy low-level properties, such as minimal overlap
of nodes and proximity (closeness) of related nodes.
 Energy-based methods seek the “optimal” placement of nodes directly: they iteratively update the node locations to minimize an
energy function that encodes the criteria for a good graph layout.
 Limitations: the computation is expensive, and because social networks often have power-law degree distributions, the result is frequently an unreadable ‘hairball’.
• Spectral layouts: these algorithms apply spectral algebra to key matrices extracted from the social structure.
Eigenvectors of certain matrices (e.g., the adjacency matrix or the Laplacian matrix) are used as embedding coordinates
in a lower-dimensional space, i.e., 2D.
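A target sociogram, as described above, can be sketched as a property-based radial layout. This minimal sketch assumes degree as the centrality measure and uses one ring per distinct degree value (highest degree innermost); `target_sociogram` and `radius_step` are hypothetical names, and a real tool might use another centrality or bin values into a fixed number of rings.

```python
import math

def target_sociogram(adj, radius_step=1.0):
    """Target sociogram: nodes with higher degree centrality sit on inner rings."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # One ring per distinct degree value, highest degree innermost
    levels = sorted(set(degree.values()), reverse=True)
    rings = {lvl: [] for lvl in levels}
    for v in sorted(adj, key=str):
        rings[degree[v]].append(v)
    pos = {}
    for idx, lvl in enumerate(levels):
        members = rings[lvl]
        radius = (idx + 1) * radius_step
        for k, v in enumerate(members):
            angle = 2 * math.pi * k / len(members)  # spread evenly around the ring
            pos[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return pos

star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
print(target_sociogram(star))
```

In the star example, the hub `a` lands on the inner ring and the three leaves share the outer ring.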
• 2.1.3 Matrix-oriented Techniques: represent a social network via an explicit display of its adjacency or incidence
matrix.
 In this visualization, each link is represented as a grid cell whose Cartesian coordinates correspond to the two
nodes it connects.
 One of the challenges with this matrix representation is enabling the users to visually identify local and global
structures in the network.
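A matrix-oriented view can be sketched as a text grid in which each link fills the cell at the row and column of its two nodes. Reordering rows and columns is one common way to help users see local structure, because nodes of the same group then form visible blocks. A minimal sketch for undirected edge lists; `adjacency_matrix`, `render_matrix`, and the example orderings are hypothetical.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected edge list."""
    index = {v: i for i, v in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        m[index[u]][index[v]] = 1
        m[index[v]][index[u]] = 1
    return m

def render_matrix(nodes, edges):
    """Return the matrix as text: '#' marks a link, '.' marks no link."""
    m = adjacency_matrix(nodes, edges)
    width = max(len(str(v)) for v in nodes)
    lines = []
    for v, row in zip(nodes, m):
        cells = " ".join("#" if x else "." for x in row)
        lines.append(f"{str(v):>{width}} {cells}")
    return "\n".join(lines)

edges = [("a", "c"), ("a", "e"), ("c", "e"), ("b", "d"), ("b", "f"), ("d", "f"), ("e", "f")]
print(render_matrix(["a", "b", "c", "d", "e", "f"], edges))  # alphabetical: groups interleaved
print()
print(render_matrix(["a", "c", "e", "b", "d", "f"], edges))  # grouped: two dense blocks
```

The same links produce scattered cells in alphabetical order but two filled 3x3 blocks when the rows follow the groups.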
• 2.1.4 Hybrid Techniques: NodeTrix is a hybrid that represents small, dense communities within a social network as
adjacency matrices (a matrix-oriented technique) and connects them within a node-link diagram. This methodology
avoids the issues associated with displaying dense networks using links, but retains the readability of the
connections among clusters of nodes.
2.2 Semantic and Temporal Visualization
 Instead of only the explicit relationships found in the data, these visualizations represent high-level attributes and
connections of actors and links.
• 2.2.1 Ontology-based Visualization: uses ontologies to represent the types of actors and relationships in a social
network. An example is OntoVis (a node-link diagram).
 OntoVis displays an ontology graph whose nodes represent node types and whose links represent relationship types.
• 2.2.2 Temporal Visualization: social networks are time-dependent phenomena, and handling the temporal dimension
from a purely structural point of view is limited and insufficient.
 One of the difficulties in representing time is the shortage of display dimensions for portraying a dynamic network.
There are two types of dynamic visualization: flipbooks and movies.
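A flipbook shows a dynamic network as a sequence of static snapshots. A minimal sketch of preparing those frames, assuming each interaction is a (u, v, timestamp) record and fixed-width time windows; `flipbook_frames` and the window size are hypothetical.

```python
from collections import defaultdict

def flipbook_frames(timed_edges, window):
    """Split (u, v, t) interactions into consecutive snapshots of width `window`."""
    frames = defaultdict(set)
    for u, v, t in timed_edges:
        # frozenset makes (u, v) and (v, u) the same undirected edge
        frames[int(t // window)].add(frozenset((u, v)))
    if not frames:
        return []
    first, last = min(frames), max(frames)
    # Keep empty windows so every frame covers the same amount of time
    return [frames.get(i, set()) for i in range(first, last + 1)]

events = [("a", "b", 0), ("b", "c", 5), ("a", "b", 12), ("c", "d", 25)]
for i, frame in enumerate(flipbook_frames(events, window=10)):
    print(i, sorted(tuple(sorted(e)) for e in frame))
```

Each frame would then be drawn, ideally with stable node positions, so changes between consecutive pages are easy to follow.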
• 2.3 Statistical Visualization
 These visualizations display network statistics that represent the structure of the network, such as degree, centrality
(importance across the network), and the clustering coefficient (how clustered the nodes in the network are).
3. Graph Representation
The fundamental concepts and metrics are derived from graph theory; some of them are given below.
• Node degree:
• The degree of a node in a graph is the number of edges incident to the node; a loop (an edge from a node to
itself) is counted twice. Excluding loops, the maximum number of unique edges in an undirected graph is
$E_{max} = \frac{N(N-1)}{2}$
where N is the number of nodes.


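• Graph density:
• Density compares the number of edges E with the maximal possible number of edges; a dense graph is one in
which the number of edges is close to this maximum.
• Density of an undirected graph: $D = \frac{2E}{N(N-1)}$
• Density of a directed graph, where E is the number of edges: $D = \frac{E}{N(N-1)}$
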
• Path length:
• The path length between a pair of nodes is the distance between them, i.e., the number of edges on the shortest
path connecting them; the average path length is the average of these distances over all pairs of nodes.
• Component size:
• A graph is connected if all pairs of nodes are reachable from each other. If a graph is not connected, it can be
partitioned into several connected subgraphs (components), and the size of each component is the number of
nodes in it.
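The metrics above can be computed directly from an adjacency list. A minimal sketch, assuming an undirected graph without loops stored as a dict of neighbor sets; the helper names are hypothetical. Path lengths are BFS hop counts, and the average path length is taken over reachable pairs only (one common convention for disconnected graphs).

```python
from collections import deque

def degrees(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def density(adj):
    """Undirected density: D = 2E / (N(N-1))."""
    n = len(adj)
    e = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    return 2 * e / (n * (n - 1)) if n > 1 else 0.0

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_path_length(adj):
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def component_sizes(adj):
    seen, sizes = set(), []
    for v in adj:
        if v not in seen:
            comp = bfs_distances(adj, v)  # everything reachable from v
            seen.update(comp)
            sizes.append(len(comp))
    return sorted(sizes, reverse=True)

# A path a-b-c plus a separate edge d-e
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
print(degrees(adj), density(adj), average_path_length(adj), component_sizes(adj))
```

Here E = 3 and N = 5, so the density is 6 / 20 = 0.3.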
3.1 Centrality
• HITS and PageRank are the two most famous representatives of centrality-based ranking. HITS identifies important
nodes by computing Authority scores (based on in-degrees) and Hub scores (based on out-degrees). PageRank computes a
node's value from its incoming links, where each linking node passes on its own value divided by its out-degree.
• The three most widely adopted methods to measure the centrality of a social network are listed below:

 Degree: the number of nodes directly connected to a node. If edges are directed, in-degree centrality is distinguished from
out-degree centrality.

 Betweenness: betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes,
giving higher values to nodes that bridge clusters.

 Closeness: closeness centrality takes into account how distant a node is from the other nodes in the
network; nodes with a small total distance to all others score highest.
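The three measures can be sketched for an unweighted, undirected graph. This is a minimal illustration, not a library implementation: closeness is taken as (N - 1) divided by the sum of distances and assumes a connected graph, and betweenness uses Brandes' shortest-path counting algorithm without normalization. Function names are hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(N - 1) / sum of BFS distances; assumes the graph is connected."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm: shortest paths between other pairs passing through each node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward BFS: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Backward pass: accumulate dependencies, farthest nodes first
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints
    return {v: b / 2 for v, b in bc.items()}

# Two triangles (a, b, c) and (d, e, f) joined through a bridge node x
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"}, "x": {"c", "d"},
       "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(betweenness_centrality(adj)["x"])  # 9.0: all 3 x 3 cross-triangle paths use x
```

In this example the bridge node x has degree 2, lower than c or d, yet it has the highest betweenness and closeness, which is why betweenness is said to favor nodes that bridge clusters.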

3.2 Clustering

• Many social networks contain subsets of nodes that are highly connected within the subset; to explore such communities,
the following measure is used.
• Clustering coefficient: measures the degree to which nodes in a graph tend to cluster together; the local clustering
coefficient of a node is the fraction of pairs of its neighbors that are themselves connected.
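A minimal sketch of the clustering coefficient, assuming an undirected graph stored as a dict of neighbor sets. The network-level value is taken as the mean of the local coefficients, and nodes with fewer than two neighbors are counted as 0 (one common convention); the function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0 here
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Two triangles joined through a bridge node x
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"}, "x": {"c", "d"},
       "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print({v: round(local_clustering(adj, v), 3) for v in adj})
```

Nodes inside a triangle score 1, the triangle members touching the bridge score 1/3, and the bridge x scores 0 because its two neighbors are not linked.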
3.3 NODE-EDGE DIAGRAM
• With node-edge visualization, many network analysis tasks, such as component size calculation,
centrality analysis, and pattern sketching, can be presented in a more straightforward manner.
• There are three kinds of layouts:
• Random layout: places nodes at random geometric locations; it runs in O(N) but gives no clear visualization
of the structure.
• Force-directed layout: also known as a spring layout (edges act as springs and nodes as repelling objects). An
initial random layout is produced first, and then the force-directed algorithm runs iteratively to
adjust the positions of nodes until the repulsive forces among all graph nodes and the attractive forces between
adjacent nodes are in balance; it runs in at least O(N log N) or O(E) time.
• Tree layout: a basic tree layout chooses a node as the root of the tree, and the nodes connected to the
root become children of the root node.
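The force-directed steps above (random start, then iterative adjustment under repulsive and attractive forces) can be sketched as a simplified Fruchterman-Reingold scheme. This is an assumption-laden sketch: the force formulas, the linear cooling schedule, and the name `spring_layout` are illustrative choices. Repulsion is computed for every pair, so each iteration costs O(N^2 + E); reaching the O(N log N) bound needs approximations such as Barnes-Hut, which this sketch omits.

```python
import math
import random

def spring_layout(adj, iterations=100, size=1.0, seed=1):
    """Force-directed (spring) layout, a simplified Fruchterman-Reingold scheme."""
    rng = random.Random(seed)
    nodes = list(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u] if u != v}
    # Start from a random layout
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size * math.sqrt(1.0 / len(nodes))  # ideal distance between linked nodes
    t0 = size / 10                           # initial "temperature" (max step length)
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels: force k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Linked nodes attract like springs: force d^2 / k
        for e in edges:
            u, v = tuple(e)
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the cooling temperature
        temp = t0 * (1 - it / iterations)
        for v in nodes:
            dl = math.hypot(*disp[v])
            if dl > 0:
                step = min(dl, temp) / dl
                pos[v][0] += disp[v][0] * step
                pos[v][1] += disp[v][1] * step
    return {v: tuple(p) for v, p in pos.items()}
```

Capping each move by a decreasing temperature lets nodes travel far early on and then settle, which is how the iteration approaches a balanced state.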
2.4 VISUALIZING SOCIAL NETWORKS WITH MATRIX-BASED REPRESENTATIONS
• Online social network services are created to connect people through their social relationships.
• Online social network visualizations can be depicted and analyzed according to their attributes of sociality,
including Web communities, email groups, digital libraries, and Web 2.0 services.
• Online social network visualizations can also be based on different views of social relationships, e.g., user-centric,
content-centric, and hybrid social relationships.
2.4.1 Web communities
• SixDegrees.com, which operated from 1997 to 2001, was an early representative created on the basis of the Web
interaction model.
• Various social network websites and Web-based dating services have since been established to help people build up
their social relationships and communities.
• In 2003, Club Nexus, a friendship network community, provided very rich profiles in which users explicitly list their
friends; this allows for detailed social network analysis and for identifying the activities and preferences that
determine the formation of friendships.
• In 2005, Vizster was developed based on node-edge network layouts for exploring connectivity in large graph
structures; it facilitates the analysis of social networks with techniques such as highlighting, panning, zooming,
and distortion.
• FOAF (Friend-of-a-Friend): analyzes and visualizes human-centric social relationships based on Semantic Web
social metadata (XML/RDF).
• Microsoft Research Asia proposed a novel object-level search service, called Entity Cube, to help people
discover real-world entities, such as people, locations, and organizations, and explore their social relationships.
2.4.2 Email groups
• In 2004, Soylent was developed to study the social patterns and the temporal rhythms of daily email
activities, making mutual interactions and collaboration activities clearly visible.
• Examples of such patterns: the onion pattern, the nexus pattern, and the butterfly pattern.
• In 2005, two visual metaphors, Social Network Fragments (SNF) and Post History, were employed to
visualize the two major dimensions of email activities: people and time.
• SNF highlights the relationships extracted from an email archive.
• Post History (with a calendar panel and a contacts panel) visualizes email exchange activities as time progresses.
2.4.3 Digital libraries
• In digital libraries, social networks can be analyzed mainly from two aspects: authors and writings.
2.4.3.1 Co-Authorship Networks
• With the visualization of co-authorships, characteristics such as the clustering coefficient and the average
path length can be analyzed in co-authorship networks.
• In 2005, social network analysis of co-authorship in digital libraries was studied in depth.
• In addition to the node-edge representation, a matrix representation was used for the co-authorship network
to help analyze different co-authorship patterns.
2.4.3.2 Co-Citation Relations
• In 2006, a novel visualization tool called CircleView was developed; with its interactive design, highlighted colors,
and circles, documents with high impact and their citation patterns can be identified immediately.
• In 2007, an interactive visualization tool was developed to present large co-citation networks with latent
visual cues and allows direct interaction with the visualized graphs.
• In 2009, an innovative visualization technique, called FP-tree, was developed to present co-citation
network from a new perspective, namely, visualizing social networks based on a paper-reference matrix
instead of using a reference-reference matrix.
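The difference between a paper-reference matrix and a reference-reference (co-citation) matrix can be made concrete: two references are co-cited when they appear together in the same paper's reference list, so the co-citation counts are the off-diagonal entries of A^T A for the paper-reference incidence matrix A. A minimal sketch, assuming reference lists are given as a dict; `cocitation_counts` and the paper/reference IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(paper_refs):
    """Count how often each pair of references is cited together by the same paper."""
    counts = Counter()
    for refs in paper_refs.values():
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

paper_refs = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R1", "R2"],
    "P3": ["R2", "R3"],
}
print(cocitation_counts(paper_refs)[("R1", "R2")])  # 2: cited together by P1 and P2
```

The input dict is the paper-reference view; the returned counts are the derived reference-reference view.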
2.4.4 Web 2.0 services
• Since the concept of Web 2.0 was proposed in 2004, online social activities have become more prosperous
than before.
• Many Web 2.0 applications, such as Twitter and Facebook, are widely used to connect people's social networks.
• Nexus is a visualization application for Facebook communities that illustrates their large network
graphs (in some cases, the relationships are too complex to recognize).
• In 2010, an advanced interactive visualization interface called IRNet was proposed to address the
shortcomings of Nexus and TouchGraph in visualizing Facebook communities.
2.4.5 Classification of online social network visualizations
• In addition, visualizations of online social networks can be categorized into three types according to their social
relationships:
• user-centric visualization (accessing people's networks and discovering relationships based on interests),
content-centric visualization (content based on interests), and hybrid visualization (different kinds of
relationships and interactions).
2.6 Matrix-based representation
2.6.1 Matrix or Node-Link Diagram
Node-link diagrams are more effective for very small (under 20 vertices) and sparse networks, and when the task is
to follow paths in the network; matrices are more effective for larger and denser networks.
Advantages of matrices
• Matrices provide powerful overview visualization.
• Matrices do not suffer from node overlapping.
• Matrices do not suffer from links crossing each other.
• Matrices show all possible pairs of vertices.
• Matrices are particularly appropriate for directed and dense networks.
Advantages of node-link diagrams
 These representations are familiar to a wide audience; they constitute a powerful communication tool.
 For small or sparse networks, node-link diagrams are more effective than matrices.
 Matrices use more display space than node-link diagrams; therefore, node-link diagrams provide a more compact
representation.
 Node-link diagrams are more appropriate for a number of path-related tasks.
2.6.2 Matrix + Node-Link Diagram
• Matrix Explorer was designed to combine the advantages of both representations and to support the visual
exploration of social networks. The following are the steps for combining matrices and node-link diagrams:
• Initiate the exploration
• Explore interactively and iteratively
• Find a consensus in the data or validate a hypothesis
• Present the findings
2.7 Node Link Diagram
• The principle of node-link diagrams is to graphically represent the actors of the network by nodes and their
connections by links (readability and the message conveyed depend on the node positions).
2.8 HYBRID REPRESENTATIONS
• Providing both matrix and node-link diagrams to the user has a number of advantages but also drawbacks:
• It requires a large amount of display space.
• At least two display monitors are required to comfortably use Matrix Explorer.
• Switching from one representation to the other may induce a high cognitive load on the user.
• Two hybrid representations were developed, namely:
• MatLink and NodeTrix
2.8.1 AUGMENTING MATRICES
• Its principle is to augment a standard matrix representation with links on its borders, dual-encoding the
connections between actors. Two types of links are added to the representation:
• static links (in white on the figure) and
• interactive links (in a darker shade).
Assessing the Readability of MatLink
• The readability of MatLink was assessed on specific social network analysis tasks: find a cut point, find a clique,
and find communities (strongly connected groups).
• MatLink significantly improves on standard matrix representations.
• The only task for which node-link diagrams still perform better is the identification of cut points. With
MatLink, this task requires identifying specific visual patterns of the links.
Using MatLink for Navigating in the Matrix
• To improve the readability of matrices, MatLink supports navigation.
Three techniques that provide users with effective tools to navigate in large matrices with MatLink are listed
below:
• Melange: folds the space between two distant nodes as if it were a piece of paper, so that users can see
distant parts of the matrix side by side.
• Bring-and-go: brings the neighbors of an actor closer as if their links were elastic; when the user moves the
cursor over one of the neighbors and releases the mouse, the view travels to that neighbor's original location.
• Link Sliding: allows users to lock their cursor to a given link and travel very fast to its destination.
2.8.2 MERGING MATRIX AND NODE-LINK DIAGRAM
• NodeTrix is a hybrid visualization merging node-link diagrams and matrices. The principle of NodeTrix is to
represent the global network as a node-link diagram and the locally dense subparts as matrices.
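The NodeTrix principle can be sketched as a decision step: given a grouping of actors into communities (assumed to be provided, e.g., by a clustering algorithm), groups whose internal density reaches a threshold are drawn as matrices, and links between groups remain node-link edges. `nodetrix_plan`, the community labels, and the 0.5 threshold are hypothetical choices.

```python
from collections import Counter

def nodetrix_plan(adj, communities, threshold=0.5):
    """Pick dense communities to draw as matrices; count links between communities."""
    group_of = {v: g for g, members in communities.items() for v in members}
    as_matrix = []
    for g, members in communities.items():
        members = set(members)
        n = len(members)
        # Each internal undirected edge is seen from both ends, hence // 2
        internal = sum(1 for u in members for v in adj[u] if v in members) // 2
        density = 2 * internal / (n * (n - 1)) if n > 1 else 0.0
        if density >= threshold:
            as_matrix.append(g)
    # Links whose endpoints lie in different communities stay node-link edges
    between = Counter()
    for u in adj:
        for v in adj[u]:
            gu, gv = group_of[u], group_of[v]
            if gu != gv and str(gu) < str(gv):  # count each undirected link once
                between[(gu, gv)] += 1
    return as_matrix, between

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"}, "x": {"c", "d"},
       "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
communities = {"L": {"a", "b", "c"}, "M": {"x"}, "R": {"d", "e", "f"}}
print(nodetrix_plan(adj, communities))
```

Mapping each actor to exactly one group mirrors the NodeTrix drawback noted below: an actor cannot appear in two communities.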
Interactive Exploration
• NodeTrix provides a number of interactions based on traditional drag-and-drop of objects with the
mouse cursor for easy creation, exploration, and editing of matrices.
Drawback
• Each actor belongs to a single matrix, making it impossible to place an actor in two different communities.
Presenting Findings
• NodeTrix can be used for both exploration and communication: matrices can be expanded to show
detailed information on actors and connections, while the node-link part shows higher-level connection patterns.
2.9 MODELLING AND AGGREGATING SOCIAL NETWORK DATA

• First, maintaining the semantics of social network data is crucial for aggregating social network
information, especially in heterogeneous environments where the individual sources of data are under
diverse control.
• Second, semantic representations can facilitate the exchange and reuse of case study data in the
academic field of Social Network Analysis.
• The possibilities for electronic data exchange have already revolutionized a number of sciences, the
best-known examples being bio-informatics and genetics.
2.9.1 State-of-the-art in network data representation
• The most common kind of social network data can be modeled by a graph where the nodes represent
individuals and the edges represent binary social relationships. (Less commonly, higher-arity
relationships may be represented using hyper-edges, i.e. edges connecting multiple nodes.)
• The most commonly encountered formats are those used by the popular network analysis packages
Pajek and UCINET. These are text-based formats designed so that they can be easily edited with simple
text editors.
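To show what such a text format looks like, here is a sketch that writes a small network in the basic Pajek .net layout: a `*Vertices` section with 1-based vertex numbers and quoted labels, followed by `*Edges` for undirected or `*Arcs` for directed links. It covers only this basic subset (no weights, no escaping of quotes in labels), and `to_pajek` is a hypothetical helper.

```python
def to_pajek(nodes, edges, directed=False):
    """Serialize a network in the basic Pajek .net text layout."""
    index = {v: i for i, v in enumerate(nodes, start=1)}  # Pajek numbers vertices from 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[v]} "{v}"' for v in nodes]
    lines.append("*Arcs" if directed else "*Edges")
    lines += [f"{index[u]} {index[v]}" for u, v in edges]
    return "\n".join(lines) + "\n"

text = to_pajek(["Alice", "Bob", "Carol"], [("Alice", "Bob"), ("Bob", "Carol")])
print(text)
```

Because the output is plain text, it can be inspected or edited by hand, which is the design goal mentioned above.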
2.9.2 Ontological representation of social individuals
• The Friend-of-a-Friend (FOAF) ontology that we use in our work is an OWL-based format for
representing personal information and an individual’s social network.
• FOAF greatly surpasses graph description languages in expressivity by using the powerful OWL
vocabulary to characterize individuals.
• The idea of FOAF was to provide a machine processable format for representing the kind of
information that made the original Web successful, namely the kind of personal information described
in homepages of individuals.
• Thus FOAF has a vocabulary for describing personal attribute information typically found on
homepages such as name and email address of the individual, projects, interests, links to work and
school homepage etc. 
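A minimal FOAF profile can be sketched as Turtle text built with plain string formatting (no RDF library is assumed). `foaf:Person`, `foaf:name`, `foaf:mbox`, `foaf:homepage` and `foaf:knows` are FOAF vocabulary terms; the example.org URIs and the helper `foaf_person` are hypothetical, and a real application would normally use an RDF library to handle escaping.

```python
FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> ."

def foaf_person(uri, name, mbox=None, homepage=None, knows=()):
    """Describe one person with FOAF terms, serialized as Turtle (no escaping)."""
    props = [f'foaf:name "{name}"']
    if mbox:
        props.append(f"foaf:mbox <mailto:{mbox}>")
    if homepage:
        props.append(f"foaf:homepage <{homepage}>")
    # foaf:knows links this person to others: this is the social network itself
    props += [f"foaf:knows <{friend}>" for friend in knows]
    return f"<{uri}> a foaf:Person ;\n    " + " ;\n    ".join(props) + " ."

doc = "\n".join([
    FOAF_PREFIX,
    foaf_person("http://example.org/alice#me", "Alice", mbox="alice@example.org",
                homepage="http://example.org/alice",
                knows=["http://example.org/bob#me"]),
])
print(doc)
```

Publishing one such file per person and following the `foaf:knows` links is what lets FOAF profiles be aggregated into a larger social network.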
2.9.3 Ontological representation of social relationships
• Ontological representations of social networks such as FOAF need to be extended with a framework for
modelling and characterizing social relationships, for two principal reasons:
1. To support the automated integration of social information on a semantic basis
2. To capture established concepts in Social Network Analysis.
• The characteristics of social relationships:
1) Sign (positive or negative attitude in the relationship)
2) Strength (closeness or tie strength between nodes)
3) Provenance
4) Relationship history (interactions, individuals)
5) Relationship roles
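The five characteristics can be captured as a simple record attached to each tie. This is a hypothetical data model for illustration only: the field names, the +1/-1 sign, and the [0, 1] strength range are assumptions, not part of FOAF or any published ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Relationship:
    """Hypothetical record for one social tie, following the characteristics above."""
    source: str
    target: str
    sign: int = 1                      # +1 positive attitude, -1 negative
    strength: float = 0.5              # tie strength, assumed to lie in [0, 1]
    provenance: Optional[str] = None   # who or what asserted this relationship
    history: List[Tuple[str, str]] = field(default_factory=list)  # (date, interaction)
    roles: Tuple[str, str] = ("", "")  # role of source, role of target

    def __post_init__(self):
        if self.sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be in [0, 1]")

    def weighted_sign(self) -> float:
        """Signed strength, e.g. as an edge weight in a signed network."""
        return self.sign * self.strength

tie = Relationship("alice", "bob", sign=1, strength=0.8,
                   provenance="co-authorship records",
                   history=[("2007-05-01", "co-authored paper")],
                   roles=("advisor", "student"))
print(tie.weighted_sign())  # 0.8
```

Keeping provenance and history alongside sign and strength means data aggregated from different sources can still be traced back to where each claim came from.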
4. Random Walks and their Applications
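PageRank, mentioned in Section 3.1, is a well-known application of random walks. A "random surfer" follows a uniformly chosen out-link with probability d and otherwise jumps to a uniformly random node; a node's score is the walk's long-run visiting probability, computed here by power iteration. This is a minimal sketch: d = 0.85 is the commonly used damping value, and spreading the rank of nodes without out-links evenly over all nodes is one common choice among several.

```python
def pagerank(out_links, d=0.85, iters=100):
    """PageRank as the stationary distribution of a damped random walk."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # With probability 1 - d the surfer jumps to a random node
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Otherwise it follows one of u's out-links uniformly at random
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

scores = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
print({v: round(s, 4) for v, s in scores.items()})
```

Node a receives links from both b and c, so the walk visits it most often; c has no incoming links and keeps only the random-jump share (1 - d) / n.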
5. Use of Hadoop and MapReduce

MapReduce
• Data-parallel programming model for clusters of commodity machines
• Pioneered by Google
- Processes 20 PB of data per day
• Popularized by the open-source Hadoop project
- Used by Yahoo!, Facebook, Amazon, …

MapReduce is used for:
At Google:
• 1. Index building for Google Search
• 2. Article clustering for Google News
• 3. Statistical machine translation
At Yahoo!:
• 1. Index building for Yahoo! Search
• 2. Spam detection for Yahoo! Mail
At Facebook:
• 1. Data mining
• 2. Ad optimization
• 3. Spam detection
Challenges
· Cheap nodes fail, especially if you have many
- Mean time between failures for 1 node = 3 years
- MTBF for 1000 nodes = 1 day
- Solution: Build fault-tolerance into system
· Commodity network = low bandwidth
- Solution: Push computation to the data
· Programming distributed systems is hard
- Solution: Users write data-parallel “map” and “reduce” functions; the system handles work distribution and faults
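The user-written map and reduce functions can be illustrated on a social network task: computing every node's degree from an edge list. This single-process Python simulation only mimics the data flow (map, shuffle by key, reduce); it is not the Hadoop API, and `run_mapreduce`, `map_edge` and `reduce_degree` are hypothetical stand-ins.

```python
from collections import defaultdict
from itertools import chain

def map_edge(edge):
    """Map: one input record (an edge) -> intermediate (key, value) pairs."""
    u, v = edge
    yield (u, 1)
    yield (v, 1)

def reduce_degree(node, counts):
    """Reduce: all values for one key -> a single result."""
    return node, sum(counts)

def run_mapreduce(records, mapper, reducer):
    # Map phase: on a cluster, each input split would be mapped in parallel
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group intermediate values by key (done by the framework)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: each key group could be handled by a different machine
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]
print(run_mapreduce(edges, map_edge, reduce_degree))
# {'alice': 2, 'bob': 2, 'carol': 3, 'dave': 1}
```

The user code contains no networking or failure handling; in a real framework those concerns sit inside the equivalent of `run_mapreduce`, which is the division of labor described above.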
Hadoop Components
· Distributed file system (HDFS)
- Single namespace for entire cluster
- Replicates data 3x for fault-tolerance
· MapReduce framework
- Executes user jobs specified as “map” and “reduce” functions
- Manages work distribution & fault-tolerance

Hadoop Distributed File System


· Files split into 128MB blocks
· Blocks replicated across several data nodes (usually 3)
· Namenode stores metadata (file names, locations, etc)
· Optimized for large files, sequential reads
· Files are append-only
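As a worked example of the figures above: a 1,000 MB file split into 128 MB blocks needs ceil(1000 / 128) = 8 blocks, and with 3-way replication the cluster stores 24 block replicas, about 3,000 MB of raw disk. The sketch assumes the default values quoted above (both are configurable in a real cluster); `hdfs_footprint` is a hypothetical helper.

```python
import math

def hdfs_footprint(file_mb, block_mb=128, replication=3):
    """Blocks, block replicas, and raw storage (MB) for one file."""
    blocks = math.ceil(file_mb / block_mb)  # the last block may be only partly filled
    replicas = blocks * replication         # copies spread over different data nodes
    raw_mb = file_mb * replication          # a partial last block does not occupy a full 128 MB
    return blocks, replicas, raw_mb

print(hdfs_footprint(1000))  # (8, 24, 3000)
```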
Thank you
