Umadevi V
Associate Professor
Department of CSE, BMS College of Engineering
Bangalore, India
umav.77@gmail.com
Keywords: SVM; Co-authorship network; Future collaboration

I. Introduction
II.
Related Work
A. Background
The earliest and most basic link prediction model was proposed
by Liben-Nowell and Kleinberg [1] and works explicitly on a
social network. Every vertex in the graph represents a person,
and an edge between two vertices represents an interaction
between those persons. Multiple interactions can be modelled
explicitly by allowing parallel edges or by adopting a suitable
weighting scheme for the edges. The learning paradigm in this
setup typically computes the similarity between a pair of
vertices using various graph-based similarity metrics and uses
the ranking of the similarity scores to predict a link between
two vertices. The authors concentrate mostly on the performance
of various graph-based similarity metrics for the link
prediction task.
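
The ranking step described above can be sketched as follows. This is a minimal illustration, not the method of [1]: it assumes an undirected graph stored as a dictionary of neighbor sets, uses the number of common neighbors as the similarity metric, and the names `rank_candidate_links` and `toy_graph` are hypothetical.

```python
from itertools import combinations


def rank_candidate_links(adjacency):
    """Rank non-adjacent vertex pairs by their number of common neighbors."""
    scores = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, so there is nothing to predict
        score = len(adjacency[u] & adjacency[v])
        scores.append(((u, v), score))
    # Highest score first; ties keep the lexicographic pair order
    return sorted(scores, key=lambda item: -item[1])


# Toy undirected graph (hypothetical data)
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_candidate_links(toy_graph)
for pair, score in ranking:
    print(pair, score)
```

The pairs at the top of the ranking are the ones predicted as the most likely future links.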
Recent methods and techniques were surveyed by Mohammad Al
Hasan et al. [2], covering a variety of link prediction
techniques ranging from feature-based classification and
kernel-based methods to matrix factorization and probabilistic
graphical models. These methods vary with respect to model
complexity, prediction performance, scalability, and
generalization ability. The survey considers the traditional
(non-Bayesian) models, which extract a set of features to train
a binary classification model. These authors also presented
another work on link prediction using supervised learning [3],
in which many features are identified and computed and their
effectiveness is evaluated. They also compare different classes
of supervised learning algorithms in terms of their performance
metrics. That work addresses how to construct a dataset for a
machine learning algorithm. The selected features were based on
both node and structural attributes, resulting in improved
accuracy. They experimented on two co-authorship network
datasets using most of the well-known supervised algorithms and
ranked the features. A small set of features was found to yield
good performance.
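
The idea of turning link prediction into a binary classification dataset can be sketched as below. This is an illustration under stated assumptions rather than the procedure of [3]: it assumes that edges are split into a training period and a later period, that a pair is labeled positive when it is unlinked in the training period but linked later, and that two simple features are used (common neighbors and the product of degrees). The function name `build_dataset` and the toy edge lists are hypothetical.

```python
from itertools import combinations


def build_dataset(train_edges, future_edges):
    """Return (features, labels) for every pair not linked in train_edges."""
    adjacency = {}
    for u, v in train_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    future = {frozenset(edge) for edge in future_edges}

    features, labels = [], []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # pair already collaborates in the training period
        common = len(adjacency[u] & adjacency[v])
        degree_product = len(adjacency[u]) * len(adjacency[v])
        features.append([common, degree_product])
        # Positive class: the pair becomes linked in the later period
        labels.append(1 if frozenset((u, v)) in future else 0)
    return features, labels


# Hypothetical toy data
train_edges = [("a", "b"), ("a", "c"), ("b", "c"),
               ("b", "d"), ("c", "d"), ("d", "e")]
future_edges = [("a", "d")]

X, y = build_dataset(train_edges, future_edges)
print(X)
print(y)
```

The resulting feature rows and labels are in the form expected by most supervised learners, so they could be passed to a binary classifier such as an SVM.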
According to Kanika Narang et al. [4], a link prediction
heuristic should take into account not only how close two nodes
are in a network, but also their ability to send and receive
information or to influence each other. This ability is
determined by the nature of the flow taking place on the
network, i.e., the process by which information is transmitted
from one node to another: how easily two nodes can interact
with or influence each other depends on the nature of the flow
that mediates their interactions. They show that different
types of flows ultimately lead to different notions of network
proximity. They measure the performance of different heuristics
on the missing link prediction task in a variety of real-world
social, technological and biological networks. They show that
heuristics based on random-walk-type processes outperform the
popular Adamic-Adar and common-neighbors heuristics in many
networks. While the newly defined heuristics did not beat
existing ones in the missing link prediction task, the work
III.
B. Feature Extraction
A multitude of topological features can be used for a pair
of nodes. In this paper, the features documented in [2] were
chosen for co-author relationship prediction.
Katz = $\sum_{l=1}^{\infty} \beta^{l} \cdot |paths^{(l)}_{u,v}|$
Common neighbor = $|\Gamma(u) \cap \Gamma(v)|$
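
Both features can be computed directly from neighbor sets. The sketch below assumes the standard definitions: common neighbors is the size of the intersection of the two neighbor sets, and Katz is a damped sum over path lengths, where paths are counted as walks (as the adjacency-matrix form of the measure does). The damping factor `beta`, the truncation length `max_length`, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def common_neighbors(adjacency, u, v):
    """Number of vertices adjacent to both u and v."""
    return len(adjacency[u] & adjacency[v])


def katz_score(adjacency, u, v, beta=0.05, max_length=4):
    """Truncated Katz score: sum of beta**l times the number of
    length-l walks from u to v, for l = 1..max_length."""
    # walks[x] holds the number of walks of the current length from u to x
    walks = {node: 0 for node in adjacency}
    walks[u] = 1
    score = 0.0
    for length in range(1, max_length + 1):
        next_walks = {node: 0 for node in adjacency}
        for node, count in walks.items():
            if count:
                for neighbor in adjacency[node]:
                    next_walks[neighbor] += count
        walks = next_walks
        score += (beta ** length) * walks[v]
    return score


# Hypothetical three-node path graph: a - b - c
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(common_neighbors(path_graph, "a", "c"))
print(katz_score(path_graph, "a", "c", beta=0.5, max_length=4))
```

Truncating the sum is a practical choice: because of the `beta**l` factor, long paths contribute very little when `beta` is small.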
Number of nodes         10
Number of edge-pairs    16
Fig. 2.
IV. Results

Metrics      Values
Accuracy     93%
Precision    93%
Recall       93%
F1-score     0.9333
II.
Real-time Dataset
Synthetic Data
                    Values
Number of nodes     1588
Number of edges     2743
Characteristics
Values
2743
5486
5760
2469
Characteristics               Values
Number of Positive classes    2743
                              2743
                              3840
                              1646

Metrics          Experiment
Accuracy (%)     99.6    99.6    99.8
Precision (%)    99.6    99.5    99.5
Recall (%)       99.7    99      99
F1-Score (%)     99.7    99.2    99.2
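
The four evaluation measures reported above follow the standard confusion-matrix definitions. A minimal sketch is given below; the function name `classification_metrics` and the counts in the usage example are hypothetical and are not the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted links that are real
    recall = tp / (tp + fn)      # share of real links that were predicted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(accuracy, precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two.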
The supervised SVM algorithm performed well in co-author
relationship prediction with a limited number of features.
Collaborations were easier to predict for authors with a
higher degree of collaboration than for less productive
authors, in terms of all four evaluation measures.
Characteristics               Values
Number of Positive classes    2743
                              1371
                              2879
                              1235

V. Conclusion
[2]
[3]
[4]
[5]
[6]
[7]
[8]