
Neighborhood Integrated Matrix Factorization for Temporal Prediction

Jadhav Trishna J.
Department of Computer Engineering
KKWIEER, Nashik, India
Savitribai Phule Pune University, Pune
Email: trish.jgd@gmail.com

Prof. S. S. Banait
Department of Computer Engineering
KKWIEER, Nashik, India
Savitribai Phule Pune University, Pune
Email: ssbanait@kkwagh.edu.in

Abstract: Link mining and temporal link prediction have been an emerging trend in recent years. Link mining deals with heterogeneous and homogeneous data sources that generate link data. This link data provides scope for collaborative filtering tasks, which are a prime requirement in recommender systems and play a significant role in predictive analytics. The proposed Neighborhood Integrated Matrix Factorization method aims to improve the accuracy of missing value predictions by using Neighborhood Similarity Computation, which produces object profiles, as a pre-heuristic task.

Keywords: Link Mining; Matrix Factorization; Neighborhood Integrated Matrix Factorization; Neighborhood Similarity Computation; Predictive Analytics.

I. INTRODUCTION

Many objects and entities in the world are interdependent and linked to many other objects through a diverse set of relationships: people have friends, family and coworkers; scientific papers have authors, venues, and references to other papers; web pages link to other web pages and have hierarchical structures; proteins have locations and functions, and interact with other proteins. In link mining, the connections among objects are explicitly modeled to improve performance in tasks such as classification, clustering, and ranking, as well as to enable new applications such as link prediction. Most link mining problems involve considerable uncertainty, because linked data is typically very noisy and incomplete [9]. The data in analysis applications such as social networks, communication networks, web analysis, and collaborative filtering consists of relationships between objects, and these relationships can be considered as links [10]. For example, two people may be linked to each other if they exchange emails or phone calls. These relationships can be modelled as a graph, where nodes correspond to the data objects (e.g., people) and edges correspond to the links (e.g., a phone call made between two people). The link structure of the resulting graph can be exploited to detect underlying groups of objects, predict missing links, rank objects, and handle many other tasks [3]. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. This is an exciting and rapidly growing area. There is not yet a comprehensive framework that can support a combination of link mining tasks as needed for many real applications.

Link prediction is a sub-field of social network analysis [10]. It is concerned with the problem of predicting the (future) existence of links amongst nodes in a social network. The link prediction problem is interesting in that it investigates the relationships between objects, while traditional data mining tasks focus on the objects themselves [5]. Dynamic interactions over time add another dimension to the challenge of mining and predicting link structure. This study focuses on basic link mining tasks and the problem of temporal link prediction. In this problem, given link data for T time steps, the point of interest is whether one can predict the relations among data objects at times T+1, T+2, ..., T+k, where k > 0. Section 2 of this review introduces the various link mining tasks and addresses the categories of the link mining taxonomy, which include object, link and graph, together with various time series analysis approaches for statistical data and the proposed Neighborhood Integrated Matrix Factorization method [1] in a state space approach for matrix factorization.

The objective of the proposed work is as follows:

To design a machine learning system using a state space model such as the Kalman Filter, applying neighborhood matrix factorization to capture temporal dynamics and latent factors in link data. This project aims to improve the prediction accuracy of such a collaborative filtering system by minimising error and reducing noise in link data.

Relevant objectives:
1. To capture temporal dynamics in link data.
2. To find the latent factors/features.
3. To model a predictive function.
4. To predict next-step links.

The remainder of the paper is organized as follows: Section 2 discusses the related work and its shortcomings. Section 3 describes the motivation. An overview of the proposed scheme is given in Section 4. Section 5 discusses the expected results of the system. Finally, we conclude the paper.

II. LITERATURE SURVEY

Conventional approaches deal with object classification, ranking, and entity resolution tasks, grouped as object related tasks, graph related tasks and link related tasks. These tasks can be approached based on similarity scores, topological pattern mining, and content filtering. Recent work in this regard is based on probabilistic approaches as well as graph based feature learning. The aim of this survey is to set the basis for constructing a hypothesis, and the proposed hypothesis is to model a predictive function. A baseline for understanding the predictive function can be set as follows: suppose the state of a system is captured by X, which is to be mapped to produce an output Y. The hypothesis can then be set as $h : X \rightarrow Y$, such that for any input x in X, the function value h(x) provides the estimated output y in Y. To approach the above function, one needs to consider the error and track its changes in some variables, so that further iterations of the function identify the minimum number of such variables required to control the correcting and
updating steps of the Kalman Filter. Here an optimization step such as maximum likelihood or expectation maximization is to be introduced to learn those variables. Identifying the variables needed for machine learning is thus the first basic functional requirement. A suitable technique is therefore matrix factorization [5], which decomposes a variable into factors and learns the strengths of these factors as they evolve over time. Expectation maximization is to be incorporated as the second functional requirement, to deal with capturing temporal dynamics [5]. This configuration can be suitably tracked using a state space model such as the Kalman Filter [3]. The resulting output is a modelled predictive function that represents an optimal state of the system, measured as root mean square error over a number of iterations.

III. MOTIVATION

The link mining task has been studied as a part of understanding the relations among objects. An essential observation about links between objects so far concerns the kind of network they form. Homogeneous networks include single-mode social networks, such as people connected by links like friendship, or the World Wide Web, a collection of linked web pages. Heterogeneous networks, such as those in bibliographic domains, describe authors, publications, and venues.

Moreover, interest lies in identifying the pattern of existence of links in such networks. Here the term network means a group of objects; the objects of interest can come from various domains and possess features or properties. Various approaches like content filtering and collaborative filtering [5] have been applied to find patterns of interest, and various clustering approaches have played a significant role in grouping such objects. However, most of the literature describes static snapshots of a graph and focuses on identifying missing links in the graph [10][11]. Others introduced one more point of interest, namely the potential or possibility of existence of a link based on properties of objects, calculated as a similarity score of those objects [12]. Along parallel lines, such scores were combined with link prediction and forecasting methods in "Temporal link prediction using matrix and tensor factorizations" by Dunlavy D. M., Kolda T. G., and Acar E., published in ACM Transactions on Knowledge Discovery from Data 5, Article 10 (February 2011) [3]. Another dimension that motivated work in this area is "time series analysis" of data, which has gained much attention from statisticians in recent trends of business intelligence and predictive analytics [6]. To work on the issue of evolving behavior of factors, there is a need for a more robust approach, and that is found in control systems by Nagrath and Gopal of the electrical engineering discipline [17], namely the use of the state space approach. Time variant state space system models are designed in such a way that they enable the understanding of the input and output of a system in a more causal manner, taking account of the past and the present of the system and thus shaping its future. Combining this with matrix factorization for collaborative filtering can give significant improvements in results, and a quote by Koren gave assurance to go with the proposed assumption: "...matrix factorization techniques have become a dominant methodology within collaborative filtering recommenders. What makes this technique even more convenient is that models can integrate naturally many aspects of the data, such as multiple forms of feedback, temporal dynamics, and confidence levels..." (Matrix Factorization Techniques for Recommender Systems, Yehuda Koren, 2009 [5]). In addition, the work on dynamic matrix factorization as a state space approach by Sun et al., 2012 [4] supported the proposed approach, namely a state space approach for temporal link prediction.

IV. IMPLEMENTATION DETAILS

A. Proposed System

Fig. 1. Block diagram of system

The function of each block is discussed below:

• 1. Load Dataset: This module loads a dataset obeying the state space model so that the effectiveness of the proposed collaborative Kalman filter can be analyzed. Here the generative dataset stores the number of items, users, item factors, timeslots, and the observation ratio, and generates a dataset obeying the state model, with white noise assumed to be added to the dataset.

• 2. Kalman Filter: This module consists of the recursive Predict-Correct operations, followed by RTS smoother treatment for finding the optimal state estimate:

   Predict phase
   1. Project the state ahead
   2. Project the error covariance ahead

   Update (correct) phase
   1. Compute the Kalman gain
   2. Update the estimate with the measurement
   3. Update the covariance
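The predict and update (correct) phases of the Kalman Filter module can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single user's K-dimensional factor state x with error covariance P, a transition matrix A, a process noise covariance Q (a name not used in the paper), and one scalar rating y observed through an item factor row h with measurement noise variance r. Because the observation is scalar, the Kalman gain needs only a division rather than a matrix inversion. All function names are hypothetical.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def predict(x, P, A, Q):
    """Predict phase: project the state and the error covariance ahead."""
    x_pred = mat_vec(A, x)                              # 1. x' = A x
    APAt = mat_mul(mat_mul(A, P), transpose(A))
    P_pred = [[p + q for p, q in zip(p_row, q_row)]     # 2. P' = A P A^T + Q
              for p_row, q_row in zip(APAt, Q)]
    return x_pred, P_pred


def correct(x_pred, P_pred, h, y, r):
    """Update phase for one scalar observation y = h . x + noise (variance r)."""
    n = len(h)
    Ph = mat_vec(P_pred, h)                             # P' h^T
    s = sum(hi * v for hi, v in zip(h, Ph)) + r         # innovation variance
    gain = [v / s for v in Ph]                          # 1. Kalman gain
    residual = y - sum(hi * xi for hi, xi in zip(h, x_pred))
    x_post = [xi + g * residual                         # 2. update estimate
              for xi, g in zip(x_pred, gain)]
    hP = mat_vec(transpose(P_pred), h)                  # row vector h P'
    P_post = [[P_pred[i][j] - gain[i] * hP[j]           # 3. P = (I - K h) P'
               for j in range(n)] for i in range(n)]
    return x_post, P_post


# Toy example (assumed values): two latent factors, identity dynamics.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x_pred, P_pred = predict([0.0, 0.0], identity, identity, zeros)
x_post, P_post = correct(x_pred, P_pred, h=[1.0, 0.0], y=2.0, r=1.0)
```

In this toy run the observation only involves the first factor, so the estimate moves halfway towards the rating (x_post is [1.0, 0.0]) and only the first diagonal entry of the covariance shrinks (from 1.0 to 0.5).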
   RTS Smoother
   1. Compute optimal estimates of states using NIMF
   2. Compute predicted observations
   3. Compute missing value prediction

• 3. RMSE Measurement: This module concludes the experiment and gives the root mean square error measurements recorded at each iteration. The objective of this measurement is to check for effective learning by the machine and to provide a plot of the RMSE behavior.

B. Algorithms

For implementation of this system the following algorithms have been used:
1. State space design for generating dataset
2. Kalman Filter and Smoother

Algorithm for generating dataset
1. Input:
   • Number of items (M)
   • Number of users (N)
   • Number of latent factors (K)
   • Number of timeslots (T)
   • Observation ratio, that is, the percentage of data in the training set (Oratio)
2. Output:
   • Stationary transition process matrix (A)
   • User factor tensor (Ut)
   • Item factor matrix (V)
   • True preference tensor (Ytrue)
   • Observation preference tensor (Yobserved)
3. Read the input values.
4. Specify the stationary transition process matrix.
5. Run the dynamics to generate Ut, V, Ytrue, Yobserved.
6. The user factor tensor is computed using the transition matrix (A) and contains transition process noise.
7. The true preference tensor is computed by multiplying the item factor matrix and the user factor tensor. The observation preference tensor is a subset of the true preference tensor and is sparse, as determined by the observation ratio percentage.
8. Save the dataset by user i.

Here all $H_{i,t}$ are subsets of the same fixed V. Measurement noise $z_{i,t}$, distributed as $\mathcal{N}(0, R_{i,t})$, is also included, so the overall observation model is

$y_{i,t} = H_{i,t} x_{i,t} + z_{i,t}, \quad i = 1, \ldots, N$

The product $H_{i,t} x_{i,t}$ gives the corresponding observations in the tensor $O \in \mathbb{R}^{M \times N \times T}$. This setup gives flexibility for generating datasets by tuning different parameter regimes.

C. Mathematical Model

This system takes a set of M items and N users with a T-timeslot preference tensor as input. After generating the item factor matrix and user factor tensor, the set O is the true preference tensor, and the subsets of O at the various time slots, known as observations, form the sequential input for system S. S learns the temporal dynamics in the system and provides optimal estimates of the user state at time T+1. The differences between the predicted matrix and the true matrix are recorded as RMSE outputs.

The proposed system S is defined as follows:
$S = \{M, N, K, T, O\}$
where
M is a set of items, $M = \{m_1, m_2, \ldots, m_M\}$
N is a set of users, $N = \{n_1, n_2, \ldots, n_N\}$
K is a set of factors, $K = \{k_1, k_2, \ldots, k_K\}$
T is a set of time slots, $T = \{t_1, t_2, \ldots, t_T\}$
O is a set of observations, $O = \{O_{1,1,1}, O_{1,2,2}, \ldots, O_{m,n,t}\}$
V is an item factor matrix, $V_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,k}),\ i \in M$
U is a user factor tensor, $U_{i,t} = (u_{i,1,1}, u_{i,2,1}, \ldots, u_{i,k,t}),\ i \in N$

The functionality of this system is to output RMSE values obtained by comparing predicted values with true values. The system design includes the following main functions:
1. Generate Dataset (F1)
   F1(N, M, K, T, Oratio) -> (V, Ut)
2. Load generated dataset (F2)
   F2(path) -> D
3. Initial Estimates for Kalman Filter (F3)
   F3(D) -> Initial values for Kalman filter parameters
4. Predict (F4)
   F4(Initial mean vector and covariance matrix) -> (A priori estimates)
5. Correct (update) (F5)
   F5(predicted values and true values) -> (posteriori estimates and covariance matrix)
6. RTS Smoother (F6)
   F6(posteriori estimates) -> (use NIMF to get optimal estimates of states)
7. RMSE Measurements (F7)
   F7(optimal estimates of states, V, Ytrue) -> (predicted preference matrix and RMSE measurements)

TABLE I shows the functional dependency matrix.

TABLE I. FUNCTIONAL DEPENDENCY

      f1  f2  f3  f4  f5  f6  f7
f1     1   0   0   0   0   0   0
f2     1   1   0   0   0   0   0
f3     0   1   1   0   0   0   0
f4     0   0   1   1   0   0   0
f5     0   0   0   1   1   0   0
f6     0   0   0   0   1   1   0
f7     0   0   0   0   0   1   1

V. EXPERIMENTAL SETUP

A. Dataset

This machine learning project utilises a generative dataset that follows the state space model. A generative dataset gives
insights into how the proposed algorithm performs in different parameter regimes, as compared to a collected dataset. For this proposed system the dataset is generated as follows: the item factor matrix V is drawn iid N(0, GV) and the initial user factor matrix U(0) iid N(0, GU); the transition matrix is a weighted sum of the identity matrix and a random matrix; and each observation triplet (i, j, t) is drawn uniformly iid from the preference tensor. The presented work is also tested on a real world dataset, Movielens, considering various parameter regimes. The Movielens dataset consists of movies, users and the rating given by a user to a movie at some time. Ratings given by a user to a movie are in the range 0 to 5, and each movie has 19 factors. In each factor vector, binary entries represent the presence or absence of a genre for that movie.

B. Performance Measure

Performance of the system is measured over the generative dataset as well as a real world dataset, Movielens. The following discusses the testing conducted on the available system, with data tables showing RMSE values for 10 iterations, observed over permutations of the number of items, users, factors, timeslots and the percentage of the generated training dataset considered for the experiment. Let:
M be the number of items (M > 0)
N be the number of users (N > 0)
K be the number of factors (K > 0)
T be the number of timeslots considered for the experiment (T > 0). Out of T, T-1 timeslots are used for training the proposed learning system, and predicted data values are compared with true data values at the T-th time slice.
Oratio be the training data observation percentage, such that 0.0 < Oratio < 1.0

Each value in TABLE II is the RMSE observed at the I-th iteration when T=10. RMSE measurements on the given dataset of size M=100, N=100, K=5, T=20, Oratio=0.4 are shown in Fig. 2.

Fig. 2. RMSE Measurement

In the RMSE graph of the available system, the curve becomes smooth from the 6th iteration, whereas the aim of our project is to reduce the number of iterations needed to obtain a smooth RMSE graph as much as possible. The proposed number of iterations required to obtain smoothness is depicted in Fig. 3.

TABLE II. RMSE PLOT FOR TABLE 1

Oratio=0.4                    Iteration Number
          1      2      3      4      5      6      7      8      9      10
T=10    0.172  0.138  0.135  0.134  0.135  0.136  0.136  0.137  0.137  0.138

Fig. 3. Proposed RMSE Measurement

VI. CONCLUSION

The presented work is motivated by temporal link prediction. The collaborative Kalman filtering approach and the Neighbourhood Integrated Matrix Factorization approach focus on capturing temporal dynamics so as to provide better accuracy in prediction tasks and in the missing value prediction task. A state space model, the Kalman Filter, captures the temporal dynamics of time series data. Optimal estimates of states are then obtained using the RTS smoother. The presented work is validated on a generative dataset, where it shows a reduction in RMSE over iterations (TABLE II), and is also tested on the sparse Movielens dataset for effectiveness in prediction of ratings.

ACKNOWLEDGMENT

I am thankful to my guide Prof. S. S. Banait, Computer Engineering, K.K.W.I.E.E.R., Nashik, for his guidance, encouragement and the interest shown in this project through timely suggestions and helpful guidance in this work. His expert suggestions and scholarly feedback have greatly enhanced the effectiveness of this work.

REFERENCES

[1] Zibin Zheng, Hao Ma, Michael R. Lyu, "Collaborative Web Service QoS Prediction via Neighbourhood Integrated Matrix Factorization," IEEE Trans. on Services Computing, Vol. 6.
[2] Kushal P. Birla, Prof. S. M. Kamalapur, "State Space Model for Link Mining," IJETTCS, ISBN: 2278-6856, March-April 2013.
[3] Dunlavy D. M., Kolda T. G., and Acar E., "Temporal link prediction using matrix and tensor factorizations," ACM Trans. Knowl. Discov. Data 5, 2, Article 10 (February 2011), 27 pages.
[4] Sun J. Z., Varshney K. R., Subbian K., "Dynamic matrix factorization: A state space approach," Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference, vol., no. 1, pp. 1897-1900, 25-30 March 2012.
[5] Yehuda Koren (Yahoo Research), Robert Bell and Chris Volinsky (AT&T Labs-Research), "Matrix Factorization Techniques for Recommender Systems," IEEE Computer Society, 2009.
[6] Zan Huang and Dennis K. J. Lin, "The Time Series Link Prediction Problem with Applications in Communication Surveillance," 2009.
[7] Roger M. du Plessis, "Poor Man's Explanation of Kalman Filtering or How I Stopped Worrying and Learned to Love Matrix Inversion," North American Rockwell Electronics Group, June 1967.
[8] R. E. Kalman, "A new approach to linear filtering and prediction problems," ASME Transactions, Volume 82, Part D, Journal of Basic Engineering, pp. 35-45, 1960.
[9] Kemal Gursoy and Melike Baykal-Gursoy, "Forecasting: State-Space Models and Kalman Filter Estimation."
[10] Evan Wei Xiang, "A Survey on Link Prediction Models for Social Network Data," 2008.
[11] Lise Getoor, "Link Mining: A Survey," SIGKDD Explorations, Volume 7, Issue 2, 2005.
[12] Liben-Nowell D. and J. Kleinberg, "The link prediction problem for social networks," Proceedings of the 12th International Conference on Information and Knowledge Management (CIKM), New Orleans, LA, pp. 556-559, 2003.
[13] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, Cambridge, 1994.
[14] Leskovec J., J. Kleinberg and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," ACM Trans. on Knowledge Discovery from Data, 2007.
[15] Q. Lu and L. Getoor, "Link-based classification," International Conference on Machine Learning, 2003.
[16] Vazquez A., J. G. Oliveira and A.-L. Barabasi, "The inhomogeneous evolution of subgraphs and cycles in complex networks," Phys. Rev. Lett. E, 71, 025103, 2005.
[17] I. J. Nagrath and M. Gopal, Control System Engineering, Fifth Edition.
[18] Leskovec J., J. Kleinberg and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," ACM Trans. on Knowledge Discovery from Data, 1, 2007.
[19] Vazquez A., J. G. Oliveira and A.-L. Barabasi, "The inhomogeneous evolution of subgraphs and cycles in complex networks," Phys. Rev. Lett. E, 71, 025103, 2005.
