R. P. Chatterjee
Department of Computer Science and Engineering, Meghnad Saha Institute of Technology, Kolkata
M. Ghosh
Dept. of CSE, Supreme Knowledge Foundation Group of Institutions, Mankundu, WB, India
M. K. Das
Dept. of CSE, Supreme Knowledge Foundation Group of Institutions, Mankundu, WB, India
R. Bag
Dept. of CSE, Supreme Knowledge Foundation Group of Institutions, Mankundu, WB, India
ABSTRACT: Web page prediction reduces user latency by predicting and fetching in advance
the web pages that are likely to be requested next. This paper proposes a web page prediction
model that gives significant weight to the user's interest, captured with a clustering technique,
and to the user's navigational behavior, captured with latest-substring association rules. The
method achieves better precision than recent methods in web usage mining.
KEYWORDS: Data mining, Association rule, Substring association rule mining
1 INTRODUCTION
Users surf the internet by entering a URL, by
searching for a topic, or by following links on the
same topic. Clustering plays an important role in
both searching and link prediction. Web-site
designers want to increase the number of visitors
and the time that these visitors spend on their web
site. To accomplish that, they have to supply
attractive content. And to make their content
attractive, web-site designers and content providers
need to know what their potential visitors want, in
order to organize their content according to their
visitors' needs, and, if possible, according to
individual preferences. Researchers use different
techniques, such as Markov models [Jin X et al. 2003],
association rule mining, and clustering [Dutta R et al.
2011]. Web usage mining [Barsagade et
al. 2003] is the application of data mining
techniques to extract knowledge from web data,
where at least one of structure or usage data is used
in the mining process. Web usage mining has various
application areas such as web pre-fetching, link
prediction, site reorganization, and web
personalization. The most important phases of web
usage mining are the reconstruction of user sessions
using heuristic techniques and the discovery of useful
patterns from these sessions using pattern discovery
techniques such as association rule mining (for
example, Apriori). We propose an integrated system
that applies data mining techniques such as
association rules to access log files. [...] are
retrieved automatically after access requests to a
document containing links to these files. We treat
web log data as a sequence of distinct web pages, in
which subsequences such as user sessions can be
identified by unusually long gaps between
consecutive requests.
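The gap-based session reconstruction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_sessions`, the `(timestamp, page)` input format, and the 30-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def split_sessions(requests, max_gap=timedelta(minutes=30)):
    """Split one user's time-ordered (timestamp, page) requests into sessions.

    A new session starts whenever the gap between two consecutive requests
    exceeds max_gap. The 30-minute default is an assumed threshold, not a
    value taken from the paper.
    """
    sessions = []
    current = []
    previous_time = None
    for timestamp, page in requests:
        if previous_time is not None and timestamp - previous_time > max_gap:
            sessions.append(current)
            current = []
        current.append(page)
        previous_time = timestamp
    if current:
        sessions.append(current)
    return sessions


# Hypothetical log for a single user: three requests, a long pause, two more.
log = [
    (datetime(2024, 1, 1, 10, 0), "A"),
    (datetime(2024, 1, 1, 10, 5), "B"),
    (datetime(2024, 1, 1, 10, 7), "C"),
    (datetime(2024, 1, 1, 12, 0), "B"),
    (datetime(2024, 1, 1, 12, 2), "D"),
]
print(split_sessions(log))  # [['A', 'B', 'C'], ['B', 'D']]
```

In practice the requests would first be grouped per user (for example by IP address or cookie) before this split is applied.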
We create a unique ID for each web page
link that exists in the web log. A binary
context corresponding to each unique ID is
then created to count how many times a
particular link of a web page has been
visited by users in a particular session.
Next, the Apriori algorithm (Agrawal et al.
1993) is used to find the frequent web
pages among all the pages previously
visited by users.
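The three steps above (ID assignment, binary context, frequent-page mining) can be sketched as below. This is a simplified sketch under stated assumptions: the helper names `assign_ids`, `binary_context`, and `apriori` are hypothetical, the binary context is taken to be a session-by-page 0/1 matrix, and support is counted as the number of sessions containing an itemset. The paper does not specify these details.

```python
from itertools import combinations


def assign_ids(sessions):
    """Give every distinct page link an integer ID, in order of first appearance."""
    ids = {}
    for session in sessions:
        for page in session:
            ids.setdefault(page, len(ids))
    return ids


def binary_context(sessions, ids):
    """One row per session, one column per page ID: 1 if visited, else 0."""
    rows = []
    for session in sessions:
        row = [0] * len(ids)
        for page in session:
            row[ids[page]] = 1
        rows.append(row)
    return rows


def apriori(rows, min_support):
    """Return {frozenset of page IDs: support count} for all frequent itemsets."""
    transactions = [frozenset(i for i, bit in enumerate(row) if bit) for row in rows]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many sessions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical sessions, used only to exercise the sketch.
sessions = [["A", "B", "C", "D"], ["B", "A", "C", "D"], ["A", "C"], ["B", "D"]]
ids = assign_ids(sessions)
rows = binary_context(sessions, ids)
frequent = apriori(rows, min_support=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With this toy input and a minimum support of 3 sessions, the frequent itemsets are the four single pages plus the pairs {A, C} and {B, D}. Summing a column of the binary context gives the number of sessions in which that page was visited.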
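[Table residue: window W2; entries A,B,C; B,A,C; D; D. Substring rule: {C} -> D.]
[Flowchart residue: Start; scan the transaction database to get the support S of each item; test S >= min support.]
[Table residue: columns Access Path and Precision; surviving values 50%, 60%, 60%, 10, 12, 60%, 60%.]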
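One way to read the substring rule {C} -> D listed above is as a latest-substring rule: the antecedent is a contiguous run of pages ending immediately before the predicted page. The sketch below mines such rules under that reading. It is an illustrative assumption, not the paper's algorithm: the function name `latest_substring_rules`, the `max_len` window limit, and the support and confidence definitions are all hypothetical.

```python
from collections import Counter


def latest_substring_rules(sessions, max_len=3, min_support=2):
    """Mine rules of the form (latest substring) -> next page.

    For each position in a session, every contiguous substring that ends
    immediately before the next page (up to max_len pages long) is counted
    as a candidate antecedent. Returns {(antecedent, next_page):
    (support count, confidence)} for rules meeting min_support.
    """
    rule_counts = Counter()
    antecedent_counts = Counter()
    for session in sessions:
        for i in range(1, len(session)):
            for length in range(1, min(max_len, i) + 1):
                antecedent = tuple(session[i - length:i])
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, session[i])] += 1
    return {
        rule: (count, count / antecedent_counts[rule[0]])
        for rule, count in rule_counts.items()
        if count >= min_support
    }


# The two access paths from the residue above, each followed by page D.
paths = [["A", "B", "C", "D"], ["B", "A", "C", "D"]]
print(latest_substring_rules(paths))  # {(('C',), 'D'): (2, 1.0)}
```

The two paths share only the one-page suffix C before D, so {C} -> D is the only rule that reaches a support of 2, which agrees with the rule shown in the residue.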