www.ijaert.org
International Journal of Advanced Engineering Research and Technology (IJAERT) 318
Volume 4 Issue 10, October 2016, ISSN No.: 2348 8190
Preprocessing consists of data cleaning, user identification, session identification and path completion [9]. In our proposed work, however, we used only data cleaning and user identification.

Data Cleaning

User Identification

If the IP address is the same but the agent log shows a change in browser software or operating system, the requests are taken to come from a different user.

Here, Table 1 shows the log file after the data cleaning process. From the given table, user identification can be done. From the site structure, users can be differentiated as follows:

Table 1: Web log of imaginary website

Using the above heuristic, the following users are identified from the above log:

User 1: AB E K I OE L
User 2: A C G M HN
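The IP-plus-agent heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `identify_users`, the tuple layout `(ip, agent, url)`, and the sample log entries are all assumptions made for the example, and the site-structure check mentioned in the text is not modelled.

```python
from collections import defaultdict


def identify_users(entries):
    """Group cleaned log entries into users keyed by (IP address, user agent).

    `entries` is an iterable of (ip, agent, url) tuples (hypothetical layout).
    Requests from the same IP but a different agent count as a different user.
    """
    users = defaultdict(list)
    for ip, agent, url in entries:
        users[(ip, agent)].append(url)
    return dict(users)


# Hypothetical cleaned log: one IP address, two different browser/OS agents.
log = [
    ("10.0.0.1", "Firefox/Windows", "A"),
    ("10.0.0.1", "Firefox/Windows", "B"),
    ("10.0.0.1", "Chrome/Linux", "A"),
    ("10.0.0.1", "Chrome/Linux", "C"),
]
users = identify_users(log)
for (ip, agent), urls in users.items():
    print(ip, agent, urls)
```

Keying on the pair rather than on the IP address alone is what separates two visitors who share a proxy or a NAT address.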
\[
\text{Precision} = \frac{\text{Documents correctly retrieved by the system (TP)}}{\text{All documents retrieved by the system (TP + FP)}}
\]

\[
\text{Recall} = \frac{\text{Documents correctly retrieved by the system (TP)}}{\text{All documents relevant for the human (TP + FN)}}
\]
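As a rough illustration of these definitions, the sketch below computes precision and recall from a recommendation set and the set of URLs the new user actually visited. It also returns an F-measure, assuming the usual harmonic mean F = 2PR / (P + R). The function name `precision_recall_f` and the URL labels are hypothetical. The counts mirror the Manhattan-distance case reported for Dataset-1 (4 recommended URLs, 2 matches, 2 URLs visited by the new user).

```python
def precision_recall_f(recommended, visited):
    """Return (precision, recall, F-measure) for one user.

    TP = URLs that are both recommended and visited,
    FP = recommended but not visited, FN = visited but not recommended.
    """
    recommended, visited = set(recommended), set(visited)
    tp = len(recommended & visited)
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(visited) if visited else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumed harmonic-mean F-measure.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical URL labels: 4 recommended, 2 visited, both visited URLs matched.
p, r, f = precision_recall_f(["A", "B", "C", "D"], ["A", "B"])
print(round(p, 2), round(r, 2), round(f, 4))
```

Values computed from raw counts can differ in the third or fourth decimal place from values computed from precision and recall that were rounded first.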
So for Dataset-1, using the Manhattan distance function, 4 URLs are recommended for the new user and 2 URLs match the generated recommendation set. The new user has only 2 URLs and both match the recommendation set, so the recall value becomes 1. We got a precision value of 0.5 from the given formula.

Using the Euclidean distance function, 4 URLs are recommended for the new user and 2 URLs match the generated recommendation set. The new user has 3 URLs but only 2 match the recommendation set, so the recall value becomes 0.67. The precision value is 0.5.

From the formula of F-measure, we got the values given in Table 2 for both distance functions.

Fig. 3: F-measure comparison on Dataset-1

Figure 3 shows the result of the F-measure comparison between the two distance functions for Dataset-1. Our proposed K-means clustering process with Manhattan distance gives a better F-measure value than Euclidean distance on Dataset-1.

Now for Dataset-2, using the Manhattan distance function, 9 URLs are recommended for the new user and 6 URLs match the generated recommendation set. The new user has 8 URLs but only 6 match the recommendation set, so the recall value becomes 0.75. We got a precision value of 0.67.

Using the Euclidean distance function, 11 URLs are recommended for the new user and 7 URLs match the generated recommendation set. The new user has 12 URLs but only 7 match the recommendation set, so the recall
value becomes 0.58. We got a precision value of 0.63.

The F-measure values for both distance functions are given in Table 3.

Table 3: F-measure comparison of different distance functions for Dataset-2

Dataset-2, K-means clustering
             Manhattan distance   Euclidean distance
F-measure    0.7077               0.6039

Table 3 shows the F-measure values of Manhattan distance and Euclidean distance in K-means clustering for Dataset-2. The comparison between the two distance functions is represented by the graph.

... method for Adaptive Web personalization, which consists of four steps: preprocessing, dimensionality reduction, clustering and site recommendation. We proposed the algorithm for K-means clustering and site recommendation. In K-means clustering, we matched the new user's interest against the values of the cluster centroids and generated recommendations for the new user.

Dimensionality reduction techniques are useful not only for lowering the size of the data but also for extracting the underlying semantics of the data. Through our proposed process we obtained good accuracy in the generation of similar-user clusters and recommendation rules.

The proposed algorithm uses the K-means clustering algorithm. In future work, instead of giving a random number of clusters, we can find the maximum distance between users and assign a cluster. We can also apply sequence patterns in the generation of recommendation URLs, that is, recommend those URLs which are visited in a sequence.
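The centroid-matching step summarized above (matching a new user's interest against cluster centroids under a chosen distance function) could look roughly like the following Python sketch. It is illustrative only and is not the paper's algorithm. The function `recommend`, the `threshold` parameter, the URL labels, and the centroid values are all assumptions, and the centroids are taken as already computed by K-means.

```python
import math


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(new_user, centroids, urls, distance, threshold=0.5):
    """Recommend URLs weighted at least `threshold` in the nearest centroid.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    """
    nearest = min(centroids, key=lambda c: distance(new_user, c))
    return [u for u, w in zip(urls, nearest) if w >= threshold]


# Hypothetical data: 4 URLs, 2 precomputed centroids, a new user who visited A.
urls = ["A", "B", "C", "D"]
centroids = [[0.9, 0.8, 0.1, 0.0], [0.1, 0.0, 0.9, 0.7]]
new_user = [1, 0, 0, 0]

by_manhattan = recommend(new_user, centroids, urls, manhattan)
by_euclidean = recommend(new_user, centroids, urls, euclidean)
print(by_manhattan, by_euclidean)
```

Passing the distance function as an argument keeps the two variants compared in the evaluation interchangeable without duplicating the matching logic.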
REFERENCES