
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856

An Improved Fuzzy Clustering Technique for Users Browsing Behaviors


Nikunj Kansara, Shailendra Mishra
Parul Institute of Engineering & Technology, Gujarat Technological University, P.O.Limda-391760, Vadodara, India

Abstract: Analyzing and predicting the navigational behavior of Web users can lead to more user-friendly and efficient websites, which is an important issue in electronic commerce. Web personalization is a common way of adapting the content of a website to the needs of each specific user. Several clustering approaches have been implemented to cluster users who visit similar pages on the Web. Fuzzy clustering is widely used to cluster homogeneous users according to their access characteristics for web pages, and it has also been found to be an effective way to cluster documents scattered across the World Wide Web. At present, users and documents are clustered separately, which increases the time complexity and reduces the performance of the process. This paper shows how to reduce the time complexity by carrying out user clustering and document clustering simultaneously. The advantage of this strategy is that it clusters not only the objects but also the features of the objects at the same time.

Keywords: Fuzzy clustering, Fuzzy c-means clustering, Pre-processing.

1. Introduction
The World Wide Web is a rich source of information and continues to expand in size and complexity. Retrieving a required web page efficiently and effectively is becoming a challenge. A user searching for relevant pages wants those pages readily at hand. Data mining is the process of extracting knowledge from data. Web mining is the data mining technique that automatically discovers or extracts information and patterns from web documents. Tracking the navigational behavior of online users according to the pages they visit, and predicting which pages they will require, is a crucial task on the Web. Fuzzy clustering techniques are useful for predicting users' browsing behaviors and are also used to cluster related URLs in order to group similar documents. Current research treats document clustering and user clustering as separate processes, which is quite time consuming. This paper presents an enhancement that performs dynamic user clustering and document clustering simultaneously by applying the fuzzy clustering approach to predict users' navigational behaviors. Web mining consists of the following tasks:

1. Resource finding: This is the task of retrieving the intended web documents. It is the process by which data is extracted from online or offline text resources available on the Web.
2. Information selection and pre-processing: This is the automatic selection and pre-processing of specific information from the retrieved web resources. It transforms the original retrieved data into information. The transformation could be the removal of stop words or stemming, or it may aim at obtaining a desired representation, such as finding phrases in the training corpus.
3. Generalization: This automatically discovers general patterns at individual web sites as well as across multiple sites. Data mining techniques and machine learning are used in generalization.
4. Analysis: This is the validation and interpretation of the mined patterns, and it plays an important role in pattern mining. Humans play an important role in the knowledge discovery process on the Web.

2. Clustering
Data clustering is the process of dividing data elements into classes or clusters so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible. Depending on the nature of the data and the purpose for which clustering is being used, different measures of similarity may be used to place items into classes, where the similarity measure controls how the clusters are formed.
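
As one concrete example of a similarity measure, the Jaccard coefficient (the measure this paper later uses when merging document clusters) can be sketched as follows. This is a minimal illustration only: the function name `jaccard` and the convention of returning 0.0 for two empty sets are assumptions made here, not part of the paper.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty collections have similarity 0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Two hypothetical word sets that share two of four distinct words.
    print(jaccard({"web", "mining", "log"}, {"web", "log", "user"}))
```

A value near 1 means the two items are nearly identical under this measure, and a value near 0 means they share almost nothing.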

3. What is Fuzzy Clustering?


Fuzzy clustering is a process which categorizes elements, typically usage clicks or usage sessions, into groups, where each element can belong to several groups with different degrees of membership. In fuzzy clustering, the data points can belong to more than one cluster, and associated with each of the points are membership grades which indicate the degree to which the data points belong to the different clusters.

3.1 Fuzzy C-Means Clustering Algorithm:

This algorithm assigns each data point a membership in each cluster on the basis of the distance between the cluster center and the data point. The nearer a data point is to a cluster center, the higher its membership in that cluster. The memberships of each data point must sum to one.

The main objective of the fuzzy c-means algorithm is to minimize:

$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^{m} \, d_{ij}^{2}$$

where 'n' is the number of data points, 'vj' represents the jth cluster center, 'm' is the fuzziness index with $m \in [1, \infty)$ (the membership update below requires $m > 1$), 'c' represents the number of cluster centers, 'µij' represents the membership of the ith data point in the jth cluster, and 'dij' $= \lVert x_i - v_j \rVert$ represents the Euclidean distance between the ith data point and the jth cluster center.

Algorithmic steps for fuzzy c-means clustering:
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select c cluster centers.
2) Calculate the fuzzy membership 'µij' using:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}$$

3) Compute the fuzzy centers 'vj' using:

$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^{m} \, x_i}{\sum_{i=1}^{n} (\mu_{ij})^{m}}, \qquad j = 1, 2, \ldots, c$$

4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or $\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$, where $U^{(k)}$ is the membership matrix at iteration k and $\varepsilon$ is the termination threshold.

3.2 Pre-processing steps before clustering users and documents:
Using the previous behavior of each user, knowledge that represents the user's preferences is extracted. To extract knowledge from web usage data, six steps are performed, as described in the following:

1. Pre-processing: Log files hold useful information about the accesses of all users to a specific website. This information is extracted to form a reformatted log file that contains the useful fields, such as time, date, accessed URL and IP address. Useless requests, such as accesses to images, are removed from the log file in the data cleaning process.

2. Document Clustering:
Step 1. Assign each document to a single cluster.
Step 2. Merge primary clusters based on the Jaccard coefficient similarity measure, defined as:

$$\mathrm{Jaccard}(P_x, P_y) = \frac{|P_x \cap P_y|}{|P_x \cup P_y|}$$

where $|P_x \cap P_y|$ represents the number of common words and $|P_x \cup P_y|$ represents the total number of words in the two basic clusters.
Step 3. Repeat Step 2 until all documents are clustered into a predefined number of clusters. DC = {DC1, DC2, ..., DCn} is the result set. Each DCi represents a set of URLs with similar content.

3. Integration: In this step, the previously obtained document clusters are merged with the reformatted log file, and the access table is produced from the result. Users are then clustered based on their behavior in accessing the document clusters.

4. User Clustering: According to the access matrix, users with similar interests can be clustered together. In this work, the fuzzy c-means (FCM) method is used for clustering users. UC = {UC1, UC2, ..., UCn} is the result set, where each UCi represents a set of users with similar interest patterns.

5. Final Preparation: In the reformatted log file, for each log entry, the accessed URL is substituted by the most closely related document cluster and, similarly, the IP address is replaced by the most relevant user cluster. The result of this step is the input data for the rule mining step.

6. Rule Mining: A variety of algorithms, such as Apriori, Eclat and FP-Growth, have been proposed by researchers for generating association rules. The extracted rules show which groups of users are interested in which kinds of document clusters, and when. We used the Apriori algorithm to mine frequent itemsets.
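
The fuzzy c-means steps of Section 3.1, which the user clustering step relies on, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fuzzy_c_means`, the parameters `eps`, `max_iter` and `seed`, the choice of initial centers from among the data points, the use of the largest absolute membership change as the norm in the stopping test, and the handling of points that coincide with a center are all choices made here for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into c fuzzy clusters.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j. Requires m > 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: randomly select c data points as the initial centers (assumption).
    centers = [list(p) for p in rng.sample(points, c)]
    power = 2.0 / (m - 1.0)
    previous = None
    for _ in range(max_iter):
        # Step 2: membership of every point in every cluster.
        memberships = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:
                # Point sits exactly on one or more centers: split its
                # membership evenly among those centers (assumption).
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            memberships.append(row)
        # Step 4: stop once no membership changed by eps or more.
        if previous is not None:
            change = max(
                abs(a - b)
                for new, old in zip(memberships, previous)
                for a, b in zip(new, old)
            )
            if change < eps:
                break
        previous = memberships
        # Step 3: each center is the mean of all points weighted by mu^m.
        centers = []
        for j in range(c):
            w = [memberships[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(wi * x[k] for wi, x in zip(w, points)) / total
                 for k in range(dim)]
            )
    return centers, memberships


if __name__ == "__main__":
    # Hypothetical 2-D data: two well-separated groups of three points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, memberships = fuzzy_c_means(data, c=2)
    for point, row in zip(data, memberships):
        print(point, [round(u, 3) for u in row])
```

In the user clustering step, each row of the access matrix would play the role of one data point, and every user receives a membership grade in each user cluster rather than a single hard label.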

4. Working Environment

Figure 1. Architectural view of the whole process

As Figure 1 shows, the two major steps of a web personalization system are knowledge discovery and recommendation. In the first step, user preferences are identified using web access log data, called web usage data (WUD). In the next step, the acquired knowledge is used to identify the URLs that are likely to interest the users. This recommendation can be made in different ways, such as adding related hyperlinks to the last web page requested by the user. Document clustering and user clustering are treated as separate processes; at the end, both are combined and specific rules are created to mine common clusters.

5. Results Analytical Study

All preprocessing steps can be applied to web server log files, which contain information about every user who visited the pages of the website. From these files, documents can be clustered according to the interests of the users who visited them. Pre-processing removes redundant data from the log files, such as blank link entries recorded for a particular IP address or entries for .CSS page links.

Graph 1. Result graph obtained with the FCM algorithm

Graph 2. Result graph obtained with the Enhanced FCM algorithm

5.1 Discussion:

The analytical study is presented in two graphs. The horizontal axis shows the number of data samples to be clustered, and the vertical axis shows the time taken to cluster that number of samples. For Graph 1, 10 data samples were taken as input from the www.drjslab.org log file, covering web pages visited by different users from different IP addresses. Graph 1 shows that clustering 10 samples with separate document clustering and user clustering takes around 4.5 sec for the whole process. For Graph 2, 18 data samples were taken as input from the same website, and document clustering and user clustering were applied simultaneously, which gives a result in around 6.00 sec. From this analytical study we can say that the proposed work improves the final result.
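
Because the two runs use different sample counts, the comparison is easier to read as time per sample. The short calculation below uses only the approximate figures reported above (4.5 sec for 10 samples and 6.00 sec for 18 samples). It assumes the two runs are otherwise comparable, which cannot be confirmed from two measurements; the function name `seconds_per_sample` is introduced here for illustration.

```python
def seconds_per_sample(total_seconds, n_samples):
    """Average clustering time per data sample."""
    return total_seconds / n_samples


# Graph 1: separate document and user clustering, 10 samples in about 4.5 sec.
separate = seconds_per_sample(4.5, 10)
# Graph 2: simultaneous clustering, 18 samples in about 6.00 sec.
combined = seconds_per_sample(6.0, 18)
# Relative reduction in per-sample time of the combined approach.
reduction = 1 - combined / separate

if __name__ == "__main__":
    print(f"separate: {separate:.3f} s/sample")
    print(f"combined: {combined:.3f} s/sample")
    print(f"reduction: {reduction:.1%}")
```

On these figures, the separate approach costs about 0.45 sec per sample and the combined approach about 0.33 sec per sample, which is roughly a quarter less time per sample.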

6. Conclusion
From this paper we can conclude that, with the help of the fuzzy algorithm, the user clustering and document clustering processes can be combined, which reduces the time complexity of the process. The proposed algorithm is easy to implement and is compatible with popular web data mining tools that use common clustering algorithms over existing infrastructures. A limitation of the proposed work is that, as with clustering algorithms in general, defining a suitable number of clusters for particular user groups is difficult. This limitation will be addressed in future work.
