Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856
1. Resource finding: This task retrieves the intended web documents. It is the process of extracting data from online or offline text resources available on the web.
2. Information selection and pre-processing: This task automatically selects and pre-processes specific information from the retrieved web resources, transforming the original retrieved data into information. The transformation could be the removal of stop words or stemming, or it may aim at a desired representation, such as finding phrases in the training corpus.
3. Generalization: This task automatically discovers general patterns at individual web sites as well as across multiple sites. Data mining and machine learning techniques are used for generalization.
4. Analysis: This task validates and interprets the mined patterns. A human plays an important role in the information and knowledge discovery process on the web.
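The pre-processing transformations named in task 2 (stop-word removal and stemming) can be illustrated with a minimal sketch. The stop-word list, the suffix list, and the function names `naive_stem` and `preprocess` are hypothetical and chosen only for illustration; the paper does not specify which stop words or which stemmer it uses.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "on", "in", "to", "for"}

# Hypothetical suffix list; a real system would likely use an established stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, then stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("Clustering the visited documents")
```

Here `terms` holds the reduced tokens of the sample sentence; such token lists are the kind of representation that later clustering steps could consume.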
1. Introduction
The World Wide Web is a rich source of information and continues to expand in size and complexity. Retrieving the required web page efficiently and effectively is becoming a challenge. Whenever a user searches for relevant pages, he or she prefers to have those pages at hand. Data mining is the process of extracting knowledge from data. Web mining is the data mining technique that automatically discovers or extracts information and patterns from web documents. Tracking the navigational behavior of online users according to the pages they visit, and predicting the pages they will require, are crucial tasks on the web. Fuzzy clustering techniques are useful for predicting users' browsing behavior and are also used to cluster related URLs, grouping similar documents. Current research treats document clustering and user clustering as separate processes, which is quite time consuming. This work presents an enhancement that identifies dynamic user clusters and document clusters simultaneously by applying the fuzzy clustering approach to predict users' navigational behavior. Web mining consists of the four tasks enumerated above.
2. Clustering
Data clustering is the process of dividing data elements into classes or clusters so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible. Depending on the nature of the data and the purpose for which clustering is being used, different measures of similarity may be used to place items into classes, where the similarity measure controls how the clusters are formed.
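As one illustration of a similarity measure, two documents can be compared by the ratio of the words they share to the total number of distinct words they contain (the Jaccard coefficient). The sketch below assumes whitespace tokenization, and the function name `word_overlap` is hypothetical; whether the paper uses exactly this ratio is an assumption.

```python
def word_overlap(doc_x: str, doc_y: str) -> float:
    """Jaccard coefficient: |X intersect Y| / |X union Y| over word sets."""
    words_x = set(doc_x.lower().split())
    words_y = set(doc_y.lower().split())
    union = words_x | words_y
    if not union:
        # Two empty documents share nothing; report zero similarity.
        return 0.0
    return len(words_x & words_y) / len(union)


similarity = word_overlap("web mining tools", "web usage mining")
```

The two sample strings share two of four distinct words, so a higher value would indicate documents that are better candidates for the same cluster.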
Algorithmic steps for Fuzzy c-means clustering:
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select c cluster centers.
2) Calculate the fuzzy membership 'µij'.
3) Compute the fuzzy centers 'vj'.
4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or ||U(k+1) - U(k)|| falls below the termination threshold.
3.2 Pre-processing steps before clustering users and documents:
Using the previous behavior of each user, knowledge that represents the user's preferences is extracted. To extract knowledge from web usage data, 6 steps are performed, as described in the following.
|Px ∩ Py| represents the number of common words and |Px ∪ Py| represents the total number of words between two basic clusters.
Step 3. The second step is repeated until all documents have been clustered into a predefined number of clusters. DC = {DC1, DC2, ..., DCn} is the result set. Each DCi represents a set of URLs with similar content.
3. Integration: In this step, the previously obtained document clusters are merged with the reformatted log file, and an access table is produced from the result. Then, users are clustered based on their behavior in accessing the document clusters.
4. User Clustering: According to the access matrix, users with similar interests can be clustered together. In this work, I have used the fuzzy c-means (FCM) method for clustering users. UC = {UC1, UC2, ..., UCn} is the result set, where each UCi represents a set of users with similar interest patterns.
5. Final Preparation: In the reformatted log file, for each log entry, the accessed URL is substituted by the most related document cluster and, similarly, the IP address is replaced by the most relevant user cluster. The result of this step is the input data for the rule mining step.
6. Rule Mining:
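Returning to the fuzzy c-means steps used for user clustering, the four algorithmic steps can be sketched in plain Python. Because the update formulas are not reproduced in the text, the sketch assumes the standard FCM rules: membership µij = 1 / Σk (dij / dik)^(2/(m-1)) and center vj = Σi µij^m xi / Σi µij^m. The fuzzifier `m`, the tolerance `tol`, the max-absolute-difference stopping norm, and the sample `access` matrix (rows are users, columns are visit counts per document cluster) are all hypothetical choices for illustration, not values from the paper.

```python
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Return (centers, memberships) for a list of equal-length vectors."""
    rng = random.Random(seed)
    # Step 1: randomly select c of the data points as initial centers.
    centers = [list(p) for p in rng.sample(points, c)]
    memberships = [[0.0] * c for _ in points]
    dim = len(points[0])
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Step 2: update memberships from distances to every center.
        new_memberships = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5
                     for v in centers]
            if any(d == 0.0 for d in dists):
                # A point lying on a center belongs fully to that center.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in dists)
                       for dj in dists]
            new_memberships.append(row)
        # Step 3: recompute each center as a membership-weighted mean.
        for j in range(c):
            weights = [row[j] ** m for row in new_memberships]
            total = sum(weights)
            centers[j] = [sum(w * p[d] for w, p in zip(weights, points)) / total
                          for d in range(dim)]
        # Step 4: stop once no membership changes by tol or more
        # (max-absolute-difference norm, an assumed choice).
        change = max(abs(a - b)
                     for old, new in zip(memberships, new_memberships)
                     for a, b in zip(old, new))
        memberships = new_memberships
        if change < tol:
            break
    return centers, memberships


# Hypothetical access matrix: 4 users x 2 document clusters (visit counts).
access = [[5, 0], [4, 1], [0, 5], [1, 4]]
centers, memberships = fuzzy_c_means(access, c=2)
# Hard label per user: the cluster with the highest membership.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` gives one user's degree of belonging to every cluster, so a user with mixed interests keeps partial membership in several clusters instead of being forced into one.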
4. Working Environment
Figure 1: Architectural view of the whole process
In Figure 1, the two major steps of a web personalization system are knowledge discovery and recommendation. In the first step, user preferences are identified using web access log data, called web usage data (WUD). In the next step, the acquired knowledge is used to identify URLs of possible interest to the users. This recommendation can be made in different ways, such as adding related hyperlinks to the last web page requested by the user. Document clustering and user clustering are first treated as separate processes; at the end, both are combined and specific rules are created to mine common clusters.
Graph 2. Result graph obtained with Enhanced FCM Algorithm
5.1 Discussion:
The analytical study is presented in two graphs. The horizontal axis shows the number of data samples to be clustered, and the vertical axis shows the time taken to cluster that number of samples. In Graph 1, 10 data samples are taken as input from the www.drjslab.org log file, corresponding to web pages visited by different users from different IP addresses. Graph 1 shows that clustering 10 data samples with separate document clustering and user clustering takes around 4.5 sec to complete the whole process. In Graph 2, 18 data samples are taken as input from the same website, and document clustering and user clustering are applied simultaneously, which gives a result in around 6.00 sec. This corresponds to roughly 0.45 sec per sample for the separate approach versus roughly 0.33 sec per sample for the simultaneous approach, so the analytical study suggests that the proposed work improves on the separate approach.
6. CONCLUSION
From this paper we can conclude that, with the help of fuzzy algorithm users clusters and document cluster