
IEEE International Conference on Recent Trends in Information Technology (ICRTIT 2011), MIT, Anna University, Chennai. June 3-5, 2011.

Web Mining Framework for Security in E-commerce


R. Manjusha, Research Scholar
Department of Information Technology, Sathyabama University, Chennai. Email: manjusha84rr@yahoo.co.in
Abstract—This paper examines how web mining technology can be used to provide security on e-commerce web sites. The connection between web mining, security, and e-commerce is analyzed on the basis of user behavior on the web, and web mining algorithms are combined with security algorithms to secure e-commerce web sites. Based on customer behavior, web mining algorithms such as the PageRank algorithm and the TrustRank algorithm are used to develop a web mining framework for e-commerce web sites. We have developed a false-hit database algorithm and a nearest neighbor algorithm to provide security on e-commerce web sites. Existing web mining frameworks are based only on web content mining; the proposed framework comprises web structure mining analysis, web content mining analysis, decision analysis, and security analysis.

Keywords—Web mining, security, e-commerce.

Dr. R. Ramachandran
Sri Venkateshwara College of Engineering, Sriperumbudur, Chennai.

A. PageRank Algorithm
The PageRank algorithm is used by search engines. We compute the PageRank of web sites by parsing web pages for links, iteratively computing the page rank, and sorting the documents by rank. PageRank is calculated as follows:

PAR(A) = (1 - d) + d * (PAR(T1)/OG(T1) + ... + PAR(Tn)/OG(Tn))

where PAR(A) is the PageRank of page A, OG(T1) is the number of outgoing links from page T1, and d is a damping factor in the range 0 < d < 1, usually set to 0.85. The PageRank of a web page is thus the sum of the PageRank of every page linking to it, each divided by the number of that page's outgoing links.

B. TrustRank Algorithm
The TrustRank algorithm is a procedure for rating the quality of web sites, using the linking structure to generate a measure of the quality of a page. Steps of the TrustRank algorithm:
1. The starting point of the algorithm is the selection of trusted web pages.
2. Trust can be transferred to other pages by linking to them.
3. Trust propagates in the same way as PageRank.
4. A negative measure propagates backwards and is a measure of bad pages.
5. The ranking algorithm can take both measures into account.
TrustRank is calculated as

TrustRank = M * x, where M = 1 - d*T and x is the source vector of trust.
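The iterative PageRank computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the link graph is a hypothetical example, and d = 0.85 as in the text.

```python
# Minimal PageRank power iteration over a small hypothetical link graph.
# links[p] lists the pages that p links out to (its outgoing links OG(p)).
def pagerank(links, d=0.85, iters=50):
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial rank for every page
    for _ in range(iters):
        new = {}
        for p in pages:
            # PAR(p) = (1 - d) + d * sum(PAR(T)/OG(T)) over pages T linking to p
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) + d * incoming
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
# Documents are then sorted by rank, highest first.
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

In this variant the ranks sum to the number of pages; page C, which has the most in-links, comes out on top.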

I. WEB MINING FRAMEWORK SYSTEM

Web mining is the use of data mining techniques to automatically discover and extract knowledge from web documents. The web serves as an information service centre for news, e-commerce, advertisement, government, education, and financial management. We have developed a web mining framework for evaluating e-commerce web sites. In general, web mining tasks can be classified into web content mining, web structure mining, and web usage mining. Well-known techniques such as the PageRank algorithm and the TrustRank algorithm are used in this paper. Our proposed web mining framework consists of four phases: web structure mining analysis, web content mining analysis, decision analysis, and security analysis.

II. WEB STRUCTURE MINING ANALYSIS

This phase analyses a web site using both the PageRank algorithm and the TrustRank algorithm. The ranking of a page is determined by its link structure instead of its content. The TrustRank algorithm is a procedure for rating the quality of web sites; its output is a quality-based score that corresponds to the trust assessment level of the web site. The initial step collects information from web sites and stores the web pages in a web repository.
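The trust propagation that this phase relies on can be sketched as follows. This is an illustrative sketch only (hypothetical seed set and link graph); it shows forward trust propagation from trusted seeds, with the backward bad-page measure omitted.

```python
# Sketch of TrustRank-style propagation: trust starts at hand-picked seed
# pages (the source vector x) and flows along links with damping d,
# biased back toward the seeds at every step.
def trustrank(links, seeds, d=0.85, iters=50):
    pages = list(links)
    x = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    t = dict(x)                                    # initial trust scores
    for _ in range(iters):
        new = {}
        for p in pages:
            spread = sum(t[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) * x[p] + d * spread   # trust flows like PageRank
        t = new
    return t

# Hypothetical graph: a trusted seed links to a shop, which links onward
# to one good page and one spam page.
links = {"good": ["shop"], "shop": ["good", "spam"], "spam": ["spam2"], "spam2": []}
scores = trustrank(links, seeds={"good"})
```

Pages reachable only through long chains from the seed set end up with little trust, which is the quality signal this phase uses.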

978-1-4577-0590-8/11/$26.00 © 2011 IEEE


The transition matrix T is given by Tij = 1/cj if page j links to page i, and Tij = 0 otherwise; d is the damping factor and x is the source vector of trust. The inverse PageRank is given by Minv * xinv, with Minv = 1 - dinv*Tinv. The inverse transition matrix Tinv is defined by Tij = 1/nj if page i links to page j, and Tij = 0 otherwise; dinv is the damping factor, xinv is the source vector of the bad pages, and nj is the number of incoming links of page j. Minv is neither the transpose nor the inverse of M. From this we can say that pages which link to bad pages are themselves bad, while pages which link to good pages are good.

III. WEB CONTENT MINING ANALYSIS

Web content mining is defined as the search for new information in web data; data is retrieved for a topic desired by the user. As an example for web content mining analysis, we have taken job categories and the associated skill needs prevalent in the computing professions. We performed a cluster analysis on the ads in two phases: hierarchical agglomerative clustering is the first step, used to identify unique skill-set clusters; the classification of ads into clusters is then validated by performing a k-means cluster analysis. The phase consists of four modules: Module 1: User Identification, Module 2: Job Definition, Module 3: Data Collection, and Module 4: Data Analysis.

Figure 1. Module diagram.

A. Hierarchical Agglomerative Clustering
Agglomerative hierarchical clustering is a bottom-up clustering method where clusters have sub-clusters, which in turn have sub-clusters, and so on. The classic example is species taxonomy; gene expression data might also exhibit this hierarchical quality (e.g. neurotransmitter gene families). Agglomerative hierarchical clustering starts with every single object (gene or sample) in its own cluster. In each successive iteration it agglomerates (merges) the closest pair of clusters satisfying some similarity criterion, until all of the data is in one cluster. The hierarchy within the final cluster has the property that clusters generated in early stages are nested in those generated in later stages, and clusters of different sizes in the tree can be valuable for discovery. A matrix tree plot visually demonstrates the hierarchy within the final cluster, where each merger is represented by a binary tree.
Process: Assign each object to a separate cluster. Evaluate all pair-wise distances between clusters. Construct a distance matrix from the distance values. Look for the pair of clusters with the shortest distance. Remove the pair from the matrix and merge them. Evaluate all distances from this new cluster to all other clusters and update the matrix. Repeat until the distance matrix is reduced to a single element.
Advantages: It can produce an ordering of the objects, which may be informative for data display. Smaller clusters are generated, which may be helpful for discovery.

B. K-Means Cluster Analysis
In statistics and machine learning, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data, and both employ an iterative refinement approach.
Process: The dataset is partitioned into k clusters and the data points are randomly assigned to the clusters, resulting in clusters that have roughly the same number of data points. For each data point, calculate the distance to each cluster; if the data point is closest to its own cluster, leave it where it is, otherwise move it into the closest cluster. Repeat until a complete pass through all the data points results in no data point moving from one cluster to another; at this point the clusters are stable and the clustering process ends. The choice of initial partition can greatly affect the final clusters, in terms of inter-cluster and intra-cluster distances and cohesion.
Advantages: With a large number of variables, k-means may be computationally faster than hierarchical clustering (if k is small). K-means may produce tighter clusters than hierarchical clustering, especially if the clusters are globular.

C. Module 1: User Identification
Users are of different categories: new users register in the system, existing users log on to their accounts, and the administrator has the highest priority. We generate user profiles based on their access patterns, cluster users based on frequently accessed URLs, and use a classifier to generate a profile for each cluster. The web site was developed using .NET as the front end and SQL as the back end.
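The agglomeration process described above can be sketched as follows. This is a minimal single-linkage example on hypothetical one-dimensional data; the paper does not specify its distance metric or implementation.

```python
# Single-linkage agglomerative clustering: start with singleton clusters
# and repeatedly merge the closest pair until k clusters remain.
def agglomerate(points, k):
    clusters = [[p] for p in points]              # each object in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance: closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

clusters = agglomerate([1.0, 1.1, 5.0, 5.2, 9.9], k=3)
```

Stopping at k = 3 instead of merging down to a single cluster exposes the intermediate level of the hierarchy, which is what the skill-set analysis inspects.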

Figure 2. User login.
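The k-means procedure described above, sketched on hypothetical one-dimensional data: random initial assignment, then repeated reassignment to the nearest cluster mean until no point moves.

```python
import random

# Plain k-means: random initial partition, then move each point to its
# closest cluster mean until a full pass moves nothing.
def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    assign = [i % k for i in range(len(points))]
    rng.shuffle(assign)                           # random initial partition
    while True:
        means = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            means.append(sum(members) / len(members) if members else float("inf"))
        # move each point to its closest mean
        new = [min(range(k), key=lambda c: abs(p - means[c])) for p in points]
        if new == assign:                         # stable: no point moved
            return assign, means
        assign = new

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
assign, means = kmeans(points, k=2)
```

As the text notes, the initial partition can matter in general; on well-separated data like this the iteration settles quickly on the two obvious groups.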



D. Module 2: Job Definition
In this module authenticated users can proceed with the features provided by the web-based learning site. Materials on a few subjects are given on the site, and the users can utilize them.

G. Results
Using a web content data mining application, unique IT job descriptions were extracted from various job search engines and each was distilled to its required skill sets. Examining these revealed clusters of similar skill sets that map to specific job definitions. This makes job search faster and gives results according to user preference.

Figure 6. Matching job.

Figure 3. Job definition.

E. Module 3: Data Collection
This module collects the job definitions based on grouping: the job title, the job description, and the skills the company requires of the candidate. The job definitions are clustered based on the job title.
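The grouping step above can be sketched as follows. The records and field names here are hypothetical examples, not the paper's actual schema.

```python
from collections import defaultdict

# Group collected job postings by title, then pool the skills required
# across all postings that share a title.
postings = [
    {"title": "Web Developer", "skills": ["HTML", "SQL"]},
    {"title": "Data Analyst", "skills": ["SQL", "Statistics"]},
    {"title": "Web Developer", "skills": ["ASP.NET", "SQL"]},
]

by_title = defaultdict(list)
for p in postings:
    by_title[p["title"]].append(p["skills"])

# Combined skill set for one job title:
web_dev_skills = sorted({s for skills in by_title["Web Developer"] for s in skills})
```

These per-title skill pools are what the clustering in the data analysis module operates on.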

H. Performance Analysis
We propose a system in which all information about the system can be logged for future reference. Performance measurements of students and freshers can then be made: graduates can find a matching job, and freshers can measure their gap with respect to the industry and learn accordingly.

IV. DECISION ANALYSIS

This phase uses the total trust of a web page generated in the web structure mining analysis phase. Two processes are performed: (a) trust calculation for the web site, and (b) application of suitable statistical techniques to analyse the result of the evaluation. We consider three trust levels.

A. Trust Calculation of the Web Site
- High-trust web sites

Figure 4. Customer profile status.

- Moderate-trust web sites
- Untrusted web sites

The trust calculation model:

<TrustCalculationModel>
  <opinions>
    <opinion Type="1" Weight="0.1">
      <sources>
        <source Type="Experience" Weight="0.8"/>
        <source Type="Reputation" Weight="0.1"/>
      </sources>
    </opinion>
    <opinion Type="2" Weight="0.9">

F. Module 4: Data Analysis
The data are analyzed based on the data collection module, with entries marked by symbols such as the dagger (†) and asterisk (*). The data is mined based on the previous modules.

Figure 5. Skills frequency.

      <sources>
        <source Type="Experience" Weight="0.8"/>
        <source Type="Reputation" Weight="0.1"/>
      </sources>
    </opinion>
  </opinions>
</TrustCalculationModel>

<owner Name="trustvalue">
  <Term Name="Untrusted web sites">
    <points>
      <point x="0.0" y="1.0"/>
      <point x="0.4" y="0.0"/>
    </points>
  </Term>
  <Term Name="Moderate-trust web sites">
    <points>
      <point x="0.0" y="0.0"/>
      <point x="0.4" y="1.0"/>
      <point x="1.0" y="0.0"/>
    </points>
  </Term>
  <Term Name="High-trust web sites">
    <points>
      <point x="0.4" y="0.0"/>
      <point x="1.0" y="1.0"/>
    </points>
  </Term>
</owner>

The trust value is converted into degrees of membership of the functions defined on the variable. Consider, for example, a trust value of 0.11:
Untrusted web sites: 0.78
Moderate-trust web sites: 0.22
High-trust web sites: 0.00
If the trust value falls under untrusted web sites, the trust level is none; if it falls under moderate-trust web sites, the trust level is limited; if it falls under high-trust web sites, the trust level is full. From this method we can calculate the trust of a web site.

B. Application of Suitable Statistical Techniques to Analyse the Result of the Evaluation
Analyzing information from a web site is important; using statistics we can evaluate the web site. Descriptive statistics is mainly used to describe populations using random samples of web data collected from web sites. It provides a statistical summary of the web data with a view to understanding the population that the sample represents. Central-tendency and dispersion measures are used in descriptive statistics; measures of central tendency describe the central values of a collected sample of web data. For an ungrouped set of web data, the measures are the mean, median, and mode.
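The conversion of a trust value into degrees of membership can be sketched with straight-line interpolation over the <Term> point lists above. Note this is an assumption about the membership shape: linear interpolation of the listed breakpoints gives roughly 0.73/0.28/0.00 for a trust value of 0.11, whereas the figures quoted in the text are 0.78/0.22/0.00, so the paper's exact functions may differ.

```python
# Piecewise-linear membership functions built from the <Term> point lists;
# the degree of membership at x is interpolated between consecutive points.
TERMS = {
    "Untrust":  [(0.0, 1.0), (0.4, 0.0)],
    "Moderate": [(0.0, 0.0), (0.4, 1.0), (1.0, 0.0)],
    "High":     [(0.4, 0.0), (1.0, 1.0)],
}

def membership(points, x):
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:                 # linear interpolation on this segment
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

degrees = {name: membership(pts, 0.11) for name, pts in TERMS.items()}
```

Whichever term has the largest degree determines the trust level (none, limited, or full).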
Pareto principle: banning the first 50% of untrusted web sites takes a given effort, the next 25% takes the same effort, the next 12.5% the same again, and so on. By application of suitable statistical techniques we can evaluate the results.

V. SECURITY ANALYSIS

We perform a complete security analysis in this phase. 89% of web development companies do not follow industry standards in developing and hosting the web sites they make, and the customers who use these web sites often do not know the difference between a secure and an insecure web site. We have developed a trust-path intermediaries building algorithm, a false-hit database algorithm, and a nearest neighbor algorithm to provide security on e-commerce web sites. Multi-step processing is used for nearest neighbor and similarity search in applications involving web data and/or costly distance computations. CAMNC is used to reduce the size of the false-hit database. Queries are authenticated: the server maintains a dataset database signed by a trusted authority, and the false-hit database reduces hang or lag in the server while providing accurate data as well as the NN result set. We have developed the following modules for providing security on e-commerce web sites: Module 1: Authentication, Module 2: Query Processing, Module 3: Similarity Search, and Module 4: False-Hit Reduction.

A. Module 1: Authentication
In the authentication module a member or user accesses the search facility, and the admin checks false hits and updates the database.
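The multi-step nearest-neighbor processing mentioned above can be sketched as follows. This is an illustrative sketch, not the paper's authenticated scheme: the data, the costly exact metric, and the cheap lower bound here are all hypothetical stand-ins.

```python
# Multi-step NN search: a cheap lower bound on the distance prunes
# candidates, so the costly exact distance is computed only when needed.
def exact_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5   # "costly" metric

def lower_bound(a, b):
    return abs(a[0] - b[0])        # cheap bound: never exceeds exact_dist

def multistep_nn(query, data):
    best, best_d = None, float("inf")
    # visit candidates in order of the cheap lower bound
    for item in sorted(data, key=lambda v: lower_bound(query, v)):
        if lower_bound(query, item) >= best_d:
            break                  # every remaining candidate is farther
        d = exact_dist(query, item)
        if d < best_d:
            best, best_d = item, d
    return best, best_d

data = [(1.0, 2.0), (4.0, 0.5), (0.9, 2.1), (8.0, 8.0)]
nn, d = multistep_nn((1.0, 1.9), data)
```

Because the lower bound never exceeds the exact distance, the early break is safe: the result is the true nearest neighbor, but most exact-distance computations are skipped.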

Figure 7. Webwise search web site.



B. Module 2: Query Processing
This module describes the communication between server and user: the client posts the query and the server delivers the result based on the criteria.

Figure 11. Module diagram.

Working (default): the user enters a search keyword on the web site. The admin checks the recorded false hits and then posts the necessary responses to the search database for future verification.
Figure 8. User fills in the details.

C. Module 3: Similarity Search
Similarity search retrieves the relevant information from the database based on similar keywords.

Figure 12. Working.

Figure 9. Search for a similar keyword.

Case 1: if the search keyword is not present, the keyword is recorded in the false-hit database.

D. Module 4: False-Hit Reduction
The admin checks the recorded false hits and then posts the necessary responses to the search database for future verification.
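The false-hit workflow of Cases 1 and 2 can be sketched as follows. The table layout, keywords, and responses here are hypothetical; the paper's system uses a SQL back end rather than in-memory dictionaries.

```python
# Sketch of the false-hit workflow: a search that misses is recorded in a
# false-hit table; the admin later posts a response that future searches
# are served from.
search_db = {"laptop": "12 products found"}
false_hits = {}

def search(keyword):
    if keyword in search_db:
        return search_db[keyword]
    # Case 1: keyword absent -> record it in the false-hit database
    false_hits[keyword] = false_hits.get(keyword, 0) + 1
    return None

def admin_resolve(keyword, response):
    # Case 2: admin posts the response for future verification
    search_db[keyword] = response
    false_hits.pop(keyword, None)

search("tablet")                  # miss: recorded as a false hit
admin_resolve("tablet", "5 products found")
result = search("tablet")        # now served from the search database
```

Resolved keywords leave the false-hit table, which is how its size is kept down over time.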

Figure 10. Admin checks the false-hit database.

Figure 13. Keyword not present; false hit is updated.

Case 2: if the search keyword is present, the necessary responses are posted to the search database for future verification. This is the first work addressing authenticated similarity retrieval from such sources using the multi-step NN framework. We show that a direct integration of optimal NN search with an authenticated data structure incurs excessive communication overhead. Through the security module we provide security to the web site.

VI. CONCLUSION

Figure 14. Keyword present; the response is posted.

In this paper we have proposed a web mining framework for e-commerce web sites. The framework comprises four phases: web structure mining analysis, web content mining analysis, decision analysis, and security analysis. In web structure mining analysis we used the PageRank and TrustRank algorithms. In web content mining analysis we used hierarchical agglomerative clustering and k-means cluster analysis. In decision analysis we used trust calculation for the web site and statistical techniques to analyse the result of the evaluation. In security analysis we developed a trust-path intermediaries building algorithm, a false-hit database algorithm, and a nearest neighbor algorithm to provide security on e-commerce web sites. The importance of authenticated query processing increases with the amount of information available at sources that are untrustworthy, unreliable, or simply unfamiliar.

Figure 15. Admin updates the database.

REFERENCES
[1] C. Litecky, A. Aken, A. Ahmad, and H. J. Nelson, "Mining computing jobs," IEEE Software, January/February 2010.
[2] Y. Tao, K. Yi, C. Sheng, and P. Kalnis, "Quality and efficiency in high dimensional nearest neighbor search," SIGMOD, 2009.
[3] O. Nasraoui, M. Soliman, E. Saka, A. Badia, and R. Germain, "A web usage mining framework for mining evolving user profiles in dynamic web sites," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, February 2008.
[4] B. Liu, R. Grossman, and Y. Zhai, "Mining web pages for data records," published by the IEEE Computer Society, November/December 2004.
[5] S. K. Pal, V. Talwar, and P. Mitra, "Web mining in soft computing framework: relevance, state of the art and future directions," IEEE Transactions on Neural Networks, vol. 13, no. 5, September 2002.
[6] Y. Atif, "Building trust in e-commerce," IEEE Internet Computing, January/February 2002.
[7] F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, "Fast nearest neighbor search in medical image databases," VLDB, 1996.

