
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 7, July 2011

Analysis of Educational web pattern using Adaptive Markov Chain for Next page Access Prediction
Harish Kumar, PhD scholar, Mewar University, Chittorgarh. Dr. Anil Kumar Solanki, MIET, Meerut.

ABSTRACT

The Internet is growing at an amazing rate as an information gateway and as a medium for business and the education industry. Universities offering web-based education rely on web usage analysis to understand student behavior for web marketing. Web Usage Mining (WUM) integrates the techniques of two popular research fields: Data Mining and the Internet. Web usage mining attempts to discover useful knowledge from secondary data (web logs). These data patterns are used to analyze visitors' activities on web sites. Many servers manage cookies to distinguish server addresses. User navigation patterns are recorded in the form of web logs. These navigation patterns are refined, resized and modeled in a new format; this method is known as Loginizing. In this paper we study the navigation patterns from web usage and model them as a Markov chain. This chain works on the higher probability of usage. The Markov chain is modeled for the collection of navigation patterns and used for finding the most likely used navigation pattern for a web site.

Keywords: Web mining, web usage, web logs, Markov Chain.

INTRODUCTION:

The IT revolution is the fastest emerging revolution seen by the human race. With online education and Web-based information on the Internet, the volume of clicks on web sites has reached huge proportions. The Internet and the common use of educational databases have created a huge need for KDD methodologies. The Internet is an infinite source of data that can come either from Web content, represented by the billions of pages publicly available, or from Web usage, represented by the log information collected daily by all the servers around the world [1][2]. The information collected through data mining has allowed e-education applications to generate more revenue by making better use of the Internet, which helps students make decisions. Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing upon methodologies for mining useful information or knowledge from data [1]. Users leave navigation traces, which can be used as a basis for user behavior analysis. In the field of web applications, similar analyses have been successfully carried out by methods of Web Usage Mining [2][3]. The challenge of extracting knowledge from data draws upon research in statistics, databases, machine learning, pattern recognition, data visualization, optimization, web user behavior and high-performance computing, to deliver advanced business intelligence and web discovery solutions [3][4]. It is a powerful technology with great potential to help various industries focus on the most important information in their data warehouses. Data mining can be viewed as a result of the natural evolution of information technology.

In Web usage analysis, these data are the sessions of the site visitors: the activities performed by a user from the moment he enters the site until the moment he leaves it. Web usage mining consists of applying data mining techniques to analyze web users' activity. In educational contexts, it has been used for personalizing e-learning and adapting educational hypermedia, discovering potential browsing problems, automatically recognizing learner groups in exploratory learning environments, and predicting student performance. The discovered patterns are usually represented as collections of web pages, objects or resources that are frequently accessed by groups of users with common needs or interests [10][11].

Generally, users visit a web site sequentially: a user first visits the home page, then a second page, then a third, and then finishes his work. In doing so, the user leaves navigation marks on the server. These navigation marks are called navigation patterns, and they can be used to decide the next likely web page request based on statistically significant correlations. If a sequence occurs very frequently, it indicates the most likely traversal pattern. Because such patterns occur sequentially, Markov chains have been used to represent the navigation patterns of a web site; in a Markov chain, the present state depends on the previous state. If a web site contains many navigation patterns (interesting patterns), a high support threshold is assigned to it and less interesting patterns are ignored. So we can say that different threshold values need to be assigned at different levels of the web site.

Important properties of Markov Chain:

1. A Markov chain is successful in sequence matching and generation.
2. A Markov model depends on the previous state.
3. The Markov chain model is generative.
4. A Markov chain is a discrete-time stochastic process.

Due to the generative nature of Markov chains, navigation tours can be derived automatically. Sarukkai proposed a technique in which a Markov model predicts the next page accessed by the user [4][2]. Pitkow, Deshpande, and Dongshan and Junyi proposed various techniques for log mining using the Markov model [5][2].

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

METHODOLOGY:

The Markov model is one of the easiest ways of representing navigation patterns and a navigation tree. Suppose we have an educational web site of a university. The navigation pattern sequences are ABCDEF, ACF, ACE and BCD.

Navigation pattern                       Frequency of visit
SABCDEFT                                 3
SACFT                                    2
SACET                                    3
SBCDT                                    2
Total number of web site navigations     10

Table 1: Navigation pattern table
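The counts behind Table 1 can be tallied mechanically. The following is a minimal sketch, not the authors' implementation: the names `PATTERNS`, `count_transitions` and `count_visits` are illustrative assumptions, and each pattern is weighted by its frequency of visit.

```python
from collections import Counter

# Navigation patterns and visit frequencies taken from Table 1.
# "S" and "T" are the artificial start and terminal states.
PATTERNS = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}


def count_transitions(patterns):
    """Return weighted counts of each (from_page, to_page) step."""
    counts = Counter()
    for path, freq in patterns.items():
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += freq
    return counts


def count_visits(patterns):
    """Return weighted visit counts per page, ignoring S and T."""
    counts = Counter()
    for path, freq in patterns.items():
        for page in path:
            if page not in ("S", "T"):
                counts[page] += freq
    return counts


transitions = count_transitions(PATTERNS)
visits = count_visits(PATTERNS)
total_visits = sum(visits.values())

for page in sorted(visits):
    print(page, f"{visits[page]}/{total_visits}")
```

With the Table 1 data, the page visits total 39, which is the denominator used for the per-page visit probabilities in the text.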


From Table 1, the total probability of a visit to A is 8/39, B is 5/39, C is 10/39, D is 5/39, E is 6/39 and F is 5/39. Here $NP_{ij}$ is the navigation probability matrix, where $NP_{ij}$ is the probability that the next state will be $j$. The navigation probability is defined as $NP_{ij} \in [0, 1]$ and, for all $j$, $\sum_{i} NP_{ij} = 1$. The initial probability of a state is estimated from the number of times the page was requested by users, so every state has a positive probability.

The probability of a transition is calculated as the ratio of the number of times the corresponding sequence of pages was traversed to the number of times the hyperlinked page was visited. Each navigation pattern is bounded by two additional states, the Start state (S) and the Terminal state (T). The probability of a hyperlink is based on the content of the page being viewed. The navigation matrix is as follows (rows give the current page, columns the next page):

      A      B      C      D      E      F      T
A     0      3/5    1/2    0      0      0      0
B     0      0      1/2    0      0      0      0
C     0      0      0      1      1/2    2/5    0
D     0      0      0      0      1/2    0      1/5
E     0      0      0      0      0      3/5    3/10
F     0      0      0      0      0      0      1/2
T     0      0      0      0      0      0      1

This indicates that navigation control reaches T a total of 10 times.

The traditional Markov model has some limitations, which are as follows.

1. Low-order Markov models have good coverage but are less accurate because they use little history. High-order Markov models suffer from high state-space complexity.
2. In a higher-order Markov model, the number of states increases exponentially with the order of the model. This exponential growth in the number of states increases the search space and complexity. Higher-order Markov models also have a low coverage problem.

In the proposed model, each request with its time duration is considered as a state. A session is a sequence of such states. The m-step Markov model assumes that the next request depends only on the last m requests. Hence, the probability of the next request is calculated by

$$P(r_{n+1} \mid r_n \ldots r_1) = P(r_{n+1} \mid r_n \ldots r_{n-m+1}),$$

where $r_i$ is the $i$-th request in a session, $i = 1, 2, \ldots, n$, $r_n$ is the current request, and $r_{n+1}$ is the next request.
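The m-step assumption can be illustrated with a small frequency-based predictor. This is a sketch under stated assumptions rather than the paper's method: it ignores the time duration attached to each request, and the names `train` and `predict` are hypothetical. It uses the Table 1 patterns as training sessions.

```python
from collections import Counter, defaultdict

# Navigation patterns and frequencies from Table 1 (S = start, T = terminal).
PATTERNS = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}


def train(patterns, m):
    """Count which page follows each history of the last m requests."""
    model = defaultdict(Counter)
    for path, freq in patterns.items():
        for i in range(m, len(path)):
            history = path[i - m:i]
            model[history][path[i]] += freq
    return model


def predict(model, session, m):
    """Return (next_page, probability) for a session, or None if the
    last m requests were never seen during training."""
    followers = model.get(session[-m:])
    if not followers:
        return None
    page, count = followers.most_common(1)[0]
    return page, count / sum(followers.values())


first_order = train(PATTERNS, 1)
second_order = train(PATTERNS, 2)

print(predict(first_order, "SAC", 1))   # ('D', 0.5): only the current page C is used
print(predict(second_order, "SAC", 2))  # ('E', 0.6): the history A, C is used
print(predict(second_order, "SDC", 2))  # None: history D, C was never observed
```

The last call shows the coverage problem of higher-order models in miniature: a longer history is more specific, but it is also more likely never to have been observed.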

From this equation, if m = 1 (the 1-step model), the next request is determined only by the current request [5]. The matrix CM holds the conditional probabilities of the previous occurrence. The state matrix CM is a square matrix, so we need to calculate the probability of each page. We therefore need to design a model that is dynamic in nature, meaning that prediction is based on the next incoming and outgoing nodes. The construction of the Markov model starts with the first row of Table 1 (the first navigation pattern).
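One possible reading of a square matrix of conditional probabilities of the previous occurrence is sketched below: each transition count is divided by the number of arrivals at the destination page. This interpretation is an assumption, as are the names `navigation_matrix` and `STATES`; with the Table 1 data it yields values such as 3/5 for A to B and 3/10 for E to T.

```python
from collections import Counter
from fractions import Fraction

# Navigation patterns and frequencies from Table 1 (S = start, T = terminal).
PATTERNS = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}
STATES = ["A", "B", "C", "D", "E", "F", "T"]


def navigation_matrix(patterns, states):
    """Build matrix[src][dst] = count(src -> dst) / arrivals at dst."""
    trans = Counter()
    arrivals = Counter()
    for path, freq in patterns.items():
        for src, dst in zip(path, path[1:]):
            trans[(src, dst)] += freq
            arrivals[dst] += freq
    matrix = {}
    for src in states:
        matrix[src] = {
            dst: Fraction(trans[(src, dst)], arrivals[dst]) if arrivals[dst] else Fraction(0)
            for dst in states
        }
    # Assumption: the terminal state is treated as absorbing (T -> T = 1).
    matrix["T"]["T"] = Fraction(1)
    return matrix


cm = navigation_matrix(PATTERNS, STATES)
for src in STATES:
    print(src, [str(cm[src][dst]) for dst in STATES])
```

Exact fractions are used instead of floats so that the printed entries can be compared directly with hand-computed ratios.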

Table 2: Frequency of each node and its probability.
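The per-node frequencies can be kept in a linked node structure in which each node stores the page name, its visit count, an inlink list and an outlink list. The sketch below is one hedged way to do this: `PageNode`, `DynamicMarkovModel`, `add_pattern` and `predict_next` are hypothetical names, and storing traversal counts alongside the links is an assumption made so that the most probable next page can be read off directly.

```python
from dataclasses import dataclass, field


@dataclass
class PageNode:
    name: str
    count: int = 0                                 # visit frequency of the page
    inlinks: dict = field(default_factory=dict)    # source page -> traversals
    outlinks: dict = field(default_factory=dict)   # destination page -> traversals


class DynamicMarkovModel:
    def __init__(self):
        self.nodes = {}

    def _node(self, name):
        if name not in self.nodes:
            self.nodes[name] = PageNode(name)
        return self.nodes[name]

    def add_pattern(self, path, freq=1):
        """Merge one navigation pattern into the model, updating counts and links."""
        for page in path:
            self._node(page).count += freq
        for src, dst in zip(path, path[1:]):
            s, d = self._node(src), self._node(dst)
            s.outlinks[dst] = s.outlinks.get(dst, 0) + freq
            d.inlinks[src] = d.inlinks.get(src, 0) + freq

    def predict_next(self, page):
        """Return the most frequently followed outlink of a page, or None."""
        node = self.nodes.get(page)
        if node is None or not node.outlinks:
            return None
        return max(node.outlinks, key=node.outlinks.get)


model = DynamicMarkovModel()
for path, freq in {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}.items():
    model.add_pattern(path, freq)

print(model.nodes["C"].count)    # 10
print(model.predict_next("C"))   # D
```

Because the counts are updated in place, new sessions can be merged at any time without rebuilding the model, which is what makes the structure dynamic.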


Figure: First Order Dynamic Markov Model (for pattern 1)

Similarly, we create pattern chains for all the other patterns of Table 1.

Figure: First Order Dynamic Markov Model (for pattern 2)

We then summarize the above pattern chains into one model and set the in-links and out-links. Each node therefore contains the name of the web page, the count of the web page, an inlink list and an outlink list.

Figure: Dynamic Markov Model Node

The inlink list contains pointers to the in-linked web pages and the outlink list contains the out-linked web pages; every node also contains its frequency (as per Table 2). The frequency of a visited node changes whenever its number of inlink pointers increases, that is, whenever the page is visited by any user. This helps us to predict the next web page before the control leaves a page or reaches another page. With this dynamic Markov model it is possible to predict the most probable next web page accessed by the user.

CONCLUSION:

The main goal of this paper is to analyze hidden information from a large amount of log data. This paper emphasizes the dynamic Markov chain model among the different processes. We define a novel approach for similar kinds of web access patterns. This approach serves as a foundation for the web usage clustering that was described, and we conclude that web mining methods and clustering techniques are used for self-adaptive and intelligent websites to provide personalized service and performance optimization.

REFERENCES:

[1] Ajith Abraham, "Business Intelligence from Web Usage Mining", Journal of Information & Knowledge Management, Vol. 2, No. 4 (2003) 375-390.
[2] Jose Borges, Mark Levene, "An Average Linear Time Algorithm for Web Usage Mining", Sept 2003.
[3] Hengshan Wang, Cheng Yang, Hua Zeng, "Design and Implementation of a Web Usage Mining Model Based On Fpgrowth and Prefixspan", Communications of the IIMA, Volume 6 Issue 2.
[4] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", Volume 1 Issue 2, Page 13.
[5] Alice Marques, Orlando Belo, "Discovering Student Web Usage Profiles Using Markov Chains", The Electronic Journal of e-Learning, Volume 9 Issue 1, 2011 (pp. 63-74).
[6] Ji He, Man Lan, Chew-Lim Tan, Sam-Yuan Sung, Hwee-Boon Low, "Initialization of Cluster Refinement Algorithms: A Review and Comparative Study", Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004.
[7] Renata Ivancsy, Ferenc Kovacs, "Clustering Techniques Utilized in Web Usage Mining", International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, February 15-17, 2006 (pp. 237-242).
[8] Bradley P. S., Fayyad U. M., "Refining Initial Points for Kmeans Clustering", Advances in Knowledge Discovery and Data Mining, MIT Press.
[9] Ruoming Jin, Anjan Goswami, Gagan Agrawal, "Fast and Exact Out-of-Core and Distributed k-Means Clustering", Knowledge and Information Systems, Volume 10, Number 1, July 2006.
[10] Bhawna N., Suresh J., "Generating a New Model for Predicting the Next Accessed Web Page in Web Usage Mining", Third International Conference on Emerging Trends in Engineering and Technology, ICETET.2010.56.
[11] Bindu Madhuri, Anand Chandulal J., Ramya K., Phanidra M., "Analysis of Users Web Navigation Behavior using GRPA with Variable Length Markov Chains", IJDKP.2011.1201.

AUTHORS PROFILE

Harish Kumar completed his M.Tech (IT) in 2009 at Guru Gobind Singh Indraprastha University, Delhi. He is currently pursuing his PhD at Mewar University, Chittorgarh.

Prof. A.K. Solanki, Director of the Institute, obtained his Ph.D. in Computer Science & Engineering from Bundelkhand University, Jhansi. He has published a good number of international and national research papers in the area of data warehousing and web mining, and he is always ready to teach these subjects to his students, which he does with great finesse.
