
Ernestina Menasalvas

emenasalvas@fi.upm.es

Facultad de Informática, Universidad Politécnica de Madrid, May 2004

Introduction and motivation


The Internet is a communication channel. Technology is needed to develop new services, security, infrastructure and analysis. Web mining analyzes usage patterns so that services respond to user needs. Most web mining projects developed so far have not taken into account the context in which they were developed:
In a competitive society, success criteria depend on both:
User satisfaction and an increase in the sponsors' benefit

The gap between technology development on the web and business factors is increasing and, as a side effect, it separates what technologists develop from what companies need. Knowing that the problem exists is just the beginning: technological projects have to be integrated into the global strategy of the company.

The problem
Innovative ideas in e-commerce are vaguely defined, so they lose focus and precision. New technologies are being applied, consuming resources without appropriate financial or economic benefits. The growth of web activity, which now takes part in every daily activity (commercial, educational, news, ...), is not being matched by a corresponding number of services. Services are considered insufficient, so site sponsors have to improve the services offered to satisfy the growing demand. On the other hand, the growth in offers will bring a growth in demand, which will lead consumers to ask for better services. Web mining projects have to be planned as one more project within the global strategy of the company.

Web Site personalization


Optimizing and personalizing the user's web experience is crucial for attracting and retaining electronic, web-based commerce customers. The aim is to maintain a one-to-one relationship. Identifying future behaviour is crucial for the site to act proactively. Information about user experience is captured in clickstream logs: pages viewed, timing, and sequence. Solutions given:
Clustering of users, clustering of pages, most visited paths, recommender systems. How are they deployed? How has each method been evaluated? How does it help the company? How does it evolve over time?
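As an illustration of the "most visited path" solution, one simple reading is to count contiguous page sequences of a fixed length across sessions. This is a minimal sketch, assuming sessions have already been reconstructed from the clickstream log as lists of page identifiers; the function `most_visited_paths` and the page names are hypothetical.

```python
from collections import Counter


def most_visited_paths(sessions, length=3, top=5):
    """Count contiguous page paths of a given length across all sessions."""
    counts = Counter()
    for pages in sessions:
        # Slide a window of `length` pages along each session.
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(top)


sessions = [
    ["home", "catalog", "product", "cart"],
    ["home", "catalog", "product", "reviews"],
    ["home", "search", "product", "cart"],
]
print(most_visited_paths(sessions, length=3, top=2))
```

The path ("home", "catalog", "product") appears in two sessions and comes first; the window length controls how specific the reported paths are.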

The question:

Web Mining project evaluation


The criteria used to evaluate the success of a site do not take external (commercial) aspects into account. Aspects such as increasing sales volume, fraud reduction, customer retention or competitive prices are not explicitly tackled. The success of a web site is measured in terms of efficiency and quality:
Efficiency: number of pages accessed during a session, length of the session and actions performed. Quality: response time of the site to user requests, page accessibility, visitors per page.
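The efficiency indicators above can be computed directly from the hits of one session. A minimal sketch, assuming each hit is a hypothetical `Hit` record with a page, a timestamp and an optional action name; session length is taken from the first to the last hit, since the time spent on the final page is not visible in the log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Hit:
    """One clickstream entry of a session (hypothetical record layout)."""
    page: str
    timestamp: datetime
    action: Optional[str] = None  # e.g. "add_to_cart"; None for a plain page view


def session_efficiency(hits):
    """Efficiency indicators for one session whose hits are in time order."""
    if not hits:
        return {"page_views": 0, "distinct_pages": 0, "length_secs": 0.0, "actions": 0}
    return {
        "page_views": len(hits),
        "distinct_pages": len({h.page for h in hits}),
        # First-to-last hit; time spent on the final page is not observable in the log.
        "length_secs": (hits[-1].timestamp - hits[0].timestamp).total_seconds(),
        "actions": sum(1 for h in hits if h.action is not None),
    }


session = [
    Hit("home", datetime(2004, 5, 3, 10, 0, 0)),
    Hit("catalog", datetime(2004, 5, 3, 10, 0, 40)),
    Hit("product", datetime(2004, 5, 3, 10, 2, 0)),
    Hit("cart", datetime(2004, 5, 3, 10, 3, 30), action="add_to_cart"),
]
print(session_efficiency(session))
```

Note that these are exactly the site-centred measures the slide criticizes: none of them says whether the session generated value for the company.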

Company success is evaluated in terms of:


Income, outcomes, expenses, ROI, market presence
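For reference, the usual definition of return on investment relates the income attributable to a project to its expenses; applied to a web project, it would look as follows (the figures are purely illustrative):

```latex
\mathrm{ROI} = \frac{\text{income} - \text{expenses}}{\text{expenses}},
\qquad \text{e.g.}\quad \frac{150\,000 - 100\,000}{100\,000} = 0.5 = 50\%
```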

The differences between the criteria used to evaluate the success of any project in the enterprise and those used for a web project are at the root of web mining's incomplete success. Site sponsors do not evaluate commercial and financial aspects and rely only on vague commercial notions. Success in terms of use, structure and content has to be linked to the achievement of the company's business goals.

Web Mining project management


An enterprise is a system designed to fulfil certain goals by integrating different resources. Its subsystems are at the same time interrelated and interdependent. When the company uses the web as a channel, all the services, infrastructure, ... have to be seen as one of the subsystems. The success of a solution in the web subsystem has to be related to the behaviour of the rest of the subsystems. Web mining projects are concerned with the web subsystem, so a web mining project is not only an IT problem. Apply a project management methodology to control the process: a project manager is needed, a different role from the data miner. Identify the data mining problems and apply CRISP-DM to each of them.

Web Mining Project management (cont)


To properly deal with a data mining project we need explicit information about the company. In the company environment, identify:
The structure of the company (departments, sections, channels, ...).
The goals of the company and its success criteria (both at the top level and at the department level).
Resources, constraints and any factor that can determine the goal analysis and the development of a web project.
The web project goals and their relationship with the goals of the company.


To evaluate whether the web mining project results contribute to fulfilling the company goals:
The web site is usually not the end but the means: it is one of the channels the company uses to achieve its goals. So, for a site to be considered successful, the activities carried out through it must generate value for the company.

Traditional approaches only analyze the site from the user's perspective, but the users' actions have to generate value for the company. It is a CRM project. Web project plan generation

CRM project: the three legs


[Figure: the three legs of CRM. Operational CRM: back office (ERP/ERM, order management, supply chain management, order promising, legacy systems), front office (service automation, marketing automation, sales automation) and mobile office (mobile sales, field service). Analytical CRM: data warehouse (customer activity, customers, products) and vertical applications (category management, marketing automation, campaign management). Collaborative CRM: customer interaction through voice (IVR, ACD), conferencing, web conferencing, e-mail response management, fax, letter and direct interaction. The legs are linked by closed-loop processing (EAI toolkits, embedded/mobile agents).]

Data Mining

[Figure: pyramid of increasing potential to support business decisions. From bottom to top: data sources (paper, files, information providers, database systems, OLTP); data warehouses / data marts (OLAP, MDA); data exploration (statistical analysis, querying and reporting); data mining (information discovery); data presentation (visualization techniques); making decisions. Users, from top to bottom: end user, business analyst, data analyst, DBA.]

Fact Gap

Fact gap: the difference between the available information and the ability to make decisions based on that information (Gartner Group).

Data Mining gives the intelligence


Databases give the data, but intelligence is needed to explore the data, to find the patterns, rules and ideas that explain what is going on, and to predict what will happen. Techniques and tools are needed to add this intelligence to the data in order to extract the maximum benefit from it. But tools alone do not (nowadays) provide the intelligence: it has to be provided by EXPERTS and translated into the data for better understanding.

Data warehouses and databases are the support.

Data Mining standard process model: CRISP-DM


[Figure: CRISP-DM phases: problem understanding, data understanding, data preparation, modeling, evaluation and deployment.]

Building the bridge


In order to provide users with the most appropriate solution, the data to be analyzed have to be enriched with business information. Business problems have to be translated into data mining problems. Results have to be understandable not only by data mining experts but also by end users. The semantics underlying the data mining solution has to be settled.

Deeper analysis of personalization


What is personalization? Observing user-web page interactions to identify patterns that:
indicate high-level user activity, anticipate future user activity, and make it possible to act proactively.

What is going to be personalized?


The site: its pages are adapted according to the user's behaviour or patterns.

Why is personalization needed?


To improve site performance. The web is just another channel, so site performance has to do with furthering the goals of the company.

Who is the user?


Navigator, customer

Web Data to be analyzed


In any web mining problem we have data related to:
Pages; navigators and navigation; customers and their transactions

Web logs are just the beginning. Not only the data have to be taken into account, but also all the circumstances under which they were collected, i.e. the environment:
General, organization-related, customer-related

Environment
It affects, both directly and indirectly, the way activities occur. Among the factors to take into account:
Legal conditions, technological conditions, demography, ecological conditions (weather, transport, communications), cultural and social conditions, geographical situation

Take into account the location of the site, of the navigator, ...

Information to be added
Departments: the same concept can have different meanings depending on the department; a product for marketing is not the same as a product for production.
Products, services: data per se of the object (size, color, ...); data relevant for the company (profit margin, top ten, ...); how it is presented on the web.
People, consumers in general: static data (gender, demographic information, which varies over time but is static at a particular moment); roles (behaviour with the company being analyzed: number and kind of transactions he/she performs); behavioural data related to the environment (economy, legal constraints, climate, ...).
Navigators: web log (location (IP), time, browser, ...); behaviour compared with the normal one, if any, to discover mood, a different location, ...
Dates: a date by itself has no meaning; add legal and fiscal periods, holidays, weekends, openings, closures, ...
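As an example of this enrichment, a raw log record can be extended with the business data of the product it refers to and with calendar attributes for its date. A minimal sketch; the record fields, `PRODUCT_INFO`, `HOLIDAYS` and `enrich` are hypothetical stand-ins for data that the departments and the legal/fiscal calendar would supply.

```python
from datetime import date, datetime

# Hypothetical business data per product, as the marketing department might supply it.
PRODUCT_INFO = {
    "p17": {"margin": 0.35, "top_ten": True},
    "p42": {"margin": 0.10, "top_ten": False},
}
# Hypothetical calendar of non-working days; a real project would use the legal/fiscal calendar.
HOLIDAYS = {date(2004, 5, 1)}


def enrich(record):
    """Return a copy of a raw log record with business and calendar attributes added."""
    when = record["timestamp"]
    enriched = dict(record)
    # Business meaning of the product, if the page refers to one.
    enriched.update(PRODUCT_INFO.get(record.get("product"), {}))
    # A date on its own means little; these derived attributes carry the business context.
    enriched["weekend"] = when.weekday() >= 5
    enriched["holiday"] = when.date() in HOLIDAYS
    enriched["hour"] = when.hour
    return enriched


raw = {"ip": "192.0.2.10", "timestamp": datetime(2004, 5, 1, 18, 30),
       "page": "/product", "product": "p17"}
print(enrich(raw))
```

The raw record is left untouched, so the same log line can be enriched differently for different departments.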

Data enrichment
There is no method and no model to follow: it is more an art, learned only with experience. Projects in the same domain share the enrichment, so:
A model could be established. Evaluate whether the data are appropriate to mine. Evaluate the kind of patterns that can be obtained. Evaluate whether a certain pattern cannot be obtained.

Metadata about the data is needed

The business meaning of each value, attribute, page, action, ...

The metadata for each attribute has to include its semantics:

Meaning, used to group attributes: demographic, behavioural, environmental, social, cultural. Business value. Circumstances. Constraints. Relationship with other concepts.

An ontology of concepts? Integrate the metadata so that the mining activity deals with it.
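The per-attribute semantics listed above could be recorded in a small schema that the mining process can query. A minimal sketch; `Group`, `AttributeMetadata`, `attributes_in` and the example entries are hypothetical, and a full solution might instead use the ontology mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Group(Enum):
    """Semantic groups named in the text."""
    DEMOGRAPHIC = "demographic"
    BEHAVIOURAL = "behavioural"
    ENVIRONMENTAL = "environmental"
    SOCIAL = "social"
    CULTURAL = "cultural"


@dataclass
class AttributeMetadata:
    """Semantic metadata attached to one mined attribute (hypothetical schema)."""
    name: str
    meaning: str
    group: Group
    business_value: str
    circumstances: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    related_to: list[str] = field(default_factory=list)


def attributes_in(catalog, group):
    """Names of the attributes that belong to a semantic group."""
    return [a.name for a in catalog if a.group is group]


catalog = [
    AttributeMetadata("gender", "customer's declared gender", Group.DEMOGRAPHIC,
                      "segmentation", constraints=["personal data: legal restrictions"]),
    AttributeMetadata("session_length", "seconds between first and last hit", Group.BEHAVIOURAL,
                      "engagement indicator", related_to=["page_views"]),
    AttributeMetadata("ip_location", "location derived from the IP address", Group.ENVIRONMENTAL,
                      "regional offers", circumstances=["approximate: proxies hide the real location"]),
]
print(attributes_in(catalog, Group.BEHAVIOURAL))
```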

Data Modelling and deployment


Once the data are enriched, the extracted patterns can be interpreted according to:
User profiles; session value (according to certain goals); period of the day

The solution has to be deployed and integrated into the site structure. Patterns evolve over time as new data come in, so models have to be refined. Establish the basis for refining the model without a decrease in performance.
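One way to refine a model as new data arrive, without retraining on the whole history, is to update it incrementally and let old evidence fade. A minimal sketch, assuming page popularity counts as the model and an exponential decay per batch of new page views; `DecayingPageCounts` and the decay value are hypothetical, not the method used in the case study.

```python
from collections import defaultdict


class DecayingPageCounts:
    """Page popularity refined incrementally; old evidence is multiplied by `decay` per batch."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.counts = defaultdict(float)

    def update(self, batch):
        # Age the existing evidence (values only, so iterating the dict is safe)...
        for page in self.counts:
            self.counts[page] *= self.decay
        # ...then add the new batch of page views.
        for page in batch:
            self.counts[page] += 1.0

    def top(self, n=3):
        """The n most popular pages under the current, decayed counts."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


model = DecayingPageCounts(decay=0.5)
model.update(["exams", "exams", "practicals"])
model.update(["practicals", "practicals"])
print(model.top(1))
```

Each update costs time proportional to the number of known pages plus the batch size, independent of how much history has already been seen.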

Web Mining infrastructure


[Figure: web mining infrastructure. The user's HTTP client exchanges HTTP requests and responses with an interface agent placed in front of the original website. Below it sit a decision layer (user agent, user models, action plans, planning agents), a CRM services provider layer (agents providing services and information, operational plans) and a semantic layer (agents, models, web logs).]

Case-study: act according to the value of the current session


Patterns to help:
Predict user behavior based on current behavior, not identity. Abstract user behavior with varying degrees of granularity => subsessions. Estimate the value of the session in order to act accordingly.

Subsessions capture/approximate user state information. Key concept: frequent behavior paths. A Markov model predicts the next set of pages and behavior. A webhouse stores information about users. Apache is modified to provide pop-ups and precaching.

Case-study

1. Find behavior rules


Partial tree: Define break points as decision points in the path. Use them to create rules. Knowing PIND allows us to predict a set of pages to follow....

[Figure: partial tree of a click path: between two break points, a PIND page is followed by PDEP pages.]

Behaviour rules (page names translated from Spanish):
Main page, Notice board => Exams
Main page, Notice board => Practicals, Support material Practical 1
Main page, Notice board => Practicals, Support material Practical 2

[Figure: navigation tree from the main page through the notice board to the exams and practicals pages and the support material for practicals 1 and 2, marking the decision page and the target pages.]
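Rules of this kind can be approximated by looking at what users do after a decision page. A minimal sketch, assuming a single decision page as the antecedent and counting, for each visit to it, the pages seen before the next break point; `behaviour_rules`, the confidence threshold and the example sessions are hypothetical and only mirror the page names of the example above.

```python
from collections import Counter


def behaviour_rules(sessions, decision_page, breakpoints, min_conf=0.5):
    """Rules 'decision_page => target page' with confidence, mined from past sessions.

    After each visit to `decision_page`, the pages seen before the next break point
    form its segment; a page's confidence is the share of segments that contain it.
    """
    followers = Counter()
    segments = 0
    for pages in sessions:
        for i, page in enumerate(pages):
            if page != decision_page:
                continue
            segments += 1
            seen = set()
            for nxt in pages[i + 1:]:
                if nxt in breakpoints:
                    break  # the next decision point closes this segment
                seen.add(nxt)
            followers.update(seen)
    if segments == 0:
        return []
    return [(decision_page, page, n / segments)
            for page, n in followers.most_common() if n / segments >= min_conf]


sessions = [
    ["Main page", "Notice board", "Exams"],
    ["Main page", "Notice board", "Practicals", "Support material Practical 1"],
    ["Main page", "Notice board", "Practicals", "Support material Practical 2"],
    ["Main page", "Notice board", "Exams", "Main page"],
]
for rule in behaviour_rules(sessions, "Notice board", {"Main page", "Notice board"}):
    print(rule)
```

With a threshold of 0.5, only Exams and Practicals are predicted after the notice board; lowering it also brings in the two support-material pages.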

2. Find Subsessions
Sessions may be described in terms of subsessions, e.g. browse catalog, browse shipping information, browse privacy notices, perform purchase. Subsessions may be defined in a number of ways, according to the desired semantics, e.g. using breakpoints.

[Figure: click path and subsessions. Top: real-time user web page access path, with identified frequent paths (PIND, PDEP). Bottom: the web page access path expressed as a sequence of subsessions.]
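Using breakpoints, a click path can be cut into subsessions and each subsession named after the breakpoint page that opens it. A minimal sketch; the breakpoint pages in `SUBSESSION_LABELS` and the helper functions are hypothetical, chosen to match the subsession examples in the text.

```python
# Hypothetical breakpoint pages and the subsession each one opens.
SUBSESSION_LABELS = {
    "catalog": "browse catalog",
    "shipping": "browse shipping information",
    "privacy": "browse privacy notices",
    "checkout": "perform purchase",
}


def split_subsessions(pages, breakpoints):
    """Split a click path into subsessions, opening a new one at every breakpoint page."""
    subsessions, current = [], []
    for page in pages:
        if page in breakpoints and current:
            subsessions.append(current)
            current = []
        current.append(page)
    if current:
        subsessions.append(current)
    return subsessions


def label_subsessions(subsessions, labels=SUBSESSION_LABELS):
    """Describe a session as a sequence of subsession labels (the first page decides)."""
    return [labels.get(sub[0], "other") for sub in subsessions]


path = ["home", "catalog", "item1", "item2", "shipping", "rates",
        "catalog", "item3", "checkout", "payment"]
subs = split_subsessions(path, set(SUBSESSION_LABELS))
print(label_subsessions(subs))
```

The resulting label sequence is a coarser description of the same session, which is what the next step models.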

3. Markov models to predict behavior and paths


[Figure: Markov model linking behaviors (Behavior X, Behavior Y, ...) through break points BK N, BK M, BK P, ..., with sessions 1 to 6 and departments Dep1, Dep2 and Dep3.]
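A Markov model predicts the next step from the current one using transition frequencies observed in past sessions. A minimal first-order sketch, assuming the states are subsession labels (pages would work the same way); `MarkovPredictor` and the example sessions are hypothetical, and the case study's model may use a different order or state definition.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """First-order Markov model: P(next | current) estimated from transition counts."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, sessions):
        for states in sessions:
            # Count each consecutive (current, next) pair.
            for current, nxt in zip(states, states[1:]):
                self.transitions[current][nxt] += 1
        return self

    def predict(self, current, k=2):
        """Up to k most likely next states with their estimated probabilities."""
        counts = self.transitions.get(current)
        if not counts:
            return []
        total = sum(counts.values())
        return [(state, n / total) for state, n in counts.most_common(k)]


sessions = [
    ["browse catalog", "browse shipping information", "perform purchase"],
    ["browse catalog", "perform purchase"],
    ["browse catalog", "browse privacy notices"],
    ["browse catalog", "perform purchase"],
]
model = MarkovPredictor().fit(sessions)
print(model.predict("browse catalog", k=2))
```

Predicting the most likely next pages is what allows the site to precache them or to prepare a pop-up before the user gets there.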

4. Per-user analysis: average time spent on each page


[Figure: average time (secs, 0 to 60) spent by one user on each of 23 URLs.]
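The per-user average can be estimated from consecutive hits of the same user. A minimal sketch; `average_time_per_page` is hypothetical, time on a page is approximated as the gap to the next hit, and the 30-minute cut-off used to ignore gaps between separate visits is an assumption, not a value from the case study.

```python
from collections import defaultdict
from datetime import datetime


def average_time_per_page(hits, max_gap_secs=1800):
    """Average seconds one user spends on each URL.

    `hits` holds (url, timestamp) pairs in time order. Time on a page is approximated
    as the gap to the next hit; gaps longer than `max_gap_secs` (an assumed session
    timeout) and the final hit are skipped because their real duration is unknown.
    """
    totals, visits = defaultdict(float), defaultdict(int)
    for (url, t), (_, t_next) in zip(hits, hits[1:]):
        gap = (t_next - t).total_seconds()
        if gap <= max_gap_secs:
            totals[url] += gap
            visits[url] += 1
    return {url: totals[url] / visits[url] for url in totals}


hits = [
    ("/home", datetime(2004, 5, 3, 10, 0, 0)),
    ("/exams", datetime(2004, 5, 3, 10, 0, 20)),
    ("/home", datetime(2004, 5, 3, 10, 1, 20)),
    ("/practicals", datetime(2004, 5, 3, 10, 1, 50)),
    ("/home", datetime(2004, 5, 3, 14, 0, 0)),
]
print(average_time_per_page(hits))
```

Here "/practicals" gets no estimate, because the next hit comes hours later and is treated as a new visit.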

5. Online Value evolution


[Figure: online value evolution of sessions 1, 2 and 3: session value (from -5 to 35) plotted against the traversed number of links (1 to 15).]
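The curves above track the value of a session link by link. The source does not give the value function, so the following is only a minimal sketch under a loud assumption: each page carries a hypothetical value derived from the site goals (positive when it serves a goal, negative when it suggests the user is getting lost), the session value is the running sum, and a hypothetical trigger fires when the value has stopped growing, which is when the site could react, e.g. with a pop-up.

```python
from itertools import accumulate

# Hypothetical page values derived from site/department goals; unknown pages count as 0.
PAGE_VALUE = {"main": 0, "exams": 5, "practical1": 8, "support1": 4, "help": -2}


def value_evolution(path, page_value=PAGE_VALUE):
    """Cumulative session value after each traversed link."""
    return list(accumulate(page_value.get(page, 0) for page in path))


def needs_action(values, window=3):
    """Hypothetical trigger: the value has not grown over the last `window` links."""
    return len(values) > window and values[-1] <= values[-1 - window]


values = value_evolution(["main", "practical1", "support1", "help", "help", "main"])
print(values)
print(needs_action(values))
```

Because the value is updated after every click, the check can run while the user is still navigating rather than after the session ends.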

Benefits of the algorithm


It makes it possible to know at any point whether the ongoing navigation will be beneficial for the site, so that the site can be dynamically adjusted accordingly. It quantifies the value of a user session while he or she is navigating. It brings the user-site relationship closer to real-life relationships. The algorithm integrates the site/department goals:
It sends pop-ups to students according to the exercises they have already done. Professors can establish preferences, and the rules are changed accordingly.

Conclusion
Without proper project management:
It is difficult to obtain significant patterns. The results are difficult to interpret. The potential of the process is minimized.

Site goals have to be integrated. Algorithms alone are of no use: the best algorithm does not always mean the best result, and the patterns have to be deployed in a proper architecture.

THANKS! QUESTIONS???
