
By Group 7

Nancy (1301021)
Subhash Rajeev (1301044)
Vishnu Poduval (1301049)
Vishal Wagh (1301050)
Eswar Sunil Kumar (1301053)
Aaron Ernest (1301061)

Objectives of the study


Business Objective
To gauge the buying behavior of customers from an e-commerce point of view, and thereby to identify the major issues that prevent different classes of users from using the internet for making purchases

Factors Under Consideration
- General demographics of the users
- Technology demographics of the users
- Internet shopping habits
- Web and Internet usage habits

Source of Database
GVU's 8th WWW User Survey (run from October 10, 1997 to November 16, 1997)
Special pointers to the survey were provided by Yahoo, Netscape and WebTV

Mining
Main challenge faced: branching, i.e. categorization of the data


Data categorization (diagram): the survey fields were grouped into the branches General, Technology, Internet Usage, Privacy and E-commerce. Fields shown in the diagram: Gender; Primary Language; Registered to Vote; Most Important Issue Facing the Internet; Connection Speed & Upgrades; Email Accounts; Equipment Owned; Frequency of Switching Browsers; Years on Internet; Indispensable Technologies; Cookie Privacy; Reasons for Using the Web for Personal Shopping; Time Spent Searching; Frequency of Use; Internet Laws; Personal Shopping Services; Content Providers Have Right to Resell User Information; Navigation; Success Rate of Personal Shopping

- Which classes of customers have or haven't taken to shopping online? What are their traits?
- What are the fundamental reasons for people not purchasing online?
- How sensitive are these reasons to economic classification, geography, gender and occupation?
- Is there a correlation between time spent on the internet and purchase habits?
- How much do education levels matter in segmenting online customers?

Overview of Dataset
Dataset: numeric, stored in an ASCII file
Number of records: 10,044; fields: 60
The entire data set has been recoded as numeric, with an index to the codes described in an additional "Coding" file.
After cleaning: 7,290 records, of which 1,822 are training records and 5,468 are test records
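The training/test counts above are consistent with the 1-in-4 sampling listed under the sampling methodology (training includes 1 in 4, testing discards 1 in 4). The sketch below assumes that the 4th, 8th, 12th, ... cleaned records (1-based) went to training and everything else to testing; the exact record positions chosen in the study are not stated, and `one_in_n_split` is a hypothetical helper, so treat this as an illustration of the arithmetic rather than the original procedure.

```python
def one_in_n_split(records, n=4):
    """Split records into (training, test) with 1-in-n systematic sampling.

    Assumption: records at 1-based positions n, 2n, 3n, ... go to training
    ("include 1 in n"); all other records go to test ("discard 1 in n").
    """
    training = [r for i, r in enumerate(records, start=1) if i % n == 0]
    test = [r for i, r in enumerate(records, start=1) if i % n != 0]
    return training, test


cleaned = list(range(7290))  # stand-in for the 7,290 cleaned survey records
training, test = one_in_n_split(cleaned)
print(len(training), len(test))  # 1822 5468
```

Because the two selections are complements of each other, no record appears in both the training and the test set.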

Variables (Demographic Segmentation)
Age
Country
Gender
Education_Attainment
Race
Major Geographical Location
Sexual Preference
Marital Status
Household Income

Variables (Psychographic Segmentation)
Years on Internet
Major Occupation
Who_pays_for_access
Willingness_to_Pay_Fees
Most_important_Issue_Facing_the_Internet
Primary_Place_of_WWW Access

The training dataset was used to develop the models, which were later evaluated on the test data. Some of the fields are highly correlated with each other; in most such cases, only one of them has been used in the analysis.
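One simple way to keep only one field from each highly correlated group is a greedy filter on pairwise Pearson correlation of the numeric codes. The deck does not say which correlation measure or cut-off was used, so `drop_correlated`, the 0.8 threshold, the field `Hours_per_Week` and all values below are hypothetical; for purely nominal codes, an association measure designed for categories would usually be more appropriate than Pearson's r.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither field is constant


def drop_correlated(fields, threshold=0.8):
    """Greedily keep a field only if |r| with every field kept so far is <= threshold.

    `fields` maps field name -> list of numeric codes; insertion order decides
    which member of a correlated pair survives (a hypothetical rule).
    """
    kept = []
    for name, values in fields.items():
        if all(abs(pearson(values, fields[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values, not survey data; "Hours_per_Week" is a hypothetical field.
toy = {
    "Years_on_Internet": [1, 2, 3, 4, 5],
    "Hours_per_Week": [2, 4, 6, 8, 11],
    "Age": [30, 22, 45, 28, 35],
}
print(drop_correlated(toy))  # ['Years_on_Internet', 'Age']
```

Here `Hours_per_Week` is dropped because r with `Years_on_Internet` is about 0.996, while `Age` (r about 0.29) is kept.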

Methodology
- Data cleaning
- Sampling: training and test sets
- Demographic segmentation models trained:
  - C5.0 with balancing
  - C5.0 with balancing and boosting
  - C5.0 with balancing, boosting and misclassification costs
- Psychographic segmentation models trained:
  - C5.0 balanced
  - C5.0 boosted
  - C5.0 with misclassification costs
  - Neural network
  - Boosted neural network
- Testing of models
- Interpreting the results of the most accurate and suitable model
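C5.0's boosting option builds a sequence of classifiers, each trained with more weight on the records the previous ones misclassified, and combines their predictions by voting. The sketch below shows that idea as plain discrete AdaBoost with one-level decision stumps on a single numeric field. It is a swapped-in, simplified technique rather than C5.0's own boosting procedure, and the toy data and all function names are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """One-level tree: predict `polarity` if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, trials):
    """Discrete AdaBoost; labels are +1 (shopped online) / -1 (did not)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(trials):
        err, t, pol = fit_stump(xs, ys, weights)
        if err >= 0.5:
            break  # no stump beats chance on the reweighted data
        err = max(err, 1e-10)  # guard against log/division blow-up for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Raise the weight of misclassified records, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): years on the internet vs. shopped online.
xs = [0, 1, 1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, -1, 1, 1, 1]
one = adaboost(xs, ys, trials=1)
two = adaboost(xs, ys, trials=2)
print(predict(one, 3), predict(two, 3))  # 1 -1
```

The first stump misclassifies the record at x = 3; the second trial concentrates on it and, in the weighted vote, corrects it.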

Sampling methodology used:
Training: include 1 in 4
Testing: discard 1 in 4
Target variable: whether the user has shopped online
Variable type: flag
Distribution in training data: 78% Yes, 22% No
Balancing factor used: 2.25
Misclassification penalty for demographic segmentation: 8
No. of trials used for boosting:
Evaluation parameters used: gains charts and confusion matrices

Other models were tried as well; only these are presented here because they gave the most accurate results in the confusion matrices and lift curves.
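The balancing factor and misclassification penalty in the table can be read as simple arithmetic. The sketch below assumes (the deck does not say) that the factor 2.25 was applied to the minority "No" records, and that the penalty of 8 was charged for classifying an actual online shopper as "No", with a cost of 1 for the opposite error. Under those assumptions it computes the class mix after balancing and the probability cut-off implied by the costs. C5.0 itself uses costs while growing the tree, so the threshold here only illustrates the expected-cost rule, not the tool's internal mechanism; the function names are hypothetical.

```python
def balanced_shares(class_shares, factors):
    """Multiply each class share by its balancing factor and renormalise.

    Assumption: a factor > 1 acts like duplicating that class's records.
    """
    weighted = {c: share * factors.get(c, 1.0) for c, share in class_shares.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


def cost_threshold(fn_cost, fp_cost=1.0):
    """Smallest P(Yes) at which predicting Yes has lower expected cost than No.

    Predict Yes when (1 - p) * fp_cost < p * fn_cost, i.e. p > fp / (fp + fn).
    """
    return fp_cost / (fp_cost + fn_cost)


shares = balanced_shares({"Yes": 0.78, "No": 0.22}, {"No": 2.25})
print(round(shares["Yes"], 3), round(shares["No"], 3))  # 0.612 0.388
threshold = cost_threshold(fn_cost=8)
print(round(threshold, 3))  # 0.111
```

Under these assumptions, balancing moves the training mix from 78/22 to roughly 61/39, and a penalty of 8 means a record is labelled "Yes" once its estimated probability of shopping online exceeds 1/9.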

Methodology: Models Used
(Diagram with Demographic and Psychographic model panels)

Results
Demographic
Confusion matrices for: C5.0 with balancing; C5.0 with balancing and boosting; C5.0 with balancing, boosting and misclassification costs

C5.0 with balancing, boosting and misclassification costs has the highest model accuracy on the test data.
It also has the highest number of true positives.
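Accuracy and the number of true positives are both read from a 2x2 confusion matrix of actual vs. predicted labels on the test set. A minimal sketch with hypothetical toy labels (not the study's results), treating "Yes" (has shopped online) as the positive class:

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Count TP, FN, FP and TN for a binary flag target."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def accuracy(cm):
    """Share of records classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


actual = ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes"]
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))  # {'TP': 4, 'FN': 1, 'FP': 1, 'TN': 2} 0.75
```

Comparing models by TP alone favours models that predict "Yes" often, which is why it is read together with overall accuracy.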

Results
Demographic
Gains charts for: C5.0 with balancing; C5.0 with balancing and boosting; C5.0 with balancing, boosting and misclassification costs

C5.0 with balancing, boosting and misclassification costs shows the highest lift on the test data.
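Lift compares how many positives a model captures in its top-scored slice of records with what random selection would capture: lift = (share of all positives captured) / (share of records selected). The sketch below builds cumulative gains and lift at quartile cut-offs from hypothetical toy scores; `cumulative_gains` and `bins=4` are arbitrary illustrative choices, and gains charts are usually drawn at finer percentiles.

```python
def cumulative_gains(scores, actual, positive="Yes", bins=4):
    """Return (share of records selected, share of positives captured, lift)
    at each of `bins` equal cut-offs, after ranking records by score (highest first)."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    total_pos = sum(a == positive for a in ranked)
    n = len(ranked)
    rows = []
    for k in range(1, bins + 1):
        cutoff = round(n * k / bins)
        captured = sum(a == positive for a in ranked[:cutoff])
        gain = captured / total_pos
        frac = cutoff / n
        rows.append((frac, gain, gain / frac))
    return rows


# Hypothetical model scores (probability of "Yes") and actual outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
actual = ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"]
for frac, gain, lift in cumulative_gains(scores, actual):
    print(f"{frac:.0%} of records -> {gain:.0%} of shoppers, lift {lift:.2f}")
```

A lift above 1 in the first slices means the model ranks actual shoppers ahead of non-shoppers better than chance; at 100% of records the lift is always 1.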

Results
Demographic: balanced C5.0 model with boosting and misclassification penalties

Results
Psychographic
Panels: normal C5.0 with balancing; simple neural network; C5.0 with misclassification penalties; C5.0 with boosting; neural network with boosting

Results
Psychographic
Panels: normal C5.0 with balancing; C5.0 with misclassification penalties; C5.0 with boosting; simple neural network; neural network with boosting

Results
Psychographic: simple neural network

Time spent on the internet appears to be a crucial factor in whether a user decides to purchase online.

Interpretation and Concluding Remarks

Thus we can see that age, years spent on the internet and primary place of internet access are the most important characteristics in deciding whether users buy online.
Surprisingly, purchasing power (monthly income) and occupation had much lower importance when it came to predicting whether a user would purchase online.
However, since this data is from 1997, when users were new to the internet, it makes sense that familiarity with the medium was necessary in order to reach consumers.
E-commerce websites could therefore choose to advertise only online, since their buyers were mainly people who were familiar with the internet; this helped them reduce the cost of advertising compared with a TV commercial.
E-commerce shoppers during this era were early adopters in the product life cycle: they were technology savvy, already used to the internet, and trusted it as well.
