Nancy (1301021)
Subhash Rajeev (1301044)
Vishnu Poduval (1301049)
Vishal Wagh (1301050)
Eswar Sunil Kumar (1301053)
Aaron Ernest (1301061)
Factors Under Consideration
Source of database
GVU's 8th WWW User Survey (run from October 10, 1997 to November 16, 1997)
Special pointers to the survey provided by Yahoo, Netscape and WebTV
Mining
Main challenge faced: branching
Gender
Primary Language
Registered to Vote
Important Issue Facing the Internet
Technology
Connection Speed & Upgrades
Email Accounts
Equipment Owned
Frequency of Switching Browsers
Years on Internet
Internet Usage
Privacy
Indispensable Technologies
Cookie Privacy
E-commerce
Overview of Dataset
Dataset: numeric, stored in an ASCII file
Number of records: 10,044; fields: 60
After cleaning: 7,290 records, split into 1,822 training records and 5,468 test records
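The split above puts 1,822 of the 7,290 cleaned records (about 25%) into training and the remaining 5,468 into testing. A minimal sketch of such a split follows. The slides do not say how the sampling was done, so the random shuffle, the seed, and the stand-in record IDs are all assumptions made for illustration.

```python
import random

def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and split off the first n_train for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:]

# Stand-in record IDs (hypothetical); the real rows would come from the cleaned survey file.
cleaned = list(range(7290))
train, test = split_records(cleaned, n_train=1822)

print(len(train), len(test))
```

Every record lands in exactly one of the two sets, so the counts always add back up to the cleaned total.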
Variables (Demographic Segmentation)
Age
Country
Gender
Education_Attainment
Race
Sexual Preference
Marital Status
Household Income

Variables (Psychographic Segmentation)
Years on Internet
Major Occupation
Who_pays_for_access
Willingness_to_Pay_Fees
Most_important_Issue_Facing_the_Internet
Primary_Place_of_WWW Access
Methodology
Data cleaning
Sampling into training and test sets
Demographic segmentation models trained
C 5.0 with balancing
C 5.0 with balancing and boosting
C 5.0 with balancing, boosting and misclassification costs
C 5.0 balanced
C 5.0 Boosted
C 5.0 misclassified
Neural network
Boosted Neural network
Testing of models
Other models were also trained; only these are shown in the presentation because they gave the most accurate results in the confusion matrices and lift curves.
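Several of the models above are trained "with balancing". The slides do not say which balancing method was used, so the sketch below assumes one common option: random oversampling of the smaller classes until every class has as many rows as the largest one. The function name and the toy rows are hypothetical.

```python
import random
from collections import Counter

def balance_by_oversampling(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Top up this class with random duplicates of its own rows.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy data (assumption): 2 "yes" rows against 8 "no" rows.
toy = [("a", "yes")] * 2 + [("b", "no")] * 8
balanced = balance_by_oversampling(toy, label_of=lambda row: row[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Balancing of this kind is normally applied to the training set only, so that the test set keeps the original class proportions.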
Psychographic
Results
Demographic
C 5.0 with balancing
Of the demographic models, C 5.0 with balancing, boosting and misclassification costs has the highest accuracy on the test data.
It also yields the highest number of true positives.
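Accuracy and true positives are both read off a confusion matrix. As a minimal sketch of how those two figures are computed for a two-class problem, the code below uses made-up labels, not the survey results.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class treated as 'positive'."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

def accuracy(counts):
    """Share of all cases that were classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Toy labels (assumption), 1 = positive class.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_counts(actual, predicted, positive=1)
print(cm, accuracy(cm))
```

A model can rank first on accuracy without ranking first on true positives, which is why the two are reported separately.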
Results
Demographic
C 5.0 with balancing
C 5.0 with balancing, boosting and misclassification costs has the highest lift on the test data.
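Lift compares the hit rate among the cases a model scores highest with the hit rate in the whole test set. A minimal sketch of that calculation follows. The scores and labels are invented for illustration, and the function assumes the data contains at least one positive case.

```python
def lift_at(scores, actual, fraction):
    """Lift in the top `fraction` of cases when ranked by model score (labels are 0 or 1)."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k  # hit rate in the top slice
    base_rate = sum(actual) / len(actual)                 # hit rate overall
    return top_rate / base_rate

# Toy scores and labels (assumption), 1 = positive class.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(lift_at(scores, actual, 0.2), lift_at(scores, actual, 1.0))
```

A lift curve plots this value across increasing fractions; it always falls back to 1 once the whole set is included.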
Results
Psychographic
Normal C 5.0 with balancing
Results
Psychographic Simple neural network