289ml Project Proposal

Expedia Hotel Recommendations
Qingxuan Li, Bing Zhang

Department of Electrical and Computer Engineering
University of California, Davis
Introduction:
Recommendation systems are a classic machine learning problem. In this project, Expedia wants to use a recommendation system to provide personalized hotel recommendations to its users. Expedia has released a dataset of its customers and asks participants to use it to contextualize customer data and predict the likelihood that a user will stay at each of 100 different hotel groups.
Dataset introduction:
train/test.csv
Column name                Data type  Description
date_time                  string     Timestamp
site_name                  int        ID of the Expedia point of sale (e.g. Expedia.com, Expedia.co.uk, Expedia.co.jp, ...)
posa_continent             int        ID of continent associated with site_name
user_location_country      int        The ID of the country the customer is located in
user_location_region       int        The ID of the region the customer is located in
user_location_city         int        The ID of the city the customer is located in
orig_destination_distance  double     Physical distance between a hotel and a customer at the time of search. A null means the distance could not be calculated
user_id                    int        ID of user
is_mobile                  tinyint    1 when a user connected from a mobile device, 0 otherwise
is_package                 int        1 if the click/booking was generated as a part of a package (i.e. combined with a flight), 0 otherwise
channel                    int        ID of a marketing channel
srch_ci                    string     Checkin date
srch_co                    string     Checkout date
srch_adults_cnt            int        The number of adults specified in the hotel room
srch_children_cnt          int        The number of (extra occupancy) children specified in the hotel room
srch_rm_cnt                int        The number of hotel rooms specified in the search
srch_destination_id        int        ID of the destination where the hotel search was performed
srch_destination_type_id   int        Type of destination
hotel_continent            int        Hotel continent
hotel_country              int        Hotel country
hotel_market               int        Hotel market
is_booking                 tinyint    1 if a booking, 0 if a click
cnt                        bigint     Number of similar events in the context of the same user session
hotel_cluster              int        ID of a hotel cluster
destinations.csv
Column name                Data type  Description
srch_destination_id        int        ID of the destination where the hotel search was performed
d1-d149                    double     Latent description of search regions
Feature selection:
To classify users and hotels, we will train the model on the following features:
1. User location:
user_location_country
user_location_region
user_location_city
orig_destination_distance
The first three features describe where each customer lives, and we plan to use them to distinguish users by location. We will not use orig_destination_distance, because this value is sometimes missing (a null means the distance could not be calculated), which would increase the uncertainty of our predictions.
2. User information:
user_id
is_mobile
is_package
channel
user_id identifies each user. We will not include is_mobile among our features, as we assume that the device used for booking does not affect a person's choice of hotel. We will also exclude is_package, since we assume that booking a hotel together with a flight does not strongly affect a user's decision. Finally, we will not use channel, because its meaning is unclear to us.
3. Booking information:
srch_ci
srch_co
srch_adults_cnt
srch_children_cnt
srch_rm_cnt
srch_destination_id
srch_destination_type_id
cnt
We will compute the expected length of stay by subtracting srch_ci from srch_co and call the result srch_length. We plan to use srch_adults_cnt, srch_rm_cnt, srch_destination_id, srch_destination_type_id and cnt to characterize each booking. srch_children_cnt will be converted to a binary indicator instead of the number of children: 1 if children are present and 0 otherwise.
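A minimal sketch of the two booking-level transformations described above, assuming each CSV row has been read into a Python dict of strings (for example with csv.DictReader) and that srch_ci and srch_co are ISO dates of the form YYYY-MM-DD. The helper name booking_features is hypothetical, and rows with a missing check-in or check-out date simply get srch_length = None.

```python
from datetime import date
from typing import Optional


def booking_features(row: dict) -> dict:
    """Derive srch_length and a binary children flag from one raw row.

    Hypothetical helper; assumes srch_ci / srch_co are 'YYYY-MM-DD' strings.
    """
    srch_length: Optional[int] = None
    if row.get("srch_ci") and row.get("srch_co"):
        check_in = date.fromisoformat(row["srch_ci"])
        check_out = date.fromisoformat(row["srch_co"])
        # Expected length of stay in nights: srch_co minus srch_ci.
        srch_length = (check_out - check_in).days

    # 1 if any children are included in the search, 0 otherwise.
    has_children = 1 if int(row["srch_children_cnt"]) > 0 else 0

    return {
        "srch_length": srch_length,
        "has_children": has_children,
        "srch_adults_cnt": int(row["srch_adults_cnt"]),
        "srch_rm_cnt": int(row["srch_rm_cnt"]),
        "srch_destination_id": int(row["srch_destination_id"]),
        "srch_destination_type_id": int(row["srch_destination_type_id"]),
        "cnt": int(row["cnt"]),
    }


if __name__ == "__main__":
    # Illustrative row with made-up values, not taken from the dataset.
    example = {
        "srch_ci": "2014-08-27", "srch_co": "2014-08-31",
        "srch_adults_cnt": "2", "srch_children_cnt": "3", "srch_rm_cnt": "1",
        "srch_destination_id": "8250", "srch_destination_type_id": "1", "cnt": "3",
    }
    print(booking_features(example))
```

For the illustrative row, a four-night stay yields srch_length = 4, and three children collapse to has_children = 1.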
4. Hotel information:
hotel_continent
hotel_country
hotel_market
All three of these features will be used to characterize the location of each hotel.
5. Decision information:
is_booking
hotel_cluster
The learning targets are derived from these two fields.
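To make the selection above concrete, the sketch below gathers the retained columns into a numeric feature vector and returns hotel_cluster as the class label. It assumes the row already contains the engineered fields srch_length and has_children (rows with a missing srch_length must be dropped or imputed first). The names FEATURE_COLUMNS and to_example and the column order are hypothetical, and user_id is treated as an identifier rather than a numeric feature, which is also an assumption. The proposal does not fix how is_booking enters training (for example, keeping only bookings or weighting them more heavily), so the sketch just returns it alongside the label.

```python
from typing import List, Tuple

# Columns retained by the feature selection (the order is an arbitrary choice).
FEATURE_COLUMNS: List[str] = [
    "user_location_country", "user_location_region", "user_location_city",
    "srch_length", "srch_adults_cnt", "has_children", "srch_rm_cnt",
    "srch_destination_id", "srch_destination_type_id", "cnt",
    "hotel_continent", "hotel_country", "hotel_market",
]


def to_example(row: dict) -> Tuple[List[float], int, int]:
    """Return (feature vector, hotel_cluster label, is_booking flag) for one row.

    Hypothetical helper; expects srch_length and has_children to be present.
    """
    x = [float(row[col]) for col in FEATURE_COLUMNS]
    label = int(row["hotel_cluster"])
    is_booking = int(row["is_booking"])
    return x, label, is_booking
```

Feeding categorical IDs such as srch_destination_id or hotel_market to an SVM as raw numbers imposes an artificial ordering on them; one-hot encoding these columns is a common alternative worth considering.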
Method:
Several methods could address this recommendation problem, such as singular value decomposition (SVD) and collaborative filtering. The most straightforward approach, however, is the SVM. In this project, we plan to use a Support Vector Machine (SVM), which is based on statistical learning theory and follows the principle of Structural Risk Minimization rather than Empirical Risk Minimization [1]. An SVM finds the maximal-margin hyperplane separating two classes of data. In our case, one class is the user group and the other is the hotel clusters.
If time permits, we would also like to implement SVD or collaborative filtering and compare its running time and accuracy with those of the SVM.
Implementing the SVM involves two components: the mathematical program (the optimization problem) and the kernel function (linear or non-linear).
SVM Model:
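$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi} \ge 0} \; \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad i = 1, \dots, l$$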
where $\xi_i$ is the error (slack) for training point $\mathbf{x}_i$ with class label $y_i \in \{-1, +1\}$, $\mathbf{w}$ is the vector of coefficients of the best separating hyperplane, $b$ is the offset of that hyperplane, $l$ is the number of training points, and $C$ is a constant that represents the emphasis placed on minimizing the error [1].
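The objective can be evaluated directly for a candidate hyperplane: for fixed $\mathbf{w}$ and $b$, the smallest feasible slack is the hinge loss $\xi_i = \max(0, 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b))$. The sketch below computes the objective this way with plain Python lists; the function name primal_objective and the toy data in the usage note are illustrative only, and it evaluates rather than minimizes the objective.

```python
from typing import List, Sequence


def primal_objective(w: Sequence[float], b: float, C: float,
                     X: List[Sequence[float]], y: Sequence[int]) -> float:
    """Soft-margin SVM objective 1/2 ||w||^2 + C * sum(xi_i).

    Each slack is set to its smallest feasible value, the hinge loss
    xi_i = max(0, 1 - y_i (w . x_i + b)), with labels y_i in {-1, +1}.
    """
    margin_term = 0.5 * sum(w_j * w_j for w_j in w)
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        slack_total += max(0.0, 1.0 - y_i * score)
    return margin_term + C * slack_total
```

With w = [1, 0], b = 0 and C = 1 on the points [2, 0] (label +1), [-2, 0] (label -1) and [0.5, 0] (label +1), only the third point violates the margin (slack 0.5), so the objective is 0.5 + 0.5 = 1.0. A larger C makes such violations more expensive relative to the margin term.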
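Once this problem has been solved, the decision function can be written as:

$$f(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$$

where $\alpha_i \ge 0$ are the Lagrange multipliers of the dual problem and $K$ is the kernel function; the predicted class is $\operatorname{sgn} f(\mathbf{x})$.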
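The decision function only needs the training points with non-zero $\alpha_i$ (the support vectors), their labels, the multipliers and $b$. The sketch below evaluates it with a pluggable kernel, linear by default; the function names are hypothetical, and the $\alpha_i$ and $b$ values in the usage note are made up rather than produced by an actual solver.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear_kernel(u: Sequence[float], v: Sequence[float]) -> float:
    """K(u, v) = u . v"""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def decision_function(x: Sequence[float],
                      support_vectors: List[Sequence[float]],
                      labels: Sequence[int],
                      alphas: Sequence[float],
                      b: float,
                      kernel: Kernel = linear_kernel) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a_i * y_i * kernel(x_i, x)
               for x_i, y_i, a_i in zip(support_vectors, labels, alphas)) + b


def predict(f_value: float) -> int:
    """Predicted class sgn f(x), mapping f(x) = 0 to +1 by convention."""
    return 1 if f_value >= 0 else -1
```

For example, with support vectors [1, 0] (label +1) and [-1, 0] (label -1), $\alpha_1 = \alpha_2 = 0.5$ and $b = 0$, the point [2, 0] gets $f = 2$ and is classified as +1. Any function with the same two-argument signature can be passed as kernel.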
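In this formulation the kernel can be replaced by a non-linear one; one of the most popular choices is the radial basis function (RBF) kernel:

$$K(\mathbf{x}_i, \mathbf{x}) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x} \rVert^2\right)$$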
where $\gamma > 0$ is a user-chosen parameter [1]. The value $f(\mathbf{x})$ can serve as the score of each hotel cluster [2]. Using the training file, we will build a classifier from the users' features, and then use that classifier to assign each user in the test file, based on their features, to the hotel cluster with the highest score.
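One way to turn binary SVMs into a score for each of the 100 hotel clusters is one-vs-rest: train one SVM per cluster (that cluster against all others) and pick the cluster whose decision value $f(\mathbf{x})$ is largest. The proposal does not specify the multiclass scheme, so one-vs-rest is an assumption here, as are the hypothetical names BinaryModel and rank_clusters; each model is assumed to be already trained.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """K(u, v) = exp(-gamma * ||u - v||^2), with gamma > 0 chosen by the user."""
    sq_dist = sum((u_j - v_j) ** 2 for u_j, v_j in zip(u, v))
    return math.exp(-gamma * sq_dist)


@dataclass
class BinaryModel:
    """A trained 'this cluster vs. all others' RBF SVM (hypothetical container)."""
    support_vectors: List[List[float]]
    labels: List[int]  # +1 for this cluster, -1 for the rest
    alphas: List[float]
    b: float
    gamma: float

    def score(self, x: Sequence[float]) -> float:
        # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
        return sum(a * y * rbf_kernel(sv, x, self.gamma)
                   for sv, y, a in zip(self.support_vectors, self.labels, self.alphas)) + self.b


def rank_clusters(x: Sequence[float], models: Dict[int, BinaryModel], k: int = 1) -> List[int]:
    """Return the k hotel_cluster IDs with the highest one-vs-rest scores."""
    scores = {cluster: model.score(x) for cluster, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With the default k = 1 this returns the single highest-scoring cluster, matching the assignment rule above; a larger k yields a ranked list of candidate clusters. One-vs-one voting is a common alternative scheme with different training cost.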
[1] Xu, J. A., & Araki, K. (2006). A SVM-based personal recommendation system for TV programs. 2006 12th International Multi-Media Modelling Conference. doi:10.1109/mmmc.2006.1651358
[2] Gupta, A., Jain, R., & Song, S. Movie recommendations using social networks.