
Expedia Hotel Recommendations

Qingxuan Li, Bing Zhang


Department of Electrical and Computer Engineering
University of California, Davis
Introduction:
Recommendation systems are a typical problem in machine learning. In this project,
Expedia wants to use a recommendation system to provide personalized hotel suggestions to its users.
Expedia has released a dataset of customer activity and asks participants to contextualize the customer
data and predict the likelihood that a user will stay at each of 100 different hotel groups.

Dataset introduction:
train/test.csv

Column name                Data type  Description
date_time                  string     Timestamp
site_name                  int        ID of the Expedia point of sale (i.e. Expedia.com, Expedia.co.uk, Expedia.co.jp, ...)
posa_continent             int        ID of continent associated with site_name
user_location_country      int        The ID of the country the customer is located
user_location_region       int        The ID of the region the customer is located
user_location_city         int        The ID of the city the customer is located
orig_destination_distance  double     Physical distance between a hotel and a customer at the time of search. A null means the distance could not be calculated
user_id                    int        ID of user
is_mobile                  tinyint    1 when a user connected from a mobile device, 0 otherwise
is_package                 int        1 if the click/booking was generated as a part of a package (i.e. combined with a flight), 0 otherwise
channel                    int        ID of a marketing channel
srch_ci                    string     Checkin date
srch_co                    string     Checkout date
srch_adults_cnt            int        The number of adults specified in the hotel room
srch_children_cnt          int        The number of (extra occupancy) children specified in the hotel room
srch_rm_cnt                int        The number of hotel rooms specified in the search
srch_destination_id        int        ID of the destination where the hotel search was performed
srch_destination_type_id   int        Type of destination
hotel_continent            int        Hotel continent
hotel_country              int        Hotel country
hotel_market               int        Hotel market
is_booking                 tinyint    1 if a booking, 0 if a click
cnt                        bigint     Number of similar events in the context of the same user session
hotel_cluster              int        ID of a hotel cluster


destinations.csv

Column name                Data type  Description
srch_destination_id        int        ID of the destination where the hotel search was performed
d1-d149                    double     Latent description of search regions
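
The column types listed above can be applied while reading the files. The following is a minimal sketch using only Python's standard `csv` module. The sample rows are made up for illustration and include only a few of the columns, and the assumption that a null `orig_destination_distance` appears as an empty field is ours, not something the dataset description states.

```python
import csv
import io

# Made-up excerpt with a subset of the train.csv columns (illustration only).
SAMPLE = """date_time,user_id,orig_destination_distance,is_booking,hotel_cluster
2014-08-11 07:46:59,12,2234.2641,0,1
2014-08-11 08:22:12,12,,1,1
"""

INT_COLUMNS = ("user_id", "is_booking", "hotel_cluster")


def parse_row(row):
    """Convert one csv.DictReader row into typed Python values."""
    parsed = {"date_time": row["date_time"]}  # kept as a string, as in the table
    for name in INT_COLUMNS:
        parsed[name] = int(row[name])
    # Assumption: an empty field means the distance could not be calculated.
    raw = row["orig_destination_distance"]
    parsed["orig_destination_distance"] = float(raw) if raw else None
    return parsed


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping the missing distance as `None` rather than 0 avoids confusing "unknown" with "very close" in later steps.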

Feature selection:
To classify the users, we train the model on the following features: {user_location_country;
user_location_region; user_location_city; user_id; channel; srch_adults_cnt; srch_children_cnt;
srch_rm_cnt; srch_destination_id; srch_destination_type_id; cnt}. These features help the model
assign each user in the test file to a class of similar users.
For the hotel part: {hotel_continent; hotel_country; hotel_market; is_booking; hotel_cluster}
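
As a rough illustration of the selection above, the sketch below turns one parsed row into a numeric feature vector and a label. Treating `hotel_cluster` as the label and the remaining hotel columns as inputs is our assumption; the names `USER_FEATURES`, `HOTEL_FEATURES`, and `split_row` are hypothetical, and the example row is made up.

```python
USER_FEATURES = [
    "user_location_country", "user_location_region", "user_location_city",
    "user_id", "channel", "srch_adults_cnt", "srch_children_cnt",
    "srch_rm_cnt", "srch_destination_id", "srch_destination_type_id", "cnt",
]
HOTEL_FEATURES = ["hotel_continent", "hotel_country", "hotel_market", "is_booking"]
LABEL = "hotel_cluster"  # assumption: the cluster ID is the prediction target


def split_row(row):
    """Return (feature vector, label) for one row given as a dict of strings."""
    x = [int(row[name]) for name in USER_FEATURES + HOTEL_FEATURES]
    y = int(row[LABEL])
    return x, y


# Made-up row for illustration; every value is a string, as read from a CSV.
example = {name: "1" for name in USER_FEATURES + HOTEL_FEATURES}
example["user_id"] = "12"
example[LABEL] = "42"
x, y = split_row(example)
```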
Method:
In this project, we use a Support Vector Machine (SVM), which is based on statistical
learning theory and follows the principle of Structural Risk Minimization instead of Empirical
Risk Minimization [1]. An SVM finds the maximal-margin separating hyperplane between two
classes of data. In our case, one class is the user group and the other is the hotel clusters.
Implementing the SVM involves two components: mathematical programming and kernel functions
(linear or non-linear). Among the non-linear functions covered in class (logistic regression, ReLU,
and sigmoid), the sigmoid is the one commonly used as an SVM kernel; logistic regression is a
separate classifier and ReLU is an activation function rather than a kernel.
Linear SVM:

$$\min_{\vec{w},\, b,\, \xi \ge 0} \; \frac{1}{2}\lVert \vec{w} \rVert^2 + C \sum_{i=1}^{m} \xi_i$$

such that $y_i(\vec{w} \cdot \vec{x}_i - b) + \xi_i \ge 1$,

where $\xi_i$ is the error for a given training point $\vec{x}_i$ and $\vec{w}$ is the vector of coefficients for the
best separating hyperplane. $b$ is the offset for that hyperplane, and $C$ is a constant that represents
the emphasis that is to be placed on minimizing the error [1].
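
To make the objective concrete, here is a small sketch that evaluates it for a fixed hyperplane. It assumes the constraint has the form y_i(w·x_i − b) + ξ_i ≥ 1, so the smallest feasible slack is max(0, 1 − y_i(w·x_i − b)). The function names and the three training points are made up for illustration; this only evaluates the objective and does not solve the optimization.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, y):
    """Smallest xi >= 0 that satisfies y * (w.x - b) + xi >= 1."""
    return max(0.0, 1.0 - y * (dot(w, x) - b))


def objective(w, b, C, X, Y):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))


# Made-up data: the third point lies inside the margin and incurs slack 1.5.
X = [[2.0, 0.0], [0.0, 0.0], [0.5, 0.0]]
Y = [1, -1, 1]
w, b, C = [1.0, 0.0], 1.0, 10.0
value = objective(w, b, C, X, Y)  # 0.5 * 1 + 10 * 1.5 = 15.5
```

A larger C makes the margin violation of the third point more expensive relative to the margin term, which is the trade-off the constant controls.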
Once this problem has been solved, the resulting classifier can be written as follows:

$$f(\vec{x}) = \operatorname{sign}\left[\sum_{i=1}^{m} a_i y_i K(\vec{x}_i, \vec{x}) - b\right]$$
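
The decision function can be sketched directly from its definition. The code below assumes the multipliers a_i, the labels y_i, the training points x_i, and the offset b are already available from a solved problem. The helper names and the two-point example are hypothetical, and the kernel defaults to the plain inner product, which corresponds to the linear case.

```python
def linear_kernel(u, v):
    """Inner product, i.e. the kernel of the linear SVM."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, points, labels, alphas, b, kernel=linear_kernel):
    """Value inside the sign: sum_i a_i * y_i * K(x_i, x) - b."""
    total = sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))
    return total - b


def predict(x, points, labels, alphas, b, kernel=linear_kernel):
    """Class label in {+1, -1} given by the sign of the decision value."""
    return 1 if decision_value(x, points, labels, alphas, b, kernel) >= 0 else -1


# Made-up two-point problem: the separating hyperplane is x[0] = 1.
points = [[2.0, 0.0], [0.0, 0.0]]
labels = [1, -1]
alphas = [0.5, 0.5]
b = 1.0
```

Passing a different `kernel` function changes the classifier without touching the rest of the code.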

In this form, the linear kernel can be replaced by a non-linear kernel. The most popular kernel is:

$$K(\vec{x}_i, \vec{x}_j) = e^{-\lVert \vec{x}_i - \vec{x}_j \rVert^2 / 2\sigma^2}$$

where $\sigma$ is a user-chosen parameter [1]. $f(\vec{x})$ can serve as the score for each hotel [2]. From the training
file, we build the classifier on the users' features, and then use that classifier to
assign each user in the test file with similar features to the hotel that has the highest score.
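
One way to turn the scores into a recommendation is sketched below. It assumes one binary classifier per hotel cluster (one-vs-rest), a Gaussian kernel of the form exp(−‖u − v‖² / 2σ²), and that the value inside the sign is used as the score. The proposal does not spell out any of these choices, so they are assumptions, and the two tiny models are made up for illustration.

```python
import math


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def score(x, model, sigma=1.0):
    """Unsigned decision value of one cluster's classifier for user x."""
    terms = zip(model["alphas"], model["labels"], model["points"])
    return sum(a * y * rbf_kernel(p, x, sigma) for a, y, p in terms) - model["b"]


def recommend(x, models, sigma=1.0):
    """Return the cluster ID with the highest score, plus all scores."""
    scores = {cluster: score(x, m, sigma) for cluster, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical models: cluster 7 favours users near (0, 0), cluster 42 near (5, 5).
models = {
    7: {"points": [[0.0, 0.0], [5.0, 5.0]], "labels": [1, -1],
        "alphas": [1.0, 1.0], "b": 0.0},
    42: {"points": [[0.0, 0.0], [5.0, 5.0]], "labels": [-1, 1],
         "alphas": [1.0, 1.0], "b": 0.0},
}
best, scores = recommend([0.2, 0.1], models)
```

Returning the full score dictionary alongside the best cluster makes it easy to rank several clusters instead of taking only the top one.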

[1] Xu, J. A., & Araki, K. (2006). A SVM-based Personal Recommendation System for TV Programs. 2006 12th International Multi-Media Modelling Conference. doi:10.1109/mmmc.2006.1651358
[2] Gupta, A., Jain, R., & Song, S. Movie Recommendations Using Social Networks.
