Tayko Software Case

Tayko is a software catalog firm that sells games and educational software. It started out as a
software manufacturer and later added third-party titles to its offerings. It has recently put together
a revised collection of items in a new catalog, which it mailed out to its customers. This mailing
yielded 1000 purchases. Based on these data, Tayko wants to devise a model for predicting the
spending amount that a purchasing customer will yield.
The file Tayko.xls contains the following information on 2000 customers:

FREQ - Number of transactions in the last year
FIRST_UPDATE - Number of days since the first update to the customer record
LAST_UPDATE - Number of days since the last update to the customer record
WEB - Whether the customer purchased by web order at least once
GENDER - Male/Female
ADDRESS RES - Whether it is a residential address
ADDRESS US - Whether it is a US address
SPENDING (response) - Amount spent by the customer in the test mailing (in $)

1. Explore the spending amount by creating a pivot table for the categorical variables, and
computing the average and standard deviation of spending in each category. Write down your
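
A pivot table of this kind can also be sketched in code. The following minimal Python example, using only the standard library, groups SPENDING by the levels of one categorical column and reports the count, mean, and sample standard deviation. The records are made up for illustration and are not taken from Tayko.xls; the helper name spending_by_category is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up records standing in for rows of Tayko.xls (NOT real data).
rows = [
    {"WEB": 1, "GENDER": "M", "SPENDING": 120.0},
    {"WEB": 0, "GENDER": "F", "SPENDING": 80.0},
    {"WEB": 1, "GENDER": "F", "SPENDING": 200.0},
    {"WEB": 0, "GENDER": "M", "SPENDING": 40.0},
    {"WEB": 1, "GENDER": "M", "SPENDING": 160.0},
    {"WEB": 0, "GENDER": "F", "SPENDING": 60.0},
]


def spending_by_category(rows, column, target="SPENDING"):
    """Return {level: (count, mean, sample std)} of `target` for one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    summary = {}
    for level, values in groups.items():
        # The sample standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[level] = (len(values), mean(values), sd)
    return summary


for column in ("WEB", "GENDER"):
    for level, (n, avg, sd) in sorted(spending_by_category(rows, column).items()):
        print(f"{column}={level}: n={n}, mean={avg:.2f}, sd={sd:.2f}")
```

Comparing the means across levels, relative to the standard deviations, gives a first impression of which categorical variables separate high and low spenders.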

2. Explore the relationship between spending and each of the two continuous predictors by
creating two scatter plots (SPENDING vs. FREQ, and SPENDING vs. LAST_UPDATE). Does there
seem to be a linear relationship in either plot?

3. Explore the relationships among all the independent variables. Which variables should be
included in or excluded from the model? State the reason.
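
One common way to explore relationships among numeric predictors is a table of pairwise Pearson correlations; a pair with a correlation close to +1 or -1 carries largely overlapping information, which is one possible reason to drop one of the two. The sketch below is a minimal Python illustration. The column values are invented and say nothing about the real file, and the 0.9 cutoff is an arbitrary assumption, not a rule from the case.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented values for three of the numeric columns (NOT real data).
data = {
    "FREQ": [1, 2, 0, 4, 3, 1],
    "FIRST_UPDATE": [300, 900, 150, 2000, 1200, 500],
    "LAST_UPDATE": [280, 850, 150, 1900, 1100, 450],
}

THRESHOLD = 0.9  # arbitrary cutoff chosen for this illustration

for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    flag = "  <- possibly redundant pair" if abs(r) > THRESHOLD else ""
    print(f"{a} vs {b}: r = {r:.3f}{flag}")
```

The same function applied to SPENDING and a single predictor gives a numeric companion to the scatter plots in question 2, although a correlation alone cannot reveal a curved relationship.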

4. In order to fit a predictive model for SPENDING,

(a) Partition the 2000 records into training (70%) and validation (30%) sets.

(b) Run a multiple linear regression model for SPENDING vs. all the selected predictors (as
above in question 3) on the training data set. Save the model.

(c) Use the validation set to predict the spending amount. Comment on the performance of the
model using the validation data set.

(d) Give the estimated predictive equation of the predictive model. Based on this model, what
type of customer is most likely to spend more?
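
Steps (a) to (c) can be sketched end to end as follows. This is a minimal Python illustration using only the standard library, not a prescribed solution: the data are synthetic (two stand-in predictors with made-up coefficients), the helper names solve, fit_ols, predict, and rmse are hypothetical, and the regression is fitted through the normal equations, which is adequate for a few well-behaved predictors.

```python
import random
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic stand-in data (NOT Tayko.xls): a count-like and a 0/1 predictor.
rng = random.Random(1)
X = [[rng.randint(0, 10), rng.randint(0, 1)] for _ in range(2000)]
y = [10 + 20 * f + 15 * w + rng.gauss(0, 5) for f, w in X]

# (a) Random 70/30 partition of the row indices.
idx = list(range(len(X)))
rng.shuffle(idx)
cut = len(idx) * 7 // 10
train, valid = idx[:cut], idx[cut:]

# (b) Fit on the training rows only.
coef = fit_ols([X[i] for i in train], [y[i] for i in train])

# (c) Predict the held-out rows and measure the error.
pred = predict(coef, [X[i] for i in valid])
print("coefficients:", [round(c, 2) for c in coef])
print("validation RMSE:", round(rmse([y[i] for i in valid], pred), 2))
```

The fitted coefficients give the predictive equation asked for in (d): the intercept plus each coefficient times its predictor. Comparing the validation RMSE with the training RMSE is one way to check for overfitting.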

5. Run best subset variable selection in R. Give the estimated predictive equation for the best
subset of predictors, and assess its predictive accuracy by examining its performance
on the validation set. Which model is better: the model built in Q4 or the one built in Q5?
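
The case asks for R here; purely to illustrate the idea behind best subset selection, the following Python sketch (standard library only) fits a linear regression on every non-empty subset of predictors using the training rows and ranks the subsets by validation RMSE. The data are synthetic, the predictor names P1 to P3 are placeholders, and the helper names are hypothetical. Ranking by validation RMSE is an assumption made to match the wording of the question; subset-selection routines often rank by criteria computed on the training data instead.

```python
import random
from itertools import combinations
from math import sqrt


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    # Augmented matrix [Z'Z | Z'y], reduced by Gaussian elimination with pivoting.
    m = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * t for z, t in zip(Z, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        coef[i] = (m[i][k] - sum(m[i][j] * coef[j] for j in range(i + 1, k))) / m[i][i]
    return coef


def validation_rmse(cols, X_tr, y_tr, X_va, y_va):
    """Fit on the training rows using only `cols`; return RMSE on the validation rows."""
    def pick(X):
        return [[row[c] for c in cols] for row in X]
    coef = fit_ols(pick(X_tr), y_tr)
    preds = [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in pick(X_va)]
    return sqrt(sum((a - p) ** 2 for a, p in zip(y_va, preds)) / len(y_va))


def best_subset(names, X_tr, y_tr, X_va, y_va):
    """Return all non-empty predictor subsets, sorted by validation RMSE (best first)."""
    results = []
    for size in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), size):
            score = validation_rmse(cols, X_tr, y_tr, X_va, y_va)
            results.append((score, [names[c] for c in cols]))
    return sorted(results)


# Synthetic data (NOT Tayko.xls): the response depends on P1 and P2 only.
rng = random.Random(7)
names = ["P1", "P2", "P3"]
X = [[rng.uniform(0, 10) for _ in names] for _ in range(500)]
y = [5 + 3 * a - 2 * b + rng.gauss(0, 1) for a, b, _ in X]

# Rows are already in random order here; real data should be shuffled first.
cut = len(X) * 7 // 10
ranked = best_subset(names, X[:cut], y[:cut], X[cut:], y[cut:])
for score, subset in ranked[:3]:
    print(f"RMSE {score:.3f}  predictors {subset}")
```

An exhaustive search fits 2^p - 1 models for p predictors, so it is only practical for a small number of candidates such as the handful in this case. The validation RMSE of the winning subset can be compared directly with that of the full model from Q4.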