BUSINESS INTELLIGENCE
PROF. MAYTAL SAAR-TSECHANSKY
Assignment 1
I. Data Mining Concepts
Answer the questions below. Please be concise and phrase your answers carefully.
1. For each of the data mining tasks (i.e., classification, regression, clustering, link analysis and
sequence analysis) provide an example of a business problem that can be supported by these
methods. For each business problem (e.g., customer attrition) formulate clearly the business
goal (e.g., prevent attrition of any customer who is likely to switch to a competitor), and
explain how the data mining method can be used to obtain it (e.g., build a classification
model to predict whether a customer will switch). Please do not use the examples discussed
in class. (10 points)
2. Because descriptive data mining is not used to predict values of interest, it is beneficial to
gain insights on past events, but is not useful to support future decisions. Do you agree with
this statement? Explain your answer. (15 points)
3. An analyst in a telecommunication company analyzed a subset of the firm's data and found
that 20% of customers who made at least two calls to the company's customer service center
within two months had switched to a competing provider. Later that day, the analyst repeated
the analysis, using another subset from the same database. However, this time the analysis
suggested that only 10% of customers switched once they made at least two calls to
customer service within a period of two months.
Explain the discrepancy between the patterns the analyst found in each case. (15 points)
Attribute            Possible Values
WEATHER              Warm, Cold, Raining
JOGGED_YESTERDAY     Yes, No
Target (Jog Today)   Yes, No
Because each attribute's value starts with a different letter, for shorthand we'll
just use that initial letter, e.g., 'W' for Warm. Our target/class variable (the
variable whose value we want to predict) is whether or not the president will jog today.
Here is our TRAINING data set, which we will use to build a predictive model
of the president's decisions:
JOGGED_YESTERDAY   Target (Jog Today)
N                  Yes
Y                  No
Y                  No
Y                  No
N                  No
Y                  No
N                  No
N                  Yes
Y                  No
Y                  Yes
N                  Yes
N                  Yes
Y                  No
Y                  No
(b) Using the tree model for prediction, and estimating the model's predictive accuracy
(15 points)
Here is a Test Set of examples for which you would like to generate predictions:
WEATHER   JOGGED_YESTERDAY   Target (Jog Today)
W         Y                  ?
R         N                  ?
C         N                  ?
C         Y                  ?
W         N                  ?
R         Y                  ?
Use the decision tree produced in part (a) to predict the class of (i.e., classify) each
example in the TEST set.
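To make the mechanics of this step concrete, here is a minimal Python sketch of how a decision tree can be applied to the Test Set examples. The tree shown (PLACEHOLDER_TREE) is a made-up placeholder chosen only to illustrate the traversal; it is not the tree from part (a), and the names classify, PLACEHOLDER_TREE and TEST_SET are assumptions introduced for this illustration.

```python
# Hypothetical tree for illustration only; it is NOT the answer to part (a).
# An internal node is a tuple (attribute, {value: subtree}); a leaf is a class label.
PLACEHOLDER_TREE = ("WEATHER", {
    "W": ("JOGGED_YESTERDAY", {"Y": "No", "N": "Yes"}),
    "C": "No",
    "R": "No",
})

def classify(tree, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

# The six Test Set examples, using the one-letter shorthand.
TEST_SET = [
    {"WEATHER": "W", "JOGGED_YESTERDAY": "Y"},
    {"WEATHER": "R", "JOGGED_YESTERDAY": "N"},
    {"WEATHER": "C", "JOGGED_YESTERDAY": "N"},
    {"WEATHER": "C", "JOGGED_YESTERDAY": "Y"},
    {"WEATHER": "W", "JOGGED_YESTERDAY": "N"},
    {"WEATHER": "R", "JOGGED_YESTERDAY": "Y"},
]

predictions = [classify(PLACEHOLDER_TREE, example) for example in TEST_SET]
print(predictions)
```

Replacing PLACEHOLDER_TREE with the tree you actually built leaves the traversal logic unchanged; only the predictions differ.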
The table below contains the same examples as in the Test Set, but also includes the
correct classification of each example. What proportion of the cases in the Test Set was
predicted accurately by your model?
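As a sketch of the arithmetic involved, accuracy is the number of correctly predicted cases divided by the total number of cases. The Python function below (accuracy is an illustrative name) shows the calculation; the two label lists are made up purely to demonstrate it and are not the assignment's predictions or correct classifications.

```python
def accuracy(predicted, actual):
    """Return the fraction of cases where the predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up labels, only to demonstrate the calculation.
example_predicted = ["No", "No", "No", "No", "Yes", "No"]
example_actual = ["No", "Yes", "No", "No", "Yes", "Yes"]

# Four of the six made-up cases match, so the result is 4/6.
print(accuracy(example_predicted, example_actual))
```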