
PREDICTIVE MODEL

WORKFLOW
Several steps were taken to develop a predictive model for student apartments.

- DATA COLLECTION: Developed custom crawlers to gather information from a well-known real estate site in Belgium.
- DATA ANALYSIS: Data augmentation and feature engineering.
- MODEL BUILDING: Clustering analysis and non-parametric regression.
- PREDICTION: Predictions validated on blind test data and tested against provided benchmark data.

DATA COLLECTION
“War is ninety percent information”. Napoleon Bonaparte

+6000 data points


A custom web crawler was developed for this purpose.

Apartment price, address, and area, among other features, were collected.
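
The extraction step of such a crawler can be sketched as below. This is only an illustration: the slides do not name the site or describe its markup, so the HTML structure, the class names (`listing`, `price`, `address`, `area`), and the sample values are all hypothetical, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the text of elements whose class names a field of interest."""

    FIELDS = {"price", "address", "area"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.listings = []    # one dict per listing found
        self._current = None  # listing currently being filled
        self._field = None    # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "listing" in classes:
            self._current = {}
            self.listings.append(self._current)
        elif self._current is not None:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()
            self._field = None


# Made-up markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div class="listing">
  <span class="price">450</span>
  <span class="address">Voorbeeldstraat 1, Antwerpen</span>
  <span class="area">18</span>
</div>
<div class="listing">
  <span class="price">520</span>
  <span class="address">Voorbeeldstraat 2, Antwerpen</span>
  <span class="area">22</span>
</div>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)
for listing in parser.listings:
    print(listing)
```

A real crawler would also need pagination, rate limiting, and cleaning of the extracted strings into numeric values.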

30% of the data was reserved as a holdout set.
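
A minimal sketch of reserving a 30% holdout set, assuming a simple random shuffle. The slides do not say how the split was drawn (random, stratified, or by location), so the function name, the seed, and the sample records are illustrative only.

```python
import random


def holdout_split(records, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split off a holdout set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical listings as (price in EUR, area in m2); values are made up.
listings = [(400 + 5 * i, 15 + i % 10) for i in range(100)]
train, holdout = holdout_split(listings)
print(len(train), len(holdout))
```

Fixing the seed keeps the split reproducible, so the holdout rows stay unseen across repeated experiments.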



DATA ANALYSIS
“If you torture the data long enough, it will confess”. Ronald Coase, Economist

Feature Engineering
- Coordinates inferred from apartment address.
- Multidimensional clustering applied to classify apartments.
- Heat maps created to visualize density of student apartments in Antwerp.
- Descriptive statistics per cluster.
- Determine features that correlate with price to be used as predictors.
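
The last step, finding features that correlate with price, could look roughly like the sketch below. It assumes Pearson correlation as the measure, which the slides do not specify, and the feature names and sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(rows, target="price"):
    """Rank every non-target column by absolute correlation with the target."""
    ys = [row[target] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], ys)
        for name in rows[0]
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical apartments; all values are invented.
apartments = [
    {"price": 350, "area": 14, "private_shower": 0, "floor": 3},
    {"price": 420, "area": 18, "private_shower": 0, "floor": 1},
    {"price": 480, "area": 20, "private_shower": 1, "floor": 2},
    {"price": 550, "area": 25, "private_shower": 1, "floor": 1},
    {"price": 610, "area": 28, "private_shower": 1, "floor": 3},
]

for name, score in rank_predictors(apartments):
    print(f"{name}: {score:+.2f}")
```

Correlation only captures linear association, so a feature with a low score can still be useful to a tree-based model.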

VISUALIZATION
“Above all else show the data”. Edward Tufte

Demo available here: http://muro.ai/index.php/antwerp/

Student Apartments in Antwerp. Clustered View

VISUALIZATION
“Above all else show the data”. Edward Tufte

Custom map tiles developed using GIS.

Student Apartments in Antwerp. Correlation Maps - Price per square meter (map legend: 15, 16, 18, 19, 20)

MODEL BUILDING
“We cannot solve our problems with the same thinking we used when we created them”. Albert Einstein

Machine Learning
- Non-parametric regression used.
- Random Forest and Boosted Decision Trees were used.
- 70% of the data was used for training the model, while 30% of the data was used for model validation.
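
To illustrate the idea behind tree ensembles, the sketch below averages one-split regression trees (stumps) fitted on bootstrap samples. This is a deliberately simplified stand-in, not the Random Forest or Boosted Decision Trees used for the project: a real Random Forest grows deeper trees and also subsamples features at each split. The features, prices, and function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets):
    """Find the single feature/threshold split that minimizes squared error."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the sample mean
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean


def fit_bagged_stumps(rows, targets, n_estimators=50, seed=0):
    """Fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return ensemble


def predict(ensemble, row):
    """Average the predictions of all stumps."""
    return mean(predict_stump(stump, row) for stump in ensemble)


# Made-up training data: (area in m2, private shower 0/1) -> monthly price in EUR.
rows = [(12, 0), (15, 0), (18, 1), (22, 1), (25, 1), (30, 1)]
prices = [330, 380, 450, 520, 560, 640]

ensemble = fit_bagged_stumps(rows, prices)
print(round(predict(ensemble, (14, 0))), round(predict(ensemble, (28, 1))))
```

Because no functional form is assumed for the price, the averaged prediction is a step function shaped entirely by the data, which is what makes the approach non-parametric.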

PREDICTION
“Prediction is very difficult, especially if it’s about the future”. Niels Bohr

Prediction error of 14% on analogue data.

Charts: Analogues, All blind tests

- Additionally, 1000+ samples used as validation set.
- 80% of these predictions within 30 EUR.
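
The two figures above can be computed as sketched below. The slides do not define the error measure, so mean absolute percentage error is an assumption here, and the actual and predicted prices are invented for the example.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed in percent."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def share_within(actual, predicted, tolerance_eur=30):
    """Percentage of predictions within the given EUR tolerance of the actual price."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance_eur)
    return 100 * hits / len(actual)


# Made-up monthly prices in EUR.
actual = [400, 500, 450, 600, 550]
predicted = [420, 480, 500, 590, 555]

print(f"error: {mean_absolute_percentage_error(actual, predicted):.1f}%")
print(f"within 30 EUR: {share_within(actual, predicted):.0f}%")
```

Reporting both numbers is useful because a percentage error weighs cheap apartments more heavily, while the EUR tolerance is the same for every listing.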



COMPARISON
COMPARISON WITH CURRENT ESTIMATES MADE BY CLIENT

Price: Analogues are typically more expensive. (Chart series: All Data, Analogue)

Analogues used for Tunnelplaats are:


•Campus Nieuw Zuid (Jan Vanhoenackerstraat)
•Parktoren (Ellermanstraat 61)
•Xior (Keizerstraat 13)
•Xior (Kipdorpstraat 49-55)

Area: Analogues and sampled data have similar area.

Price per m2: Analogues are typically more expensive per m2.
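
A comparison of this kind can be sketched by contrasting the median price per m2 of the analogues with that of the full data set. The choice of the median and all of the listing values below are assumptions for illustration, not figures from the project.

```python
from statistics import median


def price_per_m2(listings):
    """Convert (price in EUR, area in m2) pairs to price per square meter."""
    return [price / area for price, area in listings]


# Made-up listings as (monthly price in EUR, area in m2).
all_data = [(400, 20), (450, 25), (380, 18), (520, 26), (600, 24)]
analogues = [(600, 24), (520, 26)]

median_all = median(price_per_m2(all_data))
median_analogues = median(price_per_m2(analogues))
print(f"all data: {median_all:.1f} EUR/m2, analogues: {median_analogues:.1f} EUR/m2")
```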

INSIGHTS
“Prediction is very difficult, especially if it’s about the future”. Niels Bohr

Feature Importance

Apart from location and area, a private shower and a private toilet are important features that drive the price up.