
AI - real estate price prediction
30.10.2019
Kristijan Šarić
EXACT BYTE d.o.o.
About me
● > 10 years of experience
● Java (desktop, web, mobile) applications
● JavaScript (Angular, React, …)
● Scala (web), purely functional programming
● 2017, EXACT BYTE d.o.o
● Haskell (IOHK - explorer, wallet, cardano-shell, consulting for foreign companies)
● Python (ML, web applications, scripts)
● My own products (using ML)
○ https://www.emprovio.com/
○ https://contetino.com/
○ https://alenn.ai
○ Croatian sign language (in progress)
AI
● The most general term for anything relating to machines “thinking for
themselves”
● Artificial Intelligence is the broader concept of machines being able to carry
out tasks in a way that we would consider “intelligent” [1]
Machine learning
● “Rather than teaching computers everything they need to know about the
world and how to carry out tasks, it might be possible to teach them to learn
for themselves” [1]
● Probabilistic reasoning - what is the connection between inputs and outputs?
● An interesting field that has emerged is probabilistic programming, which
emphasizes “reasoning under uncertainty”
Machine learning
● Supervised
● Unsupervised
● Reinforcement
AI vs Machine learning
● Deep Blue, the AI that defeated the world chess champion in 1997, used tree
search to evaluate millions of moves at every turn
● Prolog (Zlatko presented), first-order logic - "there exists x such that x is
Socrates and x is a man"
● General intelligence is very broad: it uses a sort of “transfer learning”
across different domains and is more general than ML - imagine a neural network
that learns to understand language by understanding pictures
Machine learning
● Data → Result
Machine learning
● House attributes (number of bedrooms, bathrooms, floors, garden…) → Price
Machine learning
● Data → ML algorithm (linear regression, neural network, …) → Model, the thing
that “understands” the data
Machine learning
● House price data → Neural network → Model that knows how to predict the
house price
Machine learning
● House price data → Model that knows how to predict the house price
Machine learning
● New, unseen data → Model that knows how to predict the house price → Result
price
Machine learning
● House attributes (number of bedrooms, bathrooms, floors, garden…) → Model
that knows how to predict the house price → Price
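The flow on these slides (data → ML algorithm → model → prediction on unseen data) can be sketched in a few lines. This is only an illustration, assuming linear regression as the ML algorithm; the toy numbers and the helper name `predict_price` are made up and are not taken from the real listings.

```python
import numpy as np

# Toy, made-up training data: area in m2 -> price. NOT real listings.
areas = np.array([50.0, 80.0, 100.0, 150.0])
prices = 1000.0 * areas + 20000.0  # synthetic rule, so a linear fit is exact

# "ML algorithm": ordinary least squares on [area, 1] features.
features = np.column_stack([areas, np.ones_like(areas)])
weights, *_ = np.linalg.lstsq(features, prices, rcond=None)

def predict_price(area):
    """The "model": applies the learned weights to new, unseen data."""
    return float(weights[0] * area + weights[1])

print(predict_price(120.0))
```

The same shape (fit on known data, then call the model on new attributes) holds when the algorithm is a neural network instead of least squares.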
ML algorithm, Neural networks
● 0.75 → Magic → 0.32


Neural networks
● [0.75, 0.35, 1.7, -4.5] → Magic → [0.63, -3.56, 0, -10.3]


Neural networks
● Column vector [0.75, 0.35, 1.7, -4.5] → Magic → column vector [0.63, -3.56, 0, -10.3]
Neural networks
● [[0.75], [0.35, 1.7], [-4.5]] → Magic → [0.63, -3.56, 0, -10.3]
Neural networks
● [[[0.75]], [[0.35, -1.5], [1.7]], [[-4.5], [-6.4]]] → Magic → [0.63, -3.56, 0, -10.3]
Neural networks
● [[[0.75]], [[0.35, -1.5], [1.7]], [[-4.5], [-6.4]]] → Function → [0.63, -3.56, 0, -10.3]
● Universal Approximation Theorem
● http://neuralnetworksanddeeplearning.com/chap4.html
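To make the "magic is just a function" point concrete, here is a rough sketch of the idea behind the theorem: one hidden layer with enough neurons can approximate a continuous function on an interval. The setup is an assumption for illustration only (50 tanh neurons with random fixed weights, and only the output layer fitted by least squares); it is not how a network is normally trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: the non-linear function x**2 - 1 on a small interval.
xs = np.linspace(-3.0, 3.0, 200)
target = xs**2 - 1

# One hidden layer of 50 tanh neurons with random, fixed weights and biases.
hidden_weights = rng.normal(size=50)
hidden_biases = rng.normal(size=50)
hidden = np.tanh(np.outer(xs, hidden_weights) + hidden_biases)  # shape (200, 50)

# Fit only the output layer (plus a bias column) by least squares.
design = np.column_stack([hidden, np.ones_like(xs)])
output_weights, *_ = np.linalg.lstsq(design, target, rcond=None)

approximation = design @ output_weights
max_error = float(np.max(np.abs(approximation - target)))
print(max_error)
```

The printed maximum error should be small compared with the range of the target on this interval; adding neurons generally lets the fit get closer.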
Linear function

import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 2, 3])
Non-linear function
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-10, 10, 100)
plt.plot(xs, xs**2 - 1)


Linear vs non-linear
● Linear is a straight line, the slope is constant
● Non-linear is not a straight line, the slope is not constant
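The difference in slope can be checked numerically. A small sketch, using the same two functions as the plots (y = x and y = x**2 - 1); the helper name `slopes` is hypothetical.

```python
def slopes(xs, ys):
    """Slope between each pair of neighbouring points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

xs = [0, 1, 2, 3]
linear_slopes = slopes(xs, [x for x in xs])            # y = x
nonlinear_slopes = slopes(xs, [x**2 - 1 for x in xs])  # y = x**2 - 1

print(linear_slopes)     # [1.0, 1.0, 1.0]
print(nonlinear_slopes)  # [1.0, 3.0, 5.0]
```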
Data
● http://www.remax-svijetnekretnina.com/za-prodaju
● You can get the data from the company that owns it!
● I copied the data manually from the website
● I will not share the data, and cannot share the notebook, unless Remax
agrees that this is OK
Site
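● url: http://www.remax-svijetnekretnina.com/kuca/ist...
● lokacija (location): Orihi, Barban
● transakcija (transaction): Za prodaju (for sale)
● vrsta_nekretnine (property type): Kuća/Vikendica (house/holiday home)
● ukupan_broj_soba (total number of rooms): 8
● toaleti (toilets): NaN
● ukupno_katova (total floors): 2
● cijena (price): 180.000 € - 156.000 € (1.158.428 kn)
● povrsina (area): 198 m2
● godina_izgradnje (year built): NaN
● stanje_nekretnine (property condition): dovršena (completed)
● ...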
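Raw fields like `cijena` and `povrsina` are strings, while the later plots use numeric columns (`cijena_kn`, `povrsina_broj`). The notebook's actual cleaning code is not shown, so the following is only a guess at what such a step could look like; the function names are hypothetical, and the sketch assumes "." is a thousands separator and "," a decimal separator.

```python
import re

def parse_price_kn(raw):
    """Extract the kuna amount from e.g. '180.000 € - 156.000 € (1.158.428 kn)'."""
    match = re.search(r"([\d.]+)\s*kn", raw)
    if match is None:
        return None
    return int(match.group(1).replace(".", ""))  # drop thousands separators

def parse_area_m2(raw):
    """Extract the numeric area from e.g. '198 m2'."""
    match = re.search(r"(\d+(?:,\d+)?)\s*m2", raw)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))

print(parse_price_kn("180.000 € - 156.000 € (1.158.428 kn)"))  # 1158428
print(parse_area_m2("198 m2"))                                 # 198.0
```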
Data?
● The 80/20 rule
● If you don’t have the data ready, you will probably spend 80% of the time
preparing it; the “fun” stuff is the remaining 20%.
● “The title the data engineering is associated with data, namely, their delivery,
storage, and processing.”
● https://towardsdatascience.com/who-is-a-data-engineer-how-to-become-a-data-engineer-1167ddc12811
Notebook!
What data do we have?
● df.vrsta_nekretnine.unique()
○ ['Poslovni prostor', 'Stan/Apartman', 'Kuća/Vikendica', 'Građevinsko zemljište',
'Negrađevinsko zemljište', 'Industrijski objekti', 'Ugostiteljski objekti',
'Trgovački objekti', 'Hoteli i pansioni', 'Garaža']
○ (Stan/Apartman = flat/apartment, Kuća/Vikendica = house/holiday home)
● len(df[df.vrsta_nekretnine == 'Stan/Apartman'])
○ 600
● len(df[df.vrsta_nekretnine.isin(['Stan/Apartman', 'Kuća/Vikendica'])])
○ 1892
What is it that we want to do?
● stanovi_kuce_df = df[df.vrsta_nekretnine.isin(['Stan/Apartman', 'Kuća/Vikendica'])]
● len(stanovi_kuce_df)
○ 1892
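Since the real dataset is not shared, the selection above can be tried on a stand-in. A minimal sketch with a made-up four-row DataFrame; only the column names and the two property types come from the slides.

```python
import pandas as pd

# Toy stand-in for the real (unshared) dataset; all rows are made up.
df = pd.DataFrame({
    "vrsta_nekretnine": ["Stan/Apartman", "Garaža", "Kuća/Vikendica", "Poslovni prostor"],
    "cijena_kn": [900000, 80000, 1158428, 2000000],
})

wanted_types = ["Stan/Apartman", "Kuća/Vikendica"]
# .copy() gives an independent frame, so columns can be added to it later
# without pandas warning about modifying a slice of df.
stanovi_kuce_df = df[df.vrsta_nekretnine.isin(wanted_types)].copy()

print(len(stanovi_kuce_df))                # 2
print(df.vrsta_nekretnine.value_counts())  # rows per property type in one call
```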
Visualization and statistics, area
● stanovi_kuce_df.plot(kind='scatter', x='povrsina_broj', y='cijena_kn', color='blue')
● stanovi_kuce_df[stanovi_kuce_df.povrsina_broj > 1000].url
Visualization and statistics, area
● stanovi_kuce_df[stanovi_kuce_df.povrsina_broj < 500].plot(kind='scatter', x='povrsina_broj', y='cijena_kn', color='blue')
Visualization and statistics, year built
● stanovi_kuce_df.plot(kind='scatter', x='godina_izgradnje_broj', y='cijena_kn', color='green')
● stanovi_kuce_df[stanovi_kuce_df.godina_izgradnje_broj < 250].url
Visualization and statistics, year built
● stanovi_kuce_df[stanovi_kuce_df.godina_izgradnje_broj > 1950].plot(kind='scatter', x='godina_izgradnje_broj', y='cijena_kn', color='green')
Visualization and statistics, total number of rooms
● stanovi_kuce_df.plot(kind='scatter', x='ukupan_broj_soba', y='cijena_kn', color='red')
Visualization and statistics, number of toilets
● stanovi_kuce_df.plot(kind='scatter', x='toaleti', y='cijena_kn', color='brown')
Visualization and statistics, number of toilets
● stanovi_kuce_df[stanovi_kuce_df.toaleti < 6].plot(kind='scatter', x='toaleti', y='cijena_kn', color='brown')
Visualization and statistics, number of floors
● stanovi_kuce_df.plot(kind='scatter', x='ukupno_katova', y='cijena_kn', color='purple')
Visualization and statistics, number of floors
● stanovi_kuce_df[stanovi_kuce_df.ukupno_katova < 5].plot(kind='scatter', x='ukupno_katova', y='cijena_kn', color='purple')
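The per-plot cut-offs used on these slides (area < 500, year built > 1950, toilets < 6, floors < 5) could be combined into one filter before training. This is a sketch on made-up rows, not the notebook's code. One design choice is an assumption: a plain comparison such as `toaleti < 6` silently drops rows where the value is NaN, so the helper below (hypothetical name `plausible_rows`) keeps missing year/toilets/floors and only requires the area to be present.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for stanovi_kuce_df.
toy_df = pd.DataFrame({
    "povrsina_broj": [60.0, 1200.0, 198.0, 85.0],
    "godina_izgradnje_broj": [1985.0, 2005.0, np.nan, 200.0],
    "toaleti": [1.0, 2.0, np.nan, 1.0],
    "ukupno_katova": [1.0, 3.0, 2.0, 1.0],
})

def plausible_rows(frame):
    """Keep rows inside the plotted ranges; missing year/toilets/floors are kept."""
    keep = (
        (frame.povrsina_broj < 500)
        & (frame.godina_izgradnje_broj.isna() | (frame.godina_izgradnje_broj > 1950))
        & (frame.toaleti.isna() | (frame.toaleti < 6))
        & (frame.ukupno_katova.isna() | (frame.ukupno_katova < 5))
    )
    return frame[keep]

filtered = plausible_rows(toy_df)
print(filtered)
```

Here the 1200 m2 row and the row with year 200 are dropped, while the row with missing year and toilets survives.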
Notebook!
Keras
● High-level library, integrated into TensorFlow 2.0
● There are many tools/libraries to work with (TensorFlow, PyTorch, …)
● They tend to be quite similar
● The similarity comes from the fact that most of the actual computations are
operations on tensors (which are arrays for our purposes)
● The original library used to run many of these computations was NumPy
● NumPy is limited in that it executes only on the CPU, not on the GPU
● GPU computation is faster these days because neural network calculations
(a lot of small, simple calculations) fit the GPU nicely
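What "operations on tensors" means in practice can be shown with NumPy alone. A minimal sketch of one dense layer as a matrix product; the numbers are made up, and the ReLU activation is an assumption (the slides do not name one).

```python
import numpy as np

# A "tensor" here is just an n-dimensional array.
batch = np.array([[3.0, 1.0, 2.0],   # 2 examples, 3 made-up input values each
                  [5.0, 2.0, 1.0]])
weights = np.array([[0.5, -1.0],     # 3 inputs -> 2 neurons
                    [1.0,  0.0],
                    [0.0,  2.0]])
biases = np.array([0.1, -0.2])

# One dense layer: matrix product, add biases, then ReLU (max with 0).
layer_output = np.maximum(0.0, batch @ weights + biases)
print(layer_output.shape)  # (2, 2)
```

Libraries such as TensorFlow and PyTorch express the same kind of operation but can also run it on a GPU.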
Neural networks (some) details
● Input → one hidden layer (3 neurons) → Output
Neural networks (some) details
● Input → two hidden layers (3 neurons each) → Output
Neural networks architecture for this problem?
● Output: cijena_kn
Neural networks (some) details
● Input:
○ ukupan_broj_soba (total number of rooms)
○ toaleti (toilets)
○ ukupno_katova (total floors)
○ spavace_sobe (bedrooms)
○ kupaonice (bathrooms)
○ godina_izgradnje_broj (year built, numeric)
○ povrsina_broj (area, numeric)
Neural networks (some) details
● Input: ukupan_broj_soba, toaleti, ukupno_katova, spavace_sobe, kupaonice,
godina_izgradnje_broj, povrsina_broj
● Output: cijena_kn
Neural networks (some) details
● Input: ukupan_broj_soba, toaleti, ukupno_katova, spavace_sobe, kupaonice,
godina_izgradnje_broj, povrsina_broj
● Hidden: ?
● Output: cijena_kn
Neural networks (some) details
● Input: ukupan_broj_soba, toaleti, ukupno_katova, spavace_sobe, kupaonice,
godina_izgradnje_broj, povrsina_broj
● Hidden:
1. The number of hidden layers equals one
2. The number of neurons in that layer is the mean of the neurons in the input and
output layers.
https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw/1097#1097
● Output: cijena_kn
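Applied to this problem, the two rules give a concrete layer layout. A small sketch; the helper name `rule_of_thumb_hidden_size` is hypothetical, and rounding to a whole neuron is an assumption for cases where the mean is not an integer.

```python
INPUT_FEATURES = [
    "ukupan_broj_soba", "toaleti", "ukupno_katova", "spavace_sobe",
    "kupaonice", "godina_izgradnje_broj", "povrsina_broj",
]
OUTPUT_FEATURES = ["cijena_kn"]

def rule_of_thumb_hidden_size(n_inputs, n_outputs):
    """Mean of the input and output layer sizes, rounded to a whole neuron."""
    return round((n_inputs + n_outputs) / 2)

# Rule 1: exactly one hidden layer. Rule 2: its size is the mean of 7 and 1.
layer_sizes = [
    len(INPUT_FEATURES),
    rule_of_thumb_hidden_size(len(INPUT_FEATURES), len(OUTPUT_FEATURES)),
    len(OUTPUT_FEATURES),
]
print(layer_sizes)  # [7, 4, 1]
```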
Neural networks (some) details
● Input: ukupan_broj_soba, toaleti, ukupno_katova, spavace_sobe, kupaonice,
godina_izgradnje_broj, povrsina_broj
● Hidden: one layer of 4 neurons
● Output: cijena_kn
Notebook!
Neural networks (some) details
● Input: ukupan_broj_soba, toaleti, ukupno_katova, spavace_sobe, kupaonice,
godina_izgradnje_broj, povrsina_broj
● Hidden: three layers of 250 neurons each (Hidden 1 … Hidden 250)
● Output: cijena_kn
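To get a feel for how much bigger this network is than the one with four hidden neurons, one can count trainable parameters. The sketch assumes fully connected layers with one bias per neuron, which the slides do not state explicitly; `dense_parameter_count` is a hypothetical helper.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small = [7, 4, 1]              # one hidden layer of 4 neurons
large = [7, 250, 250, 250, 1]  # three hidden layers of 250 neurons

print(dense_parameter_count(small))  # 37
print(dense_parameter_count(large))  # 127751
```

Under these assumptions the larger network has far more parameters than the 1892 rows available, which is worth keeping in mind when judging how well it generalizes.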
Interesting (additional) ideas
● Judging by the data, real estate looks interesting for investment because
prices deviate a lot (with the currently available data)
● Include more data in the analysis (a lot of data is missing)
● Include more data sources (other companies/websites)
● Visualize locations together with prices
● Analyze the (textual) property description to gain better insight
● Find the factors that influence the price the most (probability, not guessing)
● If a list of sold properties could be obtained (information about the property
and the buyer), it might be possible to build a profile of the people who buy
(certain) properties?!
Questions? Ideas? Suggestions?
Thank you for your time!
