Anirban CMI StatFin 2019 II

Prof. Anirban Chakraborti
School of Computational and Integrative Sciences
Jawaharlal Nehru University, New Delhi
COMPLEX SYSTEMS
Complex Systems is a new field of science studying
how parts of a system give rise to the collective
behaviors of the system, and how the system
interacts with its environment. Systems that are
"complex" have distinct properties that arise from
these relationships, such as nonlinearity, emergence,
spontaneous order, adaptation, and feedback loops,
among others.
E.g.:
Social systems formed (in part) out of people,
Brain formed out of neurons,
Financial markets formed out of agents or firms,
etc.
Our research focuses on new interdisciplinary
fields that apply methods of statistical physics
to problems in economics and finance,
termed “Econophysics”,
or to problems in sociology, termed
“Sociophysics”.
2019
Market is a ‘model’ complex system
In a market, many agents interact to perform the collective task of finding the
best price for a good/asset.
1. Microscopic level: Stocks
2. Mesoscopic level: Sectors
3. Macroscopic level: Indices
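Log-Returns
$$ r_i(\tau) = \ln P_i(\tau) - \ln P_i(\tau - 1) $$
where i = 1,…,N (total no. of stocks) and τ = time.
Correlation coefficient
$$ C_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left[\langle r_i^2 \rangle - \langle r_i \rangle^2\right]\left[\langle r_j^2 \rangle - \langle r_j \rangle^2\right]}} $$
where C_ij is the equal-time Pearson correlation coefficient between stocks i and j,
with i, j = 1,…,N (total no. of stocks), and ⟨…⟩ denotes an average over time.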
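The two definitions above can be checked numerically. The following is a minimal sketch in base R, assuming hypothetical prices simulated as geometric random walks (not real market data); the names `P`, `r` and `pearson` are illustrative only.

```r
# Hypothetical price data: N stocks (columns) observed on T_len days (rows)
set.seed(1)
N <- 4
T_len <- 250
# Geometric random walks as stand-in prices (an assumption, not real data)
P <- apply(matrix(rnorm(T_len * N, sd = 0.01), T_len, N), 2,
           function(z) 100 * exp(cumsum(z)))

# Log-returns: r_i(tau) = ln P_i(tau) - ln P_i(tau - 1), computed column by column
r <- diff(log(P))

# Pearson correlation written out from the definition, for one pair of series
pearson <- function(ri, rj) {
  (mean(ri * rj) - mean(ri) * mean(rj)) /
    sqrt((mean(ri^2) - mean(ri)^2) * (mean(rj^2) - mean(rj)^2))
}

C_manual <- pearson(r[, 1], r[, 2])  # C_12 from the formula
C <- cor(r)                          # full N x N matrix from the built-in cor()
```

The hand-written formula and `cor()` should agree, because the Pearson coefficient does not depend on whether averages are normalized by T or T-1.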
Random time series: White noise
T <- 2000 # length of time series

X <- rnorm(T) # random time series of length T
plot(1:T, X, type='l',xlab="days", ylab="values", main="White Noise", cex.lab=2,
cex.axis=2, cex=2, font=2 , bty='n')
Probability distribution: Random time series
T <- 1e7
X <- rnorm(T)
# breaks = 200 suggests the number of histogram bins (it is not the bar width).
# Use plot=TRUE rather than plot=T, because T has been reassigned above.
hist(X, plot=TRUE, breaks=200, cex.axis=1.5, font=2, cex.lab=1.5)
hh <- hist(X, plot=FALSE, breaks=200)
plot(hh$mids, hh$density,
type='l', ylab="PDF", xlab=expression(r[t]),
lwd=2, main="PDF of white noise",
cex.lab=2, cex.axis=2, cex=2, font=2)
Auto-correlation: Random time series
T <- 1e7
X <- rnorm(T)
# compute the ACF without plotting, then draw it once with custom labels
my_acf <- acf(X, plot=FALSE, lag.max=30)
plot(my_acf,
main="Autocorrelation of White noise",
xlab="lag", ylab="ACF",
cex.lab=2, cex.axis=2, cex=2, font=2)
Correlation: Random time series
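library(corrplot) # provides corrplot()
n <- 50000 # total number of random values
x <- rnorm(n) # normally distributed random numbers
A <- matrix(x, 1000, 50) # 50 time series (columns), each of length 1000 (rows)
# find correlation
C <- cor(A)
corrplot(C, method='color')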
Eigenvalues: Random time series
EV <- eigen(C)
h <- hist(EV$values, breaks = 30 )
Eigenvalues
H.K. Pharasi, K. Sharma, A Chakraborti and T.H. Seligman, “Complex market dynamics in the light of random matrix theory”
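As a hedged illustration of how random matrix theory enters here: for uncorrelated series, the eigenvalue histogram can be compared with the Marchenko-Pastur density. The sketch below assumes the same dimensions as the slide code (50 series of length 1000) and unit variance; the Marchenko-Pastur comparison and the names `Q`, `lambda_min`, `lambda_max` and `mp_density` are additions for illustration, not taken from the slides.

```r
set.seed(42)
T_len <- 1000   # length of each series
N <- 50         # number of series
A <- matrix(rnorm(T_len * N), T_len, N)
ev <- eigen(cor(A), symmetric = TRUE, only.values = TRUE)$values

# Marchenko-Pastur bounds and density for Q = T/N and unit variance
Q <- T_len / N
lambda_min <- (1 - sqrt(1 / Q))^2
lambda_max <- (1 + sqrt(1 / Q))^2
mp_density <- function(l) {
  out <- numeric(length(l))
  inside <- l > lambda_min & l < lambda_max
  out[inside] <- Q / (2 * pi) *
    sqrt((lambda_max - l[inside]) * (l[inside] - lambda_min)) / l[inside]
  out
}

# Empirical eigenvalue density with the theoretical curve on top
hist(ev, breaks = 30, freq = FALSE, xlab = "eigenvalue", main = "Eigenvalues")
curve(mp_density(x), from = lambda_min, to = lambda_max, add = TRUE, lwd = 2)
```

For finite N and T the largest and smallest eigenvalues may fall slightly outside the theoretical bounds.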
Data Science in Finance
Supervised learning vs. unsupervised learning
• Supervised learning: discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target
attribute in future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
Unsupervised Learning
In unsupervised learning (UML), no labels are provided, and the
learning algorithm focuses solely on detecting structure in unlabeled
input data. One generally differentiates between
• Clustering, where the goal is to find homogeneous subgroups within
the data; the grouping is based on distance between observations.
• Dimensionality reduction, where the goal is to identify patterns in
the features of the data. Dimensionality reduction is often used to
facilitate visualization of the data, as well as a pre-processing method
before supervised learning.
Multi-dimensional scaling (MDS)
• Represent high-dimensional data in a few (usually 2D or 3D) dimensions,
keeping distances between points similar.
• Multidimensional scaling is a visual representation of distances or
dissimilarities between sets of objects.
• Objects that are more similar (or have shorter distances) are closer
together on the graph than objects that are less similar (or have longer
distances).
Algorithm
1) Assign the points to coordinates in n-dimensional space, which could be
2-dimensional, 3-dimensional, or higher. The orientation of the
coordinate axes is arbitrary.
2) Calculate Euclidean distances for all pairs of points. This gives the distance matrix
of the current configuration.
3) Compare this distance matrix with the original input matrix by evaluating the stress
function. Stress is a goodness-of-fit measure, based on differences between predicted
and actual distances.
4) Adjust coordinates, if necessary, to minimize stress.
1. Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-130.
2. Kruskal, J. B. and Wish, M. (1978). Multidimensional Scaling. Sage.
3. Borg, I. et al. (2012). Applied MDS. Springer Science & Business Media.
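A minimal sketch of the input/output relationship, assuming a hypothetical toy data set of four points. Note that it uses base R's `cmdscale()`, which performs classical (metric) MDS by eigen-decomposition rather than the iterative stress minimization described above; the stress value is computed afterwards only as a goodness-of-fit check.

```r
# Four hypothetical points lying on a unit square embedded in 3-D (z = 0)
X <- rbind(c(0, 0, 0), c(1, 0, 0), c(1, 1, 0), c(0, 1, 0))

D <- dist(X)                # input: Euclidean distances between all pairs
fit <- cmdscale(D, k = 2)   # output: coordinates in 2 dimensions
D_hat <- dist(fit)          # distances reproduced by the 2-D map

# Kruskal-type stress: 0 means the map reproduces the input distances exactly
stress <- sqrt(sum((as.vector(D) - as.vector(D_hat))^2) / sum(as.vector(D)^2))

# The map is only defined up to rotation, reflection and translation
plot(fit, asp = 1, pch = 19, xlab = "Dim 1", ylab = "Dim 2", main = "MDS map")
```

Kruskal's nonmetric variant is available as `isoMDS()` in the MASS package, should the iterative procedure be preferred.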
Coordinates are not unique
• Display the structure of distance-like data as a geometrical picture, such that “similar” objects
are close together and “dissimilar” objects are far from each other.
Input (Distance metric) → Output (Coordinates → Map)

Clustering
• Clustering is a technique for finding
similarity groups in data, called clusters.
• It groups data instances that are similar to
(near) each other in one cluster and data
instances that are very different (far away)
from each other into different clusters.
• Clustering is often called an unsupervised
learning task as no class values denoting
an a priori grouping of the data instances
are given, which is the case in supervised
learning.
[Figure: The data set has three natural groups of data points, i.e., 3 natural clusters.]
K-means algorithm
• Given K, the K-means algorithm works
as follows:
1) Randomly choose K data points (seeds) to
be the initial centroids, cluster centers
2) Assign each data point to the closest
centroid
3) Re-compute the centroids using the
current cluster memberships.
4) If a convergence criterion is not met, go to
2).
kmeans(x, centers = 3)
where, x is a numeric data matrix, and
centers is the pre-defined number of clusters
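A short usage sketch of `kmeans()`, assuming hypothetical data made of three well-separated Gaussian blobs; the `nstart = 10` argument (several random restarts) is an addition to the call shown above, used because the result depends on the random initial centroids.

```r
set.seed(123)
# Hypothetical data: three 2-D Gaussian blobs with 50 points each
centers_true <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(k)
  cbind(rnorm(50, centers_true[k, 1], 0.5),
        rnorm(50, centers_true[k, 2], 0.5))))

# K = 3 clusters; nstart repeats the random seeding and keeps the best run
fit <- kmeans(x, centers = 3, nstart = 10)

table(fit$cluster)   # cluster sizes
fit$centers          # final centroids

# Points colored by cluster, centroids marked with crosses
plot(x, col = fit$cluster, pch = 19, xlab = "x1", ylab = "x2")
points(fit$centers, pch = 4, cex = 2, lwd = 2)
```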
Spanning Tree
 A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree.
 A graph may have many spanning trees.
 The cost of the spanning tree is the sum of the weights of all the edges in the tree.
 The following complete graph will generate 16 spanning trees.
[Figure: Graph A and some spanning trees from Graph A]
Minimum Spanning Tree (MST)
The minimum spanning tree (MST) is the spanning tree where the cost is
minimum among all the spanning trees.
[Figure: a complete graph with weighted edges and its minimum spanning tree]
A minimum spanning tree has (N-1) edges, where N is the number of
vertices in the given graph. A graph can also have more than one minimum
spanning tree.
Prim’s algorithm
Prim’s algorithm was developed in 1930 by the Czech mathematician Vojtěch Jarník and later
rediscovered and republished by the computer scientists Robert C. Prim in 1957 and Edsger W.
Dijkstra in 1959. Therefore, it is also sometimes called Jarník's algorithm or the Prim-Dijkstra algorithm.
 It works with nodes rather than edges.
 The algorithm works as follows:
• Maintain two disjoint sets of vertices: one containing the vertices that are in the growing spanning
tree and the other containing those that are not.
• Select the cheapest edge that connects a vertex in the growing spanning tree to a vertex outside
it, and add that vertex to the tree. This can be done using a priority queue: insert the vertices
that are connected to the growing spanning tree into the priority queue.
• Avoid cycles: mark the nodes that have already been selected and insert only
unmarked nodes into the priority queue.
1. Prim, R. C. "Shortest connection networks and some generalizations." The Bell System Technical Journal 36.6 (1957): 1389-1401.
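The steps above can be sketched in base R as follows. This is a simple O(V²) version that scans an array of best-known costs instead of using a priority queue; the function name `prim_mst` and the 4-vertex example graph are hypothetical, and the graph is assumed to be connected.

```r
# Prim's algorithm on a symmetric weight matrix W (Inf = no edge).
# Returns an (n-1) x 3 matrix of tree edges: from, to, weight.
prim_mst <- function(W) {
  n <- nrow(W)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # marked vertices; start from vertex 1
  best_cost <- W[1, ]                    # cheapest known edge from the tree to each vertex
  best_from <- rep(1L, n)                # tree endpoint of that cheapest edge
  edges <- matrix(NA_real_, n - 1, 3,
                  dimnames = list(NULL, c("from", "to", "weight")))
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best_cost[cand])]  # cheapest vertex outside the tree
    edges[k, ] <- c(best_from[v], v, best_cost[v])
    in_tree[v] <- TRUE                     # marking v prevents cycles
    closer <- !in_tree & W[v, ] < best_cost
    best_cost[closer] <- W[v, closer]
    best_from[closer] <- v
  }
  edges
}

# Hypothetical complete graph on 4 vertices
W <- matrix(Inf, 4, 4)
W[1, 2] <- 1; W[1, 3] <- 4; W[1, 4] <- 3
W[2, 3] <- 2; W[2, 4] <- 5; W[3, 4] <- 6
W <- pmin(W, t(W))   # make the matrix symmetric

mst <- prim_mst(W)
total_cost <- sum(mst[, "weight"])
```

For this example the tree consists of the edges 1-2, 2-3 and 1-4.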
Kruskal's algorithm
• Kruskal's algorithm was published in 1956 by Joseph Bernard Kruskal, an American
mathematician, statistician, computer scientist and psychometrician.
• It works with edges, rather than nodes.
• The algorithm works as follows:
1) Sort the graph edges in order of increasing weight.
2) Go through the edges from the smallest weight to the largest weight, adding them to the
MST one at a time.
3) Only add an edge if it does not form a cycle, i.e. if it connects two components that are
still disconnected.
1. Kruskal, Joseph B. "On the shortest spanning subtree of a graph and the traveling salesman problem." Proceedings of the American Mathematical Society 7.1 (1956): 48-50.
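A minimal base R sketch of the edge-based procedure, using a plain parent array as the disjoint-set (union-find) structure to detect cycles. The function name `kruskal_mst` and the example edge list are hypothetical, and no path compression or union by rank is used, so this is meant for small graphs only.

```r
# Kruskal's algorithm. edges: data frame with columns from, to, weight;
# n: number of vertices. Returns the rows of `edges` that form the MST.
kruskal_mst <- function(edges, n) {
  parent <- seq_len(n)                 # each vertex starts as its own component
  find_root <- function(v) {
    while (parent[v] != v) v <- parent[v]
    v
  }
  edges <- edges[order(edges$weight), ]   # step 1: sort by increasing weight
  keep <- logical(nrow(edges))
  for (e in seq_len(nrow(edges))) {
    ra <- find_root(edges$from[e])
    rb <- find_root(edges$to[e])
    if (ra != rb) {                    # different components: no cycle is formed
      parent[ra] <- rb                 # merge the two components
      keep[e] <- TRUE
      if (sum(keep) == n - 1) break    # a spanning tree has n - 1 edges
    }
  }
  edges[keep, ]
}

# Hypothetical complete graph on 4 vertices
edges <- data.frame(from   = c(1, 1, 1, 2, 2, 3),
                    to     = c(2, 3, 4, 3, 4, 4),
                    weight = c(1, 4, 3, 2, 5, 6))
mst <- kruskal_mst(edges, n = 4)
total_cost <- sum(mst$weight)
```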
Comparison
• Kruskal's algorithm has worst-case time
complexity O(E log E), because the edges
must be sorted; since E ≤ V², this is the
same as O(E log V). If the edges are already
sorted, or can be sorted in linear time, the
remaining union-find work is nearly linear
in E. We should use Kruskal when the graph
is sparse, i.e. has a small number of edges, or
when the edges are already sorted or can be
sorted in linear time.
• Prim's algorithm has worst-case time
complexity O(E log V) with a binary-heap
priority queue or, even better, O(E + V
log V) with a Fibonacci heap. We should use
Prim when the graph is dense, i.e. the number of
edges is high.
https://stackoverflow.com/questions/1195872/kruskal-vs-prim
Data Science in Finance
[Figure: USA; marked events: Black Monday, Lehman Bros.]

Financial data analyses: Intraday
“Study of statistical correlations in intraday and daily financial return time series”
Gayatri Tilak, Tamas Szell, Remy Chicheportiche, Anirban Chakraborti
http://arxiv.org/pdf/1204.5103.pdf
Ultrametric distances?
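Correlation coefficients can be transformed to distances as follows:
$$ d_{ij}^t = \sqrt{2\,(1 - \rho_{ij}^t)} \in \mathbf{D}^t, \quad \text{where } 2 \ge d_{ij}^t \ge 0 $$
Here $\rho_{ij}^t$ is the correlation coefficient between assets i and j in time window t,
and $\mathbf{D}^t$ is the N × N distance matrix. The distances fulfil the conditions:

(i) $d_{ij} = 0 \Leftrightarrow i = j$
(ii) $d_{ij} = d_{ji}$
(iii) $d_{ij} \le d_{ik} + d_{kj}$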
(iv)
R.N. Mantegna, Eur. Phys. J. B 11, 193 (1999).

J.-P. Onnela, A. Chakraborti, K. Kaski, J. Kertesz and A. Kanto, Phys. Rev. E 68, 056110 (2003).
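The transformation and conditions (i) to (iii) can be checked numerically. The following is a minimal sketch in base R, assuming hypothetical Gaussian returns for 5 assets; the names `rho`, `d` and `triangle_ok` are illustrative.

```r
set.seed(7)
r <- matrix(rnorm(500 * 5), 500, 5)   # hypothetical returns: 500 days, 5 assets
N <- ncol(r)

rho <- cor(r)                         # correlation matrix, entries in [-1, 1]
# d_ij = sqrt(2 (1 - rho_ij)); pmax guards against tiny negative rounding errors
d <- sqrt(2 * pmax(1 - rho, 0))

range(d)                              # all distances lie between 0 and 2

# (i) zero distance of each asset to itself, (ii) symmetry
zero_diag <- all(diag(d) < 1e-6)
symmetric <- isSymmetric(d)

# (iii) triangle inequality d_ij <= d_ik + d_kj, checked for every k
triangle_ok <- all(sapply(seq_len(N), function(k)
  all(d <= outer(d[, k], d[k, ], "+") + 1e-12)))
```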
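Procedure for asset trees
1) Return matrices: $\mathbf{R}^1, \mathbf{R}^2, \ldots, \mathbf{R}^M$, each of size $N \times T$
   (calculate correlations)
2) Correlation matrices: $\mathbf{C}^1, \mathbf{C}^2, \ldots, \mathbf{C}^M$, each of size $N \times N$,
   with $-1 \le \rho_{ij}^t \le 1$ and $N(N-1)/2$ distinct elements
   (transform to distances)
3) Distance matrices: $\mathbf{D}^1, \mathbf{D}^2, \ldots, \mathbf{D}^M$, each of size $N \times N$,
   with $2 \ge d_{ij}^t \ge 0$ and $N(N-1)/2$ distinct elements
   (data pruning with MST)
4) Asset trees: $\mathbf{T}^1 = (V, E^1), \ldots, \mathbf{T}^M = (V, E^M)$,
   with $|V| = N$ and $|E^i| = N - 1$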
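The whole procedure can be sketched end to end in base R. This is an illustrative sketch only: the returns are hypothetical Gaussian noise, the window width and step are arbitrary choices, and `mst_edges` is a compact Prim-style helper written for this example rather than the method used in the cited papers.

```r
# Compact Prim-style MST on a dense distance matrix; returns (N-1) x 2 edge list
mst_edges <- function(D) {
  n <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  cost <- D[1, ]
  from <- rep(1L, n)
  E <- matrix(NA_integer_, n - 1, 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(cost[cand])]   # closest vertex outside the tree
    E[k, ] <- c(from[v], v)
    in_tree[v] <- TRUE
    upd <- !in_tree & D[v, ] < cost
    cost[upd] <- D[v, upd]
    from[upd] <- v
  }
  E
}

set.seed(11)
N <- 6; n_days <- 300      # hypothetical: 6 assets, 300 days of returns
width <- 100; step <- 50   # assumed window width and step
r <- matrix(rnorm(n_days * N), n_days, N)

starts <- seq(1, n_days - width + 1, by = step)   # start day of each window
trees <- lapply(starts, function(s) {
  C_t <- cor(r[s:(s + width - 1), ])              # correlation matrix of window
  D_t <- sqrt(2 * pmax(1 - C_t, 0))               # distance matrix of window
  mst_edges(D_t)                                  # asset tree: N - 1 edges
})
M <- length(trees)
```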
Asset tree and clusters
[Figure labels: Business sectors (Forbes); Utilities; Energy; Yahoo data]
Asset tree: topology change
[Figure: normal market topology vs. crash topology]
“Dynamic asset trees and Black Monday”, J.-P. Onnela, A. Chakraborti, K. Kaski and J. Kertesz, Physica A 324, 247 (2003)
Erdős & Rényi: Random graph model (1959)
The Erdős–Rényi model, is used for generating random graphs in which edges are set between
nodes with equal probabilities.
To generate an Erdős–Rényi model two parameters must be specified:
 the number of nodes in the graph generated as N and
 the probability that a link should be formed between any two nodes as p.
Sharma et al., in preparation

The expected number of edges is $E = p\,N(N-1)/2$.
The degree distribution is binomial.
The clustering coefficient is close to 0 (it equals p, which is small for sparse graphs).
The average path length is relatively short and scales as log(N).
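A minimal base R sketch of the model, generating the adjacency matrix directly rather than using a graph library. The values N = 500 and p = 0.02 are arbitrary choices for illustration.

```r
set.seed(2019)
N <- 500    # number of nodes (assumed value)
p <- 0.02   # probability of a link between any two nodes (assumed value)

# Draw each of the N(N-1)/2 possible links independently with probability p
A <- matrix(0L, N, N)
ut <- upper.tri(A)
A[ut] <- rbinom(sum(ut), 1, p)
A <- A + t(A)                            # undirected graph: symmetric adjacency matrix

n_edges <- sum(A) / 2
expected_edges <- p * N * (N - 1) / 2    # expected number of edges

# Degree distribution against the binomial prediction Binomial(N - 1, p)
deg <- rowSums(A)
k <- 0:max(deg)
hist(deg, breaks = seq(-0.5, max(deg) + 0.5, by = 1), freq = FALSE,
     xlab = "degree k", main = "Degree distribution of a random graph")
points(k, dbinom(k, N - 1, p), pch = 19)
```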
However, most real networks are not random!

Facebook network
Facebook friendship network
https://digiday.com/uk/facebooks-ad-network-extends-mobile-web/
Six degrees of separation
Six degrees of separation is the idea that any two people in the world are
six or fewer social connections apart, so that a chain
of "a friend of a friend" statements can be made to connect any two people
in a maximum of six steps.
The phrase "six degrees of separation" is often used as a synonym for the
idea of the "small world" phenomenon.
Computer networks: In 2001, Duncan Watts, a professor at Columbia University, attempted to recreate Milgram's
experiment on the Internet, using an e-mail message as the "package" that needed to be delivered, with 48,000
senders and 19 targets (in 157 countries). Watts found that the average (though not maximum) number of
intermediaries was around 6.
A 2007 study by Jure Leskovec and Eric Horvitz examined a data set of instant messages composed of 30 billion
conversations among 240 million people. They found the average path length among Microsoft Messenger users to
be 6.
Facebook: The average degrees of separation between different people is 5.73 degrees, whereas the maximum
degree of separation is 12.
Watts-Strogatz: Small-world model (1998)
 The Watts and Strogatz model is a random graph generation model that produces graphs with small-world
properties.
 Each node in the network is initially linked to its closest neighbors.
 Each edge has a probability p that it will be rewired to the graph as a random edge.
 The expected number of rewired links in the model is
Barabási–Albert:
Preferential attachment model (1999)
A "rich-get-richer" effect.
In this model, an edge is most likely to attach to nodes with higher
degrees.
Growth: The network begins with an initial network of m0 nodes. m0 ≥
2 and the degree of each node in the initial network should be at least
1, otherwise it will always remain disconnected from the rest of the
network.
Preferential attachment: New nodes are added to the network one at
a time. Each new node is connected to m existing nodes with a
probability that is proportional to the number of links that the existing
nodes already have. Formally, the probability p_i that the new node is
connected to node i is:
$$ p_i = \frac{k_i}{\sum_j k_j} $$
where k_i is the degree of node i and the sum runs over all existing nodes j.

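The resulting degree distribution is scale-free;
in particular, it follows a power law:
$$ p(k) \sim k^{-\gamma}, \quad \text{where } \gamma = 3 $$
The BA model exhibits D ~ log(N)/log(log(N)); scale-free networks with 2 < γ < 3 have D ~ log(log(N)) (ultra-small world!)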
There are many other models to generate scale-free networks, with different powers
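The growth and preferential attachment rules can be sketched in base R by tracking only the node degrees. The function name `ba_degrees` and the parameter values are hypothetical. Drawing m distinct targets with `sample(..., prob = )` is only approximately proportional to degree, since sampling without replacement slightly distorts the probabilities.

```r
# Degrees of a Barabasi-Albert-type network with n nodes.
# m0: size of the initial complete graph; m: links added per new node (m <= m0).
ba_degrees <- function(n, m = 2, m0 = 3) {
  deg <- c(rep(m0 - 1, m0), rep(0, n - m0))   # initial network: complete graph
  for (v_new in (m0 + 1):n) {
    old <- seq_len(v_new - 1)
    # preferential attachment: targets chosen with probability ~ k_i / sum_j k_j
    targets <- sample(old, m, prob = deg[old])
    deg[targets] <- deg[targets] + 1
    deg[v_new] <- m
  }
  deg
}

set.seed(99)
n <- 2000
deg <- ba_degrees(n, m = 2, m0 = 3)

# Degree distribution on log-log axes; a power law appears as a straight line
tab <- table(deg)
plot(as.numeric(names(tab)), as.numeric(tab) / n, log = "xy", pch = 19,
     xlab = "degree k", ylab = "p(k)", main = "Preferential attachment")
```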
Distribution of vertex degrees
Vertex degree distributions and its scaling:
Normal market topology Crash topology
Portfolio optimization
The first stage starts with observation and experience and ends with beliefs about the
future performances of available securities. The second stage starts with the relevant
beliefs about future performances and ends with the choice of portfolio.
--Harry Markowitz
In portfolio optimization, the aim is to construct the feasible combinations of risk and return known as the
efficient frontier, introduced in 1952 by Harry Markowitz, for which he was awarded the Nobel Prize
in 1990.
The amount of collateral (risk-free asset) to hold can be inferred from the tolerance for loss: draw a
straight line from the risk-free asset tangent to the efficient frontier to find the best
portfolio of exposures for a hypothetical working capital position. This is the portfolio that maximizes the return
per unit of risk, the ratio introduced by William Sharpe in 1966.
https://bookdown.org/wfoote01/faur/portfolio-analytics.html
 The primary objective of portfolio

management is to maximize gains while
reducing diversifiable risk.
 As portfolios can consist of any number of
assets with differing proportions of each asset,
there is a wide range of risk-return ratios.
 The efficient frontier consists of the set of all
efficient portfolios that yield the highest return
for each level of risk.
 On the efficient frontier, there is a portfolio
with the minimum risk, as measured by the
variance of its returns — hence, it is called the
minimum variance portfolio — that also has a
minimum return, and a maximum return
portfolio with an associated maximum risk.
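As a hedged illustration of the minimum variance portfolio, the sketch below uses the closed-form solution w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of returns. It assumes hypothetical Gaussian returns for 4 assets and allows short selling (weights may be negative); a constraint such as no short selling would need a quadratic programming solver instead.

```r
set.seed(5)
# Hypothetical daily returns for 4 assets with different volatilities
vol <- c(0.01, 0.02, 0.015, 0.03)
r <- matrix(rnorm(1000 * 4), 1000, 4) %*% diag(vol)

Sigma <- cov(r)                 # covariance matrix of returns
ones <- rep(1, ncol(r))

# Minimum variance weights: solve Sigma w = 1, then normalize to sum to 1
w <- solve(Sigma, ones)
w <- w / sum(w)

port_var <- drop(t(w) %*% Sigma %*% w)           # variance of the portfolio

# For comparison: an equally weighted portfolio
w_eq <- ones / length(ones)
eq_var <- drop(t(w_eq) %*% Sigma %*% w_eq)
```

By construction, `port_var` cannot exceed the variance of any single asset or of the equally weighted portfolio, since those are all feasible fully invested portfolios.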
Allocation of portfolio weights
As the market evolves, the weights are updated; allocation of portfolio weights for the
minimum risk portfolio without short-selling
Asset tree and portfolio layer
[Figure labels: forbidden region; portfolio; region with minimum risk]
Studies & results
• Empirical data
• MDS maps with
• N=18 companies
• 1 year of daily closing prices
• T=30 days (overlapping/non-overlapping)
• δT=1 day
“Study of statistical correlations in intraday and daily financial return time series”
Gayatri Tilak, Tamas Szell, Remy Chicheportiche, Anirban Chakraborti
http://arxiv.org/pdf/1204.5103.pdf
Pairs trade
Refer Wikipedia
Studies & results
Data
We have used the sectoral price indices from the Thomson Reuters Eikon database, within the
time frames January 2008 to December 2009, and October 2014 to September 2016. We have
analyzed the data for a total of 65 sectors of 27 countries across the globe.
[Table: Abbreviations of the 65 sectors analyzed.]
Countries:
AUS - Australia
BEL - Belgium
CAN - Canada
CHE - Switzerland
DEU - Germany
DNK - Denmark
ESP - Spain
FIN - Finland
FRA - France
GBR - United Kingdom
GRC - Greece
HKG - Hong Kong
IDN - Indonesia
IND - India
JPN - Japan
LKA - Sri Lanka
MYS - Malaysia
NLD - the Netherlands
NOR - Norway
PHL - Philippines
PRT - Portugal
QAT - Qatar
SAU - Saudi Arabia
SWE - Sweden
THA - Thailand
USA - United States of America
ZAF - South Africa
https://customers.thomsonreuters.com/eikon/index.html
USA
Minimum spanning trees: 20 countries out of the 27 countries
Sectoral dynamics and core-periphery structure
[Figure: core and periphery; stable core-periphery structure with no change]
Sectoral dynamics and robustness
The bit-strings of sectoral centralities (EVC) and their corresponding inclusion in
the portfolio (PWT) for the different sectors of the USA
n (% of coefficient of variation) vs. D
Additional materials
• http://www.jnu.ac.in/faculty/anirban/index.html
