Anirban Chakraborti
E.g.:
Social systems formed (in part) out of people,
Brain formed out of neurons,
Financial markets formed out of agents or firms,
etc.
Our research focuses on new interdisciplinary fields that
apply methods of statistical physics
to problems in economics and finance ("Econophysics")
or to problems in sociology ("Sociophysics").
2019
Market is a ‘model’ complex system
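T <- 1e7                                  # number of samples
X <- rnorm(T)                             # Gaussian white noise
# histogram without plotting; breaks=100 is an assumed bin count
hh <- hist(X, breaks=100, plot=FALSE)
plot(hh$mids, hh$density,
type='l', ylab="PDF", xlab=expression(r[t]),
lwd=2, main="PDF of white noise",
cex.lab=2, cex.axis=2, cex=2, font=2)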
Auto-correlation: Random time series
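T <- 1e7                                  # number of samples
X <- rnorm(T)                             # Gaussian white noise
# sample autocorrelation; lag.max=50 is an assumed value
my_acf <- acf(X, lag.max=50, plot=FALSE)
plot(my_acf,
main="Autocorrelation of White noise",
xlab="lag", ylab="ACF",
cex.lab=2, cex.axis=2, cex=2, font=2)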
Correlation: Random time series
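library(corrplot)
# assumed input: 20 independent white-noise series of length 1000 (one per column)
A <- matrix(rnorm(1000*20), ncol=20)
# find correlation
C <- cor(A)
corrplot(C, method='color')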
Eigenvalues: Random time series
EV <- eigen(C)
h <- hist(EV$values, breaks = 30 )
Eigenvalues
H.K. Pharasi, K. Sharma, A. Chakraborti and T.H. Seligman, “Complex market dynamics in the light of random matrix theory”
Data Science in Finance
Supervised learning vs. unsupervised learning
1. Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-130.
2. Kruskal, J. B. and Wish, M. (1978). Multidimensional Scaling. Sage.
3. Borg, I. et al. (2012). Applied MDS. Springer Science & Business Media.
Coordinates are not unique
• Display the structure of distance-like data as a geometrical picture, such that "similar" objects
are close together and "dissimilar" objects are far from each other.
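The two points above (distances become a picture, and the coordinates are not unique) can be illustrated with a minimal sketch. It uses base R's cmdscale, which performs classical (metric) scaling rather than the nonmetric method of Kruskal; the four toy points and the rotation angle are hypothetical values chosen only for illustration.

```r
# Hypothetical toy data: four points at the corners of a unit square
pts <- matrix(c(0, 0,
                1, 0,
                1, 1,
                0, 1), ncol = 2, byrow = TRUE)
D <- dist(pts)                       # pairwise Euclidean distances

# Classical (metric) MDS: find 2-D coordinates that reproduce D
coords <- cmdscale(D, k = 2)
max_err <- max(abs(as.vector(dist(coords)) - as.vector(D)))

# Coordinates are not unique: a rotated configuration fits equally well
theta <- pi / 6                      # arbitrary rotation angle
R <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), 2, 2)
rotated <- coords %*% R
rot_err <- max(abs(as.vector(dist(rotated)) - as.vector(D)))

print(c(max_err = max_err, rot_err = rot_err))
```

Both errors should be at the level of floating-point noise, since the original and the rotated coordinates reproduce the same distance matrix.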
kmeans(x, centers = 3)
where x is a numeric data matrix and
centers is the pre-defined number of clusters.
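A minimal usage sketch of the call above, assuming synthetic data: three hypothetical, well-separated Gaussian blobs in two dimensions. The seed, blob positions and nstart = 10 are illustrative choices, not values from the slides.

```r
set.seed(1)
# Hypothetical data: three blobs of 50 points each, centred near 0, 5 and 10
x <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.3), ncol = 2)
)

# nstart = 10 repeats the random initialisation and keeps the best fit
fit <- kmeans(x, centers = 3, nstart = 10)

print(fit$centers)        # one row of coordinates per cluster centre
print(table(fit$cluster)) # number of points assigned to each cluster
```

Because k-means starts from random centres, repeated starts are a cheap guard against a poor local optimum.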
Spanning Tree
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree.
A graph may have many spanning trees.
The cost of the spanning tree is the sum of the weights of all the edges in the tree.
For example, the complete graph on 4 vertices has 16 spanning trees (in general, a complete graph on N vertices has N^(N-2) spanning trees).
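The count of 16 can be checked numerically. The sketch below uses Kirchhoff's matrix-tree theorem (a technique not mentioned in the slides): the number of spanning trees equals the determinant of the graph Laplacian with one row and the matching column removed. The function name count_spanning_trees is hypothetical.

```r
# Number of spanning trees of an undirected graph given as a 0/1 adjacency matrix
count_spanning_trees <- function(adj) {
  laplacian <- diag(rowSums(adj)) - adj          # degree matrix minus adjacency
  round(det(laplacian[-1, -1, drop = FALSE]))    # any cofactor gives the count
}

n <- 4
K4 <- matrix(1, n, n) - diag(n)                  # complete graph on 4 vertices
print(count_spanning_trees(K4))                  # 16

triangle <- matrix(1, 3, 3) - diag(3)
print(count_spanning_trees(triangle))            # 3
```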
Minimum Spanning Tree (MST)
The minimum spanning tree (MST) is the spanning tree where the cost is
minimum among all the spanning trees.
[Figure: weighted example graph and its minimum spanning tree]
A minimum spanning tree, like every spanning tree, has (N-1) edges, where N is the number of
vertices in the given graph. A graph can also have more than one minimum spanning
tree.
Prim’s algorithm
Prim’s algorithm was developed in 1930 by the Czech mathematician Vojtěch Jarník and later
rediscovered and republished by the computer scientists Robert C. Prim in 1957 and Edsger W.
Dijkstra in 1959. It is therefore also sometimes called the Prim-Dijkstra algorithm.
It grows the tree one vertex at a time, rather than working through a sorted list of edges.
The algorithm works as follows:
• Maintain two disjoint sets of vertices: those already in the growing spanning
tree and those not yet in it.
• Select the cheapest edge that connects a vertex outside the tree to the growing spanning tree,
and add that vertex (with the edge) to the tree. This can be done with a priority
queue that holds the vertices adjacent to the growing spanning tree.
• Avoid cycles: mark the vertices that have already been selected and insert only
unmarked vertices into the priority queue.
1. Prim, Robert C. "Shortest connection networks and some generalizations." The Bell System Technical Journal 36.6 (1957).
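The steps of Prim's algorithm can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the graph is given as a symmetric weight matrix W with Inf for missing edges, and a simple linear scan replaces the priority queue (so the cost is O(V^2)). The names prim_mst and add_edge, and the example graph, are hypothetical.

```r
# Prim's algorithm on a symmetric weight matrix (Inf = no edge), starting at vertex 1
prim_mst <- function(W) {
  n <- nrow(W)
  in_tree <- rep(FALSE, n)
  in_tree[1] <- TRUE
  best_cost <- W[1, ]          # cheapest known edge from the tree to each vertex
  best_from <- rep(1L, n)      # tree endpoint of that cheapest edge
  edges <- matrix(NA_real_, nrow = n - 1, ncol = 3,
                  dimnames = list(NULL, c("from", "to", "weight")))
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)                      # unmarked vertices only
    v <- cand[which.min(best_cost[cand])]        # cheapest vertex to attach
    if (!is.finite(best_cost[v])) stop("graph is not connected")
    edges[k, ] <- c(best_from[v], v, best_cost[v])
    in_tree[v] <- TRUE                           # mark it, which rules out cycles
    better <- !in_tree & W[v, ] < best_cost      # edges from v that improve on known ones
    best_cost[better] <- W[v, better]
    best_from[better] <- v
  }
  edges
}

# Hypothetical example graph with 4 vertices
add_edge <- function(W, i, j, w) { W[i, j] <- w; W[j, i] <- w; W }
W <- matrix(Inf, 4, 4)
W <- add_edge(W, 1, 2, 1)
W <- add_edge(W, 2, 3, 2)
W <- add_edge(W, 1, 3, 3)
W <- add_edge(W, 3, 4, 1)
W <- add_edge(W, 2, 4, 5)

mst <- prim_mst(W)
print(mst)
print(sum(mst[, "weight"]))    # total cost of the tree: 4
```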
Kruskal's algorithm
1. Kruskal, Joseph B. "On the shortest spanning subtree of a graph and the traveling salesman problem." Proceedings of the American Mathematical Society 7.1 (1956): 48-50.
Comparison
• Kruskal's worst-case time complexity is O(E
log E), because the edges must be sorted; since
E ≤ V², this is the same as O(E log V). If the
edges are already sorted, or can be sorted in
linear time, the remaining union-find work is
nearly linear in E. We should use Kruskal when the
graph is sparse, i.e. has a small number of edges,
or when the edges are already sorted or can be
sorted in linear time.
• Prim's worst-case time complexity is O(E log
V) with a binary-heap priority queue or, even
better, O(E + V log V) with a Fibonacci heap. We
should use Prim when the graph is dense, i.e. the
number of edges is high.
https://stackoverflow.com/questions/1195872/kruskal-vs-prim
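For comparison with Prim, here is a minimal sketch of Kruskal's algorithm: sort the edges by weight, then accept each edge that joins two different components. It assumes an edge list in a data frame with columns from, to and weight, and uses a plain union-find without path compression or union by rank, so it does not reach the complexity bounds quoted above. The name kruskal_mst and the example graph are hypothetical.

```r
# Kruskal's algorithm on an edge list; n is the number of vertices
kruskal_mst <- function(edges, n) {
  parent <- seq_len(n)                 # each vertex starts as its own component
  find <- function(v) {                # follow parent links up to the component root
    while (parent[v] != v) v <- parent[v]
    v
  }
  edges <- edges[order(edges$weight), ]   # the sorting step that dominates the cost
  keep <- logical(nrow(edges))
  for (k in seq_len(nrow(edges))) {
    ra <- find(edges$from[k])
    rb <- find(edges$to[k])
    if (ra != rb) {                    # endpoints in different components: no cycle
      parent[ra] <- rb                 # merge the two components
      keep[k] <- TRUE
    }
  }
  edges[keep, ]
}

# Hypothetical example graph with 4 vertices
edge_list <- data.frame(from   = c(1, 2, 1, 3, 2),
                        to     = c(2, 3, 3, 4, 4),
                        weight = c(1, 2, 3, 1, 5))
tree <- kruskal_mst(edge_list, n = 4)
print(tree)
print(sum(tree$weight))                # total cost of the tree: 4
```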
Data Science in Finance
“Study of statistical correlations in intraday and daily financial return time series”
Gayatri Tilak, Tamas Szell, Remy Chicheportiche, Anirban Chakraborti
http://arxiv.org/pdf/1204.5103.pdf
Ultrametric distances?
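$d_{ij}^{t} = \sqrt{2\,(1 - \rho_{ij}^{t})} \in D^{t}$, where $2 \ge d_{ij}^{t} \ge 0$.
• Calculate correlations: correlation matrices $C^{1}, C^{2}, \ldots, C^{M}$, each $N \times N$ with $N(N-1)/2$ distinct elements $\rho_{ij}^{t}$, where $-1 \le \rho_{ij}^{t} \le 1$.
• Transform to distances: distance matrices $D^{1}, D^{2}, \ldots, D^{M}$, each $N \times N$ with $N(N-1)/2$ distinct elements $d_{ij}^{t}$, where $2 \ge d_{ij}^{t} \ge 0$.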
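The two steps (calculate correlations, then transform to distances with d = sqrt(2(1 - rho))) can be sketched for a single time window as follows. The return series here are hypothetical Gaussian noise, and n_obs and n_assets are arbitrary illustrative sizes; real use would substitute asset returns.

```r
set.seed(42)
n_obs <- 250                          # hypothetical window length
n_assets <- 5                         # hypothetical number of assets (N)
returns <- matrix(rnorm(n_obs * n_assets), nrow = n_obs, ncol = n_assets)

C <- cor(returns)                     # N x N correlation matrix, entries in [-1, 1]

# Distance transform; pmax guards against tiny negative values from rounding
D <- sqrt(pmax(2 * (1 - C), 0))       # N x N distance matrix, entries in [0, 2]

n_pairs <- sum(upper.tri(D))          # distinct off-diagonal elements, N(N-1)/2
print(round(D, 3))
print(n_pairs)
```

A correlation of 1 maps to distance 0 and a correlation of -1 maps to distance 2, so strongly correlated assets end up close together.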
[Figure: asset tree built from Yahoo data; labelled sector: Energy]
Asset tree: topology change
[Figure panels: normal market topology; crash topology]
“Dynamic asset trees and Black Monday”, J.-P. Onnela, A. Chakraborti, K. Kaski and J. Kertesz, Physica A 324, 247 (2003)
Erdős & Rényi: Random graph model (1959)
The Erdős–Rényi model generates random graphs in which every possible edge between a pair of
nodes is present independently and with the same probability.
To generate an Erdős–Rényi graph, two parameters must be specified:
the number of nodes in the graph, N, and
the probability p that a link is formed between any two nodes.
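A minimal base-R sketch of this construction, assuming an undirected graph without self-loops stored as a 0/1 adjacency matrix. The function name erdos_renyi and the parameter values are hypothetical.

```r
# Random graph with N nodes; each of the N(N-1)/2 possible links is present with probability p
erdos_renyi <- function(N, p) {
  adj <- matrix(0L, N, N)
  upper <- upper.tri(adj)                              # one slot per unordered node pair
  adj[upper] <- rbinom(sum(upper), size = 1, prob = p) # independent coin flip per pair
  adj + t(adj)                                         # mirror to make it symmetric
}

set.seed(7)
g <- erdos_renyi(N = 100, p = 0.1)
n_edges <- sum(g) / 2
expected_edges <- 0.1 * 100 * 99 / 2                   # p * N(N-1)/2 = 495
print(c(observed = n_edges, expected = expected_edges))
```

The observed edge count fluctuates around p N(N-1)/2 from one random draw to the next.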
https://digiday.com/uk/facebooks-ad-network-extends-mobile-web/
Six degrees of separation
Six degrees of separation is the idea that all living things and everything
else in the world are six or fewer steps away from each other so that a chain
of "a friend of a friend" statements can be made to connect any two people
in a maximum of six steps.
The phrase "six degrees of separation" is often used as a synonym for the
idea of the "small world" phenomenon.
Computer networks: In 2001, Duncan Watts, a professor at Columbia University, attempted to recreate Milgram's
experiment on the Internet, using an e-mail message as the "package" that needed to be delivered, with 48,000
senders and 19 targets (in 157 countries). Watts found that the average (though not maximum) number of
intermediaries was around 6.
A 2007 study by Jure Leskovec and Eric Horvitz examined a data set of instant messages composed of 30 billion
conversations among 240 million people. They found the average path length among Microsoft Messenger users to
be 6.
Facebook: The average degrees of separation between different people is 5.73 degrees, whereas the maximum
degree of separation is 12.
Watts-Strogatz: Small-world model (1998)
The Watts and Strogatz model is a random graph generation model that produces graphs with small-world
properties.
Each of the N nodes in the network is initially linked to its K closest neighbors, giving NK/2 edges.
Each edge has a probability p that it will be rewired to the graph as a random edge.
The expected number of rewired links in the model is therefore pNK/2.
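The rewiring procedure can be sketched in base R as below. The symbols N (number of nodes) and K (even number of nearest neighbors per node in the starting ring) and the function name watts_strogatz are assumptions made for this sketch; each ring edge is rewired with probability p to a node that is not yet a neighbor, which keeps the total number of edges at NK/2.

```r
watts_strogatz <- function(N, K, p) {
  stopifnot(K %% 2 == 0, K < N)
  adj <- matrix(FALSE, N, N)
  # Ring lattice: link every node to its K/2 nearest neighbours on each side
  for (i in seq_len(N)) for (s in seq_len(K / 2)) {
    j <- (i + s - 1) %% N + 1
    adj[i, j] <- adj[j, i] <- TRUE
  }
  rewired <- 0
  # Visit every lattice edge once and rewire it with probability p
  for (s in seq_len(K / 2)) for (i in seq_len(N)) {
    j <- (i + s - 1) %% N + 1
    if (runif(1) < p) {
      candidates <- which(!adj[i, ])               # nodes not yet linked to i
      candidates <- candidates[candidates != i]    # no self-loops
      if (length(candidates) > 0) {
        new_j <- candidates[sample.int(length(candidates), 1)]
        adj[i, j] <- adj[j, i] <- FALSE            # remove the lattice edge
        adj[i, new_j] <- adj[new_j, i] <- TRUE     # add the random edge
        rewired <- rewired + 1
      }
    }
  }
  list(adj = adj, rewired = rewired)
}

set.seed(3)
g <- watts_strogatz(N = 200, K = 4, p = 0.1)
print(c(rewired = g$rewired, expected = 0.1 * 200 * 4 / 2))
```

With p = 0 the ring lattice is returned unchanged; as p approaches 1 the graph approaches a random graph with the same number of edges.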
Barabási–Albert:
Preferential attachment model (1999)
A "rich-get-richer" effect.
In this model, an edge is most likely to attach to nodes with higher
degrees.
Growth: The network begins with an initial network of m0 nodes, where m0 ≥
2. The degree of each node in the initial network should be at least
1; otherwise that node will always remain disconnected from the rest of the
network.
Preferential attachment: New nodes are added to the network one at
a time. Each new node is connected to m existing nodes with a
probability that is proportional to the number of links that the existing
nodes already have. Formally, the probability $p_i$ that the new node is
connected to node i is $p_i = k_i / \sum_j k_j$, where $k_i$ is the degree of node i
and the sum runs over all existing nodes j.
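Growth and preferential attachment can be sketched in base R as follows. The initial network is assumed here to be a chain of m0 nodes (so every initial node has degree at least 1), and the m targets of each new node are drawn without replacement with weights proportional to degree, which only approximates the attachment probability k_i / sum_j k_j when m > 1. The function name barabasi_albert and its argument names are hypothetical.

```r
barabasi_albert <- function(n_total, m0 = 2, m = 1) {
  stopifnot(m0 >= 2, m >= 1, m <= m0, n_total >= m0)
  adj <- matrix(0L, n_total, n_total)
  # Assumed initial network: a chain of m0 nodes
  for (i in seq_len(m0 - 1)) adj[i, i + 1] <- adj[i + 1, i] <- 1L
  if (n_total > m0) for (new in (m0 + 1):n_total) {
    existing <- seq_len(new - 1)
    k <- rowSums(adj[existing, existing, drop = FALSE])   # current degrees k_i
    # Pick m distinct targets, favouring high-degree nodes ("rich get richer")
    targets <- existing[sample.int(length(existing), size = m, prob = k / sum(k))]
    adj[new, targets] <- adj[targets, new] <- 1L
  }
  adj
}

set.seed(11)
g <- barabasi_albert(n_total = 50, m0 = 3, m = 2)
degree <- rowSums(g)
print(sort(degree, decreasing = TRUE)[1:5])   # the few best-connected nodes
```

Early nodes tend to collect the most links, which is the hallmark of the model.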
The first stage starts with observation and experience and ends with beliefs about the
future performances of available securities. The second stage starts with the relevant
beliefs about future performances and ends with the choice of portfolio.
--Harry Markowitz
In portfolio optimization, our aim is to find the feasible combinations of risk and return that form the
efficient frontier, introduced in 1952 by Harry Markowitz, for which he was awarded the Nobel Prize
in 1990.
Given a tolerance for loss, one can work out how much collateral (the risk-free asset) to hold:
take a ruler and draw a line from the risk-free return to the efficient frontier. The point where the
line touches the frontier is the best portfolio of exposures for a hypothetical working capital
position: the one that maximizes the return per unit of risk, the ratio that William Sharpe
introduced in 1966.
https://bookdown.org/wfoote01/faur/portfolio-analytics.html
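As a worked illustration of these ideas, the sketch below computes two points for a fully invested portfolio with short sales allowed: the global minimum-variance portfolio and the tangency portfolio that maximizes the Sharpe ratio. The expected returns mu, the covariance matrix Sigma and the risk-free rate rf are hypothetical numbers, and the closed-form weights are the standard textbook formulas rather than anything taken from the slides.

```r
# Hypothetical inputs for three assets
mu <- c(0.08, 0.12, 0.15)                        # expected returns
Sigma <- matrix(c(0.040, 0.006, 0.010,
                  0.006, 0.090, 0.020,
                  0.010, 0.020, 0.160), 3, 3)    # covariance of returns
rf <- 0.02                                       # risk-free rate
ones <- rep(1, length(mu))

# Global minimum-variance portfolio: weights proportional to Sigma^{-1} 1
w_min <- solve(Sigma, ones)
w_min <- w_min / sum(w_min)

# Tangency portfolio: weights proportional to Sigma^{-1} (mu - rf)
w_tan <- solve(Sigma, mu - rf)
w_tan <- w_tan / sum(w_tan)

port_sd <- function(w) sqrt(drop(t(w) %*% Sigma %*% w))   # portfolio risk
sharpe  <- function(w) (sum(w * mu) - rf) / port_sd(w)    # excess return per unit of risk

print(rbind(min_variance = w_min, tangency = w_tan))
print(c(sharpe_min = sharpe(w_min), sharpe_tan = sharpe(w_tan)))
```

The tangency portfolio is the point where the line from the risk-free return touches the frontier; no other fully invested mix of these assets has a higher Sharpe ratio.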
Portfolio optimization
[Figure labels: forbidden region; portfolio region; minimum risk]
Studies & results
• Empirical data
“Study of statistical correlations in intraday and daily financial return time series”
Gayatri Tilak, Tamas Szell, Remy Chicheportiche, Anirban Chakraborti
http://arxiv.org/pdf/1204.5103.pdf
Pairs trade
See the Wikipedia article on pairs trade.
Data
We have used the sectoral price indices from the Thomson Reuters Eikon database, within the
time frames January 2008 to December 2009 and October 2014 to September 2016. We have
analyzed the data for a total of 65 sectors of 27 countries across the globe.
https://customers.thomsonreuters.com/eikon/index.html
[Table: Abbreviations of the 65 sectors analyzed.]
Country codes:
AUS - Australia
BEL - Belgium
CAN - Canada
CHE - Switzerland
DEU - Germany
DNK - Denmark
ESP - Spain
FIN - Finland
FRA - France
GBR - United Kingdom
GRC - Greece
HKG - Hong Kong
IDN - Indonesia
IND - India
JPN - Japan
LKA - Sri Lanka
MYS - Malaysia
NLD - the Netherlands
NOR - Norway
PHL - Philippines
PRT - Portugal
QAT - Qatar
SAU - Saudi Arabia
SWE - Sweden
THA - Thailand
USA - United States of America
ZAF - South Africa
[Figure: Minimum spanning trees for 20 of the 27 countries; panel label: USA]
Sectoral dynamics and core-periphery structure
[Figure: core and periphery sectors; annotation: stable core-periphery structure with no change]
Sectoral dynamics and robustness
The bit-strings of sectoral centralities (EVC) and their corresponding inclusion in
the portfolio (PWT) for the different sectors of the USA
n (% of coefficient of variation) vs. D
Additional materials
• http://www.jnu.ac.in/faculty/anirban/index.html