Beruflich Dokumente
Kultur Dokumente
a r t i c l e i n f o a b s t r a c t
Article history: Collusive transactions refer to the activity whereby traders use carefully-designed trade to illegally ma-
Received 5 June 2016 nipulate the market. They do this by increasing specic trading volumes, thus creating a false impression
Revised 20 September 2016
that a market is more active than it actually is. The traders involved in the collusive transactions are
Accepted 23 October 2016
termed as collusive clique. The collusive clique and its activities can cause substantial damage to the
Available online 24 October 2016
markets integrity and attract much attention of the regulators around the world in recent years. Much
Keywords: of the current research focused on the detection based on a number of assumptions of how a normal
Collusive clique market behaves. There is, clearly, a lack of effective decision-support tools with which to identify poten-
Clustering tial collusive clique in a real-life setting. The study in this paper examined the structures of the traders
Knapsack problem in all transactions, and proposed two approaches to detect potential collusive clique with their activi-
Dynamic programming ties. The rst approach targeted on the overall collusive trend of the traders. This is particularly useful
when regulators seek a general overview of how traders gather together for their transactions. The sec-
ond approach accurately detected the parcel-passing style collusive transactions on the market through
analysing the relations of the traders and transacted volumes. The proposed two approaches, on one
hand, provided a complete cover for collusive transaction identications, which can full the different
types of requirements of the regulation, i.e. MiFID II, on the other hand, showed a novel application of
well-known computational algorithms on solving real and complex nancial problem. The proposed two
approaches are evaluated using real nancial data drawn from the NYSE and CME group. Experimental
results suggested that those approaches successfully identied all primary collusive clique scenarios in
all selected datasets and thus showed the effectiveness and stableness of the novel application.
2016 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2016.10.051
0957-4174/ 2016 Elsevier Ltd. All rights reserved.
226 J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238
this format usually involves a big clique of traders trading together together for transactions. This was especially useful when expe-
as a collusion to achieve certain effect on the price or volume of riencing high-frequency trading (HFT), where the characteristics
a target equity. The buying and selling activity that results from of whole transactions were important instead of every single one
this does not affect the ownership of that stock in the end, but among the huge number of orders and trades generated by HFT.
gives the fraudulent impression of intensive and active interest of In this approach, the distance between any two traders was de-
the stock to market participants (Cumming, Zhan, & Aitken, 2012). ned as the transacted volume: the larger the volume, the closer
In this format, the trading action of each single trader usually the distance. Through this, the tailor-made k-means clustering ap-
seems legitimate, however, the customised trade sequences by the proach can be easily applied with reduced time complexity since
collusive manipulators complies with their manipulative expecta- the calculation of distances among points in standard k-means
tions. If any single leg or part of such trades were to be moni- has been moved to the proposed transformation of the transaction
tored, it would not be identied as collusive trading. An obvious data. This showed an effective application of well-known data min-
characteristic of the collusive trading is the group of traders, who ing algorithm on an unsolved and challenging problem.
trade among themselves for misleading the market, and also trade The second proposed approach, on the other hand, was a
with others for legitimate transactions as well during the period ne identication, which was to accurately identify the passing
of the manipulation. Such mixed trading actions are another char- the parcel transactions that were intendedly created by collusive
acteristic of collusive trading. Because of their complex and ran- traders. This was particularly useful when the regulatory moni-
dom strategies, it is hard to give an accurate denition of collu- toring intends to nd out in detail which traders are involved in
sive trading. A widely accepted denition is: the collusive clique is collusive transactions and how they organize such activities. This
the group who heavily trade among themselves than others. The ne identication was formulated as a simplied Knapsack prob-
trading behaviour among the clique is collusive trading (Palshikar lem, a well-known combinatorial optimisation problem. In this ap-
& Apte, 2008; Palshikar & Bahulkar, 20 0 0). According to this def- proach, transaction records were considered as the knapsack and
inition, much of the existing related literature examined the col- the symbolic sum of traders in the transactions were considered as
lusive cliques on the basis of similarities in their activity (Wang, the weight of the knapsack while the number of the transactions
Zhou, & Guan, 2012). Their assumption was the traders in collusive shall be maximized. To solve this, a tailor-made unied dynamic
cliques acting identically. However, the assumption does not hold programming based algorithm (DP) was proposed. DP is usually
in practice when traders usually trade among themselves as well used in complex optimization problems. In our approach, DP was
as with others parties. Palshikar et al. investigated this topic thor- proposed as a recursive problem, where each possible solution on
oughly in Palshikar and Apte (2008) and Palshikar and Bahulkar current transaction can be optimized by selecting either the solu-
(20 0 0) through a k Nearest Neighbour based algorithm. They as- tion based on last transaction or the one only based on the cur-
sumed each of the traders in a clique trading with all other col- rent transaction. Through this, the ne identication problem was
lusive traders and their crossed transactions can be identied as split into several smaller problems. This idea is popular in com-
a feature of collusive clique. This assumption is not true in some plex optimization problems where the original problem is usually
cases in practice when the each collusive traders only trade with a split into a collection of simpler sub-problems. In addition, to make
small part of a big collusion. Such case was dened in CESR (2011) the problem more general, margins were included in the optimiza-
as passing the parcel, a manipulation format, where a big shares tion process to compensate the uncertainty occurred in real trading
of certain equity are transacted through multiple traders along a scenarios. Therefore, this approach also showed that the traditional
certain pathway as a parcel without changes of the actual own- optimization algorithm, which was usually applied in scheduling
ers. Therefore, the existing literatures were all based on assump- and traveling, can also be revised and applied to effectively solve
tions of how collusive traders behave. There is, clearly, a lack of ef- complex nancial problems.
fective literature that provides a thorough analysis of the inherent Therefore, two computational approaches, one for generally col-
features of collusive cliques and a practical identication require- lective trend, one for accurate activities, gave a complete cover for
ments from regulators. Hence this paper contributed to the liter- collusive transaction identications, which can full the different
ature by seeking to ll this gap and to suggest a novel approach types of requirements of the regulators. The recognition of the
to detecting a wide spectrum of collusive cliques and to providing collusion with heavily transactions or suspiciously parcel pass-
a decision-support tool for the regulatory departments. Specially, ing transactions led to a detection of potential collusive clique.
it built upon our previous work (Cao et al., 2015) and provided a The validity of those two proposed approaches have been assessed
thorough discussion and analysis of all appropriate scenarios, and through extensive experiments, carried out using real data from
identied and quantied the key features thus aroused. In light of the different nancial products, which showed effectiveness in col-
this, a clear formulation of the problem is given, along with ex- lusive clique and has yielded a considerable body of evidence in
planation of the relevance of conceptual models. As far as we are support of the proposed decision-support expert systems for the
aware, this paper presented the rst study that solved the collu- regulators.
sive clique detection problem in a practical way through two pro- The remainder of this paper takes the following form. Section
posed approaches based on popular data mining algorithms, which 2 gives a brief overview of wash trade manipulation and detection.
also shows an effective application: a challenging problem in real Section 3 contains analysis, formulation and description of wash
nancial area, can be extracted and formulated to a standard for- trade features, and of the proposed approach to detection. Section
mat and then solved by applying tailor-made popular data mining 4 gives an evaluation of the performance of the proposed approach
algorithms. and Section 5 gives a conclusion.
The rst proposed approach was to generally identify the collu-
sive trend of all traders in collected trading records. We termed 2. Collusive clique and its detection
it as: coarse identication of the collusive clique. This approach
was to group all traders into a number of clusters, where the 2.1. Trading in collusive clique
traders tend to trade heavier with traders within the cluster than
the ones outside. We proposed a transformation of the transac- Traders in capital markets use limit or market orders to buy
tion data together with k-means clustering algorithm to group the or sell volumes of an equity, at a specic or better price (a bet-
traders according to their trading behaviours. The result of this ap- ter price, in this context, being a higher selling or lower buying
proach aimed to show a general tendency of how traders gather price) (SEC, 2011). When orders are linked under order-matching
J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238 227
Fig. 4. Transformed value of the aggregated volumes in Table 2. A-E represent the
traders.
Fig. 3. The directed graph illustration of net transacted volumes in Table 1. where Vi,aj is dened as the aggregated transactions between
a
trader i and j, Vi, j is the total transacted volumes between i and
j
actions #1, 3 and 5 is A selling B 500 shares through 3 times trans- Va
all other traders, and i, j a shows the ratio of volumes between i
actions. Thus, if we calculate the aggregated transaction of #16, j Vi, j
we can have: and j to total transacted volumes of trader i. In this function, we
choose W = 5 as a scaling weight to transform the volume ratio
6
T rani = {A, B, , , 500} + {B, C, , , 550} (ranged from 0 to 1) to its exponential value (ranged from 1 to 0)
i=1 so that the higher the ratio is the smaller the transformed value
is. By applying this function to the transactions in Table 2, we
+ {C, A, , , 550} = 0 , , , {500, 550, 550}
have the results in Table 3 and Fig. 4. Considering each trader as a
The trader ID is summed up as 0 by symbolic calculation and node as the illustration in Fig. 2 and the values in Table 3 as the
the three transaction volumes are recorded as a list {500, 550, distance among the nodes, the heaviness among the trader has
550}. The zero-valued symbolic sum of trader ID, which suggests then transformed to the distance between the traders. That is, if
that each trader of a group is acting at both the buying and selling traders are very close in distance, they are very likely to be in a
ends of the market, together with the similar value of three trans- cluster. This is basically identical to the traditional clustering prob-
acted shares show that after a sequence of similar sized transac- lem in data mining area. Therefore, we can summarize the coarse
tions, almost no genuine equity shares change hands. Thus the cal- identication of the collusive clique as: given N traders with their
culated results strongly suggest that traders A, B, and C trade sus- large numbers of trading records, we aim to partition the N traders
piciously as passing the parcel. We can also illustrate the aggre- into k clusters, so as to minimize the within-cluster sum of the
gated the vector forms of transactions in Table 1 in directed graph distance of each trader in the cluster to the k centre traders. By
as Fig. 3. The parcel-passing transactions are highlighted. this, the coarse identication has been formulated as a traditional
k-means clustering problem.
On May 31, 2013, large numbers of collusive trading activities
3.2. Coarse identication of the collusive clique
from HFT traders were reported by NANEX (NANEX, Chicago PMI,
2013). Around 550,0 0 0 shares of SPY shares trade were gener-
As discussed in Section 2, nancial market can be overloaded
ated from a big number of trading accounts within less than 1 s.
by the high-frequency trading (HFT) activities. The HFT traders
The regulator noticed this and investigated the HFT traders collu-
strategically use millions numbers of orders and transactions to
sive behaviours. We applied our proposed coarse identication ap-
overload, delay and disrupt the markets and other participants.
proach on this example and identied the same collusive clusters
In such cases, analysing every single trade is neither ecient nor
of HFT traders as the regulators and NANEX reported. The clusters
contributing to a systematic understanding of the collusive groups
of traders were illustrated as Fig. 5. The dots on the gure rep-
of HFT traders. Instead, aggregating the corresponding transactions
resent the traders involved in those transactions. Since we do not
and identifying their trend of colluding parties and trading actions
use any x-y axis to represent a trader, the representation on Fig
are important to the regulator to prohibit such disruptive activities
5 is merely to show the relative distances among the traders rather
(CME, 2014). We dene the term for such identication as coarse
than the exact locations of them. We tried k = 2 and 3 in coarse
identication of the collusive clique (short as: Coarse Identica-
identication approach and both congurations identied the col-
tion).
lusive cluster of traders, which are shown as the blue dots in Fig.
In coarse identication, we following the denition of heav-
5(a) and (b).
iness of the trading between two traders in Palshikar and Apte
(2008) and Wang et al. (2012) as the total aggregated absolute
3.3. Fine identication of the collusive transactions
transacted volumes between those traders. This has been dened
as the aggregation of transactions in the scalar form in Section 3.1.
3.3.1. Formulation of the problem
As an example, we aggregate the transactions in Table 1 and show
The second proposed approach we proposed, on the other hand,
the results in Table 2. From the results and the corresponding Fig.
is a ne identication, which is to accurately identify the collu-
2 in Section 3.1, we can observe that the heavier the transactions
sive transactions such as the format of passing the parcel that
between two traders, the more likely they are collusive. To further
are intended created by a group of traders. The FCA and CESR have
formulate the question, we dene a simple transformation function
both stated, in consultation reports (CESR, 2011; FSA, 2006), that
as,
accurately identifying collusive transactions is a challenge, since
W
Va
i, j
a
the form that such trading takes may vary, and collusive actions
f Vi,aj =e j Vi, j
can easily be hidden amidst the huge volumes of normal trad-
230 J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238
Fig. 5. clusters identied by coarse identication approach in the HFT examples on May 31, 2013 reported by NANEX, (a) the identied clusters if targeting on 2 collusive
groups; (b) the identied clusters if targeting on 3 collusive groups. The left-bottom blue dots represent the collusive HFT traders in the examples.
ing activities. This is further illustrated in report by Nanex, dated imate transactions with the collusive ones. Thus, not every trader
31 May 2013 (NANEX, Chicago PMI, 2013), which shows complex maintain zero position changes in those transactions. If represent-
networks and illustrates these using vertices to represent traders ing those transaction in the vector format and aggregating those
and directional connections between them showing the transac- traders, we might not achieve a 0 result, which means, additional
tions they share. In this paper, we use a directed graph (as Fig. 3) to a selling and buying pairs, some trader ID may also be involved
with net transacted volumes to represent the potentially collusive in intended legitimate transactions. Therefore, we consider a mar-
transactions. In the directed graph, nodes are used to represent gins of trader (id) when identifying the traders collusions. That
traders, while the short arrows attached to each node represent is, if the number of trader IDs in the aggregated result of all signed
the net transacted volumes from one trader to another. The arrow trader ID is within the margin id, we consider this as suspicious
direction shows the net transaction ow. As the example shown and carry out the further investigations.
in Fig. 3, the highlighted arrows connect a small group of traders, Feature 2: Mostly identical net transacted shares. Similar to the
A, B and C, and the direction of the arrowhead shows the direc- feature 1, traders in the collusion tend to maintain their positions
tion of the transacted shares, for example A pointing to B means unchanged by both selling and buying identical shares through dif-
that trader A sells shares to trader B and thus transfers the eq- ferent transactions. However, they might also transact slightly dif-
uity shares. Therefore, in Fig. 3, traders A, B and C collude in a ferent to avoid to be recognized as manipulation. To compensate
sequence of parcel-passing transactions as a closed cycle. Transac- this, we also consider a margin of shares (v) when monitoring
tions among them ow in a single direction, either clockwise or the transacted volumes. When the shares in the transactions are
anti-clockwise in a trading form of passing the parcel (Aitken, all within the margin v, we consider it as a potentially collusive
Harris, & Ji, 2009). In this context, when a transaction loop is com- transaction.
pleted, there has been no transfer of benecial interest across the Thus, we propose a three-step approach for identifying the
group, and no one of the collusive traders is in a position any dif- parcel-passing type collusive transactions, namely:
ferent to that they were in when the process began. In the example
of the aggregation of the transaction of #16 discussed in Section Step 1: aggregate the transactions with same seller and buy-
3.1, the zero-valued symbolic sum of trader ID (0) and the similar- ers in the vector format dened in Section 3.1. By this,
valued transacted shares ({500, 550, 550}) indicates the involved we can obtain the net transactions shares with directions
traders passing similar-sized parcels through themselves with al- between any two traders. After this step, all transactions
most no genuine equity shares change hands. However, in practice, are represented as: T rani = {Seller_ID, Buyer_ID, Time,
smart manipulators intend to mix the collusive trades with other Price, Volume}
trades to make the sum of trader ID non zero and the transacted Step 2: identify a group of transactions T rani , where the
share list not similar-valued. As a result, such mixed transactions number of remained trader IDs after further aggregating all
are easily avoid alerting traditional regulatory inspections (Aitken signed trader ID in this group is within the margin id. Af-
et al., 2009). ter this step, the transaction volumes are recorded as a list
To compensate the smart manipulators strategy and effec- in the aggregated result of this group.
tively identify the potential collusive transaction, we propose a Step 3: identify the deviation degree of the transaction volume
straight forward approach with margins of trader (id) and trans- list. If the volumes are within the margin of shares (v) (not
acted shares (v). The traders in collusive transactions may not signicantly deviated), we consider this as an identication
100% pass the parcel one by one to construct as a loop. The pass- result: a group of traders transacted collusively in a parcel-
ing can have a complex format, where each trader may or may passing style.
not complete a buy together with a sell transaction. The transacted
shares may not be similar as well. As a result, the aggregation of 3.3.2. Fine identication approach
such collusive transactions may have non-zero sum value of trader The ne identication approach proposed in this paper is ap-
ID and non-similar transacted shares. Based on our analysis, we plied to transaction streams in the market. This is to uncover any
can reveal two features of the general strategy used in collusive potential trend of collusive trade in early stage, thus fullling the
transactions, which are: requirements of the latest regulations. An occurrence of a trans-
Feature 1: Mostly matched traders. To manipulate the market action updates the transaction stream. A transaction, as shown in
trading volume, the traders transact among themselves for creating Table 1, comprises a transaction ID, seller and buyer ID, time, price
the false appearance only. No one in the collusion like to sustain and volume. For the purposes of this study, we assume that trans-
any loss from this. As a result, most of the traders carry out both actions in the stream focus on specic, single, nancial product
selling and buying identical shares through different transactions and so any product information within the stream can be ignored
to maintain their position unchanged. However, to bypass the reg- once that specic nancial product is determined. While on one
ulatory monitoring, the collusion might intend to mix some legit- hand this assumption narrows the scope of the study, it has the
J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238 231
virtue of conforming to the practicalities of a trading environment, der N transactions, and id, and to denote iQ a T rani T as TQ a
and thus the algorithm can easily be applied to selected product in and the nal subset of transactions in the optimum solution to
a genuine trading platform context. As discussed in Section 3.3.1, the original problem as SN . We dene OPT_ID(N, TQ a , id ) to
the approach is composed of three steps. In order for these steps represent the sum of the trader IDs of the rst N transactions
OPT_ID(N, T a ,id )
to be taken, the transaction stream must be pre-organised. A phys- in subset S, under constraint | TS
Q
| id. There-
ical time sliding window, of size T , is specied and the transac- fore the trader sum of the rst N 1, N 2, . . . , 1 orders can
tions with same seller and buyers are aggregated in the vector for- be shown as OPT_ID(N 1, TQ a , id), OPT_ID(N 2, TQ a , id),,
mat dened in Section 3.1. After the aggregation, the transaction OPT_ID(1, TQ a , id). To determine OPT_ID(N, TQ a , id ), not only
are in a queue, Qa which also maintains a length of T . Thus, if a
, is the solution of OPT_ID(N 1, TS , id) required, but also
new transaction T ran i is coming and the length of the updated OPT_ID(N 1, TQ a TN , id), the best solution for the rst N 1
queue exceeds T , the earliest transaction should be popped in or- orders with the remaining trader sum TQ a TN , which constructs
der to maintain the length of the sliding window so that the cor- OPT_ID(N1,TQ a TN ,id)
responding transactions are re-aggregated again. Since the trans- the constraint as | (TQ a TN ) | id. The recursion can
action stream is measured in terms of event time, calculation of then be summarised in the following steps, if TranN is not one
the difference between the time stamps of the rst and last trans- of the transactions in the nal subset SN , then the transaction
actions in the queue is used to maintain T . Thus, the number of N can be ignored and OPT_ID(N 1, TQ a , id) determined, how-
transactions in each queue depends upon the underlying frequency ever, if TranN is among the transactions, an optimal solution for
of transactions, and so uctuates over time. the remaining transactions 1, . . . , N 1, must be sought, which is
After the step
1
aggregation, the step 2 is to nd out a group OPT_ID(N 1, TQ a TN , id). Using this set of sub-problems, the
of transactions T rani , where the sum of the signed trader IDs in OPT_ID(N, TQ a , id ) can be expressed simply, in terms of values
the group is within the margin id. Considering the sliding win- from the smaller problems, and so the recursion is summarised
dow T , the step 2operates
on the basis that, among all aggre- as two conditions:
gated transactions T ran i in T , nd transaction sub-group where
1. If Trani SN , then OPT_ID(N, TQ a , id )= OPT(N 1, TQ a , id);
the trader ID can be calculated and cancelled so that the re-
2. If Trani SN , then OPT_ID(N, TQ a , id )=TN + OPT(N 1, TQ a
mained ID number is within the margin id. We dene the step
TN ,
2 as: ID_MATCH(Q a ), where Qa is the set of aggregated trans-
id ).
actions in vector format. Based on our discussion, the function
ID_MATCH(Q a ) is thus dened as: given a batch of aggregated Algorithm 1 is generated by a re-organisation of this recursive
transactions
in Qa , identify subsets S of transactions from Qa so process, based on the two conditions above and can be applied by
iS T rani T
that | | id, where operator T represents the invoking OPT_ID(N, TQ a , id ) for N transactions and the capacity
iS T rani T
number of the traders in the transaction so that T rani T repre- TQ a .
iS The Algorithm 1 gives, as has been shown, the SN transactions
sents the trader number after summing up all signed traders and from Qa . The traders in SN are potentially involved in collusive
iS T rani T is the number of traders in the group Q . There-
a
transactions. To further investigate, we need to identify the devi-
fore, if group Qa contains 100 traders and we choose id = 10%, ation degree of the transaction volume list discussed in step 3 in
it means that once the trader number in sub-group S equal or Section 3.3.1. If the volumes in the transactions in SN are all sim-
smaller than 10, we need to further investigate the transactions in ilar, it indicates that the involved traders trade as passing a simi-
S. lar parcel along a certain loop within themselves. As discussed in
If the number of transactions contained within subset S is Section 3.3.1, the identication is also tolerant of intended mixed
given as ns (ns is equal or smaller in size than Qa ), fundamen- transactions with different volumes. Therefore we use a straight
tally, function ID_MATCH(Q a ) can be considered as a simplied forward approach for examining the transacted volume list. Once
version of the Knapsack Problem (Andonov, Poirriez, & Rajopadhye, we obtain SN , the transacted volume list can be represented as
20 0 0; Poirriez, Yanev, & Andonov, 2009; Zukerman, Jia, Neame, & {V1 , V2 , . . . VN }. We rst sort the list and ignore the top and bot-
Woeginger, 2001). The Knapsack Problem refers to the challenge of tom v1 extrema values respectively, where v1 represent the
lling a knapsack, which has capacity W, using a subset of m items percentage, i.e. 5%. We calculate the standard deviation Vstd of the
{1, . . . , m}, each of which has both mass and value, in such a way remained volumes and compare with the volume margin v2 . If
that the total weight of the combined selected items is the same Vstd < v2 , we consider each transaction in SN having similar vol-
or less than W, and the highest possible total value is achieved. umes and thus the traders in SN are involved in parcel-passing
The trader ID matching problem is a simpler version and can be type collusive transactions. The identication approach is described
stated as: given a margin id (which is equivalent to the knap- in Algorithm 2.
sacks size) and a set of items Qa , each being of (non-negative) a
pair of signed trader ID, identify all possible subsets S of items, 4. Experimental evaluations
ultimately to make:
iS T rani T The evaluation of any detection model generally involves the
iQ a T rani T id. use of genuine data from both genuine and abusive trading cases.
However, because there is little tangible data arising from collusive
Given the similarity of those two problems, the approach of- trading, and in any case regulations forbid the disclosure of fraud-
ten used to solve the Knapsack Problem, i.e. dynamic program- ulent market data, there is an extremely limited amount of gen-
ming, is used in ID_MATCH(Q a ). Dynamic programming requires uine collusive trading data available, and certainly vastly less data
a number of sub-problems, so that the solution to each can be than is available from records of routine trading. Thus, in order to
found using smaller sub-problems, and thus the original prob- evaluate the proposed detection model in a manner acceptable to
lem can be solved easily once all sub-problems have been re- the nancial industry, in this study the characteristics and patterns
solved (Kleinberg & Tardos, 2005). Dynamic programming has found within collusive trading cycles are reproduced and intro-
been thoroughly studied in optimisation problems in Jiang and duced into original trading records, in order to generate a dataset
Jiang (2014) and Ni, He, Wen, and Xu (2013). Thus, the inten- that includes both normal and abusive trading cases (NANEX, Ex-
tion is to solve this simplied form of the Knapsack Problem un- ploratory Trading in the eMini, 2013). An advantage of this is that
232 J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238
6 return;
TQ a TN
7 if | TQ a
| id // if condition is satised, then transactionsIn SN is one solution;
8 Output SN ;
9 push TranN into SN // assume TranN SN ;
10 OPT_ID(N 1, TQ a TN , id); // recursively nd solution by condition 2;
11 Discard TranN from SN // assume TranN SN ;
12 OPT_ID(N 1, TQ a , id); // recursively nd solution by condition 1;
13 end of OPT_ID
14 return SN ;
such randomly synthesised manipulation cases can mimic any ver- tially placed orders being completely lled. In some cases, the ini-
sion of collusive trading; the collusive transactions can be gener- tially placed order (for example, a sell order with size 100) cannot
ated at any time, with any volume sizes and margins. Furthermore, be 100% lled (for example, 80 shares of the sell order can be lled
the use of synthetic nancial data is already accepted in academia, by a later matched buy order). In that case, the Transaction Quan-
for the evaluation of a proposed model, in the absence or dearth
of real data (Franke, Hoser, & Schrder, 2007; Ho & Zhou, 2008;
Palshikar & Apte, 2008). Table 1
Transaction sequences.
For this study, the experimental evaluation occurs in two parts:
part 1: experimental evaluation, using genuine trading datasets;
part 2: experimental evaluation using genuine trading datasets
with the addition of synthetically-generated wash trade scenarios,
as per the analysis given in Section 2.1.
This evaluation uses genuine market data (in the form of trans-
action information) concerning four nancial products, namely S&P
500 ETF (SPY), S&P 500 Index Future (Emini), 10-Year T-Note Fu-
tures (ZNM13), Eurodollar futures (GE13) from NYSE and CME
group. Those products have been chosen because they were all re-
ported to be targeted in manipulative collusive trading events in
2013. We obtain all collusive transaction data as well as the legit- Table 2
imate trading records in the same time period from our industry Aggregated volumes of scalar form transactions in Table 1.
tity Filled will be smaller than the Order Total Quantity. Limit
20,130,503
20,130,503
Price and Filled Price items are the prices of initially placed or-
Date time
Execution
09:35:56
09:36:12
ders and the transaction. Settlement Currency item indicates the
transaction currency. The Number of Fills indicates the number
Last
of orders that were lled with the initially placed order. For ex-
ample, if a sell order with size 10 0 0 shares was initially placed in
20,130,503
20,130,503
Date Time
09:34:37
09:35:01
Entered
the market and two later orders with size 500 shares were then
lled with the sell order, the Number of Fills is equal to two. The
last two items, Entered Date Time and Last Execution Date Time
Number of
show the date and time of initially placed order and the transac-
tion respectively. In our study, since the transaction size, prices and
Fills
1
recognition, the items, Counterparty Code, Transaction Quantity
Settlement
USD
cation, are all applied to all of four datasets to identify rstly the
trend of traders clusters and secondly any collusive transactions in
Filled Price
163
165
1270
0 0 0 05497636ORLO0
TN TP
to evaluate the test results, where T NR = T N+ F P and T P R = F N+T P .
The true positive, TP, is dened as legitimate cases detected as be-
ing legitimate and false negative, FN, is dened as normal cases
detected as being collusive transactions. In those experiments, we
Order ID
Client 7
Client 2
which, the reported 40 trader accounts are the most active ones
NASDAQ
NASDAQ
MAIN
SPDR S&P
500 ETF
500 ETF
Trust
Code
SPY
SPY
Fig. 6. Test results on original dataset, (a) S&P 500 Index Future (Emini); (b) 10-Year T-Note Futures (ZNM13); (c) Eurodollar futures (GE13).
Fig. 7. Experimental results across datasets of SPY, Emini, ZNM13, and GE13 with injected synthetic collusive transactions.
mixed trading records requires a exible approach that can be tol- the collusive orders among the clique are unintentionally picked
erant to revised format of the manipulative activity. up by other traders or the case that the traders inside the clique
A relatively large trader ID margin implies that the collu- intentionally trade with others outside since the collusive trader
sive clique with heavily trading among themselves can include a can also trade normally to cover their collusive activities. How-
marginal percentage number of traders who might not trade with ever, a very large margin value, for example 20% in the exam-
the clique as heavily as others but are unconsciously involved in ple in Fig. 7, achieved a relative lower TNR value. That indicates
some trading activities with the clique. This might be the case that that including a large percentage number of traders who dont
236 J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238
Fig. 8. (a) false alarm under different trader margins for four datasets, SPY, EMINI, ZNM13, and GE13. (b) false alarm reported in Table 6 in Palshikar and Apte (2008).
trade heavily when recognizing the collusive clique might heav- Table 6
Physical Test Time for four datasets.
ily bring irrelevant traders so that immediately reducing the per-
formance of recognising the transaction loop in Algorithms 1 and SPY EMINI ZNM13 GE13
2. Consequently, either a small or a large trader ID margin does
Test Time (seconds) 72.93 145.85 111.46 110.42
not bring the best recognition accuracy. Selecting the margin is a Number of Traders 1349 2396 3694 2986
trade-off between compensating the smart-trader tactic and ac-
curately recognising the collusive trader loop. To achieve a better
performance, selecting and tuning an appropriate value of trader
ID margin based on historical data is crucial for each single secu- clique. In Fig. 8, we followed Palshikar and Apte (2008)s mea-
rity. In another word, the ability of the proposed method to con- sure, reported our false alarms, and compared with their results.
gure the margin makes it particularly practical in a genuine trad- Fig. 8 clearly shows that the highest false alarm in our approach
ing environment, where each nancial security requires a tailor- is around 0.4 at 20% trader margin and is slightly higher than the
made congurations according to different market features. After lowest false alarms in Palshikar and Apte (2008) algorithm, 0.3. In
tuning the margin based on historical data, as the experimental average, the false alarm of our approach is signicantly lower than
results, our approach can effectively and stably identify the col- the algorithm in Palshikar and Apte (2008). Under the best con-
lusive actions under the optimized conguration, i.e. 15% trader ID guration of 15% trader margin, our approach achieved the 100%
margin. Overall, the experimental results suggest that the proposed accuracy at a low cost of 0.1 false alarm when detecting in real
approach uncovers primary collusive transaction scenarios very ef- market data where the collusion included hundreds of traders. The
fectively and consistently across all selected data. algorithm in Palshikar and Apte (2008) can also achieve the 100%
accuracy but with a much higher cost of 3.6 false alarm when
4.3.3. Theoretical comparison detecting 10-trader collusion. Therefore, our approach signicantly
As discussed in Section 2.1, collusive party detection problem outperformed Palshikar and Apte (2008)s algorithm on the false
was also thoroughly studied in Palshikar and Apte (2008). In their alarm.
study, each trader in a clique was assumed to trade with all other
collusive traders and their crossed transactions were identied as 4.3.3.2. Complexity. In Palshikar and Apte (2008), the algorithm
a feature of collusive clique. This assumption is not true in most complexity was reported in two measures. In big O measure, the
of cases in practice when the each trader only trade with parts time complexity of the algorithm is O(kn3 ), where k is the num-
of a big collusion. Our approach is not based on assumptions but ber of nearest neighbours and n is the number of traders, while
only consider different margin values. In the experimental evalu- the space complexity is O(n3 ), and in physical time in real tests,
ation section of Palshikar and Apte (2008), only the detected col- the algorithm reported 10 0 0 seconds for detection collusions in a
lusive clique number (true negative) and the normal trading activ- dataset with 10 0 0 traders. To compare with it, we reported our ap-
ities that were identied as collusive clique (false negative) were proach under the same measures. As the discussion in Algorithm 2,
reported as the results. In addition, their experiments were all on our approach is a recursive algorithm, which iterates each transac-
synthetically generated data while our experiments were all on tion for only once, therefore, the time complexity of our approach
real market data. Although there are a number of differences be- is O(T), where the T is the total number of transactions. Since the
tween their study and ours, we still use their result as the bench recursive algorithm requires to temporarily save each iteration in
mark for theoretical comparison as it was the only previous study the memory, the space complexity of our approach is O(T!). The
on the identical problem. physical testing time across four different datasets are shown in
Table 6. It is very clear that, although the two approaches are
4.3.3.1. Accuracy and false alarm. As reported in Table 5 in Sec- not comparable under big O measure, our approach is signicantly
tion 6.3 in Palshikar and Apte (2008), collusion-clustering algo- faster than the algorithm in Palshikar and Apte (2008) even when
rithm achieved 100% accuracy on detecting the collusion set un- applied on big datasets (more than a million transaction records
der the best conguration. In our experiments, Fig. 7 also re- across thousands of traders) and the performance is stable across
ported 100% true negative rates on 15% margin across four dif- four different datasets. However, according to the big O measure,
ferent datasets, indicating that all collusive cliques were detected our approach requires more memory in operation.
completely and stably in all experiments. In Palshikar and Apte Although our approach achieved a faster performance, we
(2008), the false alarm was used as the error measure of the still believe that this approach is more appropriate for overnight
normal trading activities identied as collusive clique. In Table 6 screening in real nancial environments. On one hand, the HFT
in Palshikar and Apte (2008), collusion-clustering algorithm re- traders can ood the trading system by placing huge amounts of
ported 0.3 false alarm as the best case for detecting a single case orders with later cancellation in daily trading. Its extremely chal-
of 3-trader collusion. When the collusion contained ten traders, lenging for any regulators to track the HFT in real-time due to
the false alarm increased to 3.6 for detecting a case of collusive their unbeatable speed achieved by super high performance com-
J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238 237
puting facilities. On the other hand, regulatory inspection usually our proposed clustering approach to monitor the traders trend in a
takes months for determining a manipulation case. The real-time pseudo real time. The ne detection can also be operated in every
monitoring would not shorten the process but brings huge cost on 2 to 3 hours in real trading environment. Based on our experimen-
computing facilities. Consequently, compared with the algorithm in tal results, screening the 3 hours transaction data merely requires
Palshikar and Apte (2008), our proposed approach can achieve high hundreds of seconds, which might not be dicult to be imple-
detection accuracy and low cost of false positive with faster detec- mented in pseudo real time as the future work. In addition, as the
tion time. But our approach requires more computer memory in transaction records and volumes in one single day increase rapidly,
operation due to high space complexity. When applied as an over- we might require a pre-screening method to remove the transac-
night screening tool, this drawback of our approach, we believe, tion records that are obviously not involved in collusive trading, for
can be compensated in real application due to the low memory example transactions with unusually tiny or huge volumes. How-
costs. ever, such pre-screening method requires more empirical studies of
the nancial market in the future. In the last, the collusive transac-
5. Conclusion tion format might be changing fast. To maintain the effectiveness
of our approach, adaptive adjusting our approach according to the
This paper proposes a novel approach for the detection of col- collusive format is also a crucial future strand.
lusive clique in nancial markets. This method is suggested in light
of careful and thorough study of various behaviours and scenarios Acknowledgement
in collusive clique. Analysis of the collusive activities undertaken
is given in the form of directed graphs of traders, with transac- This research is partially supported by the National Education
tions shown by directed connections between vertices. This illus- Department of China, and the Fujian Province Nature and Science
trates the basic structure of collusion, which follows a closed cycle Foundation, P.R. China, project No. 2015J01236.
of transactions between given traders. Further work shows that in
collusive clique, transactions are usually executed within a collu- References
sive traders with similar volumes. In light of this understanding,
this paper propose two identication approaches for detecting col- Aitken, M., Harris, F. R., & Ji, S. (2009). Trade-based manipulation and market e-
lusive trends of the traders in a general format and the parcel- ciency: A cross-market comparison. 22nd Australasian Finance and Banking Con-
ference.
passing collusive transactions among the traders. To the best of our Andonov, R., Poirriez, V., & Rajopadhye, S. (20 0 0). Unbounded knapsack prob-
knowledge, the proposed method offers two key contributions to lem: Dynamic programming revisited. European Journal of Operational Research,
the expert system as well as the nancial surveillance areas. 123(2), 394407.
Bowen, C. (2013, July 9). Market regulation advisory notice. Chicago Mercan-
1. In the expert system area, well-known k-means clustering and tile Exchange Group Retrieved from http://www.cftc.gov/stellent/groups/public/
@rulesandproducts/documents/ifdocs/rul070913cmecbotnymexcomandkc1.pdf.
unied dynamic programming algorithms are revised and ef- Cao, Y., Li, Y., Coleman, S., Belatreche, A., & McGinnity, M. (2015, February). Adaptive
fectively applied to collusive clique identication problem. Its hidden Markov model with anomaly states for price manipulation detection.
the rst time those two algorithms are thoroughly analysed and IEEE Transactions on Neural Networks and Learning Systems, 26(2), 318330.
Cao, Y., Li, Y., Coleman, S., Belatreche, A., & McGinnity, M. (2014). Detecting price
tailor-made to specically apply in trading activity monitoring manipulation in the nancial market. In International Conference on Computa-
problem. The effectiveness of the application outperformed the tional Intelligence for Financial Engineering & Economics (CIFEr) (pp. 7784).
previous studies in this area. Therefore this paper, on one hand, Cao, Y., Li, Y., Coleman, S., Belatreche, A., & McGinnity, M. (2013). A hidden Markov
model with abnormal states for detecting stock price manipulation. In 2013 IEEE
contributes to the expert system literatures on computational International Conference on Systems, Man, and Cybernetics (SMC) (pp. 30143019).
algorithm application in nancial area, on the other hand, pro- Cao, Y., Li, Y., Coleman, S., Belatreche, A., & McGinnity, M. (2015). Detecting wash
vides an inspiration that even a widely-used computational al- trade in nancial market using digraphs and dynamic programming. IEEE Trans-
actions on Neural Networks and Learning Systems available online.
gorithm can be revised and applied to effectively solve a real
CESR. (2011). Market abuse directive. Paris: The Committee of European Securities
and complex problem. Regulators.
2. In the nancial surveillance area, it is the rst time the collu- CME. (2014, August). Retrieved from U.S. Commodity Futures Trading Commision:
http://www.cftc.gov/lings/orgrules/rule082814cmedcm001.pdf.
sive clique detection problem is thoroughly analysed, extracted
Cumming, D. J., Zhan, F., & Aitken, M. J. (2012, September 12). High frequency trading
and formulated in a practical way and two approaches are in- and End-of-Day manipulation. Social Science Research Network.
ventively proposed to coarsely and nely solve this problem re- Franke, Hoser, & Schrder (2008). On the analysis of irregular stock market trad-
spectively. Through this, this paper provides effective tools for ing behavior. In Data Analysis, machine learning and applications (pp. 355362).
Berlin Heidelberg: Springer-Verlag.
regulators according to their different detection requirements Franke, M., Hoser, B., & Schrder, J. (2007). On the analysis of irregular stock market
and also lls the gap in the literature by suggesting a novel ap- trading behavior. Data analysis, machine learning and applications. Freiburg.
proach to detecting a wide spectrum of collusive cliques. Franklin, & Douglas (1992). Stock price manipulation. The Review of Financial Studies,
5(3), 503529.
Rather than restrict itself to detecting buy/sell orders of com- Franklin, Lubomir, & Mei (2006). Large investors, price manipulation, and limits to
arbitrage: An anatomy of market corners. Review of Finance, 10(4), 645693.
parable prices, as in the engine level detection mechanism for FSA. (2006, March). The Code of Market Conduct Retrieved from http://www.fsa.gov.
CME, this proposed method allows users to identify collusive clique uk/pubs/hb-releases/rel52/rel52mar.pdf.
and their activities on a wider scale, taking into account any sus- Hautsch, N., & Huang, R. (2012). The market impact of a limit order. Journal of Eco-
nomic Dynamics and Control, 36(4), 501522.
piciously transactions and collusive groups, and to assess these in Ho, T. B., & Zhou, Z. H. (2008). Domain-driven local exceptional pattern mining for
light of trading activities within a dened and sustained time pe- detecting stock price manipulation. PRICAI 2008: Trends in Articial Intelligence.
riod, in real time. Therefore, the authors believe that this approach ITG. (2010). Global trading cost review. Investment Technology Group.
Jamal, N. (2012, December). LSE broker ned for wash trade Retrieved from http:
is appropriate for overnight use in real nancial environments and
//dawn.com/news/771335/lse- broker- ned- for- wash- trade.
can be a powerful decision-support tool for the regulatory author- Jiang, Y., & Jiang, Z.-P. (2014). Robust adaptive dynamic programming and feed-
ities. back stabilization of nonlinear systems. IEEE Transactions on Neural Networks
and Learning Systems, 25(5), 882893.
However, our approach is still far from perfection. The fre-
Kleinberg, J., & Tardos, E. (2005). Algorithm design. Addison-Wesley.
quency of trading is increasing rapidly, which poses a challenge Loh, T., & Cumming, G. (2012). Market manipulation: Safe harbour for wash trades and
to detection mechanisms in terms of eciency. Real-time surveil- matched orders upheld. Hong Kong: Hong Kong Securities and Futures Ordinance.
lance of the collusive clique is one of the future strand and might NANEX (2013, May 31). Chicago PMI Retrieved from http://www.nanex.net/aqck2/
4304.html.
also be implemented in two ways: coarse and ne detection. For NANEX (2013, July 10). Exploratory Trading in the eMini. (NANEX) Retrieved from
coarse detection, a sliding window mechanism can be applied with http://www.nanex.net/aqck2/4136.html.
238 J. Zhai et al. / Expert Systems With Applications 69 (2017) 225238
Ni, Z., He, H., Wen, J., & Xu, X. (2013). Goal representation heuristic dynamic pro- SEC (2011,). US securities & exchange commision Retrieved from http://www.sec.gov/
gramming on maze navigation. IEEE Transactions on Neural Networks and Learn- answers/limit.htm.
ing Systems, 24(12), 20382050. Tsang, E., Olsen, R., & Masry, S. (2013). A formalization of double auction market
Palshikar, & Bahulkar (20 0 0). Fuzzy temporal patterns for analysing stock market dynamics. Quantitative Finance, 13(7), 981988.
databases. In Proceedings of the international conference on advances in data man- Wang, J., Zhou, S., & Guan, J. (2012). Detecting potential collusive cliques in futures
agement (pp. 135142). Tata-McGraw Hill. markets based on trading behaviors from real data. Neurocomputing, 92, 4453.
Palshikar, G. K., & Apte, M. M. (2008). Collusion set detection using graph clustering. Zhai, J., Cao, Y., Yao, Y., Ding, X., & Li, Y. (2016, September). Computational intelligent
Data Mining and Knowledge Discovery, 16(2), 135164. hybrid model for detecting disruptive trading activity. Decision Support Systems.
Patterson, S., Strasburg, J., & Trindle, J. (2013, March). Wash Trades scru- doi:10.1016/j.dss.2016.09.003.
tinized. (Wall Street Journal) Retrieved from http://online.wsj.com/article/ Zukerman, M., Jia, L., Neame, T., & Woeginger, G. J. (2001). A polynomially solvable
SB10 0 01424127887323639604578366491497070204.html. special case of the unbounded knapsack problem. Operations Research Letters,
Poirriez, V., Yanev, N., & Andonov, R. (2009). A hybrid algorithm for the unbounded 29(1), 1316.
knapsack problem. Discrete Optimization, 6(1), 110124.