
Connexions module: m32523

The Coordinated Max-Median Rule for Portfolio Selection

Ricardo Anito
This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License

Abstract
A partial summary of the work performed by one Computational Finance PFUG [under Rice University's VIGRE Summer Research Program] is provided in this module. VIGRE (Vertically Integrated Grants for Research and Education) is funded by the NSF (National Science Foundation). Empirical research was geared towards assessing the performance of an "improved" n-at-a-time stock selection rule for portfolio construction. The "Coordinated Max-Median" algorithm developed is described in detail, along with its computational challenges. Also included are various evaluations performed with real-world data (the S&P 500 Index). This Connexions module summarizes the details of such research.
1 Motivation
1.1 The Max-Median Rule for Portfolio Selection
Previous research suggests that there exist strategies that, when implemented over a long period of time, might provide higher returns than overall market performance (see, e.g. [1]). One of these strategies, namely the Max-Median Rule, was investigated by Thompson and Baggett (see [2]) and served as a general motivation for this research. By selecting a handful of stocks according to some robust criterion (e.g. the median) and rebalancing consistently without straying from the strategy, virtually any investor could easily manage his or her portfolio quite reasonably. Over the long haul, this strategy would provide decent returns when compared to a benchmark index (e.g. the S&P 500 Index). It is worthwhile noting that in strategies such as these, time is a major consideration (and one which investors can control, e.g. when investing retirement funds such as a 401K), that these methods do not constitute day-trading strategies, and that they should be adhered to consistently over a given period. Several salient points of this motivating investment strategy are:

1. It is accessible to any individual investor.
2. Over the extensive time period for which it was examined (i.e. 37 years, 1970 through 2006), it outperformed the S&P 500 Index by about 50%.
3. It was slightly more volatile on a yearly basis, an effect that can, to a reasonable extent, be used to an investor's advantage in longer-term investment strategies.


These points clearly serve us as a motivation for further investigation and potential improvements. In particular, we recognized that the existing strategy, albeit well-performing, is inherently a one-at-a-time strategy and therefore does not capture any correlation-related dynamics through its selection criterion. Lastly, we were also motivated to investigate (at least initially) equally-weighted portfolios. An interesting finding (see, e.g. Wojciechowski, Baggett, and Thompson [3]) is that for the 33 years from 1970 through 2002, not simply a flukish few, but a staggering 65 percent of the portfolios selected randomly from the 1,000 largest market-cap stocks lie above the Capital Market Line (CML). Also, it has been shown (see [2]) that any individual who invested equally in the S&P 500 constituents (over the time period of 1970 through 2006) would have made, on a yearly average, 13.7%, as opposed to 8.9% with a competing market-cap weighted strategy. Both of these empirical realities make, at least preliminarily, a case against considering long-term market-cap weighted strategies.

2 The Coordinated Max-Median Rule


2.1 Introduction
We now consider a strategy which allows us to implicitly capture the joint performance of securities as part of our selection criteria. Our goal is to pick, from the universe of investible stocks, a meaningful handful on which to equally allocate a given investment quantity on a yearly basis. As a first step, we consider the S&P 500 constituents to be our universe of stocks from which we can select a critical few (a number which we have set initially, and somewhat arbitrarily, to 20; this quantity seemed both appealing and reasonable in terms of being financially manageable and computationally feasible). It is also worthwhile noting that we regard limiting an investor to select from the S&P 500 (or any other well-known index) as both a reasonable and soundly restricted starting point. Furthermore, we also know that stocks listed in the S&P 500 are representative of various market sectors (inherently diversified) as well as of various reasonable company sizes (in terms of market capitalization). Additionally, other filtering criteria inherent in a reasonably sized index (in terms of the number of constituents) seem to provide a good baseline, both as a benchmark (to outperform) and as a sensible constraint on the universe of all potentially considered stocks.

2.2 Preliminary Setup


Our first step is to select a subset of stocks from a given index in which to allocate a given investment at any given point in time. Here, and in general, we can start by considering a subset of n stocks from a given index I with K constituents. Our evaluations considered I = S&P 500 (for which K = 500), assembling baskets of n = 20 stocks each time. Based on this setup, there is a total of C(500, 20) ≈ 2.667 × 10^35 unique baskets of randomly selected securities that we could potentially consider. Clearly, if we require evaluating some optimal objective function over all possible combinations, this becomes computationally infeasible. Instead, we proceed by selecting stocks according to some plausible robust criterion that can be applied to any randomly assembled basket (the most appealing, both in terms of interpretation and prior results, being the median of the portfolio daily returns). We also note, and quite emphatically so, that to both evaluate a meaningful number of portfolios and assess procedure repeatability, we clearly need to parallelize this effort.

2.3 Algorithm
Consider the following algorithm:

Step 1. Pick n stocks (e.g. n = 20) from the S&P 500 Index at random.
Step 2. Form Portfolio j (start with j = 1) at time t = 0, i.e. P_j(t = 0), by equal-weight investment in these n stocks.
Step 3. On a day-to-day basis (and for T trading days in any given year), compute the daily returns for Portfolio j:

r_j(t) := [P_j(t) - P_j(t - 1)] / P_j(t - 1),   t = 1, 2, ..., T        (1)

Step 4. Sort these for the year's trading days.
Step 5. Calculate the median daily return for Portfolio j; let m_j := median{r_j(t) : t = 1, ..., T}.
Step 6. Repeat Steps (1-5) above for j = 1, 2, ..., J (e.g. J = 10,000) additional randomly selected portfolios.
Step 7. Pick the portfolio with the highest median, i.e. Portfolio j* such that j* = argmax over j in {1, 2, ..., J} of m_j.
Step 8. Invest equally in Portfolio j*.
Step 9. Hold for one year, then liquidate.
Step 10. Repeat Steps (1-9), yearly, over the time-frame of interest.

(An illustrative R sketch of Steps 1-8 appears at the end of this subsection.)

We pick a subset of n = 20 yearly investible stocks according to the afore-described criterion at the end of any given year (using the most recent one-year data). We then allocate our investment quantity to this portfolio on the first trading day of the subsequent year, holding it for one year and concurrently collecting data during this year in order to repeat the procedure at the end of the year. We essentially keep repeating this procedure over the period for which we want to evaluate the strategy.

It is also interesting to note that under the previous motivating rule (i.e. essentially the Max-Median Rule), we would always get an exact answer regarding which stocks had the single highest max-medians, in a finite, rather short, amount of time (an NP-complete problem). This implied determinism, in the sense that any subsequent runs would produce the same results, amounts to a variance of zero. However, it is rather evident that our modified algorithm is inherently stochastic, as we cannot evaluate all possibly imaginable combinations of portfolios. As a direct consequence, and by randomly selecting a reasonable number of portfolios for evaluation, we expect to observe some natural variation in the sense of the procedure's repeatability (each run will be essentially unique). It is possible (and rather interesting) to exploit this natural variation to assess the overall repeatability of this modified procedure.
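To make the selection procedure concrete, the following minimal R sketch implements Steps 1-8 for a single investment year. It is an illustration written for this summary, not the study's production code; the object name daily.ret (a T x K matrix of the look-back year's daily returns, one column per S&P 500 constituent) is an assumed placeholder.

# Coordinated Max-Median selection for one year (illustrative sketch).
coordinated.max.median <- function(daily.ret, n = 20, J = 10000) {
  K <- ncol(daily.ret)
  best.median <- -Inf
  best.stocks <- NULL

  for (j in seq_len(J)) {
    # Step 1: pick n stocks at random.
    stocks <- sample(K, n)

    # Step 2: value path of an equal-weight buy-and-hold basket, P_j(0) = 1.
    value.paths <- apply(1 + daily.ret[, stocks], 2, cumprod)  # T x n
    Pj <- c(1, rowMeans(value.paths))

    # Step 3: daily portfolio returns r_j(t), as in Equation (1).
    rj <- diff(Pj) / head(Pj, -1)

    # Steps 4-5: median daily return for this portfolio.
    med.j <- median(rj)

    # Steps 6-7: keep the running maximum over the J random baskets.
    if (med.j > best.median) {
      best.median <- med.j
      best.stocks <- stocks
    }
  }

  # Step 8: the basket to invest in equally (returned as column indices).
  list(stocks = best.stocks, median = best.median)
}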

2.4 Data Summary and Description


Data were obtained from the University of Pennsylvania Wharton/WRDS Repository [4]. The following data were utilized for our evaluations:

1. S&P 500 December Constituents' Factors, GVKEYs, 1965 to 2006 (Compustat).
2. S&P 500 Daily Data [including: Returns with Dividends, Share Price, Shares Outstanding, Adjustment Factors, PERMNOs (CRSP)].
3. Mapping Table from GVKEYs to PERMNOs.

Data were also obtained from Yahoo! Finance:

1. Company Tickers for S&P 500 December 2007 Constituents.
2. Index Returns for SPX (S&P 500 Market-Cap Weighted).
3. Index Returns for SPX.EW (S&P 500 Equally Weighted, available from mid-2003 to present).

For our evaluations we note that our yearly returns with dividends were calculated from the first trading day to the last trading day of each year and that dividends were included. Also, the size of the data files analyzed was approximately 900 MB.
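For clarity on the return convention just described, the short sketch below shows one way a yearly, dividend-inclusive return could be compounded from daily returns-with-dividends. The data frame layout and column names (date, ret.d) are assumptions for illustration, not the study's actual variables.

# Compound daily returns-with-dividends into a yearly return, measured
# from the first to the last trading day of the calendar year.
# 'daily' is assumed to have columns 'date' (class Date) and 'ret.d'
# (daily total return including dividends).
yearly.return <- function(daily, year) {
  in.year <- format(daily$date, "%Y") == as.character(year)
  prod(1 + daily$ret.d[in.year]) - 1
}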

2.5 Parallel Processing Environment and Software


It is worthwhile mentioning some general details regarding the overall parallelized implementation of this procedure. It was successfully implemented using the software R, widely and freely available from the Comprehensive R Archive Network (CRAN). Several packages available for R make a parallelized implementation of the algorithm very straightforward. In particular, we made use of snow (see, e.g. [5] and [6]) and snowfall (see [7]), both running over open-MPI. Some of the reasons for choosing this implementation were:

1. The framework provides a powerful programming interface to a computational cluster (such as those available at Rice University, e.g. SUG@R and ADA).
2. It is freely available under the Comprehensive R Archive Network (CRAN).
3. It easily distributes computations of existing functions (after pertinent modifications) to various computation nodes.
4. It is excellent for embarrassingly-parallel implementations and computations.

In essence, this approach was very appealing in terms of performance, development time, and cost (essentially free). Although faster and more efficient implementations are possible (e.g. C/C++ and Fortran with open-MPI), the aforementioned implementation was sufficient for our purposes.

The code utilized was initially developed for sequential execution (in SAS) and then converted to an R implementation with similar performance. It was subsequently converted from sequential to parallel to exploit the benefits of a parallel environment. The steps for this conversion process are pretty standard, essentially:

1. Identify the loops which carry independent computations. Here we have two main loops: firstly, simulating J portfolios can be regarded as J independent operations, which we can execute concurrently; secondly, running the algorithm over a number of N years can be regarded as N independent computational efforts.
2. Vectorize loops (Workhorse Functions).
3. Gather results (Combine/Gather Functions).
4. Distribute Execution (call functions with snow); a minimal sketch of this step is given at the end of this subsection.

Several 64-processor jobs were submitted to Rice's Cray XD1 Research Cluster (ADA), occupying 16 nodes, each with a total of 4 processors. The jobs would take less than 20 hours to complete, for about 64 simulated tracks of 40 years each.
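To illustrate the distribution step (item 4 above), a minimal snowfall sketch follows. The snowfall calls (sfInit, sfExport, sfClusterSetupRNG, sfLapply, sfStop) are the package's standard interface; the workhorse function eval.one.portfolio and the object daily.ret are assumed placeholders rather than the study's actual code.

library(snowfall)

# Initialize an MPI-backed cluster (e.g. 64 CPUs on a cluster such as
# SUG@R or ADA); for a quick local test, type = "SOCK" also works.
sfInit(parallel = TRUE, cpus = 64, type = "MPI")

# Hypothetical workhorse: evaluate one randomly drawn n-stock basket and
# return its median daily return together with the chosen columns.
# (Daily-rebalanced approximation via rowMeans; see the earlier sketch
# for a buy-and-hold version.)
eval.one.portfolio <- function(j, daily.ret, n = 20) {
  stocks <- sample(ncol(daily.ret), n)
  list(stocks = stocks,
       median = median(rowMeans(daily.ret[, stocks])))
}

# Export the data and the workhorse to all nodes, set up independent
# random-number streams, and distribute the J evaluations (an
# embarrassingly parallel loop).
sfExport("daily.ret", "eval.one.portfolio")
sfClusterSetupRNG()
results <- sfLapply(seq_len(25000), eval.one.portfolio,
                    daily.ret = daily.ret, n = 20)

# Gather/combine: keep the basket with the highest median.
medians <- sapply(results, function(x) x$median)
best <- results[[which.max(medians)]]

sfStop()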

3 Results
3.1 Preliminary Results
Several simulations were run with observed, above-average performance (the number of portfolios inspected per simulation was in the range of J = 25,000 to J = 50,000). Figure 1 shows a simulation of J = 50,000 portfolios per year over a period of 43 years. Several interesting features can be noted through this figure. We can appreciate that any investor who made up a portfolio with an initial investment of $100,000 in 1970 and selected the same stocks chosen by our algorithm would have seen his or her portfolio compound to a total of $3.7M (by the end of 2008), which performed better than both an equal-investment strategy in the S&P 500 Index (about $2.7M) and a market-cap weighted investment strategy in the S&P 500 Index (slightly below $1M). Of course, we can imagine that the computational power that was used was not available in the early seventies, but moving forward it will be, and to an extent this is what matters. Also, it is clearly seen that the Coordinated Max-Median Rule is inherently a more volatile rule (as compared to the S&P 500). Next, Figure 2 describes 1 of 3 pilot runs that were evaluated with various measures, suggesting the superiority of the max-median (as opposed to, say, the mean, as well as the benefit of using the max rather than the min). This seems plausible, at least heuristically, as the median is most robust (the mean is least robust), and in some sense the previous year's best-performing companies are more likely to perform better next year than the previous year's worst-performing companies (in terms of returns).

Figure 1: Coordinated Max-Median Rule (Single Run), 50,000 Portfolios Evaluated per Year

Figure 2: Coordinated Max-Median Rule (Single Run), 25,000 Portfolios Evaluated per Year with additional evaluations

3.2 Recent Evaluations and Results


Recent evaluations have been mostly focused on the following:

1. Repeatability of the procedure in terms of the variability associated with its possible tracks for each realization.
2. Determining any additional gain (if any) in terms of returns as a function of the number of portfolios evaluated (J) in any given year.
3. The existence of any indications between current-year portfolios' medians and subsequent-year (same portfolio) performance. Are there any associations, and if so, how weak or strong are these?
4. Investigating a stable and plausible stopping rule and assessing how beneficial it might be to run the random search until this condition is met.

Several experiments were set up to determine if it would be worthwhile to inspect more randomly sought portfolios on a yearly basis as part of the overall procedure. A job simulating a total of 104 tracks (each consisting of J = 25,000 portfolios per year over a 43-year period, 1965 through 2008) was submitted to ADA and took approximately three days to complete. Several important observations can be made from the outcomes of these simulations (shown in Figures 3 and 4, below). We note that, here, we can exploit the independence of the portfolios evaluated to get 52 tracks of J = 50,000 portfolios each by combining pairs of J = 25,000 tracks and selecting the maximum of the pair (simply the maximum of a longer execution). Essentially this gives us information regarding what would have happened (in terms of the performance of the strategy) should we have run it for twice as long. Analogously, tracks for J = 100,000 and J = 200,000 portfolios were constructed. Finally, some overall discussion of the results is given after the figures.
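The pairing device just described amounts to taking, year by year, whichever of two independent J = 25,000 searches found the higher median. A sketch is given below; the list structure assumed for a "track" is illustrative and not taken from the original code.

# Combine two independent yearly search tracks into the track that a
# single search of twice the length would have produced. Each track is
# assumed to be a list with one element per year, each element holding
# the selected basket and its median daily return.
combine.tracks <- function(track.a, track.b) {
  mapply(function(a, b) if (a$median >= b$median) a else b,
         track.a, track.b, SIMPLIFY = FALSE)
}

# Example: pair up 104 tracks of J = 25,000 to obtain 52 tracks of
# J = 50,000 (tracks.25k is an assumed list of the 104 simulated tracks).
# tracks.50k <- mapply(combine.tracks,
#                      tracks.25k[seq(1, 103, by = 2)],
#                      tracks.25k[seq(2, 104, by = 2)],
#                      SIMPLIFY = FALSE)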

Figure 3: Procedure Repeatability and Number of Portfolios Sampled Simulation (Left: J = 25,000 tracks, Right: J = 50,000 tracks)

Figure 4: Procedure Repeatability and Number of Portfolios Sampled Simulation (Left: J = 100,000 tracks, Right: J = 200,000 tracks)

The total portfolio value was evaluated at the end of both years 2006 and 2008 and contrasted to both market performance (blue track) and the performance of a single track of a whopping J = 2,600,000 portfolios considered yearly (green track). As expected, the variability of the procedure compounds as a function of time, and by chance we might under-perform the market. However, more often than not the procedure out-performed the market, and by a quite reasonable amount. The proportion of portfolio-tracks simulated that were over an equally-weighted alternative at the end of 2006 was over 80% (for the cases where J = 25,000 and J = 50,000) and over 90% (for the cases where we assessed more randomly sought portfolios, i.e. J = 100,000 and J = 200,000). Also, there is weak evidence suggesting that, although running as many as J = 2,600,000 portfolios might at times outperform the market, this approach is generally not consistently higher on average than considering tracks consisting of fewer yearly-evaluated portfolios.

Another rather interesting observation is made through the scatter-grams produced (see Figure 5, below) assessing the correlation between current-year portfolio median and (same portfolio) next-year performance contrasted to the performance of the S&P 500 Index. The number of portfolios evaluated for this purpose was 2,000,000, and those highlighted as producing the maximum of the medians represent < 0.1% (i.e. < 2,000). The main purpose of this effort was to assess any associations involving the current-year medians as a forward-looking measure of portfolio performance (as we intend to pick the maximum, and by chance we can pick portfolios of performance similar to those in the top 0.999 percentile). As expected, the associations are weak, though not extremely weak (correlations are 0.209 for the first case and 0.182 for the second); however, they can be noticed and depend highly on the year evaluated. More often than not, we observed a positive correlation for the years inspected (the strongest correlations are those shown in the figures below). It turns out that for certain years (those with a negative correlation), we ought to utilize the min-median as a selection criterion. However, this cannot be known ex-ante, and the best we can do is utilize a measure that, more often than not, produces above-average results. Here again, we can appreciate how these conflicting effects would average out with time in a favorable direction, reiterating the fact that a strategy such as this one, if considered, should be evaluated over the long haul.
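Quantifying this association is straightforward once the simulated current-year medians and the realized next-year returns are collected. The sketch below, with assumed vector names, mirrors the scatter-gram analysis behind Figure 5; it is illustrative only.

# 'median.cur': median daily return of each simulated portfolio in year t.
# 'ret.next'  : realized return of the same portfolio in year t + 1.
# Both are assumed numeric vectors of equal length (one entry per portfolio).

cor(median.cur, ret.next)   # the reported values are about 0.21 and 0.18

# Highlight the portfolios attaining the top medians (< 0.1% of those
# simulated), i.e. the region from which the rule actually picks.
top <- median.cur >= quantile(median.cur, 0.999)
plot(median.cur, ret.next,
     col  = ifelse(top, "red", "grey"),
     xlab = "Current-year portfolio median daily return",
     ylab = "Next-year portfolio return")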

Figure 5: Next-Year Portfolio Performances vs. Current-Year Portfolio Medians (Left: 1998 vs. 1997, Right: 2001 vs. 2000)

Lastly, several evaluations were performed comparing the various max-medians of the portfolios simulated as a function of the number of portfolios run (i.e. J) against the single-stock max-median (see Figure 6 below), which could, at least heuristically, serve as an upper bound. This turned out (empirically) to be somewhat unstable, as there is no guarantee that any threshold set as a percentage of the bound could be attained in any reasonable computing time, mainly because after a reasonable amount of simulations (namely J = 500,000 and up to J = 2,000,000) the percentage of this single-stock max-median attained depended considerably on the year inspected, making a generalization impossible. The most recent evaluations were performed with stopping after 5 ticks past J = 10,000 simulations, which seems stable; however, based on the aforementioned results, it seems to provide no incremental benefit when contrasted to, for instance, a hard-coded constant-J stopping rule.
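The "5 ticks past J = 10,000" rule is not fully specified in this module. One plausible reading, sketched below purely as an illustration of the idea (every name and threshold here is an assumption, not the study's exact rule), is a patience-style rule that checks the running maximum in fixed-size blocks and stops once it has gone several blocks without improving.

# Hypothetical patience-style stopping rule (an interpretation only):
# evaluate portfolios in blocks ("ticks") of 'tick.size' and stop once the
# running max-median has failed to improve for 'patience' consecutive ticks
# after at least 'min.J' evaluations. 'eval.portfolio' is assumed to draw
# one random basket and return its median daily return.
search.with.stopping <- function(eval.portfolio, tick.size = 10000,
                                 patience = 5, min.J = 10000) {
  best <- -Inf
  stale <- 0
  total <- 0
  repeat {
    block.best <- max(replicate(tick.size, eval.portfolio()))
    total <- total + tick.size
    if (block.best > best) {
      best <- block.best
      stale <- 0
    } else {
      stale <- stale + 1
    }
    if (total >= min.J && stale >= patience) break
  }
  list(best.median = best, portfolios.evaluated = total)
}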


Figure 6: Max-Median Searches as a function of J (Left: 1984, Right: 1991)

3.3 Future Directions


Several items are open at this point that might be worthwhile investigating in future research. Amongst them are the following (to mention a few):

1. The identification and investigation of any exogenous variables contributing to any observable associations between current-year portfolio medians and next-year portfolio performances. This is of particular interest as it would provide us with the possibility of meaningfully modifying the simple criterion to make more informed decisions based on empirical evidence.
2. Considering data from previous years to make the decision in a given year (rather than only considering data from the previous year), as well as investigating any robust-type interpolations (e.g. median- or quantile-related regression methods).
3. Assessing the reproducibility of the procedure (or, in general, its performance) in other markets (international) and/or other indexes (S&P 100, Russell 1000, NASDAQ, etc.).
4. Investigating a more meaningful rule regarding when to stop the random search, and how it relates to overall procedure performance.

4 Conclusions
In this module, we have presented the details of a modified version of the existing Max-Median Rule allowing for the joint selection of securities within this long-term investment strategy. This modified rule, namely the Coordinated Max-Median Rule, essentially bases the median selection criterion on the joint portfolio performance, rather than on single-stock individual performances. We saw that these modifications came at the cost of increased combinatorial complexity and that, due to the impossibility of evaluating all potentially-investible portfolios, a parallelized computational approach had to be considered to assess a satisfactory number of portfolios on a yearly basis for potential investment. The algorithm's implementation was discussed, and several conclusions were drawn, the most significant being that our modified algorithm, much more often than not, seems to out-perform the market (in terms of the S&P 500 Index) when a disciplined investor adheres to it for a reasonable amount of time. The data suggest that one of the contributing factors for this on-average higher performance, at least in part, is the correlation between current-year portfolio medians and next-year portfolio performance, which is both weak and not always positive. We noted that, more often than not, these correlations tend to be positive, an effect that seemingly averages out in a positive direction over the long haul. We have also evaluated the performance of the described procedure on real-world S&P 500 data spanning 43 years, and several potential future improvements, such as further work regarding a more robust stopping rule and the assessment of the procedure's reproducibility with other indexes and/or markets, were discussed.

5 Acknowledgements
Special thanks are given to Drs. James Thompson and Scott Baggett, as well as to Drs. Linda Driskill and Tracy Volz, for their overall help and coaching throughout this summer research project. In particular, special thanks are given to both the NSF and VIGRE for making this research a reality.

6 Bibliography
1. O'Shaughnessy, James P. (2003). What Works on Wall Street: A Guide to the Best-Performing Investment Strategies of All Time (Third Edition).
2. Thompson, James R. and Baggett, L. Scott (2005). Everyman's Max-Median Rule for Portfolio Selection.
3. Thompson, James R., Baggett, L. Scott, Wojciechowski, William C. and Williams, Edward E. (2006). Nobels for Nonsense. The Journal of Post Keynesian Economics, Fall, pp. 3-18.
4. Wharton Research Data Services (URL: http://wrds.wharton.upenn.edu/).
5. Rossini, A., Tierney, L., and Li, N. (2003). Simple Parallel Statistical Computing in R. UW Biostatistics Working Paper Series, Paper 193, University of Washington.
6. Tierney, L., Rossini, A., Li, N., and Sevcikova, H. (2004). The snow Package: Simple Network of Workstations. Version 0.2-1.
7. Knaus, Jochen (2008). Developing parallel programs using snowfall.

