How to combine KNN, score normalization and Ensemble techniques for trading predictions.
2014-01-26 Joseph Ellsworth
Tags: ensemble, ensemble stock, ensemble stock prediction, ensemble stock price, knn, knn ensemble, knn neighbor, knn optimization, knn optimizer, knn prediction, knn price prediction, knn process, knn stock price, knn stock selection, nearest neighbor, Percent Higher High, predict stock prices, similar movements
This article explains how to combine KNN (K Nearest Neighbor), score normalization and ensemble algorithms to predict future stock price movements. Please send feedback on how to make it more clear. I wrote this at the level of abstraction where I prefer to receive knowledge. If there is strong demand for sample code then I may be willing to provide it in a second article.
Basic KNN process for trading prediction.
There are many KNN introductions available, but at its core it is a statistical technique that classifies a sample based on the known outcomes of the K most similar samples (its nearest neighbors).
It requires a few extra steps to apply KNN to predict stock prices. The first step of this process is figuring out which indicators you want to measure and the second is specifying a goal, such as the stock price rising 1% within 10 days. For this example I will use K=6, which means using the scores from the 6 neighbors whose scores are closest to the current bar’s score.
Maury Osborne introduced the concept that 2 consecutive higher ticks tend to be followed by a higher probability of the next tick going higher. I extended this to an indicator that measures the percentage of bars or ticks with higher highs over a specified period of time. The percentage of higher highs over 20 bars is abbreviated PHH(20). I use PHH(20) as the example indicator throughout this article.
For this example I will use a simple goal of the closing price rising by at least 1% in the next 40 bars. You can consider the 1% gain the magnitude component and the 40 bars the duration component of the goal. Some implementations require both magnitude and duration components while others use only one of them.
Let’s say that your measurement for the most recent bar is that 16 out of the previous 20 bars had a high that was above the high from the prior bar. This would yield a raw indicator score of 16/20 = 0.80.

The raw indicator score is used to select the K scores closest to your computed measurement. With K=6 you would take the 3 neighbors with scores immediately above and the 3 with scores immediately below our raw score of 0.80. If 6 out of 8 of them had met the goal, a simple consensus score would be 0.75, where each neighbor’s vote was 1 if the goal was met and 0 if the goal was not met. I personally prefer the decimal scoring approach described below because it also captures the magnitude of the movement.
Using the 3 closest scores above and below is an arbitrary choice for distance computation.
It is possible for our raw indicator score of 0.80 to match more than K neighbors. When
this occurs we use the neighbors which are closest in time as the secondary selection filter. It is also reasonable to expand K when more exact matches are available. Since we are using arrays with numeric indexes where bars and trades are known to be sorted by time, we can compute a distance in time from any bar as the absolute value of its index minus the index of the bar being scored.
Some KNN implementations always use a time component as part of the selection criteria for the neighbors, and in that case a more complex selection algorithm is needed. You could use the time measured in # bars, minutes, hours or days as the X axis and the raw PHH(20) score as the Y axis. The net distance is computed by finding the time difference and the feature difference between the candidate bar and all other bars. These are then combined by applying the Pythagorean theorem hypotenuse formula. Once these are computed, a separate selection process is used to find the bars with the shortest hypotenuse. The more difficult part when including time as a selection point is finding a way to normalize the time contribution against the raw indicator output scale so you can properly balance the input of both. I tend to prefer restricting the total compare set and using only the raw indicator score for selection of the neighbors, but there are valid arguments for including time.
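For readers who want something concrete, here is a small sketch of the hypotenuse-based distance described above. The time_scale factor is an assumption I added to illustrate balancing the time axis against the PHH scale; the article does not prescribe a specific value.

import math

# Sketch of combining a time distance and a feature distance with the
# Pythagorean hypotenuse formula. time_scale is an assumed factor that maps a
# gap measured in bars onto roughly the same 0..1 scale as the PHH score.
def combined_distance(cand_idx, cand_score, other_idx, other_score, time_scale=1.0 / 500):
    time_dist = abs(cand_idx - other_idx) * time_scale    # X axis: bars apart
    feat_dist = abs(cand_score - other_score)             # Y axis: indicator difference
    return math.hypot(time_dist, feat_dist)

def k_nearest(cand_idx, bars, k=6):
    """bars: list of (index, raw_score) pairs; returns the K closest neighbors."""
    cand_score = dict(bars)[cand_idx]
    others = [(i, s) for i, s in bars if i != cand_idx]
    others.sort(key=lambda b: combined_distance(cand_idx, cand_score, b[0], b[1]))
    return others[:k]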
For each of the selected neighbors, compute the percentage price change over the next X bars, where X is the duration portion of the goal. With K=6 you may have percentage gains of
[1.3%, 0.2%, -2%, 1%, 0.5%, 0.4%]. These can be used to compute a score that is an average of the percentage changes. I like this input because it allows the magnitude of the change to be represented, both positive and negative, as a component of the output numbers without requiring an explicit statement in the goal. For this sample it would be a raw score of roughly 0.23%.
Normalize this score against the entire set, which allows easy combination of disparate scores in the ensemble described below. (Think of it as a poor man’s gauge normalization.) If the entire set had an average score of 0.183% and our average score from our neighbors was about 0.23%, the normalized score would be roughly 1.27. This can be interpreted as: this bar’s neighbors had an average gain about 1.27 times the average gain for the entire set. Since this bar is similar to its neighbors, it is likely to produce a similar result.
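A minimal sketch of this normalization step, using the sample figures above; the variable names are illustrative.

# Sketch of the "poor man's gauge normalization" using the sample numbers above.
neighbor_gains = [1.3, 0.2, -2.0, 1.0, 0.5, 0.4]           # % change for the K=6 neighbors
neighbor_avg = sum(neighbor_gains) / len(neighbor_gains)   # ~0.23%

set_avg = 0.183                                             # average % change for the entire set
normalized_score = neighbor_avg / set_avg                   # ~1.27x the set average

# Applying abs() to every change before averaging gives the "absolute" variant
# described next.
print(round(neighbor_avg, 3), round(normalized_score, 2))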
A similar normalized score can also be computed using the absolute value of the change for both sets. These can be used as separate inputs to the optimizer described below, as they both provide interesting inputs: one predicts the amount of movement regardless of direction while the other reflects the expected direction.
Performance Note: If you pre-compute the indicator to produce an array containing the output metric and an index into the trade or bar array, and then sort by the output metric, you can binary search for the closest matching scores very quickly, or you can use a tree structure which is fast to look up and fast to update. I prefer a tree structure, which makes it easy to add new measurements without incurring the sort overhead. A linear search also works but is much slower.
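As one possible realization of the sorted-array approach, the sketch below pre-sorts (score, bar index) pairs once and then uses a binary search to pull the K closest scores. The structure and names are mine, not from the article.

import bisect

# scored_sorted: list of (raw_score, bar_index) pairs, sorted once by raw_score
# when the indicator array was built.
def nearest_by_score(scored_sorted, target_score, k=6):
    """Return the k entries whose raw scores are closest to target_score."""
    pos = bisect.bisect_left(scored_sorted, (target_score,))
    lo, hi = pos, pos
    out = []
    while len(out) < k and (lo > 0 or hi < len(scored_sorted)):
        left_gap = target_score - scored_sorted[lo - 1][0] if lo > 0 else float("inf")
        right_gap = scored_sorted[hi][0] - target_score if hi < len(scored_sorted) else float("inf")
        if left_gap <= right_gap:        # expand toward whichever side is closer in score
            lo -= 1
            out.append(scored_sorted[lo])
        else:
            out.append(scored_sorted[hi])
            hi += 1
    return out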
The scores for any bars too new to compute success or to provide the forward looking score must be excluded from the set. At this point you have a good single KNN measurement that is probably useful but is unlikely to be highly accurate on its own.

Ensembles combine the output from multiple models with the assertion that the combined output is likely to yield more accurate results than the prediction from any single model. Ensemble enthusiasts also claim that ensemble output is less likely to be negatively affected by the weaknesses of any single model.
The next step is to repeat the process for several additional measures and different time frames. For example you may choose something like the following.
• Distribution rank of this max compared to the last (100) max scores.
Once you have a normalized KNN score for each of these measure sets, you can average those scores to produce the ensemble score, which hopefully will be more accurate than the score for any single indicator.
The score magnitude normalization step above is critical to allow easy combination via
averaging because all the scores are already relative to the entire set. One extension
that can yield substantial benefits is to extend the ensemble averaging method so it can
apply different weights for each measure set. An optimizer can be used to find the set of
weights which yield the highest success rate for the entire set. You can think of this as
choosing which expert you trust the most and giving their input the greatest weight.
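A small sketch of the weighted ensemble average; the measure-set names and weights are placeholders, and in practice the weights would come from the optimizer described later.

# Weighted average of per-indicator normalized scores. Names and weights are
# placeholders; the optimizer would supply the real weights.
def ensemble_score(normalized_scores, weights):
    total_weight = sum(weights[name] for name in normalized_scores)
    return sum(normalized_scores[name] * weights[name] for name in normalized_scores) / total_weight

scores  = {"PHH(20)": 1.27, "PHH(50)": 0.90, "SMA(5)/SMA(50)": 1.60}
weights = {"PHH(20)": 1.0,  "PHH(50)": 0.5,  "SMA(5)/SMA(50)": 2.0}
print(ensemble_score(scores, weights))       # blended ensemble score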
One weakness of this approach is that a single neighbor for a single feature can drive too much change in the overall score. I have
seen instances where 5 out of 6 neighbors had an average gain of 0.6% over the next 40
bars while one of the 6 neighbors had a loss of 4%. Statistically the outlier changed the net
score more than should have been allowed which made the output less accurate than
desired. This is only an issue when using the normalized score mechanism mentioned
above. If you are using the pure consensus rules then the statistical outlier is
automatically eliminated.
The neighbors which are selected the most times by multiple features are those which have exhibited the greatest similarity. Since they are the most similar, you can increase their weighted contribution to the ensemble score relative to those which matched on fewer facets. This allows you to accept some input from all the neighbors but give the greatest impact to those neighbors which are most similar. This requires that you defer the calculation of the feature average until after you have gathered the sets of neighbors for all of the features.
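One way this could look in code is sketched below: neighbors are weighted by the number of measure sets that selected them. The input structures (neighbor_sets, future_gain) are my own illustration.

from collections import Counter

# neighbor_sets: one list of selected bar indexes per measure set.
# future_gain: maps a bar index to its forward % change.
def similarity_weighted_gain(neighbor_sets, future_gain):
    counts = Counter(idx for neighbors in neighbor_sets for idx in neighbors)
    total = sum(counts.values())
    # A bar picked by 4 measure sets contributes 4x the weight of a bar picked by 1.
    return sum(future_gain[idx] * n for idx, n in counts.items()) / total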
Another refinement is to discard the neighbors that have future gains that are the furthest from the mean for the set. Think of this as throwing out the statistical outliers from a Cauchy or Lévy distribution. I am generally not a fan of throwing away data, but in this instance our sole goal is to increase prediction accuracy and in this special case it can be beneficial.

Benoît B. Mandelbrot introduced the idea that while stocks normally behave according to standard distribution rules, they occasionally exhibit wild moves outside of statistical expectations. We normally protect against these wild moves with hedges and stop losses, but they have a secondary use here. You could say that “Some conditions produce wildly different effects that are unlikely to repeat in a useful period of time”. If you can collect these wild effects into a cluster and use them, they can be aids in prediction as a kind of reverse neighbor look-up. In most conditions they represent a small number of wild outlier readings in an otherwise rational set that can skew statistical accuracy and need to be suppressed. Our ultimate goal is to maximize the number of times we correctly predict the price movement magnitude or direction, and to meet this goal the wild outliers generally need to be suppressed.
If more than a small number of neighbors are showing wild forward looking effects outside
the statistical norm then it is prudent to pay attention as it can be a warning of severe
risk. In this case we may actually keep the wild inputs and either tune down or suppress the input of the normal results. A high number of wild results in any set of neighbors can also be used to dramatically reduce the confidence in the prediction or possibly eliminate the prediction altogether. This can be used when modeling to find and cluster wild effects
which can be a component of an early warning system for major market moves. This kind
of warning signal can be beneficial to the portfolio risk manager to suppress purchases,
adjust hedge levels and cause early sale of long positions. Using a system for wild
behavior identification is a perfect example of when more years of data and a larger
number of neighbors can yield beneficial results by providing more wild events to learn
from.
An easy way of filtering wild input is to reject any neighbors that produce predictor outputs that are more than X standard deviations away from the mean of the K neighbors. A faster selection mechanism is to throw away the upper and lower 10% of the readings, keeping only the 80% in the center. The nice part of this approach is that it allows the normalized score to be computed earlier, which simplifies some data management.
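Both filters could be sketched roughly as follows; the article leaves X unspecified, so the x_std default here is an assumption, and the 10% trim matches the example above.

import statistics

# Reject neighbors more than x_std standard deviations from the neighbor mean.
# x_std is an assumed default; the text leaves X unspecified.
def filter_by_stddev(gains, x_std=2.0):
    mean = statistics.mean(gains)
    sd = statistics.pstdev(gains)
    if sd == 0:
        return list(gains)
    return [g for g in gains if abs(g - mean) <= x_std * sd]

# Faster variant: drop the upper and lower 10%, keeping the central 80%.
def trim_tails(gains, tail=0.10):
    ordered = sorted(gains)
    drop = int(len(ordered) * tail)
    return ordered[drop: len(ordered) - drop] if drop else ordered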
These approaches can also be combined. I tend to prefer maximizing the contribution from neighbors that have been judged the most similar, but it could be beneficial to combine this with outlier rejection. The ease of identifying and handling individual outlier neighbors is one advantage of KNN over some other classification systems where it is more difficult to identify and suppress them.

I have never seen KNN deliver pinpoint accuracy, and it sends the wrong message to the users who
inevitably try to use it that way. On the other hand I have seen it deliver high accuracy
when predicting which stocks will rise in price versus which will drop. It is also beneficial
when comparing purchase opportunities where the greatest profit may be from the
purchase with the larger predicted price increase. If you use it as a directional indicator
rather than an absolute prediction I think you will see greater trading success.
If you applied the score magnitude normalization above then you can take the output score from the ensemble and multiply to obtain the future price prediction. The formula for the price change would be Last Bar Closing Price * Avg Percent Movement during the goal duration.
I have not tested this but I suspect the probability of accurately predicting the exact price
movement is relatively low. It is more useful when used as a +- 20% indicator so the actual
price movement would be fairly likely to be between $0.97 and $1.46. One way to measure
the accuracy of the system is to compute average % deviation between predicted score and
actual score. I suspect that the most accurate predictions will be for the top 85% to 95%
scores and the 5% to 15% scores, but this would need to be proven with real data tests.
Scoring serves two purposes: it drives the optimizer and it provides a measure of how much you should trust the output of the algorithm to influence live trading. It is important to keep these two aspects distinct in the mind of the engineer, because for optimization the scoring must be fast to allow many passes, whereas scoring for trust is best done with a more thorough process such as a full back-test. When optimizing you typically score against the training data set, and when scoring for trust you score against the portion of data set aside for testing, which the training and optimization process has never seen.
For every bar in the set, compute the combined ensemble output score. Then sort these bars to find the top 10% of the bars with the best score. If you have done everything right and had some luck in choosing the features (indicators), then you should find that the bars in the top 10 percentile of the combined scores have a success rate substantially higher than the average success rate for the total set. You must suppress any bars from the earliest part of your history where you do not have enough prior history to compute the indicator scores.
You can measure success based on maximizing the number of successful bars in the top 10
percentile of your computed scores. A more accurate measure for the optimizer is
obtained by running a quick back-test after each change. Consider the change good if it improves the back-test results.
Make sure you include transaction fees for the back-test or you will get inaccurate results.
3. Classify each item as a success if the price fell inside the predicted price range, or a failure if it did not.
5. Iterate through the items keeping a count of total items reviewed so far versus successful items.
7. The item's success rate = Min(last item's success rate, current item's success rate).
8. Stop when the current success rate drops below a threshold such as 90%.
9. The classification score for the pass is computed as the # of successes before the threshold was encountered / total set size. This score can be easily used by the optimizer.
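A minimal sketch of this pass is shown below. The missing earlier steps (sorting the bars by ensemble score and producing a success flag per bar) are assumed, and the bullet variations that follow only change the success test; the names are illustrative.

# successes: booleans ordered from the best ensemble score to the worst,
# where True means the bar met the success criterion.
def classification_score(successes, threshold=0.90):
    total = len(successes)
    reviewed, wins, rate = 0, 0, 1.0
    for hit in successes:
        reviewed += 1
        wins += 1 if hit else 0
        rate = min(rate, wins / reviewed)   # step 7: the rate is never allowed to recover
        if rate < threshold:                # step 8: stop once the rate falls below 90%
            break
    return wins / total                     # step 9: successes before the threshold / set size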
• A similar approach can be used, but rather than choosing a range, use directional logic: if the predicted change was positive and the move within the duration target was positive then success, or if the predicted change was negative and the move was negative then success; otherwise failure.
• A similar approach can be used, but rather than choosing an absolute range, compute the difference between the predicted move and the actual move and keep a running average of the absolute value of this difference. When the value of this average rises above a threshold then exit. The number of items processed before the exit divided by the total set size can serve as the score.
• In trading we are generally most interested in the records at both ends of the extreme movement scores because that is where trades represent the greatest potential profit. You would run this scoring mechanism from both directions, high to low and low to high, and it is quite probable that you will find a higher success rate in one direction than in the other.
2. Classify each item as a success if the price fell inside the predicted price range.
3. Take the absolute range of bucket scores and divide by a target number of buckets to obtain a target bucket size. For example, if the absolute move magnitude scores range from -6.83 to 5.2 and you want 100 buckets, then the target bucket size would be about 0.12. Check that the buckets have a sufficient number of items in them. Buckets with too few items should be considered statistical outliers and rejected. If you have too many buckets with too few items then reduce the number of buckets or use an alternative approach.
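A possible sketch of the bucket construction, using the example figures above; min_items is an assumed threshold since the text does not give one.

# move_scores: absolute move magnitude score for each item, e.g. -6.83 .. 5.2.
def bucketize(move_scores, num_buckets=100, min_items=5):
    lo, hi = min(move_scores), max(move_scores)
    size = (hi - lo) / num_buckets                        # target bucket size, ~0.12 here
    if size == 0:
        return {0: list(move_scores)}
    buckets = {}
    for s in move_scores:
        key = min(int((s - lo) / size), num_buckets - 1)  # clamp the max value into the last bucket
        buckets.setdefault(key, []).append(s)
    # Buckets with too few items are treated as statistical outliers and dropped.
    return {k: v for k, v in buckets.items() if len(v) >= min_items}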
Another scoring approach is a full back-test which simulates buying and selling stocks or options based on the predicted price. The back-test results which yield the greatest profit identify the best configuration. In this system you may choose to place long purchases on any stock / bar combination where
the last purchase or last bar produced a score in the top 20% of the ranked scores and sell
when the last purchase or last bar produces a score lower than the top 70% of the top
scores. You may also include a limit which requires the predicted profit to be at least
300% of commission and fees costs. Force all transactions open at end of the test to exit
as if sold at market at that time. You should also include a maximum long value at risk so
you can accurately predict the effect on a portfolio. Use of a stop loss should be tested as an optional component. Finally, test for bankruptcy and abort the test when you reach the bankruptcy condition.
I suspect that if using the Absolute normalized scoring above, the top 5% and bottom 5% of scores will be less accurate than those between the 5% to 25% and the 75% to 95% ranges. I also expect those in the 40% to 60% range to be less accurate when used to predict directional movement. If you use the ratio of accuracy for predicting the two part goal shown above, then the most accurate should be those at the very top of the score range. I have been surprised many times, so this is really a wild guess; the only real way to know is with extensive testing.
Optimizing
Once you have the first set of scores for the ensemble you may find that it is very accurate
for some symbols and much less accurate for other symbols. I have seen this many times
and it tends to result from different stocks exhibiting different trading personalities. A
given indicator such as a SMA(5) as percent of SMA(50) may be very valuable for SPY and
produce negative results for AIG. I believe this is driven by the internal processes and personalities of the institutional traders who are actively trading those symbols.

Optimizing is essentially the process we use to find the combined set of factors which produce the best result for a given subset of stocks during a given time-frame. The process is iterative:

5. Repeat for a specified number of passes or until the system has reached a semi-stable state.
The Optimizer can be used to change many factors such as:
• The contribution weight for various indicators or measure sets of the ensemble
• Change from limit, magnitude or directional success criteria to one of the others.
It also saves a lot of time if you load the key values that can be changed by the optimizer from
a data file so you can save the results after the run. I always save the intermediate results
after each major pass so we can use those combinations as input to more advanced genetic
optimizers.
Designing optimizers is as much an art as a science, and there is a risk of the optimizer finding micro cases which look good but actually produce negative trading results. The easiest optimizer simply perturbs a single value within a range of +-50% of its current value and tests that single change. More advanced genetic optimizers have been used, but be forewarned that it is easy to sink a great deal of time into them.
Many optimizers have a tendency to go down the wrong path which can cause them to
never discover better optimization opportunities. One way to work around this is to
periodically clear the entire optimization set to its default state and then run new passes. If you keep the results from these clean slate passes as data files then it is relatively easy to use them as the input genome for future genetic optimizer passes.
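The simple single-value perturbation loop could be sketched as below; score_fn stands in for the classification score or quick back-test from earlier sections, and the parameter names are placeholders.

import random

# Perturb one parameter at a time within +-50% of its current value and keep
# the change only if the score improves. score_fn and params are placeholders.
def optimize(params, score_fn, passes=200, spread=0.5):
    best = dict(params)
    best_score = score_fn(best)
    for _ in range(passes):
        trial = dict(best)
        name = random.choice(list(trial))
        trial[name] *= 1.0 + random.uniform(-spread, spread)
        trial_score = score_fn(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score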
Early in my ML (Machine Learning) work I found that genetic optimizers depending on purely random generated genome values very often yielded negative results for the entire set and never recovered. I invested weeks testing various ways to add random perturbation to the genome combination process and found several that worked, but none of them seemed to work as well as seeding the genome with settings from known good result sets that were produced by a less sophisticated random sequence optimizer.
The final step is to use the reserved test data to measure how well the system will perform. The test data must be completely isolated from the training data or you will not obtain an accurate measurement of your effective lift. The ultimate measure of success is how many of the bars from your reserved test data set which have a score in the top of the set (filter range, e.g. 90% and above) succeed.
When testing new algorithms I like to reserve 20% of my total time range for use as testing data. You should expect your classification success rate in the test data set to be less than when applying it to the training data set. One measure of brittleness is how much the difference between the two grows. It would be pretty cool if your top 10% scores had 3 times the average win rate and your test data set showed a similar lift.
Required Disclaimer
Forex, futures, stock, and options trading is not appropriate for everyone. There is a
substantial risk of loss associated with trading these markets. Losses can and will occur. No
system or methodology has ever been developed that can guarantee profits or ensure
freedom from losses. No representation or implication is being made that using this
methodology or system or the information in this letter will generate profits or ensure
freedom from losses. Forex and Option trading can result in losses that exceed the original
principal balance.
Hypothetical or simulated performance results have certain limitations. Unlike an actual performance record, simulated results do not represent actual trading. Also, since the
trades have not been executed, the results may have under-or-over compensated for the
impact, if any, of certain market factors, such as lack of liquidity. Simulated trading
programs in general are also subject to the fact that they are designed with the benefit of
hindsight. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown.

Even results from live cash trading can be subject to specific market conditions that may
not repeat in the future and, as such, future trading is unlikely to duplicate past results. Changing the dollar amount traded can cause different behavior in live trading markets, especially when trading large positions that can exceed the liquidity available in the market.
Bayes Analytic LLC provides software that can produce trading signals. The customer is
responsible for choosing a configuration and parameters for the software that meets their
own goals. The customer is responsible for conducting their own tests and only the
customer can activate the software to start trading. The software runs in an account that the customer has logged into and in which the customer has activated the software. Bayes Analytic has no control
of, influence over or visibility to the signals specific to a given user because we have no visibility into the configuration parameters the user has chosen to operate with. It is the customer's responsibility to test the software to ensure it meets their trading requirements.
Every time Bayes Analytic releases a new version of the software the customer should
conduct new tests to validate the new version continues to meet their requirements
because every software change could have unexpected side effects that may not be obvious
until the customer has tested it in their environment with their configurations. The Bayes
Analytic software may run as a script inside of other software packages or talk to APIs that Bayes Analytic has no control of or influence over, so the customer should test the entire ecosystem to ensure it meets their trading requirements. Bayes Analytic may provide the
software in source form since that is required by some trading systems but it remains the
exclusive copyrighted property of Bayes Analytic and may not be reverse engineered or
redistributed. The customer is responsible for choosing their own broker and installing
the Bayes Analytic software so it can trade using the desired account. Bayes Analytic has no
control over or influence on the broker, and many brokers have different ways of quoting spreads, charging commissions, routing order flow and delivering information with different latency. As such, a strategy and software that performs well at one broker may, and probably will, require changes to perform well at other brokers. It is the customer's responsibility to test the software with their selected broker to ensure it meets their trading requirements.