Easy Forex

Backtesting & Data Mining

July 1, 2008

Introduction

In this article we’ll take a look at two related practices that are widely used by traders: Backtesting and Data Mining. These techniques are powerful and valuable when we use them correctly, but traders often misuse them. We’ll therefore also explore two common pitfalls of these techniques, known as the multiple hypothesis problem and overfitting, and how to overcome them.

Backtesting

Backtesting is just the process of using historical data to test the performance of some trading strategy. Backtesting generally starts with a strategy that we would like to test, for instance buying GBP/USD when it crosses above the 20-day moving average and selling when it crosses below that average. Now we could test that strategy by watching what the market does going forward, but that would take a long time. This is why we use historical data that is already available.
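To make the idea concrete, here is a minimal sketch of such a backtest in Python. It rests on a few assumptions that are mine rather than part of the strategy as stated: "selling" is treated as closing the long position (not going short), prices are a plain list of daily closes, and spreads and other costs are ignored. The function names are made up for illustration.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just fell out of the window.
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def backtest_ma_crossover(prices: List[float], window: int = 20) -> float:
    """Total return of: long while the close is above its average, flat otherwise.

    The position held from day t to day t+1 is decided from day t's close,
    so the test never uses information that was not yet available.
    """
    averages = moving_average(prices, window)
    equity = 1.0
    for t in range(len(prices) - 1):
        average = averages[t]
        if average is not None and prices[t] > average:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0
```

Calling `backtest_ma_crossover(closes, window=20)` on a list of GBP/USD daily closes would give the strategy's total return over that period, under the assumptions above.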

“But wait, wait!” I hear you say. “Couldn’t you cheat or at least be biased because you already know what happened in the past?” That’s definitely a concern, so a valid backtest will be one in which we aren’t familiar with the historical data. We can accomplish this by choosing random time periods or by choosing many different time periods in which to conduct the test.

Now I can hear another group of you saying, “But all that historical data just sitting there waiting to be analyzed is tempting, isn’t it? Maybe there are profound secrets in that data just waiting for geeks like us to discover them. Would it be so wrong for us to examine that historical data first, to analyze it and see if we can find patterns hidden within it?” This argument is also valid, but it leads us into an area fraught with danger…the world of Data Mining.

Data Mining

Data Mining involves searching through data in order to locate patterns and find possible correlations between variables. In the example above involving the 20-day moving average strategy, we just came up with that particular indicator out of the blue, but suppose we had no idea what type of strategy we wanted to test? That’s when data mining comes in handy. We could search through our historical data on GBP/USD to see how the price behaved after it crossed many different moving averages. We could check price movements against many other types of indicators as well and see which ones correspond to large price movements.

The subject of data mining can be controversial because, as I discussed above, it seems a bit like cheating or “looking ahead” in the data. Is data mining a valid scientific technique? On the one hand, the scientific method says that we’re supposed to make a hypothesis first and then test it against our data, but on the other hand, it seems appropriate to do some “exploration” of the data first in order to suggest a hypothesis. So which is right? We can look at the steps in the Scientific Method for a clue to the source of the confusion. The process in general looks like this:

Observation (data) >>> Hypothesis >>> Prediction >>> Experiment (data)

Notice that we can deal with data during both the Observation and Experiment stages. So both views are right. We must use data in order to create a sensible hypothesis, but we also test that hypothesis using data. The trick is simply to make sure that the two sets of data are not the same! We must never test our hypothesis using the same set of data that we used to suggest our hypothesis. In other words, if you use data mining in order to come up with strategy ideas, make sure you use a different set of data to backtest those ideas.
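One simple way to keep the two sets of data apart is to cut the price history in two before doing anything else. The sketch below assumes the prices are in chronological order and splits them by date rather than at random, so that the exploration half never contains anything from the testing half's future. The function name and the 50/50 default are my own choices for illustration.

```python
from typing import List, Tuple


def split_in_and_out_of_sample(
    prices: List[float], in_sample_fraction: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Split a chronological price series into an exploration part and a test part."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be strictly between 0 and 1")
    cut = int(len(prices) * in_sample_fraction)
    # Earlier data is for data mining; later data is reserved for the backtest.
    return prices[:cut], prices[cut:]
```

The first list is the one to mine for strategy ideas; the second stays untouched until there is a single hypothesis to test.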

Now we’ll turn our attention to the main pitfalls of using data mining and backtesting incorrectly. The general problem is known as “over-optimization” and I prefer to break that problem down into two distinct types. These are the multiple hypothesis problem and overfitting. In a sense they are opposite ways of making the same error. The multiple hypothesis problem involves choosing many simple hypotheses while overfitting involves the creation of one very complex hypothesis.

The Multiple Hypothesis Problem

To see how this problem arises, let’s go back to our example where we backtested the 20-day moving average strategy. Let’s suppose that we backtest the strategy against ten years of historical market data and, lo and behold, guess what? The results are not very encouraging. However, being rough and tumble traders as we are, we decide not to give up so easily. What about a 10-day moving average? That might work out a little better, so let’s backtest it! We run another backtest and we find that the results still aren’t stellar, but they’re a bit better than the 20-day results. We decide to explore a little and run similar tests with 5-day and 30-day moving averages. Finally it occurs to us that we could actually just test every single moving average up to some point and see how they all perform. So we test the 2-day, 3-day, 4-day, and so on, all the way up to the 50-day moving average.

Now certainly some of these averages will perform poorly and others will perform fairly well, but there will have to be one of them which is the absolute best. For instance we may find that the 32-day moving average turned out to be the best performer during this particular ten year period. Does this mean that there is something special about the 32-day average and that we should be confident that it will perform well in the future? Unfortunately many traders assume this to be the case, and they just stop their analysis at this point, thinking that they’ve discovered something profound. They have fallen into the “Multiple Hypothesis Problem” pitfall.

The problem is that there is nothing at all unusual or significant about the fact that some average turned out to be the best. After all, we tested almost fifty of them against the same data, so we’d expect to find a few good performers, just by chance. It doesn’t mean there’s anything special about the particular moving average that “won” in this case. The problem arises because we tested multiple hypotheses until we found one that worked, instead of choosing a single hypothesis and testing it.
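We can watch this happen on data that contains no pattern at all. The sketch below builds a synthetic random-walk price series (my assumption, standing in for ten years of real data), backtests every moving average from 2 to 50 days on it, and reports the "winner". The 0.6% daily volatility and the seed are arbitrary illustrative values, and "selling" is again treated as simply closing the position.

```python
import random
from typing import Dict, List


def random_walk(n_days: int, seed: int) -> List[float]:
    """Synthetic prices with nothing to discover: independent random daily returns."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.006)))
    return prices


def ma_strategy_return(prices: List[float], window: int) -> float:
    """Return from holding long over each day that follows a close above the average."""
    equity = 1.0
    for t in range(window - 1, len(prices) - 1):
        average = sum(prices[t - window + 1 : t + 1]) / window
        if prices[t] > average:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def scan_windows(prices: List[float], windows: range) -> Dict[int, float]:
    """Backtest every window length against the same price series."""
    return {window: ma_strategy_return(prices, window) for window in windows}


results = scan_windows(random_walk(2500, seed=1), range(2, 51))
best_window = max(results, key=results.get)
print(f"Best of {len(results)} windows: {best_window}-day, "
      f"return {results[best_window]:.1%}")
```

Some window always comes out on top, even though the data was generated with no edge for any of them. Changing the seed will generally change which window "wins", which is a hint that the ranking reflects luck rather than anything special about that average.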

Here’s a good classic analogy. We could come up with a single hypothesis such as “Scott is great at flipping heads on a coin.” From that, we could create a prediction that says, “If the hypothesis is true, Scott will be able to flip 10 heads in a row.” Then we can perform a simple experiment to test that hypothesis. If I can flip 10 heads in a row it actually doesn’t prove the hypothesis. However if I can’t accomplish this feat it definitely disproves the hypothesis. As we do repeated experiments which fail to disprove the hypothesis, then our confidence in its truth grows.

That’s the right way to do it. However, what if we had come up with 1,000 hypotheses instead of just the one about me being a good coin flipper? We could make the same hypothesis about 1,000 different people…me, Ed, Cindy, Bill, Sam, etc. Ok, now let’s test our multiple hypotheses. We ask all 1000 people to flip a coin. There will probably be about 500 who flip heads. Everyone else can go home. Now we ask those 500 people to flip again, and this time about 250 will flip heads. On the third flip about 125 people flip heads, on the fourth about 63 people are left, and on the fifth flip there are about 32. These 32 people are all pretty amazing aren’t they? They’ve all flipped five heads in a row! If we flip five more times and eliminate half the people each time on average, we will end up with 16, then 8, then 4, then 2 and finally one person left who has flipped ten heads in a row. It’s Bill! Bill is a “fantabulous” flipper of coins! Or is he?
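The arithmetic behind the contest is easy to check. A single person with a fair coin flips ten heads in a row with probability 1/1024, so among 1,000 people we expect about one such streak, and the chance that at least one person manages it works out to roughly 62%. The short sketch below computes that figure and also simulates the elimination rounds; the function names are hypothetical and the coin is assumed to be fair.

```python
import random


def chance_of_streak_winner(n_people: int, n_flips: int) -> float:
    """Probability that at least one of n_people flips n_flips heads in a row."""
    p_single = 0.5 ** n_flips
    return 1.0 - (1.0 - p_single) ** n_people


def run_contest(n_people: int, n_flips: int, rng: random.Random) -> int:
    """Simulate the elimination rounds and return how many people survive them all."""
    survivors = n_people
    for _ in range(n_flips):
        # Each remaining person flips once; only those who flip heads stay.
        survivors = sum(1 for _ in range(survivors) if rng.random() < 0.5)
    return survivors


print(f"Expected ten-head streaks among 1,000 people: {1000 * 0.5 ** 10:.2f}")
print(f"Chance of at least one: {chance_of_streak_winner(1000, 10):.1%}")
print(f"Survivors in one simulated contest: {run_contest(1000, 10, random.Random())}")
```

So finding a "Bill" is the likely outcome of the contest even if nobody involved has any skill at all.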

Well we really don’t know, and that’s the point. Bill may have won our contest out of pure chance, or he may very well be the best flipper of heads this side of the Andromeda galaxy. By the same token, we don’t know if the 32-day moving average from our example above just performed well in our test by pure chance, or if there is really something special about it. But all we’ve done so far is to find a hypothesis, namely that the 32-day moving average strategy is profitable (or that Bill is a great coin flipper). We haven’t actually tested that hypothesis yet.

So now that we understand that we haven’t really discovered anything significant yet about the 32-day moving average or about Bill’s ability to flip coins, the natural question to ask is what should we do next? As I mentioned above, many traders never realize that there is a next step required at all. Well, in the case of Bill you’d probably ask, “Aha, but can he flip ten heads in a row again?” In the case of the 32-day moving average, we’d want to test it again, but certainly not against the same data sample that we used to choose that hypothesis. We would choose another ten-year period and see if the strategy worked just as well. We could continue to do this experiment as many times as we wanted until our supply of new ten-year periods ran out. We refer to this as “out of sample testing”, and it’s the way to avoid this pitfall. There are various methods of such testing, one of which is “cross validation”, but we won’t get into that much detail here.
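The procedure just described can be sketched in a few lines. The helper below is hypothetical: it takes any scoring function of the form `score(window, prices)` (for example, a backtest that returns a strategy's total return), picks the winning window using the in-sample data only, and then scores that single winner on periods it has never seen.

```python
from typing import Callable, List, Sequence, Tuple

Prices = Sequence[float]


def select_and_validate(
    windows: Sequence[int],
    score: Callable[[int, Prices], float],
    in_sample: Prices,
    out_of_sample_periods: Sequence[Prices],
) -> Tuple[int, float, List[float]]:
    """Choose the best window on in-sample data, then score it on fresh periods."""
    # Selection step: the out-of-sample periods are not consulted here.
    best = max(windows, key=lambda window: score(window, in_sample))
    in_sample_score = score(best, in_sample)
    # Test step: only the one chosen hypothesis is evaluated on new data.
    fresh_scores = [score(best, period) for period in out_of_sample_periods]
    return best, in_sample_score, fresh_scores
```

If the fresh scores are consistently far below the in-sample score, that suggests the winner was chosen by luck. Note that the out-of-sample periods should only ever be used on the final choice; re-running the selection until the fresh scores look good would simply repeat the original mistake on new data.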

Overfitting

Overfitting is really a kind of reversal of the above problem. In the multiple hypothesis example above, we looked at many simple hypotheses and picked the one that performed best in the past. In overfitting we first look at the past and then construct a single complex hypothesis that fits well with what happened. For example if I look at the USD/JPY rate over the past 10 days, I might see that the daily closes did this:

up, up, down, up, up, up, down, down, down, up.

Got it? See the pattern? Yeah, neither do I actually. But if I wanted to use this data to suggest a hypothesis, I might come up with…

My amazing hypothesis:

If the closing price goes up twice in a row then down for one day, or if it goes down for three days in a row we should buy,

but if the closing price goes up three days in a row we should sell,

but if it goes up three days in a row and then down three days in a row we should buy.

Huh? Sounds like a wacky hypothesis, right? But if we had used this strategy over the past 10 days, we would have been right on every single trade we made! The “overfitter” uses backtesting and data mining differently than the “multiple hypothesis makers” do. The “overfitter” doesn’t come up with 400 different strategies to backtest. No way! The “overfitter” uses data mining tools to figure out just one strategy, no matter how complex, that would have had the best performance over the backtesting period. Will it work in the future?

Not likely, but we could always keep tweaking the model and testing the strategy in different samples (out of sample testing again) to see if our performance improves. When we stop getting performance improvements and the only thing that’s rising is the complexity of our model, then we know we’ve crossed the line into overfitting.
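Here is a small sketch of how an overfitted rule set can be manufactured mechanically. It is not the hand-written hypothesis above but a swapped-in stand-in for it: a lookup table that simply memorizes which move followed each recent pattern of ups and downs. The function names and the `lookback` parameter are my own, and the fresh data is randomly generated for illustration.

```python
import random
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, ...], str]


def fit_pattern_rules(moves: List[str], lookback: int) -> Rules:
    """Memorize the move that followed each `lookback`-day pattern (last one seen wins)."""
    rules: Rules = {}
    for t in range(lookback, len(moves)):
        rules[tuple(moves[t - lookback : t])] = moves[t]
    return rules


def hit_rate(rules: Rules, moves: List[str], lookback: int) -> float:
    """Fraction of correct calls on days where a memorized pattern applies."""
    hits = 0
    trades = 0
    for t in range(lookback, len(moves)):
        prediction = rules.get(tuple(moves[t - lookback : t]))
        if prediction is not None:
            trades += 1
            if prediction == moves[t]:
                hits += 1
    return hits / trades if trades else 0.0


past = ["up", "up", "down", "up", "up", "up", "down", "down", "down", "up"]
rules = fit_pattern_rules(past, lookback=4)
print("In-sample hit rate:", hit_rate(rules, past, lookback=4))

rng = random.Random()
fresh = [rng.choice(["up", "down"]) for _ in range(1000)]
print("Hit rate on fresh random data:", hit_rate(rules, fresh, lookback=4))
```

With a four-day lookback, every pattern in the ten days above is unique, so the memorized rules are right on all six in-sample calls. With a three-day lookback one pattern repeats with a different outcome and the fit drops to six out of seven. More complexity buys a better fit to the past, but on fresh coin-flip data there is no reason to expect such rules to be right much more than half the time.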

Conclusion

So in summary, we’ve seen that data mining is a way to use our historical price data to suggest a workable trading strategy, but that we have to be aware of the pitfalls of the multiple hypothesis problem and overfitting. The way to make sure that we don’t fall prey to these pitfalls is to backtest our strategy using a different dataset than the one we used during our data mining exploration. We commonly refer to this as “out of sample testing”.

Scott Percival
October 2006

Scott Percival is the Director of Research for the FOREX Statistical Research Center at Market-geeks.com, a site which is devoted to using mathematics and the scientific method to study the behavior of prices in the FOREX market. Mr. Percival has a degree in Civil Engineering from Northeastern University, and has worked as a Registered Representative and trading instructor at Fidelity Investments. He is currently working toward the goal of becoming a full time FOREX trader.

Market-geeks.com
Now…you have the edge.


Using Trend Lines In Technical Analysis

June 30, 2008

What are Trend lines?

Trend lines are lines drawn across historical price levels that depict the general direction in which the market is heading and provide indications of support or resistance.

Drawing trend lines is a highly subjective matter. The best test of whether a trend line is a valid one is usually whether it looks like a good line. In an up trend, a trend line should connect the relative low points on the chart. A line connecting the lows in a longer-term rally will be a support line that can provide a floor for partial retracements. The down trend line that connects the relative highs on the chart will similarly act as resistance to shorter moves back higher.

http://www.actionforex.com/images/stories/articles/tut_tech_6_1.gif

http://www.actionforex.com/images/stories/articles/tut_tech_6_2.gif

Any two relative highs or lows will be on the same line, so it is possible to draw a tentative trend line between any two points. Traders can use tentative trend lines as an indication of where support or resistance might be, but until a tentative line holds as support or resistance, it is not yet confirmed as valid.

Of course, the more times a trend line holds, the stronger it will be in the future. If a single line can connect 4 or 5 relative lows, then the chances of the next pullback bouncing off the line are high.

The best trend line?

It is an unusual situation where three points on a chart will exactly coincide with a straight line connecting them. More often, prices will be close to a line, and a best-fit line will have to suffice. This is where trend lines become more art than science. Different traders may draw different trend lines given the same chart or even connecting the same series of relative low points.

Sometimes a trend line will have to be revised as new relative highs or lows appear. Even if the trend line is a very close fit between three or more points, it is important to be flexible and redraw trend lines when necessary.
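For readers who prefer a number to an eyeballed line, one possible way to get a best-fit line is ordinary least squares through the chosen relative lows (or highs). This is only one definition of "best fit" and is my assumption, not a method prescribed here. It also runs through the middle of the points rather than underneath them, so as a support line it may need to be shifted or redrawn. The points are (bar index, price) pairs and the function name is hypothetical.

```python
from typing import List, Tuple


def fit_trend_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares line through (bar_index, price) points: returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to draw a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    spread_x = sum((x - mean_x) ** 2 for x, _ in points)
    if spread_x == 0:
        raise ValueError("points must not all share the same bar index")
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = covariance / spread_x
    return slope, mean_y - slope * mean_x


def line_price(slope: float, intercept: float, bar_index: float) -> float:
    """Price of the trend line at a given bar."""
    return slope * bar_index + intercept
```

When a new relative low appears, adding it to the list and refitting gives the revised line.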

http://www.actionforex.com/images/stories/articles/tut_tech_6_3.gif

Using High/Low or Close/Open

Often the differences in drawing trend lines depend on whether the high and low prices are used or whether the closing and opening prices are used to determine the line. On a candlestick chart, the question becomes using the wicks of the candlesticks instead of the solid bodies of the candles only.

http://www.actionforex.com/images/stories/articles/tut_tech_6_4.gif

Generally closing prices are more significant points than the intra-day prices on a chart, and if a trend line can be drawn using the body rather than the wick of a candle, the body should be used. Similarly, when drawing a trend line, an intra-day spike through a line should not automatically invalidate it. If there is a candle that closed below the trend line, though, it would be a much more serious breach of the line.
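The distinction between a spike and a close through the line can be written down as a simple rule. The sketch below covers only an up trend (support) line; a down trend line would use the high and the opposite comparisons. The labels and function name are made up for illustration.

```python
def classify_support_test(line_price: float, low: float, close: float) -> str:
    """Classify one candle against an up trend line at that bar."""
    if close < line_price:
        # The candle body finished below the line: the more serious breach.
        return "closed below"
    if low < line_price:
        # Only the wick pierced the line; the close recovered above it.
        return "intra-day spike"
    return "held"
```

For example, a candle with a low of 1.9480 and a close of 1.9530 against a line at 1.9500 would count as an intra-day spike, not a break.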

About the Author:

Action Forex provides forex analysis reports and live pivot points on majors and crosses, along with a collection of carefully selected educational articles and free trading ebook downloads.


What Is Forex And Why Should You Trade It?

Although perhaps not as well known as some other markets, the Foreign Exchange (or Forex) market is the largest financial market in the world. Actually, if you combine all of the other markets in the United States together, Forex is 30 times bigger than even that. On average, around 2 trillion dollars are turned over every day in Forex trading. Clearly, then, the Forex market is something we should take a closer look at.

I am sure you are familiar with the stock exchange, where people buy and sell shares in companies. Forex also involves buying and selling, but in global currencies rather than stocks. A trade in Forex involves selling one country’s currency in order to buy another’s. For example, I may believe that the Euro is going to strengthen, and so I sell some of my US dollars to buy some Euros.

In the stock market, the shares of hundreds of different companies are traded on a daily basis. With Forex, the situation is a little bit simpler in that around 85% of the daily trading involves a small set of major currencies. These are the US Dollar, British Pound, Euro, Japanese Yen, Swiss Franc and the Canadian and Australian Dollars. These currencies are the most liquid which means there should always be a buyer available to accommodate a seller and vice-versa.

Trading in Forex begins in the morning in Sydney and progresses across the world over a period of 24 hours before arriving back to start again in Sydney the next morning. This is a further benefit of trading in Forex, as traders are able to take advantage of any important fluctuations and changes at any time of the day.

Andrew McNaught is a successful webmaster and publisher of Forex World Online where you can find out everything you need to know about Forex trading.
