Steps in evaluating/testing a system

Jimmy · Post by **Jimmy** » Mon Apr 28, 2003 6:51 pm

Hi all,

I'd appreciate everyone's comments on the necessary steps to fully test a system. Assuming I have accurate end-of-day futures data for the last 20 years and I'd like to test a long-term trend-following system, here are the steps I think I need to walk through...
1) Test the base system and assess:

- returns (monthly, annual, and average per month)
- drawdowns (monthly, annual, max for monthly and annual)
- win/loss ratio
- average winner and average loser
- risk adjusted return (Sharpe and/or MAR ratio)

2) Optimize different levers (entry/exit criteria, different money mgmt criteria) within the system to increase risk-adjusted return (this is important to me, as beginning equity will be small). Here is where I have the most questions. I've read about curve fitting a system and the negative outcomes of implementing a curve-fitted system with real $. How do I avoid over-optimizing and/or curve fitting a system? Is this step similar to conducting sensitivity analysis where you leave all components constant and vary one component at a time to see the effects of the change? Then, once you determine the "best" inputs for each component, you move onto step 3?
3) Re-run the test with the best inputs from step 2 and re-assess the same outputs as in step 1.
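For step 1, here is a minimal Python sketch of how the listed statistics might be computed, assuming you already have a list of monthly returns (as fractions) and a list of per-trade P&Ls from the backtest. The function names, the 0% risk-free rate in the Sharpe ratio, and defining MAR as compound annual return divided by maximum drawdown are assumptions of this sketch, not anything specified in the thread.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def summarize(monthly_returns: List[float], trade_pnls: List[float],
              start_equity: float = 100_000.0) -> Dict[str, float]:
    """Step-1 statistics for a single backtest (hypothetical helper)."""
    # Compound the monthly returns into an equity curve.
    equity = [start_equity]
    for r in monthly_returns:
        equity.append(equity[-1] * (1.0 + r))

    years = len(monthly_returns) / 12.0
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    mdd = max_drawdown(equity)

    winners = [p for p in trade_pnls if p > 0]
    losers = [p for p in trade_pnls if p <= 0]

    # Annualized Sharpe ratio from monthly returns, assuming a 0% risk-free rate.
    sharpe = mean(monthly_returns) / stdev(monthly_returns) * math.sqrt(12)

    return {
        "cagr": cagr,
        "avg_monthly_return": mean(monthly_returns),
        "max_drawdown": mdd,
        "win_loss_ratio": len(winners) / len(losers) if losers else math.inf,
        "avg_winner": mean(winners) if winners else 0.0,
        "avg_loser": mean(losers) if losers else 0.0,
        "sharpe": sharpe,
        "mar": cagr / mdd if mdd > 0 else math.inf,
    }
```

Monthly or annual drawdowns can be obtained the same way by passing the relevant slice of the equity curve to `max_drawdown`.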

I know this is a very broad based question, but I'd like to learn what others are doing to test systems.

Thx,
Jimmy

Forum Mgmnt · Post by **Forum Mgmnt** » Tue Apr 29, 2003 1:36 pm

I'm surprised you didn't get a response earlier. Perhaps it's because this question is so hard to answer with a short response.

I'll address one aspect of your question: curve fitting and over optimization.

There are two separate and distinct dangers here:

Curve-Fitting - this is optimization to the extent that you get results that you are very unlikely to achieve in the real world. Many people talk about curve fitting in terms of the number of degrees of freedom or parameters in the system. Some state that a system with more parameters is worse than a system with fewer parameters.

I don't agree with this. While there certainly are greater opportunities for curve-fitting with many parameters, and therefore a system with more parameters is more susceptible to curve fitting, it is very possible to have a system with quite a few parameters that are not curve fit. The Turtle System represents one of these systems. More rules can yield better systems.

I prefer to look at curve fitting by using my brain to figure out whether or not a parameter or rule is susceptible to curve fitting. If you look at the results of tests across a range of a given parameter, you can pretty easily determine whether the parameter is robust (and not susceptible to curve fitting), not robust, or somewhere in between. If the results across a range of values for a given parameter form a fairly smooth curve, this indicates a parameter that is robust and not curve fit. A curve that is spiky, or that moves quickly up and down, means there is a problem. I've found that parameters with spiky outputs across a range of values usually mean a bug in my programming logic.
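A minimal Python sketch of the one-parameter-at-a-time sweep Jimmy describes, combined with a crude spike check along the lines c.f. suggests. `backtest` is a placeholder for whatever function runs your system with one parameter changed and everything else held fixed; the neighbour-average test and the `tolerance` threshold are this sketch's own assumptions, not a standard measure of robustness.

```python
from typing import Callable, Dict, List


def sweep(values: List[int], backtest: Callable[[int], float]) -> Dict[int, float]:
    """Run the (hypothetical) backtest once per parameter value, all other inputs fixed."""
    return {v: backtest(v) for v in values}


def spikes(results: Dict[int, float], tolerance: float) -> List[int]:
    """Flag parameter values whose result differs from the average of the two
    adjacent values by more than `tolerance` (in the result's own units)."""
    keys = sorted(results)
    flagged = []
    for left, mid, right in zip(keys, keys[1:], keys[2:]):
        neighbour_avg = (results[left] + results[right]) / 2.0
        if abs(results[mid] - neighbour_avg) > tolerance:
            flagged.append(mid)
    return flagged
```

With `tolerance=1.0`, a smooth curve such as `20 - 0.1 * (n - 20) ** 2` over n = 10..30 produces no flags, while a curve that is 10 everywhere except 40 at n = 20 flags 19, 20 and 21. Per c.f.'s experience, a flagged value is a prompt to check the code for bugs before anything else.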

In general, the best way to get smooth curves is to have a lot of data and not have bugs in your logic. This means not optimizing for individual markets and using as much historical data as you have. About 20 years or so is pretty good for most systems.

Over-Optimization - I differentiate this from curve-fitting because this is more of a mental problem than a testing one. If you optimize even robust parameters to their best values for risk-adjusted returns, you are not likely to duplicate those risk-adjusted results in the future.

Assuming that the "best" values in the test will also be the "best" in actual trading is overconfident and demonstrates a lack of understanding of basic statistics. Your sample (the historical test), while reasonably representative of the future, is not an exact predictor of it.

If you want to understand what the most likely outcome of actual trading will be you are better off finding the average of the top portion of a curve for parameters rather than the maximum value of that curve.

So if you have a parameter that, across incrementing values, results in returns of 24%, 25%, 26%, 30%, 27%, and 26%, you are safer assuming results in the 26% to 27% range in actual trading, since that level will hold even if the future shifts the optimal value of the parameter in either direction.
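One way to turn that rule of thumb into code, as a Python sketch: locate the best raw result, then report the average and the worst of its immediate neighbours instead of the peak itself. Using one step on each side (`radius=1`) is an assumption; with the returns from the example above it gives 26.5% and 26%, matching the 26% to 27% range.

```python
from typing import List, Tuple


def neighbourhood_estimate(returns: List[float], radius: int = 1) -> Tuple[int, float, float]:
    """Return (index of the best raw result, mean of its neighbours,
    worst result in the window around it)."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    if len(returns) < 2:
        raise ValueError("need at least two results to have neighbours")
    best = max(range(len(returns)), key=lambda i: returns[i])
    lo = max(0, best - radius)
    hi = min(len(returns), best + radius + 1)
    # Neighbours exclude the peak itself; the window minimum includes it.
    neighbours = [returns[i] for i in range(lo, hi) if i != best]
    return best, sum(neighbours) / len(neighbours), min(returns[lo:hi])
```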

Well, that's a couple of subject areas. Your question is a very good one and I hope that others pick up on some of the other important issues to keep in mind when testing.

Jimmy · Post by **Jimmy** » Tue Apr 29, 2003 8:54 pm

Thanks, c.f.! I appreciate the thorough response. I realize it is a tough question and everyone has their own unique methods for testing systems. I just wanted to ensure that I wasn't grossly out of line when I am testing different systems.

Sir G · Post by **Sir G** » Tue Apr 29, 2003 10:03 pm

Hi Jimmy-

I’ll take a stab at it also… isn’t it great to engage with c.f. & read his posts? This is almost like having season tickets to the Chicago Bulls when MJ played, or watching Karl Malone & John Stockton play together. We are privileged.

How do I avoid over-optimizing and/or curve fitting a system? Is this step similar to conducting sensitivity analysis where you leave all components constant and vary one component at a time to see the effects of the change? Then, once you determine the "best" inputs for each component, you move onto step 3?

What I do is something along those lines. I always start with the concept, let’s call it a market map. Then my job is to break apart that map in as many ways as I can and test each part to see if it makes more than just theoretical sense.

The best way to deal with problems is to resolve them. I believe this bottom-up approach removes a lot of issues that might cause problems later on… the main issue I believe we as system designers have to avoid is a lack of robustness. This can be seen in over-optimizing or simply in short-changing our tests with limited data. Don’t think of it as a “system,” but think of it as parts of a whole. It is not a welfare system; every part must contribute to the greater good.

The best way around over-optimizing is throwing a literal ton of data at the system, across all markets and as far back as you can go. Another thing I personally like to see is trade activity: I want things to occur early and often. And I want a nice calmness to the results when looking at neighboring variables. Hint: You don’t just want the neighbors to behave, you want the community to behave!

I also test everything on its own merits; there are some creative ways to do this and they are easily conceived. Take an exit, for example. I like to test it not just during an open trade, but I’ll turn it around, make it an entry, and run it as such. If your system is only in the market 20% of the time and you code and run an exit on it, you are missing the chance to run it on the other 80% of the data. Among other things, exits are to prevent further losses, rotate that statement a bit, squint your eyes, and you will see that it is the inverted definition of the entry, in that entries are to encourage further gains. So flip the code and make it into an entry: what are the strengths? What are the weaknesses?
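A minimal Python sketch of the "flip the exit into an entry" test. The exit rule here (a long exit when the close drops below the lowest close of the prior `n` bars), the fixed holding period, and the function names are all hypothetical stand-ins; the point is only that the signal is evaluated on every bar of the data rather than only while a trade is open.

```python
from typing import List


def long_exit_fires(closes: List[float], i: int, n: int) -> bool:
    """Hypothetical exit rule: close falls below the lowest close of the prior n bars."""
    if i < n:
        return False
    return closes[i] < min(closes[i - n:i])


def exit_as_short_entry(closes: List[float], n: int, hold: int) -> List[float]:
    """Treat the long-exit signal as a short entry on every bar and record
    each trade's per-unit P&L after a fixed holding period."""
    if hold < 1:
        raise ValueError("hold must be at least 1 bar")
    pnls = []
    i = n
    while i + hold < len(closes):
        if long_exit_fires(closes, i, n):
            pnls.append(closes[i] - closes[i + hold])  # a short profits when price falls
            i += hold  # skip ahead so trades do not overlap
        else:
            i += 1
    return pnls
```

Running this across many markets and looking at whether the resulting P&Ls are positive on average is one way to judge whether the exit stands on its own.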

When I find an exit that actually makes money when applied as an entry across the universe of data, I will gladly employ that as a component to the system. Do this to each part of the system, but keep the standards high.

Once you have components that are sturdy, bring them together and then begin to buff out their brilliance. By looking at it this way, you shouldn’t have any problems in the future with a lack of confidence in the system or with losses that you didn’t see coming.

Or the one I love… the system stopped working! Yeah, it stopped working the moment you stopped testing it.

The object isn’t to just build a system, but to build something that will work & last.

You need to know how the pieces fit together and what the pieces are. Once you know that, it is rather easy.

Hope it helps.

Kiwi · Post by **Kiwi** » Tue Apr 29, 2003 10:05 pm

Jimmy,

The reason I (and I suspect others) wasn't brave enough to reply was that the question is so big. The best reply might be to suggest that you read Stridsman's book "Trading Systems That Work". Although I disagree with his conclusions about how to roll contracts, he does cover all of the material that you're interested in.

Other alternatives are in the reading list that Mark Johnson suggested.

John

damian · Post by **damian** » Tue Apr 29, 2003 11:09 pm

When I find an exit that actually makes money when applied as an entry across the universe of data, I will gladly employ that as a component to the system.

Now that makes sense. So much so that I appear to have never thought of it. Thanks SirG

damian

edward kim · Post by **edward kim** » Wed Apr 30, 2003 3:27 am

Hi Sir G,

"The best way around the over optimizing is throwing a literal ton of data at the system. Across all markets for as far as you can go. Another thing I personally like to see is activity of trades. I want things to occur early and often. And a nice calmness to the results when looking at neighboring variables. Hint: You just don’t want the neighbors to behave, you want the community to behave!"

What does "neighboring" mean in this sense?

Great post! I never thought of rule inversion, so I will try it! I wonder if the contrapositive should always be true in system testing?

Edward

blueberrycake · Post by **blueberrycake** » Wed Apr 30, 2003 3:44 am

eck wrote: Great post! I never thought of rule inversion, so I will try it! I wonder if the contrapositive should always be true in system testing?

I am rather sceptical on this point. While a good exit may be a fine entry, as evidenced by various stop and reverse systems, there is no compelling reason for it to be so.

You may exit a long position not just because you think the price will collapse, but because the trend is no longer there, or the risk/reward ratio is just not high enough to justify maintaining the position.

I guess what I am saying is that the end of a trend does not always imply a trend reversal, so just because an exit does not perform well as a reverse entry does not mean that the exit is wrong.

-bbc

bloom · Post by **bloom** » Wed Apr 30, 2003 5:36 am

I think most curve-fitting comes from filters and complicated exit rules.
It all comes back to the underlying assumptions you are making about the behavior of the market. Adding a filter is the same as assuming the market won't provide any opportunities when the filter's criteria are not met. I think this assumption by itself is very limiting, and the filter can easily be curve-fitted to the market. The same goes for exit rules: an exit rule for a trend-following system is essentially an assumption about when the trend will end.
Complicated exits with many parameters are often just curve-fitting to the size, duration, or choppiness of the trend. The end result is not letting profits run long enough or getting shaken out of a major trend.
All these rules and filters are inherently contradictory to the trend-following philosophy of never missing a move and letting your profits run.

I think it's just best to be an unassuming guy...

Jimmy · Post by **Jimmy** » Wed Apr 30, 2003 9:36 am

Incredible! Thanks to everyone for their comments! It's like c.f. opened the flood gates and everyone's great ideas just started gushing out...now, picture me as a sponge at the end of this flow of information absorbing as much as I can. I've read each response to my inquiry numerous times and I'm sure I will be referring to them as I continue with my testing.

Thanks again,
Jimmy

Sir G · Post by **Sir G** » Wed Apr 30, 2003 10:47 am

Hi BBC-

I am rather sceptical on this point. While a good exit may be a fine entry, as evidenced by various stop and reverse systems, there is no compelling reason for it to be so.

You may exit a long position not just because you think the price will collapse, but because the trend is no longer there, or the risk/reward ratio is just not high enough to justify maintaining the position.

I guess what I am saying is that the end of a trend does not always imply a trend reversal, so just because an exit does not perform well as a reverse entry does not mean that the exit is wrong.

Being skeptical can be a very good thing as it can encourage further understanding on everyone’s part.

I would say that I did prefix my statement with “Among other things” as in… “Among other things, exits are to prevent further losses, rotate that statement a bit, squint your eyes…” Exits do have different meanings for different people. The way I see and do things is not going to be right for everybody else.

From the moment my trading became profitable, I cannot think of one exit that I have ever employed or plan on employing that has not shown signs of profitability when unleashed from the confines of the entry. Doing so would run counter to my beliefs and my concept of common sense.

This logic was also employed in the Turtle Rules, about a decade prior to me using it.

- Sys1 Entry: 20-day breakout.
- Sys2 Entry: 55-day breakout.
- Sys1 Exit: 10-day breakout.
- Sys2 Exit: 20-day breakout.

Testing the exit in this way is not in any way an attempt to tap into the stop-and-reverse mode of trading. It is simply used to see whether that component of the system is capable of doing what it is designed to do.
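As an illustration, here is a Python sketch of a channel breakout check using the lookbacks listed above. The dictionary layout, the use of bar highs and lows, and the function names are assumptions for this sketch; the Turtle rules contain more detail (for example, which side of the channel each exit uses for longs versus shorts) than is modelled here. Because the Sys1 exit is itself a 10-day breakout, the same `breakout` function can run it as a stand-alone entry over every bar.

```python
from typing import Dict, List

# Lookbacks in trading days, as listed in the post above.
TURTLE_LOOKBACKS: Dict[str, Dict[str, int]] = {
    "Sys1": {"entry": 20, "exit": 10},
    "Sys2": {"entry": 55, "exit": 20},
}


def breakout(highs: List[float], lows: List[float], i: int, n: int) -> int:
    """+1 if bar i trades above the highest high of the prior n bars,
    -1 if it trades below the lowest low of the prior n bars, otherwise 0.
    (An outside bar that does both is reported as +1 in this sketch.)"""
    if i < n:
        return 0
    if highs[i] > max(highs[i - n:i]):
        return 1
    if lows[i] < min(lows[i - n:i]):
        return -1
    return 0


def exit_run_as_entry(highs: List[float], lows: List[float], system: str) -> List[int]:
    """Apply a system's exit lookback as an entry signal on every bar of the data."""
    n = TURTLE_LOOKBACKS[system]["exit"]
    return [breakout(highs, lows, i, n) for i in range(len(highs))]
```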

I hope anything I say here is taken with a grain of salt. Everyone should be skeptical, one needs to beat this stuff around for themselves and find its strengths and weaknesses. And if you are inclined please report back your findings as I enjoy viewing things from different angles and I’ll shift my paradigms very quickly to something better!

Sir G

Sir G · Post by **Sir G** » Wed Apr 30, 2003 10:55 am

Hi eck-

What does "neighboring" mean in this sense?

Let’s imagine the Turtle Rules Entry1, which historically was set at a 20-day breakout. Let’s say you test it as a variable from 10 to 30 in increments of 1. You will have 21 tests: 10, 11, 12, and so on up to 30. You want to see the neighbors and the neighborhood being profitable. If the value 20 is the only one that made money, or accounted for 80% of the overall profits of all 21 tests, you have an anomaly and you should pass. Because you have 21 tests, each value would represent 4.76% (1/21) of the total if profits were spread evenly; when you see one value representing 80% of the total, that is what c.f. called a spike, and you should pass on it, no matter how good it might look and feel to you.
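Sir G's share-of-profits test can be sketched in Python as follows. Measuring each value's share against the sum of the profitable runs only, and the 80% default threshold, are assumptions chosen to mirror the example above; with 21 runs an even spread gives each value about 4.76%.

```python
from typing import Dict, List


def neighbourhood_warnings(profits: Dict[int, float], max_share: float = 0.8) -> List[str]:
    """Return warnings for a one-parameter sweep; an empty list means no obvious spike."""
    profitable = [k for k, v in profits.items() if v > 0]
    if not profitable:
        return ["no parameter value made money"]
    warnings = []
    if len(profitable) == 1:
        warnings.append(f"only {profitable[0]} made money")
    # Share of gross profit (profitable runs only) held by each value.
    gross = sum(profits[k] for k in profitable)
    even_share = 1.0 / len(profits)
    for k in sorted(profitable):
        share = profits[k] / gross
        if share > max_share:
            warnings.append(f"{k} holds {share:.0%} of profits (an even share is {even_share:.2%})")
    return warnings
```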

A more visual way of seeing this: have you ever seen 3D landscape or surface maps? They look like a mountain range; the peaks and valleys are your data points ebbing and flowing. What you want to see is not a mountain of sharp peaks but a nice, short, stubby mountain. That means the “peak” is surrounded by other values that are almost as high.
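For two parameters tested together, the "short stubby mountain" idea can be sketched in Python by scoring each cell of the result grid by the average of its 3x3 neighbourhood and choosing the best-scoring cell rather than the highest single point. The 3x3 window and the choice to skip edge cells (they lack a full set of neighbours) are assumptions of this sketch.

```python
from typing import List, Tuple

Grid = List[List[float]]


def window_mean(grid: Grid, r: int, c: int) -> float:
    """Mean of the 3x3 block centred on (r, c)."""
    cells = [grid[i][j] for i in range(r - 1, r + 2) for j in range(c - 1, c + 2)]
    return sum(cells) / len(cells)


def best_plateau(grid: Grid) -> Tuple[int, int]:
    """Interior cell whose 3x3 neighbourhood has the highest average result.
    Assumes the grid is at least 3x3; edge cells are skipped."""
    rows, cols = len(grid), len(grid[0])
    interior = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    return max(interior, key=lambda rc: window_mean(grid, rc[0], rc[1]))
```

On a grid with an isolated 100 surrounded by zeros and a separate block of values around 30, this picks the centre of the 30s block rather than the isolated peak.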

Great post! I never thought of rule inversion, so I will try it! I wonder if the contrapositive should always be true in system testing?

After you try it out and if you think it's a worthwhile endeavor, start a thread on it with your findings and let's explore the issue. Let's see how far we can take it!

Sir G

tomb · Post by **tomb** » Wed Apr 30, 2003 4:06 pm

Hi Kiwi,

Although I disagree with his conclusions about how to roll contracts, he does cover all of the material that you're interested in.

Can you elaborate more specifically about your disagreement?

Are you talking about his preference for proportional back adjusting?

Thanks,
Tom B

Kiwi · Post by **Kiwi** » Wed Apr 30, 2003 7:54 pm

I read the book when it first came out and can't remember all the details, but he makes a big thing of using "proportional back adjusting". His reasoning was along the lines that if you adjust additively, then the proportions of events like the 1987 stock market crash are wrong. So he proposes adjusting multiplicatively instead.
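A simplified two-contract Python sketch of the difference being described. It assumes `old[-1]` and `new[0]` are the two contracts' prices on the same roll date and that only one roll is involved; real continuous series chain many rolls. Additive adjustment shifts older prices by the roll gap, which distorts percentage moves such as a crash, while proportional adjustment scales them by the price ratio and keeps percentage moves intact.

```python
from typing import List


def back_adjust(old: List[float], new: List[float], method: str) -> List[float]:
    """Splice an expiring contract onto the next one, adjusting the older history."""
    if method == "additive":
        gap = new[0] - old[-1]
        adjusted = [p + gap for p in old]
    elif method == "proportional":
        ratio = new[0] / old[-1]
        adjusted = [p * ratio for p in old]
    else:
        raise ValueError("method must be 'additive' or 'proportional'")
    # The adjusted roll-date price equals new[0], so keep only one copy of it.
    return adjusted[:-1] + new
```

With `old = [100.0, 50.0, 60.0]` and `new = [120.0, 125.0]`, the 50% drop in the old contract becomes a drop of about 31% (160 to 110) after additive adjustment, but stays at 50% (200 to 100) after proportional adjustment.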

I rolled 20 proportional contracts and tested all my systems on proportional instead of conventionally back-adjusted contracts. Most of them performed better. I spent a lot of time checking the actual trades, and the conclusion I came to (not with scientific rigor) was that his approach didn't increase the accuracy of how you tested the systems, because the systems were entering and exiting on one of:
- a breakout
- a large volatility increase proportional to recent days
- acceleration, which wasn't affected by proportional adjusting

So my disagreement wasn't that Thomas was particularly wrong. It was that his "new thing" was unnecessary and that it was less conservative than the old thing. It also meant that you would have to change your robobrokers over, with a resulting higher risk of error. I think there were some other small issues, but I felt that his overall thrust was well worth reading.