I'd like to share some thoughts on the concern over data
vendors whose data contains errors. I do not know how bad some of that data is,
but in general, I believe the following principles apply.
For historical testing, although using perfectly correct
data might be ideal for perfect optimization, I do not believe historical data
needs to be this good to develop a dependable and profitable trading system for
real-world use. Assuming the errors in data are relatively infrequent and not
obviously absurd, there should be little concern in using this kind of data for
system development and testing.
Since markets tend to look "continuous" on charts,
with only occasional discontinuities that still make "common sense," you can
generally spot data that is grossly in error and make estimated corrections to
it. Errors of a few ticks on either the high or the low of a
daily range (or other sampling period) may be considered "noise."
Errors in opens or closes that still fall within the high-low range may not be
detectable by "common sense," so this kind of data "noise" may be of
most concern. However, although market behaviors do tend to repeat, rarely, if
ever, do patterns of behavior duplicate themselves tick-for-tick.
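
As a rough illustration of the "common-sense" screening described above, here
is a minimal Python sketch. The function name, the tuple layout, and the
jump_multiple threshold are my own assumptions for illustration, not any
vendor's method. It flags only gross errors: bars that are internally
inconsistent, absurdly wide, or that close absurdly far from the prior close.
Errors of a few ticks, and bad opens or closes that stay inside the high-low
range, pass through untouched, which is the point made above.

```python
from statistics import median


def flag_suspect_bars(bars, jump_multiple=5.0):
    """Return indexes of bars that fail simple common-sense checks.

    bars: list of (open, high, low, close) tuples, oldest first.
    jump_multiple: hypothetical threshold, in multiples of the median
    bar range, beyond which a range or close-to-close move is suspect.
    """
    if not bars:
        return []
    typical = median(h - l for _, h, l, _ in bars)
    suspects = []
    for i, (o, h, l, c) in enumerate(bars):
        # A high below the low, or an open/close outside the range,
        # cannot be a real bar.
        inconsistent = h < l or not (l <= o <= h) or not (l <= c <= h)
        # A range many times the typical range suggests a bad high or low.
        too_wide = typical > 0 and (h - l) > jump_multiple * typical
        # A close far from the prior close suggests a bad close.
        jumped = (
            i > 0
            and typical > 0
            and abs(c - bars[i - 1][3]) > jump_multiple * typical
        )
        if inconsistent or too_wide or jumped:
            suspects.append(i)
    return suspects
```

Note that a single bad close will usually flag two bars (the jump in and the
jump back out), and a genuine limit move or gap will be flagged as well, so
anything this reports still needs a look by eye before it is corrected.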
Therefore, to assume that a trading-decision strategy must
be based on high-precision tick-range patterns or indicators is asking for
trouble -- this would be indicative of over-optimization. Shallow-sensitivity
"robust" optimization, in my opinion, is quite desirable, but
steep-sensitivity optimization is likely to be disastrous. (Here,
"sensitivity" refers to the change in simulation results as the
characteristics of a market change over time, and shallow/steep refer to
the abruptness of that change.)
My presumption also takes into account the total number of
trades that a strategy may generate over its useful lifetime. Fewer trades
imply longer trend durations and larger moves, so the "noise" relative
to the magnitudes of the moves will be small and insignificant. As
the number of trades increases for a given lifetime, the trend durations
shorten, the moves generally become much smaller, and the relative "noise"
becomes more significant.
However, assuming that a sufficient number of trades is
generated both in historical simulation and in real trading so that,
statistically, no single trade dominates the overall results, a robust strategy
that produces consistent "small advantages" (like the casino examples
in recent CTCN issues) will, by design, be inherently "noise immune."
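
One simple way to check whether a single trade dominates the overall results
is to see what is left when the best trade is taken away. The sketch below is
only an illustration under my own assumptions: the function name and the
fields it returns are hypothetical, and trade_pnls is simply a list of
per-trade profits and losses from a simulation.

```python
def dominance_report(trade_pnls):
    """Summarize how much of the net result rests on the single best trade."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    total = sum(trade_pnls)
    best = max(trade_pnls)
    return {
        "trades": len(trade_pnls),
        "total": total,
        "best": best,
        # What the strategy would have made without its best trade.
        "total_without_best": total - best,
        # Share of net profit owed to the best trade (None if no net profit).
        "best_share": best / total if total > 0 else None,
    }
```

If total_without_best collapses toward zero or goes negative, or best_share
is close to 1, the results rest on one trade rather than on a consistent small
advantage, and a data error in that one trade could change the whole picture.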
Bottom line: So what if the historical data is somewhat in
error -- the future is likely to produce data that differs somewhat from the
past anyway, so a profitable trading strategy for any given market should be
tolerant of some reasonable variation in data, whether that data be historical
or yet-to-occur as the future develops. A "good" trading system
should be reasonably "noise" immune, and data that is somewhat
"noisy" can be quite adequate for trading strategy development
purposes.
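
To make the idea of "noise immunity" concrete, one could deliberately add a
few ticks of random error to the historical closes and re-run the simulation
many times to see how far the results move. The Python sketch below does this
under stated assumptions: the moving-average rule is only a toy stand-in for
whatever strategy is being tested, and the tick size, the maximum error of two
ticks, the number of trials, and all function names are hypothetical choices
of mine.

```python
import random


def sma(values, n, end):
    """Simple average of the n values ending at index end (inclusive)."""
    return sum(values[end - n + 1 : end + 1]) / n


def toy_net_profit(closes, fast=3, slow=8):
    """Toy stand-in strategy: long for the next bar when fast SMA > slow SMA."""
    profit = 0.0
    for i in range(slow - 1, len(closes) - 1):
        if sma(closes, fast, i) > sma(closes, slow, i):
            profit += closes[i + 1] - closes[i]
    return profit


def perturb(closes, tick, max_ticks, rng):
    """Add a random error of up to max_ticks ticks, up or down, to each close."""
    return [c + rng.randint(-max_ticks, max_ticks) * tick for c in closes]


def noise_trials(closes, tick=0.25, max_ticks=2, trials=200, seed=1):
    """Return (baseline, worst, best) net profit across noisy re-runs."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    baseline = toy_net_profit(closes)
    results = [
        toy_net_profit(perturb(closes, tick, max_ticks, rng))
        for _ in range(trials)
    ]
    return baseline, min(results), max(results)
```

If the worst and best results stay close to the baseline, the strategy is
shallow in its sensitivity to data noise. If a couple of ticks of error can
turn a winner into a loser, that is the steep sensitivity to worry about.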
Having said all this, would I, or could I, trust
potentially flaky recent data to create real-time trading orders, for either
day trades or position trades? If I did not want to take time to look over the
data for obvious gross errors before mechanically (blindly) generating trading
orders, using unreliable data for this purpose could well result in some very
expensive losing trades. (There could also be some serendipitous profitable
trades, but I wouldn't hold my breath!) So, in this context, having reliably
accurate data is imperative, and I would definitely want to use a vendor whose
data I could trust.
An understanding of the strengths, weaknesses, and underlying
design of one's trading strategy, coupled with the emotional considerations of
trust, confidence, and belief in that trading model, would dictate one's comfort
level in using data that could have sporadic errors. Even if I
were willing to take time to carefully examine all data for
"common-sense" correctness, I might not be too comfortable using data
that would require my constant vigilance, even though my "noise
immune" trading strategy would probably produce reliably profitable
results over the longer term. Bottom line: In real-time trading, for
peace of mind, get the most reliable data available.