Hard problems force innovative approaches and attention to detail, and their exploration often contributes beyond the area initially attempted. This thesis investigates the data mining process that results in a predictor for numerical series. The series experimented with come from financial data, which are usually hard to forecast.
One approach to prediction is to spot patterns in the past, when we already know what followed them, and to test on more recent data. If a pattern is followed by the same outcome frequently enough, we can gain confidence that it is a genuine relationship.
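The idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the series is reduced to up/down moves and that a pattern is the last `k` moves. The function names and the `min_support` and `min_confidence` thresholds are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict

def to_moves(series):
    """Encode a numerical series as 'U' (rise) or 'D' (fall or no change)."""
    return ["U" if b > a else "D" for a, b in zip(series, series[1:])]

def count_outcomes(moves, k):
    """For every pattern of k consecutive moves, count which move followed it."""
    table = defaultdict(Counter)
    for i in range(len(moves) - k):
        table[tuple(moves[i:i + k])][moves[i + k]] += 1
    return table

def predict(table, pattern, min_support=3, min_confidence=0.7):
    """Return the most frequent follower of the pattern, or None if the
    pattern was seen too rarely or its outcome was not consistent enough."""
    counts = table.get(tuple(pattern))
    if not counts:
        return None
    outcome, hits = counts.most_common(1)[0]
    total = sum(counts.values())
    if total >= min_support and hits / total >= min_confidence:
        return outcome
    return None

def holdout_accuracy(series, k=2, n_test=6, **thresholds):
    """Learn pattern outcomes on the older moves, then test on the n_test
    most recent moves. Returns (correct predictions, predictions made)."""
    moves = to_moves(series)
    cut = max(len(moves) - n_test, k)
    table = count_outcomes(moves[:cut], k)
    hits = tries = 0
    for i in range(cut, len(moves)):
        guess = predict(table, moves[i - k:i], **thresholds)
        if guess is not None:
            tries += 1
            hits += guess == moves[i]
    return hits, tries
```

Abstaining when support or confidence is low reflects the requirement that a pattern be followed by the same outcome "frequently enough" before it is trusted.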
Because this approach assumes no special knowledge about the regularities or their form, the method is quite general and applicable to other time series, not just financial ones. However, this generality puts strong demands on pattern detection, which must notice regularities in any of the many possible forms.
The thesis' quest for automated pattern-spotting involves numerous data mining and optimization techniques: neural networks, decision trees, nearest neighbors, regression, genetic algorithms, and others. A comparison of their performance on stock exchange index data is one of the contributions.
As no single technique performed sufficiently well, a number of predictors have been put together, forming a voting ensemble. The vote is diversified not only by different training data, as is usually done, but also by the learning method and its parameters. An approach to speeding up predictor fine-tuning is also proposed.
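A voting ensemble diversified along these three axes might be sketched as follows. The two toy learners (a 1-D nearest-neighbour rule and a single-threshold rule), the bootstrap resampling, and all names are assumptions chosen for brevity. They are stand-ins, not the predictors used in the thesis.

```python
import random
from collections import Counter

def knn_learner(k):
    """Trainer for a k-nearest-neighbour classifier on 1-D inputs."""
    def train(data):
        def predict(x):
            nearest = sorted(data, key=lambda xy: abs(xy[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return train

def stump_learner(threshold):
    """Trainer for a one-split rule: majority label on each side of threshold."""
    def train(data):
        overall = Counter(y for _, y in data).most_common(1)[0][0]
        sides = {}
        for side in (False, True):
            labels = [y for x, y in data if (x > threshold) == side]
            # Fall back to the overall majority if one side has no samples.
            sides[side] = Counter(labels).most_common(1)[0][0] if labels else overall
        return lambda x: sides[x > threshold]
    return train

def build_ensemble(data, learners, samples_per_learner=3, seed=0):
    """Train each learner (method plus parameter) on several bootstrap samples."""
    rng = random.Random(seed)
    members = []
    for train in learners:
        for _ in range(samples_per_learner):
            sample = [rng.choice(data) for _ in data]  # resample with replacement
            members.append(train(sample))
    return members

def vote(members, x):
    """Majority vote over all ensemble members."""
    return Counter(member(x) for member in members).most_common(1)[0][0]
```

For example, `build_ensemble(data, [knn_learner(1), knn_learner(3), stump_learner(10)])` varies the method (nearest neighbour versus threshold rule), the parameter (`k`), and the training sample of each member.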
The algorithm development goes still further: a prediction can only be as good as the training data, hence the need for good data preprocessing. In particular, new multivariate discretization and attribute selection algorithms are presented.
The thesis also includes overviews of prediction pitfalls and possible solutions, as well as of ensemble-building for series data with financial characteristics, such as noise and many attributes.
The Ph.D. thesis consists of an extended background onfinancial prediction, 7 papers, and 2 appendices.
Kista: Data- och systemvetenskap, 2003. viii, 44 p.