Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Emerging Patterns (EPs) are itemsets (characteristics) whose supports change significantly from one dataset to another. They have long been used to capture multi-attribute contrasts between data classes or trends over time. A study carried out in this work shows that Emerging Patterns, as formulated to date, have several deficiencies and limitations when applied to classification problems. Several approaches based on this earlier, deficient formulation have been proposed in the literature, and they show very high predictive power despite these limitations. They range from classifiers built directly on Emerging Patterns to instance-weighting schemes for weighted Support Vector Machines. In this work, a new formulation of Emerging Patterns, aimed entirely at classification problems, is proposed. A new classifier and a new instance-weighting scheme are also built on the novel formulation, in order to demonstrate its advantages over the previous formulation in handling classification problems. An empirical study on benchmark datasets from the UCI Machine Learning Repository shows that the proposed classifier is superior to other state-of-the-art classification methods such as C4.5 and Naive Bayes. It is also shown to be superior, in terms of overall predictive accuracy, to all of the most representative EP-based classifiers built on the previous formulation, on almost all of the datasets used. The proposed instance-weighting scheme has also been compared empirically with previous related work and outperforms it in most cases.
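The definition of an Emerging Pattern given above can be made concrete with a minimal sketch. It assumes the growth-rate measure commonly used in the earlier EP literature (the ratio of an itemset's supports in two datasets), not the new formulation proposed in this thesis. The function names and the toy attribute=value data are hypothetical and serve only as illustration.

```python
from math import inf


def support(itemset, dataset):
    """Fraction of instances in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    hits = sum(1 for instance in dataset if itemset <= instance)
    return hits / len(dataset)


def growth_rate(itemset, background, target):
    """Ratio of the support in `target` to the support in `background`.

    Assumed conventions: 0/0 is treated as 0, and a pattern that is
    absent from `background` but present in `target` gets an infinite rate.
    """
    s_background = support(itemset, background)
    s_target = support(itemset, target)
    if s_background == 0.0:
        return 0.0 if s_target == 0.0 else inf
    return s_target / s_background


# Hypothetical toy datasets: each instance is a set of attribute=value items.
edible = [
    frozenset({"odor=none", "cap=flat"}),
    frozenset({"odor=none", "cap=bell"}),
    frozenset({"odor=almond", "cap=flat"}),
    frozenset({"odor=none", "cap=flat"}),
]
poisonous = [
    frozenset({"odor=foul", "cap=flat"}),
    frozenset({"odor=foul", "cap=bell"}),
    frozenset({"odor=none", "cap=flat"}),
    frozenset({"odor=foul", "cap=flat"}),
]

# Support rises from 0.25 (poisonous) to 0.75 (edible).
rate_none = growth_rate(frozenset({"odor=none"}), poisonous, edible)

# Never seen among edible instances, so the rate towards poisonous is infinite.
rate_foul = growth_rate(frozenset({"odor=foul"}), edible, poisonous)

# Equally frequent in both classes, so it carries no contrast.
rate_flat = growth_rate(frozenset({"cap=flat"}), poisonous, edible)
```

Under this assumed measure, an itemset counts as an Emerging Pattern when its growth rate exceeds a user-chosen threshold; `odor=none` and `odor=foul` would qualify for any threshold below 3, while `cap=flat` would not.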
In addition to these empirical studies of the predictive power of the new formulation, a second set of studies has been carried out to show other interesting features of the novel formulation, such as the number of patterns required for adequate classification and the robustness of the classifier built on the new version of Emerging Patterns. Alongside overall predictive accuracy, these features can be decisive when selecting a classifier for a specific problem, because practical settings often impose constraints such as limited computational power or a bound on the time available to classify a test instance. The increase in overall predictive accuracy, together with the other results, can be regarded as clear evidence of the advantages of the novel formulation for classification problems. Finally, some possible directions for future work based on the proposed formulation of Emerging Patterns are outlined in some detail. If successful, they could be of great importance and could provide new tools for handling classification problems.