kth.se Publications
1 - 50 of 151
  • 1.
    Ahlberg, Ernst
    et al.
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Winiwarter, Susanne
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Sweden.
    Löfström, Tuve
    Jönköping University, JTH, Research Environment Computer Science and Informatics.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Johansson, Ulf
    Jönköping University, JTH, Computer Science and Informatics.
    Engkvist, Ola
    External Sciences, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Hammar, Oscar
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Bendtsen, Claus
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Cambridge, UK.
    Carlsson, Lars
    Quantitative Biology, Discovery Sciences, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Using conformal prediction to prioritize compound synthesis in drug discovery (2017). In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, 2017, p. 174-184. Conference paper (Refereed)
    Abstract [en]

    The choice of how much money and resources to spend to understand certain problems is of high interest in many areas. This work illustrates how computational models can be more tightly coupled with experiments to generate decision data at lower cost without reducing the quality of the decision. Several different strategies are explored to illustrate the trade-off between lowering costs and quality in decisions.

    AUC is used as a performance metric and the number of objects that can be learnt from is constrained. Some of the strategies described reach AUC values over 0.9 and outperform more random strategies. The strategies that use conformal predictor p-values show varying results, although some are top performing.

    The application studied is taken from the drug discovery process. In the early stages of this process, compounds that could potentially become marketed drugs are routinely tested in experimental assays to understand their distribution and interactions in humans.

    Download full text (pdf)
  • 2.
    Aler, Ricardo
    et al.
    Univ Carlos III Madrid, Avda Univ 30, Leganes 28911, Spain.
    Valls, Jose M.
    Univ Carlos III Madrid, Avda Univ 30, Leganes 28911, Spain.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Study of Hellinger Distance as a splitting metric for Random Forests in balanced and imbalanced classification datasets (2020). In: Expert Systems with Applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 149, article id 113264. Article in journal (Refereed)
    Abstract [en]

    Hellinger Distance (HD) is a splitting metric that has been shown to have excellent performance for imbalanced classification problems for methods based on Bagging of trees, while also showing good performance for balanced problems. Given that Random Forests (RF) use Bagging as one of two fundamental techniques to create diversity in the ensemble, it could be expected that HD is also effective for this ensemble method. The main aim of this article is to carry out an extensive investigation of important aspects of the use of HD in RF, including handling of multi-class problems, hyper-parameter optimization, metrics comparison, probability estimation, and metrics combination. In particular, HD is compared to other commonly used splitting metrics (Gini and Gain Ratio) in several contexts: balanced/imbalanced and two-class/multi-class. Two aspects related to classification problems are assessed: classification itself and probability estimation. HD is defined for two-class problems, but there are several ways in which it can be extended to deal with multi-class problems, and this article studies the performance of the available options. Finally, even though HD can be used as an alternative to other splitting metrics, there is no reason to limit RF to use just one of them. Therefore, the final study of this article is to determine whether selecting the splitting metric using cross-validation on the training data can improve results further. Results show HD to be a robust measure for RF, with some weakness for balanced multi-class datasets (especially for probability estimation). Combination of metrics is able to result in a more robust performance. However, experiments of HD with text datasets show Gini to be more suitable than HD for this kind of problem.
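
For reference, the two-class Hellinger distance between the class-conditional distributions induced by a binary split can be sketched as follows; this is a minimal illustration of the criterion, not the paper's implementation, and the function name and binary-split restriction are assumptions:

```python
import numpy as np

def hellinger_split_value(y_left, y_right, pos=1):
    """Two-class Hellinger distance between the class-conditional
    distributions induced by a binary split; larger values indicate
    a better separation of the classes."""
    y_all = np.concatenate([y_left, y_right])
    n_pos = np.sum(y_all == pos)
    n_neg = np.sum(y_all != pos)
    hd2 = 0.0
    for branch in (y_left, y_right):
        p_branch_pos = np.sum(branch == pos) / n_pos  # P(branch | positive)
        p_branch_neg = np.sum(branch != pos) / n_neg  # P(branch | negative)
        hd2 += (np.sqrt(p_branch_pos) - np.sqrt(p_branch_neg)) ** 2
    return np.sqrt(hd2)
```

How to extend this two-class definition to multi-class problems is precisely one of the questions the article studies.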

  • 3.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Ennadir, Sofiane
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Johansson, Ulf
    Dept. of Computing, Jönköping University, Sweden.
    Approximating Score-based Explanation Techniques Using Conformal Regression (2023). In: Proceedings of the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023, ML Research Press, 2023, p. 450-469. Conference paper (Refereed)
    Abstract [en]

    Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) the predicted intervals are.

  • 4.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Johansson, Ulf
    Dept. of Computing, Jönköping University, Sweden.
    Assessing Explanation Quality by Venn Prediction (2022). In: Proceedings of the 11th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2022, ML Research Press, 2022, p. 42-54. Conference paper (Refereed)
    Abstract [en]

    Rules output by explainable machine learning techniques naturally come with a degree of uncertainty, as the complex functionality of the underlying black-box model can often be difficult to approximate by a single, interpretable rule. However, the uncertainty of these approximations is not properly quantified by current explanatory techniques. The use of Venn prediction is here proposed and investigated as a means to quantify the uncertainty of the explanations and thereby also allow for competing explanation techniques to be evaluated with respect to their relative uncertainty. A number of metrics of rule explanation quality based on uncertainty are proposed and discussed, including metrics that capture the tendency of the explanations to predict the correct outcome of a black-box model on new instances, how informative (tight) the produced intervals are, and how certain a rule is when predicting one class. An empirical investigation is presented, in which explanations produced by the state-of-the-art technique Anchors are compared to explanatory rules obtained from association rule mining. The results suggest that the association rule mining approach may provide explanations with less uncertainty towards the correct label, as predicted by the black-box model, compared to Anchors. The results also show that the explanatory rules obtained through association rule mining result in tighter intervals and are closer to either one or zero compared to Anchors, i.e., they are more certain towards a specific class label.

  • 5.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vazirgiannis, Michalis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Explaining Predictions by Characteristic Rules (2023). In: Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2022, Part I / [ed] Amini, M.-R., Canu, S., Fischer, A., Guns, T., Novak, P. K., Tsoumakas, G., Springer Nature, 2023, Vol. 13713, p. 389-403. Conference paper (Refereed)
    Abstract [en]

    Characteristic rules have been advocated for their ability to improve interpretability over discriminative rules within the area of rule learning. However, the former type of rule has not yet been used by techniques for explaining predictions. A novel explanation technique, called CEGA (Characteristic Explanatory General Association rules), is proposed, which employs association rule mining to aggregate multiple explanations generated by any standard local explanation technique into a set of characteristic rules. An empirical investigation is presented, in which CEGA is compared to two state-of-the-art methods, Anchors and GLocalX, for producing local and aggregated explanations in the form of discriminative rules. The results suggest that the proposed approach provides a better trade-off between fidelity and complexity compared to the two state-of-the-art approaches; CEGA and Anchors significantly outperform GLocalX with respect to fidelity, while CEGA and GLocalX significantly outperform Anchors with respect to the number of generated rules. The effects of changing the format of CEGA's explanations to discriminative rules and of using LIME and SHAP as local explanation techniques instead of Anchors are also investigated. The results show that the characteristic explanatory rules still compete favorably with rules in the standard discriminative format. The results also indicate that using CEGA in combination with either SHAP or Anchors consistently leads to a higher fidelity compared to using LIME as the local explanation technique.

  • 6.
    Asker, Lars
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Karlsson, Isak
    Stockholm University, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Department of Computer and Systems Sciences.
    Zhao, Jing
    Stockholm University, Department of Computer and Systems Sciences.
    Mining Candidates for Adverse Drug Interactions in Electronic Patient Records (2014). In: PETRA '14: Proceedings of the 7th International Conference on Pervasive Technologies Related to Assistive Environments, New York: ACM Press, 2014, article id 22. Conference paper (Refereed)
    Abstract [en]

    Electronic patient records provide a valuable source of information for detecting adverse drug events. In this paper, we explore two different but complementary approaches to extracting useful information from electronic patient records with the goal of identifying candidate drugs, or combinations of drugs, to be further investigated for suspected adverse drug events. We propose a novel filter-and-refine approach that combines sequential pattern mining and disproportionality analysis. The proposed method is expected to identify groups of possibly interacting drugs suspected for causing certain adverse drug events. We perform an empirical investigation of the proposed method using a subset of the Stockholm electronic patient record corpus. The data used in this study consists of all diagnoses and medications for a group of patients diagnosed with at least one heart-related diagnosis during the period 2008-2010. The study shows that the method is indeed able to detect combinations of drugs that occur more frequently for patients with cardiovascular diseases than for patients in a control group, providing opportunities for finding candidate drugs that cause adverse drug effects through interaction.

  • 7.
    Asker, Lars
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Department of Computer and Systems Sciences.
    Persson, Hans
    Identifying Factors for the Effectiveness of Treatment of Heart Failure: A Registry Study (2016). In: IEEE 29th International Symposium on Computer-Based Medical Systems: CBMS 2016, IEEE Computer Society, 2016. Conference paper (Refereed)
    Abstract [en]

    An administrative health register containing health care data for over 2 million patients will be used to search for factors that can affect the treatment of heart failure. In the study, we will measure the effects of employed treatment for various groups of heart failure patients, using different measures of effectiveness. Significant deviations in effectiveness of treatments of the various patient groups will be reported and factors that may help explain the effect of treatment will be analyzed. Identification of the most important factors that may help explain the observed deviations between the different groups will be derived through generation of predictive models, for which variable importance can be calculated. The findings may affect recommended treatments as well as highlight deviations from national guidelines.

  • 8.
    Asker, Lars
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Learning from Swedish Healthcare Data (2016). In: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2016, Vol. 29, article id 47. Conference paper (Refereed)
    Abstract [en]

    We present two ongoing projects aimed at learning from health care records. The first project, DADEL, focuses on high-performance data mining for detecting adverse drug events in healthcare, and uses electronic patient records covering seven years of data from the Stockholm region in Sweden. The second project focuses on heart failure and on understanding the differences in treatment between various groups of patients. It uses a Swedish administrative health register containing health care data for over two million patients.

  • 9.
    Boström, Henrik
    University of Skövde, Department of Communication and Information.
    Calibrating Random Forests (2008). In: Proceedings of the Seventh International Conference on Machine Learning and Applications (ICMLA'08), IEEE Computer Society, 2008, p. 121-126, article id 4724964. Conference paper (Refereed)
    Abstract [en]

     When using the output of classifiers to calculate the expected utility of different alternatives in decision situations, the correctness of predicted class probabilities may be of crucial importance. However, even very accurate classifiers may output class probabilities of rather poor quality. One way of overcoming this problem is by means of calibration, i.e., mapping the original class probabilities to more accurate ones. Previous studies have however indicated that random forests are difficult to calibrate by standard calibration methods. In this work, a novel calibration method is introduced, which is based on a recent finding that probabilities predicted by forests of classification trees have a lower squared error compared to those predicted by forests of probability estimation trees (PETs). The novel calibration method is compared to the two standard methods, Platt scaling and isotonic regression, on 34 datasets from the UCI repository. The experiment shows that random forests of PETs calibrated by the novel method significantly outperform uncalibrated random forests of both PETs and classification trees, as well as random forests calibrated with the two standard methods, with respect to the squared error of predicted class probabilities.
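
The paper's novel PET-based calibration method is not available off the shelf, but the two standard baselines it is compared against, Platt scaling and isotonic regression, can be sketched with scikit-learn roughly as follows (the dataset and hyperparameters are placeholders, not from the paper):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for method in ("sigmoid", "isotonic"):  # Platt scaling / isotonic regression
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        method=method, cv=5)  # calibration on internal held-out folds
    calibrated.fit(X, y)
    probabilities = calibrated.predict_proba(X[:5])  # calibrated class probabilities
```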

  • 10.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Concurrent Learning of Large-Scale Random Forests (2011). In: Scandinavian Conference on Artificial Intelligence, IOS Press, 2011. Conference paper (Refereed)
    Abstract [en]

    The random forest algorithm belongs to the class of ensemble learning methods that are embarrassingly parallel, i.e., the learning task can be straightforwardly divided into subtasks that can be solved independently by concurrent processes. A parallel version of the random forest algorithm has been implemented in Erlang, a concurrent programming language originally developed for telecommunication applications. The implementation can be used for generating very large forests, or handling very large datasets, in a reasonable time frame. This allows for investigating potential gains in predictive performance from generating large-scale forests. An empirical investigation on 34 datasets from the UCI repository shows that forests of 1000 trees significantly outperform forests of 100 trees with respect to accuracy, area under ROC curve (AUC) and Brier score. However, increasing the forest sizes to 10 000 or 100 000 trees does not give any further significant performance gains.

    Download full text (pdf)
  • 11.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    crepes: a Python Package for Generating Conformal Regressors and Predictive Systems (2022). In: Proceedings of the 11th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2022, ML Research Press, 2022, p. 24-41. Conference paper (Refereed)
    Abstract [en]

    The recently released Python package crepes can be used to generate both conformal regressors, which transform point predictions into prediction intervals for specified levels of confidence, and conformal predictive systems, which transform the point predictions into cumulative distribution functions (conformal predictive distributions). The crepes package implements standard, normalized and Mondrian conformal regressors and predictive systems, and is completely model-agnostic, using only the residuals for the calibration instances, possibly together with difficulty estimates and Mondrian categories as input, when forming the conformal regressors and predictive systems. This allows the user to easily incorporate and evaluate novel difficulty estimates and ways of forming Mondrian categories, as well as combinations thereof. Examples from using the package are given, illustrating how to incorporate some standard options for difficulty estimation, forming Mondrian categories and the use of out-of-bag predictions for calibration, through helper functions defined in a separate module, called crepes.fillings. The relation to other software packages for conformal regression is also pointed out.
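
A minimal usage sketch consistent with the interface described above (keyword arguments follow the paper's examples and may differ between crepes versions; the data and the underlying model are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from crepes import ConformalRegressor

# synthetic stand-in data; any regression task works
X, y = np.random.rand(1000, 8), np.random.rand(1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_prop, X_cal, y_prop, y_cal = train_test_split(X_train, y_train, test_size=0.25)

model = RandomForestRegressor().fit(X_prop, y_prop)

# a standard conformal regressor is fitted from calibration residuals only
cr = ConformalRegressor()
cr.fit(residuals=y_cal - model.predict(X_cal))

# point predictions are transformed into prediction intervals
intervals = cr.predict(y_hat=model.predict(X_test), confidence=0.95)
```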

  • 12.
    Boström, Henrik
    University of Skövde, Department of Communication and Information.
    Estimating class probabilities in random forests (2007). In: Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007, IEEE Computer Society, 2007, p. 211-216. Conference paper (Refereed)
    Abstract [en]

    For both single probability estimation trees (PETs) and ensembles of such trees, commonly employed class probability estimates correct the observed relative class frequencies in each leaf to avoid anomalies caused by small sample sizes. The effect of such corrections in random forests of PETs is investigated, and the use of the relative class frequency is compared to using two corrected estimates, the Laplace estimate and the m-estimate. An experiment with 34 datasets from the UCI repository shows that estimating class probabilities using relative class frequency clearly outperforms both using the Laplace estimate and the m-estimate with respect to accuracy, area under the ROC curve (AUC) and Brier score. Hence, in contrast to what is commonly employed for PETs and ensembles of PETs, these results strongly suggest that a non-corrected probability estimate should be used in random forests of PETs. The experiment further shows that learning random forests of PETs using relative class frequency significantly outperforms learning random forests of classification trees (i.e., trees for which only an unweighted vote on the most probable class is counted) with respect to both accuracy and AUC, but that the latter is clearly ahead of the former with respect to Brier score.
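
The three leaf-level estimates compared above can be stated compactly; a sketch with illustrative function names, where n_c is the number of leaf examples of class c and n is the leaf size:

```python
def relative_frequency(n_c, n):
    """Uncorrected estimate: the observed fraction of class c in the leaf."""
    return n_c / n

def laplace_estimate(n_c, n, n_classes):
    """Laplace correction: adds one pseudo-count per class."""
    return (n_c + 1) / (n + n_classes)

def m_estimate(n_c, n, prior_c, m=2.0):
    """m-estimate: shrinks the leaf frequency toward the class prior
    prior_c with strength m."""
    return (n_c + m * prior_c) / (n + m)
```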

  • 13.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Example-Based Explanations of Random Forest Predictions (2024). In: Advances in Intelligent Data Analysis XXII - 22nd International Symposium on Intelligent Data Analysis, IDA 2024, Proceedings, Springer Science and Business Media Deutschland GmbH, 2024, Vol. 14642, p. 185-196. Conference paper (Refereed)
    Abstract [en]

    A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leaves of the forest into which the test object falls; each prediction can hence be explained exactly by the set of training examples for which the weights are non-zero. The number of examples used in such explanations is shown to vary with the dimensionality of the training set and hyperparameters of the random forest algorithm. This means that the number of examples involved in each prediction can to some extent be controlled by varying these parameters. However, for settings that lead to a required predictive performance, the number of examples involved in each prediction may be unreasonably large, preventing the user from grasping the explanations. In order to provide more useful explanations, a modified prediction procedure is proposed, which includes only the top-weighted examples. An investigation on regression and classification tasks shows that the number of examples used in each explanation can be substantially reduced while maintaining, or even improving, predictive performance compared to the standard prediction procedure.
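
The weights can be recovered from a fitted forest by tracking leaf co-membership; a sketch assuming a scikit-learn regression forest trained with bootstrap=False, so that each leaf value is the mean of the actual training examples in the leaf:

```python
import numpy as np

def example_weights(forest, X_train, x_test):
    """Weights w such that the forest prediction for x_test equals w @ y_train."""
    train_leaves = forest.apply(X_train)                  # (n_train, n_trees)
    test_leaves = forest.apply(x_test.reshape(1, -1))[0]  # (n_trees,)
    w = np.zeros(len(X_train))
    for t in range(train_leaves.shape[1]):
        in_leaf = train_leaves[:, t] == test_leaves[t]
        w[in_leaf] += 1.0 / in_leaf.sum()  # each tree averages its leaf
    return w / train_leaves.shape[1]       # the forest averages the trees
```

The non-zero entries of w identify exactly the training examples behind the prediction; keeping only the top-weighted ones corresponds in spirit to the modified prediction procedure described above.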

  • 14.
    Boström, Henrik
    University of Skövde, Department of Communication and Information.
    Feature vs. classifier fusion for predictive data mining - A case study in pesticide classification (2007). In: FUSION 2007 - 2007 10th International Conference on Information Fusion, Institute of Electrical and Electronics Engineers (IEEE), 2007, p. 1-7, article id 4408024. Conference paper (Refereed)
    Abstract [en]

    Two strategies for fusing information from multiple sources when generating predictive models in the domain of pesticide classification are investigated: i) fusing different sets of features (molecular descriptors) before building a model and ii) fusing the classifiers built from the individual descriptor sets. An empirical investigation demonstrates that the choice of strategy can have a significant impact on the predictive performance. Furthermore, the experiment shows that the best strategy is dependent on the type of predictive model considered. When generating a decision tree for pesticide classification, a statistically significant difference in accuracy is observed in favor of combining predictions from the individual models compared to generating a single model from the fused set of molecular descriptors. On the other hand, when the model consists of an ensemble of decision trees, a statistically significant difference in accuracy is observed in favor of building the model from the fused set of descriptors compared to fusing ensemble models built from the individual sources.

  • 15.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Forests of probability estimation trees (2012). In: International Journal of Pattern Recognition and Artificial Intelligence, ISSN 0218-0014, Vol. 26, no 2, article id 1251001. Article in journal (Refereed)
    Abstract [en]

    Probability estimation trees (PETs) generalize classification trees in that they assign class probability distributions instead of class labels to examples that are to be classified. This property has been demonstrated to allow PETs to outperform classification trees with respect to ranking performance, as measured by the area under the ROC curve (AUC). It has further been shown that the use of probability correction improves the performance of PETs. This has led to the use of probability correction also in forests of PETs. However, it was recently observed that probability correction may in fact deteriorate performance of forests of PETs. A more detailed study of the phenomenon is presented and the reasons behind this observation are analyzed. An empirical investigation is presented, comparing forests of classification trees to forests of both corrected and uncorrected PETs on 34 data sets from the UCI repository. The experiment shows that a small forest (10 trees) of probability corrected PETs gives a higher AUC than a similar-sized forest of classification trees, hence providing evidence in favor of using forests of probability corrected PETs. However, the picture changes when increasing the forest size, as the AUC is no longer improved by probability correction. For accuracy and squared error of predicted class probabilities (Brier score), probability correction even leads to a negative effect. An analysis of the mean squared error of the trees in the forests and their variance shows that although probability correction results in trees that are more correct on average, the variance is reduced at the same time, leading to an overall loss of performance for larger forests. The main conclusions are that probability correction should only be employed in small forests of PETs, and that for larger forests, classification trees and PETs are equally good alternatives.

  • 16.
    Boström, Henrik
    KTH, Superseded Departments (pre-2005), Computer and Systems Sciences, DSV. Stockholm University, Department of Computer and Systems Sciences.
    Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning (2005). In: Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, Bonn: AMC Press, 2005. Conference paper (Refereed)
    Download full text (pdf)
  • 17.
    Boström, Henrik
    University of Skövde, Department of Communication and Information.
    Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets (2007). In: Proceedings of the 7th SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2007, p. 27-34. Conference paper (Refereed)
    Abstract [en]

    Decision lists (or ordered rule sets) have two attractive properties compared to unordered rule sets: they require a simpler classification procedure and they allow for a more compact representation. However, it is an open question what effect these properties have on the area under the ROC curve (AUC). Two ways of forming decision lists are considered in this study: by generating a sequence of rules, with a default rule for one of the classes, and by imposing an order upon rules that have been generated for all classes. An empirical investigation shows that the latter method gives a significantly higher AUC than the former, demonstrating that the compactness obtained by using one of the classes as a default is indeed associated with a cost. Furthermore, by using all applicable rules rather than the first in an ordered set, an even further significant improvement in AUC is obtained, demonstrating that the simple classification procedure is also associated with a cost. The observed gains in AUC for unordered rule sets compared to decision lists can be explained by the fact that learning rules for all classes, as well as combining multiple rules, allows examples to be ranked according to a more fine-grained scale than when applying rules in a fixed order with a default rule for one of the classes.

  • 18.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Method for efficiently checking coverage of rules derived from a logical theory (2003). Patent (Other (popular science, discussion, etc.))
    Abstract [en]

    The method is used in a computer and includes the steps of providing a logical theory (12, 30) that has clauses. A rule (14) is generated that is a resolvent of clauses in the logical theory. An example (16) is retrieved. A proof tree (18, 40) is generated from the example (16) using the logical theory (12, 30). The proof tree (18, 40) is transformed into a database (20, 42) of a coverage check apparatus (28). The rule (14) is converted into a partial proof tree (60) that has nodes (62, 54, 66). The partial proof tree is transformed into a database query (22) of the coverage check apparatus (28). The query (22, 72) is executed to identify tuples in the database (20, 42) that correspond to the nodes of the partial proof tree.

  • 19.
    Boström, Henrik
    KTH, Superseded Departments (pre-2005), Computer and Systems Sciences, DSV. Stockholm University, Department of Computer and Systems Sciences.
    Pruning and Exclusion Criteria for Unordered Incremental Reduced Error Pruning (2004). In: Proceedings of the Workshop on Advances in Rule Learning at the 15th European Conference on Machine Learning, 2004. Conference paper (Refereed)
    Download full text (pdf)
  • 20.
    Boström, Henrik
    et al.
    University of Skövde, Department of Communication and Information.
    Andler, Sten F.
    University of Skövde, Department of Communication and Information.
    Brohede, Marcus
    University of Skövde, Department of Communication and Information.
    Johansson, Ronnie
    University of Skövde, Department of Communication and Information.
    Karlsson, Alexander
    University of Skövde, Department of Communication and Information.
    van Laere, Joeri
    University of Skövde, Department of Communication and Information.
    Niklasson, Lars
    University of Skövde, Department of Communication and Information.
    Nilsson, Marie
    University of Skövde, Department of Communication and Information.
    Persson, Anne
    University of Skövde, Department of Communication and Information.
    Ziemke, Tom
    University of Skövde, Department of Communication and Information.
    On the Definition of Information Fusion as a Field of Research (2007). Report (Other academic)
    Abstract [en]

    A more precise definition of the field of information fusion can be of benefit to researchers within the field, who may use such a definition when motivating their own work and evaluating the contribution of others. Moreover, it can enable researchers and practitioners outside the field to more easily relate their own work to the field and more easily understand the scope of the techniques and methods developed in the field. Previous definitions of information fusion are reviewed from that perspective, including definitions of data and sensor fusion, and their appropriateness as definitions for the entire research field is discussed. Based on strengths and weaknesses of existing definitions, a novel definition is proposed, which is argued to effectively fulfill the requirements that can be put on a definition of information fusion as a field of research.

    Download full text (pdf)
  • 21.
    Boström, Henrik
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Department of Computer and Systems Sciences.
    De-identifying health records by means of active learning (2012). In: ICML 2012 Workshop on Machine Learning for Clinical Data Analysis, 2012. Conference paper (Refereed)
    Abstract [en]

    An experiment on classifying words in Swedish health records as belonging to one of eight protected health information (PHI) classes, or to the non-PHI class, by means of active learning has been conducted, in which three selection strategies were evaluated in conjunction with random forests: the commonly employed approach of choosing the most uncertain examples, choosing randomly, and choosing the most certain examples. Surprisingly, random selection outperformed choosing the most uncertain examples with respect to ten considered performance metrics. Moreover, choosing the most certain examples outperformed random selection with respect to nine out of ten metrics.

  • 22.
    Boström, Henrik
    et al.
    KTH, School of Information and Communication Technology (ICT). Stockholm University, Department of Computer and Systems Sciences.
    Gurung, Ram Bahadur
    Stockholm University, Department of Computer and Systems Sciences.
    Asker, Lars
    Stockholm University, Department of Computer and Systems Sciences.
    Karlsson, Isak
    Stockholm University, Department of Computer and Systems Sciences.
    Lindgren, Tony
    Stockholm University, Department of Computer and Systems Sciences.
    Papapetrou, Panagiotis
    Stockholm University, Department of Computer and Systems Sciences.
    Conformal prediction using random survival forests (2017). In: Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 812-817. Conference paper (Refereed)
    Abstract [en]

    Random survival forests constitute a robust approach to survival modeling, i.e., predicting the probability that an event will occur before or on a given point in time. Similar to most standard predictive models, no guarantee for the prediction error is provided for this model, which instead typically is empirically evaluated. Conformal prediction is a rather recent framework, which allows the error of a model to be determined by a user-specified confidence level, something which is achieved by considering set rather than point predictions. The framework, which has been applied to some of the most popular classification and regression techniques, is here for the first time applied to survival modeling, through random survival forests. An empirical investigation is presented where the technique is evaluated on datasets from two real-world applications; predicting component failure in trucks using operational data and predicting survival and treatment of heart failure patients from administrative healthcare data. The experimental results show that the error levels indeed are very close to the provided confidence levels, as guaranteed by the conformal prediction framework, and that the error for predicting each outcome, i.e., event or no-event, can be controlled separately. The latter may, however, lead to less informative predictions, i.e., larger prediction sets, in case the class distribution is heavily imbalanced.

  • 23.
    Boström, Henrik
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Höglund, P.
    Junker, S.-O.
    Öberg, A.-S.
    Sparr, M.
    Explaining multivariate time series forecasts: An application to predicting the Swedish GDP (2020). In: CEUR Workshop Proceedings, CEUR-WS, 2020. Conference paper (Refereed)
    Abstract [en]

    Various approaches to explaining predictions of black box models have been proposed, including model-agnostic techniques that measure feature importance (or effect) by presenting modified test instances to the underlying black-box model. These modifications rely on choosing feature values from the complete range of observed values. However, when applying machine learning algorithms to the task of forecasting from multivariate time-series, it is suggested that the temporal aspect should be taken into account when analyzing the feature effect. A modification of individual conditional expectation (ICE) plots is proposed, called ICE-T plots, which displays the prediction change for temporally ordered feature values. Results are presented from a case study on predicting the Swedish gross domestic product (GDP) based on a comprehensive set of indicator and prognostic variables. The effect of calculating feature effect with and without temporal constraints is demonstrated, as well as the impact of transformations and forecast horizons on what features are found to have a large effect, and the use of ICE-T plots as a complement to ICE plots. 
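
An ICE curve for a single instance is simply the model's prediction as one feature is swept over a grid; the ICE-T variant described above restricts that grid to the feature's temporally ordered observed values. A generic sketch (names are illustrative, not from the paper):

```python
import numpy as np

def ice_curve(model, x, feature_idx, grid):
    """Predictions for one instance x as feature feature_idx varies over grid."""
    rows = np.tile(x, (len(grid), 1))   # copy the instance once per grid value
    rows[:, feature_idx] = grid
    return model.predict(rows)

# ICE-T: let grid be the observed values of the feature in temporal order,
# rather than values drawn from the full observed range
```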

  • 24.
    Boström, Henrik
    et al.
    University of Skövde, Department of Communication and Information.
    Johansson, Ronnie
    University of Skövde, Department of Communication and Information.
    Karlsson, Alexander
    University of Skövde, Department of Communication and Information.
    On Evidential Combination Rules for Ensemble Classifiers (2008). In: Proceedings of the 11th International Conference on Information Fusion, IEEE, 2008, p. 553-560, article id 4632259. Conference paper (Refereed)
    Abstract [en]

    Ensemble classifiers are known to generally perform better than each individual classifier of which they consist. One approach to classifier fusion is to apply Shafer’s theory of evidence. While most approaches have adopted Dempster’s rule of combination, a multitude of combination rules have been proposed. A number of combination rules as well as two voting rules are compared when used in conjunction with a specific kind of ensemble classifier, known as random forests, w.r.t. accuracy, area under ROC curve and Brier score on 27 datasets. The empirical evaluation shows that the choice of combination rule can have a significant impact on the performance for a single dataset, but in general the evidential combination rules do not perform better than the voting rules for this particular ensemble design. Furthermore, among the evidential rules, the associative ones appear to have better performance than the non-associative.
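
As background, Dempster's rule combines two mass functions by multiplying the masses of intersecting focal elements and renormalizing by the non-conflicting mass; a sketch using an illustrative representation (dicts mapping frozensets to masses):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass falling on the empty set
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# e.g. two classifiers assigning mass over the frame {A, B}
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.4}
m2 = {frozenset({"B"}): 0.3, frozenset({"A", "B"}): 0.7}
print(dempster_combine(m1, m2))  # {A}: ~0.51, {B}: ~0.15, {A, B}: ~0.34
```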

  • 25.
    Boström, Henrik
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Linusson, Henrik
    Löfström, Tuve
    Johansson, Ulf
    Accelerating difficulty estimation for conformal regression forests (2017). In: Annals of Mathematics and Artificial Intelligence, ISSN 1012-2443, E-ISSN 1573-7470, Vol. 81, no 1-2, p. 125-144. Article in journal (Refereed)
    Abstract [en]

    The conformal prediction framework allows for specifying the probability of making incorrect predictions by a user-provided confidence level. In addition to a learning algorithm, the framework requires a real-valued function, called nonconformity measure, to be specified. The nonconformity measure does not affect the error rate, but the resulting efficiency, i.e., the size of output prediction regions, may vary substantially. A recent large-scale empirical evaluation of conformal regression approaches showed that using random forests as the learning algorithm together with a nonconformity measure based on out-of-bag errors normalized using a nearest-neighbor-based difficulty estimate, resulted in state-of-the-art performance with respect to efficiency. However, the nearest-neighbor procedure incurs a significant computational cost. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. The evaluation moreover shows that the computational cost of the variance-based measure is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure. The use of out-of-bag instances for calibration does, however, result in nonconformity scores that are distributed differently from those obtained from test instances, questioning the validity of the approach. An adjustment of the variance-based measure is presented, which is shown to be valid and also to have a significant positive effect on the efficiency. For conformal regression forests, the variance-based nonconformity measure is hence a computationally efficient and theoretically well-founded alternative to the nearest-neighbor procedure.
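
The variance-based difficulty estimate is simply the spread of the individual tree predictions; a sketch of the normalized nonconformity score it feeds into (a plain quantile stands in for the exact finite-sample-corrected rank, and beta is a small smoothing constant):

```python
import numpy as np

def tree_variance(forest, X):
    """Difficulty estimate: variance of the per-tree predictions."""
    per_tree = np.array([tree.predict(X) for tree in forest.estimators_])
    return per_tree.var(axis=0)

def normalized_intervals(y_hat, sigma, residuals_cal, sigma_cal,
                         confidence=0.95, beta=0.01):
    """Intervals from normalized nonconformity alpha = |res| / (sigma + beta)."""
    alphas = np.abs(residuals_cal) / (sigma_cal + beta)
    q = np.quantile(alphas, confidence)
    half = q * (sigma + beta)
    return y_hat - half, y_hat + half
```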

  • 26.
    Boström, Henrik
    et al.
    Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden.
    Linusson, Henrik
    Department of Information Technology, University of Borås, Borås, Sweden.
    Löfström, Tuve
    Department of Information Technology, University of Borås, Borås, Sweden.
    Johansson, Ulf
    Jönköping University, JTH, Computer Science and Informatics.
    Evaluation of a variance-based nonconformity measure for regression forests (2016). In: 5th International Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2016, Springer, 2016, Vol. 9653, p. 75-89. Conference paper (Refereed)
    Abstract [en]

    In a previous large-scale empirical evaluation of conformal regression approaches, random forests using out-of-bag instances for calibration together with a k-nearest neighbor-based nonconformity measure, was shown to obtain state-of-the-art performance with respect to efficiency, i.e., average size of prediction regions. However, the use of the nearest-neighbor procedure not only requires that all training data have to be retained in conjunction with the underlying model, but also that a significant computational overhead is incurred, during both training and testing. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. Moreover, the evaluation shows that state-of-the-art performance is achieved by the variance-based measure at a computational cost that is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure.

  • 27.
    Boström, Henrik
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Linusson, Henrik
    Ekkono Solutions AB, Sweden.
    Vesterberg, Anders
    Scania CV AB, Sweden.
    Mondrian Predictive Systems for Censored Data (2023). In: Proceedings of the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023, ML Research Press, 2023, p. 399-412. Conference paper (Refereed)
    Abstract [en]

    Conformal predictive systems output predictions in the form of well-calibrated cumulative distribution functions (conformal predictive distributions). In this paper, we apply conformal predictive systems to the problem of time-to-event prediction, where the conformal predictive distribution for a test object may be used to obtain the expected time until an event occurs, as well as p-values for an event to take place earlier (or later) than some specified time points. Specifically, we target right-censored time-to-event prediction tasks, i.e., situations in which the true time-to-event for a particular training example may be unknown due to observation of the example ending before any event occurs. By leveraging the Kaplan-Meier estimator, we develop a procedure for constructing Mondrian predictive systems that are able to produce well-calibrated cumulative distribution functions for right-censored time-to-event prediction tasks. We show that the proposed procedure is guaranteed to produce conservatively valid predictive distributions, and provide empirical support using simulated censoring on benchmark data. The proposed approach is contrasted with established techniques for survival analysis, including random survival forests and censored quantile regression forests, using both synthetic and non-synthetic censoring.
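
The Kaplan-Meier estimator that the construction builds on can be written in a few lines; a sketch (the Mondrian predictive system built on top of it is defined in the paper and not reproduced here):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival curve; events[i] is 1 if the event was observed
    at times[i] and 0 if observation i is right-censored."""
    times, events = np.asarray(times), np.asarray(events)
    order = np.argsort(times)
    times, events = times[order], events[order]
    at_risk, s, curve = len(times), 1.0, {}
    for t in np.unique(times):
        mask = times == t
        d = events[mask].sum()          # observed events at time t
        if d > 0:
            s *= 1.0 - d / at_risk
        curve[t] = s
        at_risk -= mask.sum()           # events and censored leave the risk set
    return curve
```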

  • 28.
    Boström, Henrik
    et al.
    University of Skövde, Department of Communication and Information.
    Norinder, Ulf
    Utilizing Information on Uncertainty for In Silico Modeling using Random Forests (2009). In: Proceedings of the 3rd Skövde Workshop on Information Fusion Topics (SWIFT 2009), University of Skövde, 2009, p. 59-62. Conference paper (Refereed)
    Abstract [en]

    Information on uncertainty of measurements or estimates of molecular properties is rarely utilized by in silico predictive models. In this study, different approaches to handling uncertain numerical features are explored when using the state-of-the-art random forest algorithm for generating predictive models. Two main approaches are considered: i) sampling from probability distributions prior to tree generation, which does not require any change to the underlying tree learning algorithm, and ii) adjusting the algorithm to allow for handling probability distributions, similar to how missing values typically are handled, i.e., partitions may include fractions of examples. An experiment with six datasets concerning the prediction of various chemical properties is presented, where 95% confidence intervals are included for one of the 92 numerical features. In total, five approaches to handling uncertain numeric features are compared: ignoring the uncertainty, sampling from distributions that are assumed to be uniform and normal respectively, and adjusting tree learning to handle probability distributions that are assumed to be uniform and normal respectively. The experimental results show that all approaches that utilize information on uncertainty indeed outperform the single approach ignoring this, both with respect to accuracy and area under ROC curve. A decomposition of the squared error of the constituent classification trees shows that the highest variance is obtained by ignoring the information on uncertainty, but that this also results in the highest mean squared error of the constituent trees.

  • 29.
    Carlsson, Lars
    et al.
    Ahlberg, Ernst
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Johansson, Ulf
    Linusson, Henrik
    Modifications to p-Values of Conformal Predictors (2015). In: Statistical Learning and Data Sciences: Third International Symposium, SLDS 2015, Egham, UK, April 20-23, 2015, Proceedings / [ed] Alexander Gammerman, Vladimir Vovk, Harris Papadopoulos, Springer, 2015, Vol. 9047, p. 251-259. Conference paper (Refereed)
    Abstract [en]

    The original definition of a p-value in a conformal predictor can sometimes lead to too conservative prediction regions when the number of training or calibration examples is small. The situation can be improved by using a modification to define an approximate p-value. Two modified p-values are presented that converge to the original p-value as the number of training or calibration examples goes to infinity.

    Numerical experiments empirically support the use of a p-value we call the interpolated p-value for conformal prediction. The interpolated p-value seems to be producing prediction sets that have an error rate which corresponds well to the prescribed significance level.
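
For reference, the original conformal p-value that the paper modifies is computed from the calibration nonconformity scores as follows (the interpolated variant itself is defined in the paper and not reproduced here):

```python
import numpy as np

def conformal_p_value(alpha_test, alphas_cal):
    """Original conformal p-value: the fraction of calibration scores at
    least as nonconforming as the test score, with the +1 terms
    accounting for the test example itself."""
    alphas_cal = np.asarray(alphas_cal)
    return (np.sum(alphas_cal >= alpha_test) + 1) / (len(alphas_cal) + 1)
```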

  • 30.
    Dalianis, Hercules
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Releasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests (2012). In: Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), 2012, p. 45-48. Conference paper (Refereed)
    Abstract [en]

    Patient records contain valuable information in the form of both structured data and free text; however, this information is sensitive since it can reveal the identity of patients. In order to allow new methods and techniques to be developed and evaluated on real world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without protected health information (PHI), such as names, telephone numbers, and so on. One approach to minimizing the risk of revealing PHI when releasing text corpora from such records is to include only features of the words instead of the words themselves. Such features may include parts of speech, word length, and so on, from which the sensitive information cannot be derived. In order to investigate what performance losses can be expected when replacing specific words with features, an experiment with two state-of-the-art machine learning methods, conditional random fields and random forests, is presented, comparing their ability to support de-identification, using the Stockholm EPR PHI corpus as a benchmark test. The results indicate severe performance losses when the actual words are removed, leading to the conclusion that the chosen features are not sufficient for the suggested approach to be viable.

  • 31.
    Deegalla, Sampath
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Boström, Henrik
    Stockholm University, Department of Computer and Systems Sciences.
    Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods (2007). In: Intelligent Data Engineering and Automated Learning - IDEAL 2007 / [ed] Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao, Berlin, Heidelberg: Springer Verlag, 2007, p. 800-809. Conference paper (Refereed)
    Abstract [en]

    Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. It is observed that all dimensionality reduction methods result in more accurate classifiers than what is obtained from using the raw attributes. Furthermore, it is observed that both PCA and PLS reach their best accuracies with fewer components than the other two methods, and that RP needs far more components than the others to outperform kNN on the non-reduced dataset. None of the dimensionality reduction methods can be concluded to generally outperform the others, although PLS is shown to be superior on all four binary classification tasks, but the main conclusion from the study is that the choice of dimensionality reduction method can be of major importance when classifying microarrays using kNN.

  • 32.
    Deegalla, Sampath
    et al.
    Dept. of Computer and Systems Sciences, Stockholm University, Sweden.
    Boström, Henrik
    University of Skövde, Department of Communication and Information.
    Fusion of Dimensionality Reduction Methods: a Case Study in Microarray Classification (2009). In: Proceedings of the 12th International Conference on Information Fusion, ISIF, 2009, p. 460-465, article id 5203771. Conference paper (Refereed)
    Abstract [en]

    Dimensionality reduction has been demonstrated to improve the performance of the k-nearest neighbor (kNN) classifier for high-dimensional data sets, such as microarrays. However, the effectiveness of different dimensionality reduction methods varies, and it has been shown that no single method constantly outperforms the others. In contrast to using a single method, two approaches to fusing the result of applying dimensionality reduction methods are investigated: feature fusion and classifier fusion. It is shown that by fusing the output of multiple dimensionality reduction techniques, either by fusing the reduced features or by fusing the output of the resulting classifiers, both higher accuracy and higher robustness towards the choice of number of dimensions is obtained.

  • 33.
    Deegalla, Sampath
    et al.
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Improving Fusion of Dimensionality Reduction Methods for Nearest Neighbor Classification (2009). In: 8th International Conference on Machine Learning and Applications, ICMLA 2009, IEEE Computer Society, 2009, p. 771-775. Conference paper (Refereed)
    Abstract [en]

    In previous studies, performance improvement of nearest neighbor classification of high dimensional data, such as microarrays, has been investigated using dimensionality reduction. It has been demonstrated that the fusion of dimensionality reduction methods, either by fusing classifiers obtained from each set of reduced features or by fusing all reduced features, is better than using any single dimensionality reduction method. However, none of the fusion methods consistently outperform the use of a single dimensionality reduction method. Therefore, a new way of fusing features and classifiers is proposed, which is based on searching for the optimal number of dimensions for each considered dimensionality reduction method. An empirical evaluation on microarray classification is presented, comparing classifier and feature fusion with and without the proposed method, in conjunction with three dimensionality reduction methods: Principal Component Analysis (PCA), Partial Least Squares (PLS) and Information Gain (IG). The new classifier fusion method outperforms the previous one in 4 out of 8 cases, and is on par with the best single dimensionality reduction method. The novel feature fusion method is however outperformed by the previous method, which selects the same number of features from each dimensionality reduction method. Hence, it is concluded that the idea of optimizing the number of features separately for each dimensionality reduction method can only be recommended for classifier fusion.

  • 34.
    Deegalla, Sampath
    et al.
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Boström, Henrik
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification2006In: Publications of the Finnish Artificial Intelligence Society, 2006, p. 23-30Conference paper (Refereed)
    Abstract [en]

    The computational cost of nearest neighbor classification often prevents the method from being applied in practice when dealing with high-dimensional data, such as images and microarrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without losing predictive performance. Two different dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are compared w.r.t. the performance of the resulting nearest neighbor classifier on five image data sets and two microarray data sets. The experimental results show that PCA results in higher accuracy than RP for all the data sets used in this study. However, it is also observed that RP generally outperforms PCA for higher numbers of dimensions. This leads to the conclusion that PCA is more suitable in time-critical cases (i.e., when only distance calculations involving a few dimensions can be afforded), while RP can be more suitable when less severe dimensionality reduction is required. In 6 and 4 cases out of 7, respectively, the use of PCA and RP even outperforms using the non-reduced feature set, hence resulting in not only more efficient, but also more effective, nearest neighbor classification.
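
    A minimal sketch of the PCA vs. random projection comparison described above, sweeping the number of dimensions; the synthetic data and grid are illustrative assumptions rather than the paper's image and microarray sets.

    # kNN accuracy after PCA vs. random projection over a range of dimensionalities.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.random_projection import GaussianRandomProjection

    X, y = make_classification(n_samples=150, n_features=1500, n_informative=60,
                               random_state=0)

    for k in (2, 5, 10, 25, 50, 100):
        pca = make_pipeline(PCA(n_components=k), KNeighborsClassifier(n_neighbors=5))
        rp = make_pipeline(GaussianRandomProjection(n_components=k, random_state=0),
                           KNeighborsClassifier(n_neighbors=5))
        print(k,
              cross_val_score(pca, X, y, cv=5).mean(),
              cross_val_score(rp, X, y, cv=5).mean())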

  • 35.
    Deegalla, Sampath
    et al.
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Walgama, Keerthi
    Choice of Dimensionality Reduction Methods for Feature and Classifier Fusion with Nearest Neighbor Classifiers2012In: 15th International Conference on Information Fusion, IEEE Computer Society, 2012, p. 875-881, article id 6289894Conference paper (Refereed)
    Abstract [en]

    High-dimensional data often cause problems for currently used learning algorithms in terms of efficiency and effectiveness. One solution to this problem is to apply dimensionality reduction, by which the original feature set is reduced to a small number of features, while gaining improved accuracy and/or efficiency of the learning algorithm. We have investigated multiple dimensionality reduction methods for nearest neighbor classification in high dimensions. In previous studies, we have demonstrated that fusing the outputs of different dimensionality reduction methods, either by combining classifiers built on reduced features, or by combining the reduced features and then applying the classifier, may yield higher accuracies than using individual reduction methods. However, none of the previous studies have investigated which dimensionality reduction methods to choose for fusion, when the outputs of multiple dimensionality reduction methods are available. Therefore, we have empirically investigated different combinations of the outputs of four dimensionality reduction methods on 18 medicinal chemistry datasets. The empirical investigation demonstrates that fusion of nearest neighbor classifiers obtained from multiple reduction methods in all cases outperforms the use of individual dimensionality reduction methods, while fusion of different feature subsets is quite sensitive to the choice of dimensionality reduction methods.

  • 36.
    Deegalla, Sampath
    et al.
    Stockholm Univ, Dept Comp & Syst Sci, Post Box 7003, SE-16440 Kista, Sweden.;Univ Peradeniya, Fac Engn, Dept Comp Engn, Peradeniya 20400, Sri Lanka..
    Walgama, Keerthi
    Univ Peradeniya, Fac Engn, Dept Engn Math, Peradeniya 20400, Sri Lanka..
    Papapetrou, Panagiotis
    Stockholm Univ, Dept Comp & Syst Sci, Post Box 7003, SE-16440 Kista, Sweden..
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Random subspace and random projection nearest neighbor ensembles for high dimensional data2022In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 191, article id 116078Article in journal (Refereed)
    Abstract [en]

    The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high-dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both the random subspace and the random projection method lead to improvements in predictive performance compared to using the standard nearest neighbor classifier, while the best method to use depends on the type of data considered; for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, provides some more detailed indications of what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.
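
    A hedged sketch of the two ensemble constructions compared above: a random subspace ensemble of kNN members (each trained on a random feature subset) and a random projection ensemble (each member on its own projection). Ensemble sizes and data are illustrative assumptions.

    # Random subspace vs. random projection nearest neighbor ensembles.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, VotingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.random_projection import GaussianRandomProjection

    X, y = make_classification(n_samples=200, n_features=1000, n_informative=50,
                               random_state=0)

    # Random subspace: each member sees a random 10% of the features,
    # with no bootstrapping of the rows.
    subspace = BaggingClassifier(estimator=KNeighborsClassifier(n_neighbors=5),
                                 n_estimators=20, max_features=0.1,
                                 bootstrap=False, random_state=0)

    # Random projection ensemble: one kNN per independently drawn projection.
    projection = VotingClassifier(
        estimators=[(f"rp{i}",
                     make_pipeline(GaussianRandomProjection(n_components=50,
                                                            random_state=i),
                                   KNeighborsClassifier(n_neighbors=5)))
                    for i in range(20)],
        voting="soft",
    )

    for name, clf in [("subspace", subspace), ("projection", projection)]:
        print(name, cross_val_score(clf, X, y, cv=5).mean())

    BaggingClassifier with max_features below 1.0 and row bootstrapping disabled corresponds to the random subspace method; the projection ensemble is fused by soft voting.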

  • 37. Dudas, C.
    et al.
    Ng, A.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Information Extraction in Manufacturing using Data Mining Techniques2008In: Proceedings of Swedish Production Symposium, 2008Conference paper (Refereed)
  • 38.
    Dudas, Catarina
    et al.
    Högskolan i Skövde, Forskningscentrum för Virtuella system.
    Boström, Henrik
    Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Using Uncertain Chemical and Thermal Data to Predict Product Quality in a Casting Process2009In: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data / [ed] Jian Pei; Lise Getoor; Ander De Keijzer, ACM Press, 2009, p. 57-61Conference paper (Refereed)
    Abstract [en]

    Process and casting data from different sources have been collected and merged for the purpose of predicting, and determining what factors affect, the quality of cast products in a foundry. One problem is that the measurements cannot be directly aligned, since they are collected at different points in time, and instead they have to be approximated for specific time points, hence introducing uncertainty. An approach for addressing this problem is investigated, where uncertain numeric feature values are represented by intervals and random forests are extended to handle such intervals. A preliminary experiment shows that the suggested way of forming the intervals, together with the extension of random forests, results in higher predictive performance compared to using single (expected) values for the uncertain features together with standard random forests.
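
    The paper extends random forests to operate on intervals directly, which standard libraries cannot do; as a rough stand-in, the sketch below encodes each uncertain measurement as lower and upper bound columns and compares against using only the single expected value. The synthetic data and the uncertainty model are assumptions for illustration.

    # Interval encoding of uncertain features vs. single expected values.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 500
    true = rng.normal(size=(n, 5))                   # unobserved true values
    y = (true.sum(axis=1) > 0).astype(int)
    half_width = rng.uniform(0.1, 1.0, size=(n, 5))  # per-measurement uncertainty
    expected = true + rng.normal(scale=half_width)   # noisy point estimates

    X_point = expected                               # single (expected) values
    X_interval = np.hstack([expected - half_width,   # lower bounds
                            expected + half_width])  # upper bounds

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    print("point:   ", cross_val_score(rf, X_point, y, cv=5).mean())
    print("interval:", cross_val_score(rf, X_interval, y, cv=5).mean())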

  • 39.
    Dudas, Catarina
    et al.
    University of Skövde.
    Ng, Amos
    University of Skövde.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Information extraction from solution set of simulation-based multi-objective optimization using data mining2009In: Proceedings of Industrial Simulation Conference (ISC) 2009, Eurosis , 2009, p. 65-69Conference paper (Refereed)
    Abstract [en]

    In this work, we investigate ways of extracting information from simulations, in particular from simulation-based multi-objective optimisation, in order to acquire information that can support human decision makers who aim to optimise manufacturing processes. Applying data mining to analyse data generated by simulation is a fairly unexplored area. Observing that the solutions obtained from a simulation-based multi-objective optimisation are all optimal (or close to the optimal Pareto front), and thus bound to follow and exhibit certain relationships among variables vis-à-vis objectives, it is argued that using data mining to discover these relationships could be a promising procedure. The aim of this paper is to provide empirical results from two simulation case studies to support such a hypothesis.

  • 40.
    Dudas, Catarina
    et al.
    Högskolan i Skövde, Institutionen för teknik och samhälle.
    Ng, Amos
    Högskolan i Skövde, Institutionen för teknik och samhälle.
    Boström, Henrik
    Högskolan i Skövde, Institutionen för kommunikation och information.
    Knowledge Extraction in Manufacturing using Data Mining Techniques2008In: Proceedings of the Swedish Production Symposium 2008, Stockholm, Sweden, November 18-20, 2008, 2008, 8 pagesConference paper (Refereed)
    Abstract [en]

    Nowadays many production companies collect and store production and process data in large databases. Unfortunately, the data is rarely used in the most value-generating way, i.e., finding patterns of inconsistencies and relationships between process settings and quality outcome. This paper addresses the benefits of using data mining techniques in manufacturing applications. Two different applications are laid out, but the technique and software used are the same in both cases. The first case deals with how data mining can be used to discover the effect of process timing and settings on the quality outcome in the casting industry. The result of a multi-objective optimization of a camshaft process is used as the second case. This study focuses on finding the most appropriate dispatching rule settings in the buffers on the line. The use of data mining techniques in these two cases generated previously unknown knowledge. For example, in order to maximize throughput in the camshaft production, let the dispatching rule for the most severe bottleneck be of type Shortest Processing Time (SPT) and for the second bottleneck use any but Most Work Remaining (MWKR).

  • 41.
    Dudas, Catarina
    et al.
    Högskolan i Skövde, Forskningscentrum för Virtuella system.
    Ng, Amos H.C.
    Högskolan i Skövde, Institutionen för ingenjörsvetenskap.
    Pehrsson, Leif
    Högskolan i Skövde, Institutionen för ingenjörsvetenskap.
    Boström, Henrik
    Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden.
    Integration of data mining and multi-objective optimisation for decision support in production system development2014In: International journal of computer integrated manufacturing (Print), ISSN 0951-192X, E-ISSN 1362-3052, Vol. 27, no 9, p. 824-839Article in journal (Refereed)
    Abstract [en]

    Multi-objective optimisation (MOO) is a powerful approach for generating a set of optimal trade-off (Pareto) design alternatives that the decision-maker can evaluate in order to choose the most suitable configuration, based on some high-level strategic information. Nevertheless, in practice, choosing among a large number of solutions on the Pareto front is often a daunting task if proper analysis and visualisation techniques are not applied. Recent research advancements have shown the advantages of using data mining techniques to automate the post-optimality analysis of Pareto-optimal solutions for engineering design problems. Nonetheless, it is argued that the existing approaches are inadequate for generating high-quality results when the set of Pareto solutions is relatively small and the solutions close to the Pareto front have almost the same attributes as the Pareto-optimal solutions, both of which are commonly found in many real-world system problems. The aim of this paper is therefore to propose a distance-based data mining approach for the solution sets generated from simulation-based optimisation, in order to address these issues. Such an integrated data mining and MOO procedure is illustrated with the results of an industrial cost optimisation case study. Particular emphasis is paid to showing how the proposed procedure can be used to assist decision-makers in analysing and visualising the attributes of the design alternatives in different regions of the objective space, so that informed decisions can be made in production systems development.
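
    An illustrative sketch in the spirit of the distance-based analysis described above: each solution in the set is scored by its distance to the approximated Pareto front, and a regression tree then relates decision variables to that score. The toy bi-objective problem and the Euclidean distance measure are assumptions, not the paper's industrial case.

    # Distance-based post-optimality analysis of a solution set.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(300, 3))              # decision variables
    f1 = X[:, 0]                                      # toy objective 1 (minimise)
    f2 = 1 - np.sqrt(X[:, 0]) + X[:, 1]               # toy objective 2 (minimise)
    F = np.column_stack([f1, f2])

    # Non-dominated sorting to approximate the Pareto front.
    dominated = np.array([np.any(np.all(F <= F[i], axis=1) &
                                 np.any(F < F[i], axis=1)) for i in range(len(F))])
    front = F[~dominated]

    # Distance of every solution to its nearest point on the front.
    dist = np.min(np.linalg.norm(F[:, None, :] - front[None, :, :], axis=2), axis=1)

    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, dist)
    print(export_text(tree, feature_names=["x1", "x2", "x3"]))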

  • 42. Dudas, Catarina
    et al.
    Ng, Amosh. C.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Post-analysis of multi-objective optimization solutions using decision trees2015In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 19, no 2, p. 259-278Article in journal (Refereed)
    Abstract [en]

    Evolutionary algorithms are often applied to solve multi-objective optimization problems. Such algorithms effectively generate solutions of wide spread and have good convergence properties. However, they do not provide any characteristics of the found optimal solutions, something which may be very valuable to decision makers. By performing a post-analysis of the solution set from multi-objective optimization, relationships between the input space and the objective space can be identified. In this study, decision trees are used for this purpose. It is demonstrated that they may effectively capture important characteristics of the solution sets produced by multi-objective optimization methods. It is furthermore shown that the discovered relationships may be used for improving the search for additional solutions. Two multi-objective problems are considered in this paper: a well-studied benchmark function problem with a known optimal Pareto front, which is used for verification purposes, and a multi-objective optimization problem of a real-world production system. The results show that useful relationships may be identified by employing decision tree analysis of the solution sets from multi-objective optimizations.
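
    A minimal sketch of the post-analysis idea above: label each solution in a solution set as Pareto-optimal or dominated, fit a decision tree on the decision variables, and read off the learned characteristics as rules. The toy bi-objective problem is an assumption for illustration.

    # Decision tree characterisation of Pareto-optimal vs. dominated solutions.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(400, 4))               # decision variables
    F = np.column_stack([X[:, 0],                      # objective 1 (minimise)
                         1 + X[:, 1] - 0.8 * X[:, 0]]) # objective 2 (minimise)

    def is_dominated(i):
        better_eq = np.all(F <= F[i], axis=1)
        strictly = np.any(F < F[i], axis=1)
        return np.any(better_eq & strictly)

    labels = np.array([0 if is_dominated(i) else 1 for i in range(len(F))])

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
    print(export_text(tree, feature_names=[f"x{j}" for j in range(1, 5)]))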

  • 43.
    Ennadir, Sofiane
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Abbahaddou, Yassine
    DaSciM, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, France.
    Lutzeyer, Johannes F.
    DaSciM, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, France.
    Vazirgiannis, Michalis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. DaSciM, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, France.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    A Simple and Yet Fairly Effective Defense for Graph Neural Networks2024In: AAAI Technical Track on Safe, Robust and Responsible AI Track, Association for the Advancement of Artificial Intelligence (AAAI) , 2024, Vol. 38, p. 21063-21071, article id 19Conference paper (Refereed)
    Abstract [en]

    Graph Neural Networks (GNNs) have emerged as the dominant approach for machine learning on graph-structured data. However, concerns have arisen regarding the vulnerability of GNNs to small adversarial perturbations. Existing defense methods against such perturbations suffer from high time complexity and can negatively impact the model's performance on clean graphs. To address these challenges, this paper introduces NoisyGNN, a novel defense method that incorporates noise into the underlying model's architecture. We establish a theoretical connection between noise injection and the enhancement of GNN robustness, highlighting the effectiveness of our approach. We further conduct extensive empirical evaluations on the node classification task to validate our theoretical findings, focusing on two popular GNNs: the GCN and GIN. The results demonstrate that NoisyGNN achieves superior or comparable defense performance to existing methods while minimizing the added time complexity. The NoisyGNN approach is model-agnostic, allowing it to be integrated with different GNN architectures. Successful combinations of our NoisyGNN approach with existing defense techniques demonstrate even further improved adversarial defense results. Our code is publicly available at: https://github.com/Sennadir/NoisyGNN.
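
    A hedged sketch of the noise-injection idea, not the authors' exact NoisyGNN implementation (see their repository for that): a two-layer GCN in plain PyTorch that adds Gaussian noise to hidden representations during training. The noise scale is an assumed hyperparameter.

    # Toy GCN with noise injected into hidden representations (training only).
    import torch
    import torch.nn as nn

    class NoisyGCN(nn.Module):
        def __init__(self, in_dim, hid_dim, out_dim, noise_std=0.1):
            super().__init__()
            self.lin1 = nn.Linear(in_dim, hid_dim)
            self.lin2 = nn.Linear(hid_dim, out_dim)
            self.noise_std = noise_std  # assumed hyperparameter

        def normalise(self, adj):
            # Symmetrically normalised adjacency with self-loops:
            # D^{-1/2} (A + I) D^{-1/2}
            a = (adj + torch.eye(adj.size(0))).clamp(max=1.0)
            d = a.sum(dim=1).pow(-0.5)
            return d.unsqueeze(1) * a * d.unsqueeze(0)

        def forward(self, x, adj):
            a = self.normalise(adj)
            h = torch.relu(self.lin1(a @ x))
            if self.training:  # inject noise only while training
                h = h + self.noise_std * torch.randn_like(h)
            return self.lin2(a @ h)

    # Toy usage: 5 nodes, 8 input features, 3 classes.
    adj = torch.bernoulli(torch.full((5, 5), 0.5))
    adj = ((adj + adj.t()) > 0).float()  # make the graph undirected
    model = NoisyGCN(8, 16, 3)
    logits = model(torch.randn(5, 8), adj)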

  • 44.
    Ennadir, Sofiane
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Alkhatib, Amr
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vazirgiannis, Michalis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Conformalized Adversarial Attack Detection for Graph Neural Networks2023In: Proceedings of the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023, ML Research Press , 2023, p. 311-323Conference paper (Refereed)
    Abstract [en]

    Graph Neural Networks (GNNs) have achieved remarkable performance on diverse graph representation learning tasks. However, recent studies have unveiled their susceptibility to adversarial attacks, leading to the development of various defense techniques to enhance their robustness. In this work, instead of improving the robustness, we propose a framework to detect adversarial attacks and provide an adversarial certainty score in the prediction. Our framework evaluates whether an input graph significantly deviates from the original data and provides a well-calibrated p-value based on this score through the conformal paradigm, thereby controlling the false alarm rate. We demonstrate the effectiveness of our approach on various benchmark datasets. Although we focus on graph classification, the proposed framework can be readily adapted for other graph-related tasks, such as node classification.
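
    A minimal sketch of the conformal p-value computation referenced above: a deviation score is computed for calibration inputs and for a test input, and the p-value is the fraction of calibration scores at least as extreme. The score values here are placeholders; the paper derives the score from how far an input graph deviates from the original data.

    # Conformal p-value from calibration scores; reject when p < alpha to
    # control the false alarm rate at level alpha.
    import numpy as np

    def conformal_p_value(cal_scores, test_score):
        """p = (1 + #{calibration scores >= test score}) / (n + 1)."""
        cal_scores = np.asarray(cal_scores)
        return (1.0 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1.0)

    rng = np.random.default_rng(0)
    cal_scores = rng.normal(size=1000)       # scores on clean calibration data
    clean, attacked = 0.1, 4.2               # hypothetical test scores
    print(conformal_p_value(cal_scores, clean))     # large p: looks clean
    print(conformal_p_value(cal_scores, attacked))  # tiny p: flag as adversarial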

  • 45.
    Ennadir, Sofiane
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Alkhatib, Amr
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Nikolentzos, Giannis
    LIX, Ecole Polytechnique, Paris, France.
    Vazirgiannis, Michalis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. LIX, Ecole Polytechnique, Paris, France.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    UnboundAttack: Generating Unbounded Adversarial Attacks to Graph Neural Networks2024In: Complex Networks and Their Applications XII - Proceedings of The 12th International Conference on Complex Networks and their Applications: COMPLEX NETWORKS 2023 Volume 1, Springer Nature , 2024, Vol. 1141, p. 100-111Conference paper (Refereed)
    Abstract [en]

    Graph Neural Networks (GNNs) have demonstrated state-of-the-art performance in various graph representation learning tasks. Recently, studies revealed their vulnerability to adversarial attacks. While the available attack strategies are based on applying perturbations to existing graphs within a specific budget, proposed defense mechanisms successfully guard against this type of attack. This paper proposes a new perspective founded on unrestricted adversarial examples. We propose to produce adversarial attacks by generating completely new data points instead of perturbing existing ones. We introduce a framework, called UnboundAttack, leveraging the advancements in graph generation to produce graphs preserving the semantics of the available training data while misleading the targeted classifier. Importantly, our method does not assume any knowledge about the underlying architecture. Finally, we validate the effectiveness of our proposed method in a realistic setting related to molecular graphs.

  • 46.
    Gammerman, Alexander
    et al.
    Royal Holloway Univ London, Egham, Surrey, England..
    Vovk, Vladimir
    Royal Holloway Univ London, Egham, Surrey, England..
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Carlsson, Lars
    Stena Line AB, Gothenburg, Sweden..
    Conformal and probabilistic prediction with applications: editorial2019In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 379-380Article in journal (Other academic)
  • 47.
    Gauraha, Niharika
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Investigating the Contribution of Privileged Information in Knowledge Transfer LUPI by Explainable Machine Learning2023In: Proceedings of the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023, ML Research Press , 2023, p. 470-484Conference paper (Refereed)
    Abstract [en]

    Learning Under Privileged Information (LUPI) is a framework that exploits information that is available during training only, i.e., the privileged information (PI), to improve the classification of objects for which this information is not available. Knowledge transfer LUPI (KT-LUPI) extends the framework by inferring PI for the test objects through separate predictive models. Although the effectiveness of the framework has been thoroughly demonstrated, current investigations have provided only limited insights regarding what parts of the transferred PI contribute to the improved performance. A better understanding of this could not only lead to computational savings but potentially also to novel strategies for exploiting PI. We approach the problem by exploring the use of explainable machine learning, through the state-of-the-art technique SHAP, to analyze the contribution of the transferred privileged information. We present results from experiments with five classification and three regression datasets, in which we compare the Shapley values of the PI computed in two different settings: one where the PI is assumed to be available during both training and testing, hence representing an ideal scenario, and a second setting, in which the PI is available during training only but is transferred to test objects through KT-LUPI. The results indicate that explainable machine learning indeed has potential as a tool to gain insights regarding the effectiveness of KT-LUPI.
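
    A hedged sketch of the analysis described above: Shapley values for the privileged features are computed once with the true PI (the ideal setting) and once with PI predicted from the ordinary features (the KT-LUPI setting). The split into ordinary and privileged columns, the transfer model, and the data are illustrative assumptions.

    # Comparing SHAP values of true vs. transferred privileged information.
    import numpy as np
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    Xo, Xp = X[:, :7], X[:, 7:]            # ordinary vs. privileged features
    Xo_tr, Xo_te, Xp_tr, Xp_te, y_tr, y_te = train_test_split(Xo, Xp, y,
                                                              random_state=0)

    # Transfer model: predict the PI of test objects from ordinary features.
    Xp_te_hat = LinearRegression().fit(Xo_tr, Xp_tr).predict(Xo_te)

    clf = RandomForestClassifier(random_state=0).fit(np.hstack([Xo_tr, Xp_tr]), y_tr)
    explainer = shap.TreeExplainer(clf)
    shap_true = explainer.shap_values(np.hstack([Xo_te, Xp_te]))     # ideal PI
    shap_kt = explainer.shap_values(np.hstack([Xo_te, Xp_te_hat]))   # transferred PI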

  • 48. Gurung, R. B.
    et al.
    Lindgren, T.
    Boström, H.
    KTH, School of Information and Communication Technology (ICT).
    Learning random forest from histogram data using split specific axis rotation2018In: International Journal of Machine Learning and Computing, ISSN 2010-3700, Vol. 8, no 1, p. 74-79Article in journal (Refereed)
    Abstract [en]

    Machine learning algorithms for data containing histogram variables have not been explored to any major extent. In this paper, an adapted version of the random forest algorithm is proposed to handle variables of this type, assuming identical structure of the histograms across observations, i.e., the histograms for a variable all use the same number and width of the bins. The standard approach of representing bins as separate variables may lead to the learning algorithm overlooking the underlying dependencies. In contrast, the proposed algorithm handles each histogram as a unit. When performing split evaluation of a histogram variable during tree growth, a sliding window of fixed size is employed by the proposed algorithm to constrain the sets of bins that are considered together. A small number of all possible sets of bins are randomly selected, and principal component analysis (PCA) is applied locally to all examples in a node. Split evaluation is then performed on each principal component. Results from applying the algorithm to both synthetic and real-world data are presented, showing that the proposed algorithm outperforms the standard approach of using random forests together with bins represented as separate variables, with respect to both AUC and accuracy. In addition to introducing the new algorithm, we elaborate on how real-world data for predicting NOx sensor failure in heavy duty trucks was prepared, demonstrating that predictive performance can be further improved by adding variables that represent changes of the histograms over time.
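
    An illustrative sketch of the split-candidate construction described above: a fixed-size sliding window selects consecutive bins of a histogram variable, a few windows are sampled, and PCA is applied locally to the examples in the node, so that each principal component yields a numeric feature on which an ordinary split can be evaluated. The window size and number of sampled windows are assumed parameters.

    # Sliding-window PCA split candidates for a histogram variable.
    import numpy as np
    from sklearn.decomposition import PCA

    def window_split_candidates(H, window=4, n_windows=3, rng=None):
        """H: (n_examples, n_bins) histogram variable at the current node.
        Returns a list of (bin_indices, projected_values) split candidates."""
        if rng is None:
            rng = np.random.default_rng()
        n_bins = H.shape[1]
        starts = rng.choice(n_bins - window + 1, size=n_windows, replace=False)
        candidates = []
        for s in starts:
            bins = np.arange(s, s + window)
            # Local axis rotation: project the windowed bins onto their
            # first principal component, computed on the node's examples.
            pc = PCA(n_components=1).fit_transform(H[:, bins])
            candidates.append((bins, pc.ravel()))
        return candidates

    # Toy usage: 50 examples, one histogram variable with 12 identical bins.
    H = np.random.default_rng(0).random((50, 12))
    H /= H.sum(axis=1, keepdims=True)        # normalise each histogram
    for bins, values in window_split_candidates(H, rng=np.random.default_rng(1)):
        print(bins, values[:3])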

  • 49.
    Gurung, Ram B.
    et al.
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Lindgren, Tony
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Learning Decision Trees from Histogram Data2015In: Proceedings of the 2015 International Conference on Data Mining: DMIN 2015 / [ed] Robert Stahlbock, Gary M. Weiss, AAAI Press, 2015, p. 139-145Conference paper (Refereed)
    Abstract [en]

    When applying learning algorithms to histogram data, bins of such variables are normally treated as separate independent variables. However, this may lead to a loss of information, as the underlying dependencies may not be fully exploited. In this paper, we adapt the standard decision tree learning algorithm to handle histogram data by proposing a novel method for partitioning examples using binned variables. Results from employing the algorithm on both synthetic and real-world data sets demonstrate that exploiting dependencies in histogram data may have positive effects on both predictive performance and model size, as measured by the number of nodes in the decision tree. These gains are, however, associated with an increased computational cost and more complex split conditions. To address the former issue, an approximate method is proposed, which speeds up the learning process substantially while retaining the predictive performance.

  • 50.
    Gurung, Ram B.
    et al.
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Lindgren, Tony
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Boström, Henrik
    Stockholms universitet, Institutionen för data- och systemvetenskap.
    Learning Decision Trees from Histogram Data Using Multiple Subsets of Bins2016In: Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference / [ed] Zdravko Markov, Ingrid Russell, AAAI Press , 2016, p. 430-435Conference paper (Refereed)
    Abstract [en]

    The standard approach of learning decision trees from histogram data is to treat the bins as independent variables. However, as the underlying dependencies among the bins might not be completely exploited by this approach, an algorithm has been proposed for learning decision trees from histogram data by considering all bins simultaneously while partitioning examples at each node of the tree. Although the algorithm has been demonstrated to improve predictive performance, its computational complexity has turned out to be a major bottleneck, in particular for histograms with a large number of bins. In this paper, we propose instead a sliding window approach to select subsets of the bins to be considered simultaneously while partitioning examples. This significantly reduces the number of possible splits to consider, allowing substantially larger histograms to be handled. We also propose to evaluate the original bins independently, in addition to evaluating the subsets of bins when performing splits. This ensures that the information obtained by treating bins simultaneously is an additional gain compared to what is considered by the standard approach. Experiments applying the new algorithm to both synthetic and real-world datasets demonstrate positive effects in terms of predictive performance without excessive computational cost.
