Learning random forest from histogram data using split specific axis rotation
KTH, School of Information and Communication Technology (ICT).
2018 (English). In: International Journal of Machine Learning and Computing, ISSN 2010-3700, Vol. 8, no. 1, p. 74-79. Article in journal (Refereed), Published
Abstract [en]

Machine learning algorithms for data containing histogram variables have not been explored to any major extent. In this paper, an adapted version of the random forest algorithm is proposed to handle variables of this type, assuming identical structure of the histograms across observations, i.e., the histograms for a variable all use the same number and width of the bins. The standard approach of representing bins as separate variables may cause the learning algorithm to overlook the underlying dependencies. In contrast, the proposed algorithm handles each histogram as a unit. When performing split evaluation of a histogram variable during tree growth, the proposed algorithm employs a sliding window of fixed size to constrain the sets of bins that are considered together. A small number of all possible sets of bins are randomly selected, and principal component analysis (PCA) is applied locally to all examples in a node. Split evaluation is then performed on each principal component. Results from applying the algorithm to both synthetic and real-world data are presented, showing that the proposed algorithm outperforms the standard approach of using random forests together with bins represented as separate variables, with respect to both AUC and accuracy. In addition to introducing the new algorithm, we elaborate on how real-world data for predicting NOx sensor failure in heavy-duty trucks was prepared, demonstrating that predictive performance can be further improved by adding variables that represent changes of the histograms over time.
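
The split evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate set of bins is one contiguous window position, that split quality is measured by weighted Gini impurity, and that thresholds are midpoints between adjacent projected values; none of these details are stated in the abstract. The function names (`window_bin_sets`, `gini`, `best_split_on_histogram`) and the parameters `window` and `n_sets` are hypothetical.

```python
import numpy as np


def window_bin_sets(n_bins, window):
    """Contiguous bin index sets produced by sliding a fixed-size window (assumed form)."""
    return [np.arange(start, start + window) for start in range(n_bins - window + 1)]


def gini(y):
    """Gini impurity of a label vector; defined as 0 for an empty vector."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size
    return 1.0 - np.sum(p ** 2)


def best_split_on_histogram(H, y, window=3, n_sets=2, rng=None):
    """Evaluate splits for one histogram variable at a tree node.

    H : (n_examples, n_bins) bin values of the examples in the node.
    y : (n_examples,) class labels.
    Returns (impurity, bins, component, mean, threshold), or None if no split exists.
    """
    rng = np.random.default_rng(rng)
    candidates = window_bin_sets(H.shape[1], window)
    # Randomly keep only a few of the possible bin sets.
    chosen = rng.choice(len(candidates), size=min(n_sets, len(candidates)), replace=False)
    best = None
    for c in chosen:
        bins = candidates[c]
        X = H[:, bins]
        mean = X.mean(axis=0)
        Xc = X - mean
        # Local PCA on the node's examples: rows of Vt are the principal axes.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        for component in Vt:
            z = Xc @ component  # examples projected onto this principal component
            values = np.unique(z)
            for threshold in (values[:-1] + values[1:]) / 2:
                left = z <= threshold
                impurity = (left.sum() * gini(y[left])
                            + (~left).sum() * gini(y[~left])) / y.size
                if best is None or impurity < best[0]:
                    best = (impurity, bins, component, mean, threshold)
    return best
```

To route a new example with the returned split, the same bins would be selected, the stored mean subtracted, and the result projected onto the stored component before comparing with the threshold. Because the rotation is computed from the examples in each node, it is specific to that split rather than shared across the tree.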

Place, publisher, year, edition, pages
International Association of Computer Science and Information Technology, 2018. Vol. 8, no. 1, p. 74-79
Keywords [en]
Histogram data, Histogram features, Histogram random forest, Random forest PCA
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-227444
DOI: 10.18178/ijmlc.2018.8.1.666
Scopus ID: 2-s2.0-85042533994
OAI: oai:DiVA.org:kth-227444
DiVA, id: diva2:1210283
Note

Export Date: 9 May 2018; Article; Funding details: FFI, Fellowships Fund Incorporated; Funding details: SICORP, Strategic International Collaborative Research Program; Funding text: Manuscript received September 5, 2017; revised January 30, 2018. This work was supported by Scania CV AB and the Vinnova program for Strategic Vehicle Research and Innovation (FFI) Transport Efficiency. QC 20180528

Available from: 2018-05-28 Created: 2018-05-28 Last updated: 2018-05-28. Bibliographically approved

By author/editor
Boström, H.