Random subspace and random projection nearest neighbor ensembles for high dimensional data
Stockholm Univ, Dept Comp & Syst Sci, Post Box 7003, SE-16440 Kista, Sweden; Univ Peradeniya, Fac Engn, Dept Comp Engn, Peradeniya 20400, Sri Lanka.
Univ Peradeniya, Fac Engn, Dept Engn Math, Peradeniya 20400, Sri Lanka.
Stockholm Univ, Dept Comp & Syst Sci, Post Box 7003, SE-16440 Kista, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0001-8382-0300
2022 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 191, article id 116078. Article in journal (Refereed). Published.
Abstract [en]

The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high-dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both the random subspace and the random projection method lead to improvements in predictive performance compared to using the standard nearest neighbor classifier, while the best method to use depends on the type of data considered; for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, provides more detailed indications of what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.
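
The two ensemble constructions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of 1-nearest-neighbor members, the Gaussian projection matrix, majority voting, and the parameters `n_members` and `k` are all assumptions made for illustration, and the toy data is invented. Each ensemble member sees the data through its own random feature map (a random subset of attributes, or a random linear projection) and the members' predictions are combined by vote.

```python
import math
import random
from collections import Counter


def nearest_label(train_X, train_y, x):
    """Label of the training row closest to x (1-NN, squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def subspace_map(n_features, k, rng):
    """Random subspace: keep k attribute indices drawn without replacement."""
    idx = rng.sample(range(n_features), k)
    return lambda row: [row[j] for j in idx]


def projection_map(n_features, k, rng):
    """Random projection: multiply by a k x n_features Gaussian matrix (an assumed choice)."""
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(n_features)]
         for _ in range(k)]
    return lambda row: [sum(r * v for r, v in zip(R_row, row)) for R_row in R]


def ensemble_predict(train_X, train_y, x, make_map, n_members=25, k=5, seed=0):
    """Majority vote over n_members 1-NN classifiers, each using its own random map."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_members):
        f = make_map(len(x), k, rng)  # a fresh random view of the feature space
        votes[nearest_label([f(r) for r in train_X], train_y, f(x))] += 1
    return votes.most_common(1)[0][0]


# Invented toy data: two well-separated classes in 20 dimensions.
toy_X = [[0.0] * 20, [0.5] * 20, [10.0] * 20, [9.5] * 20]
toy_y = [0, 0, 1, 1]
query = [9.0] * 20

pred_subspace = ensemble_predict(toy_X, toy_y, query, subspace_map)
pred_projection = ensemble_predict(toy_X, toy_y, query, projection_map)
```

The only difference between the two ensembles in this sketch is the map passed as `make_map`, which makes it easy to compare them on the same data; the abstract's finding is that which one works better depends on the type of data.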

Place, publisher, year, edition, pages
Elsevier BV, 2022. Vol. 191, article id 116078
Keywords [en]
Nearest neighbor ensemble, High dimensional data, Random subspace method, Random projection method
National Category
Information Systems
Identifiers
URN: urn:nbn:se:kth:diva-307295; DOI: 10.1016/j.eswa.2021.116078; ISI: 000736167200011; Scopus ID: 2-s2.0-85120741166; OAI: oai:DiVA.org:kth-307295; DiVA, id: diva2:1630773
Note

QC 20220121

Available from: 2022-01-21 Created: 2022-01-21 Last updated: 2022-06-25 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Boström, Henrik
