Higher Criticism Testing for Signal Detection in Rare And Weak Models
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics. (Matematisk statistik)
2012 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

examples - we need models for selecting a small subset of useful features from high-dimensional data, where the useful features are both rare and weak; this is crucial for, e.g., supervised classification of sparse high-dimensional data. A preceding step is signal detection: determining whether any useful features are present at all. This problem is related to testing a very large number of hypotheses, where the proportion of false null hypotheses is assumed to be very small. However, reliable signal detection is only possible in certain regions of the two-dimensional sparsity-strength parameter space, the phase space.

In this report, we focus on two families of distributions, the normal (N) and χ². In the former case, features are assumed to be independent and normally distributed. In the latter, in search of a more sophisticated model, we suppose that features are dependent within blocks, whose empirical separation strength asymptotically follows the non-central χ²_ν-distribution.

Our search for informative features explores Tukey's higher criticism (HC), a second-level significance testing procedure that compares the fraction of observed significances to the fraction expected under the global null.
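As a rough illustration of the comparison described above, the following Python sketch computes an HC statistic from a vector of p-values. It assumes the commonly used Donoho-Jin formulation (the maximum standardized gap between the empirical and expected fraction of p-values below each sorted p-value); the abstract does not state which HC variant the thesis uses, and the function name `higher_criticism` and the parameter `alpha0` are hypothetical names chosen for this example.

```python
import numpy as np


def higher_criticism(pvalues, alpha0=0.5):
    """HC statistic in the Donoho-Jin form (an assumption, not taken from the thesis).

    pvalues : one p-value per feature, computed under the global null
    alpha0  : fraction of the smallest p-values over which the maximum is taken
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Keep p strictly inside (0, 1) so the standardization never divides by zero.
    p = np.clip(p, 1e-300, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    # Observed fraction i/n of p-values <= p_(i), versus the fraction p_(i)
    # expected under the global null, standardized by the binomial std. dev.
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))
    return float(hc[:k].max())


# Evenly spread p-values mimic the global null; a handful of very small
# p-values mimics rare signals and inflates the statistic.
n = 1000
null_like = (np.arange(1, n + 1) - 0.5) / n
with_signal = null_like.copy()
with_signal[:10] = 1e-6
hc_null = higher_criticism(null_like)
hc_signal = higher_criticism(with_signal)
```

A large value of the statistic suggests that more small p-values were observed than the global null would predict; how the rejection threshold is chosen is a separate design decision not covered by this sketch.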

Throughout the phase space we investigate the estimated error rate,

Err = (#Falsely rejected H0 + #Falsely rejected H1) / #Simulations,

where H0: absence of informative signals, and H1: presence of informative signals, in both the N-case and the χ²_ν-case, for ν = 2, 10, 30. In particular, we find, using a feature vector of approximately the same size as in genomic applications, that the analytically derived detection boundary is too optimistic in the sense that close to it, signal detection is still failing, and we need to move far from the boundary into the success region to ensure reliable detection. We demonstrate that Err grows fast and irregularly as we approach the detection boundary from the success region.
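The error rate above can be estimated by simulation along the following lines. This is a minimal sketch for the N-case only, under several assumptions that the abstract does not confirm: the rare/weak calibration ε = n^(-β), μ = sqrt(2 r log n) and the boundary formula are those of Donoho and Jin, half of the simulations are drawn under H0 and half under H1, and the rejection threshold is supplied by the caller. All function and parameter names (`detection_boundary`, `hc_statistic`, `estimate_err`, `threshold`) are hypothetical.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def detection_boundary(beta):
    """Donoho-Jin detection boundary for the normal case, 1/2 < beta < 1 (assumed)."""
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def hc_statistic(z, alpha0=0.5):
    """HC statistic from z-scores, using one-sided normal p-values."""
    p = 0.5 * _erfc(z / math.sqrt(2.0))
    p = np.sort(np.clip(p, 1e-300, 1.0 - 1e-12))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return float(hc[: max(1, int(alpha0 * n))].max())


def estimate_err(n, beta, r, threshold, n_sim=200, seed=0):
    """Err = (#falsely rejected H0 + #falsely rejected H1) / #simulations."""
    rng = np.random.default_rng(seed)
    eps = n ** (-beta)                       # fraction of informative features
    mu = math.sqrt(2.0 * r * math.log(n))    # signal strength
    false_h0 = 0
    false_h1 = 0
    for s in range(n_sim):
        z = rng.standard_normal(n)
        if s % 2 == 0:
            # Data drawn under H0: rejecting H0 here is an error.
            if hc_statistic(z) > threshold:
                false_h0 += 1
        else:
            # Data drawn under H1: shift a random sparse subset by mu.
            z = z + mu * (rng.random(n) < eps)
            if hc_statistic(z) <= threshold:
                false_h1 += 1
    return (false_h0 + false_h1) / n_sim


# Example point in the phase space, placed above the assumed boundary.
beta = 0.6
r = detection_boundary(beta) + 0.3
err = estimate_err(n=500, beta=beta, r=r, threshold=3.0, n_sim=20)
```

In practice the threshold might be calibrated as an empirical quantile of the HC statistic under simulated null data; the fixed value 3.0 here is only a placeholder.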

In the χ²_ν-case, ν > 2, no analytical detection boundary has been derived, but we show that the empirical success region there is smaller than in the N-case, especially as ν increases.

Place, publisher, year, edition, pages
2012. 28 p.
Series
Trita-MAT, ISSN 1401-2286 ; 25
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-103284
OAI: oai:DiVA.org:kth-103284
DiVA: diva2:559575
Subject / course
Mathematical Statistics
Educational program
Master of Science in Engineering - Engineering Physics
Uppsok
Physics, Chemistry, Mathematics
Examiners
Available from: 2012-10-09 Created: 2012-10-09 Last updated: 2012-10-13 Bibliographically approved

Open Access in DiVA

Full text (827 kB), 246 downloads
File information
File name: FULLTEXT02.pdf
File size: 827 kB
Checksum (SHA-512): 693da0a023ea9cd8883aad896268d04ef68078419b5bf93f5b1add5ba71bc5d4e12e058c330e71d3344e6f9e224ebf7d414fb885e7166eef65d7916e5370e690
Type: fulltext
Mimetype: application/pdf
