Higher Criticism Testing for Signal Detection in Rare And Weak Models
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics. (Matematisk statistik)
2012 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

examples - we need models for selecting a small subset of useful features from high-dimensional data, where the useful features are both rare and weak; this is crucial for, e.g., supervised classification of sparse high-dimensional data. A preceding step is to detect the presence of useful features, i.e. signal detection. This problem is related to testing a very large number of hypotheses, where the proportion of false null hypotheses is assumed to be very small. However, reliable signal detection is only possible in certain regions of the two-dimensional sparsity-strength parameter space, the phase space.
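For orientation, the phase space for the normal model is commonly parametrized as below in the rare-and-weak literature (the formulation of Donoho and Jin). The symbols n, β, r, ε_n, μ_n and ρ* are assumptions introduced here, since the abstract does not fix a notation: n is the number of features, ε_n the fraction carrying a signal, and μ_n the signal amplitude.

```latex
\varepsilon_n = n^{-\beta}, \quad \tfrac{1}{2} < \beta < 1,
\qquad
\mu_n = \sqrt{2 r \log n}, \quad 0 < r < 1,
\qquad
\rho^{*}(\beta) =
\begin{cases}
\beta - \tfrac{1}{2}, & \tfrac{1}{2} < \beta \le \tfrac{3}{4}, \\[4pt]
\bigl(1 - \sqrt{1 - \beta}\bigr)^{2}, & \tfrac{3}{4} < \beta < 1.
\end{cases}
```

In this parametrization, detection is asymptotically possible when r > ρ*(β) and impossible when r < ρ*(β); the curve r = ρ*(β) is the detection boundary.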

In this report, we focus on two families of distributions, N and χ². In the former case, features are assumed to be independent and normally distributed. In the latter, in search of a more sophisticated model, we assume that features are dependent within blocks, whose empirical separation strength asymptotically follows the non-central χ²_ν-distribution.

Our search for informative features explores Tukey's higher criticism (HC), a second-level significance testing procedure that compares the fraction of observed significances to the fraction expected under the global null.
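The comparison described above can be sketched in code. The following is a minimal illustration that assumes the standardized form of the HC statistic used by Donoho and Jin, maximized over the smallest half of the sorted p-values; the function name `higher_criticism` and the parameter `alpha0` are hypothetical, and the thesis may use a different variant.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """HC statistic (assumed Donoho-Jin form, not necessarily the thesis's).

    For sorted p-values p_(1) <= ... <= p_(n), returns the maximum over
    1 <= i <= alpha0 * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    Here i/n is the observed fraction of p-values at most p_(i), and
    p_(i) is the fraction expected under the global null.
    """
    n = len(p_values)
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Evenly spread p-values mimic the global null; a few tiny ones mimic signals.
n = 100
null_like = [(i - 0.5) / n for i in range(1, n + 1)]
with_signal = [1e-8] * 5 + null_like[5:]
hc_null = higher_criticism(null_like)
hc_signal = higher_criticism(with_signal)
```

A large value of the statistic indicates more small p-values than the global null would produce, which is the evidence of signal presence that the test looks for.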

Throughout the phase space we investigate the estimated error rate,

Err = (#Falsely rejected H0 + #Falsely rejected H1) / #Simulations,

where H0: absence of informative signals, and H1: presence of informative signals, in both the N-case and the χ²_ν-case, for ν = 2, 10, 30. In particular, using a feature vector of approximately the same size as in genomic applications, we find that the analytically derived detection boundary is too optimistic: close to it, signal detection still fails, and we need to move far from the boundary into the success region to ensure reliable detection. We demonstrate that Err grows fast and irregularly as we approach the detection boundary from the success region.
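A Monte Carlo estimate of Err at a single point of the phase space, for the normal case only, might look like the sketch below. Everything specific in it is an assumption rather than the thesis's setup: the parametrization (signal fraction n^(-beta), amplitude sqrt(2 r log n)), one-sided p-values, the HC form, the fixed rejection `threshold`, and the choice to count #Simulations as the total number of simulated data sets (half under H0, half under H1). All function and parameter names are hypothetical.

```python
import math
import random


def hc_statistic(p_values, alpha0=0.5):
    """HC statistic in the assumed Donoho-Jin form."""
    n = len(p_values)
    p_sorted = sorted(p_values)
    best = -math.inf
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        p = p_sorted[i - 1]
        if 0.0 < p < 1.0:
            z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, z)
    return best


def simulate_p_values(n, beta, r, signal, rng):
    """One data set of n one-sided p-values from the normal model."""
    eps = n ** (-beta)                        # assumed fraction of signals
    mu = math.sqrt(2.0 * r * math.log(n))     # assumed signal amplitude
    p_values = []
    for _ in range(n):
        shift = mu if (signal and rng.random() < eps) else 0.0
        z = rng.gauss(shift, 1.0)
        # Upper-tail probability P(N(0,1) > z).
        p_values.append(0.5 * math.erfc(z / math.sqrt(2.0)))
    return p_values


def estimate_err(n, beta, r, threshold, n_pairs, seed=0):
    """Fraction of wrong decisions over n_pairs H0 and n_pairs H1 data sets."""
    rng = random.Random(seed)
    false_reject_h0 = 0   # HC exceeds the threshold although no signal exists
    false_reject_h1 = 0   # HC stays below the threshold although signals exist
    for _ in range(n_pairs):
        if hc_statistic(simulate_p_values(n, beta, r, False, rng)) > threshold:
            false_reject_h0 += 1
        if hc_statistic(simulate_p_values(n, beta, r, True, rng)) <= threshold:
            false_reject_h1 += 1
    return (false_reject_h0 + false_reject_h1) / (2 * n_pairs)


# Illustrative values only: a point with dense, strong signals.
err = estimate_err(n=1000, beta=0.5, r=2.0, threshold=10.0, n_pairs=10, seed=1)
```

Repeating the call over a grid of (beta, r) values would give an empirical map of Err across the phase space; the result depends on how the threshold is chosen, which this sketch leaves as a free parameter.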

In the χ²_ν-case, ν > 2, no analytical detection boundary has been derived, but we show that the empirical success region there is smaller than in the N-case, especially as ν increases.

Place, publisher, year, edition, pages
2012. 28 p.
Trita-MAT, ISSN 1401-2286 ; 25
National Category
Probability Theory and Statistics
URN: urn:nbn:se:kth:diva-103284
OAI: diva2:559575
Subject / course
Mathematical Statistics
Educational program
Master of Science in Engineering - Engineering Physics
Physics, Chemistry, Mathematics
Available from: 2012-10-09. Created: 2012-10-09. Last updated: 2012-10-13. Bibliographically approved.
