Change search
ReferencesLink to record
Permanent link

Direct link
Recovery From Random Samples in a Big Data Set
Sharif University of Technology.ORCID iD: 0000-0002-6247-1217
Sharif University of Technology.
2015 (English)In: IEEE Communications Letters, ISSN 1089-7798, E-ISSN 1558-2558, Vol. 19, no 11, 1929-1932 p.Article in journal (Refereed) Published
Abstract [en]

Consider a collection of files, each of which is a sequence of letters. One of these files is randomly chosen and a random subsequence of the file is revealed. This random subsequence can be the result of a random sampling of the file. The goal is to recover the identity of the file, assuming a simple greedy matching algorithm to search the file collection. We study the fundamental limits on the maximum size of the file collection for reliable recovery in terms of the length of the random subsequence. The sequence of each file is assumed to follow a hidden Markov model (HMM), which is a common model for many data structures such as voice or DNA sequences. The connection between this problem and coding over a deletion channel with greedy decoders is discussed.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2015. Vol. 19, no 11, 1929-1932 p.
Keyword [en]
Hidden Markov model, deletion channel, greedy match, DNA sequences
National Category
Communication Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-189314DOI: 10.1109/LCOMM.2015.2478815ISI: 000365028900018ScopusID: 2-s2.0-84947724883OAI: oai:DiVA.org:kth-189314DiVA: diva2:945520
Note

QC 20160901

Available from: 2016-07-01 Created: 2016-07-01 Last updated: 2016-09-01Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Molavipour, Sina
In the same journal
IEEE Communications Letters
Communication Systems

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 5 hits
ReferencesLink to record
Permanent link

Direct link