kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Daisy Bloom Filters
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.ORCID iD: 0000-0001-8430-2441
BARC, University of Copenhagen, Denmark.
BARC, University of Copenhagen, Denmark.
2024 (English)In: 19th Scandinavian Symposium on Algorithm Theory, SWAT 2024, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing , 2024, article id 9Conference paper, Published paper (Refereed)
Abstract [en]

A filter is a widely used data structure for storing an approximation of a given set S of elements from some universe U (a countable set). It represents a superset S′ ⊇ S that is “close to S” in the sense that for x ∉ S, the probability that x ∈ S′ is bounded by some ε > 0. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store S exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in S with probability close to 1. Then it would make sense to always include them in S′, saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Krask, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most ε with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the Daisy Bloom filter, that executes operations faster and uses significantly less space than the standard Bloom filter.

Place, publisher, year, edition, pages
Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing , 2024. article id 9
Keywords [en]
Bloom filters, input distribution, learned data structures
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-348287DOI: 10.4230/LIPIcs.SWAT.2024.9Scopus ID: 2-s2.0-85195368017OAI: oai:DiVA.org:kth-348287DiVA, id: diva2:1874655
Conference
19th Scandinavian Symposium on Algorithm Theory, SWAT 2024, Helsinki, Finland, Jun 12 2024 - Jun 14 2024
Note

QC 20240624

Part of ISBN 9783959773188

Available from: 2024-06-20 Created: 2024-06-20 Last updated: 2024-06-24Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Bercea, Ioana

Search in DiVA

By author/editor
Bercea, Ioana
By organisation
Theoretical Computer Science, TCS
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 51 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf