Daisy Bloom Filters
2024 (English)In: 19th Scandinavian Symposium on Algorithm Theory, SWAT 2024, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing , 2024, article id 9Conference paper, Published paper (Refereed)
Abstract [en]
A filter is a widely used data structure for storing an approximation of a given set S of elements from some universe U (a countable set). It represents a superset S′ ⊇ S that is “close to S” in the sense that for x ∉ S, the probability that x ∈ S′ is bounded by some ε > 0. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store S exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in S with probability close to 1. Then it would make sense to always include them in S′, saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Krask, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most ε with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the Daisy Bloom filter, that executes operations faster and uses significantly less space than the standard Bloom filter.
Place, publisher, year, edition, pages
Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing , 2024. article id 9
Keywords [en]
Bloom filters, input distribution, learned data structures
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-348287DOI: 10.4230/LIPIcs.SWAT.2024.9Scopus ID: 2-s2.0-85195368017OAI: oai:DiVA.org:kth-348287DiVA, id: diva2:1874655
Conference
19th Scandinavian Symposium on Algorithm Theory, SWAT 2024, Helsinki, Finland, Jun 12 2024 - Jun 14 2024
Note
QC 20240624
Part of ISBN 9783959773188
2024-06-202024-06-202024-06-24Bibliographically approved