Roaring Bitmap
What it is
A Roaring bitmap is a compressed data structure for storing sets of integers (e.g., document IDs) with efficient set operations. It alternates between dense and sparse representation depending on cardinality: uses bitmaps for dense regions, arrays for sparse regions. Roaring bitmaps reduce memory by 10–100x vs. uncompressed bitmaps while maintaining fast operations.
[illustrate: Roaring bitmap structure showing dense and sparse containers; comparison to uncompressed bitmap memory usage]
How it works
-
Structure:
- Divide integer space into 64k-sized chunks
- For each chunk (container):
- Dense: bitmap (if many integers)
- Array: sorted list of integers (if few integers)
- Run-length encoding: compressed runs of consecutive integers
-
Set operations:
- AND, OR, XOR, NOT: combine containers
- Each container operation is fast; select best algorithm per container
-
Space efficiency:
- Uncompressed: n integers = 4n bytes (32-bit) or 8n bytes (64-bit)
- Roaring: typically 0.5–2 bytes per integer (depending on density)
Example
Set: {1, 2, 3, 1000, 2000, 3000, 100000}
Container [0-64k]: {1, 2, 3, 1000, 2000, 3000}
Sparse region (low density): array [1, 2, 3]
Dense region (higher cardinality): array [1000, 2000, 3000]
Container [64k-128k]: {100000}
Sparse: array [36000]
Set AND operation:
Set1 ∩ Set2: iterate through containers, AND each pair
Fast: skip empty containers, use SIMD for dense containers
Variants and history
Roaring bitmaps developed by Lemire et al. (2016) for efficient analytics. Variants:
- Run-length encoding for highly compressible regions
- SIMD optimization for fast AND/OR operations
- Immutable versions for cache-friendliness
Used in:
- Elasticsearch (fast range queries)
- Druid (OLAP analytics)
- RoaringBitmap library (Java, C++, Python)
Becomes standard for in-memory set operations in databases.
When to use it
Use Roaring bitmaps when:
- Storing large sets of integers (doc IDs, user IDs)
- Fast set operations (AND, OR) critical
- Memory efficiency important
- Query performance sensitive to I/O
- Column-oriented databases/indexes
Roaring bitmaps are faster and smaller than both uncompressed bitmaps and skip lists for this use case. Standard in modern analytics systems.