Roaring Bitmap

What it is

A Roaring bitmap is a compressed data structure for storing sets of integers (e.g., document IDs) with efficient set operations. It alternates between dense and sparse representation depending on cardinality: uses bitmaps for dense regions, arrays for sparse regions. Roaring bitmaps reduce memory by 10–100x vs. uncompressed bitmaps while maintaining fast operations.

[illustrate: Roaring bitmap structure showing dense and sparse containers; comparison to uncompressed bitmap memory usage]

How it works

  1. Structure:

    • Divide integer space into 64k-sized chunks
    • For each chunk (container):
      • Dense: bitmap (if many integers)
      • Array: sorted list of integers (if few integers)
      • Run-length encoding: compressed runs of consecutive integers
  2. Set operations:

    • AND, OR, XOR, NOT: combine containers
    • Each container operation is fast; select best algorithm per container
  3. Space efficiency:

    • Uncompressed: n integers = 4n bytes (32-bit) or 8n bytes (64-bit)
    • Roaring: typically 0.5–2 bytes per integer (depending on density)

Example

Set: {1, 2, 3, 1000, 2000, 3000, 100000}

Container [0-64k]:  {1, 2, 3, 1000, 2000, 3000}
  Sparse region (low density): array [1, 2, 3]
  Dense region (higher cardinality): array [1000, 2000, 3000]

Container [64k-128k]: {100000}
  Sparse: array [36000]

Set AND operation:
Set1 ∩ Set2: iterate through containers, AND each pair
Fast: skip empty containers, use SIMD for dense containers

Variants and history

Roaring bitmaps developed by Lemire et al. (2016) for efficient analytics. Variants:

  • Run-length encoding for highly compressible regions
  • SIMD optimization for fast AND/OR operations
  • Immutable versions for cache-friendliness

Used in:

  • Elasticsearch (fast range queries)
  • Druid (OLAP analytics)
  • RoaringBitmap library (Java, C++, Python)

Becomes standard for in-memory set operations in databases.

When to use it

Use Roaring bitmaps when:

  • Storing large sets of integers (doc IDs, user IDs)
  • Fast set operations (AND, OR) critical
  • Memory efficiency important
  • Query performance sensitive to I/O
  • Column-oriented databases/indexes

Roaring bitmaps are faster and smaller than both uncompressed bitmaps and skip lists for this use case. Standard in modern analytics systems.

See also