Succinct Data Structure

What it is

Succinct data structures represent data in space close to information-theoretic lower bounds while supporting efficient queries. For example, a set of n items from universe U requires at least log_2(|U| choose n) bits; succinct structures achieve this + o(n) overhead. Key insight: can compress data and query it without full decompression.

[illustrate: Space comparison: uncompressed vs. compressed vs. succinct; operation examples showing how queries work on compressed representation]

How it works

  1. Information-theoretic bounds:

    • n items from universe U require ≥ log(C(|U|, n)) bits
    • Example: bitmaps of n set bits from 2^m positions need ≥ log(C(2^m, n)) bits
  2. Succinct representation:

    • Use asymptotically optimal space: log(C(|U|, n)) + o(n) bits
    • Support rank/select queries in O(log n) or O(1)
  3. Trade-off: Time complexity increases vs. uncompressed structures

Example

Uncompressed bitmap: 1,000,000 bits
Information-theoretic lower bound: ~500,000 bits (if only 500k bits set)

Succinct bitmap: ~500,500 bits (near-optimal + small overhead)
Query "rank(position)" (count 1s up to position): O(log n) or O(1)
Query "select(k)" (find k-th 1): O(log n)

# Memory savings: 2–10x compression while maintaining fast operations

Variants and history

Succinct structures emerged in algorithmics (1990s). Jacobson’s succinct bitmaps (1989) pioneered this. Wavelet trees and FM-indexes build on succinct principles. Modern compressors (gzip, bzip2) achieve compression but require decompression; succinct structures compress AND support queries. Applications: compressed IR indexes, bioinformatics (DNA sequence search), graph compression.

When to use it

Use succinct structures when:

  • Space is extremely limited (billions of items)
  • Query speed on compressed data is important
  • Information-theoretic bounds are relevant
  • Index/structure creation is offline (queries online)
  • Examples: full-text indexes (FM-index), compressed graphs

Succinct structures are sophisticated; practical tradeoffs often favor simpler compression (gzip) + decompression unless space is critical.

See also