Succinct Data Structure
What it is
Succinct data structures represent data in space close to information-theoretic lower bounds while supporting efficient queries. For example, a set of n items from universe U requires at least log_2(|U| choose n) bits; succinct structures achieve this + o(n) overhead. Key insight: can compress data and query it without full decompression.
[illustrate: Space comparison: uncompressed vs. compressed vs. succinct; operation examples showing how queries work on compressed representation]
How it works
-
Information-theoretic bounds:
- n items from universe U require ≥ log(C(|U|, n)) bits
- Example: bitmaps of n set bits from 2^m positions need ≥ log(C(2^m, n)) bits
-
Succinct representation:
- Use asymptotically optimal space: log(C(|U|, n)) + o(n) bits
- Support rank/select queries in O(log n) or O(1)
-
Trade-off: Time complexity increases vs. uncompressed structures
Example
Uncompressed bitmap: 1,000,000 bits
Information-theoretic lower bound: ~500,000 bits (if only 500k bits set)
Succinct bitmap: ~500,500 bits (near-optimal + small overhead)
Query "rank(position)" (count 1s up to position): O(log n) or O(1)
Query "select(k)" (find k-th 1): O(log n)
# Memory savings: 2–10x compression while maintaining fast operations
Variants and history
Succinct structures emerged in algorithmics (1990s). Jacobson’s succinct bitmaps (1989) pioneered this. Wavelet trees and FM-indexes build on succinct principles. Modern compressors (gzip, bzip2) achieve compression but require decompression; succinct structures compress AND support queries. Applications: compressed IR indexes, bioinformatics (DNA sequence search), graph compression.
When to use it
Use succinct structures when:
- Space is extremely limited (billions of items)
- Query speed on compressed data is important
- Information-theoretic bounds are relevant
- Index/structure creation is offline (queries online)
- Examples: full-text indexes (FM-index), compressed graphs
Succinct structures are sophisticated; practical tradeoffs often favor simpler compression (gzip) + decompression unless space is critical.