Wavelet Tree

What it is

A wavelet tree is a succinct data structure supporting efficient rank and select queries on sequences. It recursively partitions alphabet using a binary tree of bitmaps, enabling compression (near-optimal space) with fast queries O(log σ) where σ = alphabet size. Wavelet trees underpin FM-indexes and enable compressed full-text search.

[illustrate: Wavelet tree structure showing binary partitioning of alphabet; path from root to leaf showing character encoding]

How it works

  1. Construction:

    • Partition alphabet into two halves recursively
    • At each level, create bitmap (1 if character in upper half)
    • Recursively build subtrees
    • Height: log σ where σ = alphabet size
  2. Rank query rank_c(position):

    • Count occurrences of character c up to position
    • Navigate tree: follow path determined by c’s binary encoding
    • O(log σ) time
  3. Select query select_c(k):

    • Find position of k-th occurrence of character c
    • Use binary search + rank queries
    • O(log σ × log n) time

Example

Sequence: "abracadabra" (11 characters)
Alphabet: {a, b, c, d, r} (5 symbols)

Wavelet tree (binary partitioning):
Level 0: {a, b, c} vs {d, r}
  bitmap: 11010101010 (a,b,c are in lower half)

Level 1a ({a,b,c}): {a} vs {b,c}
  bitmap: 10100010100

Level 1b ({d,r}): {d} vs {r}
  bitmap: 10

rank_a(position 5) = count('a' in first 5 chars)
  Navigate: 'a' → lower at level 0, left at level 1a
  Follow path, count 1s in relevant bitmaps
  Answer: 3

Space: ~|seq| × log σ bits (highly compressible)

Variants and history

Wavelet trees invented by Grossi et al. (2003) for compressed text indexing. Huffman-shaped wavelet trees reduce space further by using optimal-length codes. Multiary wavelet trees generalize binary structure. Foundation of FM-indexes for full-text search without decompression. Modern applications: compressed IR, genomic sequence indexing, graph compression.

When to use it

Use wavelet trees when:

  • Need fast rank/select on large sequences
  • Space efficiency critical
  • Full-text search on compressed data
  • Building FM-indexes
  • Alphabet size moderate (≤ 256 typical)

Wavelet trees are sophisticated but well-implemented in libraries (Succinct, SDSL). Trade-off: faster than decompressing entire data, slower than uncompressed arrays.

See also