Finite State Transducer
What it is
A Finite State Transducer (FST) is a data structure mapping input strings to output values via a directed acyclic graph of states and transitions. It generalizes a trie by sharing not just common prefixes but also common suffixes, achieving dramatically lower memory usage — roughly 8 bytes per entry versus 100+ bytes for a hash map on million-entry dictionaries. Lookup is O(m) where m is the key length, independent of dictionary size.
[illustrate: FST state diagram showing transitions labeled with input/output; example mapping words to IDs]
FSA, DAWG, and FST: the hierarchy
These three terms are closely related and often confused:
- FSA (Finite State Automaton) — maps strings to boolean: a key is either a member of the set or it isn’t. Also called a DAWG (Directed Acyclic Word Graph) when built for dictionary compression. The BurntSushi
fstcrate calls this typeSet. Use it for membership testing: spell-check dictionaries, stopword filters, blocklists. - FST (Finite State Transducer) — maps strings to values (integers, offsets, strings). The
fstcrate calls this typeMap. Use it when you need to look up a term and retrieve an associated value, like a posting-list offset in a search index.
An FSA is a degenerate FST where all output values are 1. In practice, if you only need membership testing, prefer the FSA: it’s smaller because it stores no output at all.
How it works
Structure
An FST is a directed acyclic graph where each edge carries an input label and an output value (often an integer). A path from the start state to an accepting state spells out an input key; the output values accumulated along that path sum to the key’s associated value.
Shared suffix states — the key compression insight
A trie shares prefixes but not suffixes. An FST shares both. Consider the four keys cat, cats, dog, dogs:
Trie (suffixes NOT shared):
c → a → t → [accept]
↓
s → [accept]
d → o → g → [accept]
↓
s → [accept]
FST (both "t" suffix path AND "s" suffix state are shared):
c → a ↘
t → [accept]
d → o ↗ ↓
s → [accept]
The single s state is shared by both cats and dogs. This is what makes an FST more compact than a trie: structurally identical suffixes collapse into a single state rather than being duplicated for every prefix that leads to them.
Output accumulation (Mealy machine semantics)
FSTs are typically implemented as Mealy machines — outputs live on transitions, not on states. During a lookup you follow transitions and accumulate (sum) the output values encountered along the path.
This enables additional compression. If cat → 100 and cats → 101, the FST doesn’t store 100 and 101 separately at their final states. Instead it stores 100 on the shared path to cat and 1 on the s transition. Any key that ends with s only needs to record the delta from its prefix, not the full value.
Transition outputs for the cat/cats/dog/dogs example:
c/0 → a/0 → t/100 → [accept, output=0]
↓
s/1 → [accept, output=0]
d/0 → o/0 → g/134 → [accept, output=0]
↑ (same s state reused)
Lookup "cats": accumulate 0 + 0 + 100 + 1 = 101 ✓
Lookup "dogs": accumulate 0 + 0 + 134 + 1 = 235 ✓
The alternative — Moore machine, where outputs are on states — would require storing a separate value at every accepting state even when values differ only slightly, eliminating much of the compression benefit.
Construction
Input must be sorted. This is a hard requirement, not an implementation detail. The incremental minimization algorithm (Daciuk et al., 2000) processes keys one at a time and freezes states as soon as it knows no future key can revisit them. That guarantee holds only when input arrives in lexicographic order. Passing unsorted input produces either a crash or a silently incorrect automaton depending on the implementation.
In practice this means: sort your (key, value) pairs before calling the builder, or pipe a pre-sorted source (e.g., a sorted file from disk) directly into the builder without buffering the whole thing in memory.
Lookup
Follow transitions matching each character of the query, accumulating output values as you go. If you reach an accepting state after consuming all input, the query is a hit and the accumulated sum is the value. Total cost: O(m) where m is query length. Dictionary size has no effect on lookup time.
Example — term dictionary
Term-to-ID mapping:
"cat" → 100
"cats" → 101
"dog" → 234
"dogs" → 235
FST structure (conceptual):
start --c/0--> s1 --a/0--> s2 --t/100--> s3 (accept)
|
s/1 |
v
start --d/0--> s4 --o/0--> s5 --g/134--> s3 (same accept)
|
(same s state)
Memory: ~8 bytes per entry
Lookup: O(m), independent of dictionary size
The accepting state after s is the same node for both cats and dogs — the suffix is physically shared in memory, not copied.
Levenshtein automata intersection
One of the most powerful and least obvious FST capabilities: fuzzy matching without scanning every entry.
A Levenshtein automaton is an FSA that accepts all strings within edit distance k of a query string. It can be constructed in O(|query| · k) time. You then intersect this automaton with your dictionary FST. The intersection traverses only states reachable by both automata simultaneously, pruning any branch where no string within edit distance k can be completed. Result: all dictionary keys within edit distance k of the query, in O(k · m) time, independent of dictionary size.
This is how tantivy’s fuzzy search works, and it’s why fuzzy autocomplete on large dictionaries can be fast. The fst crate exposes this via the levenshtein module:
use fst::{Set, IntoStreamer, Streamer};
use fst::automaton::Levenshtein;
let keys = vec!["cat", "cats", "dog", "dogs", "dot"];
let set = Set::from_iter(keys).unwrap();
// Find all keys within edit distance 1 of "cot"
let lev = Levenshtein::new("cot", 1).unwrap();
let mut stream = set.search(lev).into_stream();
while let Some(key) = stream.next() {
println!("{}", std::str::from_utf8(key).unwrap());
}
// Prints: cat, cot (if present), dot
The same intersection mechanism works with any automaton — prefix automata for autocomplete, regex automata for pattern matching.
Implementation: BurntSushi fst crate (Rust)
The fst crate is the reference FST implementation outside of Lucene. It provides both Set (FSA) and Map (FST) types, supports mmap for zero-deserialization-cost querying, and handles range queries and prefix iteration natively.
Key design constraints:
- Input must be sorted — the builder panics if keys arrive out of order.
- The builder streams entries to avoid loading everything into memory — you can build from a sorted file without buffering.
- The binary format is designed for
mmap: once written to disk, the file can be memory-mapped and queried without any deserialization step. The full dictionary lives in the OS page cache, not in your process heap. - Range queries and prefix iteration are first-class, not afterthoughts. This is impossible with a hash map.
use fst::{MapBuilder, Map, IntoStreamer, Streamer};
// Input MUST be sorted
let mut build = MapBuilder::memory();
build.insert("cat", 100).unwrap();
build.insert("cats", 101).unwrap();
build.insert("dog", 234).unwrap();
build.insert("dogs", 235).unwrap();
let bytes = build.into_inner().unwrap();
let map = Map::new(bytes).unwrap();
// Exact lookup
assert_eq!(map.get("cat"), Some(100));
assert_eq!(map.get("dogs"), Some(235));
assert_eq!(map.get("cow"), None);
// Range query — returns all keys between "cat" and "dog" inclusive
// Impossible with a hash map; trivial with an FST
let mut stream = map.range().ge("cat").le("dog").into_stream();
while let Some((key, val)) = stream.next() {
println!("{}: {}", std::str::from_utf8(key).unwrap(), val);
}
// cat: 100
// cats: 101
// dog: 234
// Prefix query
let mut stream = map.range().ge("cat").lt("cau").into_stream();
// Returns: cat, cats
The canonical deep-dive is Andrew Gallant’s blog post “Index 1,600,000,000 Keys with Automata and Rust”, which explains the design decisions behind the crate in detail.
Variants and history
FSTs formalize Mealy/Moore machine theory (Rabin & Scott, 1959; Mealy, 1955). Daciuk et al. (2000) introduced the incremental minimization algorithm that makes construction from sorted input practical.
The decisive application to information retrieval was Lucene 4.0 (2012), which replaced its B-tree term dictionary with an FST-based structure. The paper “Finite-state transducers in the Apache Lucene search engine” (Białecki, Muir, Targosz-Korecka) describes this change — it’s why modern search engines have such compact on-disk indexes and why Lucene’s term dictionary scales to billions of terms.
OpenFST (Allauzen et al., 2007) generalized FSTs to semiring-weighted tropical operations, enabling applications in speech recognition, phonetic conversion, and morphological analysis beyond what a simple integer-output transducer supports.
DAWG (Directed Acyclic Word Graph) is the FSA (set) variant, used for dictionary compression and spell-checking since the 1980s.
When to use it
Use an FST when:
- Memory efficiency is critical and keys are strings (millions of terms, mobile/embedded systems)
- You need O(m) lookup independent of dictionary size
- You need ordered iteration, prefix queries, or range queries — a hash map cannot provide these
- The key set is static or rebuilt infrequently (nightly index builds, shipped binaries)
- You want the dictionary to live on disk and be mmap-ed with zero deserialization
When not to use it:
- Key set is small — a hash map is simpler, fast enough, and mutable
- Keys aren’t strings — numeric keys want a B-tree or sorted array with binary search
- You need mutable updates — FSTs are immutable; adding a single key means rebuilding the entire structure
- You need substring search, not prefix search — an inverted index is the right tool there
- You need the fastest possible point lookups on hot paths — a hash map’s O(1) beats O(m) when m is large and the dictionary fits in memory anyway
Comparison with related structures
| Structure | Lookup | Space | Mutable | Ordered iteration |
|---|---|---|---|---|
| Hash map | O(1) avg | High | Yes | No |
| B-tree | O(log n) | Medium | Yes | Yes |
| Trie | O(m) | Medium | Yes | Yes |
| DAWG / FSA | O(m) | Very low | No | Yes |
| FST / Map | O(m) | Very low | No | Yes |
The immutability constraint is the most important practical difference. Engineers familiar with hash maps or B-trees often assume they can insert into an FST after construction — they cannot. Design your pipeline around a rebuild-on-write model (build offline, query online) rather than incremental updates.