NLP Citations

The algorithms behind search and language — explained visually

NLP and information retrieval are full of ideas that are easy to state but hard to hold in your head. This reference covers the terms, techniques, and papers that practising engineers actually encounter — with step-through animations built for the moment an algorithm finally clicks.

200+ citations
40+ visual explanations
8 topic areas

An inverted index merge, a dynamic programming traceback, a sliding n-gram window — each one clicks into place the moment you see it move. Every concept that benefits from animation has a step-through you can pause, rewind, and replay.

Written for engineers building search systems, working with language models, or filling the gaps that most NLP courses skip over.


Postings list merge

An inverted index stores, for each term, a sorted list of the documents it appears in, together with positions when phrase search is needed. Executing a query means walking two or more of those lists simultaneously, always advancing whichever pointer is behind. Step through the animation to see how an AND merge resolves a two-word phrase query. The two cursors leapfrog through the postings until they land on the same document. The position lists for that document are then compared, and a match is emitted only when the second term appears at the position immediately after the first.
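The merge described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each postings list is held in memory as a list of (doc_id, positions) pairs sorted by document ID. The function name phrase_match and that data layout are hypothetical and do not reflect the on-disk format of any particular search engine.

```python
from typing import List, Tuple

# Hypothetical in-memory layout: (doc_id, sorted token positions), sorted by doc_id.
Postings = List[Tuple[int, List[int]]]


def phrase_match(first: Postings, second: Postings) -> List[Tuple[int, int]]:
    """Return (doc_id, position) pairs where `second` occurs right after `first`."""
    matches: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(first) and j < len(second):
        doc_a, pos_a = first[i]
        doc_b, pos_b = second[j]
        if doc_a < doc_b:
            i += 1  # first cursor is behind: advance it
        elif doc_b < doc_a:
            j += 1  # second cursor is behind: advance it
        else:
            # Same document: merge the position lists looking for p followed by p + 1.
            p = q = 0
            while p < len(pos_a) and q < len(pos_b):
                target = pos_a[p] + 1
                if pos_b[q] < target:
                    q += 1
                elif pos_b[q] > target:
                    p += 1
                else:
                    matches.append((doc_a, pos_a[p]))
                    p += 1
                    q += 1
            i += 1
            j += 1
    return matches


new = [(1, [0, 5]), (3, [2]), (4, [7])]
york = [(1, [1]), (2, [0]), (4, [3])]
print(phrase_match(new, york))  # [(1, 0)]
```

Document 4 contains both terms but not adjacently, so it passes the document-level AND step and is rejected only by the position check.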

Sliding window tokenisation

Many NLP tasks, such as fuzzy matching, near-duplicate detection, and language identification, rely on breaking text into fixed-size character or word sequences (n-grams). A window of width n moves one step at a time across the input, and every position where the window fits produces one token, so a sequence of length L yields L - n + 1 n-grams. Step through the animation to watch each trigram peel off the word “colour” and see how overlapping windows capture every local context.
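A character-level sliding window fits in a single list comprehension. The sketch below is an illustration, not the page's own implementation. char_ngrams and jaccard are hypothetical helper names, and Jaccard similarity over n-gram sets is just one common way to turn overlapping windows into a fuzzy-match score.

```python
from typing import List, Sequence


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Slide a window of width n across text one character at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string of length L has L - n + 1 window positions (none if L < n).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity of two n-gram collections, treated as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


print(char_ngrams("colour"))                                 # ['col', 'olo', 'lou', 'our']
print(jaccard(char_ngrams("colour"), char_ngrams("color")))  # 0.4
```

The British and American spellings share two of five distinct trigrams. That partial overlap is what an n-gram-based fuzzy matcher exploits.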

Levenshtein edit distance

Measuring how similar two strings are, whether for spell correction, record linkage, or fuzzy search, comes down to finding the cheapest sequence of single-character edits (insertions, deletions, and substitutions) that transforms one into the other. The Levenshtein algorithm fills a dynamic programming matrix one cell at a time. Cell (i, j) records the minimum cost of turning the first i characters of one string into the first j characters of the other, computed from its left, upper, and upper-left neighbours, and the bottom-right cell holds the final edit distance. Step through to watch the matrix fill, then follow the highlighted traceback path from the bottom-right cell back to the origin to read off exactly which insertions, deletions, and substitutions were chosen.
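The recurrence and traceback can be written out directly. This Python sketch assumes unit cost for every insertion, deletion, and substitution, which is the standard Levenshtein setting. The function name levenshtein and the (kind, source_char, target_char) tuple format for the edit script are illustrative choices.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (kind, char from a, char from b); kind is keep/sub/del/ins


def levenshtein(a: str, b: str) -> Tuple[int, List[Op]]:
    """Return the edit distance and one cheapest edit script turning a into b."""
    m, n = len(a), len(b)
    # d[i][j] = minimum cost to turn a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion (from the cell above)
                d[i][j - 1] + 1,         # insertion (from the cell to the left)
                d[i - 1][j - 1] + cost,  # match or substitution (diagonal)
            )

    # Traceback from the bottom-right corner to the origin.
    ops: List[Op] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            kind = "keep" if a[i - 1] == b[j - 1] else "sub"
            ops.append((kind, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", b[j - 1]))
            j -= 1
    ops.reverse()
    return d[m][n], ops


print(levenshtein("kitten", "sitting")[0])  # 3
```

When several edit scripts share the minimum cost, this traceback prefers the diagonal move and returns just one of them. The animation may highlight a different path of the same cost.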

Recent Publications