Chunking
What it is
Chunking (or shallow parsing) groups adjacent tokens into syntactic phrases without building full parse trees. Chunks typically include noun phrases (“the quick brown fox”), verb phrases (“will run”), and prepositional phrases (“over the hill”). Because it is simpler and faster than full parsing, chunking is useful for information extraction and text analysis.
[illustrate: Text with chunks marked: NP (noun phrase), VP (verb phrase), PP (prepositional phrase)]
How it works
- Tag sequences: typically use IOB (also written BIO) encoding, where B- marks the first token of a chunk, I- marks a token inside a chunk, and O marks tokens outside any chunk
  - Bracketed: [NP The quick brown fox] [VP runs] [PP over the hill]
  - IOB: B-NP, I-NP, I-NP, I-NP, B-VP, B-PP, I-PP, I-PP
- Chunk types: common phrase categories
- NP (noun phrase): articles, adjectives, nouns
- VP (verb phrase): verbs, auxiliaries, particles
- PP (prepositional phrase): prepositions + NP
- ADJP (adjective phrase): adjectives
- ADVP (adverbial phrase): adverbs
- Neural chunking: treat chunking as per-token sequence labeling
- Encode tokens with BERT or BiLSTM
- Classify each token’s chunk type
- Decode the IOB tags to recover phrase spans
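The final decoding step is the same whatever model produced the tags: walk the token/tag sequence and close a chunk whenever a B- tag, an O tag, or an inconsistent I- tag appears. A minimal sketch (the `decode_iob` name is ours):

```python
def decode_iob(tokens, tags):
    """Decode parallel token/IOB-tag sequences into (label, phrase) chunks.

    B-X starts a new chunk of type X, I-X continues the open chunk,
    and O (or a mismatched I-) closes the chunk in progress.
    """
    chunks, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:                      # close the open chunk
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)            # extend the open chunk
        else:                                       # O tag or inconsistent I-
            if current_tokens:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:                              # flush the final chunk
        chunks.append((current_label, " ".join(current_tokens)))
    return chunks

tokens = "The quick brown fox runs over the hill quickly".split()
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP",
        "B-PP", "I-PP", "I-PP", "B-ADVP"]
print(decode_iob(tokens, tags))
# [('NP', 'The quick brown fox'), ('VP', 'runs'),
#  ('PP', 'over the hill'), ('ADVP', 'quickly')]
```

Treating a mismatched I- tag as O makes the decoder robust to the occasional ill-formed sequence a classifier can emit.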
Example
Sentence: "The quick brown fox runs over the hill quickly"
Chunks:
[NP The quick brown fox] [VP runs] [PP over the hill] [ADVP quickly]
IOB tagging:
The/B-NP quick/I-NP brown/I-NP fox/I-NP runs/B-VP
over/B-PP the/I-PP hill/I-PP quickly/B-ADVP
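The IOB tagging above follows mechanically from the bracketed chunks: the first token of each chunk gets B-label, the rest get I-label. A minimal sketch (the `chunks_to_iob` helper name is ours):

```python
def chunks_to_iob(chunks):
    """Convert (label, phrase) chunks into parallel token and IOB-tag lists."""
    tokens, tags = [], []
    for label, phrase in chunks:
        for i, token in enumerate(phrase.split()):
            tokens.append(token)
            # First token of the chunk begins it (B-); the rest are inside (I-).
            tags.append(("B-" if i == 0 else "I-") + label)
    return tokens, tags

chunks = [("NP", "The quick brown fox"), ("VP", "runs"),
          ("PP", "over the hill"), ("ADVP", "quickly")]
tokens, tags = chunks_to_iob(chunks)
print(" ".join(f"{tok}/{tag}" for tok, tag in zip(tokens, tags)))
# The/B-NP quick/I-NP brown/I-NP fox/I-NP runs/B-VP
# over/B-PP the/I-PP hill/I-PP quickly/B-ADVP
```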
Variants and history
Chunking emerged in the 1990s as a task intermediate between POS tagging and full parsing. Rule-based chunkers matched patterns over POS tags. Machine learning chunking (SVMs, CRFs, 2000s) improved robustness. Neural chunking (BiLSTM-CRF, BERT-based, 2015+) achieved high accuracy. Modern systems often train chunking jointly with POS tagging or NER for efficiency.
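A rule-based chunker in the POS-pattern style can be sketched in a few lines. The pattern here (optional determiner, any adjectives, one or more nouns, i.e. DT? JJ* NN+) and the `np_chunk` name are illustrative assumptions, not any specific historical system:

```python
def np_chunk(tagged):
    """Greedy rule-based NP chunker over (token, POS-tag) pairs.

    Matches the classic pattern DT? JJ* NN+ : an optional determiner,
    any number of adjectives, then one or more nouns.
    """
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if j < len(tagged) and tagged[j][1] == "DT":      # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:                                         # need at least one noun
            chunks.append(" ".join(tok for tok, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("runs", "VBZ"), ("over", "IN"), ("the", "DT"), ("hill", "NN")]
print(np_chunk(tagged))  # ['The quick brown fox', 'the hill']
```

Production rule-based chunkers (e.g. NLTK's RegexpParser) express such patterns as regular expressions over tag sequences rather than hand-written loops, but the idea is the same.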
When to use it
Use chunking for:
- Quick noun phrase extraction
- Lightweight syntactic analysis
- Information extraction without full parsing
- Text preprocessing for other tasks
- Handling languages where full parsing is expensive
Chunking is efficient and interpretable. For complex syntactic phenomena, full dependency or constituency parsing may be necessary.