Part-of-Speech Tagging

What it is

Part-of-speech (POS) tagging is a sequence labeling task that assigns each token in a text a grammatical category (noun, verb, adjective, pronoun, etc.). The tags feed downstream tasks such as parsing, named entity recognition, and linguistic analysis. Most modern POS taggers are neural, typically built on a pretrained Transformer encoder such as BERT, and reach 97% or higher token accuracy on standard English benchmarks such as the Penn Treebank.

[illustrate: Text with POS tags above each token: “The/DT cat/NN sat/VBD on/IN the/DT mat/NN”]

How it works

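  1. Tag sets: Standard schemes include the Penn Treebank tag set (48 tags: 36 for words plus 12 for punctuation and symbols) and Universal POS (17 coarse tags)

    • Universal POS tags include NOUN, VERB, ADJ, ADV, PRON, DET, ADP, CCONJ, SCONJ, PUNCT, etc.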
  2. Tagging approaches:

    • Rule-based: Hand-crafted rules (rare now)
    • Statistical: HMM, CRF with hand-engineered features
    • Neural: BiLSTM or Transformer fine-tuned for tagging
  3. Modern approach:

    • Encode tokens with BERT or similar
    • Classify each token’s POS tag
    • Often combined with other tasks (NER, lemmatization)
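
To make the statistical approach listed above more concrete, here is a minimal sketch of a first-order HMM tagger decoded with the Viterbi algorithm. Everything in it is an assumption made for illustration: the three-tag inventory, the probabilities (invented, not estimated from any corpus), and the names `viterbi`, `START`, `TRANS`, `EMIT`, and `UNSEEN`. It only aims to show how the surrounding tags influence the choice for an ambiguous word.

```python
import math

# Toy model: every probability below is invented for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # TRANS[prev][cur] = P(cur tag | prev tag)
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag)
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.4, "dogs": 0.3, "walk": 0.2},
    "VERB": {"walk": 0.5, "dogs": 0.1},
}
UNSEEN = 1e-6  # small fallback probability for unseen (tag, word) pairs


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, UNSEEN))

    # scores[i][t]: best log-probability of any tag sequence ending in t at i
    scores = [{t: math.log(START[t]) + emit(t, words[0]) for t in TAGS}]
    back = []  # back[i][t]: best previous tag for t at position i + 1
    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            cur[t] = prev[best] + math.log(TRANS[best][t]) + emit(t, word)
            ptr[t] = best
        scores.append(cur)
        back.append(ptr)

    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "walk"]))   # "walk" follows a determiner
print(viterbi(["dogs", "walk"]))  # "walk" follows a noun
```

With these toy numbers, "walk" comes out as NOUN after "the" and as VERB after "dogs". A neural tagger replaces the hand-set tables with learned contextual representations, but the goal of scoring tags in context is the same.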

Example

Sentence: "The quick brown fox jumps over the lazy dog"

POS tags (Universal POS):
The/DET quick/ADJ brown/ADJ fox/NOUN jumps/VERB over/ADP the/DET lazy/ADJ dog/NOUN

Penn Treebank (more fine-grained):
The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ dog/NN
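
The two annotations above differ only in granularity, which a small conversion sketch can show. The mapping below is hand-written and deliberately partial: it covers only the Penn Treebank tags that occur in this sentence, and the names `PTB_TO_UPOS` and `to_universal` are hypothetical. Real conversions can be context-dependent (for example, Penn Treebank IN may correspond to ADP or SCONJ), so this should be read as an illustration rather than a complete converter.

```python
# Partial, illustrative mapping: only the Penn Treebank tags used in the example.
PTB_TO_UPOS = {
    "DT": "DET",
    "JJ": "ADJ",
    "NN": "NOUN",
    "VBZ": "VERB",
    "IN": "ADP",  # assumption: prepositional use; IN can also map to SCONJ
}


def to_universal(tagged, default="X"):
    """Convert a 'word/PTB' string to 'word/UPOS'; unknown tags become `default`."""
    out = []
    for token in tagged.split():
        # rpartition splits on the last "/", so words containing "/" survive
        word, _, tag = token.rpartition("/")
        out.append(f"{word}/{PTB_TO_UPOS.get(tag, default)}")
    return " ".join(out)


ptb = "The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ dog/NN"
print(to_universal(ptb))
```

Running this on the Penn Treebank line reproduces the Universal POS line from the example.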

Variants and history

POS tagging dates to the 1960s, when the first systems were rule-based. HMM taggers (1980s–90s) introduced probabilistic modeling. CRF models (2000s) improved on them with discriminative structured prediction over rich features. Neural taggers, first BiLSTM-based (mid-2010s) and later BERT-based (2018+), reached near-human accuracy. Because the same word form can take different tags depending on context (“bank” as a noun vs. a verb), bidirectional context is crucial.

When to use it

Use POS tagging for:

  • Parsing and syntax analysis
  • Lemmatization, where the tag helps select the correct lemma
  • Named entity recognition
  • Information extraction
  • Text analysis and corpus linguistics
  • Language learning systems

POS tagging is typically a preprocessing step rather than an end task. Many modern pipelines predict POS tags jointly with related annotations (such as NER labels and lemmas) from a shared encoder, for efficiency.
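
As one illustration of POS tags serving a downstream step, the sketch below shows tag-aware lemmatization, one of the uses listed above. The lexicon `LEMMAS` and the function `lemmatize` are hypothetical and cover only a few hand-picked entries; a real lemmatizer would rely on a full lexicon or a learned model.

```python
# Hypothetical toy lexicon keyed by (lowercased word, Universal POS tag).
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
}


def lemmatize(tagged_tokens):
    """Map (word, tag) pairs to lemmas; fall back to the lowercased word."""
    return [LEMMAS.get((word.lower(), tag), word.lower())
            for word, tag in tagged_tokens]


sentence = [("She", "PRON"), ("saw", "VERB"), ("the", "DET"), ("leaves", "NOUN")]
print(lemmatize(sentence))  # ['she', 'see', 'the', 'leaf']
```

Without the tag, "saw" and "leaves" would each have two candidate lemmas; the tag is what makes the lookup unambiguous.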

See also