Named Entity Recognition

What it is

Named Entity Recognition (NER) is a sequence labeling task that identifies mentions of named entities (proper names and related expressions such as dates) and classifies them into categories like PERSON, LOCATION, ORGANIZATION, and DATE. NER is fundamental to information extraction, knowledge base construction, and question-answering systems.

[illustrate: Text with named entities highlighted and tagged (PER, LOC, ORG); example “John Smith works at Google in San Francisco” with tagging]

How it works

  1. Input: Text sequence (sentence or paragraph)

  2. Tagging scheme: Common schemes include:

    • BIO: Begin-Inside-Outside
      • “John/B-PER Smith/I-PER works/O at/O Google/B-ORG” shows how multi-token entities are marked
    • BIOES: Begin-Inside-Outside-End-Single (adds explicit end and single-token markers)
    • IOB1: older variant that uses B- only to split adjacent same-type entities; the BIO scheme above is also called IOB2
  3. Model: Typically a transformer encoder (e.g., BERT) with a per-token classification head, often with a CRF (Conditional Random Field) layer on top:

    • Encode tokens with the transformer
    • Classify each token’s label
    • Decode the tag sequence to extract spans
  4. Output: List of (entity, type, span) tuples (see the sketches after this list)
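
To make step 3 concrete, here is a minimal sketch of the encode-and-classify stage using the Hugging Face transformers library. The checkpoint name dslim/bert-base-NER is an assumption (one publicly available BERT NER model); any token-classification checkpoint works the same way.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "dslim/bert-base-NER"  # assumed checkpoint; swap in any NER model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)

enc = tokenizer("John Smith works at Google in San Francisco",
                return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits       # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0]    # greedy per-token label choice

for tok, pid in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]),
                    pred_ids):
    print(f"{tok:15} {model.config.id2label[pid.item()]}")
```

Note that greedy argmax decoding ignores dependencies between adjacent labels; a CRF layer would instead score and pick the best tag sequence as a whole.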

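Step 4 is a small piece of standalone logic: walking the tag sequence and grouping B-/I- runs into spans. A sketch of BIO decoding (the tokens and tags are illustrative):

```python
def decode_bio(tokens, tags):
    """Turn per-token BIO tags into (entity, type, (start, end)) tuples."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):                 # a new entity begins
            if start is not None:
                spans.append((" ".join(tokens[start:i]), etype, (start, i)))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            continue                             # current entity continues
        else:                                    # "O" or an inconsistent I- tag
            if start is not None:
                spans.append((" ".join(tokens[start:i]), etype, (start, i)))
            start, etype = None, None
    if start is not None:                        # flush an entity at sentence end
        spans.append((" ".join(tokens[start:]), etype, (start, len(tokens))))
    return spans

tokens = ["John", "Smith", "works", "at", "Google", "."]
tags   = ["B-PER", "I-PER", "O", "O", "B-ORG", "O"]
print(decode_bio(tokens, tags))
# [('John Smith', 'PER', (0, 2)), ('Google', 'ORG', (4, 5))]
```
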
Example

Text: "Apple Inc. was founded by Steve Jobs in Cupertino, California."

BIO tagging:
Apple(B-ORG) Inc.(I-ORG) was(O) founded(O) by(O) Steve(B-PER) Jobs(I-PER)
in(O) Cupertino(B-LOC) ,(I-LOC) California(I-LOC) .(O)

Extracted entities:
- Apple Inc. (ORGANIZATION)
- Steve Jobs (PERSON)
- Cupertino, California (LOCATION)
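
In practice, a pretrained pipeline reproduces this extraction in a few lines. Below is a sketch using Hugging Face transformers, with the same assumed checkpoint as above; note that off-the-shelf CoNLL-trained models typically return Cupertino and California as two separate LOC spans rather than one combined span.

```python
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",      # assumed checkpoint
               aggregation_strategy="simple")    # merge B-/I- pieces into spans

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
for ent in ner(text):
    print(f"{ent['word']:20} {ent['entity_group']:5} "
          f"({ent['start']}-{ent['end']}, score={ent['score']:.2f})")
```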

Variants and history

NER emerged in the 1990s using rule-based patterns and HMMs. CRF models became the standard approach through the early-to-mid 2000s. Neural sequence labeling (BiLSTM-CRF, mid-2010s) removed the need for hand-crafted features. Fine-tuning BERT for NER (2018+) achieved state-of-the-art results. Modern variants include cross-domain NER (transfer learning), zero-shot NER (generalizing to unseen entity types), and nested NER (entities containing entities).

When to use it

Use NER for:

  • Information extraction from documents
  • Knowledge graph construction
  • Question-answering systems
  • Text summarization and topic modeling
  • Enterprise search and indexing
  • Data cleaning and enrichment

NER accuracy varies by domain; medical and scientific text often requires specialized models. Pre-trained models (e.g., BERT-based NER) transfer well, but fine-tuning on domain data typically helps.
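
The fiddly part of fine-tuning is aligning word-level BIO labels with subword tokenization. Below is a minimal sketch, assuming a CoNLL-style word/label pairing (the sentence and labels are illustrative) and the standard bert-base-cased tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# One pre-tokenized training sentence with word-level BIO labels
# (illustrative data, not from a real corpus).
words  = ["Cupertino", "is", "in", "California"]
labels = ["B-LOC", "O", "O", "B-LOC"]

enc = tokenizer(words, is_split_into_words=True)

# Spread word labels over subword pieces; in real training these become
# label IDs, with -100 as the loss-ignore index.
aligned, prev = [], None
for word_id in enc.word_ids():
    if word_id is None:
        aligned.append(-100)             # [CLS]/[SEP] are ignored by the loss
    elif word_id != prev:
        aligned.append(labels[word_id])  # first subword carries the word label
    else:
        aligned.append(-100)             # continuation subwords are masked out
    prev = word_id

for tok, lab in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"]), aligned):
    print(f"{tok:12} {lab}")
```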

See also