Named Entity Recognition
What it is
Named Entity Recognition (NER) is a sequence labeling task that identifies and classifies named entities (proper nouns) into categories like PERSON, LOCATION, ORGANIZATION, DATE, etc. NER is fundamental to information extraction, knowledge base construction, and question-answering systems.
[illustrate: Text with named entities highlighted and tagged (PER, LOC, ORG); example “John Smith works at Google in San Francisco” with tagging]
How it works
- Input: Text sequence (sentence or paragraph)
- Tagging scheme: Common schemes include:
  - BIO (also called IOB): Begin-Inside-Outside; e.g., "Steve(B-PER) Jobs(I-PER)" for a multi-token entity
  - IOB2: Variant of IOB in which every entity starts with a B- tag, making boundaries explicit
  - BIOES: Begin-Inside-Outside-End-Single (adds end and single-token markers)
- Model: Typically a transformer encoder such as BERT, often with a CRF (Conditional Random Field) layer on top:
  - Encode tokens with the transformer
  - Classify each token's label
  - Decode the tag sequence to extract entity spans
- Output: List of (entity, type, span) tuples
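The final decoding step above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the tokens and tags are taken from the example sentence in the illustration note, and treating a stray I- tag as a begin is one common robustness choice for noisy model output.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type, (start, end))
    tuples. `start`/`end` are token indices; `end` is exclusive."""
    entities, start, ent_type = [], None, None

    def close(end):
        nonlocal start, ent_type
        if start is not None:
            entities.append((" ".join(tokens[start:end]), ent_type, (start, end)))
        start, ent_type = None, None

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
        elif tag.startswith("B-") or tag[2:] != ent_type:
            # New entity: an explicit B- tag, or an I- tag whose type does not
            # continue the open entity (treated as a begin for robustness).
            close(i)
            start, ent_type = i, tag[2:]
        # otherwise: an I- tag continuing the open entity, nothing to do
    close(len(tokens))
    return entities


tokens = ["John", "Smith", "works", "at", "Google", "in", "San", "Francisco"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, tags))
# → [('John Smith', 'PER', (0, 2)), ('Google', 'ORG', (4, 5)), ('San Francisco', 'LOC', (6, 8))]
```

In practice a CRF decoder (or a constrained Viterbi pass) does this jointly with scoring, which prevents invalid transitions like O followed by I-PER from ever being emitted.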
Example
Text: "Apple Inc. was founded by Steve Jobs in Cupertino, California."
BIO tagging:
Apple(B-ORG) Inc.(I-ORG) was founded by Steve(B-PER) Jobs(I-PER)
in Cupertino(B-LOC) ,(I-LOC) California(I-LOC) .
Extracted entities:
- Apple Inc. (ORGANIZATION)
- Steve Jobs (PERSON)
- Cupertino, California (LOCATION)
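Under the BIOES scheme mentioned earlier, the same tags gain explicit end (E-) and single-token (S-) markers. A minimal sketch of the BIO-to-BIOES conversion, applied to this example's tag sequence:

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES: the last token of a multi-token
    entity becomes E-, and a single-token entity becomes S-."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = nxt.startswith("I-") and nxt[2:] == tag[2:]
        if tag.startswith("B-"):
            out.append(tag if entity_continues else "S-" + tag[2:])
        else:  # I- tag: keep as I- mid-entity, E- at the entity's end
            out.append(tag if entity_continues else "E-" + tag[2:])
    return out


# Tags for: Apple Inc. was founded by Steve Jobs in Cupertino , California .
bio = ["B-ORG", "I-ORG", "O", "O", "O", "B-PER", "I-PER",
       "O", "B-LOC", "I-LOC", "I-LOC", "O"]
print(bio_to_bioes(bio))
# → ['B-ORG', 'E-ORG', 'O', 'O', 'O', 'B-PER', 'E-PER', 'O', 'B-LOC', 'I-LOC', 'E-LOC', 'O']
```

The extra markers give the tagger a sharper boundary signal, which is why BIOES sometimes edges out BIO on span-exact metrics.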
Variants and history
NER emerged in the 1990s using rule-based patterns and HMMs. CRF models, introduced in 2001, became the standard statistical approach. Neural sequence labeling (BiLSTM-CRF, 2015+) removed the need for hand-crafted features. Fine-tuning BERT for NER (2018+) achieved state-of-the-art results on standard benchmarks. Modern variants include cross-domain NER (transfer learning), zero-shot NER (generalizing to unseen entity types), and nested NER (entities containing entities).
When to use it
Use NER for:
- Information extraction from documents
- Knowledge graph construction
- Question-answering systems
- Text summarization and topic modeling
- Enterprise search and indexing
- Data cleaning and enrichment
NER accuracy varies by domain; medical and scientific text often requires specialized models. Pre-trained models (e.g., BERT fine-tuned for NER) transfer well, but further fine-tuning on in-domain data typically helps.
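When checking whether a model holds up in a new domain, NER is conventionally scored with entity-level precision, recall, and F1: a prediction counts only if both the type and the exact span match the gold annotation. A minimal sketch, with illustrative entity sets:

```python
def entity_f1(gold, predicted):
    """Precision, recall, F1 over sets of (type, start, end) entity tuples.
    An entity is correct only on an exact type-and-span match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: exact matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = {("PER", 0, 2), ("ORG", 4, 5), ("LOC", 6, 8)}
pred = {("PER", 0, 2), ("LOC", 6, 7)}  # boundary error on the LOC span
print(entity_f1(gold, pred))  # precision=0.5, recall≈0.33, f1≈0.4
```

Note how the boundary error on the LOC span scores zero despite being nearly right; this strictness is why span-exact F1 drops sharply on out-of-domain text.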