Named Entity Recognition
What it is
Named Entity Recognition (NER) is a sequence labeling task that identifies and classifies named entities (proper nouns) into categories like PERSON, LOCATION, ORGANIZATION, DATE, etc. NER is fundamental to information extraction, knowledge base construction, and question-answering systems.
[illustrate: Text with named entities highlighted and tagged (PER, LOC, ORG); example “John Smith works at Google in San Francisco” with tagging]
How it works
- Input: Text sequence (sentence or paragraph)
- Tagging scheme: Common schemes include:
  - BIO (also called IOB): Begin-Inside-Outside; e.g., "Steve(B-PER) Jobs(I-PER)" for a multi-token entity
  - IOB2: Variant of IOB in which every entity starts with a B- tag, making boundaries explicit
  - BIOES: Begin-Inside-Outside-End-Single (adds end and single-token markers)
- Model: Typically a transformer encoder such as BERT, often with a CRF (Conditional Random Field) layer on top:
  - Encode tokens with the transformer
  - Classify each token's label
  - Decode the tag sequence to extract entity spans
- Output: List of (entity, type, span) tuples
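The final decoding step above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the tokens and tags are taken from the example sentence in the illustration note, and treating a stray I- tag as a begin is one common robustness choice for noisy model output.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type, (start, end))
    tuples. `start`/`end` are token indices; `end` is exclusive."""
    entities, start, ent_type = [], None, None

    def close(end):
        nonlocal start, ent_type
        if start is not None:
            entities.append((" ".join(tokens[start:end]), ent_type, (start, end)))
        start, ent_type = None, None

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
        elif tag.startswith("B-") or tag[2:] != ent_type:
            # New entity: an explicit B- tag, or an I- tag whose type does not
            # continue the open entity (treated as a begin for robustness).
            close(i)
            start, ent_type = i, tag[2:]
        # otherwise: an I- tag continuing the open entity, nothing to do
    close(len(tokens))
    return entities


tokens = ["John", "Smith", "works", "at", "Google", "in", "San", "Francisco"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, tags))
# → [('John Smith', 'PER', (0, 2)), ('Google', 'ORG', (4, 5)), ('San Francisco', 'LOC', (6, 8))]
```

In practice a CRF decoder (or a constrained Viterbi pass) does this jointly with scoring, which prevents invalid transitions like O followed by I-PER from ever being emitted.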
Example
Text: "Apple Inc. was founded by Steve Jobs in Cupertino, California."
BIO tagging:
Apple(B-ORG) Inc.(I-ORG) was founded by Steve(B-PER) Jobs(I-PER)
in Cupertino(B-LOC) ,(I-LOC) California(I-LOC) .
Extracted entities:
- Apple Inc. (ORGANIZATION)
- Steve Jobs (PERSON)
- Cupertino, California (LOCATION)
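Under the BIOES scheme mentioned earlier, the same tags gain explicit end (E-) and single-token (S-) markers. A minimal sketch of the BIO-to-BIOES conversion, applied to this example's tag sequence:

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES: the last token of a multi-token
    entity becomes E-, and a single-token entity becomes S-."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = nxt.startswith("I-") and nxt[2:] == tag[2:]
        if tag.startswith("B-"):
            out.append(tag if entity_continues else "S-" + tag[2:])
        else:  # I- tag: keep as I- mid-entity, E- at the entity's end
            out.append(tag if entity_continues else "E-" + tag[2:])
    return out


# Tags for: Apple Inc. was founded by Steve Jobs in Cupertino , California .
bio = ["B-ORG", "I-ORG", "O", "O", "O", "B-PER", "I-PER",
       "O", "B-LOC", "I-LOC", "I-LOC", "O"]
print(bio_to_bioes(bio))
# → ['B-ORG', 'E-ORG', 'O', 'O', 'O', 'B-PER', 'E-PER', 'O', 'B-LOC', 'I-LOC', 'E-LOC', 'O']
```

The extra markers give the tagger a sharper boundary signal, which is why BIOES sometimes edges out BIO on span-exact metrics.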
Variants and history
NER emerged in the 1990s using rule-based patterns and HMMs. CRF models, introduced in 2001, became the standard statistical approach. Neural sequence labeling (BiLSTM-CRF, 2015+) removed the need for hand-crafted features. Fine-tuning BERT for NER (2018+) achieved state-of-the-art results on standard benchmarks. Modern variants include cross-domain NER (transfer learning), zero-shot NER (generalizing to unseen entity types), and nested NER (entities containing entities).
When to use it
Use NER for:
- Information extraction from documents
- Knowledge graph construction
- Question-answering systems
- Text summarization and topic modeling
- Enterprise search and indexing
- Data cleaning and enrichment
NER accuracy varies by domain; medical and scientific text often requires specialized models. Pre-trained models (e.g., BERT fine-tuned for NER) transfer well, but further fine-tuning on in-domain data typically helps.
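When checking whether a model holds up in a new domain, NER is conventionally scored with entity-level precision, recall, and F1: a prediction counts only if both the type and the exact span match the gold annotation. A minimal sketch, with illustrative entity sets:

```python
def entity_f1(gold, predicted):
    """Precision, recall, F1 over sets of (type, start, end) entity tuples.
    An entity is correct only on an exact type-and-span match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: exact matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = {("PER", 0, 2), ("ORG", 4, 5), ("LOC", 6, 8)}
pred = {("PER", 0, 2), ("LOC", 6, 7)}  # boundary error on the LOC span
print(entity_f1(gold, pred))  # precision=0.5, recall≈0.33, f1≈0.4
```

Note how the boundary error on the LOC span scores zero despite being nearly right; this strictness is why span-exact F1 drops sharply on out-of-domain text.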