Text Classification

What it is

Text classification assigns one or more categorical labels to documents or sentences. Common tasks include sentiment analysis (positive/negative), topic classification (sports/politics/tech), spam detection, and intent recognition (for chatbots). Classification is one of the most common NLP tasks and is handled efficiently and reliably by modern neural models.

[illustrate: Text document → BERT encoder → classification head → probability distribution over classes; example showing sentiment scores]

How it works

  1. Input: Text document or sentence

  2. Encoding: Represent text as vector(s)

    • Bag-of-words
    • TF-IDF
    • Word embedding average
    • BERT [CLS] token
  3. Classification:

    • Linear layer: embedding → logits
    • Softmax: logits → probabilities
    • Argmax or threshold: probabilities → classes
  4. Output: Class labels (often with confidence scores)
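Steps 2-4 can be sketched in plain Python. The embedding, weight matrix, and bias below are hand-picked toy values standing in for a learned encoder and classification head, not real model parameters:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, W, b, labels):
    # Linear layer: logits[i] = W[i] . embedding + b[i]
    logits = [sum(w * x for w, x in zip(row, embedding)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    # Argmax turns the probability distribution into a single class
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Toy 4-dimensional "embedding" and illustrative weights
embedding = [0.9, -0.2, 0.4, 0.1]
W = [[1.0, 0.0, 0.5, 0.0],    # row for "positive"
     [-1.0, 0.3, -0.5, 0.2]]  # row for "negative"
b = [0.0, 0.0]

label, prob = classify(embedding, W, b, ["positive", "negative"])
```

In a real system the embedding would come from a trained encoder (e.g. BERT's [CLS] vector) and W, b would be learned during fine-tuning; only the softmax-and-argmax logic is generic.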

Example

# Sentiment analysis (binary: positive/negative)
Text: "This movie was amazing!"
BERT encoding → [CLS] token embedding
Classification head:
  logits = W_class @ [CLS] + b
  probs = softmax(logits)
  output: positive (prob=0.95)

# Multi-class: topic classification
Text: "The Lakers won the championship"
Classes: [sports, politics, tech, entertainment]
Output: sports (prob=0.92)

# Multi-label: genre tagging
Text: "A romantic comedy with sci-fi elements"
Output: [romance (0.88), comedy (0.85), sci-fi (0.72)]
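The multi-label case differs from the multi-class case: instead of one softmax over mutually exclusive classes, each label gets an independent sigmoid and a threshold. A minimal sketch, with hypothetical logits chosen to match the genre-tagging scores above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-genre logits from a multi-label classification head
logits = {"romance": 2.0, "comedy": 1.7, "sci-fi": 0.95, "horror": -2.5}

# Keep every label whose independent probability clears the threshold
threshold = 0.5
tags = {genre: round(sigmoid(z), 2)
        for genre, z in logits.items()
        if sigmoid(z) >= threshold}
```

Here `tags` keeps romance, comedy, and sci-fi while dropping horror; because the sigmoids are independent, the kept probabilities need not sum to 1.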

Variants and history

Text classification is foundational, emerging in the 1990s. Early methods used Naive Bayes and SVMs with bag-of-words. Neural text classification (CNN, RNN, 2014+) improved over hand-crafted features. Transfer learning with BERT (2018+) achieved strong results with minimal task-specific training. Variants include hierarchical classification (category hierarchy), zero-shot classification (unseen classes), and few-shot learning (small training sets).

When to use it

Use text classification for:

  • Sentiment analysis (reviews, social media)
  • Topic classification (news routing)
  • Intent recognition (chatbots, assistants)
  • Spam/abuse detection
  • Language identification
  • Content moderation

Classification is efficient and reliable. Most systems fine-tune BERT or use instruction-tuned LLMs. For a simple baseline, or when interpretability matters, logistic regression on TF-IDF features is still effective.
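The TF-IDF features behind that baseline can be computed in a few lines of plain Python (a toy corpus and whitespace tokenizer, not a production vectorizer; real systems typically use a library implementation and feed these vectors to a logistic regression classifier):

```python
import math
from collections import Counter

def tfidf(docs):
    # Term frequency * inverse document frequency over a small corpus
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: how many docs contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = ["great movie loved it",
        "terrible movie hated it",
        "great acting great plot"]
vecs = tfidf(docs)
```

Words shared across documents (like "movie") are down-weighted relative to discriminative ones (like "terrible"), which is what makes the representation useful for a linear classifier.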

See also