DSSM

What it is

DSSM (Deep Structured Semantic Model, Huang et al., 2013) is one of the earliest neural retrieval models. It uses two MLP towers, one for queries and one for documents, trained on click-through data to produce embeddings that are compared by cosine similarity. It predates the Transformer (2017) by four years but established the dual-encoder blueprint that DPR and most modern dense retrieval models follow.

[illustrate: Query and document word-hash inputs → separate MLP towers → cosine similarity score; training signal from click-through data]

How it works

  1. Word hashing:

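    • Each word is wrapped in boundary markers and split into letter trigrams (e.g., “cat” → “#cat#” → “#ca”, “cat”, “at#”)
    • The trigrams are mapped to a sparse vector of roughly 30,000 dimensions, one per distinct trigram (collisions, where two different words produce the same trigram set, are rare)
    • Avoids the out-of-vocabulary problem without large embedding tables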
  2. MLP encoding:

    • Multiple fully-connected layers with tanh activation
    • Produces a 128-dimensional semantic vector
  3. Scoring:

    • Cosine similarity between query and document embeddings
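    • Softmax over the scaled cosine scores of the clicked document and 4 randomly sampled non-clicked documents; training maximizes the likelihood of the clicked document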
  4. Training signal:

    • Click-through logs from search engines
    • Clicked documents treated as positives
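
The four steps above can be sketched end to end in plain Python. This is an illustrative toy, not the original implementation: all function names are hypothetical, trigrams are bucketed into 64 slots with CRC32 rather than indexed in a roughly 30,000-entry trigram vocabulary, the towers are tiny, bias-free, and randomly initialized (untrained), and the smoothing factor gamma = 10.0 is an arbitrary choice.

```python
import math
import random
import zlib

def letter_trigrams(word):
    """Letter trigrams of a word padded with '#' boundary markers."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_text(text, dim=64):
    """Bag-of-trigrams count vector for a text.
    Assumption: CRC32 bucketing into `dim` slots stands in for a trigram vocabulary."""
    vec = [0.0] * dim
    for word in text.split():
        for tri in letter_trigrams(word):
            vec[zlib.crc32(tri.encode("utf-8")) % dim] += 1.0
    return vec

def make_tower(sizes, rng):
    """Random weights for a fully connected tower (no biases, for brevity)."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def encode(tower, vec):
    """Apply each layer followed by a tanh activation."""
    for layer in tower:
        vec = [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in layer]
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def softmax_over_candidates(query_vec, doc_vecs, gamma=10.0):
    """Softmax over gamma-scaled cosine scores of all candidate documents."""
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
query_tower = make_tower([64, 32, 16], rng)  # separate weights per tower
doc_tower = make_tower([64, 32, 16], rng)

# First document plays the clicked positive; the other four are sampled negatives.
docs = ["cheap flight deals", "cat food brands", "python tutorial",
        "weather tomorrow", "used car prices"]
q = encode(query_tower, hash_text("cheap flights"))
d = [encode(doc_tower, hash_text(text)) for text in docs]

probs = softmax_over_candidates(q, d)
loss = -math.log(probs[0])  # negative log-likelihood of the clicked document
```

Training would adjust both towers by gradient descent to reduce this loss over many logged (query, clicked document) pairs; with untrained weights the probabilities here carry no meaning beyond showing the shapes involved.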

Variants and history

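DSSM (2013, Microsoft Research) is primarily of historical significance: it defined the dual-encoder paradigm for neural retrieval. CDSSM (Convolutional DSSM; Shen et al., 2014) replaced the bag-of-words MLP input with a convolutional layer over sliding windows of word-level letter-trigram vectors, followed by max pooling, to capture local context that the original model discards. The architecture was superseded by transformer-based models (DPR, Sentence-BERT), but the dual-encoder structure and the softmax-over-sampled-negatives training objective remain foundational.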

When to use it

DSSM is primarily a historical reference. In practice:

  • Use DPR or Sentence-BERT for transformer-based dual encoders
  • DSSM remains relevant for understanding the origins of neural retrieval
  • Character trigram hashing is still useful for OOV-robust tokenization
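
The OOV robustness of letter trigrams can be illustrated with a small sketch. The helper names below are hypothetical, and Jaccard overlap is used only as a convenient way to show shared trigrams; it is not part of DSSM itself.

```python
def trigram_set(word):
    """Set of letter trigrams of a word padded with '#' boundary markers."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_overlap(a, b):
    """Jaccard overlap between the trigram sets of two words."""
    ta, tb = trigram_set(a), trigram_set(b)
    return len(ta & tb) / len(ta | tb)

# A misspelling that a word-level vocabulary would treat as unknown
# still shares trigrams with the correctly spelled word.
misspelt = trigram_overlap("retrieval", "retreival")
unrelated = trigram_overlap("retrieval", "banana")
```

Here the misspelling shares 5 of the 13 distinct trigrams with the correct spelling, while the unrelated word shares none, so an unseen or misspelled word still activates many of the same input dimensions as its known neighbours.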

See also