DSSM

What it is

DSSM (Deep Structured Semantic Model, Huang et al., 2013) is one of the earliest neural retrieval models. It uses two MLP towers, one for queries and one for documents, trained on click-through data to produce embeddings that are compared by cosine similarity. It predates the transformer architecture (2017) by four years, yet it introduced the dual-encoder blueprint that DPR and nearly every modern dense retrieval model follows.

[illustrate: Query and document word-hash inputs → separate MLP towers → cosine similarity score; training signal from click-through data]

How it works

Word hashing:
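- Each word is wrapped in boundary markers and split into letter trigrams (e.g., “cat” → “#cat#” → “#ca”, “cat”, “at#”)
- The trigrams are mapped to a binary vector of roughly 30,000 dimensions, one per distinct trigram (collisions between different words are rare)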
- Avoids out-of-vocabulary problem without large embedding tables
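
The word-hashing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names `letter_trigrams` and `word_hash`, the toy three-word vocabulary, and the choice to drop trigrams missing from the index are all assumptions made for the example.

```python
def letter_trigrams(word, boundary="#"):
    """Return the letter trigrams of a word wrapped in boundary markers."""
    marked = f"{boundary}{word.lower()}{boundary}"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def word_hash(text, trigram_index):
    """Map text to a binary sparse vector, given as a sorted list of active dimensions."""
    active = set()
    for word in text.split():
        for tri in letter_trigrams(word):
            idx = trigram_index.get(tri)
            if idx is not None:  # assumption: trigrams not in the index are dropped
                active.add(idx)
    return sorted(active)


# Toy trigram index built from a tiny vocabulary (hypothetical; a real index
# would hold roughly 30,000 trigrams).
vocab = ["cat", "cats", "hat"]
trigram_index = {}
for w in vocab:
    for tri in letter_trigrams(w):
        trigram_index.setdefault(tri, len(trigram_index))

print(letter_trigrams("cat"))            # ['#ca', 'cat', 'at#']
print(word_hash("chat", trigram_index))  # [2, 6]
```

The word “chat” never appears in the toy vocabulary, yet it still activates the dimensions for “hat” and “at#”, which is how trigram hashing sidesteps the out-of-vocabulary problem.
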
MLP encoding:
- Multiple fully-connected layers with tanh activation
- Produces a 128-dimensional semantic vector
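
A single tower's forward pass might look like the following pure-Python sketch. Only the tanh activation and the 128-dimensional output come from the description above; the hidden layer sizes, the toy 1,000-dimensional input, the weight initialization, and the names `make_layer` and `tower_forward` are assumptions for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create (weights, bias) with small uniform random weights (assumed init)."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def tower_forward(active_dims, layers):
    """Encode a binary sparse input (list of active dimensions) with tanh layers."""
    (w0, b0), rest = layers[0], layers[1:]
    # First layer: with a binary sparse input, the matrix product reduces to
    # summing the weight columns of the active dimensions.
    hidden = [math.tanh(b + sum(row[d] for d in active_dims))
              for row, b in zip(w0, b0)]
    for weights, bias in rest:
        hidden = [math.tanh(b + sum(w * h for w, h in zip(row, hidden)))
                  for row, b in zip(weights, bias)]
    return hidden


rng = random.Random(0)
sizes = [1000, 64, 64, 128]  # toy sizes; only the final 128 comes from the text
query_tower = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
embedding = tower_forward([3, 17, 256], query_tower)
print(len(embedding))  # 128
```

A document tower would be built the same way with its own set of layers.
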
Scoring:
- Cosine similarity between query and document embeddings
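- Softmax over the scaled cosine scores of the clicked document and 4 randomly sampled non-clicked documents; training maximizes the probability of the clicked document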
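
The scoring step and its training loss can be sketched as below: cosine similarity between embeddings, then a softmax over one clicked and four non-clicked documents. The scaling factor `gamma=10.0`, the function names, and the 2-dimensional toy vectors are assumptions for the example, not values taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def clicked_doc_loss(query_vec, clicked_vec, unclicked_vecs, gamma=10.0):
    """Negative log-probability of the clicked document under a softmax over
    gamma-scaled cosine scores. gamma is an assumed smoothing factor."""
    docs = [clicked_vec] + unclicked_vecs  # clicked document sits at index 0
    scores = [gamma * cosine(query_vec, d) for d in docs]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[0]


# Toy 2-dimensional embeddings (real DSSM embeddings are 128-dimensional).
query = [1.0, 0.0]
clicked = [0.9, 0.1]
unclicked = [[0.0, 1.0], [-1.0, 0.0], [0.1, 0.9], [-0.5, 0.5]]
print(clicked_doc_loss(query, clicked, unclicked))
```

The loss shrinks as the clicked document's cosine score rises relative to the sampled non-clicked documents, which is what pushes matching queries and documents together in the embedding space.
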
Training signal:
- Click-through logs from search engines
- Clicked documents treated as positives

Variants and history

DSSM (2013, Microsoft Research) is now mainly of historical significance: it defined the dual-encoder paradigm. CDSSM (Convolutional DSSM) replaced the MLP with a CNN over letter-trigram inputs for better local feature capture. The architecture was superseded by transformer-based models (DPR, Sentence-BERT), but the training objective and dual-encoder structure remain foundational.

When to use it

DSSM is primarily a historical reference. In practice:

- Use DPR or Sentence-BERT for transformer-based dual encoders
- DSSM remains relevant for understanding the origins of neural retrieval
- Character trigram hashing is still useful for OOV-robust tokenization
