DSSM
What it is
DSSM (Deep Structured Semantic Model, Huang et al., 2013) is one of the earliest neural retrieval models. It uses two MLP towers, one for queries and one for documents, trained on click-through data to produce embeddings that are compared by cosine similarity. It predates the Transformer (2017) by four years, but it introduced the dual-encoder blueprint that DPR and most later dense retrieval models follow.
[illustrate: Query and document word-hash inputs → separate MLP towers → cosine similarity score; training signal from click-through data]
How it works
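Word hashing:
- Each word is wrapped in boundary markers and split into letter trigrams (e.g., “cat” → “#cat#” → “#ca”, “cat”, “at#”)
- The trigrams of a query or document map to a binary vector of roughly 30,000 dimensions, one per distinct trigram (collisions between different words are rare)
- Avoids the out-of-vocabulary problem without large embedding tables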
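The word-hashing step can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names, the toy vocabulary, and the `trigram_index` dictionary (trigram to dimension) are all hypothetical, and a real index would be built from the full training vocabulary.

```python
def letter_trigrams(word):
    """Return the letter trigrams of one word, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_text(text, trigram_index):
    """Map text to the sorted list of active dimensions of a binary trigram vector.

    trigram_index is a hypothetical dict from trigram to dimension;
    trigrams that are not in the index are skipped.
    """
    active = set()
    for word in text.split():
        for tri in letter_trigrams(word):
            if tri in trigram_index:
                active.add(trigram_index[tri])
    return sorted(active)


# Toy index built from a three-word vocabulary (illustrative only).
trigram_index = {}
for vocab_word in ["cat", "cart", "car"]:
    for tri in letter_trigrams(vocab_word):
        trigram_index.setdefault(tri, len(trigram_index))

print(letter_trigrams("cat"))            # ['#ca', 'cat', 'at#']
print(hash_text("cats", trigram_index))  # [0, 1]
```

The word "cats" is not in the toy vocabulary, yet it still activates the dimensions for "#ca" and "cat". That partial overlap is what makes the representation robust to unseen words.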
MLP encoding:
- Multiple fully connected layers with tanh activation
- Produces a 128-dimensional semantic vector
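A tower's forward pass might look like the sketch below. It uses only the standard library so it runs anywhere, which makes it far too slow for real input sizes. The layer sizes are toy values, and the weight initialization range and the choice to give each tower its own weights are assumptions made for illustration. The names `init_tower` and `tower_forward` are hypothetical.

```python
import math
import random


def init_tower(layer_sizes, seed=0):
    """Create one (weights, biases) pair per fully connected layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        bound = math.sqrt(6.0 / (n_in + n_out))  # assumed initialization range
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def tower_forward(active_dims, layers):
    """Encode a sparse binary input (list of active dimensions) as a dense vector."""
    first_w, first_b = layers[0]
    # With a binary input, the first layer only sums the weights of active dimensions.
    h = [math.tanh(sum(row[d] for d in active_dims) + b)
         for row, b in zip(first_w, first_b)]
    for weights, biases in layers[1:]:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


# Toy sizes: 7 input dimensions and an 8-dimensional output. The text above
# describes a roughly 30,000-dimensional input and a 128-dimensional output.
query_tower = init_tower([7, 16, 16, 8], seed=1)
doc_tower = init_tower([7, 16, 16, 8], seed=2)  # separate weights: an assumption
q_vec = tower_forward([0, 1], query_tower)
print(len(q_vec))  # 8
```

Because every layer ends in tanh, each component of the output lies between -1 and 1.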
Scoring:
- Cosine similarity between the query and document embeddings
- Softmax over the scores of the clicked document and 4 randomly sampled unclicked documents; training maximizes the probability of the clicked document
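The scoring and loss computation can be written out as follows. This is a sketch under stated assumptions: `clicked_doc_loss` is a hypothetical name, and the scaling factor `gamma` applied to the cosine scores before the softmax is an assumed hyperparameter with an arbitrary default.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def clicked_doc_loss(q_vec, clicked_vec, unclicked_vecs, gamma=10.0):
    """Negative log-probability of the clicked document under a softmax
    over scaled cosine scores. gamma is an assumed smoothing factor."""
    scores = [gamma * cosine(q_vec, d) for d in [clicked_vec] + unclicked_vecs]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[0]


q = [1.0, 0.0]
good = clicked_doc_loss(q, [1.0, 0.0], [[0.0, 1.0]] * 4)  # clicked doc matches q
bad = clicked_doc_loss(q, [0.0, 1.0], [[1.0, 0.0]] * 4)   # unclicked docs match q
print(good < bad)  # True
```

The loss is small when the clicked document is the closest of the five candidates and large when an unclicked document is closer, so minimizing it pulls clicked pairs together in embedding space.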
Training signal:
- Click-through logs from search engines
- Clicked documents treated as positives
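One way to turn a click log into training examples is sketched below. The log format (a list of query and clicked-document-ID pairs), the function name `build_training_examples`, and the policy of excluding every document clicked for the same query from the negative pool are assumptions for illustration, not details taken from the original system.

```python
import random


def build_training_examples(click_log, all_doc_ids, num_negatives=4, seed=0):
    """Pair each (query, clicked doc) with randomly sampled unclicked docs."""
    rng = random.Random(seed)
    clicked_by_query = {}
    for query, doc_id in click_log:
        clicked_by_query.setdefault(query, set()).add(doc_id)

    examples = []
    for query, doc_id in click_log:
        # Candidates are documents never clicked for this query.
        candidates = [d for d in all_doc_ids if d not in clicked_by_query[query]]
        # random.sample raises ValueError if there are too few candidates.
        negatives = rng.sample(candidates, num_negatives)
        examples.append((query, doc_id, negatives))
    return examples


click_log = [
    ("cheap flights", "d1"),
    ("cheap flights", "d2"),
    ("python tutorial", "d3"),
]
all_doc_ids = [f"d{i}" for i in range(1, 9)]
examples = build_training_examples(click_log, all_doc_ids)
for query, positive, negatives in examples:
    print(query, positive, negatives)
```

Each resulting triple supplies one positive and four negatives, which matches the five-way softmax described under Scoring.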
Variants and history
DSSM (2013, Microsoft Research) is now mainly of historical significance: it established the dual-encoder paradigm for retrieval. CDSSM (Convolutional DSSM) replaced the MLP with a convolutional network over letter-trigram inputs to capture local features better. Transformer-based models (DPR, Sentence-BERT) have since superseded the architecture, but the training objective and dual-encoder structure remain foundational.
When to use it
DSSM is primarily a historical reference. In practice:
- Use DPR or Sentence-BERT for transformer-based dual encoders
- DSSM remains relevant for understanding the origins of neural retrieval
- Character trigram hashing is still useful for OOV-robust tokenization