Sentence Embedding
What it is
A sentence embedding is a dense vector representation of a complete sentence, paragraph, or passage in continuous space (typically 384–768 dimensions). Unlike word embeddings, sentence embeddings must capture broader semantic content—meaning, intent, relationships—across multiple tokens while suppressing surface variation.
[illustrate: Two semantically similar sentences mapped to nearby points in 2D embedding space; dissimilar sentences far apart]
How it works
Sentence embeddings are produced by three main approaches:
- Mean pooling: Average the word embeddings of all tokens in the sentence (sketched after this list)
- Learned pooling: Train a neural model (e.g., a transformer's [CLS] token) to aggregate token representations
- Contrastive training: Fine-tune encoders on triplets (sentence, similar_sentence, dissimilar_sentence) to pull positive pairs together and push negatives apart
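As a concrete illustration of the pooling approaches, the sketch below mean-pools the token states of a pre-trained encoder while masking out padding. It assumes PyTorch and the Hugging Face transformers library are installed; the all-MiniLM-L6-v2 checkpoint is used only as an example.

```python
# Mean pooling over a pre-trained encoder's token states (a minimal sketch).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["The cat sat on the mat", "A feline rested on the rug"]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_states = encoder(**batch).last_hidden_state   # (batch, seq_len, dim)

# Average only over real tokens, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
embeddings = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)                                   # e.g. torch.Size([2, 384])
```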
Modern methods start from pre-trained models (BERT, RoBERTa) and apply supervised fine-tuning on sentence pairs drawn from natural language inference datasets such as SNLI, question–answer pairs, or paraphrase corpora.
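To make the contrastive objective concrete, here is a schematic of the common in-batch-negatives loss: each anchor's own positive is the correct "class" among all positives in the batch. This is a generic sketch in PyTorch, not the exact loss of any particular paper, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Pull (anchor, positive) pairs together; treat other batch items as negatives."""
    anchors = F.normalize(anchors, dim=-1)        # (batch, dim)
    positives = F.normalize(positives, dim=-1)    # (batch, dim)
    logits = anchors @ positives.T / temperature  # scaled cosine similarities
    labels = torch.arange(anchors.size(0))        # i-th anchor matches i-th positive
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
loss = in_batch_contrastive_loss(torch.randn(8, 384), torch.randn(8, 384))
print(loss.item())
```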
Example
"The cat sat on the mat" → embedding_1
"A feline rested on the rug" → embedding_2
cosine_similarity(embedding_1, embedding_2) = 0.82
"The weather is sunny" → embedding_3
cosine_similarity(embedding_1, embedding_3) = 0.15
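A sketch of how such scores might be computed with the sentence-transformers library; the model name is an example and the similarity values shown above are illustrative, since exact numbers depend on the model used.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint, not prescriptive
emb = model.encode([
    "The cat sat on the mat",
    "A feline rested on the rug",
    "The weather is sunny",
])

print(util.cos_sim(emb[0], emb[1]))  # high similarity (paraphrase)
print(util.cos_sim(emb[0], emb[2]))  # low similarity (unrelated)
```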
Variants and history
Early approaches used simple mean pooling of word embeddings. InferSent (Conneau et al., 2017) trained bidirectional LSTMs on natural language inference data. Universal Sentence Encoder (Cer et al., 2018) trained encoders with multi-task objectives. Sentence-BERT (Reimers & Gurevych, 2019) fine-tuned BERT with triplet and contrastive objectives and became the dominant open-source standard. Modern variants add knowledge distillation, multilingual training, and Matryoshka representation learning.
When to use it
Use sentence embeddings for:
- Semantic search and retrieval (a minimal sketch follows this list)
- Clustering and topic discovery
- Paraphrase detection
- Similarity assessment
- Dense retrieval in RAG systems
- Initial ranking before reranking
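The sketch below shows the semantic-search pattern over a toy corpus: embed the corpus once, embed the query, and rank by cosine similarity. The corpus, query, and checkpoint are illustrative; any encoder producing fixed-size vectors would work.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

corpus = [
    "How do I reset my password?",
    "Shipping times for international orders",
    "The cat sat on the mat",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode("I forgot my login credentials", normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = corpus_emb @ query_emb
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.2f}  {corpus[idx]}")
```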
Sentence embeddings are fast to compute and cheap to compare, but they compress an entire passage into a single vector and are therefore coarser than token-level representations. For fine-grained semantic tasks (token classification, entity linking), prefer token-level representations.