Sentence Embedding
What it is
A sentence embedding is a dense vector representation of a complete sentence, paragraph, or passage in continuous space (typically 384–768 dimensions). Unlike word embeddings, sentence embeddings must capture broader semantic content—meaning, intent, relationships—across multiple tokens while suppressing surface variation.
[illustrate: Two semantically similar sentences mapped to nearby points in 2D embedding space; dissimilar sentences far apart]
How it works
Sentence embeddings are produced by three main approaches:
- Mean pooling: Average the word embeddings of all tokens in the sentence (sketched after this list)
- Learned pooling: Train a neural model (e.g., a transformer's [CLS] token) to aggregate token representations
- Contrastive training: Fine-tune encoders on triplets (sentence, similar_sentence, dissimilar_sentence) to pull positive pairs together and push negatives apart
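As a concrete illustration of the pooling approaches, the sketch below mean-pools the token states of a pre-trained encoder while masking out padding. It assumes PyTorch and the Hugging Face transformers library are installed; the all-MiniLM-L6-v2 checkpoint is used only as an example.

```python
# Mean pooling over a pre-trained encoder's token states (a minimal sketch).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["The cat sat on the mat", "A feline rested on the rug"]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_states = encoder(**batch).last_hidden_state   # (batch, seq_len, dim)

# Average only over real tokens, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
embeddings = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)                                   # e.g. torch.Size([2, 384])
```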
Modern methods start from pre-trained models (BERT, RoBERTa) and apply supervised fine-tuning on sentence pairs drawn from natural language inference datasets such as SNLI, question–answer pairs, or paraphrase corpora.
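To make the contrastive objective concrete, here is a schematic of the common in-batch-negatives loss: each anchor's own positive is the correct "class" among all positives in the batch. This is a generic sketch in PyTorch, not the exact loss of any particular paper, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Pull (anchor, positive) pairs together; treat other batch items as negatives."""
    anchors = F.normalize(anchors, dim=-1)        # (batch, dim)
    positives = F.normalize(positives, dim=-1)    # (batch, dim)
    logits = anchors @ positives.T / temperature  # scaled cosine similarities
    labels = torch.arange(anchors.size(0))        # i-th anchor matches i-th positive
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
loss = in_batch_contrastive_loss(torch.randn(8, 384), torch.randn(8, 384))
print(loss.item())
```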
Example
"The cat sat on the mat" → embedding_1
"A feline rested on the rug" → embedding_2
cosine_similarity(embedding_1, embedding_2) = 0.82
"The weather is sunny" → embedding_3
cosine_similarity(embedding_1, embedding_3) = 0.15
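A sketch of how such scores might be computed with the sentence-transformers library; the model name is an example and the similarity values shown above are illustrative, since exact numbers depend on the model used.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint, not prescriptive
emb = model.encode([
    "The cat sat on the mat",
    "A feline rested on the rug",
    "The weather is sunny",
])

print(util.cos_sim(emb[0], emb[1]))  # high similarity (paraphrase)
print(util.cos_sim(emb[0], emb[2]))  # low similarity (unrelated)
```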
Variants and history
Early approaches used simple mean pooling of word embeddings. InferSent (Conneau et al., 2017) trained bidirectional LSTMs on natural language inference data. Universal Sentence Encoder (Cer et al., 2018) trained encoders with multi-task objectives. Sentence-BERT (Reimers & Gurevych, 2019) fine-tuned BERT with triplet and contrastive objectives and became the dominant open-source standard. Modern variants add knowledge distillation, multilingual training, and Matryoshka representation learning.
When to use it
Use sentence embeddings for:
- Semantic search and retrieval (a minimal sketch follows this list)
- Clustering and topic discovery
- Paraphrase detection
- Similarity assessment
- Dense retrieval in RAG systems
- Initial ranking before reranking
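The sketch below shows the semantic-search pattern over a toy corpus: embed the corpus once, embed the query, and rank by cosine similarity. The corpus, query, and checkpoint are illustrative; any encoder producing fixed-size vectors would work.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

corpus = [
    "How do I reset my password?",
    "Shipping times for international orders",
    "The cat sat on the mat",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode("I forgot my login credentials", normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = corpus_emb @ query_emb
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.2f}  {corpus[idx]}")
```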
Sentence embeddings are fast to compute and cheap to compare, but they compress an entire passage into a single vector and are therefore coarser than token-level representations. For fine-grained semantic tasks (token classification, entity linking), prefer token-level representations.