Cloze Task
What it is
The cloze task is a reading-comprehension exercise in which certain words are masked (replaced with [MASK]) and must be predicted from the surrounding context. Originally a cognitive-psychology instrument, cloze became a standard pre-training objective for neural language models such as BERT. The task pushes the model to learn semantic and syntactic structure without requiring labeled data.
[illustrate: Text with masked token; model predicting from context; showing correct and incorrect predictions with probabilities]
How it works
- Masking: randomly select tokens (typically 15% of the input) and replace them with [MASK]
- Prediction: the model predicts the original token from the surrounding context
  - Input: "The [MASK] sat on the mat"
  - Target: predict "cat"
- Loss: cross-entropy loss computed on the masked tokens only
- Scaling: pre-train on a large unlabeled corpus, then fine-tune on downstream tasks
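The masking and loss steps above can be sketched in plain Python. This is a minimal illustration over a toy token list, not BERT's actual implementation (BERT additionally replaces some selected tokens with random tokens or leaves them unchanged, and computes probabilities with a neural network rather than a lookup table):

```python
import math
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Randomly replace roughly mask_prob of the tokens with [MASK].

    Returns the masked sequence and the (position, original_token)
    pairs that the model must recover.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            # BERT's refinement (not shown): of the selected tokens,
            # 80% become [MASK], 10% a random token, 10% stay unchanged.
            masked.append("[MASK]")
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

def masked_lm_loss(predictions, targets):
    """Cross-entropy summed over masked positions only.

    predictions[i] maps candidate tokens to the model's predicted
    probability at position i (a stand-in for a softmax output).
    """
    return sum(-math.log(predictions[i][tok]) for i, tok in targets)

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked)   # original sequence with some tokens replaced by [MASK]

# Toy "model output" for two masked positions
preds = {2: {"brown": 0.8, "red": 0.05}, 4: {"jumps": 0.75, "runs": 0.1}}
print(masked_lm_loss(preds, [(2, "brown"), (4, "jumps")]))
```

Note that the loss ignores unmasked positions entirely; only the recovered tokens contribute gradient signal.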
Example
# Pre-training cloze task
Original:          "The quick brown fox jumps over the lazy dog"
Masked (2 tokens): "The quick [MASK] fox [MASK] over the lazy dog"
Predictions:
  Position 2 (brown): P(brown)=0.8,  P(red)=0.05,  ...
  Position 4 (jumps): P(jumps)=0.75, P(runs)=0.1,  ...
Loss = -log P(brown) - log P(jumps)
# Downstream fine-tuning
Same pre-trained model, new task: sentiment classification
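Plugging the example's (hypothetical) probabilities into the loss formula gives a concrete number:

```python
import math

# Cross-entropy over the two masked positions, using the
# probabilities from the example above (illustrative values)
p_brown, p_jumps = 0.8, 0.75
loss = -math.log(p_brown) - math.log(p_jumps)
print(round(loss, 4))  # 0.5108
```

A perfect model (both probabilities at 1.0) would give a loss of 0; confident wrong predictions drive the loss sharply upward.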
Variants and history
The cloze task dates to cognitive psychology (Taylor, 1953). BERT (Devlin et al., 2018) popularized it in NLP as masked language modeling. RoBERTa refined the recipe with dynamic masking. ELECTRA swapped cloze for replaced-token detection, a more sample-efficient discriminative objective. XLNet used permutation-based autoregressive modeling to capture bidirectional context without [MASK] tokens. Cloze-style objectives remain fundamental for pre-training, competitive with causal language modeling on understanding tasks.
When to use it
Use cloze objective for:
- Pre-training bidirectional encoders (BERT-style)
- Unsupervised learning from unlabeled text
- When bidirectional context is important (understanding, not generation)
- Improving transfer learning to understanding tasks
- Domain-adaptive pre-training
Cloze objectives suit understanding tasks but are suboptimal for generation, where autoregressive objectives perform better. Hybrid approaches, such as T5's span corruption, combine cloze-style and causal ideas.