Cloze Task
What it is
The cloze task is a reading-comprehension exercise in which certain words are masked (replaced with [MASK]) and must be predicted from the surrounding context. Originally a cognitive-psychology instrument, cloze became a standard pre-training objective for neural language models such as BERT. The task pushes the model to learn semantic and syntactic structure without requiring labeled data.
[illustrate: Text with masked token; model predicting from context; showing correct and incorrect predictions with probabilities]
How it works
- Masking: randomly select tokens (typically 15% of the input) and replace them with [MASK]
- Prediction: the model predicts the original token from the surrounding context
  - Input: "The [MASK] sat on the mat"
  - Target: predict "cat"
- Loss: cross-entropy loss computed on the masked tokens only
- Scaling: pre-train on a large unlabeled corpus, then fine-tune on downstream tasks
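The masking and loss steps above can be sketched in plain Python. This is a minimal illustration over a toy token list, not BERT's actual implementation (BERT additionally replaces some selected tokens with random tokens or leaves them unchanged, and computes probabilities with a neural network rather than a lookup table):

```python
import math
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Randomly replace roughly mask_prob of the tokens with [MASK].

    Returns the masked sequence and the (position, original_token)
    pairs that the model must recover.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            # BERT's refinement (not shown): of the selected tokens,
            # 80% become [MASK], 10% a random token, 10% stay unchanged.
            masked.append("[MASK]")
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

def masked_lm_loss(predictions, targets):
    """Cross-entropy summed over masked positions only.

    predictions[i] maps candidate tokens to the model's predicted
    probability at position i (a stand-in for a softmax output).
    """
    return sum(-math.log(predictions[i][tok]) for i, tok in targets)

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked)   # original sequence with some tokens replaced by [MASK]

# Toy "model output" for two masked positions
preds = {2: {"brown": 0.8, "red": 0.05}, 4: {"jumps": 0.75, "runs": 0.1}}
print(masked_lm_loss(preds, [(2, "brown"), (4, "jumps")]))
```

Note that the loss ignores unmasked positions entirely; only the recovered tokens contribute gradient signal.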
Example
# Pre-training cloze task
Original:          "The quick brown fox jumps over the lazy dog"
Masked (2 tokens): "The quick [MASK] fox [MASK] over the lazy dog"
Predictions:
  Position 2 (brown): P(brown)=0.8,  P(red)=0.05,  ...
  Position 4 (jumps): P(jumps)=0.75, P(runs)=0.1,  ...
Loss = -log P(brown) - log P(jumps)
# Downstream fine-tuning
Same pre-trained model, new task: sentiment classification
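Plugging the example's (hypothetical) probabilities into the loss formula gives a concrete number:

```python
import math

# Cross-entropy over the two masked positions, using the
# probabilities from the example above (illustrative values)
p_brown, p_jumps = 0.8, 0.75
loss = -math.log(p_brown) - math.log(p_jumps)
print(round(loss, 4))  # 0.5108
```

A perfect model (both probabilities at 1.0) would give a loss of 0; confident wrong predictions drive the loss sharply upward.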
Variants and history
The cloze task dates to cognitive psychology (Taylor, 1953). BERT (Devlin et al., 2018) popularized it in NLP as masked language modeling. RoBERTa refined the recipe with dynamic masking. ELECTRA swapped cloze for replaced-token detection, a more sample-efficient discriminative objective. XLNet used permutation-based autoregressive modeling to capture bidirectional context without [MASK] tokens. Cloze-style objectives remain fundamental for pre-training, competitive with causal language modeling on understanding tasks.
When to use it
Use cloze objective for:
- Pre-training bidirectional encoders (BERT-style)
- Unsupervised learning from unlabeled text
- When bidirectional context is important (understanding, not generation)
- Improving transfer learning to understanding tasks
- Domain-adaptive pre-training
Cloze objectives suit understanding tasks but are suboptimal for generation, where autoregressive objectives perform better. Hybrid approaches, such as T5's span corruption, combine cloze-style and causal ideas.