Zero-Shot Learning
What it is
Zero-shot learning is the ability to perform a task based solely on a task description or instruction, without any training examples in the prompt. For example, a language model can translate English to French given only the instruction “Translate English to French:” without any (English, French) pairs. This capability emerges with sufficient pre-training scale.
[illustrate: Model processing only task instruction and test input; no examples shown; generating output]
How it works
- Pre-training: Model trained on massive, diverse data, learning rich representations and instruction-following behavior
- Task description: Input describes the task clearly but shows no examples
  - “Translate to French:”
  - “Summarize:”
  - “Classify sentiment:”
- Inference: Model applies learned knowledge to generate output for the new input
- No examples: Unlike few-shot, zero-shot includes no input-output pairs in the prompt
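The steps above can be sketched as a simple prompt builder. This is a minimal illustration, not a standard API: the function name and the newline-joined template are assumptions, and real systems vary in how they format the instruction.

```python
def build_zero_shot_prompt(task_description: str, test_input: str) -> str:
    """Assemble a zero-shot prompt: task instruction plus test input.

    Crucially, no (input, output) demonstration pairs are included --
    the model must rely entirely on pre-trained knowledge.
    """
    return f"{task_description}\n{test_input}"


prompt = build_zero_shot_prompt("Translate English to French:", "Good morning")
print(prompt)
# Translate English to French:
# Good morning
```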
Example
Task description: "Translate English to French:"
Test input: "Good morning"
Model output: "Bonjour" (the literal "Bon matin" is idiomatic mainly in Quebec French)
---
Task: "Classify this movie review as positive or negative:"
Test input: "Amazing film, highly recommended"
Model output: "positive"
---
Task: "Summarize this article in one sentence:"
Test input: "[long article text]"
Model output: "[one-sentence summary]"
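The three examples above differ only in their instruction, so they can share a single zero-shot template. The task names and registry below are hypothetical choices for illustration:

```python
# Hypothetical instruction registry; the keys and wording are illustrative.
TASKS = {
    "translate": "Translate English to French:",
    "sentiment": "Classify this movie review as positive or negative:",
    "summarize": "Summarize this article in one sentence:",
}


def zero_shot(task: str, test_input: str) -> str:
    """Format a zero-shot prompt for a named task; no examples included."""
    return f"{TASKS[task]}\n{test_input}"


print(zero_shot("sentiment", "Amazing film, highly recommended"))
# Classify this movie review as positive or negative:
# Amazing film, highly recommended
```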
Variants and history
Zero-shot transfer was first demonstrated with models like BERT applied to unseen domains (transfer learning). GPT-2 and GPT-3 showed true zero-shot instruction following from the prompt alone. Instruction-tuned models (InstructGPT, GPT-3.5, Claude) further improved zero-shot performance through explicit instruction-following training. Prompt templates and task formatting strongly influence zero-shot success. Zero-shot is generally less reliable than few-shot but more convenient.
When to use it
Use zero-shot learning when:
- No labeled data available
- Speed of deployment is critical
- Task description is clear
- You have capable models (GPT-3.5+)
- You can tolerate lower accuracy for convenience
Zero-shot is unreliable for complex tasks but excellent for simple, intuitive ones. Few-shot prompting or fine-tuning dramatically improves results when examples are available.
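To make the zero-shot vs. few-shot distinction concrete, here is a minimal sketch contrasting the two prompt formats. The `->` demonstration format and function names are assumptions for illustration only:

```python
def zero_shot_prompt(instruction: str, x: str) -> str:
    # Zero-shot: instruction and test input only.
    return f"{instruction}\n{x} ->"


def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], x: str) -> str:
    # Few-shot: same instruction, but with (input, output) demonstrations first.
    demos = "\n".join(f"{inp} -> {out}" for inp, out in examples)
    return f"{instruction}\n{demos}\n{x} ->"


instruction = "Classify sentiment:"
examples = [("Great movie", "positive"), ("Terrible plot", "negative")]
print(zero_shot_prompt(instruction, "Loved it"))
print(few_shot_prompt(instruction, examples, "Loved it"))
```

The only difference is the demonstration block: the few-shot prompt shows the model the expected input-output mapping, while the zero-shot prompt relies entirely on the instruction.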