Zero-Shot Learning
What it is
Zero-shot learning is the ability to perform a task based solely on a task description or instruction, without any training examples in the prompt. For example, a language model can translate English to French given only the instruction “Translate English to French:” without any (English, French) pairs. This capability emerges with sufficient pre-training scale.
[illustrate: Model processing only task instruction and test input; no examples shown; generating output]
How it works
- Pre-training: Model trained on massive, diverse data, learning rich representations and instruction-following behavior
- Task description: Input describes the task clearly but shows no examples
  - “Translate to French:”
  - “Summarize:”
  - “Classify sentiment:”
- Inference: Model applies learned knowledge to generate output for the new input
- No examples: Unlike few-shot, zero-shot includes no input-output pairs in the prompt
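The steps above can be sketched as a simple prompt builder. This is a minimal illustration, not a standard API: the function name and the newline-joined template are assumptions, and real systems vary in how they format the instruction.

```python
def build_zero_shot_prompt(task_description: str, test_input: str) -> str:
    """Assemble a zero-shot prompt: task instruction plus test input.

    Crucially, no (input, output) demonstration pairs are included --
    the model must rely entirely on pre-trained knowledge.
    """
    return f"{task_description}\n{test_input}"


prompt = build_zero_shot_prompt("Translate English to French:", "Good morning")
print(prompt)
# Translate English to French:
# Good morning
```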
Example
Task description: "Translate English to French:"
Test input: "Good morning"
Model output: "Bonjour" (the literal "Bon matin" is idiomatic mainly in Quebec French)
---
Task: "Classify this movie review as positive or negative:"
Test input: "Amazing film, highly recommended"
Model output: "positive"
---
Task: "Summarize this article in one sentence:"
Test input: "[long article text]"
Model output: "[one-sentence summary]"
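The three examples above differ only in their instruction, so they can share a single zero-shot template. The task names and registry below are hypothetical choices for illustration:

```python
# Hypothetical instruction registry; the keys and wording are illustrative.
TASKS = {
    "translate": "Translate English to French:",
    "sentiment": "Classify this movie review as positive or negative:",
    "summarize": "Summarize this article in one sentence:",
}


def zero_shot(task: str, test_input: str) -> str:
    """Format a zero-shot prompt for a named task; no examples included."""
    return f"{TASKS[task]}\n{test_input}"


print(zero_shot("sentiment", "Amazing film, highly recommended"))
# Classify this movie review as positive or negative:
# Amazing film, highly recommended
```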
Variants and history
Zero-shot transfer was first demonstrated with models like BERT applied to unseen domains (transfer learning). GPT-2 and GPT-3 showed true zero-shot instruction following from the prompt alone. Instruction-tuned models (InstructGPT, GPT-3.5, Claude) further improved zero-shot performance through explicit instruction-following training. Prompt templates and task formatting strongly influence zero-shot success. Zero-shot is generally less reliable than few-shot but more convenient.
When to use it
Use zero-shot learning when:
- No labeled data available
- Speed of deployment is critical
- Task description is clear
- You have capable models (GPT-3.5+)
- You can tolerate lower accuracy for convenience
Zero-shot is unreliable for complex tasks but excellent for simple, intuitive ones. Few-shot prompting or fine-tuning dramatically improves results when examples are available.
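To make the zero-shot vs. few-shot distinction concrete, here is a minimal sketch contrasting the two prompt formats. The `->` demonstration format and function names are assumptions for illustration only:

```python
def zero_shot_prompt(instruction: str, x: str) -> str:
    # Zero-shot: instruction and test input only.
    return f"{instruction}\n{x} ->"


def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], x: str) -> str:
    # Few-shot: same instruction, but with (input, output) demonstrations first.
    demos = "\n".join(f"{inp} -> {out}" for inp, out in examples)
    return f"{instruction}\n{demos}\n{x} ->"


instruction = "Classify sentiment:"
examples = [("Great movie", "positive"), ("Terrible plot", "negative")]
print(zero_shot_prompt(instruction, "Loved it"))
print(few_shot_prompt(instruction, examples, "Loved it"))
```

The only difference is the demonstration block: the few-shot prompt shows the model the expected input-output mapping, while the zero-shot prompt relies entirely on the instruction.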