Few-Shot Learning
What it is
Few-shot learning (or in-context learning) is the ability of a language model to perform a new task after seeing only a few examples (typically 1–10) in the prompt, without any parameter updates. This capability emerges with scale in large models; small models require explicit fine-tuning on many examples.
[illustrate: Prompt with 3 examples (input-output pairs); new test input; model predicting output without retraining]
How it works
- Prepare examples: collect a few representative input-output pairs (typically 1–10) for the target task
- Construct prompt:
  - System instruction describing the task
  - Example pairs formatted consistently
  - New test input in the same format
- Query model: run the prompt through the language model and extract the prediction
- No training: no gradient updates; adaptation happens purely at inference time
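The prompt-construction step above can be sketched in a few lines. This is a minimal illustration, not a standard API: the `build_few_shot_prompt` helper and the generic "Input:"/"Output:" field names are arbitrary choices, and any consistent format works.

```python
def build_few_shot_prompt(instruction, examples, test_input):
    """Assemble a few-shot prompt: task instruction, example pairs,
    then the new test input in the same format, ending where the
    model is expected to continue."""
    lines = [instruction, "", "Examples:"]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    # The test input uses the identical format, with the output left blank.
    lines.append(f"Input: {test_input}")
    lines.append("Output:")
    return "\n".join(lines)
```

The resulting string is sent to the model as-is; the model's continuation after the final "Output:" is taken as its prediction.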
Example
Prompt for sentiment classification:
"Classify the sentiment of the following reviews as positive or negative.
Examples:
Review: This movie was great!
Sentiment: positive
Review: Terrible waste of time.
Sentiment: negative
Review: The acting was decent.
Sentiment: "
Model output: "positive" (though for a lukewarm review like this one, the model may instead answer "negative" or hedge)
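Because the raw completion may carry leading whitespace, capitalization, punctuation, or trailing text, it is common to normalize it before use. A minimal sketch, where `extract_label` is a hypothetical helper name:

```python
def extract_label(model_output, labels=("positive", "negative")):
    """Map a raw model completion to one of the known labels.
    Returns None when the completion does not start with any label,
    which callers can treat as an uncertain prediction."""
    text = model_output.strip().lower()
    for label in labels:
        if text.startswith(label):
            return label
    return None
```

Returning `None` rather than guessing makes the uncertain case explicit, so downstream code can fall back to a default or re-query.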
Variants and history
Zero-shot task transfer was demonstrated with GPT-2 (2019), and few-shot in-context learning became prominent with GPT-3 (2020). Scaling studies showed performance improves predictably with model size and with the number of in-context examples. Meta-learning methods (MAML, Prototypical Networks) studied few-shot learning in traditional supervised settings before large language models. The mechanisms of in-context learning (how transformers use the examples) are still under investigation. Retrieval-augmented prompting combines few-shot prompting with retrieved relevant examples.
When to use it
Use few-shot learning when:
- Few labeled examples exist (< 100)
- Quick task adaptation is needed
- You have access to capable models (GPT-3.5+)
- Fine-tuning infrastructure is unavailable
- Task description in natural language is possible
Few-shot performance is unpredictable: some tasks work well while others fail, and both task difficulty and example quality matter. For reliable performance, fine-tuning or a larger model is usually needed.