Few-Shot Learning
What it is
Few-shot learning (or in-context learning) is the ability of a language model to perform a new task after seeing only a few examples (typically 1–10) in the prompt, without any parameter updates. This capability emerges with scale in large models; small models require explicit fine-tuning on many examples.
[illustrate: Prompt with 3 examples (input-output pairs); new test input; model predicting output without retraining]
How it works
- Prepare examples: collect a few representative input-output pairs (typically 1–10) for the target task
- Construct prompt:
  - System instruction describing the task
  - Example pairs formatted consistently
  - New test input in the same format
- Query model: run the prompt through the language model and extract the prediction
- No training: no gradient updates; adaptation happens purely at inference time
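The prompt-construction step above can be sketched in a few lines. This is a minimal illustration, not a standard API: the `build_few_shot_prompt` helper and the generic "Input:"/"Output:" field names are arbitrary choices, and any consistent format works.

```python
def build_few_shot_prompt(instruction, examples, test_input):
    """Assemble a few-shot prompt: task instruction, example pairs,
    then the new test input in the same format, ending where the
    model is expected to continue."""
    lines = [instruction, "", "Examples:"]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    # The test input uses the identical format, with the output left blank.
    lines.append(f"Input: {test_input}")
    lines.append("Output:")
    return "\n".join(lines)
```

The resulting string is sent to the model as-is; the model's continuation after the final "Output:" is taken as its prediction.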
Example
Prompt for sentiment classification:
"Classify the sentiment of the following reviews as positive or negative.
Examples:
Review: This movie was great!
Sentiment: positive
Review: Terrible waste of time.
Sentiment: negative
Review: The acting was decent.
Sentiment: "
Model output: "positive" (though for a lukewarm review like this one, the model may instead answer "negative" or hedge)
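Because the raw completion may carry leading whitespace, capitalization, punctuation, or trailing text, it is common to normalize it before use. A minimal sketch, where `extract_label` is a hypothetical helper name:

```python
def extract_label(model_output, labels=("positive", "negative")):
    """Map a raw model completion to one of the known labels.
    Returns None when the completion does not start with any label,
    which callers can treat as an uncertain prediction."""
    text = model_output.strip().lower()
    for label in labels:
        if text.startswith(label):
            return label
    return None
```

Returning `None` rather than guessing makes the uncertain case explicit, so downstream code can fall back to a default or re-query.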
Variants and history
Zero-shot task transfer was demonstrated with GPT-2 (2019), and few-shot in-context learning became prominent with GPT-3 (2020). Scaling studies showed performance improves predictably with model size and with the number of in-context examples. Meta-learning methods (MAML, Prototypical Networks) studied few-shot learning in traditional supervised settings before large language models. The mechanisms of in-context learning (how transformers use the examples) are still under investigation. Retrieval-augmented prompting combines few-shot prompting with retrieved relevant examples.
When to use it
Use few-shot learning when:
- Few labeled examples exist (< 100)
- Quick task adaptation is needed
- You have access to capable models (GPT-3.5+)
- Fine-tuning infrastructure is unavailable
- Task description in natural language is possible
Few-shot performance is unpredictable: some tasks work well while others fail, and both task difficulty and example quality matter. For reliable performance, fine-tuning or a larger model is usually needed.