Suffix
What it is
A suffix is a bound morpheme that attaches to the right end of a word stem or root. “Bound” means it cannot stand alone — -ing, -ness, and -tion are suffixes; they only exist as attachments to another form.
Suffixes fall into two functional classes that matter differently for NLP:
Inflectional suffixes express grammatical properties of a word without creating a new lexeme. They change how a word behaves in a sentence — its tense, number, or degree — but leave its core meaning and part of speech intact.
| Suffix | Function | Example |
|---|---|---|
| -s / -es | plural (noun), 3sg present (verb) | dogs, watches |
| -ed | past tense / past participle | walked |
| -ing | present participle / gerund | running |
| -er | comparative | faster |
| -est | superlative | fastest |
Derivational suffixes create a new — though related — lexeme, often shifting the part of speech. Happy (adj) + -ness → happiness (noun); nation (noun) + -al → national (adj). The new word has its own dictionary entry.
Common productive derivational suffixes in English:
-tion / -sion, -ness, -ment, -ity, -ise / -ize, -ly, -al, -ful, -less, -able / -ible
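As a rough illustration of the two classes, the sketch below tags a word with the longest listed suffix it ends in. The function name `guess_suffix`, the minimum stem length of 3, and the suffix tuples (copied from the lists above) are assumptions made for this example. It is plain string matching, not morphological analysis.

```python
# Illustrative suffix inventory copied from the lists above (not exhaustive).
INFLECTIONAL = ("s", "es", "ed", "ing", "er", "est")
DERIVATIONAL = ("tion", "sion", "ness", "ment", "ity", "ise", "ize",
                "ly", "al", "ful", "less", "able", "ible")


def guess_suffix(word):
    """Return (suffix, class) for the longest listed suffix ending `word`.

    Returns None when no listed suffix matches. Hypothetical helper:
    it only compares strings and knows nothing about real morphology.
    """
    candidates = [(s, "inflectional") for s in INFLECTIONAL]
    candidates += [(s, "derivational") for s in DERIVATIONAL]
    # Try longer suffixes first so "-ness" wins over "-s".
    candidates.sort(key=lambda c: len(c[0]), reverse=True)
    for suffix, kind in candidates:
        # Assumed guard: keep at least 3 characters of stem,
        # so "sing" is not read as s + -ing.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return "-" + suffix, kind
    return None


print(guess_suffix("happiness"))  # ('-ness', 'derivational')
print(guess_suffix("fastest"))    # ('-est', 'inflectional')
print(guess_suffix("sing"))       # None
```

String matching alone still misfires: `guess_suffix("water")` reports an inflectional -er, which is one reason real stemmers add conditions on the shape of the remaining stem.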
Contrast prefix: a bound morpheme at the left end of a stem (un-, re-, pre-). Prefixes exist in English but are far less relevant to IR. Stripping one usually changes the meaning too much to be safe (unhappy and happy should not share an index term), and English expresses most of its grammatical and derivational morphology on the right side of the word.
How it works
Why stemmers target suffixes
The major English stemming algorithms (Porter, Porter2, KStem, Lovins, Paice/Husk) all work by stripping or replacing suffixes. Multi-step stemmers such as Porter handle inflectional suffixes first because they represent the most predictable variation, then strip derivational suffixes to reach a deeper stem.
Suffix ordering and stripping order
When multiple suffixes stack onto a root, they must appear in a fixed morphological order and must be stripped in reverse order. Consider:
general + -ise + -ation + -s → "generalisations"
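A stemmer must remove -s first, then handle -ation, then -ise, working inward to the root. Skipping a step or applying the rules out of sequence produces an incorrect stem, because an inner suffix is only visible at the end of the word once the outer ones are gone.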
[illustrate: step-by-step suffix stripping of “generalisations” — the word shown at each stage with the active suffix highlighted and removed: generalisations → generalisation → generalise → general; each step labelled with the suffix being stripped]
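The outermost-first ordering can be sketched with a small ordered rule list. The rule list and the name `strip_in_order` are hypothetical and cover only this one word; the `-ation` rule replaces the suffix with `e` so that the intermediate form is the real word generalise.

```python
# Hypothetical rules for this single example, outermost suffix first.
# Each rule is (suffix, replacement).
RULES = [
    ("s", ""),       # plural -s
    ("ation", "e"),  # generalisation -> generalise (restores the dropped e)
    ("ise", ""),     # generalise -> general
]


def strip_in_order(word, rules=RULES):
    """Apply each rule at most once, in list order, and record every form."""
    trace = [word]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            word = word[: len(word) - len(suffix)] + replacement
            trace.append(word)
    return trace


print(" → ".join(strip_in_order("generalisations")))
# generalisations → generalisation → generalise → general

# Same rules in the wrong (innermost-first) order: -ise and -ation are
# tested while -s still hides them, so only the plural is removed.
print(" → ".join(strip_in_order("generalisations", list(reversed(RULES)))))
# generalisations → generalisation
```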
Stripping vs replacement
Simple suffix stripping removes the suffix entirely. Many stemming rules instead replace a suffix with a shorter ending so that the result keeps a valid stem shape. Porter step 2 maps -ational → -ate rather than deleting it, so at that step relational becomes relate, not the fragment rel (later Porter steps may trim the result further). Replacement preserves more structural information than bare stripping.
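A minimal sketch of the difference, using three mappings from Porter's step 2. Only these three rules are included, and the condition Porter places on the remaining stem (its "measure") is deliberately omitted, so this is not a faithful implementation of the step. The function names are made up for the example.

```python
# Subset of Porter step 2 mappings; the stem-measure condition is omitted.
STEP2_SUBSET = [
    ("ational", "ate"),  # relational  -> relate
    ("ization", "ize"),  # digitization -> digitize
    ("tional", "tion"),  # conditional -> condition
]


def strip_suffix(word, suffix):
    """Bare stripping: drop the suffix and keep whatever is left."""
    return word[: -len(suffix)] if word.endswith(suffix) else word


def replace_suffix(word, rules=STEP2_SUBSET):
    """Replacement: swap the first matching suffix for its shorter form."""
    for suffix, replacement in rules:  # longer suffixes are listed first
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


print(strip_suffix("relational", "ational"))  # rel
print(replace_suffix("relational"))           # relate
print(replace_suffix("conditional"))          # condition
```

Listing `ational` before `tional` matters here: with the order reversed, relational would match the shorter rule and come out as relation.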
Orthographic complications
Adding a suffix to a stem often changes its spelling, and a stemmer must reverse these changes:
- Consonant doubling: run → running — the n doubles before -ing
- E-dropping: make → making — the final e is dropped
- Y-to-I: happy → happiness — y becomes i before -ness
Stemmers encode rules to undo each pattern, typically as part of the same step that removes the suffix.
[illustrate: three side-by-side before/after transformations — “running→run” (un-doubling), “making→make” (e-restoration), “happiness→happy” (i-to-y) — each annotated with the orthographic rule being reversed]
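One way to picture the reversal is to generate candidate stems for each spelling pattern and keep the first one that is a known word. The vocabulary `KNOWN_STEMS` and both function names are stand-ins invented for this sketch; rule-based stemmers such as Porter decide from the shape of the stem instead of consulting a word list.

```python
# Tiny stand-in vocabulary, assumed for the example only.
KNOWN_STEMS = {"run", "make", "happy", "walk"}


def candidate_stems(bare):
    """Yield possible stems for the string left after removing a suffix."""
    yield bare                       # no spelling change: walk + ing
    if len(bare) >= 2 and bare[-1] == bare[-2]:
        yield bare[:-1]              # undo consonant doubling: runn -> run
    yield bare + "e"                 # restore a dropped e: mak -> make
    if bare.endswith("i"):
        yield bare[:-1] + "y"        # undo y-to-i: happi -> happy


def unsuffix(word, suffix):
    """Remove `suffix` and return the first candidate found in KNOWN_STEMS."""
    if not word.endswith(suffix):
        return None
    bare = word[: -len(suffix)]
    for stem in candidate_stems(bare):
        if stem in KNOWN_STEMS:
            return stem
    return None


print(unsuffix("running", "ing"))     # run
print(unsuffix("making", "ing"))      # make
print(unsuffix("happiness", "ness"))  # happy
```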
Example
Input: "generalisations"
Root: general
Suffixes: -ise, -ation, -s
Stripping order (outermost first):
generalisations → [strip -s] → generalisation
generalisation → [replace -ation → -e] → generalise
generalise → [replace -ise → ""] → general
The stem general is shared by generalise, generalisation, generalisations, generalised, and generally — collapsing five index terms into one.
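The effect on an index can be sketched by grouping terms under their stem. The rule list and the names `toy_stem` and `group_by_stem` are hypothetical and contain just enough rules for the terms listed; a real stemmer would produce its own, possibly shorter, stem.

```python
from collections import defaultdict

# Hypothetical rules, outermost suffix first; enough for these terms only.
RULES = [("s", ""), ("ation", "e"), ("ed", "e"), ("ise", ""), ("ly", "")]


def toy_stem(term):
    """Apply each rule at most once, in order."""
    for suffix, replacement in RULES:
        if term.endswith(suffix):
            term = term[: len(term) - len(suffix)] + replacement
    return term


def group_by_stem(terms):
    """Map each stem to the list of surface terms that reduce to it."""
    groups = defaultdict(list)
    for term in terms:
        groups[toy_stem(term)].append(term)
    return dict(groups)


terms = ["generalise", "generalisation", "generalisations",
         "generalised", "generally", "suffix"]
for stem, members in group_by_stem(terms).items():
    print(stem, "<-", members)
```

Under these assumptions the five variants land under the single key `general`, while `suffix` keeps its own entry.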
When to use it
Understanding suffix structure matters in three practical contexts:
Writing or debugging stemming rules. If a custom suffix rule fires incorrectly, examine whether it is stripping in the right order and whether it accounts for orthographic changes before the suffix boundary.
Choosing stemming depth. Stripping only inflectional suffixes is conservative and almost always safe. Stripping derivational suffixes increases recall but risks conflating semantically distinct terms (organ and organisation may share a stem). Choose depth based on acceptable precision loss.
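The trade-off can be made concrete with a depth switch. Everything in this sketch is assumed for illustration: the two one-rule lists, the `depth` parameter, and the minimum stem length of 3.

```python
# Hypothetical one-rule lists, just enough to show the trade-off.
INFLECTIONAL_RULES = [("s", "")]
DERIVATIONAL_RULES = [("isation", "")]  # organisation -> organ


def stem(word, depth="inflectional"):
    """Strip inflection only, or inflection followed by derivation."""
    rules = list(INFLECTIONAL_RULES)
    if depth == "derivational":
        rules += DERIVATIONAL_RULES
    for suffix, replacement in rules:
        # Assumed guard: leave at least 3 characters of stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)] + replacement
    return word


for depth in ("inflectional", "derivational"):
    print(depth, stem("organs", depth), stem("organisations", depth))
# inflectional organ organisation
# derivational organ organ
```

At the shallow depth the two words stay distinct index terms; at the deeper depth a query for organs also matches documents about organisations, which is the precision loss described above.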
Multilingual pipelines. English suffix rules do not transfer to prefix-dominant or agglutinative languages. Finnish and Turkish stack many suffixes on a single stem, which usually calls for a language-specific stemmer or a full morphological analyser rather than a reused English rule list. Arabic encodes much of its morphology through root-and-pattern templates, not simple suffixes.