NYSIIS

What it is

NYSIIS (New York State Identification and Intelligence System) is a phonetic algorithm that reduces a name to a short code representing how it sounds. Two names that sound alike — or are common spelling variants of each other — should produce the same code, allowing them to be matched even when the spelling differs.

It was developed in 1970 by the New York State law enforcement agency of the same name, originally to match criminal records across agencies with inconsistent name spellings. It remains in use in law enforcement databases and some US healthcare identity systems.

NYSIIS occupies a useful middle ground: more accurate than Soundex for North American names, but simpler to implement than Metaphone.

How it works

NYSIIS produces a code made entirely of letters — unlike Soundex, which uses a letter followed by digits. Vowels are not silently discarded; they are normalised to A, preserving more phonetic structure in the code.

The algorithm proceeds in stages:

  1. Translate the name’s prefix. Several opening sequences are replaced before anything else: MAC → MCC, KN → N, K → C, PH → F, PF → F, SCH → S.
  2. Translate the name’s suffix. Common endings are normalised: EE / IE → Y; DT / RT / RD / NT / ND → D.
  3. Set the first character of the code to the (now-translated) first character of the name.
  4. Remove leading H and W from the remaining characters.
  5. Map characters left to right using a substitution table: EV → AF; all other vowels → A; Q → G; Z → S; M → N; KN → N; K → C; SCH → S; PH → F; H → skip if the letter before or after it is not a vowel; W → skip if preceded by a vowel.
  6. Remove adjacent duplicate characters.
  7. Truncate to 6 characters.
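
The stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: nysiis_sketch and its max_len parameter are hypothetical names, the H and W handling is folded into the scan as skip rules, and the prefix table uses doubled letters (MCC, NN, FF, SSS) that collapse during duplicate removal. Library implementations may apply further rules or skip truncation.

```python
from typing import Optional

VOWELS = frozenset("AEIOU")
# Order matters: "KN" must be tried before the bare "K".
PREFIXES = (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
            ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
            ("RD", "D"), ("NT", "D"), ("ND", "D"))
SINGLES = {"Q": "G", "Z": "S", "M": "N"}


def nysiis_sketch(name: str, max_len: Optional[int] = 6) -> str:
    """Hypothetical helper: encode `name` following the stages listed above."""
    # Keep only the letters A-Z, upper-cased.
    s = "".join(ch for ch in name.upper() if "A" <= ch <= "Z")
    if not s:
        return ""

    # Stage 1: translate the prefix (first matching rule only).
    for old, new in PREFIXES:
        if s.startswith(old):
            s = new + s[len(old):]
            break

    # Stage 2: translate the suffix (first matching rule only).
    for old, new in SUFFIXES:
        if s.endswith(old):
            s = s[:-len(old)] + new
            break

    # Stage 3: the code starts with the translated first character.
    mapped = [s[0]]

    # Stages 4-5: scan the rest left to right, substituting or skipping.
    i = 1
    while i < len(s):
        ch, prev = s[i], s[i - 1]
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if s.startswith("EV", i):
            mapped.append("AF"); i += 2
        elif ch in VOWELS:
            mapped.append("A"); i += 1
        elif ch in SINGLES:
            mapped.append(SINGLES[ch]); i += 1
        elif s.startswith("KN", i):
            mapped.append("N"); i += 2
        elif ch == "K":
            mapped.append("C"); i += 1
        elif s.startswith("SCH", i):
            mapped.append("S"); i += 3
        elif s.startswith("PH", i):
            mapped.append("F"); i += 2
        elif ch == "H" and (prev not in VOWELS or nxt not in VOWELS):
            i += 1  # H next to a non-vowel (or at the end) is skipped
        elif ch == "W" and prev in VOWELS:
            i += 1  # W after a vowel is skipped
        else:
            mapped.append(ch); i += 1

    # Stage 6: remove adjacent duplicate characters.
    code = [mapped[0]]
    for ch in "".join(mapped)[1:]:
        if ch != code[-1]:
            code.append(ch)

    # Stage 7: truncate (max_len=None lifts the cap).
    return "".join(code)[:max_len]


print(nysiis_sketch("Johnson"))  # JANSAN
print(nysiis_sketch("Johnsen"))  # JANSAN
```

Passing max_len=None lifts the cap, which is the relaxation some implementations choose for longer names.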

Example

Encoding "Johnson" and "Johnsen":

Johnson  → prefix: no change → suffix: no change → map: J-A-N-S-A-N → dedup → JANSAN
Johnsen  → prefix: no change → suffix: no change → map: J-A-N-S-A-N → dedup → JANSAN

Both resolve to JANSAN. "Thomson" and "Thompson", by contrast, do not share a code: no rule removes the P, so Thompson maps to TANPSAN (TANPSA after the six-character cut) while Thomson maps to TANSAN.

Variants and history

The original 1970 specification caps the code at 6 characters. Several implementations relax this limit to improve precision when distinguishing longer names. A refined variant published by Pfister and Ricks (2011) adjusted a handful of the substitution rules to reduce false matches while preserving recall for common English name variants.

When to use it

Reach for NYSIIS when:

  • Your dataset is primarily North American English-language names.
  • Soundex is producing too many false matches or missing obvious variants.
  • You want a simple, fast algorithm without the linguistic complexity of Metaphone.

NYSIIS is a poor fit for non-Latin scripts, highly inflected European names, or names with consistent spelling where exact matching suffices.

In Python, jellyfish provides a straightforward implementation:

import jellyfish

jellyfish.nysiis("Thompson")  # → 'TANPSAN'
jellyfish.nysiis("Thomson")   # → 'TANSAN'
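
For record matching, a common pattern is to bucket names by their code and compare only within each bucket. The sketch below is one way to do that; group_by_code is a hypothetical helper, and it takes the encoder as a parameter so that any function mapping a name to a code (such as jellyfish.nysiis) can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_code(names: Iterable[str],
                  encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Hypothetical helper: bucket names under the code returned by `encode`."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        buckets[encode(name)].append(name)
    return dict(buckets)


# Usage, assuming jellyfish is installed:
#   import jellyfish
#   groups = group_by_code(["Johnson", "Johnsen", "Thomson"], jellyfish.nysiis)
#   candidates = [g for g in groups.values() if len(g) > 1]
```

Names that land in the same bucket are only candidate matches; a second comparison step is usually still needed to confirm them.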

In Elasticsearch / OpenSearch, install the analysis-phonetic plugin and define a phonetic token filter in the index's analysis settings:

{
  "filter": {
    "nysiis_filter": {
      "type": "phonetic",
      "encoder": "nysiis",
      "replace": true
    }
  }
}

See also

  • Phonetic Encoding — overview of the phonetic encoding family
  • Soundex — the earlier algorithm NYSIIS was designed to improve upon
  • Metaphone — a more linguistically sophisticated alternative