Beider-Morse Phonetic Matching
What it is
Beider-Morse Phonetic Matching (BMPM) is a phonetic encoding algorithm built specifically for Jewish surnames — particularly Ashkenazic names — which appear across dozens of languages and orthographic traditions. It was developed by Alexander Beider and Stephen Morse, with roots in Beider’s 2001 book A Dictionary of Ashkenazic Given Names. The full algorithm and its rules are published openly at stevemorse.org.
Soundex reduces a name to a four-character code (one letter plus three digits), and Daitch-Mokotoff Soundex extends that to six-digit codes tuned for Eastern European names. BMPM goes further: it models the phonological rules of each supported language and generates phoneme-sequence codes rather than digit strings.
How it works
BMPM operates in three broad stages.
1. Language detection. The algorithm inspects the input string for character patterns, letter combinations, and structural cues to estimate which language (or languages) the spelling most likely comes from. Supported languages include Hebrew, Russian, Polish, German, French, Hungarian, Romanian, and English, among others.
2. Rule application. Language-specific phonological rules are applied to convert the input into one or more phonetic codes. Each code is a string of phoneme tokens — not digits — representing a plausible pronunciation. Because a name may be ambiguous between languages, the algorithm can produce multiple codes.
3. Matching. Two names match if their code sets share at least one code, so the comparison is a set intersection rather than a string-equality test.
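The three stages can be sketched in Python as follows. This is a toy under stated assumptions: the rule tables, the cue checks, and the function names (`detect_languages`, `encode`, `matches`) are invented for illustration and are far simpler than the published Beider-Morse rules. Only the shape of the pipeline (detect, encode per language, intersect) is meant to carry over.

```python
import unicodedata

# Toy letter-to-phoneme tables, one per language. These are illustrative
# assumptions, NOT the published Beider-Morse rule tables.
TOY_RULES = {
    "polish": {"j": "j", "w": "v", "y": "i", "ł": "l", "ń": "n"},
    "german": {"j": "j", "w": "v", "y": "i"},
    "english": {"j": "dZ", "y": "j"},
}

POLISH_DIACRITICS = set("ąćęłńóśźż")


def detect_languages(name: str) -> set[str]:
    """Stage 1: narrow the candidate languages using spelling cues."""
    if POLISH_DIACRITICS & set(name):
        return {"polish"}
    if "sch" in name:
        return {"german"}
    return set(TOY_RULES)  # no strong cue: keep every language


def encode(name: str) -> set[str]:
    """Stage 2: apply each candidate language's rules, one code per language."""
    name = unicodedata.normalize("NFC", name).lower()
    return {
        "".join(TOY_RULES[lang].get(ch, ch) for ch in name)
        for lang in detect_languages(name)
    }


def matches(a: str, b: str) -> bool:
    """Stage 3: two names match if their code sets intersect."""
    return bool(encode(a) & encode(b))


for variant in ("Jablonsky", "Jabłoński", "Yablonski"):
    print(variant, sorted(encode(variant)))
print(matches("Jablonsky", "Yablonski"))
```

In this toy, an ambiguous spelling such as Jablonsky yields more than one code, while Jabłoński is pinned to a single language by its diacritics. All three variants end up sharing the code `jablonski`, which is what makes them match.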
[illustrate: BMPM encoding of “Shapiro” — language detection step highlighting letter patterns, then rule application producing multiple phoneme-sequence codes for Polish, Russian, and German branches, with two names matching where their code sets intersect]
BMPM has three name-type modes:
- Ashkenazic mode: tuned for Eastern European Jewish names; applies the phonological patterns of Yiddish, Russian, Polish, and related languages.
- Sephardic mode: tuned for Sephardic Jewish names, which draw mainly on Spanish, Portuguese, and related languages.
- Generic mode: broader coverage for names outside those two traditions.
Within each mode, rules can be approximate (tolerating phonological variation) or exact (strict phoneme matching).
Example
The surname Jablonsky might appear in records as Yablonski, Jabłoński, or Yablonskiy depending on whether the record was written by a Polish clerk, a Yiddish speaker, or a Russian bureaucrat. BMPM encodes all of these into overlapping phoneme sequences, allowing them to match despite the surface spelling differences.
By contrast, Soundex keeps the first letter of a name verbatim, so it encodes Jablonsky as J145 and Yablonski as Y145, and the two never match.
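The Soundex side of this comparison is easy to check by hand or in code. Below is a minimal sketch of classic American Soundex in Python; the function name `soundex` and the table layout are choices made for this example, not taken from any particular library.

```python
# Consonant classes of classic American Soundex.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code (letter + three digits)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


print(soundex("Jablonsky"))  # J145
print(soundex("Yablonski"))  # Y145
```

The digit portions agree (145), but because the leading letter is kept as-is, a J/Y spelling difference is enough to break the match.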
[illustrate: before/after table — four spelling variants of “Jablonsky” mapped to their Soundex codes (no overlap) versus their BMPM code sets (overlapping intersection highlighted)]
Variants and history
Beider published the foundational name research in 2001. The joint Beider-Morse algorithm, first released in 2008, formalised the computational rules and made them available as open-source rule tables. Implementations exist in PHP (the reference implementation), Python (the phonetics and bmpm packages), and Java (Apache Commons Codec's BeiderMorseEncoder, which the Elasticsearch plugin also relies on).
When to use it
BMPM is the right choice when:
- Your dataset contains Ashkenazic Jewish surnames from historical records spanning multiple countries and languages.
- You need to reconcile names that passed through transliteration from Hebrew, Yiddish, or Cyrillic scripts into Latin orthography.
- You are building or querying a genealogy database (Ancestry, MyHeritage, JRI-Poland, and similar platforms all serve this use case).
It is available in Elasticsearch via the analysis-phonetic plugin:
```json
{
  "filter": {
    "my_beider_morse": {
      "type": "phonetic",
      "encoder": "beider_morse",
      "name_type": "ASHKENAZI",
      "rule_type": "APPROX"
    }
  }
}
```
name_type accepts ASHKENAZI, GENERIC, or SEPHARDIC. rule_type accepts APPROX or EXACT.
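A filter definition only takes effect once an analyzer references it. The sketch below builds a full index-settings body in Python and prints it as JSON. The helper name `beider_morse_index_settings` and the analyzer name `surname_phonetic` are hypothetical; the `standard` tokenizer and `lowercase` filter are stock Elasticsearch components, and the accepted values are the ones listed above. No Elasticsearch client is involved, so sending the body to a cluster is left out.

```python
import json

# Accepted values as listed in the text above.
NAME_TYPES = {"ASHKENAZI", "GENERIC", "SEPHARDIC"}
RULE_TYPES = {"APPROX", "EXACT"}


def beider_morse_index_settings(name_type="ASHKENAZI", rule_type="APPROX"):
    """Build an index-settings body wiring the filter into a custom analyzer."""
    if name_type not in NAME_TYPES:
        raise ValueError(f"unknown name_type: {name_type!r}")
    if rule_type not in RULE_TYPES:
        raise ValueError(f"unknown rule_type: {rule_type!r}")
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_beider_morse": {
                        "type": "phonetic",
                        "encoder": "beider_morse",
                        "name_type": name_type,
                        "rule_type": rule_type,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name, chosen for this example.
                    "surname_phonetic": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_beider_morse"],
                    }
                },
            }
        }
    }


print(json.dumps(beider_morse_index_settings(), indent=2))
```

Validating the two options up front surfaces a typo as a local error instead of a rejected index-creation request.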
BMPM is significantly more computationally expensive than Soundex or Double Metaphone. Outside the Jewish genealogy context — general English-language name search, product catalogues, address matching — it is overkill. Double Metaphone handles most Western European name variation with far less overhead.