Module: English::Metaphone

Extended by:
Metaphone
Included in:
Metaphone
Defined in:
lib/gems/english-0.3.1/lib/english/metaphone.rb

Overview

Metaphone encodes names into a phonetic form such that similar-sounding names

have the same or similar Metaphone encodings.

The original system was described by Lawrence Philips in Computer Language
Vol. 7 No. 12, December 1990, pp 39-43.

As there are multiple implementations of Metaphone, each with their own
quirks, I have based this on my interpretation of the algorithm specification.
Even LP's original BASIC implementation appears to contain bugs (specifically
with the handling of CC and MB), when compared to his explanation of the
algorithm.

I have also compared this implementation with that found in PHP's standard
library, which appears to mimic the behaviour of LP's original BASIC
implementation. For compatibility, these rules can also be used by passing
:alternate=>true to the methods.

Constant Summary collapse

RULES =

Metaphone rules. These are simply applied in order.

[
  # Regexp, replacement
  [ /([bcdfhjklmnpqrstvwxyz])\1+/,
                      '\1' ],  # Remove doubled consonants except g.
                              # [PHP] remove c from regexp.
  [ /^ae/,            'E' ],
  [ /^[gkp]n/,        'N' ],
  [ /^wr/,            'R' ],
  [ /^x/,             'S' ],
  [ /^wh/,            'W' ],
  [ /mb$/,            'M' ],  # [PHP] remove $ from regexp.
  [ /(?!^)sch/,      'SK' ],
  [ /th/,             '0' ],
  [ /t?ch|sh/,        'X' ],
  [ /c(?=ia)/,        'X' ],
  [ /[st](?=i[ao])/,  'X' ],
  [ /s?c(?=[iey])/,   'S' ],
  [ /[cq]/,           'K' ],
  [ /dg(?=[iey])/,    'J' ],
  [ /d/,              'T' ],
  [ /g(?=h[^aeiou])/, ''  ],
  [ /gn(ed)?/,        'N' ],
  [ /([^g]|^)g(?=[iey])/,
                    '\1J' ],
  [ /g+/,             'K' ],
  [ /ph/,             'F' ],
  [ /([aeiou])h(?=\b|[^aeiou])/,
                      '\1' ],
  [ /[wy](?![aeiou])/, '' ],
  [ /z/,              'S' ],
  [ /v/,              'F' ],
  [ /(?!^)[aeiou]+/,  ''  ],
]
LP_RULES =

The rules for the ‘buggy’ alternate implementation used by PHP etc.

RULES.dup

Instance Method Summary collapse

Instance Method Details

#metaphone(str, alt = nil) ⇒ Object

Returns the Metaphone representation of a string. If the string contains multiple words, each word in turn is converted into its Metaphone representation. Note that only the letters A-Z are supported, so any language-specific processing should be done beforehand.

If alt is set to true, alternate ‘buggy’ rules are used.



82
83
84
# File 'lib/gems/english-0.3.1/lib/english/metaphone.rb', line 82

def metaphone(str, alt=nil)
  return str.strip.split(/\s+/).map{ |w| metaphone_word(w, alt) }.join(' ')
end