Name Finder

Find names from a known list in a text, taking account of names that may overlap. For example, Waterloo and Waterloo East are separate stations; NameFinder, knowing both, will not give a false match for Waterloo in a text that mentions Waterloo East.

Examples

require "name_finder"

stations = [
  "Bermondsey",
  "South Bermondsey",
  "Southwark",
  "Waterloo",
  "Waterloo East"
]

nf = NameFinder.new
stations.each do |station|
  nf.add station
end

It can find the best matching name even when one name forms part of another, whether the shared part is at the start:

nf.find_in "Change here for trains from Waterloo East"
# => "Waterloo East"

nf.find_in "This train terminates at Waterloo"
# => "Waterloo"

or at the end:

nf.find_in "Escalator closed at Bermondsey station"
# => "Bermondsey"

nf.find_in "Use South Bermondsey station for Millwall FC"
# => "South Bermondsey"

It can also find all the matching names, without false positives for names that are part of a longer name:

nf.find_all_in "South Bermondsey and Waterloo East"
# => ["South Bermondsey", "Waterloo East"]

Names that are part of a longer name are still found when listed separately, however:

nf.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
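The behaviour shown above can be pictured as a greedy longest-match over words. The following is a minimal sketch of that idea only; it is not NameFinder's implementation, and the longest_matches helper is a hypothetical name introduced for illustration. It assumes names and text are compared case-insensitively and that only the letters A-Z matter.

```ruby
# Illustrative only: a greedy longest-match over words.
# This is NOT NameFinder's own code; longest_matches is a hypothetical helper.
def longest_matches(names, text)
  # Index each known name by its downcased list of words.
  known = names.to_h { |name| [name.downcase.split, name] }
  longest = known.keys.map(&:length).max || 0
  words = text.downcase.scan(/[a-z]+/)

  found = []
  i = 0
  while i < words.length
    # Try the longest candidate phrase first, then progressively shorter ones.
    max_len = [longest, words.length - i].min
    len = max_len.downto(1).find { |n| known.key?(words[i, n]) }
    if len
      found << known[words[i, len]]
      i += len # skip the words consumed by this match
    else
      i += 1
    end
  end
  found
end
```

Because the longest phrase is tried first at each position, "Waterloo East" consumes both words before "Waterloo" can match on its own, while a later standalone "Bermondsey" is still reported.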

Limitations

The present implementation handles only the letters A-Z. To change this, subclass NameFinder and override normalize. The normalize method must use the same delimiter between words as is returned by the delimiter method (normally a single space).
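As a rough sketch of what a replacement normalize might do, the helper below keeps Unicode letters and digits and separates words with a single space. It assumes that normalize takes a string and returns a string; the README does not state the signature, so check the source before relying on this. The names DELIMITER and normalize_with_digits_and_accents are hypothetical, and lower-casing is an assumption about how case should be handled.

```ruby
# Hypothetical sketch of a normalize body; in practice this logic would live
# in the normalize method of a NameFinder subclass.
DELIMITER = " " # assumed to match what the delimiter method returns

def normalize_with_digits_and_accents(text)
  text.downcase                            # case handling is an assumption
      .gsub(/[^\p{L}\p{N}]+/, DELIMITER)   # runs of non-letters/non-digits become one delimiter
      .strip                               # drop leading and trailing delimiters
end
```

With this approach, punctuation and repeated whitespace collapse to a single delimiter, so a name such as "St. Pancras" and the text "St Pancras" normalize to the same form.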