Name Finder
Find names from a known list in a text, taking account of names that may overlap. For example, Waterloo and Waterloo East are separate stations; NameFinder, knowing both, will not give a false match for Waterloo in a text that mentions Waterloo East.
Examples
require "name_finder"
stations = [
  "Bermondsey",
  "South Bermondsey",
  "Southwark",
  "Waterloo",
  "Waterloo East"
]
nf = NameFinder.new
stations.each do |station|
  nf.add station
end
It can find the best matching name even when one name is the same as part of another, whether they overlap at the start:
nf.find_in "Change here for trains from Waterloo East"
# => "Waterloo East"
nf.find_in "This train terminates at Waterloo"
# => "Waterloo"
or at the end:
nf.find_in "Escalator closed at Bermondsey station"
# => "Bermondsey"
nf.find_in "Use South Bermondsey station for Millwall FC"
# => "South Bermondsey"
It can also find all the matching names, without false positives for names that are part of a longer name:
nf.find_all_in "South Bermondsey and Waterloo East"
# => ["South Bermondsey", "Waterloo East"]
Names that are part of a longer name are still found when listed separately, however:
nf.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
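The behaviour shown above can be pictured as a longest-match-first scan over the words of the text. The following is only a minimal sketch of that idea, not NameFinder's actual implementation: the class name LongestMatchSketch and its internals are hypothetical, and it assumes names are compared case-insensitively on the letters a-z only.

```ruby
# Hypothetical illustration of longest-match-first lookup.
# This is NOT the library's code; all names here are made up for the sketch.
class LongestMatchSketch
  def initialize(names)
    # Map each normalized word sequence to the original name.
    @names = names.to_h { |name| [words(name), name] }
    @max_len = @names.keys.map(&:length).max || 0
  end

  def find_all_in(text)
    tokens = words(text)
    found = []
    i = 0
    while i < tokens.length
      # Try the longest candidate first so "Waterloo East" beats "Waterloo".
      len = [@max_len, tokens.length - i].min
      len -= 1 while len > 0 && !@names.key?(tokens[i, len])
      if len > 0
        found << @names[tokens[i, len]]
        i += len # skip the matched words so they are not matched again
      else
        i += 1
      end
    end
    found
  end

  private

  # Lower-case the text and split on anything that is not a letter a-z.
  def words(text)
    text.downcase.gsub(/[^a-z]+/, " ").split
  end
end

sketch = LongestMatchSketch.new(["Bermondsey", "South Bermondsey", "Waterloo", "Waterloo East"])
sketch.find_all_in "South Bermondsey and Waterloo East"
# => ["South Bermondsey", "Waterloo East"]
sketch.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
```

Because the scan skips past the words of each match, a shorter name is reported only where it appears on its own.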
Limitations
The present implementation handles only the letters A-Z. This can be customised by subclassing NameFinder and changing the implementation of normalize.
The normalize method must use the same delimiter between words as is returned by the delimiter method (normally a single space).
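
As a rough sketch of such a customisation, the subclass below strips accents and keeps digits. It rests on assumptions this README does not confirm: that normalize takes a single string and returns a lower-case string, and that delimiter returns a single space. The class name AccentAwareNameFinder is hypothetical, and the empty NameFinder stand-in exists only so the snippet runs without the library loaded.

```ruby
# Stand-in so the sketch runs on its own; with the real library you would
# write `require "name_finder"` instead of defining this empty class.
unless defined?(NameFinder)
  class NameFinder; end
end

# Hypothetical subclass; the method signatures are assumptions.
class AccentAwareNameFinder < NameFinder
  # Assumed to match the library's default word delimiter.
  def delimiter
    " "
  end

  # Decompose accented letters, drop the combining marks, lower-case,
  # then collapse every run of other characters into the delimiter.
  def normalize(text)
    text.unicode_normalize(:nfd)
        .gsub(/\p{Mn}/, "")
        .downcase
        .gsub(/[^a-z0-9]+/, delimiter)
        .strip
  end
end

AccentAwareNameFinder.new.normalize("Saint-Étienne  Châteaucreux")
# => "saint etienne chateaucreux"
```

Whatever normalization is chosen, joining words with the value of delimiter keeps the two methods consistent, as required above.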