loose_tight_dictionary
Match things based on string similarity (using the Pair Distance algorithm) and regular expressions.
Quickstart
>> right_records = [ 'seamus', 'andy', 'ben' ]
=> [...]
>> left_record = 'Shamus Heaney'
=> [...]
>> d = LooseTightDictionary.new right_records
=> [...]
>> puts d.left_to_right left_record
=> 'seamus'
Try running the included example file:
$ ruby examples/first_name_matching.rb
Left side (input)
====================
Mr. Seamus
Sr. Andy
Master BenT
Right side (output)
====================
seamus
andy
ben
Results
====================
Left record (input) Right record (output) Prefix used (if any) Score
Mr. Seamus seamus NULL 0.666666666666667
Sr. Andy andy NULL 0.5
Master BenT ben NULL 0.2
Improving dictionaries
Similarity matching will only get you so far.
TODO: regex usage
Note on Patches/Pull Requests
-
Fork the project.
-
Make your feature addition or bug fix.
-
Add tests for it. This is important so I don’t break it in a future version unintentionally.
-
Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
-
Send me a pull request. Bonus points for topic branches.
Copyright
Copyright © 2010 Seamus Abshere. See LICENSE for details.