Class: Amatch::PairDistance

Inherits:
Object
Defined in:
ext/amatch_ext.c

Overview

The pair distance between two strings is based on the number of adjacent character pairs that are contained in both strings. The similarity metric of two strings s1 and s2 is

2*|intersection(pairs(s1), pairs(s2))| / (|pairs(s1)| + |pairs(s2)|)

A value of 1.0 means the two strings are an exact match; the further the value falls below 1.0, the more dissimilar they are. The advantage of considering adjacent characters is that the metric takes into account not only the characters themselves but also their ordering in the original strings.
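The metric can be sketched in plain Ruby as follows. This is only an illustration of the ratio described above, not the C implementation in ext/amatch_ext.c: the helper names pairs and pair_similarity are hypothetical, the sketch does no tokenization, it counts repeated pairs individually (as in Simon White's article), and it assumes a result of 0.0 when neither string contains a pair.

```ruby
# Illustrative sketch only; names and edge-case handling are assumptions.

# Returns the adjacent character pairs of a string, e.g. "abc" -> ["ab", "bc"].
def pairs(string)
  string.chars.each_cons(2).map(&:join)
end

# Returns twice the number of shared pairs divided by the total number
# of pairs in both strings, as a Float between 0.0 and 1.0.
def pair_similarity(s1, s2)
  p1 = pairs(s1)
  p2 = pairs(s2)
  total = p1.size + p2.size
  # Assumption: strings too short to contain any pair score 0.0.
  return 0.0 if total.zero?

  remaining = p2.dup
  shared = 0
  p1.each do |pair|
    index = remaining.index(pair)
    next unless index

    # Each pair of s2 may be matched only once, so duplicates count separately.
    remaining.delete_at(index)
    shared += 1
  end
  2.0 * shared / total
end
```

For example, pair_similarity("FRANCE", "FRENCH") evaluates to 0.4, because the two strings share the pairs FR and NC out of ten pairs in total, while two identical strings of at least two characters score 1.0.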

This metric is well suited to finding similarities in natural-language text. It is explained in more detail in Simon White’s article “How to Strike a Match”, located at this URL: www.catalysoft.com/articles/StrikeAMatch.html It is also very similar to (in fact a special case of) the method described under citeseer.lcs.mit.edu/gravano01using.html in “Using q-grams in a DBMS for Approximate String Processing.”

Instance Method Summary

Constructor Details

#initialize ⇒ Object

Instance Method Details

#match ⇒ Object Also known as: similar

#pattern ⇒ Object

call-seq: pattern -> pattern string

Returns the current pattern string of this instance.

#pattern= ⇒ Object

call-seq: pattern=(pattern)

Sets the current pattern string of this instance to pattern.