Class: Amatch::PairDistance
- Inherits: Object
  - Object
  - Amatch::PairDistance
- Defined in: ext/amatch.c
Overview
The pair distance between two strings is based on the number of adjacent character pairs that are contained in both strings. The similarity metric of two strings s1 and s2 is
2 * |intersection(pairs(s1), pairs(s2))| / (|pairs(s1)| + |pairs(s2)|)
If it is 1.0, the two strings are an exact match; the further the value falls below 1.0, the more dissimilar they are. The advantage of considering adjacent characters is that the metric takes into account not only the characters themselves but also their ordering in the original strings.
This metric is well suited to finding similarities in natural-language text. It is explained in more detail in Simon White’s article “How to Strike a Match”, located at www.catalysoft.com/articles/StrikeAMatch.html. It is also a special case of the method described in “Using q-grams in a DBMS for Approximate String Processing” (citeseer.lcs.mit.edu/gravano01using.html).
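To make the metric concrete, here is a minimal pure-Ruby sketch of the calculation. It is not the C implementation in ext/amatch.c; the helper names `pairs` and `pair_similarity` are hypothetical, and the sketch assumes that pairs are collected per whitespace-separated token, that comparison is case-sensitive, and that each shared pair is counted at most once per occurrence.

```ruby
# Adjacent character pairs of a string, e.g. "abc" -> ["ab", "bc"].
# Assumption: pairs are taken per whitespace-separated token, so no pair
# spans a word boundary.
def pairs(string)
  string.split(/\s+/).flat_map do |token|
    token.each_char.each_cons(2).map(&:join)
  end
end

# 2 * |shared pairs| / (|pairs(s1)| + |pairs(s2)|)
def pair_similarity(s1, s2)
  p1 = pairs(s1)
  p2 = pairs(s2)
  total = p1.size + p2.size
  return 0.0 if total.zero? # assumption: no pairs at all counts as no match

  # Count shared pairs as a multiset: every pair matched in p1 consumes
  # one occurrence from the remaining pairs of s2.
  remaining = p2.dup
  shared = p1.count do |pair|
    index = remaining.index(pair)
    index && remaining.delete_at(index)
  end

  2.0 * shared / total
end
```

For "france" and "french", the pairs are fr, ra, an, nc, ce and fr, re, en, nc, ch. Two pairs (fr, nc) are shared out of ten in total, so the sketch yields 2 * 2 / 10 = 0.4.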
Instance Method Summary
- #initialize ⇒ Object (constructor)
- #match ⇒ Object (also: #similar)
- #pattern ⇒ Object
  call-seq: pattern -> pattern string
- #pattern= ⇒ Object
  call-seq: pattern=(pattern)
Constructor Details
#initialize ⇒ Object
Instance Method Details
#match ⇒ Object Also known as: similar
#pattern ⇒ Object
call-seq: pattern -> pattern string
Returns the current pattern string of this instance.
#pattern= ⇒ Object
call-seq: pattern=(pattern)
Sets the current pattern string of this instance to pattern.
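The following sketch shows how the documented methods fit together. `PairDistanceSketch` is a hypothetical pure-Ruby stand-in, not the real class from the C extension. It assumes that the constructor takes the pattern string and that #match takes a single string and returns a Float; neither signature is stated on this page.

```ruby
# Hypothetical stand-in that mirrors the documented method names only.
class PairDistanceSketch
  attr_accessor :pattern # provides #pattern and #pattern=

  # Assumption: the constructor receives the pattern string.
  def initialize(pattern)
    @pattern = pattern
  end

  # Assumption: compares the stored pattern against one string and
  # returns a value between 0.0 (no shared pairs) and 1.0 (exact match).
  def match(string)
    mine = pairs_of(pattern)
    theirs = pairs_of(string)
    total = mine.size + theirs.size
    return 0.0 if total.zero?

    # Each shared pair consumes one occurrence from the other string.
    shared = mine.count do |pair|
      index = theirs.index(pair)
      index && theirs.delete_at(index)
    end
    2.0 * shared / total
  end
  alias similar match

  private

  # Adjacent character pairs per whitespace-separated token.
  def pairs_of(text)
    text.split(/\s+/).flat_map { |t| t.each_char.each_cons(2).map(&:join) }
  end
end

matcher = PairDistanceSketch.new("healed")
matcher.match("healed")   # compares against the initial pattern
matcher.pattern = "france" # reuse the same instance with a new pattern
matcher.similar("french") # #similar is an alias of #match
```

Keeping the pattern on the instance lets one object be reused for many comparisons, with #pattern= swapping the pattern between calls.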