# Module: Edits::JaroWinkler

Defined in:
lib/edits/jaro_winkler.rb

## Overview

Implements the Jaro-Winkler similarity algorithm.

## Constant Summary

`WINKLER_PREFIX_WEIGHT = 0.1`

Prefix scaling factor for the Jaro-Winkler metric. Default is 0.1. Should not exceed 0.25, or the metric will leave the range 0..1.

`WINKLER_THRESHOLD = 0.7`

Threshold for boosting Jaro with the Winkler prefix multiplier. Default is 0.7.
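The 0.25 bound on the prefix weight follows from the common prefix being capped at 4 characters: at that cap the multiplier `l * p` reaches 1 when `p` is 0.25, so the boosted score tops out at exactly 1.0. The sketch below illustrates this with a hypothetical helper, `winkler_adjust`, which is not part of the library and takes an already computed Jaro similarity as input.

```ruby
# Hypothetical helper (not part of Edits::JaroWinkler): applies the
# Winkler prefix adjustment to a precomputed Jaro similarity.
def winkler_adjust(sj, prefix_length, weight)
  sj + (prefix_length * weight * (1 - sj))
end

sj = 0.75        # illustrative Jaro similarity, above the 0.7 threshold
max_prefix = 4   # the common prefix length is capped at 4

# 4 * 0.25 = 1, so the adjusted score lands exactly on the upper bound.
at_bound = winkler_adjust(sj, max_prefix, 0.25)

# 4 * 0.3 > 1, so the adjusted score overshoots the 0..1 range.
over_bound = winkler_adjust(sj, max_prefix, 0.3)

puts at_bound         # 1.0
puts over_bound > 1.0 # true
```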

## Class Method Summary

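• .distance(seq1, seq2, threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT) ⇒ Float

Calculate Jaro-Winkler distance.

• .similarity(seq1, seq2, threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT) ⇒ Float

Calculate Jaro-Winkler similarity of given strings.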

## Class Method Details

### .distance(seq1, seq2, threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT) ⇒ Float

Note:

Not a true distance metric: it fails to satisfy the triangle inequality.

Calculate Jaro-Winkler distance, defined as 1.0 minus the Jaro-Winkler similarity.

Examples:

```ruby
Edits::JaroWinkler.distance("information", "informant")
# => 0.05858585858585863
```

Returns:

• (Float)

distance, from 0.0 (identical) to 1.0 (distant)

```ruby
# File 'lib/edits/jaro_winkler.rb', line 63
def self.distance(
  seq1, seq2,
  threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT
)
  1.0 - similarity(seq1, seq2, threshold: threshold, weight: weight)
end
```

### .similarity(seq1, seq2, threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT) ⇒ Float

Calculate Jaro-Winkler similarity of given strings.

Adds weight to the Jaro similarity according to the length of the common prefix, up to 4 characters, where one exists. The additional weighting is only applied when the Jaro similarity exceeds the threshold.

`Sw = Sj + (l * p * (1 - Sj))`

Where `Sw` is the Jaro-Winkler similarity, `Sj` is the Jaro similarity, `l` is the common prefix length (at most 4), and `p` is the prefix weight.
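As a worked illustration of the formula, the sketch below applies the adjustment to "information" and "informant". It is a minimal standalone sketch, not the library's code: `common_prefix_length` and `winkler_similarity` are hypothetical helpers, and the Jaro similarity is entered by hand on the assumption that the two strings have 9 matching characters and 1 transposition.

```ruby
# Hypothetical helper (not part of the library): length of the common
# prefix of two sequences, capped at `max`.
def common_prefix_length(seq1, seq2, max = 4)
  bound = [seq1.length, seq2.length, max].min
  l = 0
  l += 1 while l < bound && seq1[l] == seq2[l]
  l
end

# Hypothetical helper: boosts a precomputed Jaro similarity `sj`
# only when it exceeds the threshold.
def winkler_similarity(sj, seq1, seq2, threshold: 0.7, weight: 0.1)
  return sj unless sj > threshold

  l = common_prefix_length(seq1, seq2)
  sj + (l * weight * (1 - sj))
end

# Jaro similarity for "information" (11 chars) and "informant" (9 chars),
# assuming 9 matches and 1 transposition.
sj = (9 / 11.0 + 9 / 9.0 + (9 - 1) / 9.0) / 3

# Common prefix "info" gives l = 4, so Sw = Sj + 4 * 0.1 * (1 - Sj).
sw = winkler_similarity(sj, "information", "informant")

# A score at or below the threshold is returned unchanged.
unchanged = winkler_similarity(0.6, "abcd", "abce")

puts sj.round(4)  # 0.9024
puts sw.round(4)  # 0.9414
puts unchanged    # 0.6
```

The rounded result agrees with the documented value for this pair of strings.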

Examples:

```ruby
Edits::JaroWinkler.similarity("information", "informant")
# => 0.9414141414141414
```

Parameters:

• seq1 (String, Array)
• seq2 (String, Array)
• threshold (Float) (defaults to: WINKLER_THRESHOLD)

threshold for applying Winkler prefix weighting

• weight (Float) (defaults to: WINKLER_PREFIX_WEIGHT)

weighting for common prefix, should not exceed 0.25

Returns:

• (Float)

similarity, from 0.0 (none) to 1.0 (identical)

```ruby
# File 'lib/edits/jaro_winkler.rb', line 35
def self.similarity(
  seq1, seq2,
  threshold: WINKLER_THRESHOLD, weight: WINKLER_PREFIX_WEIGHT
)
  sj = Jaro.similarity(seq1, seq2)
  return sj unless sj > threshold

  # size of common prefix, max 4
  max_bound = seq1.length > seq2.length ? seq2.length : seq1.length
  max_bound = 4 if max_bound > 4

  l = 0
  l += 1 until seq1[l] != seq2[l] || l >= max_bound

  l < 1 ? sj : sj + (l * weight * (1 - sj))
end
```