Module: Porter2

Defined in:
lib/porter2stemmer/constants.rb

Overview

Constants for the Porter 2 stemmer

Constant Summary

C =

A non-vowel

"[^aeiouy]"
V =

A vowel: a e i o u y

"[aeiouy]"
CW =

A non-vowel other than w, x, or Y

"[^aeiouywxY]"
Double =

Doubled consonants created when adding a suffix; these are undoubled when stemmed

"(bb|dd|ff|gg|mm|nn|pp|rr|tt)"
Valid_LI =

A valid letter that can come before ‘li’ (or ‘ly’)

"[cdeghkmnrt]"
SHORT_SYLLABLE =

A specification for a short syllable.

A short syllable in a word is either:

  1. a vowel followed by a non-vowel other than w, x or Y and preceded by a non-vowel, or

  2. a vowel at the beginning of the word followed by a non-vowel.

(The original document is silent on whether sequences of two or more non-vowels make a syllable long. Because this specification is only used to find the sequence non-vowel, vowel, non-vowel at the end of a word, the ambiguity has no effect.)

"((#{C}#{V}#{CW})|(^#{V}#{C}))"
STEP_2_MAPS =

Suffix transformations used in porter2_step2. (The ogi and li endings are dealt with in the procedure itself.)

{"tional" => "tion",
		 "enci" => "ence",
"anci" => "ance",
"abli" => "able",
"entli" => "ent",
"ization" => "ize",
"izer" => "ize",
"ational" => "ate",
"ation" => "ate",
"ator" => "ate",
"alism" => "al",
"aliti" => "al",
"alli" => "al",
"fulness" => "ful",
"ousli" => "ous",
"ousness" => "ous",
"iveness" => "ive",
"iviti" => "ive",
"biliti" => "ble",
"bli" => "ble",
"fulli" => "ful",
"lessli" => "less" }
STEP_3_MAPS =

Suffix transformations used in porter2_step3. (The ative ending is dealt with in the procedure itself.)

{"tional" => "tion",
"ational" => "ate",
"alize" => "al",
"icate" => "ic",
"iciti" => "ic",
"ical" => "ic",
"ful" => "",
"ness" => "" }
STEP_4_MAPS =

Suffix transformations used in porter2_step4. (The ion ending is dealt with in the procedure itself.)

{"al" => "",
"ance" => "",
"ence" => "",
"er" => "",
"ic" => "",
"able" => "",
"ible" => "",
"ant" => "",
"ement" => "",
"ment" => "",
"ent" => "",
"ism" => "",
"ate" => "",
"iti" => "",
"ous" => "",
"ive" => "",
"ize" => "" }
SPECIAL_CASES =

Special-case stemmings

{"skis" => "ski",
"skies" => "sky",
 
"dying" => "die",
"lying" => "lie",
"tying" => "tie",
"idly" =>  "idl",
"gently" => "gentl",
"ugly" => "ugli",
"early" => "earli",
"only" => "onli",
"singly" =>"singl",
 
"sky" => "sky",
"news" => "news",
"howe" => "howe",
"atlas" => "atlas",
"cosmos" => "cosmos",
"bias" => "bias",
"andes" => "andes" }
STEP_1A_SPECIAL_CASES =

Special-case words for which processing stops after step 1a.

%w[ inning outing canning herring earring proceed exceed succeed ]
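A sketch of the early exit this list implies. The method stem_sketch and its two callable arguments are hypothetical stand-ins for step 1a and the remaining steps; they are not the real Porter 2 procedures.

```ruby
# Values copied from the constant listing above.
STEP_1A_SPECIAL_CASES = %w[inning outing canning herring earring proceed exceed succeed]

# Hypothetical pipeline: run step 1a, stop if the result is a listed word,
# otherwise continue with the later steps.
def stem_sketch(word, step_1a:, later_steps:)
  word = step_1a.call(word)
  return word if STEP_1A_SPECIAL_CASES.include?(word)
  later_steps.call(word)
end

# Toy stand-ins, for illustration only.
strip_s   = ->(w) { w.sub(/s\z/, "") }
strip_ing = ->(w) { w.sub(/ing\z/, "") }

stem_sketch("herrings", step_1a: strip_s, later_steps: strip_ing)  # => "herring"
stem_sketch("jumping", step_1a: strip_s, later_steps: strip_ing)   # => "jump"
```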