Class: Ferret::Analysis::LetterTokenizer
- Inherits: Object
- Defined in: ext/r_analysis.c
Overview
Summary
A LetterTokenizer is a tokenizer that divides text at non-letters. That is to say, it defines tokens as maximal strings of adjacent letters, as defined by the regular expression /[[:alpha:]]+/ where [:alpha:] matches every character that counts as a letter in your current locale.
Example
"Dave's résumé, at http://www.davebalmain.com/ 1234"
=> ["Dave", "s", "résumé", "at", "http", "www", "davebalmain", "com"]
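The splitting rule can be approximated in plain Ruby, without loading Ferret. This is only a sketch of the rule described in the summary, not the C implementation in ext/r_analysis.c. The helper name letter_tokens is hypothetical. Note one assumption: for UTF-8 strings, Ruby's own [[:alpha:]] class matches Unicode letters rather than consulting the process locale, so results may differ from Ferret's under a non-UTF-8 locale.

```ruby
# Hypothetical helper, not part of Ferret: returns every maximal run of
# adjacent letters and discards everything else (punctuation, digits, spaces).
def letter_tokens(text)
  text.scan(/[[:alpha:]]+/)
end

p letter_tokens("Dave's résumé, at http://www.davebalmain.com/ 1234")
```

The apostrophe, the URL punctuation and the digits all act as separators, which is why "Dave's" becomes two tokens and "1234" produces none.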