Class: Tokipona::Tokenizer

Inherits:
Object
Defined in:
lib/tokipona/tokenizer.rb

Overview

Splits text into tokens: words and individual punctuation marks.

Examples:

Tokipona::Tokenizer.tokenize("mi pona anu seme?")
# => ["mi", "pona", "anu", "seme", "?"]

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(text) ⇒ Tokenizer

Returns a new instance of Tokenizer.



# File 'lib/tokipona/tokenizer.rb', line 15

def initialize(text)
  @text = text
end

Class Method Details

.tokenize(text) ⇒ Array<String>

Parameters:

  • text (String)

Returns:

  • (Array<String>)


# File 'lib/tokipona/tokenizer.rb', line 11

def self.tokenize(text)
  new(text).tokenize
end

Instance Method Details

#tokenize ⇒ Object



# File 'lib/tokipona/tokenizer.rb', line 19

def tokenize
  @text.scan(/\w+|[^\s]/)
end
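
The pattern in #tokenize matches either a run of word characters or any single non-whitespace character, so consecutive punctuation marks come out as separate tokens and whitespace is dropped. The sketch below reproduces that behavior without loading the gem; `TOKEN_PATTERN` and `sketch_tokenize` are hypothetical names used only for illustration and are not part of Tokipona::Tokenizer.

```ruby
# Illustration only: TOKEN_PATTERN and sketch_tokenize are assumed names,
# not part of the library. The regular expression is copied from #tokenize.
TOKEN_PATTERN = /\w+|[^\s]/

# Returns every match of the pattern, in order of appearance.
def sketch_tokenize(text)
  text.scan(TOKEN_PATTERN)
end

sketch_tokenize("mi pona anu seme?")  # => ["mi", "pona", "anu", "seme", "?"]
sketch_tokenize("toki! sina pona.")   # => ["toki", "!", "sina", "pona", "."]
sketch_tokenize("seme?!")             # => ["seme", "?", "!"]
sketch_tokenize("   ")                # => []
```

Because each punctuation character is matched on its own, "?!" yields two tokens rather than one, and whitespace-only input yields an empty array.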