Class: RMMSeg::SimpleAlgorithm
- Inherits:
- Object
- Includes:
- Algorithm
- Defined in:
- lib/rmmseg/simple_algorithm.rb
Constant Summary
Constants included from Algorithm
Instance Method Summary
- #get_cjk_word ⇒ Object
  Get the longest dictionary word that starts at the current position.
- #initialize(text, token = Token) ⇒ SimpleAlgorithm (constructor)
  Create a new SimpleAlgorithm.
Methods included from Algorithm
#basic_latin?, #find_match_words, #get_basic_latin_word, #next_token, #nonword_char?, #segment
Constructor Details
#initialize(text, token = Token) ⇒ SimpleAlgorithm
Create a new SimpleAlgorithm. The only rule used by this algorithm is MMRule.

    # File 'lib/rmmseg/simple_algorithm.rb', line 10
    def initialize(text, token=Token)
      super
    end
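Because the constructor only calls `super`, both arguments are handled by the initializer that `Algorithm` mixes in. The sketch below illustrates that Ruby pattern with hypothetical stand-ins (`SketchAlgorithm`, `SketchToken`, `SketchSimple`, `UpcaseToken`). It assumes the mixed-in initializer stores the token class for later `@token.new(...)` calls, as `get_cjk_word` suggests; it is not RMMSeg's actual code.

```ruby
# Hypothetical token: text plus start and stop offsets.
SketchToken = Struct.new(:text, :start, :stop)

# Hypothetical stand-in for the Algorithm mixin (assumption, not RMMSeg code).
module SketchAlgorithm
  def initialize(text, token = SketchToken)
    @chars = text.chars
    @token = token   # class used to build tokens later
    @index = 0
  end

  # Build one token per character, using whatever class was passed in.
  def next_token
    return nil if @index >= @chars.length
    tok = @token.new(@chars[@index], @index, @index + 1)
    @index += 1
    tok
  end
end

class SketchSimple
  include SketchAlgorithm

  # A bare `super` forwards text and token unchanged to the mixin.
  def initialize(text, token = SketchToken)
    super
  end
end

# A caller-supplied token class with the same three-argument constructor.
class UpcaseToken < SketchToken
  def text
    super.upcase
  end
end

first = SketchSimple.new("ab", UpcaseToken).next_token
puts first.text
```

Passing the token class as an argument lets a caller change what kind of object the segmenter returns without touching the segmentation logic.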
Instance Method Details
#get_cjk_word ⇒ Object
Get the most appropriate CJK word: the longest dictionary word that starts at the current position, limited to Config.max_word_length characters. If no dictionary word matches, a single character is returned. The result is wrapped in a token carrying its start and end byte offsets.

    # File 'lib/rmmseg/simple_algorithm.rb', line 15
    def get_cjk_word
      dic = Dictionary.instance
      i = Config.max_word_length
      if i + @index > @chars.length
        i = @chars.length - @index
      end
      chars = @chars[@index, i]
      word = chars.join

      while i > 1 && !dic.has_word?(word)
        i -= 1
        word.slice!(-chars[i].size,chars[i].size) # truncate last char
      end

      token = @token.new(word, @byte_index, @byte_index+word.size)

      @index += i
      @byte_index += word.size

      return token
    end
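The loop in this method is a forward maximum-matching scan: take up to the maximum word length of characters, then drop trailing characters until the dictionary recognizes the remainder or only one character is left. A minimal self-contained sketch of that idea follows. `WORDS`, `MAX_WORD_LENGTH`, `longest_match`, and `simple_segment` are hypothetical names, a plain `Set` stands in for `Dictionary.instance`, and byte offsets are omitted.

```ruby
require 'set'

# Hypothetical stand-ins for Dictionary.instance and Config.max_word_length.
WORDS = Set.new(%w[研究 研究生 生命 起源])
MAX_WORD_LENGTH = 4

# Return the longest dictionary word in chars starting at index,
# falling back to a single character when nothing matches.
def longest_match(chars, index)
  len = [MAX_WORD_LENGTH, chars.length - index].min
  len -= 1 while len > 1 && !WORDS.include?(chars[index, len].join)
  chars[index, len].join
end

# Segment a whole string by repeatedly taking the longest match.
def simple_segment(text)
  chars = text.chars
  words = []
  index = 0
  while index < chars.length
    word = longest_match(chars, index)
    words << word
    index += word.length
  end
  words
end

p simple_segment("研究生命起源")
```

With this toy dictionary, `simple_segment("研究生命起源")` returns `["研究生", "命", "起源"]`. The greedy match takes 研究生 first and leaves 命 stranded, even though 研究 / 生命 / 起源 is also possible. That is the trade-off of using a single maximum-matching rule.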