Module: RMMSeg::Dictionary

Defined in:
lib/rmmseg/dictionary.rb

Class Attribute Summary

Class Method Summary

Class Attribute Details

.dictionaries ⇒ Object

An array of dictionaries used by RMMSeg. Each entry is of the following form:

[type, path]

where type can be either :chars or :words, and path is the path to the dictionary file.

A :chars dictionary is a collection of lines of the following form:

freq char

where freq is the frequency of the character, a number less than 65535, and char is the character itself. The two fields are separated by exactly one space.

The format of a :words dictionary is similar:

length word

except the first number is not the frequency, but the number of characters (not number of bytes) in the word.

There’s a script (convert.rb) in the tools directory that can be used to convert and normalize dictionaries.
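The two line formats described above can be illustrated with a short sketch. The sample entries and the helper names parse_chars_line and parse_words_line are hypothetical and are not part of RMMSeg; the sketch only assumes the "number, one space, text" layout given in this documentation.

```ruby
# Hypothetical sample data in the documented formats (not RMMSeg's real files).
CHARS_SAMPLE = "1024 的\n512 中\n"      # :chars lines: "freq char"
WORDS_SAMPLE = "2 中国\n3 图书馆\n"      # :words lines: "length word"

# Parse one :chars line into [frequency, character].
def parse_chars_line(line)
  freq, char = line.chomp.split(" ", 2)
  [Integer(freq), char]
end

# Parse one :words line into [length_in_characters, word].
def parse_words_line(line)
  length, word = line.chomp.split(" ", 2)
  [Integer(length), word]
end

CHARS = CHARS_SAMPLE.each_line.map { |line| parse_chars_line(line) }
WORDS = WORDS_SAMPLE.each_line.map { |line| parse_words_line(line) }

# The first field of a :words line counts characters, not bytes.
# For UTF-8 strings, String#length counts characters and
# String#bytesize counts bytes.
WORDS.each do |length, word|
  raise "length mismatch for #{word}" unless word.length == length
end
```

For a two-character word such as the sample entry 中国, the length field is 2 even though the word occupies 6 bytes in UTF-8.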



# File 'lib/rmmseg/dictionary.rb', line 37

def dictionaries
  @dictionaries
end

Class Method Details

.add_dictionary(path, type) ⇒ Object

Add a user-defined dictionary. type can be :chars or :words. See the documentation of dictionaries for the file formats.



# File 'lib/rmmseg/dictionary.rb', line 41

def add_dictionary(path, type)
  @dictionaries << [type, path]
end
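As a usage sketch: the module DictionarySketch below is a stand-in that copies only the attribute and method shown in this listing, so that the example runs without the library. It is not the real RMMSeg::Dictionary, and the file names are hypothetical.

```ruby
# Stand-in that mirrors the documented reader and add_dictionary.
# It is an assumption for illustration, not the real RMMSeg::Dictionary.
module DictionarySketch
  @dictionaries = []

  class << self
    attr_reader :dictionaries

    def add_dictionary(path, type)
      @dictionaries << [type, path]
    end
  end
end

# Hypothetical user dictionary files.
DictionarySketch.add_dictionary("my_words.dic", :words)
DictionarySketch.add_dictionary("my_chars.dic", :chars)

# Entries are stored as [type, path], the reverse of the argument order.
DictionarySketch.dictionaries.each do |type, path|
  puts "#{type}: #{path}"
end
```

Note that add_dictionary takes (path, type) while each stored entry has the form [type, path], so code that reads dictionaries back should destructure in that order.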

.load_dictionaries ⇒ Object

Load dictionaries. Call this method after setting up the paths of the dictionaries to load and before any Algorithm object is created.



# File 'lib/rmmseg/dictionary.rb', line 48

def load_dictionaries()
  @dictionaries.each do |type, path|
    if type == :chars
      load_chars(path)
    elsif type == :words
      load_words(path)
    end
  end
end
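The dispatch above can be exercised with a small sketch. LoaderSketch is a stand-in, not the real module: its load_chars and load_words only record their arguments, whereas the real loaders (not shown in this listing) read the files. The entry with type :other and all file names are hypothetical.

```ruby
# Stand-in reproducing the dispatch logic of load_dictionaries.
# load_chars and load_words here are placeholders that record calls.
module LoaderSketch
  @dictionaries = [
    [:chars, "chars.dic"],
    [:words, "words.dic"],
    [:other, "ignored.dic"] # hypothetical entry with an unrecognized type
  ]
  @loaded = []

  class << self
    attr_reader :loaded

    def load_chars(path)
      @loaded << [:chars, path]
    end

    def load_words(path)
      @loaded << [:words, path]
    end

    def load_dictionaries
      @dictionaries.each do |type, path|
        if type == :chars
          load_chars(path)
        elsif type == :words
          load_words(path)
        end
      end
    end
  end
end

LoaderSketch.load_dictionaries
p LoaderSketch.loaded
```

Because the if/elsif chain has no else branch, an entry whose type is neither :chars nor :words is skipped without an error, so a misspelled type passed to add_dictionary would go unnoticed at load time.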