Class: Nimono::Cabocha

Inherits:
FFI::AutoPointer
  • Object
Includes:
CabochaLib, OptionParse
Defined in:
lib/nimono/nimono.rb

Overview

`Cabocha` is a class that provides an interface to the CaboCha library. The command-line arguments supported by CaboCha can be passed to it in almost the same way.

Constant Summary

Constants included from CabochaLib

Nimono::CabochaLib::CABOCHA_PATH

Constants included from OptionParse

OptionParse::SUPPORT_OPTS

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from CabochaLib

cabocha_library, included

Methods included from OptionParse

included

Constructor Details

#initialize(options = {}) ⇒ Cabocha

Initializes CaboCha with the given `options`, which may be given either as a string of CaboCha command-line arguments or as a Ruby-style hash.

Options supported are:

  • :output_format

  • :input_layer

  • :output_layer

  • :ne

  • :parser_model

  • :chunker_model

  • :ne_model

  • :posset

  • :charset

  • :charset_file

  • :rcfile

  • :mecabrc

  • :mecab_dicdir

  • :mecab_userdic

  • :output

CaboCha's short command-line arguments (-f1) or long ones (--output-format=1) may be used in addition to Ruby-style hashes.

For example:

require 'nimono'

nc = Nimono::Cabocha.new(output_format: 1)
# or: nc = Nimono::Cabocha.new('-f1')

=> #<Nimono::Cabocha:0x6364e48d
  @sparse_tostr=#<Proc:0x74d917f5@/home/foo/nimono/lib/nimono/nimono.rb:54 (lambda)>,
  @libpath="/usr/local/lib/libcabocha.so",
  @options={:output_format=>1},
  @tree=#<FFI::Pointer address=0x7f6ecc2e3790>,
  @parser=#<FFI::Pointer address=0x7f6ecc2e3830>>

puts nc.parse('太郎は花子が読んでいる本を次郎に渡した')
太郎    名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は      助詞,係助詞,*,*,*,*,は,ハ,ワ
* 1 2D 0/1 1.700175
花子    名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
* 2 3D 0/2 1.825021
読ん    動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン
で      助詞,接続助詞,*,*,*,*,で,デ,デ
いる    動詞,非自立,*,*,一段,基本形,いる,イル,イル
* 3 5D 0/1 -0.742128
本      名詞,一般,*,*,*,*,本,ホン,ホン
を      助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
* 4 5D 1/2 -0.742128
次      名詞,一般,*,*,*,*,次,ツギ,ツギ
郎      名詞,一般,*,*,*,*,郎,ロウ,ロー
に      助詞,格助詞,一般,*,*,*,に,ニ,ニ
* 5 -1D 0/1 0.000000
渡し    動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS
=> nil
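The `-f1` output above is CaboCha's lattice format: each `* ` line opens a chunk and gives its ID, the ID of the chunk it depends on (suffixed with `D`; `-1D` marks the root), the head/function-word positions, and a score. The lines that follow are tokens written as the surface form and a feature string, and `EOS` ends the sentence. A minimal pure-Ruby sketch for reading that text back into chunks, assuming the token columns are tab-separated; `LatticeChunk` and `read_lattice` are hypothetical names, not part of Nimono:

```ruby
# Hypothetical reader for CaboCha's lattice (-f1) output; not part of Nimono.
LatticeChunk = Struct.new(:id, :link, :head, :func, :score, :tokens)

def read_lattice(text)
  chunks = []
  text.each_line(chomp: true) do |line|
    next if line.empty?
    break if line == "EOS" # stop at the end of the first sentence

    if line.start_with?("* ")
      # "* <id> <link>D <head>/<func> <score>"; "2D".to_i is 2, "-1D".to_i is -1
      _, id, link, head_func, score = line.split(" ")
      head, func = head_func.split("/").map(&:to_i)
      chunks << LatticeChunk.new(id.to_i, link.to_i, head, func, score.to_f, [])
    else
      # "<surface>\t<features>[\t<NE tag>]"; tokens before any header are skipped
      surface, features = line.split("\t")
      chunks.last.tokens << [surface, features] unless chunks.empty?
    end
  end
  chunks
end

# The last chunk of the example output above, with tab-separated token lines.
sample = <<~LATTICE
  * 5 -1D 0/1 0.000000
  渡し\t動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
  た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
  EOS
LATTICE

chunk = read_lattice(sample).first
puts chunk.link                          # prints -1 (root chunk)
puts chunk.tokens.map(&:first).join(" ") # prints 渡し た
```

Inside the application itself, #chunks and #tokens already expose this structure; a reader like this is mainly useful for output saved to disk.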

Parameters:

  • options (Hash, String) (defaults to: {})

    the CaboCha options

Raises:

  • (CabochaError)

    if CaboCha cannot be initialized with the given `options`
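The inspected object above stores its options as the hash `{:output_format=>1}`. Nimono's own `parse_options` and its `SUPPORT_OPTS` table are not shown on this page, so as a rough illustration only, here is one way a command-line string could be normalized into such a hash. `SHORT_FLAGS` and `normalize_options` are hypothetical and map just two CaboCha flags (`-f` for `--output-format`, `-n` for `--ne`):

```ruby
# Hypothetical sketch: turn CaboCha-style arguments into a Ruby-style options
# hash. Not Nimono's actual parse_options; only the attached forms
# "-f1" and "--output-format=1" are handled.
SHORT_FLAGS = { "f" => :output_format, "n" => :ne }.freeze

def normalize_options(arg_string)
  arg_string.split.each_with_object({}) do |arg, opts|
    case arg
    when /\A--([a-z-]+)=(.+)\z/ # long form, e.g. --output-format=1
      key = Regexp.last_match(1).tr("-", "_").to_sym
      value = Regexp.last_match(2)
    when /\A-([a-zA-Z])(.+)\z/ # short form, e.g. -f1
      key = SHORT_FLAGS.fetch(Regexp.last_match(1)) { raise ArgumentError, "unsupported flag: #{arg}" }
      value = Regexp.last_match(2)
    else
      raise ArgumentError, "cannot parse argument: #{arg}"
    end
    # Purely numeric values become Integers; anything else stays a String.
    opts[key] = value.match?(/\A\d+\z/) ? value.to_i : value
  end
end
```

With this sketch, `normalize_options("-f1")` and `normalize_options("--output-format=1")` both return a hash with `output_format: 1`, the same shape as the hash form.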



# File 'lib/nimono/nimono.rb', line 93

def initialize(options={})
  @options = self.class.parse_options(options)
  opt_str = self.class.build_options_str(@options)
  @libpath = self.class.cabocha_library

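  @parser = self.class.cabocha_new2(opt_str)
  # Check for a NULL parser before calling super, so that FFI::AutoPointer
  # never registers the release finalizer for a NULL pointer.
  if @parser.null?
    raise CabochaError.new("Could not initialize CaboCha with options: '#{opt_str}'")
  end
  super @parser
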
  @tree = self.class.cabocha_sparse_totree(@parser, "")

  if @options[:output_layer]
    self.class.cabocha_tree_set_output_layer(@tree, @options[:output_layer])
  end

  @sparse_tostr = ->(text) {
    begin
      self.class.cabocha_sparse_tostr(@parser, text).force_encoding(Encoding.default_external)
    rescue
      raise CabochaError.new 'Parse Error'
    end
  }
end
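`initialize` turns `@options` back into a single argument string with `build_options_str` and passes that string to `cabocha_new2`. The body of `build_options_str` is not shown here; a minimal sketch of the idea, assuming every key maps to a long CaboCha option by replacing underscores with hyphens (`to_cabocha_args` is a hypothetical name):

```ruby
# Hypothetical sketch of building a CaboCha argument string from an options
# hash; Nimono's real build_options_str may differ. Keys with nil values are
# skipped, and values containing spaces are not quoted.
def to_cabocha_args(options)
  options.reject { |_key, value| value.nil? }
         .map { |key, value| "--#{key.to_s.tr('_', '-')}=#{value}" }
         .join(" ")
end
```

For example, `to_cabocha_args(output_format: 1)` returns `"--output-format=1"`, the long form that CaboCha accepts on its command line.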

Instance Attribute Details

#chunks ⇒ Array (readonly)

Returns the Array of chunks built by the most recent call to #parse (nil until #parse has been called).

Returns:

  • (Array)

    Array of chunks



# File 'lib/nimono/nimono.rb', line 20

def chunks
  @chunks
end

#libpath ⇒ String (readonly)

Returns the absolute file path to the CaboCha library.

Returns:

  • (String)

    absolute file path to the CaboCha library



# File 'lib/nimono/nimono.rb', line 18

def libpath
  @libpath
end

#options ⇒ Hash (readonly)

Returns the CaboCha options as key-value pairs.

Returns:

  • (Hash)

    CaboCha options as key-value pairs



# File 'lib/nimono/nimono.rb', line 16

def options
  @options
end

#tokens ⇒ Array (readonly)

Returns the Array of Nimono::Token objects built by the most recent call to #parse (nil until #parse has been called).

Returns:

  • (Array)

    Array of Nimono::Token



# File 'lib/nimono/nimono.rb', line 23

def tokens
  @tokens
end
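Each token line in the example output pairs a surface form with a comma-separated feature string. Assuming the IPADIC dictionary (other MeCab dictionaries use different layouts), the nine fields are the part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. A small sketch that labels them; `IPADIC_FIELDS` and `label_features` are hypothetical helpers, not methods of Nimono::Token:

```ruby
# Assumed IPADIC feature layout; other dictionaries order fields differently.
IPADIC_FIELDS = %i[pos pos1 pos2 pos3 conj_type conj_form base reading pronunciation].freeze

# Hypothetical helper: label a feature string such as
# "名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー". Assumes no field contains a comma.
def label_features(feature_string)
  values = feature_string.split(",")
  # zip pads missing trailing fields (e.g. unknown words without a reading)
  # with nil; "*" marks an empty field, so it is mapped to nil as well.
  IPADIC_FIELDS.zip(values).map { |name, value| [name, value == "*" ? nil : value] }.to_h
end
```

For the first token of the example, `label_features` gives `pos` "名詞", `pos2` "人名", `base` "太郎", and `reading` "タロウ".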

Class Method Details

.release(ptr) ⇒ Object



# File 'lib/nimono/nimono.rb', line 25

def self.release(ptr)
  # Inside a class method, self is already Nimono::Cabocha (self.class would be Class).
  cabocha_destroy(ptr)
end
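FFI::AutoPointer calls the class's `release` method with the raw pointer when a Cabocha object is garbage-collected. Inside a method defined with `def self.release`, `self` already refers to the class, so a class-level function must be called directly; going through `self.class` reaches `Class`, which has no such method. A pure-Ruby sketch with a stand-in class (`FakeLib` and its `destroy` method are hypothetical; in Nimono the class-level functions are presumably attached through FFI):

```ruby
# Stand-in for a class with class-level helper methods, like Nimono::Cabocha.
class FakeLib
  def self.destroy(ptr)
    "destroyed #{ptr}"
  end

  def self.release(ptr)
    destroy(ptr) # self is FakeLib here, so this call resolves
  end

  def self.broken_release(ptr)
    self.class.destroy(ptr) # self.class is Class, which has no destroy method
  end
end

puts FakeLib.release(42) # prints "destroyed 42"
```

Calling `FakeLib.broken_release(42)` raises NoMethodError, which is what a finalizer written with `self.class` would hit at garbage-collection time.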

Instance Method Details

#parse(text) ⇒ String

Parses the given `text` and returns the CaboCha output as a string. As a side effect, it also populates #chunks and #tokens.

Parameters:

  • text (String)

    the Japanese text to parse

Returns:

  • (String)

    parsing result from CaboCha

Raises:

  • (CabochaError)

    if CaboCha cannot parse the given `text`, or if `text` is nil



# File 'lib/nimono/nimono.rb', line 125

def parse(text)
  if text.nil?
    raise CabochaError.new 'Text to parse cannot be nil'
  else
    @result = @sparse_tostr.call(text)
    @tree = self.class.cabocha_sparse_totree(@parser, text)

    @tokens = []
    self.class.cabocha_tree_token_size(@tree).times do |i|
      @tokens << Nimono::Token.new(self.class.cabocha_tree_token(@tree, i))
    end
    @tokens.freeze

    @chunks = []
    @tokens.each {|token| @chunks << token.chunk unless token.chunk.nil?}
    @chunks.each_with_index do |chunk, index|
      tokens = []
      chunk.token_size.times do |i|
        tokens << @tokens[chunk.token_pos + i]
      end
      chunk.instance_variable_set(:@tokens, tokens)
      chunk.instance_variable_set(:@id, index)
    end
    @chunks.freeze

    self.to_s
  end
end
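`parse` builds `@chunks` from the token list. As the `unless token.chunk.nil?` check suggests, only the first token of each chunk carries a chunk reference; each chunk then collects `token_size` consecutive tokens starting at index `token_pos` and is numbered by its position. The same grouping with plain Ruby structs, where `SketchToken`, `SketchChunk`, and `group_tokens` are hypothetical stand-ins for Nimono's classes:

```ruby
SketchChunk = Struct.new(:token_pos, :token_size, :tokens, :id)
SketchToken = Struct.new(:surface, :chunk)

# Mirror of the grouping in Nimono::Cabocha#parse, using plain structs.
def group_tokens(tokens)
  # Only the first token of each chunk has a non-nil chunk reference.
  chunks = tokens.map(&:chunk).compact
  chunks.each_with_index do |chunk, index|
    # Same slice as the token_size.times loop in #parse.
    chunk.tokens = tokens[chunk.token_pos, chunk.token_size]
    chunk.id = index
  end
  chunks
end
```

Using `Array#[start, length]` replaces the explicit loop; unlike #parse, this sketch does not freeze the resulting arrays.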

#to_s ⇒ String

Returns the CaboCha output from the most recent call to #parse (nil until #parse has been called).

Returns:

  • (String)

    parsing result



# File 'lib/nimono/nimono.rb', line 156

def to_s
  @result
end