Class: Natto::MeCabNode

Inherits:
MeCabStruct
Defined in:
lib/natto/struct.rb

Overview

MeCabNode is a wrapper for the mecab_node_t struct, which holds a parsed node.

Values for the MeCab node attributes may be obtained by using the following Symbols as keys to the layout associative array of FFI::Struct members.

  • :prev - pointer to previous node
  • :next - pointer to next node
  • :enext - pointer to the node which ends at the same position
  • :bnext - pointer to the node which starts at the same position
  • :rpath - pointer to the right path; nil if MECAB_ONE_BEST mode
  • :lpath - pointer to the left path; nil if MECAB_ONE_BEST mode
  • :surface - surface string; length may be obtained with length/rlength members
  • :feature - feature string
  • :id - unique node id
  • :length - length of surface form
  • :rlength - length of the surface form including white space before the morph
  • :rcAttr - right attribute id
  • :lcAttr - left attribute id
  • :posid - part-of-speech id
  • :char_type - character type
  • :stat - node status; 0 (NOR), 1 (UNK), 2 (BOS), 3 (EOS), 4 (EON)
  • :isbest - 1 if this node is the best node
  • :alpha - forward accumulative log summation, only with marginal probability flag
  • :beta - backward accumulative log summation, only with marginal probability flag
  • :prob - marginal probability, only with marginal probability flag
  • :wcost - word cost
  • :cost - best accumulative cost from bos node to this node
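
The relationship between :length and :rlength can be sketched without MeCab installed. The example below is only an illustration: `SketchLengths` is a hypothetical stand-in rather than part of natto, and it assumes, as in MeCab's C API, that both members are byte counts.

```ruby
# Hypothetical stand-in for the two length members; not natto's MeCabNode.
SketchLengths = Struct.new(:length, :rlength)

text = "  卓球"                  # two ASCII spaces followed by one word
word_bytes = "卓球".bytesize     # byte length of the surface form alone

# rlength covers the leading white space plus the surface form.
node = SketchLengths.new(word_bytes, text.bytesize)

# Bytes of white space that precede the morpheme.
leading = node.rlength - node.length

# Slice the surface form out of the text by byte offset and byte length.
surface = text.byteslice(leading, node.length)
```

Working in bytes rather than characters matters for multibyte text, which is why the sketch uses `bytesize` and `byteslice` instead of `length` and `[]`.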

Usage

An instance of MeCabNode is yielded to the block used with MeCab#parse, where the above-mentioned node attributes may be accessed by name.

require 'natto'

nm = Natto::MeCab.new

# Print the surface and accumulated cost of each normal (dictionary) node.
nm.parse('卓球なんて死ぬまでの暇つぶしだよ。') do |n|
  puts "#{n.surface}\t#{n.cost}" if n.is_nor?
end
卓球     2874
なんて   4398
死ぬ     9261
まで     9386
の       10007
暇つぶし 13324
だ       15346
よ       14396
。       10194

While the Symbol for a MeCab node member can also be used to index into the FFI::Struct layout associative array, prefer the attribute accessors. For :surface and :feature, MeCab returns raw bytes, which natto converts into a String using the default external encoding.
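
The byte-to-String conversion described above can be illustrated in plain Ruby. This is a minimal sketch, not natto's code: it simulates the raw bytes with `String#b` and assumes UTF-8, whereas the constructor shown further down uses `Encoding.default_external`.

```ruby
# Simulate what a C library hands back: bytes with no text encoding attached.
raw = "卓球".b                   # String#b returns a binary (ASCII-8BIT) copy

# force_encoding relabels the same bytes; it does not transcode them.
# UTF-8 is an assumption here; natto uses Encoding.default_external.
text = raw.dup.force_encoding(Encoding::UTF_8)

puts raw.length                  # counts bytes
puts text.length                 # counts characters
```

If the dictionary's character set does not match the encoding applied, the resulting String may fail `valid_encoding?`, so the two need to agree.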

Constant Summary

NOR_NODE = 0

Normal MeCab node defined in the dictionary; cf. stat.

UNK_NODE = 1

Unknown MeCab node not defined in the dictionary; cf. stat.

BOS_NODE = 2

Virtual node representing the beginning of the sentence; cf. stat.

EOS_NODE = 3

Virtual node representing the end of the sentence; cf. stat.

EON_NODE = 4

Virtual node representing the end of an N-Best MeCab node list; cf. stat.
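
As a rough illustration of how these stat values might be used, the sketch below maps each code to a readable name and keeps only word-bearing nodes. `SketchStat` and `SketchToken` are hypothetical stand-ins, and the token list is made up for the example; none of it is natto's API or real MeCab output.

```ruby
# Hypothetical lookup table mirroring the stat constants documented above.
module SketchStat
  NAMES = { 0 => :NOR, 1 => :UNK, 2 => :BOS, 3 => :EOS, 4 => :EON }.freeze

  # Translate a numeric stat value into a readable symbol.
  def self.name_for(stat)
    NAMES.fetch(stat) { raise ArgumentError, "unknown stat: #{stat}" }
  end
end

# Stand-in for a parsed node; the values below are invented for illustration.
SketchToken = Struct.new(:surface, :stat)

tokens = [
  SketchToken.new('', 2),        # virtual beginning-of-sentence node
  SketchToken.new('卓球', 0),    # word found in the dictionary
  SketchToken.new('xyzzy', 1),   # word not found in the dictionary
  SketchToken.new('', 3)         # virtual end-of-sentence node
]

# Keep normal and unknown nodes; drop the virtual ones.
words = tokens.select { |t| %i[NOR UNK].include?(SketchStat.name_for(t.stat)) }
              .map(&:surface)
```

With a real MeCabNode, the `is_nor?`, `is_unk?`, `is_bos?`, `is_eos?`, and `is_eon?` predicates documented below make the same comparisons against these constants.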

Methods inherited from MeCabStruct

#method_missing

Constructor Details

#initialize(nptr) ⇒ MeCabNode

Initializes this node instance. Sets the MeCab feature value for this node.

Parameters:

  • nptr (FFI::Pointer)

    pointer to MeCab node



# File 'lib/natto/struct.rb', line 245

def initialize(nptr)
  super(nptr)
  @pointer = nptr

  # MeCab returns raw bytes; tag them with the default external encoding.
  if self[:feature]
    @feature = self[:feature].force_encoding(Encoding.default_external)
  end
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method of Natto::MeCabStruct.
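
The general technique can be sketched as follows. This is not natto's implementation: `SketchStruct` is a hypothetical class that resolves member names against a plain Hash, whereas MeCabStruct reads from an FFI::Struct layout.

```ruby
# Hypothetical sketch of member access through method_missing.
class SketchStruct
  def initialize(members)
    @members = members
  end

  # Resolve otherwise-undefined method names against the member table.
  def method_missing(name, *args)
    return @members[name] if args.empty? && @members.key?(name)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @members.key?(name) || super
  end
end

node = SketchStruct.new({ stat: 0, cost: 2874 })
puts node.cost
```

Calling a name that is not in the member table falls through to `super` and raises NoMethodError, as it would on any other object.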

Instance Attribute Details

#feature ⇒ String

Returns corresponding feature value.

Returns:

  • (String)

    corresponding feature value.



# File 'lib/natto/struct.rb', line 204

def feature
  @feature
end

#pointer ⇒ FFI::Pointer (readonly)

Returns pointer to MeCab node struct.

Returns:

  • (FFI::Pointer)

    pointer to MeCab node struct.



# File 'lib/natto/struct.rb', line 206

def pointer
  @pointer
end

#surface ⇒ String

Returns the surface value of the morpheme.

Returns:

  • (String)

    surface value of the morpheme.



# File 'lib/natto/struct.rb', line 202

def surface
  @surface
end

Instance Method Details

#inspect ⇒ String

Overrides Object#inspect.

Returns:

  • (String)

    encoded object id, underlying FFI pointer, stat, surface, and feature

# File 'lib/natto/struct.rb', line 274

def inspect
  self.to_s
end

#is_bos? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the beginning of the sentence.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 292

def is_bos?
  self.stat == BOS_NODE
end

#is_eon? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the end of the node list.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 304

def is_eon?
  self.stat == EON_NODE
end

#is_eos? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the end of the sentence.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 298

def is_eos?
  self.stat == EOS_NODE 
end

#is_nor? ⇒ Boolean

Returns true if this is a normal MeCab node found in the dictionary.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 280

def is_nor?
  self.stat == NOR_NODE
end

#is_unk? ⇒ Boolean

Returns true if this is an unknown MeCab node not found in the dictionary.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 286

def is_unk?
  self.stat == UNK_NODE
end

#to_s ⇒ String

Returns human-readable details for the MeCab node. Overrides Object#to_s.

  • encoded object id
  • underlying FFI pointer to MeCab Node
  • stat (node type: NOR, UNK, BOS/EOS, EON)
  • surface
  • feature

Returns:

  • (String)

    encoded object id, underlying FFI pointer, stat, surface, and feature



# File 'lib/natto/struct.rb', line 263

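def to_s
  # super.chop drops the closing ">" of the inherited to_s so the
  # node details can be appended before the string is closed again.
  [ super.chop,
    "@pointer=#{@pointer},",
    "stat=#{self[:stat]},",
    "@surface=\"#{self.surface}\",",
    "@feature=\"#{self.feature}\">" ].join(' ')
end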
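
The `super.chop` idiom used here can be tried in isolation. The sketch below uses a hypothetical `SketchNode` class with made-up surface and feature values; it relies only on `Object#to_s`, not on natto or MeCabStruct.

```ruby
# Hypothetical class showing the super.chop idiom; not natto's MeCabNode.
class SketchNode
  def initialize(surface, feature)
    @surface = surface
    @feature = feature
  end

  def to_s
    # Object#to_s returns text of the form "#<SketchNode:0x...>".
    # chop removes the trailing ">" so fields can be appended and re-closed.
    [super.chop,
     "@surface=\"#{@surface}\",",
     "@feature=\"#{@feature}\">"].join(' ')
  end

  # Route inspect through to_s so both print the same summary.
  def inspect
    to_s
  end
end

node = SketchNode.new('卓球', '名詞')
puts node
```

Having `inspect` delegate to `to_s` means `p node` and `puts node` show the same summary in a console session.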