chawan
A cup for chasen: an easy-to-use Ruby interface for extracting Japanese words with a morphological analyzer (MeCab or ChaSen).
Methods
* Chawan.parse(text)
Parses the given text with the default analyzer, :mecab.
* Chawan.analyzer(name) (same as Chawan[name] and Chawan.name)
Selects an analyzer by name; the result responds to parse, e.g. Chawan.analyzer(:chasen).parse(text).
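The three spellings of analyzer selection can all be wired to one registry. The sketch below is a hypothetical illustration, not Chawan's source: the module name ChawanSketch, the ANALYZERS hash and StubAnalyzer (which only echoes its input instead of calling mecab or chasen) are assumptions made for the example.

```ruby
# Hypothetical sketch: how Chawan.analyzer(name), Chawan[name] and
# Chawan.name could resolve to the same analyzer object.
# ChawanSketch, ANALYZERS and StubAnalyzer are illustrative names only.
module ChawanSketch
  # Stand-in analyzer; a real one would run the mecab or chasen command
  class StubAnalyzer
    attr_reader :name

    def initialize(name)
      @name = name
    end

    def parse(text)
      "#{name}: #{text}"
    end
  end

  ANALYZERS = {
    mecab:  StubAnalyzer.new(:mecab),
    chasen: StubAnalyzer.new(:chasen)
  }.freeze

  DEFAULT = :mecab

  # Chawan.analyzer(:mecab); unknown names fail instead of falling back
  def self.analyzer(name = DEFAULT)
    ANALYZERS.fetch(name.to_sym) { raise ArgumentError, "unknown analyzer: #{name}" }
  end

  # Chawan[:mecab]
  def self.[](name)
    analyzer(name)
  end

  # Chawan.parse(text) delegates to the default analyzer
  def self.parse(text)
    analyzer.parse(text)
  end

  # Chawan.mecab, Chawan.chasen
  def self.method_missing(name, *args, &block)
    ANALYZERS.key?(name) ? analyzer(name) : super
  end

  def self.respond_to_missing?(name, include_private = false)
    ANALYZERS.key?(name) || super
  end
end

ChawanSketch[:chasen].equal?(ChawanSketch.chasen)  # => true
```

With a single registry, every spelling returns the same object, and a misspelled analyzer name raises an error rather than silently using the default.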
Classes
* Chawan::Nodes (Chawan.parse returns a Chawan::Nodes)
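#noun : selects nodes whose category is 名詞 (noun)
#verb : selects verbs (動詞), including auxiliary verbs (助動詞)
#grep : selects nodes whose category matches the given pattern (a Regexp may match part of the category, a String must equal it)
#compact : merges adjacent nodes that share a category into one node (with a pattern, merges adjacent nodes whose categories match it)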
* Chawan::Node (a Chawan::Nodes holds a list of Chawan::Node objects)
#category : part of speech
#word : surface text of the word
#attributes : Hash of attribute keys and values reported by the analyzer
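The filters can be read as plain selections over each node's category. The sketch below assumes that Node is a simple struct and that grep compares categories with Ruby's === operator; this reproduces the outputs in the Example section, but it is inferred from them, not taken from the gem.

```ruby
# Hypothetical sketch inferred from this README's examples; not the gem's code.
Node = Struct.new(:category, :word, :attributes) do
  # Mimics the <category: 'word'> notation used in the examples
  def inspect
    "<#{category}: '#{word}'>"
  end
end

class Nodes
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    return enum_for(:each) unless block
    @nodes.each(&block)
    self
  end

  # Keeps nodes where pattern === category:
  # a Regexp matches any part of it, a String must be equal to it.
  def grep(pattern)
    Nodes.new(@nodes.select { |node| pattern === node.category })
  end

  def noun
    grep('名詞')
  end

  # /動詞/ also matches 助動詞, so auxiliary verbs are kept, as in the examples
  def verb
    grep(/動詞/)
  end

  def inspect
    @nodes.inspect
  end
end

# Usage with the tokens of the example text '登録された利用者'
nodes = Nodes.new([
  Node.new('名詞', '登録'), Node.new('動詞', 'さ'), Node.new('動詞', 'れ'),
  Node.new('助動詞', 'た'), Node.new('名詞', '利用'), Node.new('名詞', '者')
])
nodes.verb.map(&:word)         # => ["さ", "れ", "た"]
nodes.grep('動詞').map(&:word)  # => ["さ", "れ"]
```

Relying on === is what lets grep(/動詞/) keep 助動詞 while grep('動詞') drops it, and returning a new Nodes from each filter keeps the calls chainable.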
Example
text = '登録された利用者'
# 'parse' returns a Chawan::Nodes
Chawan.parse(text)
=> [<名詞: '登録'>, <動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>, <名詞: '利用'>, <名詞: '者'>]
# Chawan::Nodes is enumerable
Chawan.parse(text).select{|node| node.category == '名詞'}
=> [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]
# gateway interface: noun
Chawan.parse(text).noun
=> [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]
# gateway interface: verb
Chawan.parse(text).verb
=> [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]
# gateway interface: grep
Chawan.parse(text).grep(/動詞/)
=> [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]
Chawan.parse(text).grep('動詞')
=> [<動詞: 'さ'>, <動詞: 'れ'>]
# gateway interface: compact
Chawan.parse(text).compact
=> [<名詞: '登録'>, <動詞: 'され'>, <助動詞: 'た'>, <名詞: '利用者'>]
Chawan.parse(text).compact(/動詞/)
=> [<名詞: '登録'>, <動詞: 'された'>, <名詞: '利用'>, <名詞: '者'>]
# gateway interface is chainable
Chawan.parse(text).noun.verb
=> []
# chainable is fun!
Chawan.parse(text).noun
=> [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]
Chawan.parse(text).compact.noun
=> [<名詞: '登録'>, <名詞: '利用者'>]
Chawan.parse(text).noun.compact
=> [<名詞: '登録利用者'>]
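The compact results above, and the fact that compact.noun and noun.compact differ, follow from merging only adjacent nodes. A hypothetical sketch inferred from the examples: without a pattern, adjacent nodes with the same category are joined; with a pattern, adjacent nodes whose categories all match it are joined, and the merged node keeps the first node's category. Each method returns a new Nodes, which is what makes chaining work.

```ruby
# Hypothetical sketch inferred from this README's examples; not the gem's code.
Node = Struct.new(:category, :word)

class Nodes
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    return enum_for(:each) unless block
    @nodes.each(&block)
    self
  end

  # Returns a new Nodes, so calls can be chained
  def noun
    Nodes.new(@nodes.select { |node| node.category == '名詞' })
  end

  # Joins runs of adjacent nodes. Without a pattern, a run shares one category;
  # with a pattern, every category in the run matches it (pattern === category).
  # The merged node keeps the category of the first node in the run.
  def compact(pattern = nil)
    merged = @nodes.each_with_object([]) do |node, out|
      prev = out.last
      joinable =
        if pattern
          prev && pattern === prev.category && pattern === node.category
        else
          prev && prev.category == node.category
        end
      if joinable
        out[-1] = Node.new(prev.category, prev.word + node.word)
      else
        out << node
      end
    end
    Nodes.new(merged)
  end
end

nodes = Nodes.new([
  Node.new('名詞', '登録'), Node.new('動詞', 'さ'), Node.new('動詞', 'れ'),
  Node.new('助動詞', 'た'), Node.new('名詞', '利用'), Node.new('名詞', '者')
])
nodes.compact.noun.map(&:word)  # => ["登録", "利用者"]
nodes.noun.compact.map(&:word)  # => ["登録利用者"]
```

Filtering with noun first removes the verbs between 登録 and 利用, so all three nouns become adjacent and compact joins them; compacting first only joins 利用 and 者, which were already neighbours.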
Analyzer
The parsing engine is called an 'analyzer'.
Available analyzers are:
* mecab (default)
* chasen
Chawan[:mecab].parse('test')
=> [<名詞: 'test'>]
# same as
# Chawan.mecab.parse('test')
# Chawan.analyzer(:mecab).parse('test')
# Chawan.parse('test') # default analyzer is :mecab
Chawan[:chasen].parse('test')
=> [<記号: 't'>, <記号: 'e'>, <記号: 's'>, <記号: 't'>]
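Whichever analyzer is used, its command output has to be turned into nodes. MeCab's default output prints one token per line as the surface form, a tab, and comma-separated features, followed by an EOS line. The sketch below parses that text into category/word/attributes structs; the attribute key names follow the IPADIC column order and are assumptions, not Chawan's actual keys, and the sample lines are hand-written, so real output depends on the installed dictionary.

```ruby
# Hypothetical sketch: parse MeCab's default output format into nodes.
# Attribute names follow the IPADIC feature columns and are assumptions.
Node = Struct.new(:category, :word, :attributes)

FEATURE_KEYS = %w[pos pos_detail1 pos_detail2 pos_detail3
                  conjugation_type conjugation_form
                  base_form reading pronunciation].freeze

def parse_mecab_output(output)
  output.each_line.map(&:chomp)
        .reject { |line| line.empty? || line == 'EOS' }  # EOS ends each sentence
        .map do |line|
          word, features = line.split("\t", 2)
          values = features.to_s.split(',')
          # zip pads with nil when a token has fewer features (e.g. unknown words)
          attributes = FEATURE_KEYS.zip(values).to_h
          Node.new(values.first, word, attributes)
        end
end

# Hand-written sample in the IPADIC layout; real output depends on the dictionary
sample = "登録\t名詞,サ変接続,*,*,*,*,登録,トウロク,トーロク\n" \
         "さ\t動詞,自立,*,*,サ変・スル,未然レル接続,する,サ,サ\n" \
         "EOS\n"
nodes = parse_mecab_output(sample)
nodes.map(&:category)             # => ["名詞", "動詞"]
nodes[1].attributes['base_form']  # => "する"
```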
Requirements
* UTF-8
* the 'mecab' command, installed and on your PATH
Todo
* use Open3 instead of backquotes to run external commands
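A minimal sketch of that Todo item, assuming the analyzer reads its text on stdin: Open3.capture2 from Ruby's standard library passes the text as stdin_data, so it never appears on a command line and needs no shell quoting, and the exit status can be checked explicitly. The method name run_analyzer is hypothetical.

```ruby
require 'open3'

# Hypothetical sketch of the Todo item. The text is written to the command's
# stdin, so it never appears on a command line and needs no shell quoting.
def run_analyzer(text, command = ['mecab'])
  output, status = Open3.capture2(*command, stdin_data: text)
  raise "#{command.first} failed (#{status})" unless status.success?
  # The README requires UTF-8, so label the output bytes accordingly
  output.force_encoding(Encoding::UTF_8)
end

# Usage (needs the mecab command on PATH):
# puts run_analyzer("登録された利用者\n")
```

Passing the command as an argument list also makes it easy to swap in chasen or, for testing, any command that echoes its input.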
Author
maiha@wota.jp