Neologdish::Normalizer for Ruby

A Japanese text normalization library for Ruby that follows the conventions of neologd/mecab-ipadic-neologd. It adds some performance optimizations and has no external dependencies. It is designed for preprocessing Japanese text before applying NLP techniques.

The specific rules are documented here: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja
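
To give a feel for the kind of rules on that page, here is a minimal sketch of a small subset of them, written from the neologd wiki description. It is an illustration only and not this gem's implementation; the module name `NeologdSketch` is hypothetical, and the real rule set (half-width katakana conversion, removal of spaces between Japanese characters, and so on) is considerably larger.

```ruby
# Illustrative only: a tiny subset of the neologd normalization rules.
# NeologdSketch is a hypothetical name, not part of this gem.
module NeologdSketch
  module_function

  def normalize(text)
    s = text.strip
    # Full-width ASCII digits and letters become half-width.
    s = s.tr("０-９Ａ-Ｚａ-ｚ", "0-9A-Za-z")
    # Runs of hyphen-like characters become a single hyphen-minus.
    s = s.gsub(/[˗֊‐‑‒–⁃⁻₋−]+/, "-")
    # Runs of long-vowel-like marks collapse into a single "ー".
    s = s.gsub(/[﹣－ｰ—―─━ー]+/, "ー")
    # Tilde-like characters are dropped.
    s = s.gsub(/[~∼∾〜〰～]/, "")
    # Full-width spaces become normal spaces, and runs of spaces collapse.
    s.gsub(/[ 　]+/, " ")
  end
end

puts NeologdSketch.normalize("Ｈｅｌｌｏ　　Ｗｏｒｌｄ") # prints "Hello World"
```

For real input, use `Neologdish::Normalizer.normalize` as shown under Usage; the sketch above skips most of the documented rules.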

Usage

require "neologdish-normalizer"

Neologdish::Normalizer.normalize("南アルプスの 天然水- Sparking* Lemon+ レモン一絞り")
# => 南アルプスの天然水-Sparking*Lemon+レモン一絞り

Benchmark

The performance comparison between the official Ruby example (https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#ruby-written-by-kimoto-and-overlast) and this library is as follows:

                           user     system      total        real
original normalizer:   4.200670   0.032004   4.232674 (  4.274573)
this library:          1.158801   0.005238   1.164039 (  1.170226)

The benchmark script is here: ./scripts/benchmark.rb
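
A table in this format is what Ruby's standard `Benchmark.bm` prints. The following is a rough sketch of how such a comparison could be set up, not the contents of ./scripts/benchmark.rb. The two lambdas are hypothetical stand-ins; in a real run they would be replaced by the official Ruby example and `Neologdish::Normalizer.normalize`.

```ruby
require "benchmark"

# Hypothetical stand-ins for the two normalizers being compared.
# Both collapse full-width and half-width spaces into a single space.
two_pass = ->(s) { s.gsub("　", " ").gsub(/ +/, " ") }
one_pass = ->(s) { s.gsub(/[ 　]+/, " ") }

sample = "南アルプスの　天然水　　Ｓｐａｒｋｉｎｇ　Ｌｅｍｏｎ"
iterations = 10_000

# The argument to Benchmark.bm is the label column width.
results = Benchmark.bm(20) do |x|
  x.report("two-pass stand-in:") { iterations.times { two_pass.call(sample) } }
  x.report("one-pass stand-in:") { iterations.times { one_pass.call(sample) } }
end
```

Timings from a sketch like this depend on the machine, the Ruby version, and the input text, so they should only be compared within a single run.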

Installation

Install the gem and add it to the application's Gemfile by executing:

bundle add 'neologdish-normalizer'

If Bundler is not being used to manage dependencies, install the gem by executing:

gem install 'neologdish-normalizer'

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/moznion/neologdish-normalizer.