Whichlang
Ruby bindings for Whichlang, a natural language detection library written in Rust.
This is a fork of the original whatlang-rb that adds an interface to the whichlang library. Whichlang is faster and detects languages more accurately, although it supports fewer languages.
Features
Features are inherited from the original Whichlang library and include:
- Throughput above 100 MB/s for short and long strings.
- Good accuracy (99.5% on the Whichlang author's validation dataset, although it depends heavily on the size of your input).
- Supported languages: Arabic, Dutch, English, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Portuguese, Russian, Spanish, Swedish, Turkish, and Vietnamese.
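The examples under Usage return three-letter codes such as "rus" and "spa". As a minimal sketch, assuming the gem returns a lowercase ISO 639-3 code for every supported language (only "rus" and "spa" are confirmed by this README), a lookup table can turn a code into a readable name. LANGUAGE_NAMES and language_name are hypothetical names used for illustration and are not part of the gem.

```ruby
# Hypothetical lookup table, not part of the gem.
# Assumption: codes are lowercase ISO 639-3; only "rus" and "spa"
# are confirmed by the usage examples in this README.
LANGUAGE_NAMES = {
  "ara" => "Arabic",
  "nld" => "Dutch",
  "eng" => "English",
  "fra" => "French",
  "deu" => "German",
  "hin" => "Hindi",
  "ita" => "Italian",
  "jpn" => "Japanese",
  "kor" => "Korean",
  "cmn" => "Mandarin",
  "por" => "Portuguese",
  "rus" => "Russian",
  "spa" => "Spanish",
  "swe" => "Swedish",
  "tur" => "Turkish",
  "vie" => "Vietnamese"
}.freeze

# Returns the readable name for a code, or nil for a nil or unknown code.
def language_name(code)
  LANGUAGE_NAMES[code]
end

puts language_name("rus") # prints "Russian"
```

Returning nil for unknown codes mirrors the way detection itself returns nil for blank input, so both cases can be handled by the same check.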
Installation
Requirements
You need Rust's build environment to install this gem.
On Unix-like systems, run:
$ curl https://sh.rustup.rs -sSf | sh
On Windows, download and run the installer.
See the official Rust installation page for details.
Gem installation
Add this line to your application's Gemfile:
gem 'whichlang'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install whichlang
Usage
require "whichlang"
text = "Благодаря Эсперанто вы обрётете друзей по всему миру!"
info = Whichlang.detect(text) # => "rus"
# Esperanto is not in the supported list, so a supported language is returned instead
text = "Jen la trinkejo fermitis, ni iras tra mallumo kaj pluvo."
info = Whichlang.detect(text) # => "spa"
# blank strings and nil return nil
info = Whichlang.detect(" ") # => nil
info = Whichlang.detect("") # => nil
info = Whichlang.detect(nil) # => nil
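Because detection returns nil for blank or nil input, callers usually need a fallback. The sketch below shows one way to handle that. The helper detect_or_default, the STUB_DETECTOR stand-in, and the "und" default (the ISO 639-3 code for "undetermined") are hypothetical and not part of the gem. The detector is passed in as a callable so the snippet runs without the native extension; in an application you might pass Whichlang.method(:detect) instead, assuming the module is exposed under that name.

```ruby
# Hypothetical helper, not part of the gem: returns a fallback code when
# the detector yields nil (as this README shows for blank or nil input).
def detect_or_default(text, detector:, default: "und")
  detector.call(text) || default
end

# Stand-in detector for illustration only. It mimics the documented nil
# behaviour and does no real language detection.
STUB_DETECTOR = lambda do |text|
  (text.nil? || text.strip.empty?) ? nil : "eng"
end

puts detect_or_default("Hello there", detector: STUB_DETECTOR) # prints "eng"
puts detect_or_default("   ", detector: STUB_DETECTOR)         # prints "und"
```

Injecting the detector also makes code that depends on language detection easy to test without compiling the Rust extension.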
Development
After checking out the repo, run bundle config set --local path vendor/bundle && bundle install to install dependencies. Then run bundle exec rake test to run the tests. You can also run bundle exec rake console for an interactive prompt that lets you experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in Cargo.toml, then run bundle exec rake release, which creates a git tag for the version, pushes the git commits and the tag, and pushes the .gem file to rubygems.org.
Contributing
Bug reports and merge requests are welcome on GitLab at https://gitlab.com/bendangelo/whichlang-rb.
License
This gem is distributed under the Ruby license. See the COPYING file.