Natsukantou

Natsukantou is a human language translation library for XML documents.

Its strength is that users can mix and match middleware filters and translation services (e.g. DeepL).

Natsukantou (夏柑糖) is a Japanese sweet. It is made by taking the pulp out of an orange, "translating" it into a jelly, and then putting it back into the hollow peel.

Installation

Install the gem and add it to the application's Gemfile by executing:

$ bundle add natsukantou

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install natsukantou

Usage

To run it as a command:

$ natsukantou [XML_FILE]

This starts a wizard that guides you through setting up a translator configuration, and then translates the XML document.

If you choose to save the config (translator_config.rb), you can reuse it next time by running:

$ natsukantou -c [CONFIG_FILE] [XML_FILE]

Features

Translators

Middlewares

SubstitudeGlossary

An alternative for translators that do not support glossaries. This middleware substitutes glossary terms itself and wraps each replaced term in a <skip> tag to mark it as already translated (DeepL supports ignoring such tags). It reads the glossary from a file in TSV format.
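The idea can be sketched on a plain string as follows. This is a minimal illustration, not the gem's implementation: the helper names load_glossary and substitute_glossary are hypothetical, the glossary is assumed to have two tab-separated columns (source term, translated term), and the real middleware works on the XML document rather than on raw text.

```ruby
# Hypothetical helper: parse a two-column TSV glossary into a Hash
# of { source term => translated term }.
def load_glossary(tsv_text)
  tsv_text.each_line(chomp: true).reject(&:empty?).to_h do |line|
    line.split("\t", 2)
  end
end

# Hypothetical helper: replace every glossary term with its translation,
# wrapped in <skip> so the translator leaves it untouched.
def substitute_glossary(text, glossary)
  return text if glossary.empty?

  # Try longer terms first so a long term is not shadowed by a shorter one.
  pattern = Regexp.union(glossary.keys.sort_by { |term| -term.length })
  text.gsub(pattern) { |match| "<skip>#{glossary.fetch(match)}</skip>" }
end

glossary = load_glossary("夏柑糖\tNatsukantou\n京都\tKyoto\n")
puts substitute_glossary("京都の夏柑糖", glossary)
```

Sorting the terms by length is a design choice for this sketch: with overlapping glossary entries, the most specific term is substituted.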

HandleRubyMarkup

Ruby markup is not about Ruby the programming language, but the HTML feature: annotations in Japanese and Chinese content that are rendered alongside base text.

A phrase in ruby markup is often segmented character by character, which makes it harder to translate. This middleware flattens ruby markup to just the base text to avoid this issue.
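As a rough illustration of what flattening means, the sketch below strips ruby annotations from a string with regular expressions. The method name flatten_ruby is hypothetical, and the regex approach is an assumption made to keep the example self-contained; the actual middleware operates on the parsed document.

```ruby
# Hypothetical helper: reduce ruby markup to its base text.
def flatten_ruby(markup)
  markup
    .gsub(%r{<rp\b[^>]*>.*?</rp>}m, "")   # drop fallback parentheses
    .gsub(%r{<rt\b[^>]*>.*?</rt>}m, "")   # drop the annotation text
    .gsub(%r{</?(?:ruby|rb)\b[^>]*>}, "") # unwrap <ruby>/<rb>, keep base text
end

# Each character carries its own annotation, so the word is split in two.
segmented = "<p><ruby>漢<rt>かん</rt>字<rt>じ</rt></ruby>を読む</p>"
puts flatten_ruby(segmented)
```

After flattening, the base text forms one continuous phrase that a translation service can handle as a whole.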

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Middleware

Natsukantou aims to be flexible by utilizing the middleware pattern. Essentially it allows the user to cherry-pick components (translator or middleware) when needed, to improve translation.

To develop your own component, create a class which responds to call with an env variable.

  • call needs to trigger @app.call(env) to continue to the next component.
  • env is a Natsukantou::Env, which is a special Hash. It provides convenience methods dom, lang_from and lang_to. You are also free to add key/values into it.
    • lang_from and lang_to are Natsukantou::LanguageCode objects, representing ISO 639-1 language codes. The is? method allows the following comparison: Natsukantou::LanguageCode.new("en-gb").is?("en") #=> true
  • The wizard is made aware of a new component through autoload_and_register:
    • For the wizard to infer the config prompts, initialize needs to accept keyword arguments for every parameter except the first (app), and these need to be documented with YARD.
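The points above can be put together in a small sketch. The class AnnotateRun, its label option and the :run_label key are hypothetical names chosen for illustration. To keep the example runnable without the gem, a plain Hash stands in for Natsukantou::Env and a lambda stands in for the next component; registration with the wizard is not shown.

```ruby
# Hypothetical component: stores a label in env, then continues the chain.
class AnnotateRun
  # @param app [#call] the next component in the chain
  # @param label [String] value stored in env under :run_label
  def initialize(app, label: "draft")
    @app = app
    @label = label
  end

  # @param env [Hash] shared state passed along the chain
  def call(env)
    env[:run_label] = @label # components are free to add their own keys
    @app.call(env)           # continue to the next component
  end
end

# Stand-ins for illustration only (not part of the gem).
next_component = ->(env) { env.merge(translated: true) }
env = { lang_from: "ja", lang_to: "en" }

result = AnnotateRun.new(next_component, label: "review").call(env)
puts result.inspect
```

Because every option other than app is a keyword argument with a YARD @param tag, a component shaped like this gives the wizard what it needs to derive a prompt for each option.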

Contributing

Bug reports and pull requests are welcome on GitLab at https://gitlab.com/lulalala/natsukantou.

License

The gem is available as open source under the terms of the MIT License.