Konjac

A Ruby command-line utility for translating files using a YAML wordlist

Homepage

www.brymck.com

Author

Bryan McKelvey

Copyright

© 2012 Bryan McKelvey

License

MIT

Features

  • Fuzzy matching - Konjac can suggest dictionary entries that are similar to a given word.

  • Whitespace handling - Still fairly basic, but functional.

  • One-way translation - Some rules apply in only one direction. For example, full-width letters and numbers in Japanese should always become their half-width counterparts in English, but the converse is not necessarily true.

  • Regular expressions - Rules such as /(\d+)年(\d+)月(\d+)日/\2\/\3\/\1/ (e.g. 1984年11月23日 # => 11/23/1984)

  • Importing/exporting text from/to Office documents - Currently only works on Mac (support for Windows is planned, but *nix support is probably too difficult).
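The one-way and regular-expression rules above boil down to ordinary string operations. The following is a minimal Ruby sketch, not Konjac's actual implementation; the helpers `to_half_width` and `apply_regex_rule` and the `DATE_*` constants are hypothetical names used only for illustration:

```ruby
# Hypothetical helpers illustrating two of the features above;
# they are not part of Konjac's API.

# One-way rule: map full-width Latin letters and digits
# (U+FF21..U+FF3A, U+FF41..U+FF5A, U+FF10..U+FF19) to ASCII.
# Nothing maps ASCII back to full-width, which is what makes it one-way.
def to_half_width(text)
  text.tr("Ａ-Ｚａ-ｚ０-９", "A-Za-z0-9")
end

# Regular-expression rule: a pattern plus a replacement that refers to
# capture groups (\1, \2, ...), as in the date example.
DATE_PATTERN = /(\d+)年(\d+)月(\d+)日/
DATE_REPLACEMENT = '\2/\3/\1' # single quotes keep the backslashes literal

def apply_regex_rule(text, pattern, replacement)
  text.gsub(pattern, replacement)
end

puts to_half_width("ＡＢＣ１２３") # prints ABC123
puts apply_regex_rule("1984年11月23日", DATE_PATTERN, DATE_REPLACEMENT) # prints 11/23/1984
```

Passing the replacement as a string (rather than a block) lets `gsub` expand the `\1`-style back-references, which mirrors the `/pattern/replacement/` notation used in the feature list.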

Installation

Stable

With Ruby installed, run the following in your terminal:

gem install konjac

Development

With Ruby, Git and Bundler installed, open a terminal in a directory of your choice, then run:

git clone https://github.com/brymck/konjac.git
cd konjac
bundle update
rake install

Usage

Translate all text files in the current directory from Japanese into English:

konjac translate *.txt --from japanese --to english
konjac translate *.txt -f ja -t en

Use multiple dictionaries:

konjac translate financial_report_en.txt --to japanese --using {finance,fluffery}
konjac translate financial_report_en.txt -t ja -u {finance,fluffery}

Extract text from a .doc or .docx document (creates a plain-text test.konjac file from test.doc or test.docx):

konjac export test.doc
konjac export test.docx

Extract text from a .docx document and process it with a dictionary:

konjac export test.docx --from japanese --to english --using pre
konjac export test.docx -f ja -t en -u pre

Import a tags file back into a .doc or .docx document (for .doc files, this opens the file in Word and imports the changes; for .docx files, it writes a new file named test_imported.docx):

konjac import test.doc
konjac import test.docx

Add a word to your dictionary:

konjac add --original dog --from english --translation 犬 --to japanese
konjac add -o dog -f en -r 犬 -t ja

Translate a word using your dictionary:

konjac translate dog --from english --to japanese --word
konjac translate dog -f en -t ja -w

Suggest a word using your dictionary:

konjac suggest dog --from english --to japanese
konjac suggest dog -f en -t ja

Ruby

Add a word to the dictionary, then create a Suggestor object and query it:

require "konjac"
Konjac::Dictionary.add_word :from => :en, :original => "word",
                            :to => :ja, :translation => "言葉"
s = Konjac::Suggestor.new(:en, :ja)
s.suggest "word" # => [[1.0, "word", "言葉"]]
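The first element of each result appears to be a similarity score, with 1.0 for an exact match. This README does not say how Konjac computes it, so the sketch below simply assumes a normalized Levenshtein similarity over a plain hash dictionary, to show how ranked `[score, original, translation]` suggestions can be produced. `levenshtein`, `similarity`, `fuzzy_suggest` and the `threshold:` parameter are all hypothetical:

```ruby
# Hypothetical illustration of fuzzy suggestions; not Konjac's code.

# Classic dynamic-programming edit distance
# (insertion, deletion and substitution each cost 1).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in [0.0, 1.0]; 1.0 means the strings are identical.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).fdiv(longest)
end

# Returns [score, original, translation] triples, best match first,
# keeping only entries whose score reaches the threshold.
def fuzzy_suggest(word, dictionary, threshold: 0.5)
  dictionary
    .map { |original, translation| [similarity(word, original), original, translation] }
    .select { |score, _, _| score >= threshold }
    .sort_by { |score, original, _| [-score, original] }
end

dictionary = { "word" => "言葉", "world" => "世界", "dog" => "犬" }
p fuzzy_suggest("word", dictionary)
```

With this dictionary, "word" scores 1.0, "world" scores 0.8 (one insertion over five characters), and "dog" falls below the threshold and is dropped.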

Dictionary Format

Store terms in ~/.konjac/dict.yml.

Simple (two-way equivalent terms) - English “I” is equivalent to Spanish “yo”:

-
  en: I
  es: yo

Not as simple - Japanese nouns have no plural form, so both “dog” and “dogs” translate as 犬:

-
  en: dog
  ja:
    ja: 犬
    en: dogs?
    regex: true  # i.e. the regular expression /dogs?/
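One way to read this format: a plain string is the term for that language, while a nested hash under the target language overrides the source term when translating into that language and can mark it as a regular expression. The Ruby sketch below encodes that reading; it is an assumption, not Konjac's actual parser, and `term_for`, `build_rules` and `translate` are hypothetical helpers:

```ruby
require "yaml"

# Hypothetical reading of the dict.yml format above;
# Konjac's own parser may behave differently.
DICT_YAML = <<~YAML
  -
    en: I
    es: yo
  -
    en: dog
    ja:
      ja: 犬
      en: dogs?
      regex: true
YAML

# The plain term for a language: either a string, or (inside a nested
# override) the value stored under that language's own key.
def term_for(entry, lang)
  value = entry[lang]
  value.is_a?(Hash) ? value[lang] : value
end

# Builds [pattern, replacement] pairs for translating from -> to.
# A nested hash under the target language overrides the source term
# and may flag it as a regular expression.
def build_rules(entries, from, to)
  entries.filter_map do |entry|
    override = entry[to].is_a?(Hash) ? entry[to] : {}
    source = override.fetch(from) { term_for(entry, from) }
    target = override.fetch(to) { term_for(entry, to) }
    next unless source && target # entry lacks one of the two languages

    pattern = override["regex"] ? Regexp.new(source) : Regexp.new(Regexp.escape(source))
    [pattern, target]
  end
end

# Applies every rule in order (no word-boundary handling in this sketch).
def translate(text, rules)
  rules.reduce(text) { |result, (pattern, replacement)| result.gsub(pattern, replacement) }
end

entries = YAML.safe_load(DICT_YAML)
puts translate("dogs", build_rules(entries, "en", "ja")) # prints 犬
```

Under this reading the regex override only applies when translating into Japanese; going from Japanese back to English, 犬 simply becomes “dog”.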

Documentation

Should be simple enough to generate yourself:

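rm -rf konjac     # removes any existing konjac checkout in this directory
git clone https://github.com/brymck/konjac.git
cd konjac
bundle update
rake rdoc
shopt -s extglob  # bash: enables the !(pattern) glob used on the next line
rm -rf !(doc)     # deletes everything except doc/ (dotfiles such as .git are kept)
mv doc/rdoc/* .
rm -rf doc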

Supplementary Stuff

Name

Hon’yaku means “translation” in Japanese. This utility relies on a YAML wordlist. Konnyaku (Japanese for “konjac”) rhymes with hon’yaku and is a type of yam. Also, Doraemon had something called a hon’yaku konnyaku that allowed him to speak every language. IIRC it worked with animals too. But I digress.