Byk
Ruby gem for fast transliteration of Serbian Cyrillic ↔ Latin

Installation
Byk can be used as a standalone console utility or as a String
extension in your Ruby programs. It has zero dependencies beyond
vanilla Ruby and the toolchain for building native gems [1].
You can install it directly:
$ gem install byk
or add it as a dependency in your application's Gemfile:
gem "byk"
[1] For Windows, you might want to check out DevKit.
Usage
As a standalone utility
Here's the help banner with all the available options:
usage: byk [options] [files]
options:
-c, --cyrillic convert input to Cyrillic (default)
-l, --latin convert input to Latin
-a, --ascii convert input to "ASCII Latin"
-v, --version show version
The transliterated text goes to stdout, so you can redirect it or pipe it as you see fit. Let's take a look at some common scenarios.
To transliterate files to Cyrillic:
$ byk in1.txt in2.txt > out.txt
To transliterate files to Latin and search for a phrase:
$ byk -l file.txt | grep stvar
Ad hoc conversion:
$ echo "Вук Стефановић Караџић" | byk -a
Vuk Stefanovic Karadzic
or simply omit args and type away:
$ byk
a u ruke Mandušića Vuka
biće svaka puška ubojita!
^D
а у руке Мандушића Вука
биће свака пушка убојита!
Here ^D stands for Ctrl-D, which signals end of input.
As a String extension
Unless you're using Bundler, make sure to require the gem in your initializer:
require "byk"
This will extend String with a couple of simple methods:
"Šeširdžija".to_cyrillic # => "Шеширџија"
"Шеширџија".to_latin # => "Šeširdžija"
"Шеширџија".to_ascii_latin # => "Sesirdzija"
These do not modify the receiver. For that, there's a destructive variant of each:
text = "Šeširdžija"
text.to_cyrillic! # => "Шеширџија"
text.to_latin! # => "Šeširdžija"
text.to_ascii_latin! # => "Sesirdzija"
text # => "Sesirdzija"
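Going from Latin to Cyrillic, the digraphs lj, nj and dž each correspond to a single Cyrillic letter, so they have to be matched before their individual letters. Here's a rough pure-Ruby sketch of that idea. It is not how Byk works internally (Byk is a native extension), `CyrillicSketch` is a made-up name used only for illustration, and it assumes Ruby 2.4+ for Unicode-aware `downcase`/`upcase`:

```ruby
# Illustrative only: a pure-Ruby stand-in, not Byk's implementation.
module CyrillicSketch
  DIGRAPHS = { "lj" => "љ", "nj" => "њ", "dž" => "џ" }.freeze

  SINGLE = {
    "a" => "а", "b" => "б", "v" => "в", "g" => "г", "d" => "д", "đ" => "ђ",
    "e" => "е", "ž" => "ж", "z" => "з", "i" => "и", "j" => "ј", "k" => "к",
    "l" => "л", "m" => "м", "n" => "н", "o" => "о", "p" => "п", "r" => "р",
    "s" => "с", "t" => "т", "ć" => "ћ", "u" => "у", "f" => "ф", "h" => "х",
    "c" => "ц", "č" => "ч", "š" => "ш"
  }.freeze

  # Digraph alternatives come first so they win over single letters.
  PATTERN = /[Ll][Jj]|[Nn][Jj]|[Dd][Žž]|./m

  def self.to_cyrillic(text)
    text.gsub(PATTERN) do |token|
      lower = token.downcase
      cyrillic = DIGRAPHS[lower] || SINGLE[lower]
      next token unless cyrillic # leave anything unmapped untouched

      # The first character of the token decides the case of the result.
      token[0] == lower[0] ? cyrillic : cyrillic.upcase
    end
  end
end

CyrillicSketch.to_cyrillic("Šeširdžija") # => "Шеширџија"
```

A plain lookup like this also turns the rare "nj" that is not a digraph (as in "injekcija") into њ; the sketch makes no attempt to handle such cases.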
Note that both latinization methods observe digraph capitalization rules:
"ЉИЉА Љиљановић".to_latin # => "LJILJA Ljiljanović"
"ĐORĐE Đorđević".to_ascii_latin # => "DJORDJE Djordjevic"
Safe require
If you prefer not to monkey patch String, you can do a "safe"
require in your Gemfile:
gem "byk", :require => "byk/safe"
or initializer:
require "byk/safe"
Then, you should rely on module methods:
text = "Жвазбука"
Byk.to_latin(text) # => "Žvazbuka"
text # => "Жвазбука"
Byk.to_latin!(text) # => "Žvazbuka"
text # => "Žvazbuka"
# etc.
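If you're curious how a module can offer both flavors without touching String, here's a small sketch of the pattern. `TranslitSketch` and its tiny lookup table are hypothetical and cover just enough letters for the example; Byk's own methods are native and handle the full alphabet:

```ruby
# Illustrative only: shows the plain vs. bang calling convention.
module TranslitSketch
  TABLE = {
    "Ж" => "Ž", "в" => "v", "а" => "a", "з" => "z",
    "б" => "b", "у" => "u", "к" => "k"
  }.freeze

  # Non-destructive: builds and returns a new String.
  def self.to_latin(text)
    text.gsub(/./m) { |ch| TABLE.fetch(ch, ch) }
  end

  # Destructive: overwrites the argument's contents and returns it.
  def self.to_latin!(text)
    text.replace(to_latin(text))
  end
end

text = "Жвазбука".dup            # dup in case string literals are frozen
TranslitSketch.to_latin(text)    # => "Žvazbuka"
text                             # => "Жвазбука"
TranslitSketch.to_latin!(text)   # => "Žvazbuka"
text                             # => "Žvazbuka"
```

Note that the destructive call raises on a frozen string, since `String#replace` cannot modify one.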
How fast is "fast" transliteration?
Here's a quick test:
$ wget https://sr.wikipedia.org/ -O sample
$ du -h sample
128K
$ time byk -l sample > /dev/null
0.08s user 0.04s system 96% cpu 0.126 total
Let's up the ante:
$ for i in {1..800}; do cat sample; done > big
$ du -h big
97M
$ time byk -l big > /dev/null
1.71s user 0.13s system 99% cpu 1.846 total
So, ~100MB in under 2s. Fast enough, I suppose. You can expect it to scale linearly.
Compared to the pure Ruby implementation, it is about 10-30x faster, depending on the input composition and the transliteration method applied.
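If you'd like to check the scaling on your own machine from within Ruby, something along these lines should do. The `throughput` helper and the `stand_in` lambda are made up for this example; the lambda only handles a few letters so the script runs without the gem. With Byk loaded, you could pass `->(s) { s.to_latin }` instead:

```ruby
require "benchmark"

# Times a transliteration callable on inputs of growing size.
# Hypothetical helper, not part of Byk.
def throughput(sample, multipliers, transliterate)
  multipliers.map do |n|
    input = sample * n
    seconds = Benchmark.realtime { transliterate.call(input) }
    { bytes: input.bytesize, seconds: seconds }
  end
end

# Stand-in transliterator covering only five letters.
stand_in = ->(s) { s.tr("абвгд", "abvgd") }

results = throughput("абвгд " * 1_000, [1, 10, 100], stand_in)

results.each do |r|
  mb_per_s = r[:bytes] / 1_048_576.0 / r[:seconds]
  puts format("%10d bytes  %.4fs  %.1f MB/s", r[:bytes], r[:seconds], mb_per_s)
end
```

If the time grows roughly in step with the input size, the MB/s column stays about the same from row to row. Very small inputs tend to give noisy numbers, so larger multipliers are more telling.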
Testing
To test the gem, clone the repo and run:
$ bundle && bundle exec rake
Compatibility
Byk is supported under MRI 1.9.2+. I might try my hand at writing a JRuby extension in a future release.
License
This gem is released under the MIT License.
Уздравље! (Cheers!)