text-hyphen

Description

Text::Hyphen is a Ruby library to hyphenate words in various languages using Ruby-fied versions of TeX hyphenation patterns. It will properly hyphenate various words according to the rules of the language the word is written in. The algorithm is based on that of the TeX typesetting system by Donald E. Knuth.

This is originally based on the Perl implementation of TeX::Hyphen and the Ruby port. The language hyphenation pattern files are based on the sources available from CTAN as of 2004.12.19 and have been manually translated by Austin Ziegler.

This is a small feature release adding Russian language support and fixing a bug in the custom hyphen support introduced last version. This release provides both Ruby 1.8.7 and Ruby 1.9.2 support (but please read below). In short, Ruby 1.8 support is deprecated and I will not be providing any bug fixes related to Ruby 1.8. New features will be developed and tested against Ruby 1.9 only.

Where

Synopsis

require 'text/hyphen'
hh = Text::Hyphen.new(:language => 'en_us', :left => 2, :right => 2)
# Defaults to the above
hh = Text::Hyphen.new

word = "representation"
points = hh.hyphenate(word)  #=> [3, 5, 8, 10]
puts hh.visualize(word)      #=> rep-re-sen-ta-tion

Both visualize and hyphenate_to methods allow choosing a custom hyphen:

puts hh.visualize(word, "­") #=> rep­re­sen­ta­tion

Text::Hyphen is truly multilingual, with 29 languages or language variants supported. As an example, consider the difference between the following:

require 'text/hyphen'
# Using left and right minimum values of 0 ensures that you will see all
# possible hyphenation points, not just those that meet the minimum width
# requirements.
en = Text::Hyphen.new(:left => 0, :right => 0)
fr = Text::Hyphen.new(:language => "fr", :left => 0, :right => 0)

puts en.visualise("organiser")      #=> or-gan-iser
puts fr.visualise("organiser")      #=> or-ga-ni-ser

As you can see, the hyphenation is distinct between the two hyphenators. Additional improvements over TeX::Hyphen include thread safety (except for debug control) and support for UTF-8 under Ruby 1.9.

Install

gem install text-hyphen

A Note About Ruby 1.8 Support

I’ve been saying for a couple of releases that “this is the last major release supporting Ruby 1.8 interpreters. Future versions will only work with Ruby 1.9 or later interpreters.” Let me clarify my position on this, because removing Ruby 1.8 support requires effort that I am not putting in as of yet.

  • I currently use rbenv and this machine has a version with Ruby 1.8.7 installed (and Mac OS X ships with Ruby 1.8.7 anyway).

  • I will continue to run the test suite against all installed versions of Ruby that I have on this machine.

  • If you submit a patch or pull request, I do not require 1.8.7 support. Until I decide to move this project from 1.x to 2.x versioning, features that do not support 1.8 must be guarded. If the feature is sufficiently complex, I can be convinced to switch to 2.x.

  • I will not fix any bugs related to Ruby 1.8.7. I will accept patches or pull requests that fix bugs that others find, however.

  • I strongly recommend that your gem specification for text-hyphen be ‘~> 1.4’.

Developers

After checking out the source, run:

$ rake newb

This task will install any missing dependencies, run the tests/specs, and generate the RDoc.

:include: License.rdoc