Unicode Collation

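Unicode sorting is complicated (see Unicode Technical Standard #10, unicode.org/reports/tr10/), and Ruby doesn't do it correctly: String#<=> compares strings byte by byte, so in UTF-8 every non-ASCII character, accented letters included, sorts after every ASCII character. There is, however, a widely used implementation of the Unicode Collation Algorithm in the ICU (International Components for Unicode) libraries. This gem is a simple C wrapper that exposes the ucol_getSortKey function from the ICU Collation API to Ruby as String#unicode_sort_key.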

Usage:

['cafe', 'cafes', 'café'].sort    # Ruby's default byte order
=> ['cafe', 'cafes', 'café']

require 'unicode_collation'

['cafe', 'cafes', 'café'].sort_by {|s| s.unicode_sort_key}    # ICU collation order
=> ['cafe', 'café', 'cafes']
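Why a multi-level sort key produces this order can be sketched in pure Ruby. The `toy_sort_key` helper below is hypothetical and far cruder than ICU: it compares base letters first and looks at accents only to break ties, which is the core idea behind the primary and secondary levels of the Unicode Collation Algorithm. It ignores contractions, expansions, locale tailoring and ICU's real case handling, and it assumes Ruby 2.2+ for `String#unicode_normalize`.

```ruby
# Toy illustration only: NOT the Unicode Collation Algorithm and NOT the
# output of ICU's ucol_getSortKey. Requires Ruby 2.2+ for unicode_normalize.

# Hypothetical helper: builds a three-level key compared with Array#<=>.
def toy_sort_key(str)
  # Canonical decomposition splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
  decomposed = str.unicode_normalize(:nfd)

  # Primary level: base letters only, combining marks (category Mn) removed,
  # case folded.
  primary = decomposed.gsub(/\p{Mn}/, '').downcase

  # Secondary level: the marks attached to each base character, in order
  # ("" for an unaccented character, which sorts before any accent).
  secondary = decomposed.scan(/\P{Mn}\p{Mn}*/).map { |cluster| cluster[1..-1] }

  # Crude final tiebreak: the decomposed string in code point order.
  [primary, secondary, decomposed]
end

words = ['cafe', 'cafes', 'café']
p words.sort                             # => ["cafe", "cafes", "café"]
p words.sort_by { |s| toy_sort_key(s) }  # => ["cafe", "café", "cafes"]
```

Both the gem's example and this sketch use `sort_by`, which computes each key once per element and then sorts by the cached keys, instead of recomputing keys on every comparison as `sort { |a, b| ... }` would.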

Install:

You must install ICU first. You can download the source from site.icu-project.org/download or, on a Mac, install it with MacPorts:

sudo port install icu

Then install the gem:

sudo gem install ninjudd-unicode-collation -s http://gems.github.com

To do:

Add support for locales other than en-US.

License:

Copyright (c) 2009 Justin Balthrop, Geni.com; Published under The MIT License, see LICENSE