URIParser

What?

This is a very simple native gem that wraps Google's Chromium URL canonicalization library (google-url). Google's code is BSD-licensed.

This gem requires the ICU library. Under Linux (Ubuntu), the proper version may be installed via apt-get (e.g. apt-get install libicu40). Under Mac OS X, it may be installed with Homebrew (e.g. brew install icu4c).

Why?

Addressable provides the same functionality (and more!) but is slow. This gem is much faster.

Example

require 'uri_parser'

noncan = 'http://руцентр.рф/Iñtërnâtiônàlizætiøn!?i18n=true'
url = URIParser.new(noncan)
puts <<TO_THE_END
    non-canonicalized: #{noncan}
    canonicalized: #{url.uri}
    scheme: #{url.scheme}
    host: #{url.host}
    port: #{url.port}
    path: #{url.path}
    query: #{url.query}
    valid: #{url.valid?}
TO_THE_END

Note: If a URL is marked as invalid (valid? returns false), the values of its other properties are undefined and should not be relied on.
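
One way to respect the note above is to check valid? before reading anything else. The sketch below is only an illustration: canonical_or_nil is a hypothetical helper, not part of this gem, and FakeParsed is a stand-in object so the snippet runs without the native extension. With the gem installed, you would presumably pass URIParser.new(some_string) instead.

```ruby
# Hypothetical helper (not part of the gem): return the canonical URI string
# only when the parsed object reports itself as valid, otherwise nil.
def canonical_or_nil(parsed)
  parsed.valid? ? parsed.uri : nil
end

# Stand-in for a parsed URL, used here so the sketch is self-contained.
# It mimics only the two methods the helper relies on: uri and valid?.
FakeParsed = Struct.new(:uri, :is_valid) do
  def valid?
    is_valid
  end
end

good = FakeParsed.new('http://example.com/', true)
bad  = FakeParsed.new('garbage', false)

puts canonical_or_nil(good).inspect
puts canonical_or_nil(bad).inspect
```

Returning nil for invalid input keeps callers from accidentally using a host, path, or query whose value is undefined.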

Copyright 2011 SEOmoz. See LICENSE for details.

google-url is copyright Google.