Compact Encoding Detection for Ruby

Ruby bindings for Google's Compact Encoding Detection (CED for short) C++ library.

Usage

You will need CMake to build the C++ native extension.

macOS

You can use Homebrew to install it:

brew install cmake

Then you can install the gem from RubyGems.org.

Either add this to your Gemfile:

gem 'compact_enc_det', '~> 0.1'

or run the following command to install it:

gem install compact_enc_det

Now you can detect the encoding via the CompactEncDet.detect_encoding method, which is a thin wrapper around the CompactEncDet::DetectEncoding and MimeEncodingName functions from the C++ library.

file = File.read("unknown-encoding.txt", mode: "rb")
result = CompactEncDet.detect_encoding(file)
result.encoding
# => #<Encoding:Windows-1250>
result.bytes_consumed
# => 239
result.is_reliable?
# => true
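
Once the encoding is known, a common next step is converting the bytes to UTF-8. The following is a minimal sketch, not part of the gem: transcode_to_utf8 and FakeResult are hypothetical names, and FakeResult only stands in for the object returned by CompactEncDet.detect_encoding (assumed to respond to encoding and is_reliable?, as in the example above) so that the snippet runs without the native extension.

```ruby
# Hypothetical helper (not provided by the gem): convert raw bytes to UTF-8
# based on a detection result that responds to #encoding and #is_reliable?.
def transcode_to_utf8(bytes, result)
  unless result.is_reliable?
    # Detection was not reliable: treat the bytes as UTF-8 and replace
    # anything invalid with U+FFFD instead of trusting the guess.
    return bytes.dup.force_encoding(Encoding::UTF_8).scrub
  end

  # Label the bytes with the detected encoding, then transcode to UTF-8.
  bytes.dup
       .force_encoding(result.encoding)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Stand-in for the real detection result, used only for illustration.
FakeResult = Struct.new(:encoding, :bytes_consumed, :reliable) do
  def is_reliable?
    reliable
  end
end

raw = "P\xF8\xEDli\x9A".b # "Příliš" encoded as Windows-1250
result = FakeResult.new(Encoding::Windows_1250, raw.bytesize, true)
text = transcode_to_utf8(raw, result)
puts text.encoding
```

With the gem installed, result would presumably come from CompactEncDet.detect_encoding(raw) rather than from FakeResult. Falling back to scrubbed UTF-8 for unreliable detections is one possible policy; raising an error may suit stricter pipelines better.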

Contributing

Any contributions are welcome! Feel free to open an issue or a pull request.

Repository

The google/compact_enc_det repository is linked as a Git submodule at ext/compact_enc_det/compact_enc_det.

You need to clone the repository with the --recurse-submodules flag:

git clone --recurse-submodules git@github.com:cloudaper/compact_enc_det.git

Or initialize and update the submodule after cloning with the following commands:

git submodule init && git submodule update

Testing

The tests, located in tests, use the minitest framework.

Run them via the test Rake task:

rake test

The gem will be compiled to lib/compact_enc_det/compact_enc_det.bundle first.

License

This gem is released under the MIT license, while the source code of Google's original Compact Encoding Detection library, located at ext/compact_enc_det/compact_enc_det, is under the Apache-2.0 license.