Measurable

A gem to test what metric is best for certain kinds of datasets in machine learning.

Besides the Array class, I also want to support NVector (from NMatrix).

The distance measures will be written in Ruby first. If they turn out to be really too slow, I'll rewrite some methods in C (or Java, for JRuby).

This is a fork of the distance_measures gem, which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, @reddavis. :)

Install

gem install measurable

I've only tested it with Ruby 2.0.0 (yes, yes, Travis, I'll set it up eventually). I want to support JRuby as well.

Distance measures

I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about whether it is a metric, please open an issue.
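To make the distinction concrete, here is a minimal sketch in plain Ruby. It doesn't use the gem, and `squared_euclidean`, `euclidean` and `triangle_inequality_holds?` are hypothetical helper names made up for this example. It shows that squared Euclidean distance breaks the triangle inequality on three collinear points, so it is a distance measure but not a metric. Keep in mind that a check on a finite sample can only disprove the property, never prove it.

```ruby
# Hypothetical helpers for illustration; not part of the gem.

# Sum of squared coordinate differences.
def squared_euclidean(u, v)
  u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b)**2 }
end

# Square root of the squared Euclidean distance.
def euclidean(u, v)
  Math.sqrt(squared_euclidean(u, v))
end

# True if d(x, z) <= d(x, y) + d(y, z) for every ordered triple of points.
def triangle_inequality_holds?(points, &distance)
  points.permutation(3).all? do |x, y, z|
    distance.call(x, z) <= distance.call(x, y) + distance.call(y, z)
  end
end

# Three points on a line: going from the first to the last costs 4 in
# squared distance, but only 1 + 1 when passing through the middle one.
points = [[0, 0], [1, 0], [2, 0]]

triangle_inequality_holds?(points) { |u, v| euclidean(u, v) }         # => true
triangle_inequality_holds?(points) { |u, v| squared_euclidean(u, v) } # => false
```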

The following distance measures are supported at the moment:

  • Euclidean distance
  • Squared Euclidean distance
  • Cosine distance
  • Max-min distance (from "K-Means clustering using max-min distance measure")
  • Jaccard distance
  • Tanimoto distance
  • Haversine distance
  • Minkowski (Cityblock or Manhattan) distance
  • Chebyshev distance
  • Hamming distance
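As a rough guide to what a few of these measures compute, here is a sketch of their textbook definitions in plain Ruby. The `DistanceSketch` module and its method names are assumptions made for this example; the gem's real implementation and signatures may differ.

```ruby
# Hypothetical standalone module; not the gem's actual code.
module DistanceSketch
  module_function

  # Euclidean norm of a single vector.
  def norm(u)
    Math.sqrt(u.reduce(0.0) { |acc, a| acc + a * a })
  end

  # Square root of the sum of squared coordinate differences.
  def euclidean(u, v)
    Math.sqrt(u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 })
  end

  # One minus the cosine of the angle between u and v.
  def cosine(u, v)
    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    1.0 - dot / (norm(u) * norm(v))
  end

  # Sum of absolute coordinate differences (Manhattan distance).
  def cityblock(u, v)
    u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b).abs }
  end

  # Largest absolute coordinate difference.
  def chebyshev(u, v)
    u.zip(v).map { |a, b| (a - b).abs }.max
  end

  # Number of positions at which the two vectors differ.
  def hamming(u, v)
    u.zip(v).count { |a, b| a != b }
  end
end

DistanceSketch.cityblock([1, 5], [4, 1])     # => 7
DistanceSketch.chebyshev([1, 5], [4, 1])     # => 4
DistanceSketch.hamming([1, 0, 1], [1, 1, 0]) # => 2
```

None of these helpers check that both vectors have the same length; a real implementation would probably want to raise an error in that case.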

These still need to be implemented:

  • Correlation distance
  • Chi-square distance
  • Kullback-Leibler divergence
  • Jensen-Shannon divergence
  • Mahalanobis distance
  • Squared Mahalanobis distance

I plan to update the specs to reflect whether each method is a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)

How to use

The API I intend to support is something like this:

require "measurable"

u = NVector.ones(2)
v = NVector.zeros(2)
w = [1, 0]
x = [2, 2]

# Calculate the distance between two points in space.
Measurable.euclidean(u, v) # => 1.41421
Measurable.euclidean(w, v) # => 1.00000
Measurable.cosine([1, 2], [2, 3]) # => 0.00772

# Calculate the squared norm of a vector, i.e. its squared distance from the origin.
Measurable.euclidean_squared([3, 4]) # => 25
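One way the one-argument form could work is to treat a missing second vector as the origin. The sketch below is plain Ruby with Arrays only; the `NormSketch` module is a hypothetical name, and the default-to-origin behaviour is an assumption about the intended API, not a description of the gem's code.

```ruby
# Hypothetical module for illustration; not the gem's actual code.
module NormSketch
  module_function

  # Squared Euclidean distance between u and v.
  # Assumption: when v is omitted, the origin is used, which yields the
  # squared norm of u.
  def euclidean_squared(u, v = nil)
    v ||= Array.new(u.size, 0)
    raise ArgumentError, "vectors must have the same size" if u.size != v.size

    u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b)**2 }
  end
end

NormSketch.euclidean_squared([3, 4])         # => 25
NormSketch.euclidean_squared([1, 0], [0, 0]) # => 1
```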

Documentation

RDoc syntax is used to document the project. To build it locally, you'll need to install the Fivefish generator (gem install rdoc-generator-fivefish) and run the following command:

rdoc -f fivefish -m README.md *.md LICENSE lib/

I want to be able to use a Rake task to generate the documentation, so that I don't have to remember the specific command. However, there's a bug in RDoc::Task that prevents custom generators (like Fivefish) from being used.

If there's something wrong with an explanation or if there's information missing, please open an issue or send a pull request.

License

See LICENSE for details.

The original distance_measures gem is copyrighted by @reddavis.