Class: Spark::Mllib::NaiveBayes

Inherits: Object

Defined in: lib/spark/mllib/classification/naive_bayes.rb

Class Method Summary

.train(rdd, lambda = 1.0) ⇒ Object

Trains a Naive Bayes model given an RDD of (label, features) pairs.

Class Method Details

.train(rdd, lambda = 1.0) ⇒ Object

Trains a Naive Bayes model given an RDD of (label, features) pairs and returns it as a NaiveBayesModel.

This is multinomial Naive Bayes (tinyurl.com/lsdw6p), which can handle all kinds of discrete data. For example, converting documents into TF-IDF vectors lets it be used for document classification. Making every vector a 0-1 vector lets it also be used as Bernoulli Naive Bayes (tinyurl.com/p7c96j6). The input feature values must be nonnegative.
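
The Bernoulli variant and the nonnegativity requirement described above can be prepared for on the Ruby side before the data reaches the trainer. The following is a minimal sketch, not part of ruby-spark: `binarize` is a hypothetical helper name, and it assumes each feature vector is a plain Ruby array of numbers.

```ruby
# Hypothetical helper (not part of ruby-spark): converts a feature vector
# into a 0-1 vector and rejects negative values, which Naive Bayes
# training does not accept.
def binarize(features)
  features.map do |value|
    if value < 0
      raise ArgumentError, "feature values must be nonnegative, got #{value}"
    end
    value > 0 ? 1.0 : 0.0
  end
end

binarize([3, 0, 1.5])  # => [1.0, 0.0, 1.0]
```

Any positive value maps to 1.0 and zero stays 0.0, so term counts or TF-IDF weights collapse to presence/absence indicators.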

Arguments:

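rdd: RDD of LabeledPoint. Only the first element is checked; a Spark::MllibError is raised if it is not a LabeledPoint.
lambda: The smoothing parameter. Defaults to 1.0.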
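
To illustrate what the smoothing parameter does, here is a pure-Ruby sketch of multinomial Naive Bayes estimation with additive smoothing. It is not the library's implementation (the real training is delegated to Spark on the JVM), and whether the backend uses exactly these formulas is an assumption. `train_sketch` is a hypothetical name, and the samples are assumed to be plain `[label, features]` arrays rather than an RDD of LabeledPoint.

```ruby
# Sketch only: estimates log-priors (pi) and log-likelihoods (theta) for
# multinomial Naive Bayes with additive smoothing. `train_sketch` is a
# hypothetical name; `samples` is an array of [label, features] pairs.
def train_sketch(samples, lambda_ = 1.0)
  num_features = samples.first[1].size
  counts = Hash.new(0)                                            # samples per label
  sums = Hash.new { |h, k| h[k] = Array.new(num_features, 0.0) }  # feature totals per label

  samples.each do |label, features|
    counts[label] += 1
    features.each_with_index { |value, j| sums[label][j] += value }
  end

  labels = counts.keys.sort
  total = samples.size

  # Smoothed log-prior for each label.
  pi = labels.map do |l|
    Math.log((counts[l] + lambda_) / (total + labels.size * lambda_))
  end

  # Smoothed log-likelihood of each feature given each label.
  theta = labels.map do |l|
    denom = sums[l].sum + num_features * lambda_
    sums[l].map { |s| Math.log((s + lambda_) / denom) }
  end

  [labels, pi, theta]
end

samples = [[0.0, [1.0, 0.0]], [0.0, [2.0, 0.0]], [1.0, [0.0, 1.0]]]
labels, pi, theta = train_sketch(samples, 1.0)
```

With a positive lambda, a feature that never occurs with a label still gets a small nonzero probability instead of a log of zero. Larger values pull the estimates toward a uniform distribution.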

# File 'lib/spark/mllib/classification/naive_bayes.rb', line 82

def self.train(rdd, lambda=1.0)
  # Validation
  first = rdd.first
  unless first.is_a?(LabeledPoint)
    raise Spark::MllibError, "RDD should contain LabeledPoint, got #{first.class}"
  end

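  # Train on the JVM side through the Java bridge. The call returns the
  # labels, the class priors (pi) and the per-class feature likelihoods (theta).
  labels, pi, theta = Spark.jb.call(RubyMLLibAPI.new, 'trainNaiveBayesModel', rdd, lambda)
  # Pack theta (one row per label, one column per feature) into a dense matrix.
  theta = Spark::Mllib::Matrices.dense(theta.size, theta.first.size, theta)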

  NaiveBayesModel.new(labels, pi, theta)
end
