Class: Spark::Mllib::NaiveBayes
- Inherits: Object
  - Object
  - Spark::Mllib::NaiveBayes
- Defined in: lib/spark/mllib/classification/naive_bayes.rb
Class Method Summary
- .train(rdd, lambda = 1.0) ⇒ Object
Trains a Naive Bayes model given an RDD of (label, features) pairs.
Class Method Details
.train(rdd, lambda = 1.0) ⇒ Object
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is multinomial Naive Bayes (tinyurl.com/lsdw6p), which can handle all kinds of discrete data. For example, converting documents into TF-IDF vectors lets it be used for document classification. Making every vector a 0-1 vector lets it also be used as Bernoulli NB (tinyurl.com/p7c96j6). Input feature values must be nonnegative.
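The two input constraints in this description (nonnegative values, and 0-1 vectors for the Bernoulli-style use) can be sketched in plain Ruby. The helper names `ensure_nonnegative!` and `binarize` are hypothetical and not part of ruby-spark; this is only one way the preparation might look before the vectors are wrapped in LabeledPoint objects.

```ruby
# Hypothetical helpers for illustration; they are not part of ruby-spark.

# Raises if any feature value is negative, since multinomial NB
# treats feature values as (possibly weighted) counts.
def ensure_nonnegative!(features)
  bad = features.find { |v| v < 0 }
  raise ArgumentError, "negative feature value: #{bad}" if bad
  features
end

# Maps each positive value to 1.0 and each zero to 0.0,
# giving the 0-1 vector used for the Bernoulli-style setup.
def binarize(features)
  ensure_nonnegative!(features).map { |v| v > 0 ? 1.0 : 0.0 }
end

counts = [3.0, 0.0, 1.0]   # e.g. term counts for one document
binary = binarize(counts)  # presence/absence of each term
```

Binarizing discards how often a term occurs and keeps only whether it occurs, which is the information a Bernoulli-style model works with.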
Arguments:
- rdd: RDD of LabeledPoint.
- lambda: The smoothing parameter (defaults to 1.0).
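To make the role of the smoothing parameter concrete, here is a minimal pure-Ruby sketch of multinomial Naive Bayes training with additive smoothing. It is not the ruby-spark implementation, which hands training off to Spark on the JVM. The function name `train_multinomial_nb`, the `smoothing` argument, and the plain `[label, features]` pairs are assumptions made for illustration, and the formulas are the textbook additive-smoothing ones.

```ruby
# Hypothetical sketch, not the ruby-spark implementation.
# samples:   array of [label, features] pairs, features nonnegative
# smoothing: plays the role of the lambda argument
def train_multinomial_nb(samples, smoothing = 1.0)
  smoothing = smoothing.to_f
  num_features = samples.first[1].size
  grouped = samples.group_by { |label, _features| label }
  labels = grouped.keys.sort

  # pi[i]: smoothed log prior of labels[i]
  prior_denom = samples.size + labels.size * smoothing
  pi = labels.map do |label|
    Math.log((grouped[label].size + smoothing) / prior_denom)
  end

  # theta[i][j]: smoothed log probability of feature j given labels[i]
  theta = labels.map do |label|
    sums = grouped[label].map { |_label, features| features }.transpose.map(&:sum)
    denom = sums.sum + num_features * smoothing
    sums.map { |s| Math.log((s + smoothing) / denom) }
  end

  [labels, pi, theta]
end

samples = [
  [0.0, [2.0, 0.0]],
  [0.0, [1.0, 1.0]],
  [1.0, [0.0, 3.0]]
]
labels, pi, theta = train_multinomial_nb(samples, 1.0)

# Feature 0 never occurs with label 1.0, yet smoothing keeps its
# probability above zero, so the logarithm stays finite.
p Math.exp(theta[1][0]).round(3)  # => 0.2
```

With `smoothing` set to 0, the same unseen feature would get probability 0 and a log value of negative infinity, which is the situation the parameter is meant to avoid.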
# File 'lib/spark/mllib/classification/naive_bayes.rb', line 82
def self.train(rdd, lambda=1.0)
  # Validation
  first = rdd.first
  unless first.is_a?(LabeledPoint)
    raise Spark::MllibError, "RDD should contains LabeledPoint, got #{first.class}"
  end

  labels, pi, theta = Spark.jb.call(RubyMLLibAPI.new, 'trainNaiveBayesModel', rdd, lambda)
  theta = Spark::Mllib::Matrices.dense(theta.size, theta.first.size, theta)

  NaiveBayesModel.new(labels, pi, theta)
end