Class: Rubydoop::JobDefinition

Inherits:
Object
Defined in:
lib/rubydoop/dsl.rb

Overview

Job configuration DSL.

`Rubydoop.configure` blocks are run within the context of an instance of this class. These are the methods available in those blocks.
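For example, a typical word-count configuration might look like this (the WordCount classes are illustrative application code; paths are passed in as arguments):

Rubydoop.configure do |input_path, output_path|
  job 'word_count' do
    input input_path
    output output_path
    mapper WordCount::Mapper
    reducer WordCount::Reducer
    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable  # companion setter to output_key
  end
end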

Instance Method Summary

Constructor Details

#initialize(context, job) ⇒ JobDefinition

Returns a new instance of JobDefinition.



# File 'lib/rubydoop/dsl.rb', line 82

def initialize(context, job)
  @context = context
  @job = job
end

Instance Method Details

#combiner(cls = nil) ⇒ Object Also known as: combiner=

Sets the combiner class.

The equivalent of calling `setCombinerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

A combiner should implement `reduce`, just like reducers.
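For example, a minimal sketch (the class name is hypothetical; it assumes the same reduce signature as the reducer example further down):

class SummingCombiner
  # Pre-sums counts on the map side to cut down the amount of data
  # shuffled to the reducers.
  def reduce(key, values, context)
    sum = 0
    values.each { |value| sum += value.get }
    context.write(key, Hadoop::Io::IntWritable.new(sum))
  end
end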

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) combiner class.

# File 'lib/rubydoop/dsl.rb', line 237

def combiner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::CombinerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_combiner_class(Rubydoop::CombinerProxy)
    @combiner = cls
  end
  @combiner
end

#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=

Sets a custom grouping comparator.

The equivalent of calling `setGroupingComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
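In a job block this might be used as follows (the comparator class is hypothetical):

# Group values by part of a composite key so that one reduce call
# sees all records sharing that part, e.g. in a secondary-sort setup.
grouping_comparator FirstFieldComparator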

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 278

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::GroupingComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_grouping_comparator_class(Rubydoop::GroupingComparatorProxy)
    @grouping_comparator = cls
  end
  @grouping_comparator
end

#input(paths, options = {}) ⇒ Object

Sets the input paths of the job.

Calls `setInputFormatClass` on the Hadoop job and uses the static `setInputPaths` on the input format to set the job’s input path.
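For example (paths are illustrative):

# The comma-separated string and Enumerable forms are equivalent.
input 'logs/2013-01-01,logs/2013-01-02'
input %w[logs/2013-01-01 logs/2013-01-02]

# A symbol format is camel-cased and resolved under
# Hadoop::Mapreduce::Lib::Input, so :sequence_file becomes
# SequenceFileInputFormat.
input 'data/events', format: :sequence_file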

Parameters:

  • paths (String, Enumerable)

The input paths, either a comma-separated string or an `Enumerable` of strings (which will be joined with a comma).

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

The input format to use, defaults to `TextInputFormat`

# File 'lib/rubydoop/dsl.rb', line 98

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  unless format <= Hadoop::Mapreduce::InputFormat
    @job.configuration.set(Rubydoop::InputFormatProxy::RUBY_CLASS_KEY, format.name)
    format = Rubydoop::InputFormatProxy
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end

#map_output_key(cls) ⇒ Object

Sets the mapper’s output key type.

Parameters:

  • cls (Class)

    The mapper’s output key type

# File 'lib/rubydoop/dsl.rb', line 342

class_setter :map_output_key
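These setters are typically given Hadoop writable classes. For example, to declare intermediate (map output) types that differ from the job’s final output types:

map_output_key Hadoop::Io::Text
map_output_value Hadoop::Io::LongWritable

The same pattern applies to output_key below.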

#map_output_value(cls) ⇒ Object

Sets the mapper’s output value type.

Parameters:

  • cls (Class)

The mapper’s output value type

# File 'lib/rubydoop/dsl.rb', line 351

class_setter :map_output_value

#mapper(cls = nil) ⇒ Object Also known as: mapper=

Sets the mapper class.

The equivalent of calling `setMapperClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method `map`, which will be called exactly like a Java mapper class’ `map` method would be called.

You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java mappers.
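For example, the mapper half of a word count (a sketch; the module name is illustrative):

module WordCount
  class Mapper
    # Called once per input record; emits each word with a count of
    # one, as Hadoop writables, via the Hadoop mapper context.
    def map(key, value, context)
      value.to_s.split.each do |word|
        context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
      end
    end
  end
end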

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) mapper class.

# File 'lib/rubydoop/dsl.rb', line 190

def mapper(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::MapperProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_mapper_class(Rubydoop::MapperProxy)
    @mapper = cls
  end
  @mapper
end

#output(dir = nil, options = {}) ⇒ Object

Sets or gets the output path of the job.

Calls `setOutputFormatClass` on the Hadoop job and uses the static `setOutputPath` on the output format to set the job’s output path.
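For example (directory names are illustrative):

# Simple form; uses TextOutputFormat by default.
output 'word_count_output'

# :format is resolved like input's, but under
# Hadoop::Mapreduce::Lib::Output.
output 'word_count_output', format: :sequence_file

# With intermediate: true a unique directory name is derived from
# the job name; the return value tells you what it ended up being.
output intermediate: true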

Parameters:

  • dir (String) (defaults to: nil)

    The output path

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

The output format to use, defaults to `TextOutputFormat`

# File 'lib/rubydoop/dsl.rb', line 123

def output(dir=nil, options={})
  if dir
    if dir.is_a?(Hash)
      options = dir
      if options[:intermediate]
        dir = @job.job_name
      else
        raise ArgumentError, 'neither dir nor intermediate: true was specified'
      end
    end
    dir = sprintf('%s-%010d-%05d', dir, Time.now.to_i, rand(1e5)) if options[:intermediate]
    @output_dir = dir
    format = options.fetch(:format, :text)
    unless format.is_a?(Class)
      class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
      format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
    end
    format.set_output_path(@job, Hadoop::Fs::Path.new(@output_dir))
    @job.set_output_format_class(format)
    if options[:lazy]
      Hadoop::Mapreduce::Lib::Output::LazyOutputFormat.set_output_format_class(@job, format)
    end
  end
  @output_dir
end

#output_key(cls) ⇒ Object

Sets the reducer’s output key type.

Parameters:

  • cls (Class)

    The reducer’s output key type

# File 'lib/rubydoop/dsl.rb', line 360

class_setter :output_key

#partitioner(cls = nil) ⇒ Object Also known as: partitioner=

Sets a custom partitioner.

The equivalent of calling `setPartitionerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class must implement `partition`, which will be called exactly like a Java partitioner would.
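A sketch (the class name is hypothetical; it assumes the same three arguments as Java’s getPartition):

class KeyHashPartitioner
  # Routes records by a hash of the key, so equal keys always land in
  # the same partition. Must return an integer in 0...num_partitions.
  def partition(key, value, num_partitions)
    key.to_s.hash % num_partitions
  end
end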

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) partitioner class.

# File 'lib/rubydoop/dsl.rb', line 259

def partitioner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::PartitionerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_partitioner_class(Rubydoop::PartitionerProxy)
    @partitioner = cls
  end
  @partitioner
end

#raw {|job| ... } ⇒ Object

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the `Job`, letting you do whatever you want with it.
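For example, to tweak settings the DSL doesn’t expose directly (the setters shown are standard Hadoop Job methods, reached through JRuby’s snake_case conversion):

raw do |job|
  job.set_num_reduce_tasks(10)
  job.set_speculative_execution(false)
end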

Yield Parameters:

  • job (Hadoop::Mapreduce::Job)

    The raw Hadoop Job instance

# File 'lib/rubydoop/dsl.rb', line 314

def raw(&block)
  yield @job
end

#reducer(cls = nil) ⇒ Object Also known as: reducer=

Sets the reducer class.

The equivalent of calling `setReducerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method `reduce`, which will be called exactly like a Java reducer class’ `reduce` method would be called.

You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java reducers.
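For example, the reducer half of a word count, pairing the mapper example above (a sketch):

module WordCount
  class Reducer
    # Called once per key with all of that key's values; sums the
    # IntWritable counts and writes the total.
    def reduce(key, values, context)
      sum = 0
      values.each { |value| sum += value.get }
      context.write(key, Hadoop::Io::IntWritable.new(sum))
    end
  end
end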

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) reducer class.

# File 'lib/rubydoop/dsl.rb', line 216

def reducer(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::ReducerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_reducer_class(Rubydoop::ReducerProxy)
    @reducer = cls
  end
  @reducer
end

#set(property, value) ⇒ Object

Sets a job property.

Calls `set`/`setBoolean`/`setLong`/`setFloat` on the Hadoop Job’s configuration (exact method depends on the type of the value).
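For example (the property names are standard Hadoop configuration keys):

set 'mapreduce.task.timeout', 1200000   # Integer -> setLong
set 'mapreduce.map.speculative', false  # Boolean -> setBoolean
set 'mapreduce.job.queuename', 'etl'    # String  -> set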

Parameters:

  • property (String)

    The property name

  • value (String, Numeric, Boolean)

    The property value

# File 'lib/rubydoop/dsl.rb', line 161

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end

#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=

Sets a custom sort comparator.

The equivalent of calling `setSortComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
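In a job block this might be used as follows (the comparator class is hypothetical):

# Order composite keys so that values reach the reducer sorted by a
# secondary field; combined with a grouping comparator this gives the
# classic secondary-sort pattern.
sort_comparator SecondaryFieldComparator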

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 297

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::SortComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_sort_comparator_class(Rubydoop::SortComparatorProxy)
    @sort_comparator = cls
  end
  @sort_comparator
end