Class: Rubydoop::JobDefinition

Inherits: Object

Defined in: lib/rubydoop/dsl.rb

Overview

Job configuration DSL.

Rubydoop.configure blocks are run within the context of an instance of this class. These are the methods available in those blocks.
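Running a block "within the context of an instance" is the usual Ruby DSL technique of evaluating the block with self rebound, for example with instance_exec. The following is a minimal sketch of that mechanism only. SketchDefinition, ExampleMapper and configure_sketch are hypothetical stand-ins so the example runs without Hadoop; they are not Rubydoop's implementation.

```ruby
# Hypothetical stand-in for JobDefinition: it only records DSL calls.
class SketchDefinition
  attr_reader :calls

  def initialize
    @calls = []
  end

  def input(paths, options = {})
    @calls << [:input, paths, options]
  end

  def output(dir, options = {})
    @calls << [:output, dir, options]
  end

  def mapper(cls = nil)
    @calls << [:mapper, cls]
  end
end

# Hypothetical mapper class, present only so the block has something to pass.
class ExampleMapper; end

# Run the block with self bound to the definition, so bare method calls
# inside the block resolve to the definition's instance methods.
def configure_sketch(&block)
  definition = SketchDefinition.new
  definition.instance_exec(&block)
  definition
end

definition = configure_sketch do
  input 'data/in'
  output 'data/out'
  mapper ExampleMapper
end

definition.calls.each { |call| p call }
```

Because self is the definition inside the block, the methods documented on this page can be called without an explicit receiver.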

Instance Method Summary

#combiner(cls = nil) ⇒ Object (also: #combiner=)

Sets the combiner class.
#grouping_comparator(cls = nil) ⇒ Object (also: #grouping_comparator=)

Sets a custom grouping comparator.
#initialize(context, job) ⇒ JobDefinition constructor

A new instance of JobDefinition.
#input(paths, options = {}) ⇒ Object

Sets the input paths of the job.
#map_output_key(cls) ⇒ Object

Sets the mapper’s output key type.
#map_output_value(cls) ⇒ Object

Sets the mapper’s output value type.
#mapper(cls = nil) ⇒ Object (also: #mapper=)

Sets the mapper class.
#output(dir, options = {}) ⇒ Object

Sets the output path of the job.
#output_key(cls) ⇒ Object

Sets the reducer’s output key type.
#partitioner(cls = nil) ⇒ Object (also: #partitioner=)

Sets a custom partitioner.
#raw {|job| ... } ⇒ Object

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you.
#reducer(cls = nil) ⇒ Object (also: #reducer=)

Sets the reducer class.
#set(property, value) ⇒ Object

Sets a job property.
#sort_comparator(cls = nil) ⇒ Object (also: #sort_comparator=)

Sets a custom sort comparator.

Constructor Details

#initialize(context, job) ⇒ `JobDefinition`

Returns a new instance of JobDefinition.

# File 'lib/rubydoop/dsl.rb', line 74

def initialize(context, job)
  @context = context
  @job = job
end

Instance Method Details

#combiner(cls = nil) ⇒ `Object` Also known as: combiner=

Sets the combiner class.

The equivalent of calling setCombinerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

A combiner should implement reduce, just like reducers.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) combiner class.

See Also:

Hadoop's Job#setCombinerClass

# File 'lib/rubydoop/dsl.rb', line 209

def combiner(cls=nil)
  if cls
    @job.configuration.set(COMBINER_KEY, cls.name)
    @job.set_combiner_class(@context.proxy_class(:combiner))
    @combiner = cls
  end
  @combiner
end
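The listing above doubles as a getter: called without an argument it returns the class set earlier. The sketch below exercises the same method body against stand-in objects to show what reaches the job. StubConfiguration, CombinerStubJob, StubContext, SumCombiner and the key name 'sketch.combiner' are all assumptions made for illustration, not Rubydoop or Hadoop names.

```ruby
# Stand-ins for the Hadoop job and the Rubydoop context; they only record calls.
class StubConfiguration
  attr_reader :values

  def initialize
    @values = {}
  end

  def set(key, value)
    @values[key] = value
  end
end

class CombinerStubJob
  attr_reader :configuration, :combiner_class

  def initialize
    @configuration = StubConfiguration.new
  end

  def set_combiner_class(cls)
    @combiner_class = cls
  end
end

class StubContext
  # The real context returns a Java proxy class; a symbol stands in for it here.
  def proxy_class(type)
    :"#{type}_proxy"
  end
end

class CombinerSketch
  COMBINER_KEY = 'sketch.combiner' # assumed key name, for illustration only

  def initialize(context, job)
    @context = context
    @job = job
  end

  # Same body as the method listed above.
  def combiner(cls = nil)
    if cls
      @job.configuration.set(COMBINER_KEY, cls.name)
      @job.set_combiner_class(@context.proxy_class(:combiner))
      @combiner = cls
    end
    @combiner
  end
end

class SumCombiner; end # hypothetical combiner class

job = CombinerStubJob.new
sketch = CombinerSketch.new(StubContext.new, job)

before = sketch.combiner      # nothing set yet
sketch.combiner(SumCombiner)  # setter form
after = sketch.combiner       # getter form

p before
p after
p job.configuration.values
p job.combiner_class
```

The Ruby class name is stored as a configuration string, while the job itself only ever receives the proxy class.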

#grouping_comparator(cls = nil) ⇒ `Object` Also known as: grouping_comparator=

Sets a custom grouping comparator.

The equivalent of calling setGroupingComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) comparator class.

See Also:

Hadoop's Job#setGroupingComparatorClass

# File 'lib/rubydoop/dsl.rb', line 250

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(GROUPING_COMPARATOR_KEY, cls.name)
    @job.set_grouping_comparator_class(@context.proxy_class(:grouping_comparator))
    @grouping_comparator = cls
  end
  @grouping_comparator
end

#input(paths, options = {}) ⇒ `Object`

Sets the input paths of the job.

Calls setInputFormatClass on the Hadoop job and uses the static setInputPaths on the input format to set the job’s input path.

Parameters:

paths (String, Enumerable) —

The input paths, either a comma-separated string or an Enumerable of strings (which will be joined with commas).
options (Hash) (defaults to: {})

Options Hash (options):

:format (JavaClass) —

The input format to use, either a Java class or a symbol such as :text (expanded to TextInputFormat). Defaults to TextInputFormat.

See Also:

Hadoop's Job#setInputFormatClass

# File 'lib/rubydoop/dsl.rb', line 90

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end
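The two pure-Ruby steps in the listing above (path joining and the symbol-to-class-name conversion) can be tried without Hadoop. This sketch copies those expressions into two helper methods; the helper names input_format_class_name and normalize_paths are hypothetical and exist only for this example.

```ruby
# Mirrors the symbol-to-class-name step in #input (no Hadoop required).
def input_format_class_name(format)
  format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + 'InputFormat'
end

# Mirrors the path normalisation step: Enumerables are joined with commas,
# anything else (such as a String) is passed through unchanged.
def normalize_paths(paths)
  paths.is_a?(Enumerable) ? paths.join(',') : paths
end

puts input_format_class_name(:text)           # TextInputFormat
puts input_format_class_name(:sequence_file)  # SequenceFileInputFormat
puts normalize_paths(%w[logs/a logs/b])       # logs/a,logs/b
puts normalize_paths('logs/a,logs/b')         # logs/a,logs/b
```

In #input the resulting name is then looked up in Hadoop::Mapreduce::Lib::Input, so a symbol only works if a class of that name exists there.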

#map_output_key(cls) ⇒ `Object`

Sets the mapper’s output key type.

Parameters:

cls (Class) —

The mapper’s output key type

See Also:

Hadoop's Job#setMapOutputKeyClass

# File 'lib/rubydoop/dsl.rb', line 314

class_setter :map_output_key
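The class_setter macro itself is not shown on this page. One plausible shape, assuming it defines a DSL method that forwards to the job's matching set_..._class method, is sketched below. ClassSetterSketch and RecordingJob are hypothetical, and the macro body is an assumption rather than Rubydoop's actual source.

```ruby
class ClassSetterSketch
  # Assumed shape of the macro: define a DSL method named dsl_name that
  # forwards its argument to the job's set_<dsl_name>_class method.
  def self.class_setter(dsl_name)
    define_method(dsl_name) do |cls|
      @job.send("set_#{dsl_name}_class", cls)
    end
  end

  def initialize(job)
    @job = job
  end

  class_setter :map_output_key
  class_setter :output_key
end

# Stand-in job that records which setter was called.
class RecordingJob
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set_map_output_key_class(cls)
    @calls << [:set_map_output_key_class, cls]
  end

  def set_output_key_class(cls)
    @calls << [:set_output_key_class, cls]
  end
end

job = RecordingJob.new
dsl = ClassSetterSketch.new(job)
dsl.map_output_key(String)
dsl.output_key(Integer)
p job.calls
```

String and Integer are placeholders here; with a real job the arguments would be Hadoop writable classes.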

#map_output_value(cls) ⇒ `Object`

Sets the mapper’s output value type.

Parameters:

cls (Class) —

The mapper’s output value type

See Also:

Hadoop's Job#setMapOutputValueClass

# File 'lib/rubydoop/dsl.rb', line 323

class_setter :map_output_value

#mapper(cls = nil) ⇒ `Object` Also known as: mapper=

Sets the mapper class.

The equivalent of calling setMapperClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method map, which will be called exactly like a Java mapper class’ map method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java mappers.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) mapper class.

See Also:

Hadoop's Job#setMapperClass

# File 'lib/rubydoop/dsl.rb', line 162

def mapper(cls=nil)
  if cls
    @job.configuration.set(MAPPER_KEY, cls.name)
    @job.set_mapper_class(@context.proxy_class(:mapper))
    @mapper = cls
  end
  @mapper
end
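As an illustration of the interface described above, here is a sketch of a Ruby mapper with map and an optional setup. WordSplitMapper and CollectingContext are hypothetical, and the context is a plain-Ruby stand-in. In a real job the keys and values would be Hadoop writables and the context would come from Hadoop, so the plain strings and integers below are an assumption made to keep the example runnable.

```ruby
# Hypothetical mapper: emits (word, 1) for every word in a line.
class WordSplitMapper
  # Optional hook, called once before any input is mapped.
  def setup(context)
    @separator = /\s+/
  end

  # Called once per input record, like a Java mapper's map method.
  def map(key, value, context)
    value.to_s.split(@separator).each do |word|
      context.write(word.downcase, 1)
    end
  end
end

# Stand-in for the Hadoop context: collects whatever the mapper writes.
class CollectingContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def write(key, value)
    @pairs << [key, value]
  end
end

context = CollectingContext.new
mapper = WordSplitMapper.new
mapper.setup(context)
mapper.map(0, 'Hello hello world', context)
p context.pairs
```

A class like this would be registered in the DSL with mapper WordSplitMapper.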

#output(dir, options = {}) ⇒ `Object`

Sets the output path of the job.

Calls setOutputFormatClass on the Hadoop job and uses the static setOutputPath on the output format to set the job’s output path.

Parameters:

dir (String) —

The output path
options (Hash) (defaults to: {})

Options Hash (options):

:format (JavaClass) —

The output format to use, either a Java class or a symbol such as :text (expanded to TextOutputFormat). Defaults to TextOutputFormat.

See Also:

Hadoop's Job#setOutputFormatClass

# File 'lib/rubydoop/dsl.rb', line 111

def output(dir, options={})
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
    format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
  end
  format.set_output_path(@job, Hadoop::Fs::Path.new(dir))
  @job.set_output_format_class(format)
end

#output_key(cls) ⇒ `Object`

Sets the reducer’s output key type.

Parameters:

cls (Class) —

The reducer’s output key type

See Also:

Hadoop's Job#setOutputKeyClass

# File 'lib/rubydoop/dsl.rb', line 332

class_setter :output_key

#partitioner(cls = nil) ⇒ `Object` Also known as: partitioner=

Sets a custom partitioner.

The equivalent of calling setPartitionerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class must implement partition, which will be called exactly like a Java partitioner would.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) partitioner class.

See Also:

Hadoop's Job#setPartitionerClass

# File 'lib/rubydoop/dsl.rb', line 231

def partitioner(cls=nil)
  if cls
    @job.configuration.set(PARTITIONER_KEY, cls.name)
    @job.set_partitioner_class(@context.proxy_class(:partitioner))
    @partitioner = cls
  end
  @partitioner
end
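A partitioner class might look like the sketch below, assuming partition receives the same arguments as Java's Partitioner#getPartition (key, value, number of partitions) and returns a partition index. ByteSumPartitioner is a hypothetical example class, and plain strings stand in for Hadoop writables.

```ruby
# Hypothetical partitioner: assigns a key to a partition by summing its bytes.
class ByteSumPartitioner
  def partition(key, value, num_partitions)
    # A byte sum is stable across processes, unlike String#hash,
    # so the same key always lands in the same partition.
    key.to_s.bytes.inject(0, :+) % num_partitions
  end
end

partitioner = ByteSumPartitioner.new
%w[apple banana cherry].each do |key|
  puts "#{key} -> #{partitioner.partition(key, nil, 4)}"
end
```

The returned index must lie between 0 and num_partitions - 1, which the modulo guarantees here.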

#raw {|job| ... } ⇒ `Object`

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the Job, letting you do whatever you want with it.

Yield Parameters:

job (Hadoop::Mapreduce::Job) —

The raw Hadoop Job instance

See Also:

Hadoop's Job



# File 'lib/rubydoop/dsl.rb', line 286

def raw(&block)
  yield @job
end
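A short sketch of how the yielded job might be used. RawSketch and RawStubJob are hypothetical stand-ins so the example runs without Hadoop. set_num_reduce_tasks is the JRuby-style name for Hadoop's Job#setNumReduceTasks, implemented here on the stub only.

```ruby
class RawSketch
  def initialize(job)
    @job = job
  end

  # Same body as the method listed above.
  def raw(&block)
    yield @job
  end
end

# Stand-in for Hadoop::Mapreduce::Job; only the one setter used below exists.
class RawStubJob
  attr_reader :num_reduce_tasks

  def set_num_reduce_tasks(count)
    @num_reduce_tasks = count
  end
end

job = RawStubJob.new
RawSketch.new(job).raw do |raw_job|
  raw_job.set_num_reduce_tasks(4)
end
p job.num_reduce_tasks
```

Anything done inside the block bypasses the DSL, so settings made here are applied directly to the job.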

#reducer(cls = nil) ⇒ `Object` Also known as: reducer=

Sets the reducer class.

The equivalent of calling setReducerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method reduce, which will be called exactly like a Java reducer class’ reduce method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java reducers.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) reducer class.

See Also:

Hadoop's Job#setReducerClass

# File 'lib/rubydoop/dsl.rb', line 188

def reducer(cls=nil)
  if cls
    @job.configuration.set(REDUCER_KEY, cls.name)
    @job.set_reducer_class(@context.proxy_class(:reducer))
    @reducer = cls
  end
  @reducer
end
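A reducer following the interface described above could be sketched as below. SumReducer and ReducerCollectingContext are hypothetical. Plain Ruby values stand in for Hadoop writables and the context is a stand-in, both assumptions made to keep the example runnable outside Hadoop.

```ruby
# Hypothetical reducer: sums all values seen for a key.
class SumReducer
  # Called once per key with every value grouped under it.
  def reduce(key, values, context)
    total = 0
    values.each { |value| total += value }
    context.write(key, total)
  end
end

# Stand-in for the Hadoop context: collects whatever the reducer writes.
class ReducerCollectingContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def write(key, value)
    @pairs << [key, value]
  end
end

context = ReducerCollectingContext.new
SumReducer.new.reduce('hello', [1, 1, 1], context)
p context.pairs
```

Since a combiner implements the same reduce method, a class of this shape could in principle serve in both roles when the operation is associative, as summing is.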

#set(property, value) ⇒ `Object`

Sets a job property.

Calls set/setBoolean/setLong/setFloat on the Hadoop Job’s configuration (exact method depends on the type of the value).

Parameters:

property (String) —

The property name
value (String, Numeric, Boolean) —

The property value

See Also:

Hadoop's Configuration#set

# File 'lib/rubydoop/dsl.rb', line 133

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end
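The type dispatch in the listing above can be observed with a recording stand-in for the configuration. RecordingConfiguration, set_property and the example.* property names are hypothetical; only the case expression mirrors the listing.

```ruby
# Stand-in for the Hadoop Configuration; it records which setter was used.
class RecordingConfiguration
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set(property, value)
    @calls << [:set, property, value]
  end

  def set_long(property, value)
    @calls << [:set_long, property, value]
  end

  def set_float(property, value)
    @calls << [:set_float, property, value]
  end

  def set_boolean(property, value)
    @calls << [:set_boolean, property, value]
  end
end

# Same dispatch as the method listed above, run against the stand-in.
def set_property(configuration, property, value)
  case value
  when Integer then configuration.set_long(property, value)
  when Float then configuration.set_float(property, value)
  when true, false then configuration.set_boolean(property, value)
  else configuration.set(property, value)
  end
end

configuration = RecordingConfiguration.new
set_property(configuration, 'example.retries', 3)
set_property(configuration, 'example.ratio', 0.5)
set_property(configuration, 'example.enabled', true)
set_property(configuration, 'example.name', 'demo')
configuration.calls.each { |call| p call }
```

Any value that is not an Integer, a Float or a boolean falls through to the plain set call, so it should already be a String.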

#sort_comparator(cls = nil) ⇒ `Object` Also known as: sort_comparator=

Sets a custom sort comparator.

The equivalent of calling setSortComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

cls (Class) (defaults to: nil) —

The (Ruby) comparator class.

See Also:

Hadoop's Job#setSortComparatorClass

# File 'lib/rubydoop/dsl.rb', line 269

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(SORT_COMPARATOR_KEY, cls.name)
    @job.set_sort_comparator_class(@context.proxy_class(:sort_comparator))
    @sort_comparator = cls
  end
  @sort_comparator
end
