Class: Rubydoop::JobDefinition

Inherits: Object

Defined in: lib/rubydoop/dsl.rb

Overview

Job configuration DSL.

Rubydoop.configure blocks are run within the context of an instance of this class. These are the methods available in those blocks.
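Inside such a block, a word-count style job might be configured like this (a sketch following the word count example from the Rubydoop README; the class names and paths are illustrative, and `output_value` is assumed to mirror `output_key`):

```ruby
# Illustrative configuration calls as they would appear inside a
# Rubydoop job definition block. WordCountMapper and WordCountReducer
# are hypothetical user-defined classes.
input 'data/input'
output 'data/word_count'

mapper WordCountMapper
reducer WordCountReducer

map_output_key Hadoop::Io::Text
map_output_value Hadoop::Io::IntWritable
output_key Hadoop::Io::Text
output_value Hadoop::Io::IntWritable
```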

Instance Method Summary

Constructor Details

#initialize(context, job) ⇒ JobDefinition

Returns a new instance of JobDefinition.



# File 'lib/rubydoop/dsl.rb', line 82

def initialize(context, job)
  @context = context
  @job = job
end

Instance Method Details

#combiner(cls = nil) ⇒ Object Also known as: combiner=

Sets the combiner class.

The equivalent of calling setCombinerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

A combiner should implement reduce, just like reducers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) combiner class.

# File 'lib/rubydoop/dsl.rb', line 237

def combiner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::CombinerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_combiner_class(Rubydoop::CombinerProxy)
    @combiner = cls
  end
  @combiner
end
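A minimal sketch of such a combiner class, assuming only what the docs state (it implements reduce like a reducer). In a real job the keys and values are Hadoop writables and the context is the Hadoop task context; plain Ruby stand-ins are used here so the class can be exercised without Hadoop:

```ruby
# A sketch of a combiner: like a reducer, it only needs #reduce.
class SumCombiner
  def reduce(key, values, context)
    # Emit one (key, partial sum) pair per key.
    context.write(key, values.sum)
  end
end
```

It would be registered with `combiner SumCombiner` inside the job definition block.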

#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=

Sets a custom grouping comparator.

The equivalent of calling setGroupingComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 278

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::GroupingComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_grouping_comparator_class(Rubydoop::GroupingComparatorProxy)
    @grouping_comparator = cls
  end
  @grouping_comparator
end

#input(paths, options = {}) ⇒ Object

Sets the input paths of the job.

Calls setInputFormatClass on the Hadoop job and uses the static setInputPaths on the input format to set the job’s input path.

Parameters:

  • paths (String, Enumerable)

    The input paths: either a comma-separated string or an Enumerable of strings (which will be joined with commas).

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

    The input format to use, defaults to TextInputFormat

# File 'lib/rubydoop/dsl.rb', line 98

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  unless format <= Hadoop::Mapreduce::InputFormat
    @job.configuration.set(Rubydoop::InputFormatProxy::RUBY_CLASS_KEY, format.name)
    format = Rubydoop::InputFormatProxy
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end
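The gsub above expands a `:format` symbol into a Hadoop input format class name. That conversion can be mirrored in plain Ruby (the helper name is ours, for illustration only; it is not part of the API):

```ruby
# Mirrors the symbol-to-class-name expansion used by #input:
# :text -> "TextInputFormat", :sequence_file -> "SequenceFileInputFormat".
def input_format_class_name(format)
  format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + 'InputFormat'
end
```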

#map_output_key(cls) ⇒ Object

Sets the mapper’s output key type.

Parameters:

  • cls (Class)

    The mapper’s output key type

# File 'lib/rubydoop/dsl.rb', line 342

class_setter :map_output_key

#map_output_value(cls) ⇒ Object

Sets the mapper’s output value type.

Parameters:

  • cls (Class)

    The mapper’s output value type

# File 'lib/rubydoop/dsl.rb', line 351

class_setter :map_output_value

#mapper(cls = nil) ⇒ Object Also known as: mapper=

Sets the mapper class.

The equivalent of calling setMapperClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method map, which will be called exactly like a Java mapper class’ map method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java mappers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) mapper class.

# File 'lib/rubydoop/dsl.rb', line 190

def mapper(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::MapperProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_mapper_class(Rubydoop::MapperProxy)
    @mapper = cls
  end
  @mapper
end
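A minimal sketch of a mapper class as described above: only #map is required. In a real job the key and value are Hadoop writables; plain Ruby objects stand in here so the class can be exercised without Hadoop:

```ruby
# A sketch of a mapper: splits each input line into words and emits
# a (word, 1) pair per word, word-count style.
class TokenMapper
  def map(key, value, context)
    value.to_s.split.each { |word| context.write(word, 1) }
  end
end
```

It would be registered with `mapper TokenMapper` inside the job definition block.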

#output(dir = nil, options = {}) ⇒ Object

Sets or gets the output path of the job.

Calls setOutputFormatClass on the Hadoop job and uses the static setOutputPath on the output format to set the job’s output path.

Parameters:

  • dir (String) (defaults to: nil)

    The output path

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

    The output format to use, defaults to TextOutputFormat

# File 'lib/rubydoop/dsl.rb', line 123

def output(dir=nil, options={})
  if dir
    if dir.is_a?(Hash)
      options = dir
      if options[:intermediate]
        dir = @job.job_name
      else
        raise ArgumentError, 'neither dir nor intermediate: true was specified'
      end
    end
    dir = sprintf('%s-%010d-%05d', dir, Time.now.to_i, rand(1e5)) if options[:intermediate]
    @output_dir = dir
    format = options.fetch(:format, :text)
    unless format.is_a?(Class)
      class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
      format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
    end
    format.set_output_path(@job, Hadoop::Fs::Path.new(@output_dir))
    @job.set_output_format_class(format)
    if options[:lazy]
      Hadoop::Mapreduce::Lib::Output::LazyOutputFormat.set_output_format_class(@job, format)
    end
  end
  @output_dir
end
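With the :intermediate option, the directory name is suffixed with a zero-padded timestamp and random component. The format string behaves like this (with fixed stand-ins for the timestamp and random parts so the result is deterministic):

```ruby
# The '%s-%010d-%05d' naming pattern used for intermediate output
# directories, with placeholder values instead of the time and rand.
name = sprintf('%s-%010d-%05d', 'my_job', 1_400_000_000, 42)
```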

#output_key(cls) ⇒ Object

Sets the reducer’s output key type.

Parameters:

  • cls (Class)

    The reducer’s output key type

# File 'lib/rubydoop/dsl.rb', line 360

class_setter :output_key

#partitioner(cls = nil) ⇒ Object Also known as: partitioner=

Sets a custom partitioner.

The equivalent of calling setPartitionerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class must implement partition, which will be called exactly like a Java partitioner would.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) partitioner class.

# File 'lib/rubydoop/dsl.rb', line 259

def partitioner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::PartitionerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_partitioner_class(Rubydoop::PartitionerProxy)
    @partitioner = cls
  end
  @partitioner
end
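A minimal sketch of such a partitioner: #partition is called like a Java partitioner's getPartition and must return an integer in 0...num_partitions. The partitioning scheme here (first byte of the key) is purely illustrative:

```ruby
# A sketch of a partitioner: routes keys to partitions by their
# first byte, so keys with the same initial letter land together.
class FirstLetterPartitioner
  def partition(key, value, num_partitions)
    key.to_s.bytes.first.to_i % num_partitions
  end
end
```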

#raw {|job| ... } ⇒ Object

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the Job, letting you do whatever you want with it.

Yield Parameters:

  • job (Hadoop::Mapreduce::Job)

    The raw Hadoop Job instance

# File 'lib/rubydoop/dsl.rb', line 314

def raw(&block)
  yield @job
end
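For example, inside a job definition block you could set the number of reducers, which this DSL doesn't wrap (the setting of ten is just an illustration; in JRuby, Hadoop's setNumReduceTasks is also callable as set_num_reduce_tasks):

```ruby
raw do |job|
  job.set_num_reduce_tasks(10)
end
```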

#reducer(cls = nil) ⇒ Object Also known as: reducer=

Sets the reducer class.

The equivalent of calling setReducerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method reduce, which will be called exactly like a Java reducer class’ reduce method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java reducers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) reducer class.

# File 'lib/rubydoop/dsl.rb', line 216

def reducer(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::ReducerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_reducer_class(Rubydoop::ReducerProxy)
    @reducer = cls
  end
  @reducer
end
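A minimal sketch of a reducer class as described above: only #reduce is required, with #setup and #cleanup optional. In a real job the values are Hadoop writables; plain Ruby integers stand in here so the class can be exercised without Hadoop:

```ruby
# A sketch of a reducer: sums the values seen for each key.
class CountReducer
  def setup(context)
    # Optional: called once before any reduce calls.
  end

  def reduce(key, values, context)
    total = values.reduce(0) { |sum, v| sum + v }
    context.write(key, total)
  end
end
```

It would be registered with `reducer CountReducer` inside the job definition block.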

#set(property, value) ⇒ Object

Sets a job property.

Calls set/setBoolean/setLong/setFloat on the Hadoop Job’s configuration (exact method depends on the type of the value).

Parameters:

  • property (String)

    The property name

  • value (String, Numeric, Boolean)

    The property value

# File 'lib/rubydoop/dsl.rb', line 161

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end
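The type dispatch above can be mirrored in plain Ruby to show which Configuration setter each value type selects (the helper name is ours, for illustration only):

```ruby
# Mirrors #set's case statement: maps a Ruby value to the name of the
# Hadoop Configuration setter that #set would call for it.
def configuration_setter_for(value)
  case value
  when Integer     then :set_long
  when Float       then :set_float
  when true, false then :set_boolean
  else                  :set
  end
end
```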

#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=

Sets a custom sort comparator.

The equivalent of calling setSortComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 297

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::SortComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_sort_comparator_class(Rubydoop::SortComparatorProxy)
    @sort_comparator = cls
  end
  @sort_comparator
end