Class: Rubydoop::JobDefinition

Inherits:
Object
Defined in:
lib/rubydoop/dsl.rb

Overview

Job configuration DSL.

`Rubydoop.configure` blocks are run within the context of an instance of this class. These are the methods available in those blocks.
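For example, a typical word-count configuration might look like this (the WordCount classes are illustrative application code; paths are passed in as arguments):

Rubydoop.configure do |input_path, output_path|
  job 'word_count' do
    input input_path
    output output_path
    mapper WordCount::Mapper
    reducer WordCount::Reducer
    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable  # companion setter to output_key
  end
end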

Instance Method Summary

Constructor Details

#initialize(context, job) ⇒ JobDefinition

Returns a new instance of JobDefinition.



# File 'lib/rubydoop/dsl.rb', line 82

def initialize(context, job)
  @context = context
  @job = job
end

Instance Method Details

#combiner(cls = nil) ⇒ Object Also known as: combiner=

Sets the combiner class.

The equivalent of calling `setCombinerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

A combiner should implement `reduce`, just like reducers.
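For example, a minimal sketch (the class name is hypothetical; it assumes the same reduce signature as the reducer example further down):

class SummingCombiner
  # Pre-sums counts on the map side to cut down the amount of data
  # shuffled to the reducers.
  def reduce(key, values, context)
    sum = 0
    values.each { |value| sum += value.get }
    context.write(key, Hadoop::Io::IntWritable.new(sum))
  end
end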

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) combiner class.

# File 'lib/rubydoop/dsl.rb', line 237

def combiner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::CombinerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_combiner_class(Rubydoop::CombinerProxy)
    @combiner = cls
  end
  @combiner
end

#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=

Sets a custom grouping comparator.

The equivalent of calling `setGroupingComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
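In a job block this might be used as follows (the comparator class is hypothetical):

# Group values by part of a composite key so that one reduce call
# sees all records sharing that part, e.g. in a secondary-sort setup.
grouping_comparator FirstFieldComparator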

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 278

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::GroupingComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_grouping_comparator_class(Rubydoop::GroupingComparatorProxy)
    @grouping_comparator = cls
  end
  @grouping_comparator
end

#input(paths, options = {}) ⇒ Object

Sets the input paths of the job.

Calls `setInputFormatClass` on the Hadoop job and uses the static `setInputPaths` on the input format to set the job’s input path.
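For example (paths are illustrative):

# The comma-separated string and Enumerable forms are equivalent.
input 'logs/2013-01-01,logs/2013-01-02'
input %w[logs/2013-01-01 logs/2013-01-02]

# A symbol format is camel-cased and resolved under
# Hadoop::Mapreduce::Lib::Input, so :sequence_file becomes
# SequenceFileInputFormat.
input 'data/events', format: :sequence_file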

Parameters:

  • paths (String, Enumerable)

The input paths, either a comma-separated string or an `Enumerable` of strings (which will be joined with a comma).

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

The input format to use, defaults to `TextInputFormat`

# File 'lib/rubydoop/dsl.rb', line 98

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  unless format <= Hadoop::Mapreduce::InputFormat
    @job.configuration.set(Rubydoop::InputFormatProxy::RUBY_CLASS_KEY, format.name)
    format = Rubydoop::InputFormatProxy
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end

#map_output_key(cls) ⇒ Object

Sets the mapper’s output key type.

Parameters:

  • cls (Class)

    The mapper’s output key type

# File 'lib/rubydoop/dsl.rb', line 342

class_setter :map_output_key
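These setters are typically given Hadoop writable classes. For example, to declare intermediate (map output) types that differ from the job’s final output types:

map_output_key Hadoop::Io::Text
map_output_value Hadoop::Io::LongWritable

The same pattern applies to output_key below.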

#map_output_value(cls) ⇒ Object

Sets the mapper’s output value type.

Parameters:

  • cls (Class)

The mapper’s output value type

# File 'lib/rubydoop/dsl.rb', line 351

class_setter :map_output_value

#mapper(cls = nil) ⇒ Object Also known as: mapper=

Sets the mapper class.

The equivalent of calling `setMapperClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method `map`, which will be called exactly like a Java mapper class’ `map` method would be called.

You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java mappers.
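For example, the mapper half of a word count (a sketch; the module name is illustrative):

module WordCount
  class Mapper
    # Called once per input record; emits each word with a count of
    # one, as Hadoop writables, via the Hadoop mapper context.
    def map(key, value, context)
      value.to_s.split.each do |word|
        context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
      end
    end
  end
end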

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) mapper class.

# File 'lib/rubydoop/dsl.rb', line 190

def mapper(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::MapperProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_mapper_class(Rubydoop::MapperProxy)
    @mapper = cls
  end
  @mapper
end

#output(dir = nil, options = {}) ⇒ Object

Sets or gets the output path of the job.

Calls `setOutputFormatClass` on the Hadoop job and uses the static `setOutputPath` on the output format to set the job’s output path.
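For example (directory names are illustrative):

# Simple form; uses TextOutputFormat by default.
output 'word_count_output'

# :format is resolved like input's, but under
# Hadoop::Mapreduce::Lib::Output.
output 'word_count_output', format: :sequence_file

# With intermediate: true a unique directory name is derived from
# the job name; the return value tells you what it ended up being.
output intermediate: true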

Parameters:

  • dir (String) (defaults to: nil)

    The output path

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

The output format to use, defaults to `TextOutputFormat`

# File 'lib/rubydoop/dsl.rb', line 123

def output(dir=nil, options={})
  if dir
    if dir.is_a?(Hash)
      options = dir
      if options[:intermediate]
        dir = @job.job_name
      else
        raise ArgumentError, 'neither dir nor intermediate: true was specified'
      end
    end
    dir = sprintf('%s-%010d-%05d', dir, Time.now.to_i, rand(1e5)) if options[:intermediate]
    @output_dir = dir
    format = options.fetch(:format, :text)
    unless format.is_a?(Class)
      class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
      format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
    end
    format.set_output_path(@job, Hadoop::Fs::Path.new(@output_dir))
    @job.set_output_format_class(format)
    if options[:lazy]
      Hadoop::Mapreduce::Lib::Output::LazyOutputFormat.set_output_format_class(@job, format)
    end
  end
  @output_dir
end

#output_key(cls) ⇒ Object

Sets the reducer’s output key type.

Parameters:

  • cls (Class)

    The reducer’s output key type

# File 'lib/rubydoop/dsl.rb', line 360

class_setter :output_key

#partitioner(cls = nil) ⇒ Object Also known as: partitioner=

Sets a custom partitioner.

The equivalent of calling `setPartitionerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class must implement `partition`, which will be called exactly like a Java partitioner would.
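A sketch (the class name is hypothetical; it assumes the same three arguments as Java’s getPartition):

class KeyHashPartitioner
  # Routes records by a hash of the key, so equal keys always land in
  # the same partition. Must return an integer in 0...num_partitions.
  def partition(key, value, num_partitions)
    key.to_s.hash % num_partitions
  end
end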

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) partitioner class.

# File 'lib/rubydoop/dsl.rb', line 259

def partitioner(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::PartitionerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_partitioner_class(Rubydoop::PartitionerProxy)
    @partitioner = cls
  end
  @partitioner
end

#raw {|job| ... } ⇒ Object

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the `Job`, letting you do whatever you want with it.
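For example, to tweak settings the DSL doesn’t expose directly (the setters shown are standard Hadoop Job methods, reached through JRuby’s snake_case conversion):

raw do |job|
  job.set_num_reduce_tasks(10)
  job.set_speculative_execution(false)
end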

Yield Parameters:

  • job (Hadoop::Mapreduce::Job)

    The raw Hadoop Job instance

# File 'lib/rubydoop/dsl.rb', line 314

def raw(&block)
  yield @job
end

#reducer(cls = nil) ⇒ Object Also known as: reducer=

Sets the reducer class.

The equivalent of calling `setReducerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method `reduce`, which will be called exactly like a Java reducer class’ `reduce` method would be called.

You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java reducers.
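For example, the reducer half of a word count, pairing the mapper example above (a sketch):

module WordCount
  class Reducer
    # Called once per key with all of that key's values; sums the
    # IntWritable counts and writes the total.
    def reduce(key, values, context)
      sum = 0
      values.each { |value| sum += value.get }
      context.write(key, Hadoop::Io::IntWritable.new(sum))
    end
  end
end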

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) reducer class.

# File 'lib/rubydoop/dsl.rb', line 216

def reducer(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::ReducerProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_reducer_class(Rubydoop::ReducerProxy)
    @reducer = cls
  end
  @reducer
end

#set(property, value) ⇒ Object

Sets a job property.

Calls `set`/`setBoolean`/`setLong`/`setFloat` on the Hadoop Job’s configuration (exact method depends on the type of the value).
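For example (the property names are standard Hadoop configuration keys):

set 'mapreduce.task.timeout', 1200000   # Integer -> setLong
set 'mapreduce.map.speculative', false  # Boolean -> setBoolean
set 'mapreduce.job.queuename', 'etl'    # String  -> set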

Parameters:

  • property (String)

    The property name

  • value (String, Numeric, Boolean)

    The property value

# File 'lib/rubydoop/dsl.rb', line 161

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end

#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=

Sets a custom sort comparator.

The equivalent of calling `setSortComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
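In a job block this might be used as follows (the comparator class is hypothetical):

# Order composite keys so that values reach the reducer sorted by a
# secondary field; combined with a grouping comparator this gives the
# classic secondary-sort pattern.
sort_comparator SecondaryFieldComparator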

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 297

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(Rubydoop::SortComparatorProxy::RUBY_CLASS_KEY, cls.name)
    @job.set_sort_comparator_class(Rubydoop::SortComparatorProxy)
    @sort_comparator = cls
  end
  @sort_comparator
end