Class: Rubydoop::JobDefinition

Inherits:
Object
Defined in:
lib/rubydoop/dsl.rb

Overview

Job configuration DSL.

Rubydoop.configure blocks are run within the context of an instance of this class. These are the methods available in those blocks.
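
As a rough illustration of what running a block "within the context of an instance" means, the sketch below evaluates a block with instance_exec against a stand-in object. SketchDefinition and configure_sketch are hypothetical names invented for this example and are not part of Rubydoop; that Rubydoop evaluates the block in this particular way is an assumption.

```ruby
# Hypothetical stand-in for JobDefinition: it only records what the block
# asks for, whereas the real class configures a Hadoop job.
class SketchDefinition
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def input(paths)
    @settings[:input] = paths
  end

  def output(dir)
    @settings[:output] = dir
  end

  # Setter when given a class, getter when called without arguments.
  def mapper(cls = nil)
    @settings[:mapper] = cls if cls
    @settings[:mapper]
  end
end

# Hypothetical helper: runs the block with `self` set to the definition,
# so bare calls such as `input` resolve to the definition's methods.
def configure_sketch(&block)
  definition = SketchDefinition.new
  definition.instance_exec(&block)
  definition
end

# Placeholder class standing in for a real mapper.
class PlaceholderMapper; end

definition = configure_sketch do
  input 'data/in'
  output 'data/out'
  mapper PlaceholderMapper
end

p definition.settings[:input] # => "data/in"
```

Because the block is evaluated against the definition object, the methods documented on this page can be called inside it without an explicit receiver.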

Constructor Details

#initialize(context, job) ⇒ JobDefinition

Returns a new instance of JobDefinition.



# File 'lib/rubydoop/dsl.rb', line 74

def initialize(context, job)
  @context = context
  @job = job
end

Instance Method Details

#combiner(cls = nil) ⇒ Object Also known as: combiner=

Sets the combiner class.

The equivalent of calling setCombinerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

A combiner should implement reduce, just like reducers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) combiner class.

# File 'lib/rubydoop/dsl.rb', line 209

def combiner(cls=nil)
  if cls
    @job.configuration.set(COMBINER_KEY, cls.name)
    @job.set_combiner_class(@context.proxy_class(:combiner))
    @combiner = cls
  end
  @combiner
end

#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=

Sets a custom grouping comparator.

The equivalent of calling setGroupingComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 250

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(GROUPING_COMPARATOR_KEY, cls.name)
    @job.set_grouping_comparator_class(@context.proxy_class(:grouping_comparator))
    @grouping_comparator = cls
  end
  @grouping_comparator
end

#input(paths, options = {}) ⇒ Object

Sets the input paths of the job.

Calls setInputFormatClass on the Hadoop job and uses the static setInputPaths on the input format to set the job’s input path.

Parameters:

  • paths (String, Enumerable)

    The input paths, either a comma-separated string or an Enumerable of strings (which will be joined with commas).

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

    The input format to use: either a Java class or a short name such as :text, which is expanded to TextInputFormat. Defaults to :text.

# File 'lib/rubydoop/dsl.rb', line 90

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end
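
The name expansion used when :format is not a class can be tried in isolation. The sketch below copies the gsub expression from the method above into a helper; format_class_name is a hypothetical name used only for this illustration.

```ruby
# Mirrors the string manipulation in #input and #output: upcase the first
# character and every character that follows an underscore (dropping the
# underscore), then append the suffix.
def format_class_name(format, suffix)
  format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + suffix
end

puts format_class_name(:text, 'InputFormat')          # prints TextInputFormat
puts format_class_name(:sequence_file, 'InputFormat') # prints SequenceFileInputFormat
```

The resulting name is then looked up with const_get under Hadoop::Mapreduce::Lib::Input, so a short name only works if a class with that name exists there.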

#map_output_key(cls) ⇒ Object

Sets the mapper’s output key type.

Parameters:

  • cls (Class)

    The mapper’s output key type

# File 'lib/rubydoop/dsl.rb', line 314

class_setter :map_output_key

#map_output_value(cls) ⇒ Object

Sets the mapper’s output value type.

Parameters:

  • cls (Class)

    The mapper’s output value type

# File 'lib/rubydoop/dsl.rb', line 323

class_setter :map_output_value

#mapper(cls = nil) ⇒ Object Also known as: mapper=

Sets the mapper class.

The equivalent of calling setMapperClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method map, which will be called exactly like a Java mapper class’ map method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java mappers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) mapper class.

# File 'lib/rubydoop/dsl.rb', line 162

def mapper(cls=nil)
  if cls
    @job.configuration.set(MAPPER_KEY, cls.name)
    @job.set_mapper_class(@context.proxy_class(:mapper))
    @mapper = cls
  end
  @mapper
end
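
A minimal sketch of a class that satisfies the mapper contract described above. WordCountMapper and RecordingContext are hypothetical names; RecordingContext stands in for the Hadoop context so the example runs without Hadoop. The (key, value, context) argument order follows the Java Mapper map method. In a real job the keys and values would be Hadoop Writable types rather than plain Ruby strings and integers.

```ruby
# Hypothetical mapper: emits (word, 1) for every word in the input value.
# Only #map is required; #setup and #cleanup are optional.
class WordCountMapper
  def map(key, value, context)
    value.to_s.split.each do |word|
      context.write(word, 1)
    end
  end
end

# Hypothetical stand-in for the Hadoop context: records every write.
class RecordingContext
  attr_reader :written

  def initialize
    @written = []
  end

  def write(key, value)
    @written << [key, value]
  end
end

context = RecordingContext.new
WordCountMapper.new.map(0, 'to be or not to be', context)
p context.written.first(2) # => [["to", 1], ["be", 1]]
```

Inside a configure block such a class would be registered with mapper WordCountMapper.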

#output(dir, options = {}) ⇒ Object

Sets the output path of the job.

Calls setOutputFormatClass on the Hadoop job and uses the static setOutputPath on the output format to set the job’s output path.

Parameters:

  • dir (String)

    The output path

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :format (JavaClass)

    The output format to use: either a Java class or a short name such as :text, which is expanded to TextOutputFormat. Defaults to :text.

# File 'lib/rubydoop/dsl.rb', line 111

def output(dir, options={})
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
    format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
  end
  format.set_output_path(@job, Hadoop::Fs::Path.new(dir))
  @job.set_output_format_class(format)
end

#output_key(cls) ⇒ Object

Sets the reducer’s output key type.

Parameters:

  • cls (Class)

    The reducer’s output key type

# File 'lib/rubydoop/dsl.rb', line 332

class_setter :output_key

#partitioner(cls = nil) ⇒ Object Also known as: partitioner=

Sets a custom partitioner.

The equivalent of calling setPartitionerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class must implement partition, which will be called exactly like a Java partitioner would.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) partitioner class.

# File 'lib/rubydoop/dsl.rb', line 231

def partitioner(cls=nil)
  if cls
    @job.configuration.set(PARTITIONER_KEY, cls.name)
    @job.set_partitioner_class(@context.proxy_class(:partitioner))
    @partitioner = cls
  end
  @partitioner
end
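
A minimal sketch of a partitioner class. ByteSumPartitioner is a hypothetical name, and the (key, value, num_partitions) argument order is an assumption based on the Java Partitioner getPartition method. The byte sum is used only because it gives the same result in every Ruby process.

```ruby
# Hypothetical partitioner: assigns a key to a partition based on the sum of
# the bytes in its string form.
class ByteSumPartitioner
  # Returns an Integer in the range 0...num_partitions.
  def partition(key, value, num_partitions)
    key.to_s.bytes.sum % num_partitions
  end
end

partitioner = ByteSumPartitioner.new
p partitioner.partition('ab', nil, 4) # => 3  (97 + 98 = 195, and 195 % 4 = 3)
```

Equal keys always map to the same partition, which is the property a partitioner has to guarantee.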

#raw {|job| ... } ⇒ Object

If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the Job, letting you do whatever you want with it.

Yield Parameters:

  • job (Hadoop::Mapreduce::Job)

    The raw Hadoop Job instance

# File 'lib/rubydoop/dsl.rb', line 286

def raw(&block)
  yield @job
end

#reducer(cls = nil) ⇒ Object Also known as: reducer=

Sets the reducer class.

The equivalent of calling setReducerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

The class only needs to implement the method reduce, which will be called exactly like a Java reducer class’ reduce method would be called.

You can optionally implement setup and cleanup, which mirror the methods of the same name in Java reducers.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) reducer class.

# File 'lib/rubydoop/dsl.rb', line 188

def reducer(cls=nil)
  if cls
    @job.configuration.set(REDUCER_KEY, cls.name)
    @job.set_reducer_class(@context.proxy_class(:reducer))
    @reducer = cls
  end
  @reducer
end
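
A minimal sketch of a class that satisfies the reducer contract described above. SumReducer and RecordingContext are hypothetical names; RecordingContext stands in for the Hadoop context so the example runs without Hadoop. The (key, values, context) argument order follows the Java Reducer reduce method. In a real job the values would be an iterable of Hadoop Writable objects rather than a Ruby array of integers.

```ruby
# Hypothetical reducer: sums all values seen for a key and writes the total.
# Only #reduce is required; #setup and #cleanup are optional.
class SumReducer
  def reduce(key, values, context)
    total = 0
    values.each { |value| total += value }
    context.write(key, total)
  end
end

# Hypothetical stand-in for the Hadoop context: records every write.
class RecordingContext
  attr_reader :written

  def initialize
    @written = []
  end

  def write(key, value)
    @written << [key, value]
  end
end

context = RecordingContext.new
SumReducer.new.reduce('be', [1, 1], context)
p context.written # => [["be", 2]]
```

Inside a configure block such a class would be registered with reducer SumReducer.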

#set(property, value) ⇒ Object

Sets a job property.

Calls set/setBoolean/setLong/setFloat on the Hadoop Job’s configuration (exact method depends on the type of the value).

Parameters:

  • property (String)

    The property name

  • value (String, Numeric, Boolean)

    The property value

# File 'lib/rubydoop/dsl.rb', line 133

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end
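
The type dispatch can be observed without Hadoop by substituting a recording object for the job configuration. In the sketch below, RecordingConfiguration and set_property are hypothetical names and the property names are made up; the case statement is the same as in the method above.

```ruby
# Hypothetical stand-in for the Hadoop configuration: records which setter
# was called, with which property and value.
class RecordingConfiguration
  attr_reader :calls

  def initialize
    @calls = []
  end

  %i[set set_long set_float set_boolean].each do |name|
    define_method(name) do |property, value|
      @calls << [name, property, value]
    end
  end
end

# Same dispatch as #set, with the configuration passed in explicitly.
def set_property(configuration, property, value)
  case value
  when Integer
    configuration.set_long(property, value)
  when Float
    configuration.set_float(property, value)
  when true, false
    configuration.set_boolean(property, value)
  else
    configuration.set(property, value)
  end
end

configuration = RecordingConfiguration.new
set_property(configuration, 'example.limit', 10)
set_property(configuration, 'example.ratio', 0.5)
set_property(configuration, 'example.enabled', true)
set_property(configuration, 'example.name', 'demo')
p configuration.calls.map(&:first) # => [:set_long, :set_float, :set_boolean, :set]
```

Any value that is not an Integer, a Float, true, or false falls through to the plain set call.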

#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=

Sets a custom sort comparator.

The equivalent of calling setSortComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.

Parameters:

  • cls (Class) (defaults to: nil)

    The (Ruby) comparator class.

# File 'lib/rubydoop/dsl.rb', line 269

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(SORT_COMPARATOR_KEY, cls.name)
    @job.set_sort_comparator_class(@context.proxy_class(:sort_comparator))
    @sort_comparator = cls
  end
  @sort_comparator
end