Class: Rubydoop::JobDefinition

Inherits: Object
Defined in: lib/rubydoop/dsl.rb
Overview
Job configuration DSL.
Rubydoop.configure blocks are run within the context of an instance of this class. These are the methods available in those blocks.
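For orientation, a configuration block using these methods might look like the sketch below. The class names and paths are placeholders, not part of Rubydoop itself; only methods documented on this page are used.

```ruby
# A hypothetical job configuration; WordCountMapper, WordCountReducer
# and the paths are placeholders.
Rubydoop.configure do |*args|
  input 'data/input'
  output 'data/output'

  mapper WordCountMapper
  reducer WordCountReducer

  map_output_key Hadoop::Io::Text
  map_output_value Hadoop::Io::IntWritable

  set 'mapreduce.job.reduces', 2
end
```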
Instance Method Summary collapse
-
#combiner(cls = nil) ⇒ Object
(also: #combiner=)
Sets the combiner class.
-
#grouping_comparator(cls = nil) ⇒ Object
(also: #grouping_comparator=)
Sets a custom grouping comparator.
-
#initialize(context, job) ⇒ JobDefinition
constructor
A new instance of JobDefinition.
-
#input(paths, options = {}) ⇒ Object
Sets the input paths of the job.
-
#map_output_key(cls) ⇒ Object
Sets the mapper’s output key type.
-
#map_output_value(cls) ⇒ Object
Sets the mapper’s output value type.
-
#mapper(cls = nil) ⇒ Object
(also: #mapper=)
Sets the mapper class.
-
#output(dir, options = {}) ⇒ Object
Sets the output path of the job.
-
#output_key(cls) ⇒ Object
Sets the reducer’s output key type.
-
#partitioner(cls = nil) ⇒ Object
(also: #partitioner=)
Sets a custom partitioner.
-
#raw {|job| ... } ⇒ Object
If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you.
-
#reducer(cls = nil) ⇒ Object
(also: #reducer=)
Sets the reducer class.
-
#set(property, value) ⇒ Object
Sets a job property.
-
#sort_comparator(cls = nil) ⇒ Object
(also: #sort_comparator=)
Sets a custom sort comparator.
Constructor Details
#initialize(context, job) ⇒ JobDefinition
Returns a new instance of JobDefinition.
# File 'lib/rubydoop/dsl.rb', line 74

def initialize(context, job)
  @context = context
  @job = job
end
Instance Method Details
#combiner(cls = nil) ⇒ Object Also known as: combiner=
Sets the combiner class.
The equivalent of calling setCombinerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
A combiner should implement reduce, just like reducers.
# File 'lib/rubydoop/dsl.rb', line 209

def combiner(cls=nil)
  if cls
    @job.configuration.set(COMBINER_KEY, cls.name)
    @job.set_combiner_class(@context.proxy_class(:combiner))
    @combiner = cls
  end
  @combiner
end
#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=
Sets a custom grouping comparator.
The equivalent of calling setGroupingComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
# File 'lib/rubydoop/dsl.rb', line 250

def grouping_comparator(cls=nil)
  if cls
    @job.configuration.set(GROUPING_COMPARATOR_KEY, cls.name)
    @job.set_grouping_comparator_class(@context.proxy_class(:grouping_comparator))
    @grouping_comparator = cls
  end
  @grouping_comparator
end
#input(paths, options = {}) ⇒ Object
Sets the input paths of the job.
Calls setInputFormatClass on the Hadoop job and uses the static setInputPaths on the input format to set the job’s input path.
# File 'lib/rubydoop/dsl.rb', line 90

def input(paths, options={})
  paths = paths.join(',') if paths.is_a?(Enumerable)
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + "InputFormat"
    format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
  end
  format.set_input_paths(@job, paths)
  @job.set_input_format_class(format)
end
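The :format symbol is camel-cased into a class name under Hadoop::Mapreduce::Lib::Input. The conversion can be sketched in plain Ruby (the helper name is illustrative, not part of the API):

```ruby
# Mirrors the symbol-to-class-name conversion used by #input: the
# first letter and any letter following an underscore are upcased,
# and "InputFormat" is appended.
def input_format_class_name(format)
  format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + 'InputFormat'
end

input_format_class_name(:text)          # => "TextInputFormat"
input_format_class_name(:sequence_file) # => "SequenceFileInputFormat"
```

So `input 'logs/*', format: :sequence_file` resolves to Hadoop::Mapreduce::Lib::Input::SequenceFileInputFormat.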
#map_output_key(cls) ⇒ Object
Sets the mapper’s output key type.
# File 'lib/rubydoop/dsl.rb', line 314

class_setter :map_output_key
#map_output_value(cls) ⇒ Object
Sets the mapper’s output value type.
# File 'lib/rubydoop/dsl.rb', line 323

class_setter :map_output_value
#mapper(cls = nil) ⇒ Object Also known as: mapper=
Sets the mapper class.
The equivalent of calling setMapperClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class only needs to implement the method map, which will be called exactly like a Java mapper class’ map method would be called.
You can optionally implement setup and cleanup, which mirror the methods of the same name in Java mappers.
# File 'lib/rubydoop/dsl.rb', line 162

def mapper(cls=nil)
  if cls
    @job.configuration.set(MAPPER_KEY, cls.name)
    @job.set_mapper_class(@context.proxy_class(:mapper))
    @mapper = cls
  end
  @mapper
end
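A mapper, then, is just a plain Ruby class with a map method. The word-count class below is a hypothetical example, not part of Rubydoop; in a real job key and value would be Hadoop writables and context the Hadoop mapper context, whose write method emits a key-value pair:

```ruby
# A hypothetical word-count mapper: emits (word, 1) for every
# whitespace-separated token in the input value.
class WordCountMapper
  def map(key, value, context)
    value.to_s.split.each do |word|
      context.write(word, 1)
    end
  end
end
```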
#output(dir, options = {}) ⇒ Object
Sets the output path of the job.
Calls setOutputFormatClass on the Hadoop job and uses the static setOutputPath on the output format to set the job’s output path.
# File 'lib/rubydoop/dsl.rb', line 111

def output(dir, options={})
  format = options.fetch(:format, :text)
  unless format.is_a?(Class)
    class_name = format.to_s.gsub(/^.|_./) { |x| x[-1, 1].upcase } + "OutputFormat"
    format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
  end
  format.set_output_path(@job, Hadoop::Fs::Path.new(dir))
  @job.set_output_format_class(format)
end
#output_key(cls) ⇒ Object
Sets the reducer’s output key type.
# File 'lib/rubydoop/dsl.rb', line 332

class_setter :output_key
#partitioner(cls = nil) ⇒ Object Also known as: partitioner=
Sets a custom partitioner.
The equivalent of calling setPartitionerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class must implement partition, which will be called exactly like a Java partitioner would.
# File 'lib/rubydoop/dsl.rb', line 231

def partitioner(cls=nil)
  if cls
    @job.configuration.set(PARTITIONER_KEY, cls.name)
    @job.set_partitioner_class(@context.proxy_class(:partitioner))
    @partitioner = cls
  end
  @partitioner
end
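An illustrative partitioner is sketched below. The class and its key handling are assumptions for the example; in a real job the key and value would be Hadoop writables, and partition must return an integer in 0...num_partitions:

```ruby
# A hypothetical partitioner that routes each key to a partition
# based on the first byte of the key's string form, so equal keys
# always land in the same partition.
class FirstBytePartitioner
  def partition(key, value, num_partitions)
    key.to_s.bytes.first % num_partitions
  end
end
```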
#raw {|job| ... } ⇒ Object
If you need to manipulate the Hadoop job in some way that isn’t covered by this DSL, this is the method for you. It yields the Job, letting you do whatever you want with it.
# File 'lib/rubydoop/dsl.rb', line 286

def raw(&block)
  yield @job
end
#reducer(cls = nil) ⇒ Object Also known as: reducer=
Sets the reducer class.
The equivalent of calling setReducerClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class only needs to implement the method reduce, which will be called exactly like a Java reducer class’ reduce method would be called.
You can optionally implement setup and cleanup, which mirror the methods of the same name in Java reducers.
# File 'lib/rubydoop/dsl.rb', line 188

def reducer(cls=nil)
  if cls
    @job.configuration.set(REDUCER_KEY, cls.name)
    @job.set_reducer_class(@context.proxy_class(:reducer))
    @reducer = cls
  end
  @reducer
end
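The word-count reducer below is a hypothetical counterpart to a word-count mapper, not part of Rubydoop; in a real job values would be Hadoop’s iterable of writables, though any Ruby Enumerable has the shape the example needs:

```ruby
# A hypothetical word-count reducer: sums the counts emitted for
# each word and writes a single (word, total) pair.
class WordCountReducer
  def reduce(key, values, context)
    context.write(key, values.reduce(0) { |sum, v| sum + v })
  end
end
```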
#set(property, value) ⇒ Object
Sets a job property.
Calls set/setBoolean/setLong/setFloat on the Hadoop Job’s configuration (exact method depends on the type of the value).
# File 'lib/rubydoop/dsl.rb', line 133

def set(property, value)
  case value
  when Integer
    @job.configuration.set_long(property, value)
  when Float
    @job.configuration.set_float(property, value)
  when true, false
    @job.configuration.set_boolean(property, value)
  else
    @job.configuration.set(property, value)
  end
end
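The type dispatch can be sketched as a plain function returning which configuration setter would be used (the helper name is illustrative, not part of the API):

```ruby
# Mirrors #set's type dispatch: Integers go through setLong, Floats
# through setFloat, booleans through setBoolean, and everything
# else through plain set.
def configuration_setter_for(value)
  case value
  when Integer     then :set_long
  when Float       then :set_float
  when true, false then :set_boolean
  else                  :set
  end
end

configuration_setter_for(42)     # => :set_long
configuration_setter_for(0.5)    # => :set_float
configuration_setter_for(false)  # => :set_boolean
configuration_setter_for('name') # => :set
```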
#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=
Sets a custom sort comparator.
The equivalent of calling setSortComparatorClass on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
# File 'lib/rubydoop/dsl.rb', line 269

def sort_comparator(cls=nil)
  if cls
    @job.configuration.set(SORT_COMPARATOR_KEY, cls.name)
    @job.set_sort_comparator_class(@context.proxy_class(:sort_comparator))
    @sort_comparator = cls
  end
  @sort_comparator
end