Class: Rubydoop::JobDefinition

Inherits: Object
Defined in: lib/rubydoop/dsl.rb
Overview
Job configuration DSL.
`Rubydoop.configure` blocks are run within the context of an instance of this class. These are the methods available in those blocks.
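For example, a complete job definition might look like the following sketch, modeled on the classic word count. The mapper and reducer classes are assumed to be defined elsewhere, the `job` method comes from the surrounding configuration DSL, and `output_value` is assumed to exist as the counterpart to #output_key:

    Rubydoop.configure do |input_path, output_path|
      job 'word_count' do
        input input_path
        output output_path

        mapper WordCountMapper
        reducer WordCountReducer

        map_output_key Hadoop::Io::Text
        map_output_value Hadoop::Io::IntWritable
        output_key Hadoop::Io::Text
        output_value Hadoop::Io::IntWritable   # assumed counterpart to output_key
      end
    end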
Instance Method Summary

- #combiner(cls = nil) ⇒ Object (also: #combiner=)
  Sets the combiner class.
- #grouping_comparator(cls = nil) ⇒ Object (also: #grouping_comparator=)
  Sets a custom grouping comparator.
- #initialize(context, job) ⇒ JobDefinition (constructor)
  A new instance of JobDefinition.
- #input(paths, options = {}) ⇒ Object
  Sets the input paths of the job.
- #map_output_key(cls) ⇒ Object
  Sets the mapper's output key type.
- #map_output_value(cls) ⇒ Object
  Sets the mapper's output value type.
- #mapper(cls = nil) ⇒ Object (also: #mapper=)
  Sets the mapper class.
- #output(dir = nil, options = {}) ⇒ Object
  Sets or gets the output path of the job.
- #output_key(cls) ⇒ Object
  Sets the reducer's output key type.
- #partitioner(cls = nil) ⇒ Object (also: #partitioner=)
  Sets a custom partitioner.
- #raw {|job| ... } ⇒ Object
  Yields the Hadoop job for any manipulation not covered by this DSL.
- #reducer(cls = nil) ⇒ Object (also: #reducer=)
  Sets the reducer class.
- #set(property, value) ⇒ Object
  Sets a job property.
- #sort_comparator(cls = nil) ⇒ Object (also: #sort_comparator=)
  Sets a custom sort comparator.
Constructor Details
#initialize(context, job) ⇒ JobDefinition
Returns a new instance of JobDefinition.
    # File 'lib/rubydoop/dsl.rb', line 82

    def initialize(context, job)
      @context = context
      @job = job
    end
Instance Method Details
#combiner(cls = nil) ⇒ Object Also known as: combiner=
Sets the combiner class.
The equivalent of calling `setCombinerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
A combiner should implement `reduce`, just like reducers.
    # File 'lib/rubydoop/dsl.rb', line 237

    def combiner(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::CombinerProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_combiner_class(Rubydoop::CombinerProxy)
        @combiner = cls
      end
      @combiner
    end
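Since a combiner implements `reduce` just like a reducer, a reducer whose operation is associative and commutative (summing counts, for example) can often be reused directly as the combiner:

    # In a job definition, reusing the reducer class as the combiner:
    reducer WordCountReducer
    combiner WordCountReducer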
#grouping_comparator(cls = nil) ⇒ Object Also known as: grouping_comparator=
Sets a custom grouping comparator.
The equivalent of calling `setGroupingComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
    # File 'lib/rubydoop/dsl.rb', line 278

    def grouping_comparator(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::GroupingComparatorProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_grouping_comparator_class(Rubydoop::GroupingComparatorProxy)
        @grouping_comparator = cls
      end
      @grouping_comparator
    end
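This is the usual second half of a secondary sort: the grouping comparator decides which keys end up in the same `reduce` call. A usage sketch (the comparator class and its key scheme are hypothetical; the compare contract is whatever Rubydoop::GroupingComparatorProxy forwards from Hadoop):

    # Hypothetical: treats composite "user|timestamp" keys as equal when
    # the user part matches, so one reduce call sees all of a user's values.
    grouping_comparator NaturalKeyGroupingComparator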
#input(paths, options = {}) ⇒ Object
Sets the input paths of the job.
Calls `setInputFormatClass` on the Hadoop job and uses the static `setInputPaths` on the input format to set the job's input path.
    # File 'lib/rubydoop/dsl.rb', line 98

    def input(paths, options={})
      paths = paths.join(',') if paths.is_a?(Enumerable)
      format = options.fetch(:format, :text)
      unless format.is_a?(Class)
        class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "InputFormat"
        format = Hadoop::Mapreduce::Lib::Input.const_get(class_name)
      end
      unless format <= Hadoop::Mapreduce::InputFormat
        @job.configuration.set(Rubydoop::InputFormatProxy::RUBY_CLASS_KEY, format.name)
        format = Rubydoop::InputFormatProxy
      end
      format.set_input_paths(@job, paths)
      @job.set_input_format_class(format)
    end
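For example (the paths are hypothetical; a symbol such as `:sequence_file` is camel-cased into `SequenceFileInputFormat` by the code above):

    input 'counts/2014-*/*.tsv'                   # a single path or glob
    input ['events/a.tsv', 'events/b.tsv']        # an Enumerable is joined with commas
    input 'events/2014', format: :sequence_file   # Lib::Input::SequenceFileInputFormat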
#map_output_key(cls) ⇒ Object
Sets the mapper’s output key type.
    # File 'lib/rubydoop/dsl.rb', line 342

    class_setter :map_output_key
#map_output_value(cls) ⇒ Object
Sets the mapper's output value type.
    # File 'lib/rubydoop/dsl.rb', line 351

    class_setter :map_output_value
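These two setters take the Hadoop writable classes that the mapper emits; #output_key (below) does the same for the job's final output. For example:

    map_output_key Hadoop::Io::Text
    map_output_value Hadoop::Io::IntWritable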
#mapper(cls = nil) ⇒ Object Also known as: mapper=
Sets the mapper class.
The equivalent of calling `setMapperClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class only needs to implement the method `map`, which will be called exactly like a Java mapper class's `map` method would be called.
You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java mappers.
    # File 'lib/rubydoop/dsl.rb', line 190

    def mapper(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::MapperProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_mapper_class(Rubydoop::MapperProxy)
        @mapper = cls
      end
      @mapper
    end
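A sketch of such a class, modeled on the classic word count: `key` and `value` arrive as Hadoop writables, and results are emitted through `context`:

    class WordCountMapper
      def map(key, value, context)
        value.to_s.split.each do |word|
          context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
        end
      end
    end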
#output(dir = nil, options = {}) ⇒ Object
Sets or gets the output path of the job.
Calls `setOutputFormatClass` on the Hadoop job and uses the static `setOutputPath` on the output format to set the job's output path.
    # File 'lib/rubydoop/dsl.rb', line 123

    def output(dir=nil, options={})
      if dir
        if dir.is_a?(Hash)
          options = dir
          if options[:intermediate]
            dir = @job.job_name
          else
            raise ArgumentError, 'neither dir nor intermediate: true was specified'
          end
        end
        dir = sprintf('%s-%010d-%05d', dir, Time.now.to_i, rand(1e5)) if options[:intermediate]
        @output_dir = dir
        format = options.fetch(:format, :text)
        unless format.is_a?(Class)
          class_name = format.to_s.gsub(/^.|_./) {|x| x[-1,1].upcase } + "OutputFormat"
          format = Hadoop::Mapreduce::Lib::Output.const_get(class_name)
        end
        format.set_output_path(@job, Hadoop::Fs::Path.new(@output_dir))
        @job.set_output_format_class(format)
        if options[:lazy]
          Hadoop::Mapreduce::Lib::Output::LazyOutputFormat.set_output_format_class(@job, format)
        end
      end
      @output_dir
    end
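For example (the directory names are hypothetical):

    output 'word_counts'                      # plain text output in this directory
    output 'counts', format: :sequence_file   # Lib::Output::SequenceFileOutputFormat
    output intermediate: true                 # timestamped directory named after the job
    output 'scratch', lazy: true              # wraps the format in LazyOutputFormat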
#output_key(cls) ⇒ Object
Sets the reducer’s output key type.
    # File 'lib/rubydoop/dsl.rb', line 360

    class_setter :output_key
#partitioner(cls = nil) ⇒ Object Also known as: partitioner=
Sets a custom partitioner.
The equivalent of calling `setPartitionerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class must implement `partition`, which will be called exactly like a Java partitioner would.
    # File 'lib/rubydoop/dsl.rb', line 259

    def partitioner(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::PartitionerProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_partitioner_class(Rubydoop::PartitionerProxy)
        @partitioner = cls
      end
      @partitioner
    end
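A sketch, assuming the Java arity (key, value and the number of partitions, returning a partition index); the composite key scheme is made up:

    # Hypothetical: routes "user|timestamp" keys by user, so all of a
    # user's records land in the same partition.
    class UserPartitioner
      def partition(key, value, num_partitions)
        user_id = key.to_s.split('|').first
        user_id.hash.abs % num_partitions
      end
    end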
#raw {|job| ... } ⇒ Object
If you need to manipulate the Hadoop job in some way that isn't covered by this DSL, this is the method for you. It yields the `Job`, letting you do whatever you want with it.
    # File 'lib/rubydoop/dsl.rb', line 314

    def raw(&block)
      yield @job
    end
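For example, to change settings this DSL doesn't expose (both calls below are standard Hadoop `Job`/`Configuration` API, reachable through JRuby's snake_case conversion):

    raw do |job|
      job.set_num_reduce_tasks(1)
      job.configuration.set('mapreduce.task.timeout', '1200000')
    end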
#reducer(cls = nil) ⇒ Object Also known as: reducer=
Sets the reducer class.
The equivalent of calling `setReducerClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
The class only needs to implement the method `reduce`, which will be called exactly like a Java reducer class's `reduce` method would be called.
You can optionally implement `setup` and `cleanup`, which mirror the methods of the same name in Java reducers.
    # File 'lib/rubydoop/dsl.rb', line 216

    def reducer(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::ReducerProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_reducer_class(Rubydoop::ReducerProxy)
        @reducer = cls
      end
      @reducer
    end
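A sketch of such a class, again modeled on word count: `values` is an iterable of writables, just as in a Java reducer:

    class WordCountReducer
      def reduce(key, values, context)
        sum = values.reduce(0) { |acc, value| acc + value.get }
        context.write(key, Hadoop::Io::IntWritable.new(sum))
      end
    end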
#set(property, value) ⇒ Object
Sets a job property.
Calls `set`/`setBoolean`/`setLong`/`setFloat` on the Hadoop Job's configuration (the exact method depends on the type of the value).
    # File 'lib/rubydoop/dsl.rb', line 161

    def set(property, value)
      case value
      when Integer
        @job.configuration.set_long(property, value)
      when Float
        @job.configuration.set_float(property, value)
      when true, false
        @job.configuration.set_boolean(property, value)
      else
        @job.configuration.set(property, value)
      end
    end
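For example (the `my.app.*` properties are made up; the first two are standard Hadoop properties):

    set 'mapreduce.job.reduces', 10             # Integer   => set_long
    set 'mapreduce.map.output.compress', true   # boolean   => set_boolean
    set 'my.app.sample_rate', 0.25              # Float     => set_float
    set 'my.app.mode', 'strict'                 # otherwise => set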
#sort_comparator(cls = nil) ⇒ Object Also known as: sort_comparator=
Sets a custom sort comparator.
The equivalent of calling `setSortComparatorClass` on a Hadoop job, but instead of a Java class you pass a Ruby class and Rubydoop will wrap it in a way that works with Hadoop.
    # File 'lib/rubydoop/dsl.rb', line 297

    def sort_comparator(cls=nil)
      if cls
        @job.configuration.set(Rubydoop::SortComparatorProxy::RUBY_CLASS_KEY, cls.name)
        @job.set_sort_comparator_class(Rubydoop::SortComparatorProxy)
        @sort_comparator = cls
      end
      @sort_comparator
    end
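In a secondary-sort setup this is the counterpart to #grouping_comparator: the sort comparator orders the composite keys, and the grouping comparator then groups them by their natural key. A usage sketch with a hypothetical comparator class:

    # Hypothetical: orders "user|timestamp" keys by user, then timestamp.
    sort_comparator CompositeKeySortComparator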