Module: Rubydoop

Defined in:
lib/rubydoop.rb,
lib/rubydoop/dsl.rb,
lib/rubydoop/package.rb,
lib/rubydoop/version.rb

Overview

See Rubydoop.configure for the job configuration DSL documentation, Package for the packaging documentation, or the README for a getting started guide.

Defined Under Namespace

Classes: ConfigurationDefinition, Context, JobDefinition, Package

Constant Summary

VERSION = '1.1.3'

Class Method Summary

Class Method Details

.configure(impl = ConfigurationDefinition) {|*arguments| ... } ⇒ Object

Note:

The tool runner sets the global variable `$rubydoop_context` to an object that holds references to the necessary Hadoop configuration. The configuration block runs only when this global variable is set. This is intentional: it keeps the configuration block from running in mappers and reducers.

Main entrypoint into the configuration DSL.

Within a configure block you can specify one or more jobs. Each job block runs in the context of a JobDefinition instance, so see that class for documentation of the available properties. The configure block itself runs in the context of a ConfigurationDefinition instance. The arguments to the configure block are the command line arguments, minus those handled by Hadoop’s ToolRunner.

Examples:

Configuring a job


Rubydoop.configure do |*args|
  job 'word_count' do
    input args[0]
    output args[1]

    mapper WordCount::Mapper
    reducer WordCount::Reducer

    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable
  end
end

Yield Parameters:

  • *arguments (Array<String>)

    The command line arguments



# File 'lib/rubydoop/dsl.rb', line 36

def self.configure(impl=ConfigurationDefinition, &block)
  impl.new($rubydoop_context, &block) if $rubydoop_context
end
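
The guard in the method above can be tried outside Hadoop with stand-ins. The following is a minimal sketch, not Rubydoop's own code: `RubydoopSketch`, `SketchDefinition` and `SketchContext` are hypothetical names, and the assumption that the definition object passes the context's `arguments` to the block via `instance_exec` is an illustration of the documented behaviour rather than something shown on this page.

```ruby
# Hypothetical stand-in for the context object the tool runner would provide.
SketchContext = Struct.new(:arguments)

# Hypothetical stand-in for ConfigurationDefinition: it only records job names.
class SketchDefinition
  attr_reader :jobs

  def initialize(context, &block)
    @jobs = []
    # Assumption: the block runs with this object as self, so `job` resolves
    # here, and the command line arguments are passed as block parameters.
    instance_exec(*context.arguments, &block) if block
  end

  def job(name)
    @jobs << name
  end
end

module RubydoopSketch
  # Same shape as the configure method shown above.
  def self.configure(impl = SketchDefinition, &block)
    impl.new($rubydoop_context, &block) if $rubydoop_context
  end
end

# Without a context (as in a mapper or reducer) the block is never run.
$rubydoop_context = nil
skipped = RubydoopSketch.configure { |*args| job 'word_count' }

# With a context set, the block runs and receives the arguments.
$rubydoop_context = SketchContext.new(['in/path', 'out/path'])
seen_args = nil
definition = RubydoopSketch.configure do |*args|
  seen_args = args
  job 'word_count'
end
```

Here `skipped` is nil because the guard short-circuits, while `definition.jobs` holds the one job name registered inside the block.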

.create_combiner(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 24

def self.create_combiner(conf)
  create_instance(conf.get(COMBINER_KEY))
end

.create_grouping_comparator(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 34

def self.create_grouping_comparator(conf)
  create_instance(conf.get(GROUPING_COMPARATOR_KEY))
end

.create_mapper(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 14

def self.create_mapper(conf)
  create_instance(conf.get(MAPPER_KEY))
end

.create_partitioner(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 29

def self.create_partitioner(conf)
  create_instance(conf.get(PARTITIONER_KEY))
end

.create_reducer(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 19

def self.create_reducer(conf)
  create_instance(conf.get(REDUCER_KEY))
end

.create_sort_comparator(conf) ⇒ Object



# File 'lib/rubydoop.rb', line 39

def self.create_sort_comparator(conf)
  create_instance(conf.get(SORT_COMPARATOR_KEY))
end
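
All six factory methods above follow one pattern: read a value from the configuration under a fixed key and hand it to create_instance. Neither create_instance nor the key constants are shown on this page, so the sketch below rests on assumptions: that the stored value is a constant path such as "A::B", that create_instance resolves it and calls new, and that the configuration object only needs to respond to get. `FactorySketch`, `SketchConf`, `WordCountSketch` and the key string are hypothetical.

```ruby
module FactorySketch
  # Hypothetical key name; the real MAPPER_KEY value is not shown in this page.
  MAPPER_KEY = 'sketch.mapper'

  # Assumed behaviour: walk a "A::B" constant path from Object and instantiate
  # the class it names with no arguments.
  def self.create_instance(const_path)
    cls = const_path.split('::').reduce(Object) { |host, name| host.const_get(name) }
    cls.new
  end

  # Same shape as the create_mapper method shown above.
  def self.create_mapper(conf)
    create_instance(conf.get(MAPPER_KEY))
  end
end

# Minimal stand-in for a Hadoop configuration object: only `get` is modelled.
class SketchConf
  def initialize(values)
    @values = values
  end

  def get(key)
    @values[key]
  end
end

module WordCountSketch
  class Mapper; end
end

conf = SketchConf.new(FactorySketch::MAPPER_KEY => 'WordCountSketch::Mapper')
mapper = FactorySketch.create_mapper(conf)
```

Under these assumptions `mapper` is a fresh `WordCountSketch::Mapper` instance. The other factories would differ only in the key they look up.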