Class: RightAws::EmrInterface
- Inherits:
-
RightAwsBase
- Object
- RightAwsBase
- RightAws::EmrInterface
- Includes:
- RightAwsBaseInterface
- Defined in:
- lib/emr/right_emr_interface.rb
Overview
RightAWS::EmrInterface – RightScale Amazon Elastic Map Reduce interface
The RightAws::EmrInterface class provides a complete interface to Amazon Elastic Map Reduce service.
For explanations of the semantics of each call, please refer to Amazon’s documentation at aws.amazon.com/documentation/elasticmapreduce/
Create an interface handle:
emr = RightAws::EmrInterface.new(aws_access_key_id, aws_secret_access_key)
Create a job flow:
emr.run_job_flow(
:name => 'job flow 1',
:master_instance_type => 'm1.large',
:slave_instance_type => 'm1.large',
:instance_count => 5,
:log_uri => 's3n://bucket/path/to/logs',
:steps => [{
:name => 'step 1',
:jar => 's3n://bucket/path/to/code.jar',
:main_class => 'com.foobar.emr.Step1',
:args => ['arg', 'arg'],
}]) #=> "j-9K18HM82Q0AE7"
Describe a job flow:
emr.describe_job_flows('j-9K18HM82Q0AE7') #=> {...}
Terminate a job flow:
emr.terminate_job_flows('j-9K18HM82Q0AE7') #=> true
Defined Under Namespace
Classes: AddInstanceGroupsParser, DescribeJobFlowsParser, RunJobFlowParser
Constant Summary collapse
- API_VERSION =
Amazon EMR API version being used
'2009-03-31'
- DEFAULT_HOST =
'elasticmapreduce.amazonaws.com'
- DEFAULT_PATH =
'/'
- DEFAULT_PROTOCOL =
'https'
- DEFAULT_PORT =
443
- EMR_INSTANCES_KEY_MAPPING =
Job Flows
{ # :nodoc: :additional_info => 'AdditionalInfo', :log_uri => 'LogUri', :name => 'Name', # JobFlowInstancesConfig :ec2_key_name => 'Instances.Ec2KeyName', :hadoop_version => 'Instances.HadoopVersion', :instance_count => 'Instances.InstanceCount', :keep_job_flow_alive_when_no_steps => 'Instances.KeepJobFlowAliveWhenNoSteps', :master_instance_type => 'Instances.MasterInstanceType', :slave_instance_type => 'Instances.SlaveInstanceType', :termination_protected => 'Instances.TerminationProtected', # PlacementType :availability_zone => 'Instances.Placement.AvailabilityZone', }
- BOOTSTRAP_ACTION_KEY_MAPPING =
:nodoc:
{ # :nodoc: :name => 'Name', # ScriptBootstrapActionConfig :args => 'ScriptBootstrapAction.Args', :path => 'ScriptBootstrapAction.Path', }
- INSTANCE_GROUP_KEY_MAPPING =
:nodoc:
{ # :nodoc: :bid_price => 'BidPrice', :instance_count => 'InstanceCount', :instance_role => 'InstanceRole', :instance_type => 'InstanceType', :market => 'Market', :name => 'Name', }
- STEP_CONFIG_KEY_MAPPING =
:nodoc:
{ # :nodoc: :action_on_failure => 'ActionOnFailure', :name => 'Name', # HadoopJarStepConfig :args => 'HadoopJarStep.Args', :jar => 'HadoopJarStep.Jar', :main_class => 'HadoopJarStep.MainClass', :properties => 'HadoopJarStep.Properties', }
- KEY_VALUE_KEY_MAPPINGS =
{ :key => 'Key', :value => 'Value', }
- MODIFY_INSTANCE_GROUP_KEY_MAPPINGS =
{ :instance_group_id => 'InstanceGroupId', :instance_count => 'InstanceCount', }
- @@bench =
AwsBenchmarkingBlock.new
Constants included from RightAwsBaseInterface
RightAwsBaseInterface::BLOCK_DEVICE_KEY_MAPPING, RightAwsBaseInterface::DEFAULT_SIGNATURE_VERSION
Constants inherited from RightAwsBase
RightAwsBase::AMAZON_PROBLEMS, RightAwsBase::RAISE_ON_TIMEOUT_ON_ACTIONS
Instance Attribute Summary
Attributes included from RightAwsBaseInterface
#aws_access_key_id, #aws_secret_access_key, #cache, #connection, #last_errors, #last_request, #last_request_id, #last_response, #logger, #params, #signature_version
Class Method Summary collapse
Instance Method Summary collapse
-
#add_instance_groups(job_flow_id, *instance_groups) ⇒ Object
Adds instance groups to a running job flow.
-
#add_job_flow_steps(job_flow_id, *steps) ⇒ Object
Adds steps to a running job flow.
-
#describe_job_flows(*job_flow_ids_and_options) ⇒ Object
Returns a list of job flows that match all of supplied parameters.
-
#generate_request(action, params = {}) ⇒ Object
:nodoc:.
-
#initialize(aws_access_key_id = nil, aws_secret_access_key = nil, params = {}) ⇒ EmrInterface
constructor
Create a new handle to a EMR service.
-
#modify_instance_groups(*args) ⇒ Object
Modifies instance groups.
-
#request_info(request, parser) ⇒ Object
Sends request to Amazon and parses the response Raises AwsError if any banana happened.
-
#run_job_flow(options = {}) ⇒ Object
Creates and starts running a new job flow.
-
#set_termination_protection(*job_flow_ids_and_options) ⇒ Object
Locks a job flow so the EC2 instances in the cluster cannot be terminated by user intervention, an API call, or in the event of a job flow error.
-
#terminate_job_flows(*job_flow_ids) ⇒ Object
Terminates specified job flows.
Methods included from RightAwsBaseInterface
#amazonize_block_device_mappings, #amazonize_hash_with_key_mapping, #amazonize_list, #amazonize_list_with_key_mapping, #cache_hits?, caching, caching=, #caching?, #destroy_connection, #generate_request_impl, #get_connection, #get_connections_storage, #get_server_url, #incrementally_list_items, #init, #map_api_keys_and_values, #on_exception, #request_cache_or_info, #request_info_impl, #signed_service_params, #update_cache, #with_connection_options
Methods inherited from RightAwsBase
amazon_problems, amazon_problems=, raise_on_timeout_on_actions, raise_on_timeout_on_actions=
Constructor Details
#initialize(aws_access_key_id = nil, aws_secret_access_key = nil, params = {}) ⇒ EmrInterface
Create a new handle to a EMR service.
All handles share the same per process or per thread HTTP connection to EMR. Each handle is for a specific account. The params have the following options:
-
:endpoint_url
a fully qualified url to Amazon API endpoint (this overwrites: :server, :port, :service, :protocol). Example: ‘elasticmapreduce.amazonaws.com’ -
:server
: EMR service host, default: DEFAULT_HOST -
:port
: EMR service port, default: DEFAULT_PORT -
:protocol
: ‘http’ or ‘https’, default: DEFAULT_PROTOCOL -
:logger
: for log messages, default: RAILS_DEFAULT_LOGGER else STDOUT
emr = RightAws::EmrInterface.new('xxxxxxxxxxxxxxxxxxxxx','xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
{:logger => Logger.new('/tmp/x.log')}) #=> #<RightAws::EmrInterface::0xb7b3c30c>
97 98 99 100 101 102 103 104 105 106 107 |
# File 'lib/emr/right_emr_interface.rb', line 97 def initialize(aws_access_key_id=nil, aws_secret_access_key=nil, params={}) init({ :name => 'EMR', :default_host => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).host : DEFAULT_HOST, :default_port => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).port : DEFAULT_PORT, :default_service => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).path : DEFAULT_PATH, :default_protocol => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).scheme : DEFAULT_PROTOCOL, :default_api_version => ENV['EMR_API_VERSION'] || API_VERSION }, aws_access_key_id || ENV['AWS_ACCESS_KEY_ID'] , aws_secret_access_key|| ENV['AWS_SECRET_ACCESS_KEY'], params) end |
Class Method Details
.bench_service ⇒ Object
76 77 78 |
# File 'lib/emr/right_emr_interface.rb', line 76 def self.bench_service @@bench.service end |
.bench_xml ⇒ Object
73 74 75 |
# File 'lib/emr/right_emr_interface.rb', line 73 def self.bench_xml @@bench.xml end |
Instance Method Details
#add_instance_groups(job_flow_id, *instance_groups) ⇒ Object
Adds instance groups to a running job flow.
Instance group configuration options are the same as the ones accepted by run_job_flow.
Only task instance groups may be added at runtime. Instance groups cannot be added to job flows that have only a master instance (i.e. 1 instance in total).
emr.add_instance_groups('j-2QE0KHA1LP4GS', {
:bid_price => '0.1',
:instance_count => '2',
:instance_role => 'TASK',
:instance_type => 'm1.small',
:market => 'SPOT',
:name => 'core group',
}) #=> true
459 460 461 462 463 464 465 466 |
# File 'lib/emr/right_emr_interface.rb', line 459 def add_instance_groups(job_flow_id, *instance_groups) request_hash = amazonize_instance_groups(instance_groups, 'InstanceGroups') request_hash['JobFlowId'] = job_flow_id link = generate_request("AddInstanceGroups", request_hash) request_info(link, AddInstanceGroupsParser.new(:logger => @logger)) rescue on_exception end |
#add_job_flow_steps(job_flow_id, *steps) ⇒ Object
Adds steps to a running job flow.
A maximum of 256 steps are allowed in a job flow. Steps can only be added to job flows that are starting, bootstrapping, running or waiting.
Step configuration options are the same as the ones accepted by run_job_flow.
emr.add_job_flow_steps('j-2QE0KHA1LP4GS', {
:name => 'step 1',
:jar => 's3n://bucket/path/to/code.jar',
:main_class => 'com.foobar.emr.Step1',
:args => ['arg', 'arg'],
:properties => {
'property' => 'value',
},
:action_on_failure => 'TERMINATE_JOB_FLOW',
}) #=> true
428 429 430 431 432 433 434 435 |
# File 'lib/emr/right_emr_interface.rb', line 428 def add_job_flow_steps(job_flow_id, *steps) request_hash = amazonize_steps(steps) request_hash['JobFlowId'] = job_flow_id link = generate_request("AddJobFlowSteps", request_hash) request_info(link, RightHttp2xxParser.new(:logger => @logger)) rescue on_exception end |
#describe_job_flows(*job_flow_ids_and_options) ⇒ Object
Returns a list of job flows that match all of supplied parameters.
Without parameters, returns job flows started in the last two weeks or running job flows started in the last two months.
Regardless of parameters, only jobs started in the last two months are returned.
# default list:
emr.describe_job_flows #=> [
{:keep_job_flow_alive_when_no_steps=>false,
:log_uri=>"s3n://bucket/path/to/logs",
:master_instance_type=>"m1.small",
:availability_zone=>"us-east-1d",
:last_state_change_reason=>"Steps completed",
:termination_protected=>false,
:master_instance_id=>"i-1fe51278",
:instance_count=>1,
:ready_date_time=>"2011-08-31T18:58:58Z",
:bootstrap_actions=>[],
:master_public_dns_name=>"ec2-184-78-29-127.compute-1.amazonaws.com",
:instance_groups=>
[{:instance_request_count=>1,
:last_state_change_reason=>"Job flow terminated",
:instance_role=>"MASTER",
:ready_date_time=>"2011-08-31T18:58:56Z",
:instance_running_count=>0,
:start_date_time=>"2011-08-31T18:58:19Z",
:market=>"ON_DEMAND",
:creation_date_time=>"2011-08-31T18:55:36Z",
:name=>"master",
:instance_group_id=>"ig-1D91GQR7A9H2K",
:state=>"ENDED",
:instance_type=>"m1.small",
:end_date_time=>"2011-08-31T19:01:09Z"}],
:start_date_time=>"2011-08-31T18:58:58Z",
:steps=>
[{:jar=>"s3n://bucket/path/to/code.jar",
:main_class=>"com.foobar.emr.Step1",
:start_date_time=>"2011-08-31T18:58:58Z",
:properties=>{},
:args=>[],
:creation_date_time=>"2011-08-31T18:55:36Z",
:action_on_failure=>"TERMINATE_JOB_FLOW",
:name=>"step 1",
:state=>"COMPLETED",
:end_date_time=>"2011-08-31T19:00:34Z"}],
:normalized_instance_hours=>1,
:ami_version=>"1.0",
:creation_date_time=>"2011-08-31T18:55:36Z",
:name=>"jobflow 1",
:hadoop_version=>"0.18",
:job_flow_id=>"j-9K18HM82Q0AE7",
:state=>"COMPLETED",
:end_date_time=>"2011-08-31T19:01:09Z"}]
# describe specific job flows:
emr.describe_job_flows('j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS') #=> [...]
# specify parameters:
emr.describe_job_flows(
:created_after => Time.now - 86400,
:created_before => Time.now - 3600,
:job_flow_ids => ['j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS'],
:job_flow_states => ['RUNNING']
) #=> [...]
# combined job flow list and parameters syntax:
emr.describe_job_flows('j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS',
:job_flow_states => ['RUNNING']
) #=> [...]
326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 |
# File 'lib/emr/right_emr_interface.rb', line 326 def describe_job_flows(*) job_flow_ids, = AwsUtils::split_items_and_params() # merge job flow ids passed in as arguments and in options unless job_flow_ids.empty? # do not modify passed in options = .dup if = [:job_flow_ids] # allow the same ids to be passed in either location; # remove duplicates [:job_flow_ids] = ( + job_flow_ids).uniq else [:job_flow_ids] = job_flow_ids end end request_hash = {} unless (job_flow_ids = [:job_flow_ids]).right_blank? request_hash.update(amazonize_list("JobFlowIds.member", job_flow_ids)) end unless (job_flow_states = [:job_flow_states]).right_blank? request_hash = amazonize_list("JobFlowStates.member", job_flow_states) end request_hash['CreatedAfter'] = AwsUtils::utc_iso8601([:created_after]) unless [:created_after].right_blank? request_hash['CreatedBefore'] = AwsUtils::utc_iso8601([:created_before]) unless [:created_before].right_blank? link = generate_request("DescribeJobFlows", request_hash) request_cache_or_info(:describe_job_flows, link, DescribeJobFlowsParser, @@bench, nil) rescue on_exception end |
#generate_request(action, params = {}) ⇒ Object
:nodoc:
109 110 111 |
# File 'lib/emr/right_emr_interface.rb', line 109 def generate_request(action, params={}) #:nodoc: generate_request_impl(:get, action, params ) end |
#modify_instance_groups(*args) ⇒ Object
Modifies instance groups.
The only modifiable parameter is instance count.
An instance group may only be modified when the job flow is running or waiting. Additionally, hadoop 0.20 is required to resize job flows.
# general syntax
emr.modify_instance_groups(
{:instance_group_id => 'ig-P2OPM2L9ZQ4P', :instance_count => 5},
{:instance_group_id => 'ig-J82ML0M94A7E', :instance_count => 1}
) #=> true
# shortcut syntax
emr.modify_instance_groups('ig-P2OPM2L9ZQ4P', 5) #=> true
Shortcut syntax supports modifying only one instance group at a time.
491 492 493 494 495 496 497 498 499 500 501 502 503 |
# File 'lib/emr/right_emr_interface.rb', line 491 def modify_instance_groups(*args) unless args.first.is_a?(Hash) if args.length != 2 raise ArgumentError, "Must be given two arguments if arguments are not hashes" end args = [{:instance_group_id => args.first, :instance_count => args.last}] end request_hash = amazonize_list_with_key_mapping('InstanceGroups.member', MODIFY_INSTANCE_GROUP_KEY_MAPPINGS, args) link = generate_request("ModifyInstanceGroups", request_hash) request_info(link, RightHttp2xxParser.new(:logger => @logger)) rescue on_exception end |
#request_info(request, parser) ⇒ Object
Sends request to Amazon and parses the response Raises AwsError if any banana happened
115 116 117 |
# File 'lib/emr/right_emr_interface.rb', line 115 def request_info(request, parser) #:nodoc: request_info_impl(:emr_connection, @@bench, request, parser) end |
#run_job_flow(options = {}) ⇒ Object
Creates and starts running a new job flow.
The job flow will run the steps specified and terminate (unless keep alive option is set).
A maximum of 256 steps are allowed in a job flow.
At least the name, instance types, instance count and one step must be specified.
# simple usage:
emr.run_job_flow(
:name => 'job flow 1',
:master_instance_type => 'm1.large',
:slave_instance_type => 'm1.large',
:instance_count => 5,
:log_uri => 's3n://bucket/path/to/logs',
:steps => [{
:name => 'step 1',
:jar => 's3n://bucket/path/to/code.jar',
:main_class => 'com.foobar.emr.Step1',
:args => ['arg', 'arg'],
}]) #=> "j-9K18HM82Q0AE7"
# advanced usage:
emr.run_job_flow(
:name => 'job flow 1',
:ec2_key_name => 'gsg-keypair',
:hadoop_version => '0.20',
:instance_groups => [{
:bid_price => '0.1',
:instance_count => '1',
:instance_role => 'MASTER',
:instance_type => 'm1.small',
:market => 'SPOT',
:name => 'master group',
}, {
:bid_price => '0.1',
:instance_count => '2',
:instance_role => 'CORE',
:instance_type => 'm1.small',
:market => 'SPOT',
:name => 'core group',
}, {
:bid_price => '0.1',
:instance_count => '2',
:instance_role => 'TASK',
:instance_type => 'm1.small',
:market => 'SPOT',
:name => 'task group',
}],
:keep_job_flow_alive_when_no_steps => true,
:availability_zone => 'us-east-1a',
:termination_protected => true,
:log_uri => 's3n://bucket/path/to/logs',
:steps => [{
:name => 'step 1',
:jar => 's3n://bucket/path/to/code.jar',
:main_class => 'com.foobar.emr.Step1',
:args => ['arg', 'arg'],
:properties => {
'property' => 'value',
},
:action_on_failure => 'TERMINATE_JOB_FLOW',
}],
:additional_info => '',
:bootstrap_actions => [{
:name => 'bootstrap action 1',
:path => 's3n://bucket/path/to/bootstrap',
:args => ['hello', 'world'],
}],
) #=> "j-9K18HM82Q0AE7"
243 244 245 246 247 248 249 250 251 252 |
# File 'lib/emr/right_emr_interface.rb', line 243 def run_job_flow(={}) request_hash = amazonize_run_job_flow() request_hash.update(amazonize_bootstrap_actions([:bootstrap_actions])) request_hash.update(amazonize_instance_groups([:instance_groups])) request_hash.update(amazonize_steps([:steps])) link = generate_request("RunJobFlow", request_hash) request_info(link, RunJobFlowParser.new(:logger => @logger)) rescue on_exception end |
#set_termination_protection(*job_flow_ids_and_options) ⇒ Object
Locks a job flow so the EC2 instances in the cluster cannot be terminated by user intervention, an API call, or in the event of a job flow error. Cluster will still terminate upon successful completion of the job flow.
emr.set_termination_protection(
'j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS', :termination_protected => true
) #=> true
Protection can be enabled using the shortcut syntax:
emr.set_termination_protection('j-9K18HM82Q0AE7') #=> true
379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 |
# File 'lib/emr/right_emr_interface.rb', line 379 def set_termination_protection(*) job_flow_ids, = AwsUtils::split_items_and_params() request_hash = amazonize_list('JobFlowIds.member', job_flow_ids) request_hash['TerminationProtected'] = case value = [:termination_protected] when true 'true' when false 'false' when nil # if :termination_protected => nil was given, then unprotect; # if no :termination_protected option was given, protect if .has_key?(:termination_protected) 'false' else 'true' end else # pass value through value end link = generate_request("SetTerminationProtection", request_hash) request_info(link, RightHttp2xxParser.new(:logger => @logger)) rescue on_exception end |
#terminate_job_flows(*job_flow_ids) ⇒ Object
Terminates specified job flows.
emr.terminate_job_flows('j-9K18HM82Q0AE7') #=> true
359 360 361 362 363 364 |
# File 'lib/emr/right_emr_interface.rb', line 359 def terminate_job_flows(*job_flow_ids) link = generate_request("TerminateJobFlows", amazonize_list('JobFlowIds.member', job_flow_ids)) request_info(link, RightHttp2xxParser.new(:logger => @logger)) rescue on_exception end |