Module: Cascading::Operations
- Included in:
- Assembly
- Defined in:
- lib/cascading/operations.rb
Instance Method Summary
-
#debug(options = {}) ⇒ Object
Debugs the current assembly at runtime, printing every tuple and fields every 10 tuples by default.
-
#insert(insert_map) ⇒ Object
Inserts new fields into the current assembly.
-
#null_indicator(input_field, into_field, options = {}) ⇒ Object
Efficient way of inserting a null indicator for any field, even one that cannot be coerced to a string.
-
#regex_contains(input_field, regex, into_field, options = {}) ⇒ Object
Given an input_field and a regex, returns an indicator that is 1 if the string contains at least 1 match and 0 otherwise.
-
#set_value(input_fields, filter, keep_value, remove_value, into_field, options = {}) ⇒ Object
Inserts one of two values into the dataflow based upon the result of the supplied filter on the input_fields.
-
#ungroup(key, into_fields, options = {}) ⇒ Object
Ungroups, or unpivots, a tuple (see Cascading’s UnGroup).
Instance Method Details
#debug(options = {}) ⇒ Object
Debugs the current assembly at runtime, printing every tuple and fields every 10 tuples by default.
The named options are:
- prefix
-
String to prefix prints with.
- print_fields
-
Boolean controlling field printing, defaults to false.
- tuple_interval
-
Integer specifying the interval between printed tuples, defaults to 1.
- fields_interval
-
Integer specifying the interval between printed fields, defaults to 10.
Example:
debug :prefix => 'DEBUG', :print_fields => true, :fields_interval => 1000
# File 'lib/cascading/operations.rb', line 14
def debug(options = {})
  input_fields = options[:input] || all_fields
  prefix = options[:prefix]
  print_fields = options[:print_fields]
  debug = Java::CascadingOperation::Debug.new(*[prefix, print_fields].compact)
  debug.print_tuple_every = options[:tuple_interval] || 1
  debug.print_fields_every = options[:fields_interval] || 10
  each(input_fields, :filter => debug)
end
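The interval options can be illustrated without a running flow. The following is a plain-Ruby sketch only: debug_lines is a hypothetical helper, not part of cascading.jruby, and it assumes that counting starts at the first tuple and that fields are printed before the tuple they coincide with. Cascading's Debug may order or format its output differently.

```ruby
# Hypothetical helper: models only :prefix, :tuple_interval and
# :fields_interval. It does not call Cascading's Debug operation.
def debug_lines(field_names, tuples, options = {})
  prefix = options[:prefix]
  tuple_interval = options[:tuple_interval] || 1     # same default as debug
  fields_interval = options[:fields_interval] || 10  # same default as debug

  lines = []
  tuples.each_with_index do |tuple, index|
    # Assumption: the first tuple counts as position 0 for both intervals.
    lines << [prefix, field_names.inspect].compact.join(': ') if (index % fields_interval).zero?
    lines << [prefix, tuple.inspect].compact.join(': ') if (index % tuple_interval).zero?
  end
  lines
end

lines = debug_lines(%w[id name], [[1, 'a'], [2, 'b'], [3, 'c']],
                    :prefix => 'DEBUG', :fields_interval => 2)
puts lines
```

With :fields_interval => 2 the sketch emits the field names alongside the first and third tuples, and every tuple is emitted because :tuple_interval falls back to 1.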
#insert(insert_map) ⇒ Object
Inserts new fields into the current assembly. Values may be constants or expressions (see Cascading::expr). Fields will be inserted in lexicographic order (not necessarily the order provided).
Example:
insert 'field1' => 'constant_string', 'field2' => 0, 'field3' => expr('fieldA:long + fieldB:long')
# File 'lib/cascading/operations.rb', line 33
def insert(insert_map)
  insert_map.keys.sort.each do |field_name|
    value = insert_map[field_name]
    if value.kind_of?(ExprStub)
      value.validate_scope(scope)
      names, types = value.names_and_types
      each(
        all_fields,
        :function => Java::CascadingOperationExpression::ExpressionFunction.new(fields(field_name), value.expression, names, types),
        :output => all_fields
      )
    else # value is a constant
      each(
        all_fields,
        :function => Java::CascadingOperation::Insert.new(fields(field_name), to_java_comparable_array([value])),
        :output => all_fields
      )
    end
  end
end
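The lexicographic ordering comes from sorting the keys of insert_map before iterating. A plain-Ruby sketch of the effect for constant values follows. simulate_insert is a hypothetical helper, not part of cascading.jruby, and it assumes that each inserted field is appended after the existing fields.

```ruby
# Hypothetical helper: models a tuple as two parallel arrays and appends
# constants in sorted-key order, mirroring insert_map.keys.sort in insert.
def simulate_insert(field_names, values, insert_map)
  insert_map.keys.sort.each do |field_name|
    field_names += [field_name]         # builds a new array; the caller's is untouched
    values += [insert_map[field_name]]
  end
  [field_names, values]
end

# 'zeta' is listed first but 'alpha' sorts first.
names, values = simulate_insert(['id'], [7], { 'zeta' => 0, 'alpha' => 'a' })
puts names.inspect
puts values.inspect
```

Because the keys are sorted with Array#sort, they must be mutually comparable; a map that mixes String and Symbol keys would raise in this sketch.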
#null_indicator(input_field, into_field, options = {}) ⇒ Object
Efficient way of inserting a null indicator for any field, even one that cannot be coerced to a string. This is accomplished using Cascading’s FilterNull and SetValue operators rather than Janino. 1 is produced if the field is null and 0 otherwise.
Example:
null_indicator 'field1', 'is_field1_null'
# File 'lib/cascading/operations.rb', line 102
def null_indicator(input_field, into_field, options = {})
  set_value input_field, Java::CascadingOperationFilter::FilterNull.new, 1.to_java, 0.to_java, into_field, :output => options[:output]
end
#regex_contains(input_field, regex, into_field, options = {}) ⇒ Object
Given an input_field and a regex, returns an indicator that is 1 if the string contains at least 1 match and 0 otherwise.
Example:
regex_contains 'field1', /\w+\s+\w+/, 'does_field1_contain_pair'
# File 'lib/cascading/operations.rb', line 111
def regex_contains(input_field, regex, into_field, options = {})
  set_value input_field, Java::CascadingOperationRegex::RegexFilter.new(regex.to_s), 1.to_java, 0.to_java, into_field, :output => options[:output]
end
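"Contains" here means the regex may match anywhere in the string; it does not have to match the whole value. The plain-Ruby sketch below illustrates that reading of the description. regex_indicator is a hypothetical helper that uses Ruby's own regex engine, not Cascading's RegexFilter, so engine-specific syntax may behave differently in a real flow.

```ruby
# Hypothetical helper: 1 if the string contains at least one match of the
# regex, 0 otherwise. Evaluated by Ruby, not by Cascading.
def regex_indicator(value, regex)
  (value.to_s =~ regex) ? 1 : 0
end

pair = /\w+\s+\w+/
puts regex_indicator('  say hello  ', pair)  # a word pair occurs inside the string
puts regex_indicator('hello', pair)          # a single word, so no pair
```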
#set_value(input_fields, filter, keep_value, remove_value, into_field, options = {}) ⇒ Object
Inserts one of two values into the dataflow based upon the result of the supplied filter on the input_fields. This is primarily useful for creating indicators from filters. keep_value specifies the Java value to produce when the filter would keep the given input and remove_value specifies the Java value to produce when the filter would remove the given input.
Example:
set_value 'field1', Java::CascadingOperationFilter::FilterNull.new, 1.to_java, 0.to_java, 'is_field1_null'
# File 'lib/cascading/operations.rb', line 90
def set_value(input_fields, filter, keep_value, remove_value, into_field, options = {})
  output = options[:output] || all_fields
  each input_fields, :function => Java::CascadingOperationFunction::SetValue.new(fields(into_field), filter, keep_value, remove_value), :output => output
end
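The keep/remove selection described above can be sketched in plain Ruby. choose_value and removes_negatives are hypothetical names used only for illustration: the lambda stands in for a Cascading Filter and returns true when the filter would remove the input. Nothing here calls Cascading's SetValue.

```ruby
# Hypothetical helper: returns keep_value when the filter would keep the
# input and remove_value when the filter would remove it.
def choose_value(input, would_remove, keep_value, remove_value)
  would_remove.call(input) ? remove_value : keep_value
end

# Stand-in filter (an assumption for this example): removes negative numbers.
removes_negatives = lambda { |number| number < 0 }

indicators = [5, -3, 0].map { |n| choose_value(n, removes_negatives, 1, 0) }
puts indicators.inspect
```

Passing 1 and 0 turns any filter into an indicator field; other pairs of values, such as two labels, work the same way.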
#ungroup(key, into_fields, options = {}) ⇒ Object
Ungroups, or unpivots, a tuple (see Cascading’s UnGroup).
You must provide exactly one of :value_selectors and :num_values.
The named options are:
- value_selectors
-
Array of field names to ungroup. Each field will be ungrouped into an output tuple along with the key fields in the order provided.
- num_values
-
Integer specifying the number of fields to ungroup into each output tuple (excluding the key fields). All input fields will be ungrouped.
Example:
ungroup 'key', ['new_key', 'val'], :value_selectors => ['val1', 'val2', 'val3'], :output => ['new_key', 'val']
# File 'lib/cascading/operations.rb', line 69
def ungroup(key, into_fields, options = {})
  input_fields = options[:input] || all_fields
  output = options[:output] || all_fields

  raise 'You must provide exactly one of :value_selectors or :num_values to ungroup' unless options.has_key?(:value_selectors) ^ options.has_key?(:num_values)
  value_selectors = options[:value_selectors].map{ |vs| fields(vs) }.to_java(Java::CascadingTuple::Fields) if options.has_key?(:value_selectors)
  num_values = options[:num_values] if options.has_key?(:num_values)

  parameters = [fields(into_fields), fields(key), value_selectors, num_values].compact
  each input_fields, :function => Java::CascadingOperationFunction::UnGroup.new(*parameters), :output => output
end
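To make the unpivot concrete, here is a plain-Ruby sketch of what ungrouping a single row might produce. ungroup_sketch is a hypothetical helper that models a tuple as a Hash; it follows the option descriptions above and does not call Cascading's UnGroup, so details such as handling of multiple key fields are assumptions.

```ruby
# Hypothetical helper: unpivots one row (a Hash of field name => value)
# into several rows named by into_fields.
def ungroup_sketch(row, key, into_fields, options = {})
  unless options.key?(:value_selectors) ^ options.key?(:num_values)
    raise ArgumentError, 'provide exactly one of :value_selectors or :num_values'
  end

  value_groups =
    if options.key?(:value_selectors)
      # One output row per selector; a selector may name one or more fields.
      options[:value_selectors].map { |selector| Array(selector).map { |name| row.fetch(name) } }
    else
      # All non-key fields, taken num_values at a time in field order.
      row.reject { |name, _| name == key }.values.each_slice(options[:num_values]).to_a
    end

  value_groups.map { |group| Hash[into_fields.zip([row.fetch(key)] + group)] }
end

row = { 'key' => 'k1', 'val1' => 1, 'val2' => 2, 'val3' => 3 }
by_selector = ungroup_sketch(row, 'key', ['new_key', 'val'],
                             :value_selectors => ['val1', 'val2', 'val3'])
by_count = ungroup_sketch(row, 'key', ['new_key', 'val'], :num_values => 1)
puts by_selector.inspect
```

In this sketch both calls yield three rows, one per value, each paired with the key. Supplying both options, or neither, raises, mirroring the check in ungroup.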