Module: Cascading::Operations

Included in:
Assembly
Defined in:
lib/cascading/operations.rb

Instance Method Summary collapse

Instance Method Details

#debug(options = {}) ⇒ Object

Debugs the current assembly at runtime, printing every tuple and fields every 10 tuples by default.

The named options are:

prefix

String to prefix prints with.

print_fields

Boolean controlling field printing, defaults to false.

tuple_interval

Integer specifying interval between printed tuples

fields_interval

Integer specifying interval between printing fields

Example:

debug :prefix => 'DEBUG', :print_fields => true, :fields_interval => 1000


14
15
16
17
18
19
20
21
22
23
24
25
# File 'lib/cascading/operations.rb', line 14

def debug(options = {})
  input_fields = options[:input] || all_fields
  prefix = options[:prefix]
  print_fields = options[:print_fields]

  debug = Java::CascadingOperation::Debug.new(*[prefix, print_fields].compact)

  debug.print_tuple_every = options[:tuple_interval] || 1
  debug.print_fields_every = options[:fields_interval] || 10

  each(input_fields, :filter => debug)
end

#insert(insert_map) ⇒ Object

Inserts new fields into the current assembly. Values may be constants or expressions (see Cascading::expr). Fields will be inserted in lexicographic order (not necessarily the order provided).

Example:

insert 'field1' => 'constant_string', 'field2' => 0, 'field3' => expr('fieldA:long + fieldB:long')


33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# File 'lib/cascading/operations.rb', line 33

def insert(insert_map)
  insert_map.keys.sort.each do |field_name|
    value = insert_map[field_name]

    if value.kind_of?(ExprStub)
      value.validate_scope(scope)
      names, types = value.names_and_types
      each(
        all_fields,
        :function => Java::CascadingOperationExpression::ExpressionFunction.new(fields(field_name), value.expression, names, types),
        :output => all_fields
      )
    else # value is a constant
      each(
        all_fields,
        :function => Java::CascadingOperation::Insert.new(fields(field_name), to_java_comparable_array([value])),
        :output => all_fields
      )
    end
  end
end

#null_indicator(input_field, into_field, options = {}) ⇒ Object

Efficient way of inserting a null indicator for any field, even one that cannot be coerced to a string. This is accomplished using Cascading’s FilterNull and SetValue operators rather than Janino. 1 is produced if the field is null and 0 otherwise.

Example:

null_indicator 'field1', 'is_field1_null'


102
103
104
# File 'lib/cascading/operations.rb', line 102

def null_indicator(input_field, into_field, options = {})
  set_value input_field, Java::CascadingOperationFilter::FilterNull.new, 1.to_java, 0.to_java, into_field, :output => options[:output]
end

#regex_contains(input_field, regex, into_field, options = {}) ⇒ Object

Given an input_field and a regex, returns an indicator that is 1 if the string contains at least 1 match and 0 otherwise.

Example:

regex_contains 'field1', /\w+\s+\w+/, 'does_field1_contain_pair'


111
112
113
# File 'lib/cascading/operations.rb', line 111

def regex_contains(input_field, regex, into_field, options = {})
  set_value input_field, Java::CascadingOperationRegex::RegexFilter.new(pattern.to_s), 1.to_java, 0.to_java, into_field, :output => options[:output]
end

#set_value(input_fields, filter, keep_value, remove_value, into_field, options = {}) ⇒ Object

Inserts one of two values into the dataflow based upon the result of the supplied filter on the input_fields. This is primarily useful for creating indicators from filters. keep_value specifies the Java value to produce when the filter would keep the given input and remove_value specifies the Java value to produce when the filter would remove the given input.

Example:

set_value 'field1', Java::CascadingOperationFilter::FilterNull.new, 1.to_java, 0.to_java, 'is_field1_null'


90
91
92
93
# File 'lib/cascading/operations.rb', line 90

def set_value(input_fields, filter, keep_value, remove_value, into_field, options = {})
  output = options[:output] || all_fields
  each input_fields, :function => Java::CascadingOperationFunction::SetValue.new(fields(into_field), filter, keep_value, remove_value), :output => output
end

#ungroup(key, into_fields, options = {}) ⇒ Object

Ungroups, or unpivots, a tuple (see Cascading’s UnGroup).

You must provide exactly one of :value_selectors and :num_values.

The named options are:

value_selectors

Array of field names to ungroup. Each field will be ungrouped into an output tuple along with the key fields in the order provided.

num_values

Integer specifying the number of fields to ungroup into each output tuple (excluding the key fields). All input fields will be ungrouped.

Example:

ungroup 'key', ['new_key', 'val], :value_selectors => ['val1', 'val2', 'val3'], :output => ['new_key', 'val']


69
70
71
72
73
74
75
76
77
78
79
# File 'lib/cascading/operations.rb', line 69

def ungroup(key, into_fields, options = {})
  input_fields = options[:input] || all_fields
  output = options[:output] || all_fields

  raise 'You must provide exactly one of :value_selectors or :num_values to ungroup' unless options.has_key?(:value_selectors) ^ options.has_key?(:num_values)
  value_selectors = options[:value_selectors].map{ |vs| fields(vs) }.to_java(Java::CascadingTuple::Fields) if options.has_key?(:value_selectors)
  num_values = options[:num_values] if options.has_key?(:num_values)

  parameters = [fields(into_fields), fields(key), value_selectors, num_values].compact
  each input_fields, :function => Java::CascadingOperationFunction::UnGroup.new(*parameters), :output => output
end