Module: Cascading::FilterOperations
- Included in: Assembly
- Defined in: lib/cascading/filter_operations.rb
Overview
Module of filtering operations. Unlike some of the other functional operations modules, this one does not just wrap operations defined by Cascading in cascading.operation.filter. Instead, it provides some useful high-level DSL pipes which map many Cascading operations into a smaller number of DSL statements.
Still, some are direct wrappers:
- filter_null
- filter_not_null
Instance Method Summary
- #filter(options = {}) ⇒ Object
  Filter the current assembly based on an expression or regex, but not both.
- #filter_not_null(*input_fields) ⇒ Object (also: #where_null)
  Rejects tuples from the current assembly if any input field is not null.
- #filter_null(*input_fields) ⇒ Object (also: #reject_null)
  Rejects tuples from the current assembly if any input field is null.
- #reject(expression, options = {}) ⇒ Object
  Rejects tuples from the current assembly based on a Janino expression.
- #where(expression, options = {}) ⇒ Object
  Keeps tuples from the current assembly based on a Janino expression.
Instance Method Details
#filter(options = {}) ⇒ Object
Filter the current assembly based on an expression or regex, but not both. If both are supplied, the expression is used and the regex is ignored.
The named options are:
- input: Fields to filter on. Defaults to all fields.
- expression: A Janino expression used to filter. Has access to all :input fields.
- validate: Boolean passed to Cascading#expr to enable or disable expression validation. Defaults to true.
- validate_with: Hash mapping field names to actual arguments used by Cascading#expr for expression validation. Defaults to {}.
- regex: A regular expression used to filter.
- remove_match: Boolean indicating whether tuples matching the regex are removed (true) or kept (false). Defaults to false, so matching tuples are kept and non-matching tuples are removed, which is a bit counterintuitive.
- match_each_element: Boolean indicating whether the regex is matched against each field individually (true) or against the entire incoming tuple joined with tabs (false). Defaults to false.
Example:
filter :input => 'field1', :regex => /\t/, :remove_match => true
filter :expression => 'field1:long > 0 && "".equals(field2:string)'
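The two regex modes and the effect of :remove_match can be illustrated without Cascading. The following plain-Ruby sketch only models the behavior described in the option list; it is not the RegexFilter implementation. The helper name regex_keep? is hypothetical, and treating a tuple as matched when any single field matches in :match_each_element mode is an assumption.

```ruby
# Plain-Ruby model of the :regex options; not Cascading code.
# regex_keep? is a hypothetical helper name.
def regex_keep?(tuple, regex, remove_match: false, match_each_element: false)
  matched =
    if match_each_element
      # Assumption: one matching field is enough to count as a match.
      tuple.any? { |value| value.to_s =~ regex }
    else
      # The whole tuple is joined with tabs before matching.
      tuple.join("\t") =~ regex
    end
  # remove_match false keeps matching tuples; true removes them.
  remove_match ? !matched : !!matched
end

# Single-field tuples, as in the first example above.
tuples = [["a\tb"], ["ab"]]
kept = tuples.select { |t| regex_keep?(t, /\t/, remove_match: true) }
p kept
```

With :remove_match set to true, only the tuple whose field contains no tab survives in this model.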
# File 'lib/cascading/filter_operations.rb', line 31
def filter(options = {})
  input_fields = options[:input] || all_fields
  expression = options[:expression]
  regex = options[:regex]
  if expression
    validate = options.has_key?(:validate) ? options[:validate] : true
    validate_with = options[:validate_with] || {}
    stub = expr(expression, { :validate => validate, :validate_with => validate_with })
    stub.validate_scope(scope)
    names, types = stub.names_and_types
    each input_fields, :filter => Java::CascadingOperationExpression::ExpressionFilter.new(
      stub.expression,
      names,
      types
    )
  elsif regex
    parameters = [regex.to_s, options[:remove_match], options[:match_each_element]].compact
    each input_fields, :filter => Java::CascadingOperationRegex::RegexFilter.new(*parameters)
  else
    raise 'filter requires one of :expression or :regex'
  end
end
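To make the option handling in the listing above easier to follow, here is a plain-Ruby sketch that mirrors only the branching and defaults of #filter and returns a description instead of building a Cascading filter. The method name describe_filter and the layout of the returned hash are hypothetical.

```ruby
# Mirrors the option handling of #filter without touching Cascading.
# describe_filter and its return format are hypothetical.
def describe_filter(options = {})
  if options[:expression]
    {
      :kind => :expression,
      :expression => options[:expression],
      :validate => options.has_key?(:validate) ? options[:validate] : true,
      :validate_with => options[:validate_with] || {}
    }
  elsif options[:regex]
    # compact drops nil entries, so an unset option shifts later arguments left.
    parameters = [options[:regex].to_s, options[:remove_match], options[:match_each_element]].compact
    { :kind => :regex, :parameters => parameters }
  else
    raise 'filter requires one of :expression or :regex'
  end
end

# The expression branch wins when both keys are present.
p describe_filter(:expression => 'x:long > 0', :regex => /\t/)
# Only two parameters remain when :remove_match is left unset.
p describe_filter(:regex => /\t/, :match_each_element => true)
```

In the second call only two positional parameters remain after compact, so a caller who sets :match_each_element may want to pass :remove_match explicitly as well. How a two-argument call is interpreted depends on the RegexFilter constructor, which this sketch does not model.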
#filter_not_null(*input_fields) ⇒ Object Also known as: where_null
Rejects tuples from the current assembly if any input field is not null.
Example:
filter_not_null 'field1', 'field2'
# File 'lib/cascading/filter_operations.rb', line 96
def filter_not_null(*input_fields)
  each(input_fields, :filter => Java::CascadingOperationFilter::FilterNotNull.new)
end
#filter_null(*input_fields) ⇒ Object Also known as: reject_null
Rejects tuples from the current assembly if any input field is null.
Example:
filter_null 'field1', 'field2'
# File 'lib/cascading/filter_operations.rb', line 87
def filter_null(*input_fields)
  each(input_fields, :filter => Java::CascadingOperationFilter::FilterNull.new)
end
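Because the two null filters are easy to confuse, the following plain-Ruby sketch models their documented semantics on an array of hashes. No Cascading classes are involved, and the helper names model_filter_null and model_filter_not_null are hypothetical.

```ruby
# Plain-Ruby model of the null filters; tuples are hashes here.
# The helper names are hypothetical.
def model_filter_null(tuples, *fields)
  # Reject a tuple if any listed field is null.
  tuples.reject { |t| fields.any? { |f| t[f].nil? } }
end

def model_filter_not_null(tuples, *fields)
  # Reject a tuple if any listed field is not null.
  tuples.reject { |t| fields.any? { |f| !t[f].nil? } }
end

tuples = [
  { 'field1' => 1,   'field2' => 'a' },
  { 'field1' => nil, 'field2' => 'b' },
  { 'field1' => nil, 'field2' => nil }
]
with_values = model_filter_null(tuples, 'field1', 'field2')
all_null = model_filter_not_null(tuples, 'field1', 'field2')
p with_values
p all_null
```

In this model filter_not_null keeps only tuples whose listed fields are all null, which is consistent with its where_null alias.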
#reject(expression, options = {}) ⇒ Object
Rejects tuples from the current assembly based on a Janino expression. This is just a wrapper for FilterOperations.filter.
Example:
reject 'field1:long > 0 && "".equals(field2:string)'
# File 'lib/cascading/filter_operations.rb', line 62
def reject(expression, options = {})
  options[:expression] = expression
  filter(options)
end
#where(expression, options = {}) ⇒ Object
Keeps tuples from the current assembly based on a Janino expression. This is a wrapper for FilterOperations.filter.
Note that this is accomplished by inverting the given expression. A best-effort attempt is made to support import statements preceding the expression: they are split off and left in place, and only the remainder is negated. If this support breaks for your expression, negate the expression yourself and use FilterOperations.reject.
Example:
where 'field1:long > 0 && "".equals(field2:string)'
# File 'lib/cascading/filter_operations.rb', line 77
def where(expression, options = {})
  _, imports, expr = expression.match(/^((?:\s*import.*;\s*)*)(.*)$/).to_a
  options[:expression] = "#{imports}!(#{expr})"
  filter(options)
end
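The inversion performed by #where can be tried in isolation. The sketch below reuses the regex and string interpolation from the listing above in a standalone helper; the name invert_expression is hypothetical and is not part of the module.

```ruby
# Same regex and interpolation as #where, extracted into a hypothetical helper.
def invert_expression(expression)
  # Group 1 captures leading import statements, group 2 the rest of the line.
  _, imports, expr = expression.match(/^((?:\s*import.*;\s*)*)(.*)$/).to_a
  "#{imports}!(#{expr})"
end

plain = invert_expression('field1:long > 0')
with_import = invert_expression('import java.util.Date; field1:long > 0')
puts plain
puts with_import
```

Because the import pattern is greedy, an expression that starts with an import and also contains a semicolon later on may be split in the wrong place. That is the kind of case where negating the expression by hand and calling reject is the safer route.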