Class: Jinx::Migration::Filter

Inherits:
Object
Defined in:
lib/jinx/migration/filter.rb

Overview

Transforms input values to a result based on a migration filter configuration. Each configuration entry is one of the following:

* literal: literal
* regexp: literal
* regexp: template

The regexp template can include match references ($1, $2, etc.) corresponding to the regexp captures. If the input value equals a literal, then the mapped literal is returned. Otherwise, if the input value matches a regexp, then the mapped transformation of the first matching regexp is returned after reference substitution. Otherwise, the input value is returned unchanged, unless a catch-all regexp is configured (see below).

For example, the config:

/(\d{1,2})\/x\/(\d{1,2})/ : $1/15/$2
n/a : ~

converts the input value as follows:

3/12/02 => 3/12/02 (no match)
5/x/04 => 5/15/04
n/a => nil
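The lookup order above can be sketched in plain Ruby. This is a minimal illustration, not the Jinx implementation: it assumes the configuration has already been loaded into Ruby hashes (with YAML ~ read as nil and the regexp key precompiled), and the names LITERALS, PATTERNS and apply_filter are hypothetical.

```ruby
# Hypothetical, precompiled form of the example config above.
# YAML `~` is assumed to load as nil.
LITERALS = { 'n/a' => nil }.freeze
PATTERNS = { %r{(\d{1,2})/x/(\d{1,2})} => '$1/15/$2' }.freeze

# Applies the lookup order described above: literal, then first regexp,
# then pass-through. Not part of Jinx.
def apply_filter(value)
  # 1. An exact literal match wins, even when it maps to nil.
  return LITERALS[value] if LITERALS.key?(value)
  # 2. The first matching regexp wins; each $n is replaced by capture n.
  PATTERNS.each do |re, template|
    md = re.match(value)
    next unless md
    return template.gsub(/\$(\d)/) { md[Regexp.last_match(1).to_i] }
  end
  # 3. Anything else passes through unchanged.
  value
end

apply_filter('3/12/02') # => "3/12/02"
apply_filter('5/x/04')  # => "5/15/04"
apply_filter('n/a')     # => nil
```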

A catch-all /.*/ regexp transforms any value which matches neither a literal nor another regexp, e.g.:

/^(\d+(\.\d*)?)( g(ram)?s?)?$/ : $1 
/.*/ : 0

converts the input value as follows:

3 => 3
4.3 grams => 4.3
unknown => 0
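A sketch of the catch-all rule, again in plain Ruby with a hypothetical build_filter helper. It assumes an ordered Regexp => result hash and, following the Filter source, sets the /.*/ entry aside so that it is consulted only after every other regexp fails.

```ruby
# Hypothetical helper: builds a filter from an ordered Regexp => result hash,
# treating a /.*/ key as the catch-all, as the Filter source does.
def build_filter(patterns)
  catch_all_re = patterns.keys.find { |re| re.source == '.*' }
  others = patterns.reject { |re, _| re.equal?(catch_all_re) }
  lambda do |value|
    others.each do |re, template|
      md = re.match(value)
      next unless md
      # Only String results carry $n references to substitute.
      return template unless template.is_a?(String)
      return template.gsub(/\$(\d)/) { md[Regexp.last_match(1).to_i] }
    end
    # Nothing else matched: use the catch-all if there is one.
    catch_all_re ? patterns[catch_all_re] : value
  end
end

filter = build_filter({ /^(\d+(\.\d*)?)( g(ram)?s?)?$/ => '$1', /.*/ => 0 })
filter.call('3')         # => "3"
filter.call('4.3 grams') # => "4.3"
filter.call('unknown')   # => 0
```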

Constant Summary

REGEXP_PAT =

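The pattern to match a /pattern/options configuration key, capturing the regular expression source and the optional i, n or x option characters (regexp_hash accepts only i).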

/^\/(.*[^\\])\/([inx]+)?$/
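For illustration, the constant can be applied directly to configuration keys. In this sketch, a key such as /Golf/i splits into the regexp source and its option characters, while a literal key such as n/a does not match at all.

```ruby
# Copy of the constant shown above, used outside Jinx for illustration.
REGEXP_PAT = /^\/(.*[^\\])\/([inx]+)?$/

# A /pattern/options key splits into the pattern and the option characters.
pat, opt = REGEXP_PAT.match('/Golf/i').captures
# pat => "Golf", opt => "i"

# Keys that are not delimited by slashes are literals, so they do not match.
REGEXP_PAT.match('n/a') # => nil
```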


Constructor Details

#initialize(spec = nil) {|value| ... } ⇒ Filter

Builds the filter proc from the given specification or block. If both a specification and a block are given, then the block is applied before the specification.

Parameters:

  • spec ({String => Object}) (defaults to: nil)

    the filter configuration specification, a hash of literal or /regexp/options keys mapped to result values

Yields:

  • (value)

    converts the input field value into a caTissue property value

Yield Parameters:

  • value

    the CSV input value

Raises:

  • (ArgumentError)

    if neither a specification nor a block is given


# File 'lib/jinx/migration/filter.rb', line 39

def initialize(spec=nil, &block)
  @proc = spec ? to_proc(spec, &block) : block
  raise ArgumentError.new("Migration filter is missing both a specification and a block") if @proc.nil?
end
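The constructor contract (block first, then specification, and an ArgumentError when both are missing) can be illustrated with a small stand-in class. SketchFilter is hypothetical; to keep it short, its specification is a plain literal => result hash rather than a full filter configuration.

```ruby
# Hypothetical stand-in mirroring the Filter constructor contract. For
# brevity, the specification here is a plain literal => result Hash.
class SketchFilter
  def initialize(spec = nil, &block)
    @proc =
      if spec
        lambda do |value|
          # The block, if given, runs first; its result is then looked up.
          value = block.call(value) if block
          spec.fetch(value, value)
        end
      else
        block
      end
    raise ArgumentError, 'Migration filter is missing both a specification and a block' if @proc.nil?
  end

  def transform(value)
    @proc.call(value)
  end
end

SketchFilter.new({ 'N/A' => nil }) { |v| v.upcase }.transform('n/a') # => nil
SketchFilter.new { |v| v.strip }.transform(' x ')                   # => "x"
```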

Instance Method Details

#parse_regexp_value(value) ⇒ Object, <Integer> (private)

Returns the parsed (value, indexes).

Examples:

parse_regexp_value('Grade $2') #=> ['Grade %s', [1]]

Parameters:

  • value

    the value in the configuration regexp => value entry

Returns:

  • (Object, <Integer>)

    the parsed (value, indexes)

# File 'lib/jinx/migration/filter.rb', line 158

def parse_regexp_value(value)
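  # Only String values can contain match references of the form $n.
  return [value, Array::EMPTY_ARRAY] unless String === value && value =~ /\$\d/
  # Replace each $n match reference with a %s format reference.
  tmpl = value.gsub(/\$\d/, '%s')
  # Collect the zero-based capture index for each $n reference.
  ndxs = value.scan(/\$(\d)/).map { |matches| matches.first.to_i - 1 }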
  [tmpl, ndxs]
end
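A standalone sketch of the same parsing, assuming plain Ruby without the Jinx Array::EMPTY_ARRAY constant (the hypothetical NO_INDEXES frozen array stands in for it). It returns non-String values such as 1 untouched, since Object#=~ no longer exists as of Ruby 3.2.

```ruby
# Stand-in for Jinx's Array::EMPTY_ARRAY (hypothetical name).
NO_INDEXES = [].freeze

def parse_regexp_value(value)
  # Non-String values and strings without $n references are kept as-is.
  return [value, NO_INDEXES] unless value.is_a?(String) && value.match?(/\$\d/)
  # Each $n reference becomes a %s slot; $n maps to capture index n - 1.
  tmpl = value.gsub(/\$\d/, '%s')
  ndxs = value.scan(/\$(\d)/).map { |(digit)| digit.to_i - 1 }
  [tmpl, ndxs]
end

parse_regexp_value('Grade $2') # => ["Grade %s", [1]]
parse_regexp_value(1)          # => [1, []]
```

As in the original, only single-digit references ($1 through $9) are recognized.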

#regexp_hash(pat_hash) ⇒ {Regexp => (Object, <Integer>)} (private)

Parses the configuration pattern string => value hash into a regexp => value hash qualified by the match indexes used to substitute match captures into the hash value.

The pattern hash value can include match references ($1, $2, etc.). In that case, each reference is replaced by a %s format reference, which is filled with the corresponding match capture when an input value matches.

Examples:

regexp_hash({'/Golf/i' => 1}) #=> {/Golf/i => [1, []]}
regexp_hash({'/Hole (\d{1,2})/' => '$1'}) #=> {/Hole (\d{1,2})/ => ['%s', [0]]}

Parameters:

  • pat_hash ({String => Object})

    the string => value hash

Returns:

  • ({Regexp => (Object, <Integer>)})

    the corresponding regexp => (value, indexes) hash



# File 'lib/jinx/migration/filter.rb', line 126

def regexp_hash(pat_hash)
  # The Regexp => value hash is built from the pattern => value hash.
  reh = {}
  # Make a matcher for each regexp pattern.
  pat_hash.each do |k, v|
    # The /pattern/opts string is parsed to the pattern and options.
    pat, opt = REGEXP_PAT.match(k).captures
    # the catch-all matcher
    if pat == '.*' then
      @catch_all = v
      next
    end
    # Convert the regexp i option character to a Regexp initializer parameter.
    reopt = if opt then
      case opt
        when 'i' then Regexp::IGNORECASE
        else raise MigrationError.new("Migration value filter regular expression #{k} qualifier not supported: expected 'i', found '#{opt}'")
      end
    end
    # the Regexp object
    re = Regexp.new(pat, reopt)
    # Replace each $ match reference with a %s format reference.
    reh[re] = parse_regexp_value(v)
  end
  reh
end
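A standalone sketch of regexp_hash in plain Ruby. The name regexp_table and its return shape are hypothetical: instead of setting @catch_all, it records the catch-all under a :catch_all key only when a /.*/ entry exists, which keeps a nil catch-all distinguishable from none, and it raises ArgumentError where Jinx raises MigrationError.

```ruby
REGEXP_PAT = /^\/(.*[^\\])\/([inx]+)?$/

# Hypothetical standalone version of regexp_hash. Returns a Hash with
# :table (Regexp => [result, indexes]) and, only when a /.*/ entry exists,
# :catch_all.
def regexp_table(pat_hash)
  result = { table: {} }
  pat_hash.each do |key, value|
    pat, opt = REGEXP_PAT.match(key).captures
    if pat == '.*'
      result[:catch_all] = value
      next
    end
    # Like the Jinx code, only the i option is accepted (ArgumentError here
    # stands in for Jinx's MigrationError).
    raise ArgumentError, "unsupported regexp option #{opt.inspect} in #{key}" if opt && opt != 'i'
    re = Regexp.new(pat, opt ? Regexp::IGNORECASE : 0)
    result[:table][re] =
      if value.is_a?(String) && value.match?(/\$\d/)
        [value.gsub(/\$\d/, '%s'), value.scan(/\$(\d)/).map { |(d)| d.to_i - 1 }]
      else
        [value, []]
      end
  end
  result
end

regexp_table({ '/Golf/i' => 1, '/Hole (\d{1,2})/' => '$1', '/.*/' => 0 })
```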

#to_proc(spec = nil) {|value| ... } ⇒ Proc (private)

Builds the filter proc from the given specification.

If both a specification and a block are given, then the block is applied before the specification.

Parameters:

  • spec ({String => Object}) (defaults to: nil)

    the filter configuration specification, a hash of literal or /regexp/options keys mapped to result values

Yields:

  • (value)

    converts the input field value into a caTissue property value

Yield Parameters:

  • value

    the CSV input value

Returns:

  • (Proc)

    a proc which converts the input field value into a caTissue property value



# File 'lib/jinx/migration/filter.rb', line 64

def to_proc(spec=nil, &block)
  # Split the filter spec into a pattern => value hash and a straight value => value hash.
  ph, vh = spec.split { |k, v| k =~ REGEXP_PAT }
  # The Regexp => value hash is built from the pattern => value hash.
  reh = regexp_hash(ph)
  # The value proc. Pass the block along so that it is applied before the specification.
  value_proc(reh, vh, &block)
end
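Core Ruby's Hash has no split method, so the spec.split call above presumably relies on a Jinx extension. Assuming it partitions the entries by the block's result, a core-Ruby equivalent can be sketched with Enumerable#partition; split_spec is a hypothetical name.

```ruby
REGEXP_PAT = /^\/(.*[^\\])\/([inx]+)?$/

# Hypothetical helper: partitions a filter spec into a /pattern/ => value
# hash and a literal => value hash, preserving entry order.
def split_spec(spec)
  patterns, literals = spec.partition { |key, _| key.is_a?(String) && REGEXP_PAT.match?(key) }
  [patterns.to_h, literals.to_h]
end

ph, vh = split_spec({ '/(\d{1,2})\/x\/(\d{1,2})/' => '$1/15/$2', 'n/a' => nil })
```

Guarding with is_a?(String) keeps non-String literal keys, such as integers, on the literal side instead of raising.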

#transform(value) ⇒ Object

Returns the transformed result.

Parameters:

  • value (String)

    the input string

Returns:

  • the transformed result



# File 'lib/jinx/migration/filter.rb', line 46

def transform(value)
  @proc.call(value)
end

#value_proc(regexp_hash, value_hash) {|value| ... } ⇒ Proc (private)

Returns a proc which converts the input field value into a caTissue property value.

Parameters:

  • regexp_hash ({Regexp => (Object, <Integer>)})

    the regexp => (result, indexes) hash

  • value_hash ({String => Object})

    the value => result hash

Yields:

  • (value)

    converts the input field value into a caTissue property value

Yield Parameters:

  • value

    the CSV input value

Returns:

  • (Proc)

    a proc which converts the input field value into a caTissue property value



# File 'lib/jinx/migration/filter.rb', line 78

def value_proc(regexp_hash, value_hash)
  # The new proc matches preferentially on the literal value, then on the first matching regexp.
  # If neither a literal nor a regexp matches, then the catch-all value, if any, is returned;
  # otherwise, the value is preserved.
  Proc.new do |value|
    value = yield(value) if block_given?
    if value_hash.has_key?(value) then
      value_hash[value]
    else
      # The first regex which matches the value.
      regexp = regexp_hash.detect_key { |re| value =~ re }
      # If there is a match, then apply the filter to the match data.
      # Otherwise, pass the value through unmodified.
      if regexp then
        reval, ndxs = regexp_hash[regexp]
        if ndxs.empty? or not String === reval then
          reval
        else
          # The match captures (cpts[i - 1] is $i match).
          cpts = $~.captures
          # Substitute the capture specified in the configuration for the corresponding
          # template variable, e.g. the value filter:
          #   /(Grade )?(\d)/ : $2
          # is parsed as (reval, ndxs) = ('%s', [1])
          # and transforms 'Grade 3' to cpts[1], or '3'.
          fmtd = reval % ndxs.map { |i| cpts[i] }
          fmtd unless fmtd.blank?
        end
      elsif defined? @catch_all then
        @catch_all
      else
        value
      end
    end
  end
end
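The matching logic of value_proc can be sketched as a standalone lambda. This is an approximation under stated assumptions: make_value_proc is hypothetical, the catch-all is passed in as a one-element array (or nil) instead of being read from @catch_all, non-String inputs skip regexp matching, and an empty or whitespace-only formatted result is treated as blank, approximating the blank? helper.

```ruby
# Hypothetical standalone version of value_proc.
#   regexp_table: Regexp => [result, indexes], e.g. as built by regexp_hash
#   literals:     literal => result
#   catch_all:    [result] when a /.*/ entry exists, else nil
#   pre:          optional block applied to the input first
def make_value_proc(regexp_table, literals, catch_all = nil, &pre)
  lambda do |value|
    value = pre.call(value) if pre
    return literals[value] if literals.key?(value)
    regexp_table.each do |re, (result, ndxs)|
      md = value.is_a?(String) ? re.match(value) : nil
      next unless md
      return result if ndxs.empty? || !result.is_a?(String)
      # $n was recorded as index n - 1 into the captures.
      fmtd = result % ndxs.map { |i| md.captures[i] }
      # Approximates the blank? check: an empty result becomes nil.
      return fmtd.strip.empty? ? nil : fmtd
    end
    catch_all ? catch_all.first : value
  end
end

table = { /(Grade )?(\d)/ => ['%s', [1]] }
f = make_value_proc(table, { 'n/a' => nil }, [0])
f.call('Grade 3') # => "3"
f.call('n/a')     # => nil
f.call('other')   # => 0
```

Because the regexp table is iterated in insertion order, the first matching regexp wins, as in the Jinx proc.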