Class: Nodepile::InputColumnSpecs

Inherits:
Object
  • Object
Defined in:
lib/nodepile/colspecs.rb

Overview

This class describes the columns that can appear on a non-header line of an input file. It provides information about the valid columns, for potential use in documentation, and facilities for per-line verification of the column values within a single line.

Note that this class is best thought of as a scanner that is, in some sense, stateless.

Records generated by the #parse method and related methods will by default set metadata fields, particularly including:

'@type' = :node, :edge, :rule, :pragma
'@key' = String or [String,String] for node or edge respectively

Defined Under Namespace

Classes: InvalidRecordError, PatternMatchVerifier

Constant Summary

DEFAULT_ID_DELIMITER =

may be used in _links_from and _links_to for multiple edges

','
DEFAULT_PRAGMA_MARKER =

"#pragma "

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(col_names, id_delimiter: DEFAULT_ID_DELIMITER, pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@') ⇒ InputColumnSpecs

Creates a customized InputColumnSpecs object based on the column names and order that are included in one specific file. That object can then be used ONLY to validate that specific file. See the #coldefs

Parameters:

  • col_names (Array<String>)

    Order of column data expected from calls to #validate

  • id_delimiter (String) (defaults to: DEFAULT_ID_DELIMITER)

    Indicates a character that will be considered a delimiter between ids so that multiple may occupy the field

  • pragmas (String, nil) (defaults to: DEFAULT_PRAGMA_MARKER)

    If nil, “pragmas” are not identified. If true, any record whose _id field starts with “#pragma” is identified as a pragma and made available through the #each_pragma method. If a string, any record whose _id column starts with that string is considered a pragma. Note that ONLY the _id column of a pragma record is captured.

  • metadata_key_prefix (String, nil) (defaults to: '@')

    During #parse and related methods, records are yielded in the form of KeyedArrayAccessor objects that carry both the loaded data and metadata about the records, such as the type of the entity and whether its existence was triggered explicitly or implicitly. This value is passed to the KeyedArrayAccessor.

Raises:

  • InvalidRecordError



# File 'lib/nodepile/colspecs.rb', line 87

def initialize(col_names,id_delimiter: DEFAULT_ID_DELIMITER,pragmas: DEFAULT_PRAGMA_MARKER,
                metadata_key_prefix: '@')
  @col_names = col_names.dup.freeze
  @id_cols_indices = self.class.id_cols.map{|cnm| @col_names.find_index(cnm)}.freeze
  @id_delimiter = id_delimiter
  @pragma_marker = pragmas
  @empty_kv_array = KeyedArrayAccessor.new(@col_names,Array.new(@col_names.length).freeze)
  raise InvalidRecordError.new(<<~ERRMSG) if @id_cols_indices[0].nil?
      A valid record set must contain an '_id' column
    ERRMSG
  @metadata_key_prefix = metadata_key_prefix
  @md_pfxs = [(@metadata_key_prefix||'')+'type',
             (@metadata_key_prefix||'')+'key',
             (@metadata_key_prefix||'')+'is_implied',
            ]      
    
  @mc = CrudeCalculationCache.new
end
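
The header check performed by the constructor can be sketched in isolation. This is a minimal stand-alone illustration, not Nodepile's code: `locate_id_columns` and `MissingIdColumnError` are hypothetical names standing in for the constructor logic and for InvalidRecordError.

```ruby
# Hypothetical stand-in for the header check in #initialize.
# ID_COLS mirrors the list returned by .id_cols.
ID_COLS = %w[_id _links_from _links_to].freeze

# Illustrative error class; the real class raises InvalidRecordError.
class MissingIdColumnError < StandardError; end

# Returns the position of each id column in the header (nil when absent).
# Only the '_id' column is mandatory.
def locate_id_columns(col_names)
  indices = ID_COLS.map { |name| col_names.find_index(name) }
  raise MissingIdColumnError, "A valid record set must contain an '_id' column" if indices[0].nil?
  indices
end

p locate_id_columns(%w[_id label _links_to])   # => [0, nil, 2]
```

A header without `_id` would raise here, which corresponds to the InvalidRecordError listed under Raises.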

Instance Attribute Details

#id_delimiter ⇒ Object

Defines the characters that will be interpreted as delimiting entity “id” values.



# File 'lib/nodepile/colspecs.rb', line 63

def id_delimiter
  @id_delimiter
end

Class Method Details

.all_cols ⇒ Object



# File 'lib/nodepile/colspecs.rb', line 59

def self.all_cols; coldefs().keys; end

.bulk_parse(rec_source, source: nil, metadata: nil, metadata_key_prefix: nil, &entity_receiver) ⇒ Integer, Enumerator

Bulk parse is a convenience method for parsing a source of records. It is essentially the same as instantiating an object using the first record and then calling #parse once for each remaining record.

For information on most of the parameters, see the #parse method.

Parameters:

  • rec_source (Enumerable<Array<String>>)

    The first record is presumed to be the header, and every other record is passed to the #parse method.

Returns:

  • (Integer, Enumerator)

    If a block is passed in, returns the total of all entities that were yielded from the source. Otherwise returns an enumerator.



# File 'lib/nodepile/colspecs.rb', line 264

def self.bulk_parse(rec_source,source: nil,metadata: nil, metadata_key_prefix: nil, &entity_receiver)
    return enum_for(:bulk_parse,rec_source, source:, metadata:, metadata_key_prefix:) unless block_given?
    hdr_vals = rec_source.next
    specs = InputColumnSpecs.new(hdr_vals)
    rec_count = 0
    begin
        loop do
            next_rec = rec_source.next
            rec_count += specs.parse(next_rec,source:, ref_num: rec_count+2,metadata:,&entity_receiver)
        end
    rescue StopIteration
        #no-op
    end
    return rec_count
end
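
The header-then-records loop can be illustrated without Nodepile itself. The sketch below is a hypothetical miniature (`HeaderLoopSketch.each_data_row` is not part of the library) that pairs each row with the header instead of calling #parse. It assumes, as the code above does, that the record source responds to `next`, so an Array has to be wrapped with `each` first.

```ruby
# Hypothetical miniature of the header-then-records loop; names are illustrative.
module HeaderLoopSketch
  def self.each_data_row(rec_source)
    return enum_for(:each_data_row, rec_source) unless block_given?
    header = rec_source.next      # first record is presumed to be the header
    count = 0
    loop do                       # Kernel#loop ends quietly on StopIteration
      row = rec_source.next
      yield header.zip(row).to_h  # stand-in for specs.parse(row, ...)
      count += 1
    end
    count
  end
end

rows = [%w[_id label], %w[a Alpha], %w[b Beta]]
total = HeaderLoopSketch.each_data_row(rows.each) { |rec| p rec }
p total   # => 2
```

Unlike this sketch, which counts rows, .bulk_parse returns the number of entities yielded, and a single record can yield several.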

.coldefs ⇒ Object

Provide a simple hash of field names and their meaning/use.



# File 'lib/nodepile/colspecs.rb', line 43

def self.coldefs
  @@class_mcache.cache(__method__){||
       h = YAML.load(defined?(DATA) ? DATA.read : 
                                               /__END__\s+(.*)/m.match(File.read(__FILE__))[1]
                              )['data']['fields']
       h  # this value is cached
     }
end
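
The fallback branch above reads the YAML that follows the `__END__` marker of the source file. A minimal sketch of that extraction, run against an in-memory string rather than a real file, might look like the following. The field names and descriptions are made up for illustration, and `YAML.safe_load` is used here in place of `YAML.load`.

```ruby
require 'yaml'

# Hypothetical file contents; the real field definitions live in colspecs.rb.
source_text = <<~RUBY_FILE
  puts "library code would live here"
  __END__
  data:
    fields:
      _id: Identifier of a node
      label: Text shown on the node
RUBY_FILE

# Capture everything after the __END__ marker, then parse it as YAML.
yaml_text = /__END__\s+(.*)/m.match(source_text)[1]
fields = YAML.safe_load(yaml_text)['data']['fields']

p fields.keys   # => ["_id", "label"]
```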

.id_cols ⇒ Object

List the most crucial columns that indicate the existence of nodes, edges, and styling instructions.



# File 'lib/nodepile/colspecs.rb', line 58

def self.id_cols; %w(_id _links_from _links_to); end

.make_pattern_match_verifier(pattern_string) ⇒ Object

“Rule” type entities are characterized by having one or more “patterns” that are used to determine which of the nodes a given rule should apply to. Most often, a pattern specifies the set of node IDs that satisfy it, such as through regular expression matching. However, future versions may use field values to determine matching.

For an explanation of the pattern logic, see the PatternMatchVerifier class.



# File 'lib/nodepile/colspecs.rb', line 333

def self.make_pattern_match_verifier(pattern_string)
    return PatternMatchVerifier.new(pattern_string)
end

.val_is_pattern?(s) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/nodepile/colspecs.rb', line 52

def self.val_is_pattern?(s)
  s[0] == '/' ? :pattern : nil
end

Instance Method Details

#parse(col_value_array, source: nil, ref_num: nil, metadata: nil) {|A| ... } ⇒ Integer

Given a single “record” (which may define zero or more entities or contain errors) this method will yield once for each “entity” or “rule” that may be inferred by that record. The “entities” defined by a given record are determined by three fields: _id, _links_from, and _links_to.

The entries in these fields can indicate several things:

1) The explicit existence and attribute values for a node
2) Override values for a node or pattern of nodes
3) The implicit existence of a node (because an explicit node links explicitly to/from it)
4) The explicit existence of an edge (because an edge is explicitly in the to/from fields) and attribute values for the edge
5) The implicit existence of an edge (because an edge is implied by a rule in the to/from) and attribute values for the edge

Note that when metadata is attached to the KeyedArrayAccessors, the metadata will be updated to include the following key-values.

* 'type' = :node, :edge, :rule, :pragma
* 'key' = either a single String of nodes/node-rules or an array of two strings for edges
          and edge-rules
* 'is_implied' = true,nil to indicate whether the entity is implied
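
The key handling described above can be sketched with plain hashes. This is an illustration under assumptions, not the library's implementation: `prefix_metadata` and `tag_entity` are hypothetical helpers, metadata keys are assumed to be Strings, and the sketch works on a copy of the hash where #parse is documented to alter the hash passed in.

```ruby
# Hypothetical helper: force every key to start with the prefix.
def prefix_metadata(metadata, prefix)
  md = (metadata || {}).dup
  unless prefix.nil? || prefix.empty?
    md = md.transform_keys { |k| k.start_with?(prefix) ? k : prefix + k }
  end
  md
end

# Hypothetical helper: add the three per-entity fields under prefixed names.
def tag_entity(metadata, prefix, type, key, is_implied)
  md = prefix_metadata(metadata, prefix)
  names = %w[type key is_implied].map { |n| (prefix || '') + n }
  names.zip([type, key, is_implied]).each { |name, value| md[name] = value }
  md
end

md = tag_entity({ 'sheet' => 'people', '@batch' => 7 }, '@', :edge, %w[a b], nil)
p md.keys   # => ["@sheet", "@batch", "@type", "@key", "@is_implied"]
```

With the default prefix, the per-entity fields therefore appear as '@type', '@key', and '@is_implied'.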

Parameters:

  • col_value_array (Array)

    Column values in exact order of column names provided when this object was constructed.

  • metadata (Hash, nil) (defaults to: nil)

    If provided, the given metadata will be attached to each of the KeyedArrayAccessors that are yielded along with metadata about this particular entity. Note that the hash passed in will be altered in two ways. Firstly, if a @metadata_key_prefix is specified, all keys will be changed to include this prefix (if they aren’t already). Secondly, the three additional metadata key-values will be added (type, key, is_implied).

  • metadata_key_prefix (String, nil)

    See KeyedArrayAccessor#initialize for detail. If provided, this string will be forced to appear at the beginning of every metadata key.

  • source (String, nil, Object) (defaults to: nil)

    see KeyedArrayAccessor#initialize for detail

  • ref_num (Integer, nil) (defaults to: nil)

    see KeyedArrayAccessor#initialize for detail

Yield Parameters:

  • A (Nodepile::KeyedArrayAccessor)

    A single node, edge, or rule extracted from the record. Note that the _id, _links_to, and _links_from fields may be altered in the yielded value.

Returns:

  • (Integer)

    Number of entities encountered. Note that zero is valid.

Raises:

  • InvalidRecordError

    If any column validator rejects the record.



# File 'lib/nodepile/colspecs.rb', line 167

def parse(col_value_array,source: nil, ref_num: nil,metadata: nil,&entity_receiver)
      #see below in this file for the various preprocessing defined
      _preprocs.each{|(ix,preproc_block)|
             col_value_array[ix] = preproc_block.call(col_value_array[ix]) 
            }
      _validators.each{|(vl_col_nums,val_block)|
             errmsg = val_block.call(*vl_col_nums.map{|i| i && col_value_array[i]}) # test the specified column values
             raise InvalidRecordError.new(errmsg) if errmsg 
            }
      if metadata && (@metadata_key_prefix||'') != ''
          # if necessary, facilitate quick attachment of metadata to KeyedArrayAccessor
          metadata.transform_keys!{|k| k.start_with?(@metadata_key_prefix) ? k : @metadata_key_prefix + k}
      end
      metadata ||= Hash.new
      # following proc is used to package up the return value at multiple places below
      yieldval_bldr =  Proc.new{|kaa,*three_md_fields|
                                (0..(@md_pfxs.length-1)).each{|i| metadata[@md_pfxs[i]] = three_md_fields[i]}
                                kaa.(,metadata_key_prefix: @metadata_key_prefix)
                                kaa  
                               }
      ids, links_from, links_to = @id_cols_indices.map{|i| i && col_value_array[i]}
      return 0 if ids&.start_with?('#')  # ignore these records
      base_kva = KeyedArrayAccessor.new(@col_names, col_value_array, source: source, ref_num: ref_num)
      if @pragma_marker && ids&.start_with?(@pragma_marker) 
          # pragmas get shortcut treatment, not keyed, ignore all other columns
          yield yieldval_bldr.call(base_kva,:pragma,nil,false) if block_given?
          return 1  # pragmas do not have links, or multiple ids 
      end
      entity_count = 0
      lf_list = split_ids(links_from).to_a
      lt_list = split_ids(links_to).to_a
      if !ids.nil?
          edge_list = Array.new
      else
          # for pure edges, add them to list for later yielding
          edge_list = lf_list.to_a.product(lt_list.to_a)
                                           .map{|(lf,lt)| 
                                                  kva = base_kva.dup
                                                  kva['_links_from'] = lf
                                                  kva['_links_to'] = lt
                                                  [lf,lt,kva ]
                                               }
      end  #detecting pure edges

      split_ids(ids).each{|id| 
               kva = base_kva.dup.tap{|kva| 
                                           kva['_id'] = id
                                           kva['_links_from'] = nil 
                                           kva['_links_to'] = nil
                                     }
               entity_count += 1
               yield yieldval_bldr.call(kva,id[0] == '?' ? :rule : :node,id.freeze,false) if block_given?
               # emit any implicitly existing nodes
               (lf_list + lt_list).each{|link|
                               if !link.start_with?('?')
                                  entity_count += 1
                                  # implied nodes have cleared value except their key
                                  kva = base_kva.dup.tap{|x| 
                                                            x['_id'] = link
                                                            x['_links_from'] = nil
                                                            x['_links_to'] = nil
                                                        }
                                  yield yieldval_bldr.call(kva,:node,link.freeze,true) if block_given?
                               end
                              }
               # Flag edges the go from/to _id.  Note, you can't define rules this way.
               (lf_list.product([id]) + [id].product(lt_list)).each{|a|
                                               next if a.any?{|v| v.start_with?('?')} # rules can't imply an edge
                                               kva = @empty_kv_array.dup
                                               kva['_links_from'] = a[0]
                                               kva['_links_to'] = a[1]
                                               kva.source = base_kva.source
                                               kva.ref_num = base_kva.ref_num
                                               edge_list << [a[0],a[1],kva]
                                              }
              }
        edge_list.each{|(n1,n2,kva)|
                entity_count += 1
                et = (n1.start_with?('?') || n2.start_with?('?')) ? :rule : :edge
                yield yieldval_bldr.call(kva,et,[n1,n2].freeze,false) if block_given?
               }
     return entity_count
end
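
The way one record fans out into several entities is easier to follow in a reduced form. The sketch below is a simplified, hypothetical restatement of the expansion rules. `expand_record` and `split_list` are illustrative names, and the result is a list of [type, key, is_implied] triples instead of KeyedArrayAccessor objects. Comment lines, pragmas, validation, and attribute values are left out.

```ruby
# Hypothetical helper: split a delimited field into trimmed, non-empty tokens.
def split_list(field, delimiter = ',')
  field.to_s.split(delimiter).map(&:strip).reject(&:empty?)
end

# Simplified restatement of the expansion; returns [type, key, is_implied] triples.
def expand_record(ids, links_from, links_to)
  from = split_list(links_from)
  to   = split_list(links_to)
  out  = []
  # With no _id, every from/to pairing is an explicitly declared edge.
  edges = ids.nil? ? from.product(to) : []
  split_list(ids).each do |id|
    out << [id.start_with?('?') ? :rule : :node, id, false]
    # Linked ids that are not patterns imply nodes.
    (from + to).each { |link| out << [:node, link, true] unless link.start_with?('?') }
    # Edges run from each _links_from entry to id, and from id to each _links_to entry.
    (from.product([id]) + [id].product(to)).each do |pair|
      edges << pair unless pair.any? { |v| v.start_with?('?') }
    end
  end
  edges.each do |n1, n2|
    type = [n1, n2].any? { |v| v.start_with?('?') } ? :rule : :edge
    out << [type, [n1, n2], false]
  end
  out
end

expand_record('a', 'b', 'c, d').each { |entity| p entity }
```

For the record above, the sketch reports seven entities: the explicit node `a`, the implied nodes `b`, `c`, and `d`, and the edges `b→a`, `a→c`, and `a→d`.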

#split_ids(id_containing_field, &block) ⇒ Array<String>

Given a string representing the contents of the “_id”, “_links_to”, or “_links_from” field, this method will split it into zero or more tokens representing either ids or patterns. Patterns start with the question mark character. Leading and trailing spaces are stripped from each token before return.

Parameters:

  • id_containing_field (String)

    Any of the possible id containing fields

Returns:

  • (Array<String>)

    Zero or more id or pattern tokens.



# File 'lib/nodepile/colspecs.rb', line 114

def split_ids(id_containing_field, &block)
  # very simple implementation (make smarter later???)
  return [] if id_containing_field.nil?
  return enum_for(:split_ids,id_containing_field) unless block_given?
  raise "A field containing a rule calculation may not contain other ids" if /,\s*\?/ =~ id_containing_field
  id_containing_field.split(@id_delimiter).tap{|a2| 
                                               a2.each{|s| 
                                                          s.strip!
                                                          yield s unless s == ''
                                                      }
                                           }
end
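
A stand-alone sketch of the splitting rules follows. `split_id_field` is a hypothetical function written for illustration: it returns an Array directly, where #split_ids yields each token to a block or returns an Enumerator, and it raises ArgumentError where the method above raises a plain RuntimeError.

```ruby
# Hypothetical re-statement of the splitting rules; not the library method.
def split_id_field(field, delimiter: ',')
  return [] if field.nil?
  # A pattern (leading '?') may not share a field with other ids.
  if /,\s*\?/.match?(field)
    raise ArgumentError, 'A field containing a rule calculation may not contain other ids'
  end
  field.split(delimiter).map(&:strip).reject(&:empty?)
end

p split_id_field(' a , b ,, c ')   # => ["a", "b", "c"]
p split_id_field('?/^x/')          # => ["?/^x/"]
p split_id_field(nil)              # => []
```

Note that the guard, like the one in #split_ids, looks for a literal comma, so it only matches the default id delimiter.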