Class: Nodepile::InputColumnSpecs

Inherits:

Object

Defined in: lib/nodepile/colspecs.rb

Overview

This class provides information about the valid columns that can appear on a non-header line of an input file (for potential use in documentation), and also provides facilities for per-line verification of the column values within a single line.

Note that the best way to think of this class is as a scanner that is, in some sense, stateless: once constructed for a given header, it interprets each line on its own.

Records generated by the #parse method and related methods will, by default, set metadata fields, including:

* '@type' = :node, :edge, :rule, :pragma
* '@key' = String or [String,String] for a node or an edge respectively
* '@is_implied' = true if the entity's existence is only implied by a link, otherwise false

Defined Under Namespace

Classes: InvalidRecordError, PatternMatchVerifier

Constant Summary

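DEFAULT_ID_DELIMITER = ','

Separates multiple ids in the _id, _links_from, and _links_to fields (in the link fields, this is how a single record defines multiple edges).

DEFAULT_PRAGMA_MARKER = "#pragma "

Prefix that identifies a record's _id as a pragma (see the pragmas parameter of #initialize).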

Instance Attribute Summary

#id_delimiter ⇒ Object

Defines the characters that will be interpreted as delimiting entity “id” values.

Class Method Summary

.all_cols ⇒ Object
.bulk_parse(rec_source, source: nil, metadata: nil, metadata_key_prefix: nil, &entity_receiver) ⇒ Integer, Enumerator

Bulk parse is a convenience method for parsing a source of records.
.coldefs ⇒ Object

Provide a simple hash of field names and their meaning/use.
.id_cols ⇒ Object

List the most crucial columns that indicate the existence of nodes, edges, and styling instructions.
.make_pattern_match_verifier(pattern_string) ⇒ Object

“Rule” type entities are characterized by having one or more “patterns” that are used to determine which of the nodes a given rule should apply to.
.val_is_pattern?(s) ⇒ Boolean

Instance Method Summary

#initialize(col_names, id_delimiter: DEFAULT_ID_DELIMITER, pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@') ⇒ InputColumnSpecs constructor

Creates a customized InputColumnSpecs object based on the column names and order that are included in one specific file.
#parse(col_value_array, source: nil, ref_num: nil, metadata: nil) {|A| ... } ⇒ Integer

Given a single “record” (which may define zero or more entities or contain errors) this method will yield once for each “entity” or “rule” that may be inferred by that record.
#split_ids(id_containing_field, &block) ⇒ Array<String>

Given a string representing the contents of the “_id”, “_links_to”, or “_links_from” field, this method will split it into zero or more tokens representing either ids or patterns.

Constructor Details

#initialize(col_names, id_delimiter: DEFAULT_ID_DELIMITER, pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@') ⇒ `InputColumnSpecs`

Creates a customized InputColumnSpecs object based on the column names and order that are included in one specific file. That object can then be used ONLY to validate that specific file. See .coldefs for the recognized field names and their meanings.

Parameters:

col_names (Array<String>) —

Order of column data expected from calls to #parse.
id_delimiter (String) (defaults to: DEFAULT_ID_DELIMITER) —

Indicates a character that will be considered a delimiter between ids so that multiple may occupy the field
pragmas (String, nil) (defaults to: DEFAULT_PRAGMA_MARKER) —

If nil, “pragmas” are not identified. If true, then any record whose _id field starts with DEFAULT_PRAGMA_MARKER (“#pragma ”) is identified as a pragma and yielded by #parse with type :pragma. If a string, then any record whose _id column starts with that string is considered a pragma. Note that ONLY the _id column of a pragma record is captured.
metadata_key_prefix (String, nil) (defaults to: '@') —

During #parse and related methods, records are yielded as KeyedArrayAccessor objects that carry both the loaded data and metadata about the record, such as the type of the entity and whether its existence was triggered explicitly or implicitly. This value is passed to the KeyedArrayAccessor.

Raises:

InvalidRecordError

# File 'lib/nodepile/colspecs.rb', line 87

def initialize(col_names,id_delimiter: DEFAULT_ID_DELIMITER,pragmas: DEFAULT_PRAGMA_MARKER,
                metadata_key_prefix: '@')
  @col_names = col_names.dup.freeze
  # positions of _id, _links_from, _links_to in this file (nil when a column is absent)
  @id_cols_indices = self.class.id_cols.map{|cnm| @col_names.find_index(cnm)}.freeze
  @id_delimiter = id_delimiter
  # pragmas: true selects the default marker; nil disables pragma detection
  @pragma_marker = (pragmas == true) ? DEFAULT_PRAGMA_MARKER : pragmas
  @empty_kv_array = KeyedArrayAccessor.new(@col_names,Array.new(@col_names.length).freeze)
  raise InvalidRecordError.new(<<~ERRMSG) if @id_cols_indices[0].nil?
      A valid record set must contain an '_id' column
    ERRMSG
  @metadata_key_prefix = metadata_key_prefix
  # fully prefixed names of the three metadata keys that #parse sets
  @md_pfxs = %w(type key is_implied).map{|nm| (@metadata_key_prefix||'') + nm}.freeze
  @mc = CrudeCalculationCache.new
end
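The header checks the constructor performs can be sketched outside the class. This is a hypothetical, stand-alone illustration: header_indices and pragma_marker_for are made-up names, ID_COLS mirrors what .id_cols returns, and the true-means-default behaviour of pragma_marker_for follows the pragmas parameter description above rather than any confirmed Nodepile API.

```ruby
# Hypothetical helpers; ID_COLS mirrors the value returned by .id_cols.
ID_COLS = %w[_id _links_from _links_to].freeze
DEFAULT_PRAGMA_MARKER = "#pragma "

class InvalidRecordError < StandardError; end

# Find each id column in the header; nil means that column is absent.
def header_indices(col_names)
  indices = ID_COLS.map { |name| col_names.find_index(name) }
  raise InvalidRecordError, "A valid record set must contain an '_id' column" if indices[0].nil?
  indices
end

# Map the pragmas: argument to the marker string actually compared (nil disables pragmas).
def pragma_marker_for(pragmas)
  pragmas == true ? DEFAULT_PRAGMA_MARKER : pragmas
end

p header_indices(%w[_links_to color _id])  # => [2, nil, 0]
```

Only _id is mandatory; a file without _links_from simply records nil for that position.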

Instance Attribute Details

#id_delimiter ⇒ `Object`

Defines the characters that will be interpreted as delimiting entity “id” values.

# File 'lib/nodepile/colspecs.rb', line 63

def id_delimiter
  @id_delimiter
end

Class Method Details

.all_cols ⇒ `Object`

List the names of all recognized columns (the keys of .coldefs).

# File 'lib/nodepile/colspecs.rb', line 59

def self.all_cols; coldefs().keys; end

.bulk_parse(rec_source, source: nil, metadata: nil, metadata_key_prefix: nil, &entity_receiver) ⇒ `Integer`, `Enumerator`

Bulk parse is a convenience method for parsing a source of records. It is essentially the same as instantiating an object using the first record (the header) and then calling #parse once for each remaining record.

For information on most of the parameters, see the #parse method.

Parameters:

rec_source (Enumerator<Array<String>>) —

An external enumerator (it must respond to #next). The first record is presumed to be the header, and every other record is passed to the #parse method.

Returns:

(Integer, Enumerator) —

If a block is passed in, returns the total of all entities that were yielded from the source. Otherwise returns an enumerator.

# File 'lib/nodepile/colspecs.rb', line 264

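def self.bulk_parse(rec_source,source: nil,metadata: nil, metadata_key_prefix: nil, &entity_receiver)
    return enum_for(:bulk_parse,rec_source, source:, metadata:, metadata_key_prefix:) unless block_given?
    hdr_vals = rec_source.next  # the first record is the header (line 1)
    specs = if metadata_key_prefix.nil?
                InputColumnSpecs.new(hdr_vals)
            else
                InputColumnSpecs.new(hdr_vals, metadata_key_prefix:)
            end
    entity_count = 0
    line_num = 1
    loop do   # Kernel#loop ends quietly when #next raises StopIteration
        next_rec = rec_source.next
        line_num += 1
        # ref_num is the record's line number, not the running entity count
        entity_count += specs.parse(next_rec,source:, ref_num: line_num,metadata:,&entity_receiver)
    end
    return entity_count
end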
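The control flow of bulk_parse (header first, then one #parse call per record, with a running entity total) can be sketched in isolation. This is a hypothetical sketch: bulk_parse_sketch is a made-up name, and the parser lambda merely stands in for InputColumnSpecs#parse by returning an entity count.

```ruby
# Hypothetical sketch: `parser` stands in for InputColumnSpecs#parse and must
# return the number of entities it produced for one record.
def bulk_parse_sketch(rec_source, parser)
  header = rec_source.next   # first record is the header (line 1)
  total = 0
  line_num = 1
  loop do                    # Kernel#loop ends quietly when #next raises StopIteration
    rec = rec_source.next
    line_num += 1
    total += parser.call(header, rec, line_num)
  end
  total
end

rows = [%w[_id _links_to], %w[a b], %w[c d]].each   # Array#each without a block returns an Enumerator
seen_lines = []
toy_parser = ->(_hdr, _rec, line) { seen_lines << line; 1 }   # reports one entity per record
p bulk_parse_sketch(rows, toy_parser)  # => 2
p seen_lines                           # => [2, 3]
```

A plain Array does not respond to #next, which is why the source must be an external enumerator such as the one returned by Array#each.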

.coldefs ⇒ `Object`

Provide a simple hash of field names and their meaning/use.

# File 'lib/nodepile/colspecs.rb', line 43

def self.coldefs
  @@class_mcache.cache(__method__){||
       h = YAML.load(defined?(DATA) ? DATA.read : 
                                               /__END__\s+(.*)/m.match(File.read(__FILE__))[1]
                              )['data']['fields']
       h  # this value is cached
     }
end
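When DATA is unavailable, .coldefs falls back to reading its own source file and parsing the YAML that follows the __END__ marker. The sketch below repeats that extraction on an in-memory string; the field descriptions are placeholders, not the real column definitions.

```ruby
require 'yaml'

# Stand-in for the text of colspecs.rb: code, the __END__ marker, then YAML.
# The descriptions are placeholders, not Nodepile's actual definitions.
source_text = <<~TEXT
  puts "code goes here"
  __END__
  data:
    fields:
      _id: placeholder description
      _links_from: placeholder description
      _links_to: placeholder description
TEXT

# Same regular expression and key path as .coldefs uses.
yaml_part = /__END__\s+(.*)/m.match(source_text)[1]
fields = YAML.load(yaml_part)['data']['fields']
p fields.keys  # => ["_id", "_links_from", "_links_to"]
```

The keys of this hash are exactly what .all_cols returns.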

.id_cols ⇒ `Object`

List the most crucial columns that indicate the existence of nodes, edges, and styling instructions.

# File 'lib/nodepile/colspecs.rb', line 58

def self.id_cols; %w(_id _links_from _links_to); end

.make_pattern_match_verifier(pattern_string) ⇒ `Object`

“Rule” type entities are characterized by having one or more “patterns” that are used to determine which of the nodes a given rule should apply to. Most often, the patterns specify the set of node IDs that would satisfy them, for example through regular expression matching. However, future versions may use field values to determine matching.

For an explanation of pattern logic, see the PatternMatchVerifier class.

# File 'lib/nodepile/colspecs.rb', line 333

def self.make_pattern_match_verifier(pattern_string)
    return PatternMatchVerifier.new(pattern_string)
end

.val_is_pattern?(s) ⇒ `Boolean`

Returns:

(Boolean)

Truthy (the symbol :pattern) when s begins with '/', otherwise nil.

# File 'lib/nodepile/colspecs.rb', line 52

def self.val_is_pattern?(s)
  s[0] == '/' ? :pattern : nil
end

Instance Method Details

#parse(col_value_array, source: nil, ref_num: nil, metadata: nil) {|A| ... } ⇒ `Integer`

Given a single “record” (which may define zero or more entities or contain errors) this method will yield once for each “entity” or “rule” that may be inferred by that record. The “entities” defined by a given record are determined by three fields: _id, _links_from, and _links_to.

The entries in these fields can indicate several things:

1) The explicit existence and attribute values for a node
2) Override values for a node or pattern of nodes
3) The implicit existence of a node (because an explicit node links explicitly to/from it)
4) The explicit existence of an edge (because an edge is explicitly in the to/from fields) and attribute values for the edge
5) The implicit existence of an edge (because an edge is implied by a rule in the to/from) and attribute values for the edge
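The fan-out from one record to several entities can be modelled with plain Hashes. This is a simplified, hypothetical sketch (expand_record and split_field are made-up names, and each entity is reduced to [type, key, is_implied]); the real method yields KeyedArrayAccessor objects and also runs preprocessors and validators first.

```ruby
# Simplified, hypothetical model of the fan-out in #parse.
def split_field(field)
  field.to_s.split(",").map(&:strip).reject(&:empty?)
end

def expand_record(rec)
  ids, froms, tos = rec.values_at("_id", "_links_from", "_links_to").map { |f| split_field(f) }
  nodes = []
  edges = []
  if rec["_id"].nil?
    edges = froms.product(tos)   # pure edge record: every from/to pair
  else
    ids.each do |id|
      nodes << [id.start_with?("?") ? :rule : :node, id, false]
      # every non-pattern link implies a node
      (froms + tos).reject { |l| l.start_with?("?") }.each { |l| nodes << [:node, l, true] }
      (froms.product([id]) + [id].product(tos)).each do |pair|
        edges << pair unless pair.any? { |v| v.start_with?("?") }   # rules can't imply an edge
      end
    end
  end
  # edges come after all nodes; a pattern endpoint makes the edge a rule
  nodes + edges.map { |f, t| [(f.start_with?("?") || t.start_with?("?")) ? :rule : :edge, [f, t], false] }
end

p expand_record({ "_id" => "a", "_links_to" => "b,c" })
```

For that record the model produces the explicit node a, the implied nodes b and c, and the edges a→b and a→c, covering cases 1, 3, and 4 of the list above.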

Note that when metadata is attached to the KeyedArrayAccessors, the metadata will be updated to include the following key-values (each key carries the metadata_key_prefix, if one is set):

* 'type' = :node, :edge, :rule, :pragma
* 'key' = either a single String for nodes/node-rules or an array of two Strings for edges
          and edge-rules (nil for pragmas)
* 'is_implied' = true or false to indicate whether the entity is implied
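The two changes made to a caller's metadata hash (prefixing its keys, then adding type, key, and is_implied) can be sketched with plain Hashes. The helper names prefix_metadata! and add_entity_metadata are hypothetical, and the sample keys are made-up data.

```ruby
# Hypothetical helpers mirroring the metadata handling described above.
def prefix_metadata!(metadata, prefix)
  return metadata if prefix.nil? || prefix.empty?
  # transform_keys! alters the caller's hash in place; keys already prefixed are kept
  metadata.transform_keys! { |k| k.start_with?(prefix) ? k : prefix + k }
end

def add_entity_metadata(metadata, prefix, type, key, is_implied)
  %w[type key is_implied].zip([type, key, is_implied]).each do |name, value|
    metadata["#{prefix}#{name}"] = value
  end
  metadata
end

md = { "batch" => 7, "@source_tag" => "x" }   # sample caller metadata
prefix_metadata!(md, "@")
add_entity_metadata(md, "@", :edge, ["a", "b"].freeze, false)
p md
```

Using transform_keys! rather than transform_keys matters here: the non-bang form returns a new Hash and leaves the caller's hash unchanged.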

Parameters:

col_value_array (Array) —

Column values in exact order of column names provided when this object was constructed.
metadata (Hash, nil) (defaults to: nil) —

If provided, the given metadata will be attached to each of the KeyedArrayAccessors that are yielded along with metadata about this particular entity. Note that the hash passed in will be altered in two ways. Firstly, if a @metadata_key_prefix is specified, all keys will be changed to include this prefix (if they aren’t already). Secondly, the three additional metadata key-values will be added (type, key, is_implied).
metadata_key_prefix (String, nil) —

Not an argument of #parse; the value given to #initialize is used. See KeyedArrayAccessor#initialize for detail. If set, this string will be forced to appear at the beginning of every metadata key.
source (String, nil, Object) (defaults to: nil) —

see KeyedArrayAccessor#initialize for detail
ref_num (Integer, nil) (defaults to: nil) —

see KeyedArrayAccessor#initialize for detail

Yield Parameters:

A (Nodepile::KeyedArrayAccessor) —

A single node, edge, or rule extracted from the record. Note that the _id, _links_to, and _links_from fields may be altered in the yielded value.

Returns:

(Integer) —

Number of entities encountered. Note that zero is valid.

Raises:

(InvalidRecordError) —

If errors or omissions in data make it uninterpretable

# File 'lib/nodepile/colspecs.rb', line 167

def parse(col_value_array, source: nil, ref_num: nil, metadata: nil, &entity_receiver)
  # see below in this file for the various preprocessing defined
  _preprocs.each{|(ix,preproc_block)|
    col_value_array[ix] = preproc_block.call(col_value_array[ix])
  }
  _validators.each{|(vl_col_nums,val_block)|
    errmsg = val_block.call(*vl_col_nums.map{|i| i && col_value_array[i]}) # test the specified column values
    raise InvalidRecordError.new(errmsg) if errmsg
  }
  if metadata && (@metadata_key_prefix||'') != ''
    # prefix the caller's keys in place (transform_keys without "!" returns a new hash)
    metadata.transform_keys!{|k| k.start_with?(@metadata_key_prefix) ? k : @metadata_key_prefix + k}
  end
  metadata ||= Hash.new
  # following proc is used to package up the yielded value at multiple places below
  yieldval_bldr = Proc.new{|kaa,*three_md_fields|
    @md_pfxs.each_with_index{|md_key,i| metadata[md_key] = three_md_fields[i]}
    kaa.reset_metadata(metadata, metadata_key_prefix: @metadata_key_prefix)
    kaa
  }
  ids, links_from, links_to = @id_cols_indices.map{|i| i && col_value_array[i]}
  base_kva = KeyedArrayAccessor.new(@col_names, col_value_array, source: source, ref_num: ref_num)
  # test for pragmas BEFORE comments: the default marker "#pragma " also starts with '#'
  if @pragma_marker && ids&.start_with?(@pragma_marker)
    # pragmas get shortcut treatment, not keyed, ignore all other columns
    yield yieldval_bldr.call(base_kva, :pragma, nil, false) if block_given?
    return 1  # pragmas do not have links, or multiple ids
  end
  return 0 if ids&.start_with?('#')  # ignore comment records
  entity_count = 0
  lf_list = split_ids(links_from).to_a
  lt_list = split_ids(links_to).to_a
  if !ids.nil?
    edge_list = Array.new
  else
    # for pure edges, add them to list for later yielding
    edge_list = lf_list.product(lt_list).map{|(lf,lt)|
      kva = base_kva.dup
      kva['_links_from'] = lf
      kva['_links_to'] = lt
      [lf, lt, kva]
    }
  end  # detecting pure edges

  split_ids(ids).each{|id|
    kva = base_kva.dup.tap{|x|
      x['_id'] = id
      x['_links_from'] = nil
      x['_links_to'] = nil
    }
    entity_count += 1
    yield yieldval_bldr.call(kva, id[0] == '?' ? :rule : :node, id.freeze, false) if block_given?
    # emit any implicitly existing nodes
    (lf_list + lt_list).each{|link|
      next if link.start_with?('?')
      entity_count += 1
      # implied nodes have cleared values except their key
      kva = @empty_kv_array.dup
      kva['_id'] = link
      kva.source = base_kva.source
      kva.ref_num = base_kva.ref_num
      yield yieldval_bldr.call(kva, :node, link.freeze, true) if block_given?
    }
    # Flag edges that go from/to _id.  Note, you can't define rules this way.
    (lf_list.product([id]) + [id].product(lt_list)).each{|a|
      next if a.any?{|v| v.start_with?('?')} # rules can't imply an edge
      kva = @empty_kv_array.dup
      kva['_links_from'] = a[0]
      kva['_links_to'] = a[1]
      kva.source = base_kva.source
      kva.ref_num = base_kva.ref_num
      edge_list << [a[0], a[1], kva]
    }
  }
  edge_list.each{|(n1,n2,kva)|
    entity_count += 1
    et = (n1.start_with?('?') || n2.start_with?('?')) ? :rule : :edge
    yield yieldval_bldr.call(kva, et, [n1,n2].freeze, false) if block_given?
  }
  return entity_count
end
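Because the default pragma marker "#pragma " itself begins with '#', the order of the pragma test and the comment test matters: a record whose _id starts with the marker must be recognized as a pragma before any '#'-prefixed value is discarded as a comment. The hypothetical classify_id helper below isolates that first decision.

```ruby
# Hypothetical helper: the first decision made about a record's _id value.
# The default marker mirrors DEFAULT_PRAGMA_MARKER.
def classify_id(id_value, pragma_marker = "#pragma ")
  return :pragma if pragma_marker && id_value&.start_with?(pragma_marker)  # checked first
  return :comment if id_value&.start_with?("#")                          # yields nothing
  :entities                                                              # split ids and links next
end

p classify_id("#pragma layout dot")  # => :pragma
p classify_id("# a remark")          # => :comment
```

Passing nil as the marker disables pragma detection, so the same value then falls through to the comment rule.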

#split_ids(id_containing_field, &block) ⇒ `Array<String>`

Given a string representing the contents of the “_id”, “_links_to”, or “_links_from” field, this method will split it into zero or more tokens representing either ids or patterns. Patterns start with the question mark character. Leading and trailing spaces are stripped from each token, and empty tokens are skipped. A field holding a pattern may not also contain other ids (an error is raised when a pattern follows a delimiter).

Parameters:

id_containing_field (String) —

Any of the possible id containing fields

Returns:

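(Array<String>, Enumerator) —

An empty array if the field is nil; otherwise the tokens are yielded to the block, or returned as an Enumerator when no block is given.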

# File 'lib/nodepile/colspecs.rb', line 114

def split_ids(id_containing_field, &block)
  # very simple implementation (make smarter later???)
  return [] if id_containing_field.nil?
  return enum_for(:split_ids,id_containing_field) unless block_given?
  # a rule pattern ('?...') may not follow another id; honour the configured delimiter
  if /#{Regexp.escape(@id_delimiter)}\s*\?/ =~ id_containing_field
    raise InvalidRecordError.new("A field containing a rule calculation may not contain other ids")
  end
  id_containing_field.split(@id_delimiter).each{|s|
    tok = s.strip
    yield tok unless tok.empty?
  }
end
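The splitting rules can be exercised without the class. This stand-alone sketch assumes a ',' delimiter by default; split_ids_sketch is a hypothetical name, it returns an Array instead of yielding, and it raises ArgumentError where the method raises its own error type.

```ruby
# Hypothetical stand-alone version of the #split_ids rules.
def split_ids_sketch(field, delimiter = ",")
  return [] if field.nil?
  # a '?' pattern may not follow another id in the same field
  if /#{Regexp.escape(delimiter)}\s*\?/.match?(field)
    raise ArgumentError, "A field containing a rule calculation may not contain other ids"
  end
  field.split(delimiter).map(&:strip).reject(&:empty?)
end

p split_ids_sketch(" a, b ,,c ")  # => ["a", "b", "c"]
```

Escaping the delimiter with Regexp.escape keeps the pattern check correct for delimiters such as "|" or ".", which have special meaning inside a regular expression.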
