Class: Nodepile::InputColumnSpecs
- Inherits: Object
  - Object
  - Nodepile::InputColumnSpecs
- Defined in: lib/nodepile/colspecs.rb
Overview
This class provides information about the valid columns that can appear on a non-header line of an input file, for potential use in documentation. It also provides facilities for verifying column values within a single line.
Note that the best way to think of this class is as a scanner that is, in some sense, stateless.
Records generated by the #parse method and related methods will by default set metadata fields, particularly including:
'@type' = :node, :edge, :rule, :pragma
'@key' = String or [String,String] for node or edge respectively
Defined Under Namespace
Classes: InvalidRecordError (parsing errors throw this), PatternMatchVerifier
Constant Summary
- DEFAULT_ID_DELIMITER =
may be used in _links_from and _links_to for multiple edges
','
- DEFAULT_PRAGMA_MARKER =
"#pragma "
Instance Attribute Summary
- #id_delimiter ⇒ Object
  Defines the characters that will be interpreted as delimiting entity “id” values.
Class Method Summary
- .all_cols ⇒ Object
- .bulk_parse(rec_source, source: nil, metadata: nil, metadata_key_prefix: nil, &entity_receiver) ⇒ Integer, Enumerator
  Bulk parse is a convenience method for parsing a source of records.
- .coldefs ⇒ Object
  Provide a simple hash of field names and their meaning/use.
- .id_cols ⇒ Object
  List the most crucial columns that indicate the existence of nodes, edges, and styling instructions.
- .make_pattern_match_verifier(pattern_string) ⇒ Object
  “Rule” type entities are characterized by having one or more “patterns” that are used to determine which of the nodes a given rule should apply to.
- .val_is_pattern?(s) ⇒ Boolean
Instance Method Summary
- #initialize(col_names, id_delimiter: DEFAULT_ID_DELIMITER, pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@') ⇒ InputColumnSpecs (constructor)
  Creates a customized InputColumnSpecs object based on the column names and order that are included in one specific file.
- #parse(col_value_array, source: nil, ref_num: nil, metadata: nil) {|A| ... } ⇒ Integer
  Given a single “record” (which may define zero or more entities or contain errors) this method will yield once for each “entity” or “rule” that may be inferred by that record.
- #split_ids(id_containing_field, &block) ⇒ Array<String>
  Given a string representing the contents of the “_id”, “_links_to”, or “_links_from” field, this method will split it into zero or more tokens representing either ids or patterns.
Constructor Details
#initialize(col_names, id_delimiter: DEFAULT_ID_DELIMITER, pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@') ⇒ InputColumnSpecs
Creates a customized InputColumnSpecs object based on the column names and order that are included in one specific file. That object can then be used ONLY to validate that specific file. See the #coldefs
    # File 'lib/nodepile/colspecs.rb', line 87
    def initialize(col_names,id_delimiter: DEFAULT_ID_DELIMITER,pragmas: DEFAULT_PRAGMA_MARKER, metadata_key_prefix: '@')
      @col_names = col_names.dup.freeze
      @id_cols_indices = self.class.id_cols.map{|cnm| @col_names.find_index(cnm)}.freeze
      @id_delimiter = id_delimiter
      @pragma_marker = pragmas
      @empty_kv_array = KeyedArrayAccessor.new(@col_names,Array.new(@col_names.length).freeze)
      raise InvalidRecordError.new(<<~ERRMSG) if @id_cols_indices[0].nil?
        A valid record set must contain an '_id' column
      ERRMSG
      @metadata_key_prefix = metadata_key_prefix
      @md_pfxs = [(@metadata_key_prefix||'')+'type',
                  (@metadata_key_prefix||'')+'key',
                  (@metadata_key_prefix||'')+'is_implied',
                 ]
      @mc = CrudeCalculationCache.new
    end
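The column lookup performed by the constructor can be illustrated in isolation. The following is a minimal sketch, not the library's code: `ID_COLS` and `id_column_indices` are hypothetical names, and `ArgumentError` stands in for `InvalidRecordError` so that the snippet runs without the gem.

```ruby
# Hypothetical standalone sketch of the constructor's id-column lookup.
# The column names match those returned by .id_cols.
ID_COLS = %w[_id _links_from _links_to].freeze

def id_column_indices(col_names)
  # Position of each id column in this file's header, or nil when absent.
  indices = ID_COLS.map { |name| col_names.find_index(name) }
  # Only '_id' is mandatory; the two link columns are optional.
  raise ArgumentError, "A valid record set must contain an '_id' column" if indices[0].nil?
  indices
end

p id_column_indices(%w[label _id _links_to])  # => [1, nil, 2]
```

A header without an `_id` column raises immediately, which mirrors the constructor's behaviour of rejecting the file before any record is parsed.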
Instance Attribute Details
#id_delimiter ⇒ Object
Defines the characters that will be interpreted as delimiting entity “id” values.
    # File 'lib/nodepile/colspecs.rb', line 63
    def id_delimiter
      @id_delimiter
    end
Class Method Details
.all_cols ⇒ Object
    # File 'lib/nodepile/colspecs.rb', line 59
    def self.all_cols; coldefs().keys; end
.bulk_parse(rec_source, source: nil, metadata: nil, metadata_key_prefix: nil, &entity_receiver) ⇒ Integer, Enumerator
Bulk parse is a convenience method for parsing a source of records. It is essentially the same as instantiating an object using the first record (the header) and then calling #parse once for each remaining record.
For information on most of the parameters, see the #parse method.
    # File 'lib/nodepile/colspecs.rb', line 264
    def self.bulk_parse(rec_source,source: nil,metadata: nil, metadata_key_prefix: nil, &entity_receiver)
      return enum_for(:bulk_parse,rec_source, source:, metadata:, metadata_key_prefix:) unless block_given?
      hdr_vals = rec_source.next
      specs = InputColumnSpecs.new(hdr_vals)
      rec_count = 0
      begin
        loop do
          next_rec = rec_source.next
          rec_count += specs.parse(next_rec,source:, ref_num: rec_count+2,metadata:,&entity_receiver)
        end
      rescue StopIteration
        #no-op
      end
      return rec_count
    end
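The header-then-records pattern used by bulk_parse can be sketched without the library. This is an illustration only: `count_records` is a hypothetical helper, and it pairs header names with values in a plain Hash rather than building the KeyedArrayAccessor objects the real method yields.

```ruby
# Hypothetical sketch: consume an external enumerator whose first item is
# the header row and whose remaining items are data rows.
def count_records(rec_source)
  header = rec_source.next            # first record holds the column names
  count = 0
  loop do                             # Kernel#loop stops quietly on StopIteration
    values = rec_source.next
    yield header.zip(values).to_h if block_given?
    count += 1
  end
  count
end

rows = [%w[_id color], %w[a red], %w[b blue]].each  # Array#each without a block gives an Enumerator
seen = []
n = count_records(rows) { |rec| seen << rec }
p n     # => 2
p seen  # => [{"_id"=>"a", "color"=>"red"}, {"_id"=>"b", "color"=>"blue"}]
```

Because `Kernel#loop` already rescues `StopIteration`, the sketch needs no explicit `begin`/`rescue` around the loop.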
.coldefs ⇒ Object
Provide a simple hash of field names and their meaning/use.
    # File 'lib/nodepile/colspecs.rb', line 43
    def self.coldefs
      @@class_mcache.cache(__method__){||
        h = YAML.load(defined?(DATA) ? DATA.read : /__END__\s+(.*)/m.match(File.read(__FILE__))[1]
             )['data']['fields']
        h # this value is cached
      }
    end
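The fallback branch of coldefs reads the YAML that follows the `__END__` marker in the source file. A minimal sketch of that extraction is shown here on an in-memory string. The field descriptions are hypothetical placeholders, not the library's actual text; only the `data` / `fields` nesting is taken from the listing above.

```ruby
require 'yaml'

# Stand-in for the contents of a Ruby source file with trailing YAML data.
source = <<~RUBY_FILE
  puts 'code above the marker'
  __END__
  data:
    fields:
      _id: hypothetical description of the id column
      _links_to: hypothetical description of outgoing links
RUBY_FILE

# Same regular expression as in coldefs: capture everything after __END__.
yaml_text = /__END__\s+(.*)/m.match(source)[1]
fields = YAML.load(yaml_text)['data']['fields']
p fields.keys  # => ["_id", "_links_to"]
```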
.id_cols ⇒ Object
List the most crucial columns that indicate the existence of nodes, edges, and styling instructions.
    # File 'lib/nodepile/colspecs.rb', line 58
    def self.id_cols; %w(_id _links_from _links_to); end
.make_pattern_match_verifier(pattern_string) ⇒ Object
“Rule” type entities are characterized by having one or more “patterns” that are used to determine which of the nodes a given rule should apply to. Most often, a pattern specifies the set of node IDs that would satisfy it, for example through regular expression matching. However, future versions may use field values to determine matching.
For an explanation of pattern logic, see the PatternMatchVerifier class.
    # File 'lib/nodepile/colspecs.rb', line 333
    def self.make_pattern_match_verifier(pattern_string)
      return PatternMatchVerifier.new(pattern_string)
    end
.val_is_pattern?(s) ⇒ Boolean
    # File 'lib/nodepile/colspecs.rb', line 52
    def self.val_is_pattern?(s)
      s[0] == '/' ? :pattern : nil
    end
Instance Method Details
#parse(col_value_array, source: nil, ref_num: nil, metadata: nil) {|A| ... } ⇒ Integer
Given a single “record” (which may define zero or more entities or contain errors) this method will yield once for each “entity” or “rule” that may be inferred by that record. The “entities” defined by a given record are determined by three fields: _id, _links_from, and _links_to.
The entries in these fields can indicate several things:
1) The explicit existence and attribute values for a node
2) Override values for a node or pattern of nodes
3) The implicit existence of a node (because an explicit node links explicitly to/from it)
4) The explicit existence of an edge (because an edge is explicitly in the to/from fields) and attribute values for the edge
5) The implicit existence of an edge (because an edge is implied by a rule in the to/from) and attribute values for the edge
Note that when metadata is attached to the KeyedArrayAccessors, the metadata will be updated to include the following key-values.
* 'type' = :node, :edge, :rule, :pragma
* 'key' = either a single String for nodes/node-rules or an array of two Strings for edges and edge-rules
* 'is_implied' = true, nil to indicate whether the entity is implied
    # File 'lib/nodepile/colspecs.rb', line 167
    def parse(col_value_array,source: nil, ref_num: nil,metadata: nil,&entity_receiver)
      #see below in this file for the various preprocessing defined
      _preprocs.each{|(ix,preproc_block)|
        col_value_array[ix] = preproc_block.call(col_value_array[ix])
      }
      _validators.each{|(vl_col_nums,val_block)|
        errmsg = val_block.call(*vl_col_nums.map{|i| i && col_value_array[i]}) # test the specified column values
        raise InvalidRecordError.new(errmsg) if errmsg
      }
      if metadata && (@metadata_key_prefix||'') != ''
        # if necessary, facilitate quick attachment of metadata to KeyedArrayAccessor
        metadata = metadata.transform_keys{|k| k.start_with?(@metadata_key_prefix) ? k : @metadata_key_prefix + k}
      end
      metadata ||= Hash.new

      # following proc is used to package up the return value at multiple places below
      yieldval_bldr = Proc.new{|kaa,*three_md_fields|
        (0..(@md_pfxs.length-1)).each{|i| metadata[@md_pfxs[i]] = three_md_fields[i]}
        kaa.(,metadata_key_prefix: @metadata_key_prefix)
        kaa
      }

      ids, links_from, links_to = @id_cols_indices.map{|i| i && col_value_array[i]}
      return 0 if ids&.start_with?('#') # ignore these records
      base_kva = KeyedArrayAccessor.new(@col_names, col_value_array, source: source, ref_num: ref_num)
      if @pragma_marker && ids&.start_with?(@pragma_marker)
        # pragmas get shortcut treatment, not keyed, ignore all other columns
        yield yieldval_bldr.call(base_kva,:pragma,nil,false) if block_given?
        return 1 # pragmas do not have links, or multiple ids
      end

      entity_count = 0
      lf_list = split_ids(links_from).to_a
      lt_list = split_ids(links_to).to_a
      if !ids.nil?
        edge_list = Array.new
      else # for pure edges, add them to list for later yielding
        edge_list = lf_list.to_a.product(lt_list.to_a)
                           .map{|(lf,lt)|
                              kva = base_kva.dup
                              kva['_links_from'] = lf
                              kva['_links_to'] = lt
                              [lf,lt,kva ]
                           }
      end #detecting pure edges

      split_ids(ids).each{|id|
        kva = base_kva.dup.tap{|kva|
          kva['_id'] = id
          kva['_links_from'] = nil
          kva['_links_to'] = nil
        }
        entity_count += 1
        yield yieldval_bldr.call(kva,id[0] == '?' ? :rule : :node,id.freeze,false) if block_given?

        # emit any implicitly existing nodes
        (lf_list + lt_list).each{|link|
          if !link.start_with?('?')
            entity_count += 1
            # implied nodes have cleared value except their key
            kva = base_kva.dup.tap{|x|
              x['_id'] = link
              x['_links_from'] = nil
              x['_links_to'] = nil
            }
            yield yieldval_bldr.call(kva,:node,link.freeze,true) if block_given?
          end
        }

        # Flag edges that go from/to _id.  Note, you can't define rules this way.
        (lf_list.product([id]) + [id].product(lt_list)).each{|a|
          next if a.any?{|v| v.start_with?('?')} # rules can't imply an edge
          kva = @empty_kv_array.dup
          kva['_links_from'] = a[0]
          kva['_links_to'] = a[1]
          kva.source = base_kva.source
          kva.ref_num = base_kva.ref_num
          edge_list << [a[0],a[1],kva]
        }
      }

      edge_list.each{|(n1,n2,kva)|
        entity_count += 1
        et = (n1.start_with?('?') || n2.start_with?('?')) ? :rule : :edge
        yield yieldval_bldr.call(kva,et,[n1,n2].freeze,false) if block_given?
      }
      return entity_count
    end
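The way a single record expands into nodes, implied nodes, edges, and rules can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the library's implementation: `expand_record` is a hypothetical name, it returns `[type, key, is_implied]` triples instead of yielding KeyedArrayAccessor objects, it always splits on a comma, and it skips preprocessing, validation, comments, and pragmas.

```ruby
# Hypothetical sketch of the entity expansion performed for one record.
def expand_record(id, links_from, links_to)
  split = ->(s) { s.to_s.split(',').map(&:strip).reject(&:empty?) }
  ids, froms, tos = split.(id), split.(links_from), split.(links_to)
  # A key starting with '?' is a pattern, which makes the entity a rule.
  rule = ->(*keys) { keys.any? { |k| k.start_with?('?') } }
  out = []
  # With no _id at all, every from/to pair is a "pure" edge (or edge rule).
  edges = id.nil? ? froms.product(tos) : []
  ids.each do |i|
    out << [rule.(i) ? :rule : :node, i, false]
    # Non-pattern link targets are emitted as implied nodes.
    (froms + tos).each { |l| out << [:node, l, true] unless rule.(l) }
    # Edges into and out of this id; pairs involving a pattern are skipped.
    edges += (froms.product([i]) + [i].product(tos)).reject { |pair| rule.(*pair) }
  end
  edges.each { |a, b| out << [rule.(a, b) ? :rule : :edge, [a, b], false] }
  out
end

expand_record('a', 'b', 'c').each { |entity| p entity }
```

For the record with `_id` "a", `_links_from` "b", and `_links_to` "c", the sketch produces one explicit node, two implied nodes, and two edges (b to a, and a to c), which corresponds to cases 1, 3, and 4 in the list above.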
#split_ids(id_containing_field, &block) ⇒ Array<String>
Given a string representing the contents of the “_id”, “_links_to”, or “_links_from” field, this method will split it into zero or more tokens representing either ids or patterns. Patterns start with the question mark character. Leading and trailing spaces are stripped from each token before it is returned, and empty tokens are skipped.
    # File 'lib/nodepile/colspecs.rb', line 114
    def split_ids(id_containing_field, &block)
      # very simple implementation (make smarter later???)
      return [] if id_containing_field.nil?
      return enum_for(:split_ids,id_containing_field) unless block_given?
      raise "A field containing a rule calculation may not contain other ids" if /,\s*\?/ =~ id_containing_field
      id_containing_field.split(@id_delimiter).tap{|a2|
        a2.each{|s|
          s.strip!
          yield s unless s == ''
        }
      }
    end
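The splitting behaviour can be tried out with a standalone approximation. This is a minimal sketch rather than the method itself: `split_id_field` is a hypothetical helper, it returns a plain Array instead of yielding or returning an Enumerator, and its default delimiter is assumed to match DEFAULT_ID_DELIMITER.

```ruby
# Hypothetical standalone helper approximating #split_ids:
# split on the delimiter, strip whitespace, and drop empty tokens.
def split_id_field(field, delimiter: ',')
  return [] if field.nil?
  # A pattern token (leading '?') may not follow other ids in the same field.
  raise ArgumentError, 'a rule pattern may not be combined with other ids' if /,\s*\?/ =~ field
  field.split(delimiter).map(&:strip).reject(&:empty?)
end

p split_id_field(' a, b ,,c ')  # => ["a", "b", "c"]
p split_id_field(nil)           # => []
p split_id_field('?^srv')       # => ["?^srv"]
```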