Class: PROIEL::PROIELXML::Validator
Inherits: Object
Defined in: lib/proiel/proiel_xml/validator.rb
Overview
A validator object that uses an XML schema as well as additional integrity checks to validate a PROIEL XML file. Functionality for loading the XML schema and checking the PROIEL XML version number is found in Schema.
Instance Attribute Summary
- #errors ⇒ Object (readonly)
  Returns an array of error messages generated during validation.
Instance Method Summary
- #has_referential_integrity? ⇒ true, false
  Checks the referential integrity of the PROIEL XML file.
- #initialize(filename, aligned_filename = nil) ⇒ Validator (constructor)
  Creates a new validator for a PROIEL XML file.
- #valid? ⇒ true, false
  Checks if the PROIEL XML file is valid.
- #valid_schema_version? ⇒ true, false
  Checks if the PROIEL XML file has a valid schema version number.
- #validates? ⇒ true, false
  Checks if the PROIEL XML file validates against the schema.
- #wellformed? ⇒ true, false
  Checks if the PROIEL XML file is well-formed XML.
Constructor Details
#initialize(filename, aligned_filename = nil) ⇒ Validator
Creates a new validator for a PROIEL XML file.
# File 'lib/proiel/proiel_xml/validator.rb', line 21
def initialize(filename, aligned_filename = nil)
  @filename = filename
  @aligned_filename = aligned_filename
  @errors = []
end
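A minimal sketch of how a caller might drive a validator and report the outcome. `validation_report` and `StubValidator` are hypothetical names, not part of the library; the helper only assumes an object that responds to `valid?` and `errors`, as documented on this page. The stub stands in for a real validator so that the snippet runs without the gem installed.

```ruby
# Hypothetical helper (not part of the proiel gem): summarises the outcome
# for any object that responds to #valid? and #errors.
def validation_report(validator, label)
  return "#{label}: valid" if validator.valid?

  lines = validator.errors.map { |message| "  - #{message}" }
  (["#{label}: invalid"] + lines).join("\n")
end

# Stand-in so that the sketch runs without the gem. With the gem loaded,
# one would instead pass PROIEL::PROIELXML::Validator.new(filename).
StubValidator = Struct.new(:errors) do
  def valid?
    errors.empty?
  end
end

puts validation_report(StubValidator.new([]), 'a.xml')
puts validation_report(StubValidator.new(['XML file is not wellformed']), 'b.xml')
```

Because the helper is duck-typed, the same reporting code should work with either constructor form, with or without an `aligned_filename`.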
Instance Attribute Details
#errors ⇒ Object (readonly)
Returns an array of error messages generated during validation.
# File 'lib/proiel/proiel_xml/validator.rb', line 14
def errors
  @errors
end
Instance Method Details
#has_referential_integrity? ⇒ true, false
Checks the referential integrity of the PROIEL XML file.
If inconsistencies are found, error messages will be appended to `errors`.
# File 'lib/proiel/proiel_xml/validator.rb', line 106
def has_referential_integrity?
  tb = PROIEL::Treebank.new
  tb.load_from_xml(@filename)

  errors = []

  # Pass 1: keep track of all object IDs and look for duplicates
  sentence_ids = {}
  token_ids = {}

  tb.sources.each do |source|
    source.divs.each do |div|
      div.sentences.each do |sentence|
        errors << "Repeated sentence ID #{sentence.id}" if sentence_ids.key?(sentence.id)

        sentence_ids[sentence.id] = true

        sentence.tokens.each do |token|
          errors << "Repeated token ID #{token.id}" if token_ids.key?(token.id)

          token_ids[token.id] = { sentence: sentence.id, div: div.id, source: source.id }
        end
      end
    end
  end

  # Pass 2: check object ID references
  tb.sources.each do |source|
    source.tokens.each do |token|
      # Head IDs and slash IDs should be sentence internal
      check_reference_locality(errors, token, token_ids, :head_id, token.head_id, domain: :sentence, allow_nil: true)

      token.slashes.each do |_, target_id|
        check_reference_locality(errors, token, token_ids, :slash_id, target_id, domain: :sentence, allow_nil: false)
      end

      # Antecedent IDs should be source internal
      check_reference_locality(errors, token, token_ids, :antecedent_id, token.antecedent_id, domain: :source, allow_nil: true)
    end
  end

  # Pass 3: verify that all features are defined
  # TBD

  # Pass 4: alignment_id on div, sentence or token requires an alignment_id on source
  tb.sources.each do |source|
    if source.alignment_id.nil?
      if source.divs.any?(&:alignment_id) or source.sentences.any?(&:alignment_id) or source.tokens.any?(&:alignment_id)
        errors << "Alignment ID(s) on divs, sentences or tokens without alignment ID on source"
      end
    end
  end

  # Pass 5: if div is aligned, sentences and tokens within should belong
  # to aligned div(s); if sentence aligned, tokens within should belong
  # to aligned sentence(s). Skip if no alignment_id on source (see pass
  # 4) or if aligned source not available.
  if @aligned_filename
    aligned_tb = PROIEL::Treebank.new
    aligned_tb.load_from_xml(@aligned_filename)

    tb.sources.each do |source|
      if source.alignment_id
        aligned_source = aligned_tb.find_source(source.alignment_id)

        if aligned_source
          check_alignment_integrity(errors, source, aligned_source)
        else
          errors << "Aligned source not available in treebank"
        end
      end
    end
  end

  # Decide if there were any errors
  if errors.empty?
    true
  else
    @errors += errors
    false
  end
end
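The first two passes can be illustrated on plain data. This is a sketch under stated assumptions, not the library's code: `Token` is a hypothetical stand-in that models only IDs, `referential_errors` is an invented name, and the messages for bad references are illustrative, since the private helper `check_reference_locality` is not shown on this page. Only `head_id` is checked here, with the sentence as its domain.

```ruby
# Plain-data sketch of passes 1 and 2. Token is a hypothetical stand-in for
# the treebank's token objects; only IDs are modelled.
Token = Struct.new(:id, :sentence_id, :head_id)

def referential_errors(tokens)
  errors = []

  # Pass 1: index tokens by ID and flag duplicates.
  index = {}
  tokens.each do |token|
    errors << "Repeated token ID #{token.id}" if index.key?(token.id)
    index[token.id] = token.sentence_id
  end

  # Pass 2: a head must exist and sit in the same sentence (nil is allowed).
  tokens.each do |token|
    next if token.head_id.nil?

    if !index.key?(token.head_id)
      errors << "Token #{token.id}: head_id #{token.head_id} not found"
    elsif index[token.head_id] != token.sentence_id
      errors << "Token #{token.id}: head_id #{token.head_id} is outside the sentence"
    end
  end

  errors
end

tokens = [
  Token.new(1, 10, nil),
  Token.new(2, 10, 1),   # fine: head in the same sentence
  Token.new(3, 11, 1),   # head lives in another sentence
  Token.new(4, 11, 99),  # head does not exist
  Token.new(4, 11, nil)  # duplicate ID
]
puts referential_errors(tokens)
```

Building the full index before checking any reference is what allows a token to point forward to a head that appears later in the file.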
#valid? ⇒ true, false
Checks if the PROIEL XML file is valid. This checks for well-formedness, a valid schema version, validation against the schema and referential integrity.
If invalid, `errors` will contain error messages.
# File 'lib/proiel/proiel_xml/validator.rb', line 35
def valid?
  wellformed? and valid_schema_version? and validates? and has_referential_integrity?
end
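Two consequences of this chain are worth seeing in isolation: `and` short-circuits, so checks after the first failure never run, and because `@errors` is only initialised in the constructor, repeated calls keep appending. The sketch below uses a hypothetical `ChainDemo` class that mimics only the control flow, with two checks instead of four.

```ruby
# Hypothetical stand-in that mimics only the control flow of #valid?.
class ChainDemo
  attr_reader :errors, :calls

  def initialize(wellformed:)
    @wellformed = wellformed
    @errors = []
    @calls = []
  end

  def valid?
    wellformed? and validates?
  end

  def wellformed?
    @calls << :wellformed?
    return true if @wellformed

    @errors << 'XML file is not wellformed'
    false
  end

  def validates?
    @calls << :validates?
    true
  end
end

demo = ChainDemo.new(wellformed: false)
demo.valid?
p demo.calls        # validates? was never reached
demo.valid?
p demo.errors.size  # the same message has now been recorded twice
```

A caller that needs a clean error list for a second run would, under this reading of the source, create a new validator instead of reusing the old one.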
#valid_schema_version? ⇒ true, false
Checks if the PROIEL XML file has a valid schema version number.
If invalid, an error message will be appended to `errors`.
# File 'lib/proiel/proiel_xml/validator.rb', line 61
def valid_schema_version?
  schema_version = PROIEL::PROIELXML::Schema.check_schema_version_of_xml_file(@filename)

  if schema_version.nil?
    @errors << 'invalid schema version'

    false
  else
    true
  end
rescue PROIEL::PROIELXML::Schema::InvalidSchemaVersion => e
  @errors << e.message

  false
end
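The method folds two failure modes, a `nil` result and a raised exception, into the same outcome: a message in `errors` and a `false` return. A minimal sketch of that pattern follows, assuming a hypothetical `InvalidVersion` error and a block in place of the real `Schema` lookup, neither of which is part of the library.

```ruby
# Hypothetical error class; the real one is
# PROIEL::PROIELXML::Schema::InvalidSchemaVersion and is not reproduced here.
class InvalidVersion < StandardError; end

# The block stands in for the schema-version lookup.
def version_ok?(errors)
  version = yield

  if version.nil?
    errors << 'invalid schema version'
    false
  else
    true
  end
rescue InvalidVersion => e
  # A method-level rescue covers the whole body, including the lookup.
  errors << e.message
  false
end

errors = []
p version_ok?(errors) { '2.0' }
p version_ok?(errors) { nil }
p version_ok?(errors) { raise InvalidVersion, 'unsupported version 9.9' }
p errors
```

The version strings used here are made up for the demonstration and say nothing about which PROIEL XML versions the schema actually accepts.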
#validates? ⇒ true, false
Checks if the PROIEL XML file validates against the schema.
If invalid, error messages will be appended to `errors`.
# File 'lib/proiel/proiel_xml/validator.rb', line 83
def validates?
  doc = Nokogiri::XML(File.read(@filename))

  schema_version = PROIEL::PROIELXML::Schema.check_schema_version_of_xml_file(@filename)

  schema = PROIEL::PROIELXML::Schema.load_proiel_xml_schema(schema_version)
  r = schema.validate(doc)

  if r.empty?
    true
  else
    @errors += r.map { |e| "Line #{e.line}: #{e.message}" }

    false
  end
end
#wellformed? ⇒ true, false
Checks if the PROIEL XML file is well-formed XML.
If not well-formed, an error message will be appended to `errors`.
# File 'lib/proiel/proiel_xml/validator.rb', line 45
def wellformed?
  Nokogiri::XML(File.read(@filename)) { |config| config.strict }

  true
rescue Nokogiri::XML::SyntaxError => _
  @errors << 'XML file is not wellformed'

  false
end