Class: ScraperUtils::DataQualityMonitor
- Inherits: Object
- Defined in: lib/scraper_utils/data_quality_monitor.rb
Overview
Monitors data quality during scraping by tracking successful vs. failed record processing per authority. Automatically raises an exception (ScraperUtils::UnprocessableSite) if the error rate exceeds a threshold.
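The overall pattern can be sketched in plain Ruby without the gem. This is a simplified stand-in, not the gem's implementation: `SketchMonitor` and `TooManyUnprocessable` are hypothetical names, and the default base (5.01) and percentage (10%) are hard-coded rather than read from the environment.

```ruby
# Hypothetical stand-in for ScraperUtils::UnprocessableSite.
class TooManyUnprocessable < StandardError; end

# Simplified sketch of per-authority quality tracking (not the gem's implementation).
class SketchMonitor
  BASE = 5.01       # default base for the unprocessable threshold
  PERCENTAGE = 10.0 # default percentage of saved records

  attr_reader :stats

  def initialize
    # Counters are created on first access for each authority.
    @stats = Hash.new { |hash, key| hash[key] = { saved: 0, unprocessed: 0 } }
  end

  def saved(authority)
    @stats[authority][:saved] += 1
  end

  # Counts a failure and raises once failures exceed BASE + PERCENTAGE% of saved records.
  def unprocessable(authority)
    @stats[authority][:unprocessed] += 1
    limit = BASE + (@stats[authority][:saved] * PERCENTAGE / 100.0)
    return if @stats[authority][:unprocessed] <= limit

    raise TooManyUnprocessable,
          "Too many unprocessable records for #{authority}: #{@stats[authority].inspect}"
  end
end

monitor = SketchMonitor.new
20.times { monitor.saved(:example) } # limit is now 5.01 + 2.0 = 7.01

aborted = false
begin
  8.times { monitor.unprocessable(:example) } # the 8th failure exceeds the limit
rescue TooManyUnprocessable => e
  aborted = true
  puts e.message
end
```

With 20 saved records the limit is about 7.01, so seven failures are tolerated and the eighth aborts processing of that authority.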
Class Attribute Summary
- .stats ⇒ Object (readonly)
  Returns the value of attribute stats.
Class Method Summary
- .extract_authority(record) ⇒ Object
  Extracts the authority label from a record and ensures stats are set up for that authority.
- .log_saved_record(record) ⇒ void
  Logs a successfully saved record.
- .log_unprocessable_record(exception, record) ⇒ void
  Logs an unprocessable record and raises an exception if the error threshold is exceeded. By default, the threshold is 5 + 10% of saved records.
- .start_authority(authority_label) ⇒ Object
  Notes the start of processing an authority and clears any previous stats.
- .threshold(authority_label) ⇒ Object
  Threshold for unprocessable records: an initial base of 5.01 (override using MORPH_UNPROCESSABLE_BASE) plus an initial percentage of 10% of saved records (override using MORPH_UNPROCESSABLE_PERCENTAGE).
Class Attribute Details
.stats ⇒ Object (readonly)
Returns the value of attribute stats.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 10
    def stats
      @stats
    end
Class Method Details
.extract_authority(record) ⇒ Object
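Extracts the authority label and ensures stats are set up for the record's authority. A nil record, or one without an "authority_label" key, is counted under the empty label.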
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 22
    def self.extract_authority(record)
      authority_label = (record&.key?("authority_label") ? record["authority_label"] : "").to_sym
      @stats ||= {}
      # Only initialise counters the first time this authority is seen.
      @stats[authority_label] ||= { saved: 0, unprocessed: 0 }
      authority_label
    end
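The label expression treats a nil record, or a record without an "authority_label" key, as the empty symbol `:""`, so such records share one stats bucket. The standalone sketch below mirrors that expression; `authority_key` is a hypothetical helper name, not part of the gem.

```ruby
# Mirrors the label expression used by .extract_authority (hypothetical helper name).
def authority_key(record)
  (record&.key?("authority_label") ? record["authority_label"] : "").to_sym
end

stats = {}
[{ "authority_label" => "example_council" }, { "council_reference" => "DA1" }, nil].each do |record|
  key = authority_key(record)
  # ||= only initialises counters the first time an authority is seen.
  stats[key] ||= { saved: 0, unprocessed: 0 }
end

p stats.keys # → [:example_council, :""]
```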
.log_saved_record(record) ⇒ void
This method returns an undefined value.
Logs a successfully saved record and increments the saved count for its authority.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 64
    def self.log_saved_record(record)
      authority_label = extract_authority(record)
      @stats[authority_label][:saved] += 1
      ScraperUtils::LogUtils.log "Saving record #{authority_label&.empty? ? '' : "for #{authority_label}: "}#{record['council_reference']} - #{record['address']}"
    end
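The saved-record log line omits the `for <label>: ` prefix when the authority label is empty. A standalone sketch of just that message formatting; `saved_message` is a hypothetical helper name, and the record values are illustrative.

```ruby
# Rebuilds the message format used by .log_saved_record (hypothetical helper name).
def saved_message(authority_label, record)
  prefix = authority_label&.empty? ? "" : "for #{authority_label}: "
  "Saving record #{prefix}#{record['council_reference']} - #{record['address']}"
end

record = { "council_reference" => "DA-2024-001", "address" => "1 Example St" }
puts saved_message(:example, record) # → Saving record for example: DA-2024-001 - 1 Example St
puts saved_message(:"", record)      # → Saving record DA-2024-001 - 1 Example St
```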
.log_unprocessable_record(exception, record) ⇒ void
This method returns an undefined value.
Logs an unprocessable record and raises ScraperUtils::UnprocessableSite if the error threshold is exceeded. By default, the threshold is 5 + 10% of saved records.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 44
    def self.log_unprocessable_record(exception, record)
      authority_label = extract_authority(record)
      @stats[authority_label][:unprocessed] += 1
      # Prefer a short identifier; fall back to the whole record when fields are missing.
      details = if record&.key?('council_reference') && record&.key?('address')
                  "#{record['council_reference']} - #{record['address']}"
                else
                  record.inspect
                end
      ScraperUtils::LogUtils.log "Erroneous record #{details}: #{exception}"
      return unless @stats[authority_label][:unprocessed] > threshold(authority_label)

      raise ScraperUtils::UnprocessableSite,
            "Too many unprocessable_records for #{authority_label}: " \
            "#{@stats[authority_label].inspect} - aborting processing of site!"
    end
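When a record lacks either `council_reference` or `address`, the log falls back to `record.inspect`, so even a nil record produces a readable line. A standalone sketch of that description logic; `record_details` is a hypothetical helper name.

```ruby
# Mirrors how .log_unprocessable_record describes a bad record (hypothetical helper name).
def record_details(record)
  if record&.key?("council_reference") && record&.key?("address")
    "#{record['council_reference']} - #{record['address']}"
  else
    record.inspect # nil or incomplete records are shown in full
  end
end

puts record_details({ "council_reference" => "DA1", "address" => "2 Example Rd" }) # → DA1 - 2 Example Rd
puts record_details(nil)                                                          # → nil
```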
.start_authority(authority_label) ⇒ Object
Notes the start of processing an authority and clears any previous stats.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 16
    def self.start_authority(authority_label)
      @stats ||= {}
      # Plain assignment: always resets counters for this authority.
      @stats[authority_label] = { saved: 0, unprocessed: 0 }
    end
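`.start_authority` assigns a fresh counter hash with `=`, whereas `.extract_authority` uses `||=`, so only the former discards counts left over from an earlier run. A plain-hash sketch of the difference; the local `stats` is a stand-in for the class-level `@stats`.

```ruby
stats = {}

# Like .extract_authority: create counters only if the authority has none yet.
stats[:example] ||= { saved: 0, unprocessed: 0 }
stats[:example][:saved] += 3
stats[:example] ||= { saved: 0, unprocessed: 0 } # no effect: existing counts are kept
kept = stats[:example][:saved]

# Like .start_authority: always assign a fresh hash, clearing earlier counts.
stats[:example] = { saved: 0, unprocessed: 0 }
reset = stats[:example][:saved]

puts "after ||=: #{kept}, after =: #{reset}" # → after ||=: 3, after =: 0
```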
.threshold(authority_label) ⇒ Object
Threshold for unprocessable records: an initial base of 5.01 (override using MORPH_UNPROCESSABLE_BASE) plus an initial percentage of 10% of saved records (override using MORPH_UNPROCESSABLE_PERCENTAGE). Returns nil if no stats exist for the authority.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 32
    def self.threshold(authority_label)
      ENV.fetch('MORPH_UNPROCESSABLE_BASE', 5.01).to_f +
        (@stats[authority_label][:saved].to_i * ENV.fetch('MORPH_UNPROCESSABLE_PERCENTAGE', 10.0).to_f / 100.0) if @stats&.fetch(authority_label, nil)
    end
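The threshold works out to base + saved × percentage / 100. The sketch below reproduces that arithmetic with an explicit `env` hash in place of the process `ENV`, and takes the saved count directly rather than looking it up in `@stats`; `unprocessable_threshold` is a hypothetical helper name. Because the check in `.log_unprocessable_record` is a strict `>`, the number of failures tolerated is the threshold rounded down.

```ruby
# Reproduces the .threshold arithmetic; `env` stands in for ENV (hypothetical helper).
def unprocessable_threshold(saved, env = {})
  base = env.fetch("MORPH_UNPROCESSABLE_BASE", 5.01).to_f
  percentage = env.fetch("MORPH_UNPROCESSABLE_PERCENTAGE", 10.0).to_f
  base + (saved.to_i * percentage / 100.0)
end

[0, 20, 100].each do |saved|
  # Failures are tolerated while unprocessed <= threshold.
  puts "#{saved} saved: #{unprocessable_threshold(saved).floor} failures tolerated"
end

overrides = { "MORPH_UNPROCESSABLE_BASE" => "2", "MORPH_UNPROCESSABLE_PERCENTAGE" => "5" }
puts unprocessable_threshold(100, overrides) # → 7.0
```

With the defaults this prints 5, 7 and 15 tolerated failures for 0, 20 and 100 saved records.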