Module: ScraperUtils::DbUtils

Defined in:
lib/scraper_utils/db_utils.rb

Overview

Utilities for database operations in scrapers

Class Method Summary

Class Method Details

.collect_saves! ⇒ Object

Enable in-memory collection mode instead of saving to SQLite



# File 'lib/scraper_utils/db_utils.rb', line 9

def self.collect_saves!
  @collected_saves = []
end
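
For example, a minimal usage sketch (assumes the gem is installed and loaded as shown):

require "scraper_utils"

# Switch to in-memory collection: subsequent save_record calls append records
# to the collection instead of writing them to SQLite.
ScraperUtils::DbUtils.collect_saves!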

.collected_saves ⇒ Array&lt;Array&gt;

Get all collected save calls

Returns:

  • Array of [primary_key, record] pairs



# File 'lib/scraper_utils/db_utils.rb', line 20

def self.collected_saves
  @collected_saves
end
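
A sketch of draining the collection after a batch of save_record calls (assumes collect_saves! was enabled beforehand):

require "scraper_utils"

# to_a turns a nil collection (collection mode off) into an empty array
ScraperUtils::DbUtils.collected_saves.to_a.each do |saved|
  puts saved.inspect   # inspect each collected save
end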

.save_immediately! ⇒ Object

Save records to SQLite immediately rather than collecting them in memory



# File 'lib/scraper_utils/db_utils.rb', line 14

def self.save_immediately!
  @collected_saves = nil
end
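
One way to ensure the mode is switched back when collection is only wanted for a block of work (sketch only):

require "scraper_utils"

begin
  ScraperUtils::DbUtils.collect_saves!
  # ... run scraping code that calls ScraperUtils::DbUtils.save_record ...
ensure
  # Revert to writing directly to SQLite for any later saves
  ScraperUtils::DbUtils.save_immediately!
end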

.save_record(record) ⇒ void

This method returns an undefined value.

Validates a record and saves it to the SQLite database with logging, or appends it to the in-memory collection when collection mode is enabled

Parameters:

  • record (Hash) — The record to be saved, keyed by field name

Raises:

  • (ScraperUtils::UnprocessableRecord) — If the record fails validation



# File 'lib/scraper_utils/db_utils.rb', line 29

def self.save_record(record)
  # Validate required fields
  required_fields = %w[council_reference address description info_url date_scraped]
  required_fields.each do |field|
    if record[field].to_s.empty?
      raise ScraperUtils::UnprocessableRecord, "Missing required field: #{field}"
    end
  end

  # Validate date formats
  %w[date_scraped date_received on_notice_from on_notice_to].each do |date_field|
    Date.parse(record[date_field]) unless record[date_field].to_s.empty?
  rescue ArgumentError
    raise ScraperUtils::UnprocessableRecord,
          "Invalid date format for #{date_field}: #{record[date_field].inspect}"
  end

  # Determine primary key based on presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    @collected_saves << record
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
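
A sketch of a record that passes the validations above; the field values are invented for illustration, with the date field given as an ISO 8601 string:

require "date"
require "scraper_utils"

record = {
  "council_reference" => "DA/2025/0001",
  "address" => "1 Example Street, Exampleville",
  "description" => "New dwelling",
  "info_url" => "https://example.com/applications/2025-0001",
  "date_scraped" => Date.today.to_s
}

# Saves to SQLite (or appends to the in-memory collection if collect_saves! is active)
ScraperUtils::DbUtils.save_record(record)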