Module: ScraperUtils::DbUtils

Defined in:
lib/scraper_utils/db_utils.rb

Overview

Utilities for database operations in scrapers

Class Method Summary

Class Method Details

.collect_saves! ⇒ Object

Enable in-memory collection mode instead of saving to SQLite



# File 'lib/scraper_utils/db_utils.rb', line 9

def self.collect_saves!
  @collected_saves = []
end

.collected_saves ⇒ Array<Array>

Get all collected save calls

Returns:

  • (Array<Array>)

    Array of [primary_key, record] pairs



# File 'lib/scraper_utils/db_utils.rb', line 20

def self.collected_saves
  @collected_saves
end

.save_immediately! ⇒ Object

Save to disk rather than collect



# File 'lib/scraper_utils/db_utils.rb', line 14

def self.save_immediately!
  @collected_saves = nil
end
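
The three mode-switching methods above share one piece of state, `@collected_saves`. The following is a minimal sketch of how they interact. It uses a hypothetical stand-in module, `DbUtilsSketch` (not part of the library), that copies the method bodies shown above so the example runs without the gem installed.

```ruby
# Hypothetical stand-in that copies the three method bodies documented above.
module DbUtilsSketch
  def self.collect_saves!
    @collected_saves = []
  end

  def self.save_immediately!
    @collected_saves = nil
  end

  def self.collected_saves
    @collected_saves
  end
end

# Start from immediate mode: nothing is being collected.
DbUtilsSketch.save_immediately!
puts DbUtilsSketch.collected_saves.inspect

# Switch to collection mode: an empty array is ready to receive saves.
DbUtilsSketch.collect_saves!
puts DbUtilsSketch.collected_saves.inspect

# Switching back discards whatever was collected.
DbUtilsSketch.save_immediately!
puts DbUtilsSketch.collected_saves.inspect
```

Because `collect_saves!` always assigns a fresh array, calling it again also clears anything collected so far, which may be convenient between test cases.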

.save_record(record) ⇒ void

This method returns an undefined value.

Validates a record, then either saves it to the SQLite database and logs it, or appends it to the in-memory collection when collection mode is enabled

Parameters:

  • record (Hash)

    The record to be saved

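Raises:

  • (ScraperUtils::UnprocessableRecord)

    If a required field is missing or a date field cannot be parsed
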

# File 'lib/scraper_utils/db_utils.rb', line 29

def self.save_record(record)
  # Validate required fields
  required_fields = %w[council_reference address description info_url date_scraped]
  required_fields.each do |field|
    if record[field].to_s.empty?
      raise ScraperUtils::UnprocessableRecord, "Missing required field: #{field}"
    end
  end

  # Validate date formats
  %w[date_scraped date_received on_notice_from on_notice_to].each do |date_field|
    Date.parse(record[date_field]) unless record[date_field].to_s.empty?
  rescue ArgumentError
    raise ScraperUtils::UnprocessableRecord,
          "Invalid date format for #{date_field}: #{record[date_field].inspect}"
  end

  # Determine primary key based on presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    # Collect [primary_key, record] pairs, as documented for .collected_saves
    @collected_saves << [primary_key, record]
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
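
The validation and primary key rules in `save_record` can be tried in isolation. The sketch below copies the two field lists from the source above into hypothetical helpers, `validation_errors` and `primary_key_for`, neither of which is part of ScraperUtils. Unlike `save_record`, which raises `ScraperUtils::UnprocessableRecord` on the first problem, `validation_errors` returns every message so the rules are easier to inspect. The record values are made up for illustration.

```ruby
require "date"

# Field lists copied from save_record above.
REQUIRED_FIELDS = %w[council_reference address description info_url date_scraped].freeze
DATE_FIELDS = %w[date_scraped date_received on_notice_from on_notice_to].freeze

# Hypothetical helper (not part of ScraperUtils): returns all validation
# messages instead of raising on the first one.
def validation_errors(record)
  errors = REQUIRED_FIELDS.select { |field| record[field].to_s.empty? }
                          .map { |field| "Missing required field: #{field}" }
  DATE_FIELDS.each do |field|
    next if record[field].to_s.empty? # optional date fields may be blank

    begin
      Date.parse(record[field])
    rescue ArgumentError
      errors << "Invalid date format for #{field}: #{record[field].inspect}"
    end
  end
  errors
end

# Hypothetical helper mirroring the primary key choice in save_record.
def primary_key_for(record)
  record.key?("authority_label") ? %w[authority_label council_reference] : ["council_reference"]
end

# Illustrative records; all values are made up.
good = {
  "council_reference" => "DA-2024-001",
  "address" => "1 Example St, Sampletown NSW 2000",
  "description" => "Single storey dwelling",
  "info_url" => "https://example.com/da/DA-2024-001",
  "date_scraped" => "2024-01-15"
}
bad = good.merge("address" => "", "date_received" => "not a date")

puts validation_errors(good).inspect
puts validation_errors(bad).inspect
puts primary_key_for(good).inspect
```

Date values are passed as strings here because `Date.parse` expects a string; the `rescue ArgumentError` clause would not catch the `TypeError` raised for other types.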