Module: ScraperUtils::DbUtils

Defined in:
lib/scraper_utils/db_utils.rb

Overview

Utilities for database operations in scrapers

Class Method Summary

  • .cleanup_old_records(force: false) ⇒ Object

    Clean up records older than 30 days and approximately once a month vacuum the DB

  • .collect_saves! ⇒ Object

    Enable in-memory collection mode instead of saving to SQLite

  • .collected_saves ⇒ Array<Array>

    Get all collected save calls

  • .save_immediately! ⇒ Object

    Save to disk rather than collect

  • .save_record(record) ⇒ void

    Saves a record to the SQLite database with validation and logging

Class Method Details

.cleanup_old_records(force: false) ⇒ Object

Cleans up records older than 30 days and, approximately once a month, vacuums the database



# File 'lib/scraper_utils/db_utils.rb', line 49

def self.cleanup_old_records(force: false)
  cutoff_date = (Date.today - 30).to_s
  vacuum_cutoff_date = (Date.today - 35).to_s

  stats = ScraperWiki.sqliteexecute(
    "SELECT COUNT(*) as count, MIN(date_scraped) as oldest FROM data WHERE date_scraped < ?",
    [cutoff_date]
  ).first

  deleted_count = stats["count"]
  oldest_date = stats["oldest"]

  return unless deleted_count.positive? || ENV["VACUUM"] || force

  LogUtils.log "Deleting #{deleted_count} applications scraped between #{oldest_date} and #{cutoff_date}"
  ScraperWiki.sqliteexecute("DELETE FROM data WHERE date_scraped < ?", [cutoff_date])

  return unless rand < 0.03 || (oldest_date && oldest_date < vacuum_cutoff_date) || ENV["VACUUM"] || force

  LogUtils.log "  Running VACUUM to reclaim space..."
  ScraperWiki.sqliteexecute("VACUUM")
rescue SqliteMagic::NoSuchTable => e
  ScraperUtils::LogUtils.log "Ignoring: #{e} whilst cleaning old records" if ScraperUtils::DebugUtils.trace?
end
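The date arithmetic and vacuum trigger above can be sketched without the ScraperWiki calls (the helper name `cleanup_cutoffs` is hypothetical; the real method inlines this logic):

```ruby
require "date"

# Mirror the cutoffs computed in cleanup_old_records: records scraped more
# than 30 days ago are deleted, and a VACUUM is considered when the oldest
# record predates a 35-day cutoff (or with ~3% probability per run).
def cleanup_cutoffs(today = Date.today)
  {
    delete_before: (today - 30).to_s,
    vacuum_before: (today - 35).to_s
  }
end

cutoffs = cleanup_cutoffs(Date.new(2024, 5, 31))
puts cutoffs[:delete_before] # prints 2024-05-01
puts cutoffs[:vacuum_before] # prints 2024-04-26
```

Passing `force: true` (or setting the `VACUUM` environment variable) bypasses both the deleted-count check and the probabilistic vacuum guard.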

.collect_saves! ⇒ Object

Enable in-memory collection mode instead of saving to SQLite



# File 'lib/scraper_utils/db_utils.rb', line 10

def self.collect_saves!
  @collected_saves = []
end

.collected_saves ⇒ Array<Array>

Get all collected save calls

Returns:

  • (Array<Array>)

    Array of [primary_key, record] pairs



# File 'lib/scraper_utils/db_utils.rb', line 21

def self.collected_saves
  @collected_saves
end

.save_immediately! ⇒ Object

Save to disk rather than collect



# File 'lib/scraper_utils/db_utils.rb', line 15

def self.save_immediately!
  @collected_saves = nil
end
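The three toggling methods above form a simple mode switch: `collect_saves!` buffers saves in memory (useful in tests), while `save_immediately!` restores direct writes. A minimal stand-in illustrating the pattern (the `Collector` class is hypothetical; the real module keeps this state in `@collected_saves` at class level):

```ruby
# Hypothetical sketch of the collection-mode toggle used by DbUtils.
class Collector
  attr_reader :collected_saves

  def collect_saves!
    @collected_saves = []   # switch to in-memory collection
  end

  def save_immediately!
    @collected_saves = nil  # switch back to writing straight to SQLite
  end

  def save(primary_key, record)
    if @collected_saves
      @collected_saves << [primary_key, record]
    else
      # would call ScraperWiki.save_sqlite(primary_key, record) here
    end
  end
end

c = Collector.new
c.collect_saves!
c.save(["council_reference"], { "council_reference" => "DA-1" })
puts c.collected_saves.length # prints 1
```

Note that `collected_saves` returns `nil` (not an empty array) when collection mode has never been enabled or has been switched off.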

.save_record(record) ⇒ void

This method returns an undefined value.

Saves a record to the SQLite database with validation and logging

Parameters:

  • record (Hash)

    The record to be saved

Raises:



# File 'lib/scraper_utils/db_utils.rb', line 30

def self.save_record(record)
  record = record.transform_keys(&:to_s)
  ScraperUtils::PaValidation.validate_record!(record)

  # Determine the primary key based on the presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    @collected_saves << [primary_key, record]
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
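The primary-key selection can be shown in isolation (the helper name `primary_key_for` is hypothetical; `save_record` inlines this logic):

```ruby
# Sketch of how save_record chooses the SQLite primary key: records that
# carry an authority_label (multi-authority scrapers) use a composite key,
# otherwise council_reference alone identifies the record.
def primary_key_for(record)
  record = record.transform_keys(&:to_s) # accept symbol or string keys
  if record.key?("authority_label")
    %w[authority_label council_reference]
  else
    ["council_reference"]
  end
end

primary_key_for("council_reference" => "DA-1")
# => ["council_reference"]
primary_key_for(authority_label: "albury", council_reference: "DA-1")
# => ["authority_label", "council_reference"]
```

Because keys are normalised with `transform_keys(&:to_s)` first, callers may build records with either symbol or string keys.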