Module: ScraperUtils::DbUtils

Defined in:
lib/scraper_utils/db_utils.rb

Overview

Utilities for database operations in scrapers

Class Method Summary

Class Method Details

.cleanup_old_records(force: false) ⇒ Object

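Deletes records scraped more than 30 days ago and vacuums the database roughly once a month. After deleting, VACUUM runs if force is true, the VACUUM environment variable is set, the oldest deleted record was more than 35 days old, or at random on about 3% of calls.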



# File 'lib/scraper_utils/db_utils.rb', line 49

def self.cleanup_old_records(force: false)
  # Date#to_s yields ISO 8601 strings (YYYY-MM-DD), which sort in date order
  cutoff_date = (Date.today - 30).to_s
  vacuum_cutoff_date = (Date.today - 35).to_s

  # Count the records due for deletion and find the oldest of them
  stats = ScraperWiki.sqliteexecute(
    "SELECT COUNT(*) as count, MIN(date_scraped) as oldest FROM data WHERE date_scraped < ?",
    [cutoff_date]
  ).first

  deleted_count = stats["count"]
  oldest_date = stats["oldest"]

  # Nothing to delete and no explicit request to vacuum
  return unless deleted_count.positive? || ENV["VACUUM"] || force

  LogUtils.log "Deleting #{deleted_count} applications scraped between #{oldest_date} and #{cutoff_date}"
  ScraperWiki.sqliteexecute("DELETE FROM data WHERE date_scraped < ?", [cutoff_date])

  # Vacuum on about 3% of calls, when a record is overdue, or when requested
  unless rand < 0.03 || (oldest_date && oldest_date < vacuum_cutoff_date) || ENV["VACUUM"] || force
    return
  end

  LogUtils.log "  Running VACUUM to reclaim space..."
  ScraperWiki.sqliteexecute("VACUUM")
rescue SqliteMagic::NoSuchTable => e
  # The data table does not exist yet, so there is nothing to clean up
  if ScraperUtils::DebugUtils.trace?
    ScraperUtils::LogUtils.log "Ignoring: #{e} whilst cleaning old records"
  end
end
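
The VACUUM decision above combines a random trigger with an age check. A minimal sketch of that decision as a pure function, using only the Ruby standard library, may make it easier to follow. The name `vacuum_due?` and its keyword arguments are hypothetical and not part of ScraperUtils; `random` stands in for the `rand` call so the outcome is reproducible.

```ruby
require "date"

# Hypothetical helper mirroring the VACUUM condition in cleanup_old_records.
# oldest_date is an ISO 8601 date string ("YYYY-MM-DD") or nil.
def vacuum_due?(oldest_date, today: Date.today, random: rand,
                env_vacuum: ENV["VACUUM"], force: false)
  vacuum_cutoff_date = (today - 35).to_s
  # ISO 8601 date strings sort lexicographically in date order,
  # so a plain String comparison is enough here.
  overdue = !oldest_date.nil? && oldest_date < vacuum_cutoff_date
  random < 0.03 || overdue || !env_vacuum.nil? || force
end

today = Date.new(2024, 6, 30)
# Oldest record is past the 35-day cutoff: vacuum regardless of the draw
overdue_case = vacuum_due?("2024-05-01", today: today, random: 0.5, env_vacuum: nil)
# Recent record and a draw above 0.03: skip the vacuum
recent_case = vacuum_due?("2024-06-01", today: today, random: 0.5, env_vacuum: nil)
# No old records, but the draw falls under 0.03: vacuum
random_case = vacuum_due?(nil, today: today, random: 0.01, env_vacuum: nil)
```

At a 3% chance per call, the random trigger fires on average once every 1 / 0.03 ≈ 33 calls, which for a scraper that runs once a day works out to roughly once a month.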

.collect_saves! ⇒ Object

Enable in-memory collection mode instead of saving to SQLite



# File 'lib/scraper_utils/db_utils.rb', line 10

def self.collect_saves!
  @collected_saves = []
end

.collected_saves ⇒ Array<Array>

Get all save calls collected since collect_saves! was called (nil when collection mode is off)

Returns:

  • (Array<Array>)

    Array of [primary_key, record] pairs



# File 'lib/scraper_utils/db_utils.rb', line 21

def self.collected_saves
  @collected_saves
end

.save_immediately! ⇒ Object

Disable collection mode so that records are saved to SQLite immediately



# File 'lib/scraper_utils/db_utils.rb', line 15

def self.save_immediately!
  @collected_saves = nil
end
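
Taken together, `collect_saves!`, `collected_saves`, and `save_immediately!` form a simple toggle: a non-nil array means "collect", and `nil` means "write through". The sketch below reproduces that pattern in a self-contained stand-in so that it runs without ScraperWiki. `MiniStore`, `written`, and `save` are hypothetical names, and the `written` array merely stands in for rows saved to SQLite.

```ruby
# Hypothetical stand-in for the toggle pattern; not the library's own code.
module MiniStore
  @collected_saves = nil # nil means "write through"
  @written = []          # stands in for rows written to SQLite

  class << self
    attr_reader :collected_saves, :written

    # Switch to in-memory collection with a fresh, empty list
    def collect_saves!
      @collected_saves = []
    end

    # Switch back to writing through; the collected list is dropped
    def save_immediately!
      @collected_saves = nil
    end

    def save(record)
      if @collected_saves
        @collected_saves << record # collection mode: keep in memory
      else
        @written << record         # default mode: write through
      end
    end
  end
end

MiniStore.collect_saves!
MiniStore.save({ "council_reference" => "DA-1" })
collected = MiniStore.collected_saves # read before switching modes
MiniStore.save_immediately!
MiniStore.save({ "council_reference" => "DA-2" })
```

Because `save_immediately!` resets the list to `nil`, anything collected should be read, as `collected` is here, before switching back.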

.save_record(record) ⇒ void

This method returns an undefined value.

Validates a record, then saves it to the SQLite database and logs the save. In collection mode the record is appended to the in-memory list instead.

Parameters:

  • record (Hash)

    The record to be saved

Raises:



# File 'lib/scraper_utils/db_utils.rb', line 30

def self.save_record(record)
  record = record.transform_keys(&:to_s)
  ScraperUtils::PaValidation.validate_record!(record)

  # Determine the primary key based on the presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    @collected_saves << [primary_key, record]
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
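
The key normalisation and primary-key choice in `save_record` can be tried in isolation. The following is a minimal sketch using plain Ruby; `primary_key_for` is a hypothetical helper introduced for illustration, and the record values are made up.

```ruby
# Hypothetical helper isolating the primary-key choice made in save_record.
def primary_key_for(record)
  record = record.transform_keys(&:to_s) # symbol keys become strings
  if record.key?("authority_label")
    %w[authority_label council_reference]
  else
    ["council_reference"]
  end
end

# A single-authority record is keyed on council_reference alone
single = primary_key_for({ council_reference: "DA-1" })
# A multi-authority record is keyed on both columns
multi = primary_key_for({ authority_label: "example", council_reference: "DA-1" })
```

Note that `key?` tests for the presence of the key, not its value, so a record carrying `authority_label: nil` would still select the two-column key.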