Module: ScraperUtils::DbUtils
- Defined in:
- lib/scraper_utils/db_utils.rb
Overview
Utilities for database operations in scrapers
Class Method Summary
- .cleanup_old_records(force: false) ⇒ Object — Clean up records older than 30 days and, approximately once a month, vacuum the DB.
- .collect_saves! ⇒ Object — Enable in-memory collection mode instead of saving to SQLite.
- .collected_saves ⇒ Array<Array> — Get all collected save calls.
- .save_immediately! ⇒ Object — Save to disk rather than collect.
- .save_record(record) ⇒ void — Saves a record to the SQLite database with validation and logging.
Class Method Details
.cleanup_old_records(force: false) ⇒ Object
Clean up records older than 30 days and, approximately once a month, vacuum the DB.
```ruby
# File 'lib/scraper_utils/db_utils.rb', line 49

def self.cleanup_old_records(force: false)
  cutoff_date = (Date.today - 30).to_s
  vacuum_cutoff_date = (Date.today - 35).to_s
  stats = ScraperWiki.sqliteexecute(
    "SELECT COUNT(*) as count, MIN(date_scraped) as oldest FROM data WHERE date_scraped < ?",
    [cutoff_date]
  ).first
  deleted_count = stats["count"]
  oldest_date = stats["oldest"]
  return unless deleted_count.positive? || ENV["VACUUM"] || force

  LogUtils.log "Deleting #{deleted_count} applications scraped between #{oldest_date} and #{cutoff_date}"
  ScraperWiki.sqliteexecute("DELETE FROM data WHERE date_scraped < ?", [cutoff_date])

  unless rand < 0.03 || (oldest_date && oldest_date < vacuum_cutoff_date) || ENV["VACUUM"] || force
    return
  end

  LogUtils.log "  Running VACUUM to reclaim space..."
  ScraperWiki.sqliteexecute("VACUUM")
rescue SqliteMagic::NoSuchTable => e
  if ScraperUtils::DebugUtils.trace?
    ScraperUtils::LogUtils.log "Ignoring: #{e} whilst cleaning old records"
  end
end
```
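The cutoff dates above are plain 30- and 35-day offsets built with Ruby's standard `Date` class, serialized to ISO-8601 strings so that SQLite's default string comparison on `date_scraped` orders them correctly. A minimal sketch of just that computation (stdlib only, no ScraperWiki dependency):

```ruby
require "date"

# Records scraped more than 30 days ago are deleted; VACUUM is only
# considered when the oldest surviving record predates 35 days ago.
cutoff_date = (Date.today - 30).to_s        # "YYYY-MM-DD"
vacuum_cutoff_date = (Date.today - 35).to_s

# ISO-8601 dates sort lexicographically in the same order as
# chronologically, which is why `date_scraped < ?` works on strings.
puts vacuum_cutoff_date < cutoff_date
```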
.collect_saves! ⇒ Object
Enable in-memory collection mode instead of saving to SQLite
```ruby
# File 'lib/scraper_utils/db_utils.rb', line 10

def self.collect_saves!
  @collected_saves = []
end
```
.collected_saves ⇒ Array<Array>
Get all collected save calls
```ruby
# File 'lib/scraper_utils/db_utils.rb', line 21

def self.collected_saves
  @collected_saves
end
```
.save_immediately! ⇒ Object
Save to disk rather than collect
```ruby
# File 'lib/scraper_utils/db_utils.rb', line 15

def self.save_immediately!
  @collected_saves = nil
end
```
.save_record(record) ⇒ void
This method returns an undefined value.
Saves a record to the SQLite database with validation and logging
```ruby
# File 'lib/scraper_utils/db_utils.rb', line 30

def self.save_record(record)
  record = record.transform_keys(&:to_s)
  ScraperUtils::PaValidation.validate_record!(record)

  # Determine the primary key based on the presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    @collected_saves << record
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
```
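`save_record` normalizes keys to strings, then chooses a composite primary key when the record carries an `authority_label` (as in multi-authority scrapers) and falls back to `council_reference` alone otherwise. The key-selection logic in isolation, as a hypothetical free-standing method for illustration:

```ruby
# Hypothetical extraction of save_record's key-selection logic.
def primary_key_for(record)
  record = record.transform_keys(&:to_s) # accept symbol or string keys
  if record.key?("authority_label")
    %w[authority_label council_reference] # composite key
  else
    ["council_reference"]
  end
end

puts primary_key_for(authority_label: "ballarat", council_reference: "PLP/1").inspect
puts primary_key_for("council_reference" => "PLP/2").inspect
```

Because keys are stringified first, passing `authority_label:` as a symbol still selects the composite key.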