Module: ScraperUtils::DbUtils
- Defined in:
- lib/scraper_utils/db_utils.rb
Overview
Utilities for database operations in scrapers
Class Method Summary
- .cleanup_old_records(force: false) ⇒ Object
  Clean up records older than 30 days and, approximately once a month, vacuum the DB.
- .collect_saves! ⇒ Object
  Enable in-memory collection mode instead of saving to SQLite.
- .collected_saves ⇒ Array<Array>
  Get all collected save calls.
- .save_immediately! ⇒ Object
  Save to disk rather than collect.
- .save_record(record) ⇒ void
  Saves a record to the SQLite database with validation and logging.
Class Method Details
.cleanup_old_records(force: false) ⇒ Object
Clean up records older than 30 days and, approximately once a month, vacuum the DB.
# File 'lib/scraper_utils/db_utils.rb', line 49

def self.cleanup_old_records(force: false)
  cutoff_date = (Date.today - 30).to_s
  vacuum_cutoff_date = (Date.today - 35).to_s
  stats = ScraperWiki.sqliteexecute(
    "SELECT COUNT(*) as count, MIN(date_scraped) as oldest FROM data WHERE date_scraped < ?",
    [cutoff_date]
  ).first
  deleted_count = stats["count"]
  oldest_date = stats["oldest"]

  return unless deleted_count.positive? || ENV["VACUUM"] || force

  LogUtils.log "Deleting #{deleted_count} applications scraped between #{oldest_date} and #{cutoff_date}"
  ScraperWiki.sqliteexecute("DELETE FROM data WHERE date_scraped < ?", [cutoff_date])

  return unless rand < 0.03 || (oldest_date && oldest_date < vacuum_cutoff_date) || ENV["VACUUM"] || force

  LogUtils.log "  Running VACUUM to reclaim space..."
  ScraperWiki.sqliteexecute("VACUUM")
rescue SqliteMagic::NoSuchTable => e
  ScraperUtils::LogUtils.log "Ignoring: #{e} whilst cleaning old records" if ScraperUtils::DebugUtils.trace?
end
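The vacuum decision above combines a random roll (about 3% of runs), an age threshold on the oldest record, the `VACUUM` environment variable, and the `force:` flag. As a rough standalone sketch of just that decision (`vacuum_due?` is a hypothetical helper for illustration, not part of the library), note that ISO-8601 date strings compare correctly with plain string comparison:

```ruby
require "date"

# Hypothetical helper sketching the vacuum-decision logic from
# cleanup_old_records. `roll` stands in for the rand < 0.03 check so the
# behaviour is deterministic here.
def vacuum_due?(oldest_date, force: false, vacuum_env: false, roll: 1.0)
  vacuum_cutoff_date = (Date.today - 35).to_s
  roll < 0.03 ||
    (oldest_date && oldest_date < vacuum_cutoff_date) ||
    vacuum_env || force
end

puts vacuum_due?((Date.today - 40).to_s)   # oldest record is 40 days old => true
puts vacuum_due?((Date.today - 10).to_s)   # recent data, no other trigger => false
```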
.collect_saves! ⇒ Object
Enable in-memory collection mode instead of saving to SQLite
# File 'lib/scraper_utils/db_utils.rb', line 10

def self.collect_saves!
  @collected_saves = []
end
.collected_saves ⇒ Array<Array>
Get all collected save calls
# File 'lib/scraper_utils/db_utils.rb', line 21

def self.collected_saves
  @collected_saves
end
.save_immediately! ⇒ Object
Save to disk rather than collect
# File 'lib/scraper_utils/db_utils.rb', line 15

def self.save_immediately!
  @collected_saves = nil
end
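Together, `collect_saves!`, `save_immediately!`, and `collected_saves` implement a buffer-or-write switch keyed on whether `@collected_saves` is an array or `nil`. A minimal stand-in showing the same pattern (`BatchSink` is a hypothetical class for illustration, not part of scraper_utils):

```ruby
# Hypothetical class mirroring the DbUtils collection-mode toggle.
class BatchSink
  def self.collect_saves!
    @collected_saves = []     # switch to in-memory buffering
  end

  def self.save_immediately!
    @collected_saves = nil    # switch back to direct writes
  end

  def self.collected_saves
    @collected_saves
  end

  def self.save_record(record)
    if @collected_saves
      @collected_saves << record   # buffered; nothing hits the database
    else
      # a real implementation would write to SQLite here
    end
  end
end

BatchSink.collect_saves!
BatchSink.save_record({ "council_reference" => "A1" })
BatchSink.save_record({ "council_reference" => "A2" })
p BatchSink.collected_saves.length # => 2
```

This is useful in tests or dry runs: callers flip to collection mode, run the scraper, then inspect `collected_saves` instead of querying the database.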
.save_record(record) ⇒ void
This method returns an undefined value.
Saves a record to the SQLite database with validation and logging
# File 'lib/scraper_utils/db_utils.rb', line 30

def self.save_record(record)
  record = record.transform_keys(&:to_s)
  ScraperUtils::PaValidation.validate_record!(record)
  # Determine the primary key based on the presence of authority_label
  primary_key = if record.key?("authority_label")
                  %w[authority_label council_reference]
                else
                  ["council_reference"]
                end
  if @collected_saves
    @collected_saves << record
  else
    ScraperWiki.save_sqlite(primary_key, record)
    ScraperUtils::DataQualityMonitor.log_saved_record(record)
  end
end
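The primary key passed to SQLite depends only on whether the record carries an `authority_label` (records from multi-authority scrapers need both columns to be unique). A standalone sketch of that selection rule (`primary_key_for` is a hypothetical helper, not the library's API):

```ruby
# Hypothetical helper isolating save_record's primary-key rule.
def primary_key_for(record)
  record = record.transform_keys(&:to_s)  # accept symbol or string keys
  if record.key?("authority_label")
    %w[authority_label council_reference] # composite key for multi-authority data
  else
    ["council_reference"]                 # single-authority default
  end
end

p primary_key_for(council_reference: "DA-2024-001")
# => ["council_reference"]
p primary_key_for("authority_label" => "north", "council_reference" => "DA-2024-001")
# => ["authority_label", "council_reference"]
```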