Module: ScraperUtils::SpecSupport

Defined in:
lib/scraper_utils/spec_support.rb

Overview

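Helper methods for specs that sanity-check scraped data: whether addresses look geocodable, and whether descriptions look reasonable rather than like placeholders.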

Constant Summary

AUSTRALIAN_STATES =
%w[ACT NSW NT QLD SA TAS VIC WA].freeze
COMMON_STREET_TYPES =
%w[
  Avenue Ave Boulevard Court Crt Circle Chase Circuit Close Crescent
  Drive Drv Lane Loop Parkway Place Parade Road Rd Street St Square Terrace Way
].freeze
AUSTRALIAN_POSTCODES =
/\b\d{4}\b/.freeze
PLACEHOLDERS =
[
  /no description/i,
  /not available/i,
  /to be confirmed/i,
  /\btbc\b/i,
  %r{\bn/a\b}i
].freeze
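The postcode constant is a plain four-digit pattern, and the state list is also joined into a regex alternation inside .geocodable?. The minimal sketch below copies both constants from the listing so it runs without the gem. It shows what the postcode pattern does and does not match. Because the pattern has no idea where a number sits in the address, any standalone four-digit number, such as a year, also counts.

```ruby
# Copied from the constant listing above so this snippet runs standalone.
AUSTRALIAN_STATES = %w[ACT NSW NT QLD SA TAS VIC WA].freeze
AUSTRALIAN_POSTCODES = /\b\d{4}\b/.freeze

# geocodable? interpolates this alternation into its suburb/state regex.
puts AUSTRALIAN_STATES.join("|")                     # ACT|NSW|NT|QLD|SA|TAS|VIC|WA

# A "postcode" is any standalone run of exactly four digits.
puts "BONDI NSW 2026".match?(AUSTRALIAN_POSTCODES)   # true
puts "Unit 12345".match?(AUSTRALIAN_POSTCODES)       # false: five digits
puts "Lot 7".match?(AUSTRALIAN_POSTCODES)            # false
puts "Approved in 1999".match?(AUSTRALIAN_POSTCODES) # true: a year also matches
```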

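Class Method Summary

  • .geocodable?(address, ignore_case: false) ⇒ Boolean

    Check whether an address is likely to be geocodable by analyzing its format.

  • .placeholder?(text) ⇒ Boolean

    Check whether text matches one of the PLACEHOLDERS patterns.

  • .reasonable_description?(text) ⇒ Boolean

    Check whether text looks like a “reasonable” description.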

Class Method Details

.geocodable?(address, ignore_case: false) ⇒ Boolean

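Check whether an address is likely to be geocodable by analyzing its format. The address must contain a street type, unit number or lot number; an Australian state abbreviation; a four-digit postcode; and a suburb name followed by the state (in uppercase unless ignore_case is set). This is a bit stricter than needed; specs typically assert that at least 75% of addresses match.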

Parameters:

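  • address (String)

    The address to check.

  • ignore_case (Boolean) (defaults to: false)

    When true, the address is upcased before the state, street type and suburb checks, so a mixed-case suburb such as "Bondi" is accepted. When false, the suburb must already be in uppercase.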

Returns:

  • (Boolean)

    True if the address appears to be geocodable.



# File 'lib/scraper_utils/spec_support.rb', line 20

def self.geocodable?(address, ignore_case: false)
  return false if address.nil? || address.empty?
  check_address = ignore_case ? address.upcase : address

  # Basic structure check - must have a street name, suburb, state and postcode
  has_state = AUSTRALIAN_STATES.any? { |state| check_address.end_with?(" #{state}") || check_address.include?(" #{state} ") }
  has_postcode = address.match?(AUSTRALIAN_POSTCODES)

  has_street_type = COMMON_STREET_TYPES.any? { |type| check_address.include?(" #{type}") || check_address.include?(" #{type.upcase}") }

  has_unit_or_lot = address.match?(/\b(Unit|Lot:?)\s+\d+/i)

  has_suburb_stats = check_address.match?(/\b[A-Z]{2,}(\s+[A-Z]+)*,?\s+(#{AUSTRALIAN_STATES.join('|')})\b/)

  if ENV["DEBUG"]
    missing = []
    unless has_street_type || has_unit_or_lot
      missing << "street type / unit / lot"
    end
    missing << "state" unless has_state
    missing << "postcode" unless has_postcode
    missing << "#{ignore_case ? '' : 'uppercase '}suburb state" unless has_suburb_stats
    puts "  address: #{address} is not geocodable, missing #{missing.join(', ')}" if missing.any?
  end

  (has_street_type || has_unit_or_lot) && has_state && has_postcode && has_suburb_stats
end
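To show how the individual checks combine, the sketch below reproduces the method above under a hypothetical module name, SpecSupportSketch. The DEBUG reporting is dropped so the snippet runs without the gem installed. The sample addresses are made up for illustration.

```ruby
# Trimmed copy of ScraperUtils::SpecSupport.geocodable? (DEBUG output omitted).
# SpecSupportSketch is a hypothetical name used only so this runs standalone.
module SpecSupportSketch
  AUSTRALIAN_STATES = %w[ACT NSW NT QLD SA TAS VIC WA].freeze
  COMMON_STREET_TYPES = %w[
    Avenue Ave Boulevard Court Crt Circle Chase Circuit Close Crescent
    Drive Drv Lane Loop Parkway Place Parade Road Rd Street St Square Terrace Way
  ].freeze
  AUSTRALIAN_POSTCODES = /\b\d{4}\b/.freeze

  def self.geocodable?(address, ignore_case: false)
    return false if address.nil? || address.empty?

    check_address = ignore_case ? address.upcase : address

    # State abbreviation at the end, or surrounded by spaces.
    has_state = AUSTRALIAN_STATES.any? do |state|
      check_address.end_with?(" #{state}") || check_address.include?(" #{state} ")
    end
    has_postcode = address.match?(AUSTRALIAN_POSTCODES)
    has_street_type = COMMON_STREET_TYPES.any? do |type|
      check_address.include?(" #{type}") || check_address.include?(" #{type.upcase}")
    end
    has_unit_or_lot = address.match?(/\b(Unit|Lot:?)\s+\d+/i)
    # Suburb must be UPPERCASE unless ignore_case upcased the whole address.
    has_suburb_state = check_address.match?(
      /\b[A-Z]{2,}(\s+[A-Z]+)*,?\s+(#{AUSTRALIAN_STATES.join('|')})\b/
    )

    (has_street_type || has_unit_or_lot) && has_state && has_postcode && has_suburb_state
  end
end

p SpecSupportSketch.geocodable?("10 Smith Street, BONDI NSW 2026")                    # => true
p SpecSupportSketch.geocodable?("10 Smith Street, Bondi NSW 2026")                    # => false
p SpecSupportSketch.geocodable?("10 Smith Street, Bondi NSW 2026", ignore_case: true) # => true
p SpecSupportSketch.geocodable?("Bondi NSW")                                          # => false
```

The second address fails only because its suburb is mixed case; passing ignore_case: true upcases it first. The last one has a state but no street type, unit, lot or postcode. Note that the street-type check is a substring test, so " St" also matches words such as "Stanley".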

.placeholder?(text) ⇒ Boolean

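Check whether text matches one of the PLACEHOLDERS patterns, such as “N/A”, “TBC” or “not available”. The patterns are unanchored, so any text that merely contains a placeholder phrase also counts as a placeholder.

Parameters:

  • text (Object)

    The text to check. It is converted with to_s, so nil is treated as an empty string.

Returns:

  • (Boolean)

    True if the text matches any placeholder pattern.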


# File 'lib/scraper_utils/spec_support.rb', line 56

def self.placeholder?(text)
  PLACEHOLDERS.any? { |placeholder| text.to_s.match?(placeholder) }
end
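A short sketch of which strings count as placeholders, with the constant and method copied under the hypothetical module name SpecSupportSketch so it runs without the gem:

```ruby
# Copied from the listing above; SpecSupportSketch is a hypothetical module name.
module SpecSupportSketch
  PLACEHOLDERS = [
    /no description/i,
    /not available/i,
    /to be confirmed/i,
    /\btbc\b/i,
    %r{\bn/a\b}i
  ].freeze

  def self.placeholder?(text)
    PLACEHOLDERS.any? { |placeholder| text.to_s.match?(placeholder) }
  end
end

p SpecSupportSketch.placeholder?("N/A")                          # => true
p SpecSupportSketch.placeholder?("Details TBC")                  # => true
p SpecSupportSketch.placeholder?("Information not available")    # => true (unanchored match)
p SpecSupportSketch.placeholder?("Two storey dwelling and pool") # => false
p SpecSupportSketch.placeholder?(nil)                            # => false (nil.to_s is "")
```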

.reasonable_description?(text) ⇒ Boolean

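Check whether text looks like a “reasonable” description: it must not be a placeholder (see .placeholder?) and must contain at least three whitespace-separated words. This is a bit stricter than needed; specs typically assert that at least 75% of descriptions match.

Parameters:

  • text (Object)

    The description to check. It is converted with to_s.

Returns:

  • (Boolean)

    True if the text is not a placeholder and has at least three words.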


# File 'lib/scraper_utils/spec_support.rb', line 62

def self.reasonable_description?(text)
  !placeholder?(text) && text.to_s.split.size >= 3
end
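Both .geocodable? and .reasonable_description? are described as stricter than needed, with specs typically asserting a match rate of at least 75%. A minimal sketch of that pattern follows. share_matching is a hypothetical helper, not part of ScraperUtils, and the methods are copied under the hypothetical SpecSupportSketch module so the snippet runs standalone. In an RSpec spec the final check would more naturally read expect(share).to be >= 0.75.

```ruby
# Copied from the listing above under a hypothetical module name.
module SpecSupportSketch
  PLACEHOLDERS = [
    /no description/i,
    /not available/i,
    /to be confirmed/i,
    /\btbc\b/i,
    %r{\bn/a\b}i
  ].freeze

  def self.placeholder?(text)
    PLACEHOLDERS.any? { |placeholder| text.to_s.match?(placeholder) }
  end

  def self.reasonable_description?(text)
    !placeholder?(text) && text.to_s.split.size >= 3
  end
end

# Hypothetical helper (not part of ScraperUtils): the fraction of items
# for which the block returns a truthy value.
def share_matching(items)
  return 0.0 if items.empty?

  items.count { |item| yield(item) }.fdiv(items.size)
end

descriptions = [
  "Construction of a two storey dwelling",
  "Demolition of existing shed",
  "Alterations and additions to dwelling",
  "N/A" # placeholder, so not reasonable
]

share = share_matching(descriptions) { |d| SpecSupportSketch.reasonable_description?(d) }
p share                                             # => 0.75
p SpecSupportSketch.reasonable_description?("Shed") # => false (fewer than three words)

# The threshold check a spec would make.
raise "only #{(share * 100).round}% look reasonable" unless share >= 0.75
```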