Module: ScraperUtils::SpecSupport
- Defined in:
- lib/scraper_utils/spec_support.rb
Overview
Methods to support specs
Constant Summary collapse
- AUSTRALIAN_STATES =
%w[ACT NSW NT QLD SA TAS VIC WA].freeze
- COMMON_STREET_TYPES =
%w[ Avenue Ave Boulevard Court Crt Circle Chase Circuit Close Crescent Drive Drv Lane Loop Parkway Place Parade Road Rd Street St Square Terrace Way ].freeze
- AUSTRALIAN_POSTCODES =
/\b\d{4}\b/.freeze
- PLACEHOLDERS =
[ /no description/i, /not available/i, /to be confirmed/i, /\btbc\b/i, %r{\bn/a\b}i ].freeze
Class Method Summary collapse
-
.geocodable?(address, ignore_case: false) ⇒ Boolean
Check if an address is likely to be geocodable by analyzing its format.
- .placeholder?(text) ⇒ Boolean
-
.reasonable_description?(text) ⇒ Boolean
Check if this looks like a “reasonable” description This is a bit stricter than needed - typically assert >= 75% match.
Class Method Details
.geocodable?(address, ignore_case: false) ⇒ Boolean
Check if an address is likely to be geocodable by analyzing its format. This is a bit stricter than needed - typically assert >= 75% match
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
# File 'lib/scraper_utils/spec_support.rb', line 20 def self.geocodable?(address, ignore_case: false) return false if address.nil? || address.empty? check_address = ignore_case ? address.upcase : address # Basic structure check - must have a street name, suburb, state and postcode has_state = AUSTRALIAN_STATES.any? { |state| check_address.end_with?(" #{state}") || check_address.include?(" #{state} ") } has_postcode = address.match?(AUSTRALIAN_POSTCODES) has_street_type = COMMON_STREET_TYPES.any? { |type| check_address.include?(" #{type}") || check_address.include?(" #{type.upcase}") } has_unit_or_lot = address.match?(/\b(Unit|Lot:?)\s+\d+/i) has_suburb_stats = check_address.match?(/\b[A-Z]{2,}(\s+[A-Z]+)*,?\s+(#{AUSTRALIAN_STATES.join('|')})\b/) if ENV["DEBUG"] missing = [] unless has_street_type || has_unit_or_lot missing << "street type / unit / lot" end missing << "state" unless has_state missing << "postcode" unless has_postcode missing << "#{ignore_case ? '' : 'uppercase '}suburb state" unless has_suburb_stats puts " address: #{address} is not geocodable, missing #{missing.join(', ')}" if missing.any? end (has_street_type || has_unit_or_lot) && has_state && has_postcode && has_suburb_stats end |
.placeholder?(text) ⇒ Boolean
56 57 58 |
# File 'lib/scraper_utils/spec_support.rb', line 56 def self.placeholder?(text) PLACEHOLDERS.any? { |placeholder| text.to_s.match?(placeholder) } end |
.reasonable_description?(text) ⇒ Boolean
Check if this looks like a “reasonable” description This is a bit stricter than needed - typically assert >= 75% match
62 63 64 |
# File 'lib/scraper_utils/spec_support.rb', line 62 def self.reasonable_description?(text) !placeholder?(text) && text.to_s.split.size >= 3 end |