Class: AddressExtractor

Inherits:
Object
Defined in:
lib/address_extractor.rb

Constant Summary

CAPTURE_MAP =
[ :street1, :street2, :city, :state, :zip, :zip ]
STATES =
<<-EOF
  ALABAMA  AL
  ALASKA  AK
  AMERICAN SAMOA  AS
  ARIZONA  AZ
  ARKANSAS  AR
  CALIFORNIA  CA
  COLORADO  CO
  CONNECTICUT  CT
  DELAWARE  DE
  DISTRICT OF COLUMBIA  DC
  FEDERATED STATES OF MICRONESIA  FM
  FLORIDA  FL
  GEORGIA  GA
  GUAM  GU
  HAWAII  HI
  IDAHO  ID
  ILLINOIS  IL
  INDIANA  IN
  IOWA  IA
  KANSAS  KS
  KENTUCKY  KY
  LOUISIANA  LA
  MAINE  ME
  MARSHALL ISLANDS  MH
  MARYLAND  MD
  MASSACHUSETTS  MA
  MICHIGAN  MI
  MINNESOTA  MN
  MISSISSIPPI  MS
  MISSOURI  MO
  MONTANA  MT
  NEBRASKA  NE
  NEVADA  NV
  NEW HAMPSHIRE  NH
  NEW JERSEY  NJ
  NEW MEXICO  NM
  NEW YORK  NY
  NORTH CAROLINA  NC
  NORTH DAKOTA  ND
  NORTHERN MARIANA ISLANDS  MP
  OHIO  OH
  OKLAHOMA  OK
  OREGON  OR
  PALAU  PW
  PENNSYLVANIA  PA
  PUERTO RICO  PR
  RHODE ISLAND  RI
  SOUTH CAROLINA  SC
  SOUTH DAKOTA  SD
  TENNESSEE  TN
  TEXAS  TX
  UTAH  UT
  VERMONT  VT
  VIRGIN ISLANDS  VI
  VIRGINIA  VA
  WASHINGTON  WA
  WEST VIRGINIA  WV
  WISCONSIN  WI
  WYOMING  WY
EOF
STATE_REGEX =
STATES.split(/\n/).collect{ |n| n.scan(/(\w.*\w)\s*([A-Z]{2})\s*$/) }.join("|")
SECONDARY_UNIT_DESIGNATORS =
<<-EOF
  APARTMENT APT
  BASEMENT BSMT
  BUILDING BLDG
  DEPARTMENT DEPT
  FLOOR FL
  FRONT FRNT
  HANGAR HNGR
  LOBBY LBBY
  LOT LOT
  LOWER LOWR
  OFFICE OFC
  PENTHOUSE PH
  PIER PIER
  REAR REAR
  ROOM RM
  SIDE SIDE
  SLIP SLIP
  SPACE SPC
  STOP STOP
  SUITE STE
  TRAILER TRLR
  UNIT UNIT
  UPPER UPPR
EOF
SECONDARY_UNIT_DESIGNATORS_REGEX =
SECONDARY_UNIT_DESIGNATORS.split(/\n/).collect{ |n| n.scan(/(\w+)\s*(\w+)\s*$/) }.join("|")
ADDRESS_PATTERN =
/
  (
    \d+                           # A few numbers
    \s+
    (?:[A-Za-z'.-]+\s?){1,5}      # Followed by a street name
  )
  \s* ,?  \s*                     # a comma, optionally
  (
    (?:\d+\s+)?                   # a secondary unit, optionally
    (?:#{SECONDARY_UNIT_DESIGNATORS_REGEX})
    (?:\s+\d+)?
  )?
  \s* ,?  \s*                     # a comma, optionally
  (?:
    (?:
      ( (?:[A-Za-z]+\s?){0,2} (?:[A-Za-z]+) ) # city
      \s* ,?  \s*                 # a comma, optionally
      \b(#{STATE_REGEX})\b        # state
      \s* ,? \s*                  # a comma, optionally
      (\d{5})?                    # a zip code, optionally
    )
    |                             # or, instead of city and state
    (\d{5})?                      # a lone zip code will do
  )
/xi
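
The STATES and SECONDARY_UNIT_DESIGNATORS tables are turned into alternation strings before being interpolated into ADDRESS_PATTERN. The following is a minimal sketch of that step on a three-row sample; the names STATES_SAMPLE, sample_alternation, unescaped, and escaped are illustrative and not part of the library. It also demonstrates a general property of Ruby's /x flag (not a statement about the author's intent): whitespace inside a String interpolated into an extended-mode regex is ignored, so a multi-word name matches with its space only if the space is escaped.

```ruby
# Illustrative three-row sample; the real STATES constant lists every entry.
STATES_SAMPLE = <<-EOF
  ALABAMA  AL
  NEW YORK  NY
  DISTRICT OF COLUMBIA  DC
EOF

# Same steps as STATE_REGEX: one [name, abbreviation] pair per line,
# then Array#join flattens the nested arrays into a single alternation.
pairs = STATES_SAMPLE.split(/\n/).collect { |n| n.scan(/(\w.*\w)\s*([A-Z]{2})\s*$/) }
sample_alternation = pairs.join("|")
puts sample_alternation

# Under /x, the literal space in "NEW YORK" is dropped from the pattern,
# so only the abbreviation (or the name written without a space) matches.
unescaped = /\b(#{sample_alternation})\b/xi
puts unescaped.match?("New York")
puts unescaped.match?("NY 10001")

# Assumption: escaping each entry first keeps the spaces significant.
escaped_alternation = pairs.flatten.map { |s| Regexp.escape(s) }.join("|")
escaped = /\b(#{escaped_alternation})\b/xi
puts escaped.match?("New York")
```

Listing each full name before its abbreviation is harmless here because the surrounding `\b` anchors force the regex engine to backtrack past a short alternative such as AL when the input continues with more word characters.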

Class Method Summary

Class Method Details

.find_addresses(string) ⇒ Object

Returns an array of hashes, one per address found. Returns an empty array if no addresses are found.



# File 'lib/address_extractor.rb', line 12

def find_addresses(string)
  string.scan(ADDRESS_PATTERN).collect { |a| hashify_results(a) }.compact
end
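
Because ADDRESS_PATTERN contains capture groups, String#scan returns one array of captures per match, and the positions line up with CAPTURE_MAP. The key :zip appears twice in CAPTURE_MAP because the pattern has two zip groups, one in each branch of the final alternation. The sketch below illustrates that mapping under stated assumptions: hashify_results is not shown on this page, so hashify_sketch is a hypothetical stand-in, and SAMPLE_PATTERN is a copy of ADDRESS_PATTERN with reduced state and unit tables (STATE_SAMPLE, UNIT_SAMPLE).

```ruby
CAPTURE_MAP = [:street1, :street2, :city, :state, :zip, :zip]

# Reduced alternations for illustration; the real constants cover every entry.
STATE_SAMPLE = "ILLINOIS|IL|TEXAS|TX"
UNIT_SAMPLE  = "APARTMENT|APT|SUITE|STE"

SAMPLE_PATTERN = /
  ( \d+ \s+ (?:[A-Za-z'.-]+\s?){1,5} )             # street number and name
  \s* ,? \s*
  ( (?:\d+\s+)? (?:#{UNIT_SAMPLE}) (?:\s+\d+)? )?  # secondary unit, optionally
  \s* ,? \s*
  (?:
    (?:
      ( (?:[A-Za-z]+\s?){0,2} (?:[A-Za-z]+) )      # city
      \s* ,? \s*
      \b(#{STATE_SAMPLE})\b                        # state
      \s* ,? \s*
      (\d{5})?                                     # zip code, optionally
    )
    |
    (\d{5})?                                       # or a lone zip code
  )
/xi

# Hypothetical helper: pair each non-nil capture with its CAPTURE_MAP key.
def hashify_sketch(captures)
  return nil if captures.nil?
  hash = {}
  captures.each_with_index do |value, index|
    hash[CAPTURE_MAP[index]] = value unless value.nil?
  end
  hash
end

text = "Ship to 123 Main St, Springfield, IL 62704 by Friday."
addresses = text.scan(SAMPLE_PATTERN).collect { |a| hashify_sketch(a) }.compact
p addresses
```

Skipping nil captures means whichever zip group matched fills :zip, and the unmatched one does not overwrite it.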

.first_address(string) ⇒ Object

Returns a hash for the first address found. Returns nil if no address is found.



# File 'lib/address_extractor.rb', line 6

def first_address(string)
  hashify_results string.scan(ADDRESS_PATTERN).first
end
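
The nil return value follows from how String#scan and Array#first behave: scan returns an empty array when nothing matches, and first on an empty array is nil. This presumably means hashify_results (not shown on this page) passes nil through. A minimal sketch with a stand-in pattern, ZIP_SAMPLE, which is illustrative only and much simpler than ADDRESS_PATTERN:

```ruby
# Stand-in pattern for illustration only: a single five-digit group.
ZIP_SAMPLE = /(\d{5})/

# With a capture group, scan yields one array of captures per match.
found = "Mail to 62704 or 73301".scan(ZIP_SAMPLE).first
p found

# No match: scan returns [], and [].first is nil.
missing = "no zip here".scan(ZIP_SAMPLE).first
p missing
```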

.replace_addresses(string) ⇒ Object

Same as replace_first_address, but applies the substitution to every identified address. Matches that do not pass useful_address? are left unchanged.



# File 'lib/address_extractor.rb', line 30

def replace_addresses(string)
  string.gsub(ADDRESS_PATTERN) do |match|
    hash = hashify_results match.scan(ADDRESS_PATTERN).first
    useful_address?(hash) ? yield(hash, $&) : match
  end
end
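
The method combines String#gsub in block form with a per-match check, so only some matches are rewritten. The sketch below shows that control flow under stated assumptions: UNIT_STREET_SAMPLE is a deliberately simple stand-in for ADDRESS_PATTERN, and useful_sketch? is a hypothetical substitute for useful_address?, whose real criteria are not shown on this page.

```ruby
# Stand-in pattern for illustration only:
# "<number> <name> St|Ave", optionally followed by ", XX" (a state).
UNIT_STREET_SAMPLE = /(\d+ [A-Za-z]+ (?:St|Ave))(?:, ([A-Z]{2}))?/

# Hypothetical usefulness test: require a state to be present.
def useful_sketch?(hash)
  !hash[:state].nil?
end

def replace_sketch(string)
  string.gsub(UNIT_STREET_SAMPLE) do |match|
    # Re-scan the matched text to recover its capture groups.
    street, state = match.scan(UNIT_STREET_SAMPLE).first
    hash = { street1: street, state: state }
    # Hand useful matches to the caller's block; keep the rest as-is.
    useful_sketch?(hash) ? yield(hash, match) : match
  end
end

text = "Visit 12 Oak St, TX or 9 Elm Ave today."
result = replace_sketch(text) { |address, matched| "[#{address[:state]}: #{matched}]" }
puts result
```

The sketch passes the block variable match to the caller rather than `$&`, which avoids depending on what the inner scan call leaves in the match globals.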

.replace_first_address(string) ⇒ Object

Pass it a block that receives two parameters:

address hash
matched address string ($&)

Whatever your block returns is used as the substitution. Returns a new string with the substitution applied to the first identified address. If no address is found, returns an unaltered copy of the string.



# File 'lib/address_extractor.rb', line 22

def replace_first_address(string)
  hash = first_address(string)
  string.sub(ADDRESS_PATTERN) do |match|
    yield(hash, $&)
  end
end
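
A minimal sketch of the block protocol described above, assuming a much simpler stand-in pattern (STREET_SAMPLE) and a hypothetical one-key hash in place of the real address hash. It shows the two block parameters, that only the first match is replaced, and that the block is never called when nothing matches.

```ruby
# Stand-in pattern for illustration only; the library uses ADDRESS_PATTERN.
STREET_SAMPLE = /\d+ [A-Za-z]+ (?:St|Ave)/

# Hypothetical simplification: the hash carries only the matched street.
def replace_first_sketch(string)
  string.sub(STREET_SAMPLE) do |match|
    yield({ street1: match }, match)
  end
end

# The block receives the address hash and the matched text.
tagged = replace_first_sketch("Meet at 12 Oak St, then 9 Elm Ave.") do |address, matched|
  "#{address[:street1].upcase} (was: #{matched})"
end
puts tagged

# No match: String#sub returns an unaltered copy and the block is not run.
untouched = replace_first_sketch("No address here.") { |_address, _matched| "never used" }
puts untouched
```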