Class: AddressExtractor
- Inherits: Object
- Defined in: lib/address_extractor.rb
Constant Summary
- CAPTURE_MAP =
[ :street1, :street2, :city, :state, :zip, :zip ]
- STATES =
<<-EOF
  ALABAMA                         AL
  ALASKA                          AK
  AMERICAN SAMOA                  AS
  ARIZONA                         AZ
  ARKANSAS                        AR
  CALIFORNIA                      CA
  COLORADO                        CO
  CONNECTICUT                     CT
  DELAWARE                        DE
  DISTRICT OF COLUMBIA            DC
  FEDERATED STATES OF MICRONESIA  FM
  FLORIDA                         FL
  GEORGIA                         GA
  GUAM                            GU
  HAWAII                          HI
  IDAHO                           ID
  ILLINOIS                        IL
  INDIANA                         IN
  IOWA                            IA
  KANSAS                          KS
  KENTUCKY                        KY
  LOUISIANA                       LA
  MAINE                           ME
  MARSHALL ISLANDS                MH
  MARYLAND                        MD
  MASSACHUSETTS                   MA
  MICHIGAN                        MI
  MINNESOTA                       MN
  MISSISSIPPI                     MS
  MISSOURI                        MO
  MONTANA                         MT
  NEBRASKA                        NE
  NEVADA                          NV
  NEW HAMPSHIRE                   NH
  NEW JERSEY                      NJ
  NEW MEXICO                      NM
  NEW YORK                        NY
  NORTH CAROLINA                  NC
  NORTH DAKOTA                    ND
  NORTHERN MARIANA ISLANDS        MP
  OHIO                            OH
  OKLAHOMA                        OK
  OREGON                          OR
  PALAU                           PW
  PENNSYLVANIA                    PA
  PUERTO RICO                     PR
  RHODE ISLAND                    RI
  SOUTH CAROLINA                  SC
  SOUTH DAKOTA                    SD
  TENNESSEE                       TN
  TEXAS                           TX
  UTAH                            UT
  VERMONT                         VT
  VIRGIN ISLANDS                  VI
  VIRGINIA                        VA
  WASHINGTON                      WA
  WEST VIRGINIA                   WV
  WISCONSIN                       WI
  WYOMING                         WY
EOF
- STATE_REGEX =
STATES.split(/\n/).collect{ |n| n.scan(/(\w.*\w)\s*([A-Z]{2})\s*$/) }.join("|")
- SECONDARY_UNIT_DESIGNATORS =
<<-EOF
  APARTMENT   APT
  BASEMENT    BSMT
  BUILDING    BLDG
  DEPARTMENT  DEPT
  FLOOR       FL
  FRONT       FRNT
  HANGAR      HNGR
  LOBBY       LBBY
  LOT         LOT
  LOWER       LOWR
  OFFICE      OFC
  PENTHOUSE   PH
  PIER        PIER
  REAR        REAR
  ROOM        RM
  SIDE        SIDE
  SLIP        SLIP
  SPACE       SPC
  STOP        STOP
  SUITE       STE
  TRAILER     TRLR
  UNIT        UNIT
  UPPER       UPPR
EOF
- SECONDARY_UNIT_DESIGNATORS_REGEX =
SECONDARY_UNIT_DESIGNATORS.split(/\n/).collect{ |n| n.scan(/(\w+)\s*(\w+)\s*$/) }.join("|")
- ADDRESS_PATTERN =
/
  (
    \d+                           # A few numbers
    \s+
    (?:[A-Za-z'.-]+\s?){1,5}      # Followed by a street name
  )
  \s* ,? \s*                      # a comma, optionally
  (
    (?:\d+\s+)?                   # a secondary unit, optionally
    (?:#{SECONDARY_UNIT_DESIGNATORS_REGEX})
    (?:\s+\d+)?
  )?
  \s* ,? \s*                      # a comma, optionally
  (?:
    (?:
      ( (?:[A-Za-z]+\s?){0,2} (?:[A-Za-z]+) )  # city
      \s* ,? \s*                  # a comma, optionally
      \b(#{STATE_REGEX})\b        # state
      \s* ,? \s*                  # a comma, optionally
      (\d{5})?                    # a zip code, optionally
    )
    |                             # or, instead of city and state
    (\d{5})?                      # a lone zip code will do
  )
/xi
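STATE_REGEX and SECONDARY_UNIT_DESIGNATORS_REGEX are both built the same way: a heredoc table is split into lines, each line is scanned for a name and an abbreviation, and the nested results are joined into one alternation string. The following is a minimal sketch of that technique on a hypothetical three-row excerpt; the names `sample_states`, `sample_state_regex`, and `matcher` are illustrative and not part of the library.

```ruby
# Hypothetical three-row excerpt of the STATES table, for illustration only.
sample_states = <<~EOF
  NEW YORK      NY
  OHIO          OH
  WEST VIRGINIA WV
EOF

# Each line yields [[name, abbreviation]]. Array#join flattens nested arrays,
# so the result is a single flat "A|B|C" alternation string.
sample_state_regex = sample_states.split(/\n/).collect { |n|
  n.scan(/(\w.*\w)\s*([A-Z]{2})\s*$/)
}.join("|")

puts sample_state_regex
# → NEW YORK|NY|OHIO|OH|WEST VIRGINIA|WV

# Interpolate the alternation into a case-insensitive matcher.
# (This sketch omits the /x flag, so the space in "NEW YORK" is literal.)
matcher = /\b(#{sample_state_regex})\b/i

puts "Columbus, oh 43215"[matcher, 1]
# → oh
```

Because both full names and abbreviations end up in the same alternation, the captured state is whichever form appears in the input text.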
Class Method Summary
-
.find_addresses(string) ⇒ Object
Returns array of hashes for each address found.
-
.first_address(string) ⇒ Object
Returns hash for address if address found.
-
.replace_addresses(string) ⇒ Object
Same as replace_first_address but applies substitution to all identified addresses.
-
.replace_first_address(string) ⇒ Object
Pass it a block that receives 2 parameters: the address hash and the matched address string ($&). Whatever your block returns will be used for the substitution.
Class Method Details
.find_addresses(string) ⇒ Object
Returns array of hashes for each address found. Returns empty array if no addresses found.
# File 'lib/address_extractor.rb', line 12
def find_addresses(string)
  string.scan(ADDRESS_PATTERN).collect { |a| hashify_results(a) }.compact
end
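The `hashify_results` helper called here is private and its source is not shown on this page. As a rough sketch only, the capture arrays returned by `String#scan` could be paired with the keys in CAPTURE_MAP along the following lines. `sketch_hashify` is a hypothetical stand-in, not the library's implementation, and the sample capture arrays are made up for illustration.

```ruby
# Copied from the CAPTURE_MAP constant: one key per capture group, in order.
# :zip appears twice because ADDRESS_PATTERN has two alternative zip groups.
capture_map = [:street1, :street2, :city, :state, :zip, :zip]

# Hypothetical stand-in for the private hashify_results helper.
def sketch_hashify(captures, keys)
  return nil if captures.nil? # scan(...).first is nil when nothing matched

  keys.zip(captures).each_with_object({}) do |(key, value), hash|
    # Skip groups that did not participate, so an empty second :zip
    # group never overwrites a zip captured by the first one.
    hash[key] = value unless value.nil?
  end
end

# Made-up capture arrays in the shape String#scan returns for six groups.
full = sketch_hashify(["123 Main St", nil, "Springfield", "IL", "62701", nil], capture_map)
lone_zip = sketch_hashify(["123 Main St", nil, nil, nil, nil, "62701"], capture_map)

p full
p lone_zip
```

The `.compact` call in `find_addresses` suggests that the real helper can return nil for some matches; this sketch only returns nil when given nil.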
.first_address(string) ⇒ Object
Returns hash for address if address found. Returns nil if no address found.
# File 'lib/address_extractor.rb', line 6
def first_address(string)
  hashify_results string.scan(ADDRESS_PATTERN).first
end
.replace_addresses(string) ⇒ Object
Same as replace_first_address but applies substitution to all identified addresses.
# File 'lib/address_extractor.rb', line 30
def replace_addresses(string)
  string.gsub(ADDRESS_PATTERN) do |match|
    hash = hashify_results match.scan(ADDRESS_PATTERN).first
    useful_address?(hash) ? yield(hash, $&) : match
  end
end
.replace_first_address(string) ⇒ Object
Pass it a block that receives 2 parameters:
- address hash
- matched address string ($&)
Whatever your block returns will be used for the substitution. Returns a new string with the substitution applied to the first identified address. If no address is found, returns the same string unaltered.
# File 'lib/address_extractor.rb', line 22
def replace_first_address(string)
  hash = first_address(string)
  string.sub(ADDRESS_PATTERN) do |match|
    yield(hash, $&)
  end
end
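Both replacement methods rest on the block forms of `String#sub` and `String#gsub`, where the block's return value replaces the matched text. The sketch below shows that mechanism in isolation. `simple_pattern` is a deliberately simplified stand-in assumed for this example, not the library's ADDRESS_PATTERN, and the sample strings are made up.

```ruby
# Simplified stand-in pattern (an assumption, NOT ADDRESS_PATTERN):
# a street number, a street name, a comma, then a 5-digit zip.
simple_pattern = /(\d+\s+[A-Za-z ]+?),\s*(\d{5})/

text = "Ship to 12 Oak Lane, 90210 by Friday."

# String#sub yields only the first match; the block's return value
# replaces it. Inside the block, $& holds the same matched text.
first_only = text.sub(simple_pattern) { |match| "<address>#{match}</address>" }
puts first_only
# → Ship to <address>12 Oak Lane, 90210</address> by Friday.

# String#gsub yields every match in turn; $2 is the zip group here.
two = "From 1 A St, 11111 to 2 B Rd, 22222."
all = two.gsub(simple_pattern) { "[zip #{$2}]" }
puts all
# → From [zip 11111] to [zip 22222].

# With no match the block is never called and the text comes back unchanged.
untouched = "no address here".sub(simple_pattern) { raise "not reached" }
puts untouched
# → no address here
```

This mirrors the documented contract: the original string is not modified, and an input without a match is returned unaltered.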