🌐 Atlas Engine
Atlas Engine is a Rails engine that provides a global, end-to-end address validation API for Rails apps.
- Address Validation API
- Rails App Installation
- Local Development Installation
- Address Data Ingestion
- Elasticsearch Matching Strategy
- Hosted Solution
Address Validation API
The validation API is powered by GraphQL. An example request and response follow, along with an explanation of the parameters and response fields:
query validation {
  validation(
    address: {
      address1: "151 O'Connor St"
      address2: ""
      city: "Ottawa"
      provinceCode: "ON"
      countryCode: CA
      zip: "K2P 2L8"
    }
    locale: "en"
  ) {
    validationScope
    concerns {
      code
      fieldNames
      suggestionIds
      message
    }
    suggestions {
      address1
      address2
      city
      provinceCode
      zip
    }
  }
}
Response:
{
  "data": {
    "validation": {
      "validationScope": [
        "country_code",
        "province_code",
        "zip",
        "city",
        "address1"
      ],
      "concerns": [],
      "suggestions": []
    }
  }
}
Address: The raw input for each address field to be validated. The format requirements for each field, and even whether a field must be present or absent, differ by country.
Locale: The language in which to render any messages in the validation API response.
Validation Scope: This response field lists the names of the input fields that have been successfully validated.
Concerns: This response field contains an entry, identified by a code, for each validation error found in the input address. A concern may also reference, through its suggestionIds, a suggestion to fix the issue.
Suggestions: This response field provides corrected values, when available, for fields that have a concern.
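To call the API from Ruby code rather than GraphiQL, the request can be assembled programmatically. The following is a minimal sketch, assuming the endpoint accepts a standard GraphQL-over-HTTP JSON body with query and operationName keys (the shape used in the hosted curl example in this README). The helper names build_validation_request_body and graphql_literal are hypothetical and not part of atlas_engine, and the path to POST to depends on where your app mounts the engine.

```ruby
require "json"

# Hypothetical helper (not part of atlas_engine). countryCode is a GraphQL
# enum, so its value is emitted unquoted; every other address field is a string.
ENUM_FIELDS = [:countryCode].freeze

def graphql_literal(key, value)
  # JSON string escaping is compatible with GraphQL string literals.
  ENUM_FIELDS.include?(key) ? value.to_s : value.to_s.to_json
end

def build_validation_request_body(address, locale:, matching_strategy: nil)
  fields = address.map { |key, value| "#{key}: #{graphql_literal(key, value)}" }.join(" ")
  args = ["address: { #{fields} }", "locale: #{locale.to_json}"]
  # matchingStrategy is optional and is also an enum (e.g. ES, ES_STREET).
  args << "matchingStrategy: #{matching_strategy}" if matching_strategy

  query = <<~GRAPHQL
    query validation {
      validation(#{args.join(" ")}) {
        validationScope
        concerns { code fieldNames suggestionIds message }
        suggestions { address1 address2 city provinceCode zip }
      }
    }
  GRAPHQL

  JSON.generate({ "query" => query, "operationName" => "validation" })
end

body = build_validation_request_body(
  { address1: "151 O'Connor St", address2: "", city: "Ottawa",
    provinceCode: "ON", countryCode: "CA", zip: "K2P 2L8" },
  locale: "en"
)
puts body
```

Rendering literals by hand keeps the sketch dependency-free; a GraphQL client library, or query variables (which require knowing the schema's input type names), would avoid that in a real integration.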
Example request with a concern:
Navigate to http://localhost:3000/graphiql and run the following request. Note the invalid zip field.
query validation {
  validation(
    address: {
      address1: "151 O'Connor St"
      address2: ""
      city: "Ottawa"
      provinceCode: "ON"
      countryCode: CA
      zip: "90210"
    }
    locale: "en"
  ) {
    validationScope
    concerns {
      code
      fieldNames
      suggestionIds
      message
    }
    suggestions {
      address1
      address2
      city
      provinceCode
      zip
    }
  }
}
Response:
{
  "data": {
    "validation": {
      "validationScope": [
        "country_code",
        "province_code",
        "city",
        "address1"
      ],
      "concerns": [
        {
          "code": "zip_invalid_for_province",
          "fieldNames": [
            "zip",
            "country",
            "province"
          ],
          "suggestionIds": [],
          "message": "Enter a valid postal code for Ontario"
        }
      ],
      "suggestions": []
    }
  }
}
The concerns object contains the concern code zip_invalid_for_province to highlight the validation error of 90210 being an invalid zip code for the province ON. It also returns the human-readable message "Enter a valid postal code for Ontario" in the requested language, en.
The validation scope excludes zip because the zip was not successfully validated.
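The relationship between validationScope and concerns can also be checked in code. This minimal sketch, built around a hypothetical summarize_validation helper, parses the response shown above and reports which fields fell out of the validation scope alongside each concern's code and message. The list of scope field names is an assumption taken from the examples in this README.

```ruby
require "json"

# Field names seen in validationScope in this README's examples; other fields
# (such as address2) may also be reported. Treat this list as an assumption.
SCOPE_FIELDS = %w[country_code province_code zip city address1].freeze

# Hypothetical helper: summarizes a validation response.
def summarize_validation(response_json)
  result = JSON.parse(response_json).fetch("data").fetch("validation")
  {
    unvalidated: SCOPE_FIELDS - result.fetch("validationScope"),
    messages: result.fetch("concerns").map { |c| "#{c["code"]}: #{c["message"]}" },
    # An address is considered valid when there are no concerns or suggestions.
    valid: result.fetch("concerns").empty? && result.fetch("suggestions").empty?
  }
end

response = <<~RESPONSE
  {"data": {"validation": {
    "validationScope": ["country_code", "province_code", "city", "address1"],
    "concerns": [{"code": "zip_invalid_for_province", "fieldNames": ["zip", "country", "province"],
                  "suggestionIds": [], "message": "Enter a valid postal code for Ontario"}],
    "suggestions": []}}}
RESPONSE

p summarize_validation(response)
```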
Rails App Installation
Initial setup
Add the engine to your Gemfile:
gem "atlas_engine"
Run the following commands to install the engine in your Rails app:
bundle lock
rails atlas_engine:install:migrations
rails db:migrate
- In config/routes.rb, mount AtlasEngine by adding the line:
  mount AtlasEngine::Engine => "/atlas_engine"
- In app/assets/config/manifest.js, add the line:
  //= link atlas_engine/application.css
- Install maintenance_tasks, a dependency of Atlas Engine that is used to ingest country data.
Updating to a newer version of the engine
Working with migrations
# Copy any migrations from the engine into your app
rails atlas_engine:install:migrations
# Perform the migrations in your app
rails db:migrate
Local Development Installation
This setup guide is based on a macOS development environment. Your tooling may vary.
Install and set up Docker
brew install docker
brew install docker-compose
# to set up the Docker daemon
brew install colima
# to start the Docker daemon
colima start --cpu 4 --memory 8
# raise the memory map limit inside the Colima VM (Elasticsearch requires at least 262144)
colima ssh
sudo sysctl -w vm.max_map_count=262144
exit
Verify Docker is running with: docker info
Clone the atlas_engine git repository
git clone https://github.com/Shopify/atlas-engine.git
Set up Ruby and Rails
Install Ruby >= 3.2.1
In the newly cloned repository directory, run:
bundle install
# *Note* If you get an SSL error during the puma installation, run the following command:
bundle config build.puma --with-pkg-config=$(brew --prefix openssl@3)/lib/pkgconfig
Set up Dockerized Elasticsearch and MySQL
In a separate terminal, from the cloned atlas_engine directory run:
docker-compose up
# *Note* If you encounter an error getting Docker credentials, remove or update the `credsStore`
# key in your Docker configuration file:
# ~/.docker/config.json
"credsStore": "desktop", # remove this line
Verify your connection to the newly created Docker services with the following commands:
- MySQL:
mysql --host=127.0.0.1 --user=root
- Elasticsearch:
curl http://localhost:9200
Set up the local database
rails db:setup
Infrastructure Requirements
The Elasticsearch implementation depends on the ICU analysis plugin. Refer to the Dockerfile used in the local setup for how to install the plugin.
Starting the App / Running Tests
- bin/rails server to start the server
- bin/rails test to run tests
- bundle exec rubocop to run Ruby style checks
- srb tc to run Sorbet type checks
Sorbet
Generate RBI files for custom code
bin/tapioca dsl --app-root="test/dummy/"
Generate RBI files for gems
bin/tapioca gems
# or
bin/tapioca gems --all
Run the Sorbet type check
srb tc
Address Data Ingestion
To power the more advanced matching strategies, which provide city, state, zip and even street-level address validation, your app must have a populated Elasticsearch index for each country available for atlas_engine to query.
The data we use to power Atlas Engine validation is free, open source data from the OpenAddresses project.
Supported countries
At the moment, atlas_engine supports advanced address validation for the following countries.
Country/territory | Two-letter code | Locales | Street | City | Postal Code | Province/State |
---|---|---|---|---|---|---|
Australia | AU | x | x | x | x | |
Austria | AT | x | x | x | x | |
Belgium | BE | fr,nl,de | x | x | x | |
Bermuda | BM | x | x | x | ||
Czechia | CZ | x | x | |||
Denmark | DK | x | x | |||
Faroe Islands | FO | x | x | |||
France | FR | x | x | x | ||
Guernsey | GG | x | x | |||
Italy | IT | x | ||||
Liechtenstein | LI | x | x | x | ||
Luxembourg | LU | fr,lb | x | x | ||
Netherlands | NL | nl | x | x | x | x |
Poland | PL | x | x | x | x | |
Portugal | PT | x | x | x | ||
Slovenia | SI | x | x | x | ||
South Korea | KR | x | x | x | ||
Switzerland | CH | de,fr,it | x | x | x | |
United States | US | en | x | x | x | x |
Downloading and indexing instructions
The following guide demonstrates how to ingest data with the dummy app, but the process is the same with the engine mounted into your own Rails app.
- Go to the OpenAddresses download center, create an account, support the project, and download a GeoJSON+LD file for the country or region you wish to validate.
Restrictions on the file:
- Must be an addresses file, as opposed to a buildings or parcels file.
- Must be gzipped (.gz format).
- Datasets listed under the Individual Sources section work fine. Those under Data Collections must first be unzipped; the addresses GeoJSON files within may then be gzipped and imported.
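Before running the import, it can save time to confirm that a download meets these restrictions. The sketch below is a hypothetical pre-flight check, not an atlas_engine API. It assumes the GeoJSON+LD file holds one GeoJSON Feature per line and, as a heuristic, that addresses files carry number and street properties; adjust both assumptions to the files you actually download.

```ruby
require "json"
require "tmpdir"
require "zlib"

# Hypothetical pre-flight check (not part of atlas_engine). Assumes one GeoJSON
# Feature per line, and that addresses files carry "number" and "street"
# properties (a heuristic for telling them apart from buildings/parcels files).
def check_openaddresses_file(path)
  raise ArgumentError, "#{path}: must be gzipped (.gz)" unless path.end_with?(".gz")

  # GzipReader raises Zlib::GzipFile::Error if the file is not really gzipped.
  Zlib::GzipReader.open(path) do |gz|
    line = gz.gets
    raise ArgumentError, "#{path}: file is empty" if line.nil?

    feature = JSON.parse(line)
    raise ArgumentError, "#{path}: first line is not a GeoJSON Feature" unless feature["type"] == "Feature"

    properties = feature["properties"] || {}
    unless properties.key?("number") && properties.key?("street")
      raise ArgumentError, "#{path}: does not look like an addresses file"
    end
    properties
  end
end

# Usage: write a made-up one-feature sample file and check it.
properties = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample-addresses-country.geojson.gz")
  Zlib::GzipWriter.open(path) do |gz|
    gz.puts(JSON.generate({ "type" => "Feature",
                            "properties" => { "number" => "100", "street" => "Miller Street", "postcode" => "2060" },
                            "geometry" => nil }))
  end
  check_openaddresses_file(path)
end
p properties
```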
For this example, we will be using the au/countrywide --> addresses - country data for Australia, in the GeoJSON+LD format.
Once the file is downloaded, start your app with rails s and navigate to http://localhost:3000/maintenance_tasks (see the maintenance_tasks GitHub repo for more information). There are two tasks available: Maintenance::AtlasEngine::GeoJsonImportTask and Maintenance::AtlasEngine::ElasticsearchIndexCreateTask. We will be using both in the ingestion process.
Navigate to the Maintenance::AtlasEngine::GeoJsonImportTask. This task transforms the raw GeoJSON file into records in our MySQL database and has the following parameters:
- clear_records: If checked, removes any existing records for the country from the database.
- country_code: (required) The ISO country code of the data we are ingesting. In this example, the country code of Australia is AU.
- geojson_file_path: (required) The fully qualified path of the previously downloaded GeoJSON data from OpenAddresses. A comma-delimited list of fully qualified paths is also accepted.
- locale: (optional) The language of the data in the OpenAddresses file.
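Because geojson_file_path accepts a comma-delimited list, a quick client-side check that every entry is fully qualified can catch typos before a task run. This is a hypothetical helper; the task's own parsing of the parameter may differ.

```ruby
require "pathname"

# Hypothetical client-side check (the task's own parsing may differ): splits a
# geojson_file_path value into its comma-delimited entries and ensures each
# one is a fully qualified (absolute) path.
def split_geojson_file_paths(value)
  paths = value.split(",").map(&:strip).reject(&:empty?)
  relative = paths.reject { |path| Pathname.new(path).absolute? }
  raise ArgumentError, "not fully qualified: #{relative.join(", ")}" unless relative.empty?

  paths
end

p split_geojson_file_paths("/data/au-countrywide.geojson.gz, /data/au-extra.geojson.gz")
```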
Once properly parameterized, click Run. The process will initialize a country_import and should succeed immediately.
Navigate to http://localhost:3000/country_imports to track the progress of the country import. Click the import id link for a more detailed view. Once the import status has updated from in_progress to complete, all of the raw OpenAddresses data will have been imported into the atlas_engine_post_addresses table of our MySQL database.
Navigate back to http://localhost:3000/maintenance_tasks and click on the Maintenance::AtlasEngine::ElasticsearchIndexCreateTask. This task reads the data we have staged in MySQL and uses it to create documents in a new Elasticsearch index, which Atlas Engine will ultimately use for validation. The ElasticsearchIndexCreateTask includes the following parameters:
- country_code: (required) The ISO country code of the data we are ingesting and the name of the Elasticsearch index we will be creating. In this example, the country code of Australia is AU.
- locale: (optional) The language of the documents we will be creating. This is required for multi-locale countries, as our indexes are separated by language.
- province_codes: (optional) An allow list of province codes to create documents for. If left blank, the task will create documents for the entire dataset.
- shard_override: (optional) The number of shards to create this index with. If left blank, the default will be used.
- replica_override: (optional) The number of replicas to create this index with. If left blank, the default will be used.
- activate_index: (optional) If checked, immediately promote this index to be the index queried by Atlas Engine. If unchecked, the created index will need to be activated manually.
Once properly parameterized, click run. The maintenance task UI will track the progress of the index creation.
When the task completes, the index documents may be verified manually with an Elasticsearch client. We may now use the es and es_street matching strategies with AU addresses. See the Elasticsearch Matching Strategy section below for an example of their usage.
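One way to verify the index manually is Elasticsearch's _cat/indices API, which lists each index with its health and document count. The minimal sketch below uses only Ruby's standard library and assumes Elasticsearch is reachable at http://localhost:9200, as in the Docker setup above. The naming scheme atlas_engine uses for its indexes is not documented here, so the country-code filter is a heuristic and the sample rows are made up.

```ruby
require "json"
require "net/http"
require "uri"

# Fetches index metadata from Elasticsearch's _cat/indices API (format=json).
def fetch_indices(base_url = "http://localhost:9200")
  JSON.parse(Net::HTTP.get(URI.join(base_url, "/_cat/indices?format=json")))
end

# Hypothetical filter: keeps indexes whose name contains the lowercased country
# code (Elasticsearch index names are always lowercase). The exact naming used by
# atlas_engine is an assumption; inspect the full list if nothing matches.
def country_indices(indices, country_code)
  indices
    .select { |index| index["index"].include?(country_code.downcase) }
    .map { |index| [index["index"], index["health"], index["docs.count"].to_i] }
end

# Made-up sample rows in the shape _cat/indices returns; against a running
# cluster, use `country_indices(fetch_indices, "AU")` instead.
sample = [
  { "index" => "example_au_index", "health" => "green", "docs.count" => "1200" },
  { "index" => "example_us_index", "health" => "green", "docs.count" => "900" }
]
p country_indices(sample, "AU")
```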
Instructions for US import
1. Go to the OpenAddresses download center and download the collection-us-region.zip files for each of the four regions (west, midwest, northeast, south).
2. Run the bin/us_create_state_geojson script to create a statewide geojson.gz file for each state:
   bin/us_create_state_geojson execute /path/to/us_collection_zips /path/to/output_dir
3. Start your app with rails s and navigate to http://localhost:3000/maintenance_tasks. There is a task used only for the US import called Maintenance::AtlasEngine::UsGeoJsonDirectoryImportTask.
4. Parameterize the UsGeoJsonDirectoryImportTask with the output directory that contains all of the {state}-statewide.geojson.gz files created in step 2. Once properly parameterized, click Run. The process will initialize a country_import and should succeed immediately.
5. Navigate to http://localhost:3000/country_imports to track the progress of the country import. Once the import is complete and the US data is in MySQL, the rest of the process for creating and verifying the Elasticsearch index is the same as above.
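Before starting the US import task, you may want to confirm that the script produced a file for every state you expect. This hypothetical sketch lists the files matching the {state}-statewide.geojson.gz pattern in an output directory; the file names in the usage section are made up, and the exact state naming the script uses is an assumption.

```ruby
require "fileutils"
require "tmpdir"

SUFFIX = "-statewide.geojson.gz"

# Hypothetical check: returns the {state} part of every {state}-statewide.geojson.gz
# file in the output directory, sorted, so missing states are easy to spot.
def statewide_states(output_dir)
  Dir.glob(File.join(output_dir, "*#{SUFFIX}"))
     .map { |path| File.basename(path, SUFFIX) }
     .sort
end

# Usage with a throwaway directory standing in for /path/to/output_dir.
states = Dir.mktmpdir do |dir|
  %w[ny-statewide.geojson.gz ca-statewide.geojson.gz notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  statewide_states(dir)
end
p states
```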
Elasticsearch Matching Strategy
The matching strategy (matchingStrategy) is an optional GraphQL parameter that determines how the validity of the address input is evaluated. Out of the box, Atlas Engine supports three matching strategies: local, es, and es_street.
- local matching uses the worldwide gem to provide the most basic level of address validation. This may include simple errors (required fields not populated) or more advanced errors (province not belonging to the country, zip code not belonging to the province). This level of matching does not require ingestion of country data, but the support and suggestions it can provide in its responses are minimal.
- es matching uses data indexed in Elasticsearch via our ingestion process to validate the city, province, country, and zip code fields of the input address, in addition to all of the basic functionality provided by the local strategy.
- es_street is our most advanced matching strategy and requires the highest quality data indexed in Elasticsearch via our ingestion process. It provides everything that es and local do, along with validation of the address1 and address2 components of the address input.
Once we have successfully created and activated an Elasticsearch index using OpenAddresses data, we may use the more advanced Elasticsearch matching strategies, es and es_street.
Consider the following example of an invalid AU address:
query validation {
  validation(
    address: {
      address1: "100 miller st"
      address2: ""
      city: "sydney"
      provinceCode: "NSW"
      countryCode: AU
      zip: "2060"
    }
    locale: "en"
    matchingStrategy: ES
  ) {
    candidate
    validationScope
    concerns {
      code
      typeLevel
      fieldNames
      suggestionIds
      message
    }
    suggestions {
      id
      address1
      address2
      city
      province
      provinceCode
      zip
    }
  }
}
When run in http://localhost:3000/graphiql, this query should produce the following response:
{
  "data": {
    "validation": {
      "candidate": ",NSW,,,,2060,[North Sydney],,Miller Street",
      "validationScope": [
        "country_code",
        "province_code",
        "zip",
        "city",
        "address1"
      ],
      "concerns": [
        {
          "code": "city_inconsistent",
          "typeLevel": 3,
          "fieldNames": [
            "city"
          ],
          "suggestionIds": [
            "665ffd09-75b8-584d-8e4a-a0f471bfea01"
          ],
          "message": "Enter a valid city for New South Wales, 2060"
        }
      ],
      "suggestions": [
        {
          "id": "665ffd09-75b8-584d-8e4a-a0f471bfea01",
          "address1": null,
          "address2": null,
          "city": "North Sydney",
          "province": null,
          "provinceCode": null,
          "zip": null
        }
      ]
    }
  }
}
The concerns object contains the concern code city_inconsistent to highlight the validation error of sydney being an incorrect city for the rest of the provided address. The concern's message field is the human-readable error nudge "Enter a valid city for New South Wales, 2060", pointing to the supporting pieces of evidence (province and zip) that were used to identify city as the inconsistent value in this address input.
The suggestion object contains a corrected city field, North Sydney, which, if applied, will result in no concerns or suggestions from the validation endpoint.
The candidate field contains a representation of the matching document in the elasticsearch index that was found and used to determine the suggestions and concerns in the api response.
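Applying a suggestion amounts to copying its non-null fields over the corresponding input fields. The sketch below is a hypothetical helper, not part of atlas_engine; it skips id and province because neither is an address input field in the queries above, and it uses the suggestion from the response shown above as sample data.

```ruby
# Input fields accepted by the validation query's address argument (from the examples above).
INPUT_KEYS = %w[address1 address2 city provinceCode countryCode zip].freeze

# Hypothetical helper: returns a copy of the address with the suggestion's
# non-null input fields applied; id and province are ignored.
def apply_suggestion(address, suggestion)
  corrections = suggestion.select { |key, value| INPUT_KEYS.include?(key) && !value.nil? }
  address.merge(corrections)
end

address = { "address1" => "100 miller st", "address2" => "", "city" => "sydney",
            "provinceCode" => "NSW", "countryCode" => "AU", "zip" => "2060" }
suggestion = { "id" => "665ffd09-75b8-584d-8e4a-a0f471bfea01", "address1" => nil,
               "address2" => nil, "city" => "North Sydney", "province" => nil,
               "provinceCode" => nil, "zip" => nil }

p apply_suggestion(address, suggestion)
```

Returning a merged copy, rather than mutating the input, keeps the original address available if the user rejects the suggestion.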
The es_street level of validation can also be used to correct errors in the address1 or address2 fields of the input. In the following request, we have modified our query to introduce a second error in our input: searching for miller ave instead of miller st.
query validation {
  validation(
    address: {
      address1: "100 miller ave"
      address2: ""
      city: "sydney"
      provinceCode: "NSW"
      countryCode: AU
      zip: "2060"
    }
    locale: "en"
    matchingStrategy: ES_STREET
  ) {
    candidate
    validationScope
    concerns {
      code
      typeLevel
      fieldNames
      suggestionIds
      message
    }
    suggestions {
      id
      address1
      address2
      city
      province
      provinceCode
      zip
    }
  }
}
This query produces the following response:
{
  "data": {
    "validation": {
      "candidate": ",NSW,,,,2060,[North Sydney],,Miller Street",
      "validationScope": [
        "country_code",
        "province_code",
        "zip",
        "city",
        "address1"
      ],
      "concerns": [
        {
          "code": "city_inconsistent",
          "typeLevel": 3,
          "fieldNames": [
            "city"
          ],
          "suggestionIds": [
            "88779db6-2c5d-5dbb-9f77-f7b07c07206a"
          ],
          "message": "Enter a valid city for New South Wales, 2060"
        },
        {
          "code": "street_inconsistent",
          "typeLevel": 3,
          "fieldNames": [
            "address1"
          ],
          "suggestionIds": [
            "88779db6-2c5d-5dbb-9f77-f7b07c07206a"
          ],
          "message": "Enter a valid street name for New South Wales, 2060"
        }
      ],
      "suggestions": [
        {
          "id": "88779db6-2c5d-5dbb-9f77-f7b07c07206a",
          "address1": "100 Miller Street",
          "address2": null,
          "city": "North Sydney",
          "province": null,
          "provinceCode": null,
          "zip": null
        }
      ]
    }
  }
}
The concerns object now contains an additional concern code, street_inconsistent, to highlight the validation error of miller ave being an incorrect street for the rest of the address input. The concern's message field is the human-readable error nudge "Enter a valid street name for New South Wales, 2060", pointing to the supporting pieces of evidence (province and zip) that were used to identify street as an inconsistent value in this address input.
The suggestion object contains a corrected street field, 100 Miller Street, and a corrected city field, North Sydney. If both of these corrections are applied to the input address, the subsequent request will be valid.
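Both concerns in the response above point at the same suggestion through their suggestionIds. A client can resolve those references by indexing suggestions by id, as in this minimal sketch. The helper name is hypothetical, the sample data is trimmed from the response above, and the approach only works if id is selected on suggestions in the query.

```ruby
# Hypothetical helper: pairs each concern code with the suggestion(s) that
# its suggestionIds reference. Requires `id` to be selected on suggestions.
def concerns_with_suggestions(validation)
  by_id = validation.fetch("suggestions").to_h { |suggestion| [suggestion["id"], suggestion] }
  validation.fetch("concerns").map do |concern|
    [concern["code"], concern.fetch("suggestionIds").filter_map { |id| by_id[id] }]
  end
end

id = "88779db6-2c5d-5dbb-9f77-f7b07c07206a"
validation = {
  "concerns" => [
    { "code" => "city_inconsistent", "suggestionIds" => [id] },
    { "code" => "street_inconsistent", "suggestionIds" => [id] }
  ],
  "suggestions" => [
    { "id" => id, "address1" => "100 Miller Street", "address2" => nil, "city" => "North Sydney" }
  ]
}

concerns_with_suggestions(validation).each do |code, suggestions|
  # Drop nulls and the id so only the proposed corrections are printed.
  puts "#{code} -> #{suggestions.map { |s| s.compact.except("id") }}"
end
```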
The corrected input of
query validation {
  validation(
    address: {
      address1: "100 miller st"
      address2: ""
      city: "north sydney"
      provinceCode: "NSW"
      countryCode: AU
      zip: "2060"
    }
    locale: "en"
    matchingStrategy: ES_STREET
  ) {
    candidate
    validationScope
    concerns {
      code
      typeLevel
      fieldNames
      suggestionIds
      message
    }
    suggestions {
      id
      address1
      address2
      city
      province
      provinceCode
      zip
    }
  }
}
will produce the response:
{
  "data": {
    "validation": {
      "candidate": ",NSW,,,,2060,[North Sydney],,Miller Street",
      "validationScope": [
        "country_code",
        "province_code",
        "zip",
        "city",
        "address1"
      ],
      "concerns": [],
      "suggestions": []
    }
  }
}
This response has no concerns or suggestions, and the input address is therefore considered to be valid.
Hosted Solution
If you wish to use the Shopify-hosted version of the atlas_engine codebase in your applications, you will need to register for an API key and agree to our terms and conditions.
Once you have successfully redeemed an API key, you will be able to access the /graphql endpoint, which can be queried using this example curl request:
curl --request POST \
--url https://atlas-validation.shopifyapps.com/graphql \
--header 'Authorization: Bearer {your-api-key}' \
--header 'Content-Type: application/json' \
--data '{"query":"query validation { validation(address: { address1: \"233 S Wacker Dr\" address2: \"\" city: \"Chicago\" countryCode: US provinceCode: \"IL\" zip: \"60606\" } locale: \"EN\" ) { validationScope concerns { code fieldNames suggestionIds type typeLevel message } suggestions { id address1 address2 city province provinceCode zip } }}","operationName":"validation"}'
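The same request can be sent from Ruby using only the standard library. This sketch builds a Net::HTTP::Post that mirrors the curl headers and body; the ATLAS_API_KEY environment variable name is an assumption, and nothing is sent unless you call send_hosted_request with network access and a valid key.

```ruby
require "json"
require "net/http"
require "uri"

ATLAS_URL = URI("https://atlas-validation.shopifyapps.com/graphql")

# Builds (without sending) a POST equivalent to the curl example above.
def build_hosted_request(api_key, query)
  request = Net::HTTP::Post.new(ATLAS_URL)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ "query" => query, "operationName" => "validation" })
  request
end

# Sends the request over HTTPS and parses the JSON response body.
def send_hosted_request(request)
  Net::HTTP.start(ATLAS_URL.host, ATLAS_URL.port, use_ssl: true) do |http|
    JSON.parse(http.request(request).body)
  end
end

query = <<~GRAPHQL
  query validation {
    validation(
      address: { address1: "233 S Wacker Dr" address2: "" city: "Chicago" countryCode: US provinceCode: "IL" zip: "60606" }
      locale: "EN"
    ) {
      validationScope
      concerns { code fieldNames suggestionIds type typeLevel message }
      suggestions { id address1 address2 city province provinceCode zip }
    }
  }
GRAPHQL

# ATLAS_API_KEY is a hypothetical environment variable name.
request = build_hosted_request(ENV.fetch("ATLAS_API_KEY", "your-api-key"), query)
# p send_hosted_request(request)  # requires network access and a valid key
```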
Any updates to the atlas_engine codebase that are merged into our main branch will be deployed to our hosted solution as well.