Overview

We provide some utilities for validating data in a PostgreSQL data table. These utilities can be used as simple terminal commands and can be installed by:

gem install idata

idata comes along with the following commands:

  • iload
  • ivalidate
  • ipatch
  • ieval
  • iexport
  • imerge
  • isanitize

Run a command with --help switch for the details

Prequisites:

  • PostgreSQL 9.0 or above
  • Ruby 2.0 or above
  • An auto ID field is required for data table to be validated using ivalidate

Usage

Suppose we have an items table, and we want to validate its records against certain criteria like:

  • vendor_code must not be null
  • vendor_name must not be null
  • The composite [vendor_code, vendor_name] must be unique
  • One vendor_code corresponds to only ONE vendor_name (in other words, there must not be two items with different vendor_name but with the same vendor_code) and vice-versa
  • vendor_code must reference the code column in the vendors table

Then the validation command could be:

      ivalidate --host=localhost --user=postgres --database=mydb --table=items
                --log-to=validation_errors \
                --not-null="vendor_code" \
                --not-null="vendor_name" \
                --unique="vendor_code,vendor_name" \
                --consistent-by="vendor_code|vendor_name" \
                --consistent-by="vendor_name|vendor_code" \
                --cross-reference="vendor_code|vendors.code"

Validation results for every single record are logged to an additional column named validation_errors of the items table, as specified by the --log-to switch. As you can see, most common checks can be performed using the supported switches:

    --not-null
    --unique
    --consistent-by
    --cross-reference

Custom Validation

For more customized checks, we support some other switches.

The --match="field/pattern/" switch tells the program to check if value of a field matches the provided pattern (which is a regular expression). For example:

    # Check if item_id is a number:
    ivalidate --match="item_id/[0-9]+/"

    # Check if value of status is either 'A' or 'I' (any other value is not allowed)
    ivalidate --match="status/^(A|I)$/"

In case you need even more customized validation other than the supported ones (match, unique, not-null, cross-reference...) then the --query switch may come in handy. For example:

    ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"

You can also use --rquery which is the reversed counterpart of --query For example, the following two checks are equivalent, mark any record whose start_date < '01/02/2014' as "invalid date":

    ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"
    ivalidate --rquery="start_date < string_to_date('01/02/2014') -- invalid date"

Note: run ivalidate --help to see the full list of supported switches

Put it all together

You can put several ivalidate commands (for several data tables) in one single bash/sh file. Besides ivalidate, we also support some other utilities to:

  • Load data from text files to SQL tables
  • Modify data tables
  • Generate summary reports

See our sample.sh for a comprehensive example