Realize
This library provides a pluggable and configurable data transformation framework. The general use case is a system-to-system integration in which the data transformation pipeline needs to be configurable within an application.
It is currently used in production at Blue Marble to power the transformation pipeline within a larger ETL framework.
Installation
To install through Rubygems:
gem install realize
You can also add this to your Gemfile:
bundle add realize
Examples
Basic Transformer Example
Here is a simple record we will use for data derivation and transformation:
record = {
id: 1,
created_at: '2020-03-04T12:34:56Z',
first: 'Frank',
last: 'Rizzo'
}
If we wanted to retrieve created_at formatted as 'MM/DD/YY', we could write:
transformers = [
{
type: 'r/value/resolve',
key: :created_at
},
{
type: 'r/format/date',
output_format: '%D'
}
]
value = Realize.pipeline(transformers).transform(record) # 03/04/20
Notice that all built-in transformer types are prefixed with 'r/'. This helps isolate the built-in transformers from any externally registered transformers.
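To make the chaining concrete, here is a plain-Ruby sketch of how a pipeline could feed each transformer's output into the next one. It does not use the gem: ResolveSketch, DateFormatSketch, and run_pipeline are hypothetical names, the sketch assumes the first transformer receives the record itself as its input value, and the resolver argument is passed as nil because these stand-ins read the hash directly.

```ruby
require 'time'

# Hypothetical stand-in for 'r/value/resolve': reads one key from the record.
class ResolveSketch
  def initialize(key:)
    @key = key
  end

  def transform(_resolver, _value, _time, record)
    record[@key]
  end
end

# Hypothetical stand-in for 'r/format/date': parses a time string and re-formats it.
class DateFormatSketch
  def initialize(output_format:)
    @output_format = output_format
  end

  def transform(_resolver, value, _time, _record)
    Time.parse(value.to_s).strftime(@output_format)
  end
end

# Chains transformers: each one receives the previous one's output as its value.
def run_pipeline(transformers, record, time = Time.now.utc)
  transformers.reduce(record) do |value, transformer|
    transformer.transform(nil, value, time, record)
  end
end

record = { id: 1, created_at: '2020-03-04T12:34:56Z', first: 'Frank', last: 'Rizzo' }
steps  = [ResolveSketch.new(key: :created_at), DateFormatSketch.new(output_format: '%D')]

puts run_pipeline(steps, record) # 03/04/20
```

'%D' is Ruby's strftime shorthand for '%m/%d/%y', which is why it produces the 'MM/DD/YY' shape.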
Transformer Gallery
Here is a list of the built-in transformers, their options, and what each one does:
Collection-oriented Transformers
- r/collection/at_index [index]: Takes an array (or coerces value to an array) and returns the value at the given index position.
- r/collection/first []: Takes an array (or coerces value to an array) and returns the value at the first index position.
- r/collection/join [separator, start_index, end_index]: Takes an array (or coerces value to an array) and returns a new string by concatenating all the elements of the array, separated by the separator. The start_index and end_index options can be used to specify which elements make up the returned string.
- r/collection/last []: Takes an array (or coerces value to an array) and returns the value at the last index position.
- r/collection/sort [key, direction]: Takes an array (or coerces value to an array) and sorts it in ascending or descending order by the value of the specified key.
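A rough plain-Ruby approximation of a few collection transformers follows. The method names, the default separator, and the direction symbols are hypothetical; Kernel#Array stands in for the "coerces value to an array" step, which may not match what the gem actually does (for example, Kernel#Array turns a Hash into key/value pairs).

```ruby
# Hypothetical approximations of r/collection/* behavior, not the gem's code.
# Kernel#Array coerces: nil -> [], 'x' -> ['x'], [1, 2] -> [1, 2].

def collection_at_index(value, index)
  Array(value)[index]
end

# Joins the elements between start_index and end_index (both inclusive).
def collection_join(value, separator: ', ', start_index: 0, end_index: -1)
  Array(value)[start_index..end_index].join(separator)
end

# Sorts by each element's value for key; :descending reverses the order.
def collection_sort(value, key:, direction: :ascending)
  sorted = Array(value).sort_by { |item| item[key] }
  direction == :descending ? sorted.reverse : sorted
end

people = [{ name: 'Frank', age: 42 }, { name: 'Ann', age: 31 }]

puts collection_at_index(%w[a b c], 1)                                          # b
puts collection_join(%w[a b c d], separator: '-', start_index: 1, end_index: 2) # b-c
p collection_sort(people, key: :age).map { |person| person[:name] }             # ["Ann", "Frank"]
```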
File-based Transformers
- r/file/basename [suffix]: Return the filename and extension. If suffix is provided then it will be stripped from the end. If '.*' is passed as the suffix then any extension will be ignored.
- r/file/extname [suffix]: Return the extension for the value.
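These descriptions line up with Ruby's built-in File.basename and File.extname. Whether the gem wraps them directly is an assumption, but the standard-library calls show the results you can expect for each suffix option:

```ruby
path = '/exports/2020/payroll.csv'

puts File.basename(path)         # payroll.csv
puts File.basename(path, '.csv') # payroll (explicit suffix stripped)
puts File.basename(path, '.*')   # payroll (any extension stripped)
puts File.extname(path)          # .csv
```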
Filtering Transformers
- r/filter/by_key_record_value [key, value]: Takes an array (or coerces value to an array) and selects only the records that match the key's value. In this case, the value is derived from the main record.
- r/filter/by_key_value_presence [key]: Takes an array (or coerces value to an array) and selects only the records where the key's value is present (not nil and not empty).
- r/filter/by_key_value [key, value]: Takes an array (or coerces value to an array) and selects only the records that match the key's value. In this case, the value is statically defined.
- r/filter/inactive [key, value]: Takes an array (or coerces value to an array) and selects only the records where the current time is between the start and end times as derived from the record. Note that the current time can be passed in but defaults to the current UTC time.
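The following hedged plain-Ruby sketch contrasts the static and record-derived key/value filters and shows the presence check. The method names are hypothetical, and reading by_key_record_value's value option as "the record key whose value is compared" is an assumption drawn from the descriptions above.

```ruby
# Hypothetical approximations of r/filter/* behavior, not the gem's code.

# by_key_value: compare each element's key against a static value.
def filter_by_key_value(items, key:, value:)
  Array(items).select { |item| item[key] == value }
end

# by_key_record_value: compare against a value looked up on the main record.
def filter_by_key_record_value(items, record, key:, value:)
  Array(items).select { |item| item[key] == record[value] }
end

# by_key_value_presence: keep elements whose key is neither nil nor empty.
def filter_by_key_value_presence(items, key:)
  Array(items).reject do |item|
    v = item[key]
    v.nil? || (v.respond_to?(:empty?) && v.empty?)
  end
end

record = { department: 'HR' }
items  = [{ dept: 'HR', phone: '555-0100' }, { dept: 'IT', phone: '' }]

p filter_by_key_value(items, key: :dept, value: 'IT').map { |i| i[:dept] }                    # ["IT"]
p filter_by_key_record_value(items, record, key: :dept, value: :department).map { |i| i[:dept] } # ["HR"]
p filter_by_key_value_presence(items, key: :phone).map { |i| i[:dept] }                        # ["HR"]
```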
Format-oriented Transformers
- r/format/date [input_format, output_format]: Parses the incoming value into a Time object using the configured input_format and outputs it as formatted by the configured output_format.
- r/format/lowercase []: Calls #to_s.downcase on the value so the returned value is guaranteed to be a lower-cased string.
- r/format/pad [length, side, with]: Pad a string value with a specified 'with' value (defaults to blank space) up until the passed in length is reached. The 'side' option can be used to specify whether the padding should occur to the left or right side of the value.
- r/format/remove_whitespace []: Removes all whitespace from the incoming value.
- r/format/sha256 [encoding_format]: Hashes the value with SHA256 and returns the digest as an encoded string. The encoding_format is hex by default but can also be configured to use base64.
- r/format/split [by]: Take a string value and split it on the specified 'by' value. By default, it will be split on a blank space.
- r/format/string_replace [original, replacement]: Replaces all occurrences of the configured original value with the replacement value.
- r/format/string_template [expression, separator, use_record]: Interpolates an expression using either the record or the passed-in value. Nested objects can be handled (i.e. key paths such as dot-notation) by passing in a separator.
- r/format/substring [start_index, end_index, exclusive]: Cuts a string to a given range. All options are optional. If start_index is not provided, the beginning of the string is used. If end_index is not provided, the end of the string is used. If exclusive is set to true, the character at end_index is not included. For example: "hellofriend" with a start_index of 0, an end_index of 5, and exclusive set to true would yield "hello". If exclusive were false, it would yield "hellof".
- r/format/uppercase []: Calls #to_s.upcase on the value so the returned value is guaranteed to be an upper-cased string type.
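The padding, substring, and SHA256 rules above can be approximated in plain Ruby as follows. This is a sketch rather than the gem's implementation: the method names are hypothetical, the default side of :left is an assumption, and the hex/base64 options are modeled as symbols.

```ruby
require 'digest'

# Hypothetical approximations of some r/format/* transformers, not the gem's code.

# pad: :left prepends the padding (rjust), :right appends it (ljust).
def format_pad(value, length:, side: :left, with: ' ')
  str = value.to_s
  side == :left ? str.rjust(length, with) : str.ljust(length, with)
end

# substring: inclusive range by default, exclusive range when exclusive: true.
def format_substring(value, start_index: nil, end_index: nil, exclusive: false)
  str  = value.to_s
  from = start_index || 0
  to   = end_index || str.length
  exclusive ? str[from...to] : str[from..to]
end

# sha256: hex digest by default, strict Base64 (the 'm0' pack directive) otherwise.
def format_sha256(value, encoding_format: :hex)
  digest = Digest::SHA256.digest(value.to_s)
  encoding_format == :base64 ? [digest].pack('m0') : digest.unpack1('H*')
end

puts format_pad('42', length: 5, with: '0')                                        # 00042
puts format_substring('hellofriend', start_index: 0, end_index: 5, exclusive: true) # hello
puts format_substring('hellofriend', start_index: 0, end_index: 5)                  # hellof
puts format_sha256('abc', encoding_format: :base64)
```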
Logical Transformers
- r/logical/switch [cases, default_transformers, key]: Provides value-based logic branching. If a value matches a specific case, that case's transformers are executed. If it does not match any case, the default_transformers are executed.
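The branching can be pictured with a small plain-Ruby sketch in which each transformer is modeled as a lambda. The case structure ({ value:, transformers: }) and the switch_value name are assumptions for illustration, and the key option is left out.

```ruby
# Hypothetical model of r/logical/switch; transformers are plain lambdas here.
def switch_value(value, cases:, default_transformers:)
  matched = cases.find { |c| c[:value] == value }
  steps   = matched ? matched[:transformers] : default_transformers
  # Run the selected transformers in order, chaining their output.
  steps.reduce(value) { |memo, step| step.call(memo) }
end

cases = [
  { value: 'A', transformers: [->(_v) { 'Active' }] },
  { value: 'T', transformers: [->(_v) { 'Terminated' }] }
]
default = [->(v) { "Unknown (#{v})" }]

puts switch_value('A', cases: cases, default_transformers: default) # Active
puts switch_value('X', cases: cases, default_transformers: default) # Unknown (X)
```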
Type Transformers
- r/type/array [nullable]: Ensure the value is an array by calling Kernel#Array on the value.
- r/type/boolean [nullable]: Returns true if the input is 'truthy', false if not. By default nullable is false, which means a nil input will return false. Changing this to true will return nil if nil is passed in. A 'truthy' value is defined as matching: true, t, yes, y, or 1 (case-insensitive).
- r/type/string [nullable]: Calls #to_s on the value so the returned value is guaranteed to be a string type.
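The truthy rule and the nullable flag can be expressed as a short plain-Ruby sketch. The method names and the anchored, case-insensitive regular expression are assumptions that follow the description above, not the gem's source.

```ruby
# Hypothetical approximations of r/type/* behavior, not the gem's code.
TRUTHY = /\A(true|t|yes|y|1)\z/i

def type_boolean(value, nullable: false)
  return nil if nullable && value.nil?

  TRUTHY.match?(value.to_s)
end

def type_string(value, nullable: false)
  return nil if nullable && value.nil?

  value.to_s
end

p type_boolean('Yes')               # true
p type_boolean('no')                # false
p type_boolean(nil)                 # false
p type_boolean(nil, nullable: true) # nil
p type_string(12)                   # "12"
```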
Value-oriented Transformers
- r/value/blank []: Always returns a blank string.
- r/value/map [values]: Looks up the value in the values hash. If a match is found, its corresponding value is returned; otherwise, the input is returned.
- r/value/now [utc_offset]: Returns a Time object, defaulting to the UTC offset. You can optionally pass in a different offset in the form: "+/-HH:MM"
- r/value/null []: Always returns nil.
- r/value/resolve [key]: Dynamically resolves a value based on the record's key. It uses the Objectable library by default, which provides some goodies like dot-notation for nested objects and type-insensitivity (works for Ruby objects, hashes, structs, etc.)
- r/value/static [value]: Always returns a hard-coded value.
- r/value/uuid []: Returns a new 36-character UUID (e.g. 6967fec6-bbde-4497-82d9-55ccc7b87cd0).
- r/value/verbatim []: The default transformer; simply echoes back the input.
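A few of the value-oriented behaviors map onto standard Ruby calls. In this hedged sketch, value_map and value_now are hypothetical names; Time#getlocal and SecureRandom.uuid are standard library methods, though whether the gem uses them is an assumption.

```ruby
require 'securerandom'

# Hypothetical approximations of some r/value/* transformers, not the gem's code.

# map: return the mapped value when the input is a key in values, else the input.
def value_map(value, values:)
  values.key?(value) ? values[value] : value
end

# now: current time expressed at the given offset ("+HH:MM" or "-HH:MM").
def value_now(utc_offset: '+00:00')
  Time.now.getlocal(utc_offset)
end

states = { 'CA' => 'California', 'NY' => 'New York' }

puts value_map('CA', values: states)             # California
puts value_map('TX', values: states)             # TX
puts value_now(utc_offset: '-05:00').utc_offset  # -18000
puts SecureRandom.uuid                           # random, 36 characters
```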
Plugging in Transformers
Custom transformers can be created and registered externally as long as they comply with the general transformer interface:
- the constructor accepts the keyword arguments provided by the configuration
- it responds to a method called transform with the signature: transform(resolver, value, time, record)
After you have implemented the custom transformer, you can register it externally by calling:
Realize::Transformers.register('some_custom_transformer', SomeCustomTransformer)
You should now be able to use the type: 'some_custom_transformer' within your transformation configuration.
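Here is a hedged sketch of a custom transformer that satisfies the interface above. ReverseTransformer, its upcase option, and the 'custom/reverse' type name are hypothetical. The sketch assumes the remaining configuration keys (everything except type) arrive as keyword arguments, and it only registers itself when Realize::Transformers is loaded, so it can also be run standalone.

```ruby
# Hypothetical custom transformer that reverses a string value.
class ReverseTransformer
  # Keyword arguments mirror the options in the transformer configuration.
  def initialize(upcase: false)
    @upcase = upcase
  end

  # Matches the documented signature; resolver, time, and record are unused here.
  def transform(_resolver, value, _time, _record)
    result = value.to_s.reverse
    @upcase ? result.upcase : result
  end
end

# Register only when the gem is loaded.
Realize::Transformers.register('custom/reverse', ReverseTransformer) if defined?(Realize::Transformers)

# A configuration entry could then look like: { type: 'custom/reverse', upcase: true }
puts ReverseTransformer.new(upcase: true).transform(nil, 'Rizzo', Time.now.utc, {}) # OZZIR
```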
Contributing
Development Environment Configuration
Basic steps to take to get this repository compiling:
- Install Ruby (check realize.gemspec for versions supported)
- Install bundler (gem install bundler)
- Clone the repository (git clone git@github.com:bluemarblepayroll/realize.git)
- Navigate to the root folder (cd realize)
- Install dependencies (bundle)
Running Tests
To execute the test suite run:
bundle exec rspec spec --format documentation
Alternatively, you can have Guard watch for changes:
bundle exec guard
Also, do not forget to run Rubocop:
bundle exec rubocop
Publishing
Note: ensure you have proper authorization before trying to publish new versions.
After code changes have successfully gone through the Pull Request review process, follow these steps to publish a new version:
- Merge Pull Request into master
- Update lib/realize/version.rb using semantic versioning
- Install dependencies: bundle
- Update CHANGELOG.md with release notes
- Commit & push master to remote and ensure CI builds master successfully
- Run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Code of Conduct
Everyone interacting in this codebase, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
License
This project is MIT Licensed.