JSON Patterns
The JSON Patterns Ruby Gem is a domain-specific language (DSL) library for validating the structure of JSON documents, using patterns that resemble JSON. Just as regexps look a lot like a string, with special symbols to match patterns, a JSON pattern looks like a JSON value, with special keywords to allow for abstract pattern matching.
It supports the use of alternate object patterns, which can be distinguished by a tag. This allows for a JSON structure to mimic subtyping.
Contents
This gem contains:
- A family of classes descending from Pattern that represents JSON patterns.
- A series of helper functions that provide special keywords to make pattern creation easy.
- A family of classes descending from Validation or ObjectMemberValidation, that do the work of validating your JSON structures.
Installation
gem install json_patterns
You will also need a JSON parser, such as yajl-ruby. This gem works only in Ruby 1.9 and higher.
Usage
require 'json_patterns'
- Create a pattern.
- Call Validation.new_from_pattern on the pattern to generate a Validation.
- Obtain a JSON string, and use a JSON parser such as yajl-ruby to parse it into a nested Ruby data structure.
- Call the validate_from_root method on the Validation object, passing it the parsed JSON. You'll get an array of ValidationFailure objects. If it's empty, the validation was successful.
Example:
require 'yajl'
require 'json_patterns'
pattern = { users: array_of({ id: Integer, name: String }) }
validation = Validation.new_from_pattern(pattern)
json = File.new('users.json', 'r')
parser = Yajl::Parser.new
hash = parser.parse(json)
errors = validation.validate_from_root(hash)
print errors
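If yajl-ruby is not at hand, Ruby's standard json library yields a comparable nested structure. The sketch below covers only the parsing step; the sample document is made up for illustration, and the validation call is left out.

```ruby
require 'json'

# Hypothetical sample document; any JSON text would do.
json_text = '{ "users": [ { "id": 1, "name": "Alice" }, { "id": 2, "name": "Bob" } ] }'

# JSON.parse returns nested Hashes and Arrays with String keys.
data = JSON.parse(json_text)

first_user = data['users'].first
puts first_user['name']
```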
Pattern Syntax
The simplest patterns are JSON literals:
- A boolean value, e.g. true
- A number, e.g. 5 or -3.26
- A string, e.g. "foo"
- The Ruby value nil represents a JSON null
You can match values of a particular type, by using either a Ruby class or the JSON name as a symbol:
- Boolean or :boolean matches true or false
- TrueClass matches true, FalseClass matches false
- String or :string matches a string
- Integer or :integer matches an integer
- Float, Numeric or :float matches a floating-point number
- NilClass matches a JSON null
- Array or :array matches any array
- Hash or :object matches a JSON object
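The class names above line up with what a JSON parser hands back in Ruby. As a rough illustration using only the standard json library (not the gem), each parsed value is an instance of the corresponding class. Ruby has no built-in Boolean class, so that name is presumably supplied by the gem.

```ruby
require 'json'

values = JSON.parse('[true, false, "foo", 5, -3.26, null, [1, 2], {"a": 1}]')

# Print each parsed value next to its Ruby class.
# On older Rubies an integer reports Fixnum, which is still an Integer.
values.each do |v|
  puts "#{v.inspect} is a #{v.class}"
end
```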
You can match some special kinds of strings, too:
- Email matches an email address
- URL matches an http, https or ftp URL
- A Regexp such as /a.*b/ matches strings that match the Regexp
There's a special pattern, __, that matches any value.
You can match arrays, but only when every element is expected to match the same pattern. These are called uniform arrays. For example,
array_of(String)
will match a JSON array where all elements are strings.
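To make "uniform" concrete, here is a minimal plain-Ruby sketch of the same idea. The helper uniform_array? is hypothetical and not part of the gem; it only checks element classes, whereas array_of accepts any pattern.

```ruby
# Hypothetical helper, not part of the gem: true when value is an array
# whose elements are all instances of klass.
def uniform_array?(value, klass)
  value.is_a?(Array) && value.all? { |element| element.is_a?(klass) }
end

puts uniform_array?(%w[a b c], String)   # every element is a String
puts uniform_array?(['a', 1], String)    # mixed element types
puts uniform_array?([], String)          # in this sketch, an empty array qualifies
```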
You can match JSON objects using Ruby hashes. Please note the terminology differences: hashes in Ruby are called objects in JSON. Key-value pairs in Ruby are called object members in JSON, and Ruby's keys are called names in JSON. All class names and error messages use the JSON terminology.
So, a hash:
{ id: Integer, name: String }
will match a JSON object with names id and name, with corresponding integer and string values, respectively. For example,
{ "id": 5, "name": "Bob" }
Analogous to the ability to match uniform arrays, you can match uniform objects. These are objects that may have an arbitrary number of names, but each value must match the same pattern. For example,
{ many => String }
will match any object where all the values are strings. The many keyword is a special helper function provided by the gem.
Optional names can be matched in an object using the optional keyword:
{ id: Integer, name: String, optional => { address: String } }
will match objects with an id and a name, and optionally an address with a string value. If multiple members are listed in an optional clause, they must occur together.
So,
{ id: Integer, optional => { name: String, address: String } }
would match objects with an id only, or objects with names id, name and address, but name cannot appear without address, or vice versa.
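The "must occur together" rule can be pictured with a small plain-Ruby check. The helper optional_group_ok? is hypothetical and not part of the gem; it looks only at which names are present, not at the types of their values.

```ruby
# Hypothetical helper, not part of the gem: the names in an optional
# group must be either all present or all absent.
def optional_group_ok?(object, names)
  present = names.select { |name| object.key?(name) }
  present.empty? || present.length == names.length
end

group = %w[name address]

puts optional_group_ok?({ 'id' => 1 }, group)
puts optional_group_ok?({ 'id' => 1, 'name' => 'Bob', 'address' => '10 Forbes Ave' }, group)
puts optional_group_ok?({ 'id' => 1, 'name' => 'Bob' }, group)
```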
Alternate patterns (disjunctions)
It is possible to match one of a set of alternate patterns. There are two syntaxes for this, depending on the context. If you are trying to match one of a set of values, pass a list of patterns to the one_of function. E.g.,
one_of('circle', 'square', 'triangle')
will match any of the three strings. Note that the values passed to one_of need not all be of the same type.
In an object member context, use the keyword one_of, with an array value listing the alternate patterns:
{ id: Integer, one_of => [ { name: String }, { age: Integer } ] }
This will match an object with a name id of type integer, and either the name name with a string value, or the name age with an integer value. Once again, the internal hashes may contain multiple keys. You can think of the curly braces in this case as mere syntax: they delineate a pattern of object members, not a nested object. You can think of the one_of keyword as removing the top level of curly braces inside the array.
A common use case for alternate object patterns is to indicate different "types" that can be considered subtypes of some base type. In Ruby, you might use subclasses to represent alternate forms. For example, if you were representing HTML form element tags, you might have a base Tag class, with subclasses such as InputTag, TextAreaTag, or SelectTag. Each of these would in turn have different attributes, though they would share some attributes in common.
In JSON, you might represent an <input> tag with a pattern like this (deliberately oversimplified):
{ tag: 'input', type: one_of('checkbox', 'text'), value: String }
and a <textarea> tag with a pattern like this:
{ tag: 'textarea', rows: Integer, cols: Integer }
To handle either type of tag, you could use a one_of pattern. To simplify the code, we can use Ruby variables to hold on to each pattern:
inputTag = { tag: 'input', type: one_of('checkbox', 'text'), value: String }
textAreaTag = { tag: 'textarea', rows: Integer, cols: Integer }
tag = one_of(inputTag, textAreaTag)
So one can think of tag as a base class, and inputTag and textAreaTag as subclasses. They are distinguished by the value of the tag object member.
The Validation objects are smart enough to look at the first object members (key-value pairs) of alternate patterns, and use them to distinguish the appropriate case to follow. Please note that they must be distinguished by the first name (key) or value encountered in the pattern. There is a deliberate ordering to the object members in the patterns, but no ordering in the matched JSON objects. If the cases are not distinguished by the first object member of the pattern, a ValidationAmbiguity error will occur.
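The idea of choosing an alternative from its first member can be sketched in plain Ruby. This is only an illustration of the dispatch, not the gem's algorithm: required_names and tag_shape_ok? are hypothetical, and the check compares names only, ignoring value types.

```ruby
# Hypothetical stand-in for the two tag patterns: each alternative is
# keyed by the value of its first member, "tag".
def required_names
  {
    'input'    => %w[tag type value],
    'textarea' => %w[tag rows cols]
  }
end

# Pick the alternative from the "tag" value, then compare the set of names.
# Sorting makes the check independent of member order in the JSON object.
def tag_shape_ok?(object)
  names = required_names[object['tag']]
  return false if names.nil?   # no alternative starts with this tag value
  object.keys.sort == names.sort
end

puts tag_shape_ok?({ 'cols' => 80, 'rows' => 4, 'tag' => 'textarea' })
puts tag_shape_ok?({ 'tag' => 'input', 'rows' => 4, 'cols' => 80 })
```

Because the tag value differs between the two alternatives, at most one of them can apply to a given object, which is the property the paragraph above asks for.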
Cyclic patterns
Sometimes it is desirable to match nested patterns that repeat. This can be done using the cyclic function:
cyclic { |person| { name: String, email: Email, friends: array_of(person) } }
This matches an object with three names: name, email and friends, where friends points to an array of objects with name, email, and friends, and so on. If you create a cyclic structure, you are responsible for ensuring that it is well-founded, that is, it should not require an infinite-sized JSON structure, as all JSON structures are finite. This pattern is well-founded, because at any level in the hierarchy, the friends list may be empty.
Note that we are making use of Ruby's block syntax to create a local variable person which can be used to refer to the top level of the cyclic structure.
Here is another well-founded example:
tree = cyclic { |tree| one_of(Integer, { left: tree, right: tree }) }
This time, we show storing the result in a variable. Please note that due to Ruby scoping rules, the tree variable on the outside is distinct from the other tree variables inside the call to cyclic, which all refer to the same thing. This pattern would match JSON such as:
{ "left": 4, "right": { "left": 2, "right": 5 } }
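For comparison, the same recursive shape can be checked by hand in plain Ruby. The function tree? below is a hypothetical sketch, not part of the gem; it assumes that extra names are not allowed in a branch, which the text above does not state.

```ruby
require 'json'

# Hypothetical checker, not part of the gem: a tree is either an integer
# leaf or an object with exactly the names "left" and "right", each a tree.
def tree?(value)
  return true if value.is_a?(Integer)
  return false unless value.is_a?(Hash) && value.keys.sort == %w[left right]
  tree?(value['left']) && tree?(value['right'])
end

doc = JSON.parse('{ "left": 4, "right": { "left": 2, "right": 5 } }')
puts tree?(doc)
```

The recursion terminates on any finite document because every branch must eventually bottom out in integer leaves, which is what makes the pattern well-founded.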
It's also possible to create cyclic patterns by manipulating structures using assignment:
branch = {left: nil, right: nil}
tree = one_of(nil, branch)
branch[:left] = tree
branch[:right] = tree
This pattern works like the one declared using cyclic above, except that its leaves are null rather than integers. All structural sharing and cycles are preserved in the transformation from a Pattern to a Validation.
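The sharing created by assignment is ordinary Ruby object identity. In the sketch below an Array stands in for one_of, purely to show the cycle; it is not how the gem represents alternatives.

```ruby
# An Array stands in for one_of here, purely to show the sharing.
branch = { left: nil, right: nil }
tree = [Integer, branch]
branch[:left] = tree
branch[:right] = tree

# Following the references leads back to the very same objects.
puts tree[1][:left].equal?(tree)
puts tree[1][:right][1].equal?(branch)
```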
The members keyword
You may have noticed that, whenever we use a special keyword within a hash, the value, which is written as a hash, is treated as a list of object members, rather than as an object. There is a special keyword called members, which does nothing but apply this transformation. It is useful if you have a pattern that represents an object, and want to flatten it into a list of object members.
For example, if you have an object pattern:
address = { street: String, city: String }
You could use it as a nested object:
user = { name: String, address: address }
which would match JSON like this:
{ "name": "Bob", "address": { "street": "10 Forbes Ave", "city": "New York" } }
Using the members keyword, you can flatten the pattern:
user = { name: String, members => address }
This will match JSON such as:
{ "name": "Bob", "street": "10 Forbes Ave", "city": "New York" }
Note that the same pattern can be used in both ways, in different places. In addition, the same flattening occurs if a pattern is used along with the one_of or optional keywords.
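As a loose analogy in plain Ruby, flattening resembles merging one hash into another, while nesting keeps the hash as a single value. Hash#merge is used here only to illustrate the difference in shape; it is not claimed to be the gem's mechanism.

```ruby
address = { street: String, city: String }

# Nested: the address hash is the value of a single member.
nested = { name: String, address: address }

# Flattened: the address members sit alongside name, approximated
# here with Hash#merge.
flattened = { name: String }.merge(address)

puts nested.keys.inspect
puts flattened.keys.inspect
```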
Errors
Errors are descended from the class ValidationError, which consists of subclasses ValidationUnexpected and ValidationAmbiguity. Each of them has a path attribute, which is an array containing the path through the JSON where the error was found. Each element in the array is either a string (object name) or integer (array index).
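A path of this kind can be followed through the parsed JSON with a few lines of plain Ruby. The helper value_at and the sample path are hypothetical, made up to show the shape described above rather than taken from the gem's output.

```ruby
require 'json'

# Hypothetical helper, not part of the gem: walk a path of object names
# (strings) and array indexes (integers) through parsed JSON.
def value_at(data, path)
  path.inject(data) { |node, step| node[step] }
end

data = JSON.parse('{ "users": [ { "id": 1 }, { "id": "oops" } ] }')
path = ['users', 1, 'id']   # made-up path of the kind described above
puts value_at(data, path).inspect
```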
A ValidationUnexpected has found and expected attributes, describing what was found and what was expected, respectively. The description is minimal for indicating what went wrong. For example, if a string was expected but an array was found, then found would indicate "array", rather than representing the entire contents of the array. If alternate values were expected, they will be listed in an array, e.g.:
expected: ["square", "circle", "triangle"]
found: "dodecahedron"
If a certain name was expected, but not found, all the names found in the object will be listed. For example,
expected: 'name: "street"'
found: 'names: "name", "city"'
A ValidationAmbiguity has found and overlapping_patterns attributes. The found attribute describes what was found, and overlapping_patterns is an array of minimal textual descriptions of the alternate patterns that matched what was found.
All errors have a to_s method for convenient textual representation.
Questions and Answers
Q: Why only Ruby 1.9? Can you support 1.8?
A: There are several reasons for using Ruby 1.9, but the most important one is that there is a guaranteed ordering of key-value pairs in hashes. This is required for the patterns to be ordered. The JSON-like key syntax of following the key with a colon instead of using a rocket is nice, too.
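The ordering guarantee is easy to see in plain Ruby: from 1.9 onward, a hash iterates in insertion order, so the first member of a pattern is well defined.

```ruby
pattern = { tag: 'input', type: 'text', value: String }

# Keys come back in the order they were written.
puts pattern.keys.inspect
puts pattern.first.inspect
```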
Q: I'd like to match arrays of a fixed length, with different patterns at each index. Is that possible?
A: Not at the moment. I may add this in a future version. The syntax would simply be the Ruby array syntax. I just haven't had a use for this yet.
Q: How about checking the length of a string?
A: You can use a Regexp. /.{8,8}/ will match 8-character strings.
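Whether the gem anchors a Regexp to the whole string is not stated here. In plain Ruby, an unanchored regexp also matches longer strings, so anchoring with \A and \z is the safe way to pin down an exact length, as this small sketch shows:

```ruby
unanchored = /.{8,8}/
anchored   = /\A.{8}\z/

puts !!(unanchored =~ '123456789')   # finds 8 characters inside a 9-character string
puts !!(anchored =~ '123456789')     # the whole string must be 8 characters
puts !!(anchored =~ '12345678')
```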
Q: How about checking a numeric range?
A: Not possible in this version.
Q: I want to use arbitrary functions to validate values.
A: Not possible at the moment, but I might consider adding this. One thing I don't like about it is that it's not serializable.
Q: Your one_of keyword is like an or. How about a sort of and operation on patterns?
A: Yes, I'm considering creating an all_of for that. It's too bad you can't overload the && and || operators in Ruby.
Q: How about using regexes for keys?
A: This creates too much ambiguity. The number of cases that must be tried is larger, and the error messages become more confusing. Better to be clear with your structure definition.
Instead I have special-cased the /.*/ regex using the many keyword. With this there is no ambiguity about which names should be matched. All are matched.
Q: What is this "name" thing? Do you mean keys?
A: See the JSON specification. They really do call them object members, and name/value pairs. Just trying to stick with the spec. Sorry for the confusion.
Q: But you can't do mutually recursive patterns with cyclic!
A: OK, smartypants. I didn't have a use for mutually recursive patterns. But it is possible to write them, by manipulating the patterns after you create them, to point at each other. The magic cyclic keyword isn't really required. The Pattern -> Validation transformation preserves all sharing and cyclic structures. Exercise left for the daring reader.
Q: If there's an ambiguity error, it can be hard to find where in the pattern the problem is occurring.
A: At some point I may implement pattern paths. They won't be aware of the Ruby variables you've used, but it could help.
Acknowledgements
Thanks to Janrain, my employer, for permitting me to open source this code under my own name.
Support
Contact [email protected] for help with this gem.
Copyright (c) 2012 Lyle Kopnicky under the BSD2 license.