Yet another JSON iterator

YAJI is a Ruby wrapper for YAJL that provides an iterator interface to its streaming JSON parser.

INSTALL

This gem depends on yajl, so you need its development headers installed on your system to build the gem. On the Debian GNU/Linux family, the command will be something like:

sudo apt-get install libyajl-dev

Now you are ready to install the YAJI gem:

gem install yaji

USAGE

The YAJI::Parser initializer accepts an IO instance or a String.

require 'yaji'

YAJI::Parser.new('{"foo":"bar"}')
YAJI::Parser.new(File.open('data.json'))

There is integration with curb, so you can pass a Curl::Easy instance as input for the parser.

require 'curl'
curl = Curl::Easy.new('http://avsej.net/test.json')
parser = YAJI::Parser.new(curl)
parser.each.to_a.first  #=> {"foo"=>"bar", "baz"=>{"nums"=>[42, 3.1415]}}

There is no strict requirement though: the input can be any instance that responds to #on_body and #perform.
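As a rough illustration of that duck-typed contract, the sketch below defines a hypothetical ChunkedSource class (not part of YAJI or curb) that responds to #on_body and #perform. It assumes, following curb's convention, that the block registered with #on_body receives each chunk of the body as a String and returns the number of bytes it handled; how YAJI itself uses that return value is not covered here.

```ruby
# Hypothetical input source mimicking the two Curl::Easy methods
# mentioned above. The name ChunkedSource is made up for this example.
class ChunkedSource
  def initialize(chunks)
    @chunks = chunks
    @handler = nil
  end

  # Remember the block that should receive each chunk of the body.
  def on_body(&block)
    @handler = block
    self
  end

  # Feed every chunk to the registered handler, in order.
  def perform
    @chunks.each { |chunk| @handler.call(chunk) if @handler }
    true
  end
end

# Exercise the contract by hand, without involving the parser.
source = ChunkedSource.new(['{"foo":', '"bar"}'])
received = []
source.on_body { |chunk| received << chunk; chunk.bytesize }
source.perform
puts received.join
```

If the assumption holds, a fresh instance of such a class could be passed to YAJI::Parser.new in place of a Curl::Easy object.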

A parser instance provides two iterators for reading JSON data: event-oriented and object-oriented. YAJI::Parser#parse yields a tuple [path, event, value] describing each parser event. For example, this code

parser = YAJI::Parser.new('{"foo":[1, {"bar":"baz"}]}')
parser.parse do |path, event, value|
  puts [path, event, value].inspect
end

prints all parser events:

["", :start_hash, nil]
["", :hash_key, "foo"]
["foo", :start_array, nil]
["foo/", :number, 1]
["foo/", :start_hash, nil]
["foo/", :hash_key, "bar"]
["foo//bar", :string, "baz"]
["foo/", :end_hash, nil]
["foo", :end_array, nil]
["", :end_hash, nil]

You can also call #parse without a block, in which case it returns an Enumerator object.
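Because the events are plain [path, event, value] triples, they can be processed with ordinary Enumerable methods. The sketch below uses a hypothetical helper, scalar_values, that keeps only the non-structural events. It runs on the sample events copied from the output above rather than on a live parser, so it makes no assumptions about YAJI beyond the tuple layout already shown.

```ruby
# Events that describe structure rather than scalar values. Only the
# event names shown in the output above are listed here.
STRUCTURAL = [:start_hash, :end_hash, :start_array, :end_array, :hash_key].freeze

# Hypothetical helper: return [path, value] pairs for scalar events.
# `events` is any enumerable of [path, event, value] triples.
def scalar_values(events)
  events.reject { |_path, event, _value| STRUCTURAL.include?(event) }
        .map { |path, _event, value| [path, value] }
end

# Sample events copied from the listing above. In real use this could
# be the Enumerator returned by parser.parse.
events = [
  ["", :start_hash, nil],
  ["", :hash_key, "foo"],
  ["foo", :start_array, nil],
  ["foo/", :number, 1],
  ["foo/", :start_hash, nil],
  ["foo/", :hash_key, "bar"],
  ["foo//bar", :string, "baz"],
  ["foo/", :end_hash, nil],
  ["foo", :end_array, nil],
  ["", :end_hash, nil]
]

scalar_values(events).each { |path, value| puts "#{path} => #{value.inspect}" }
```

For the sample data this prints one line per scalar: foo/ => 1 and foo//bar => "baz".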

Another approach is to use the YAJI::Parser#each method to iterate over JSON objects. It accepts an optional filter parameter if you'd like to iterate over sub-objects. Here is an example:

parser = YAJI::Parser.new('{"size":2,"items":[{"id":1}, {"id":2}]}')
parser.each do |obj|
  puts obj.inspect
end

It prints only one line:

{"size"=>2, "items"=>[{"id"=>1}, {"id"=>2}]}

But it might be more useful to yield items from inner array:

parser = YAJI::Parser.new('{"size":2,"items":[{"id":1}, {"id":2}]}')
parser.each("/items/") do |obj|
  puts obj.inspect
end

The code above prints two lines:

{"id"=>1}
{"id"=>2}

You can use this iterator when the data is huge and you'd like to allow the GC to collect each yielded object before the parser finishes its job. You can also specify additional selectors if you need to fetch sibling nodes, e.g. "size" from the previous example:

parser = YAJI::Parser.new('{"size":2,"items":[{"id":1}, {"id":2}]}')
parser.each(["size", "/items/"]) do |obj|
  puts obj.inspect
end

It yields:

2
{"id"=>1}
{"id"=>2}
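Building on the point about letting the GC reclaim yielded objects, one possible pattern is to handle items in small batches so that only a few objects are referenced at a time. The helper below, each_in_batches, and its batch_size parameter are hypothetical and not part of YAJI. It relies only on the block form of #each with a selector, as shown above, and works with any object that offers that method.

```ruby
# Hypothetical helper: yield arrays of at most batch_size objects
# produced by parser.each(selector). Each batch is a fresh array, so
# earlier batches can be garbage collected once the caller drops them.
def each_in_batches(parser, selector, batch_size)
  batch = []
  parser.each(selector) do |obj|
    batch << obj
    if batch.size == batch_size
      yield batch
      batch = []
    end
  end
  # Hand over the final, possibly shorter, batch.
  yield batch unless batch.empty?
end

# Possible usage, assuming the yaji gem and a data.json file exist:
#
#   require 'yaji'
#   parser = YAJI::Parser.new(File.open('data.json'))
#   each_in_batches(parser, "/items/", 100) do |items|
#     puts items.size
#   end
```

Whether batching actually lowers peak memory depends on what the caller does with each batch, so treat this as a starting point rather than a guarantee.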

You can also parse objects in a top-level array. Without any parameters, the parser produces a single object for the whole input:

parser = YAJI::Parser.new('[{"id":1}, {"id":2}]')
parser.each do |obj|
  puts obj.inspect
end

Output:

[{"id"=>1}, {"id"=>2}]

But you can iterate over the inner objects by passing "/" as the argument:

parser = YAJI::Parser.new('[{"id":1}, {"id":2}]')
parser.each("/") do |obj|
  puts obj.inspect
end

Output:

{"id"=>1}
{"id"=>2}

LICENSE

Copyright 2011 Couchbase, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.