Class: RBook::Onix::StreamReader

Inherits:
Object
Defined in:
lib/rbook/onix/stream_reader.rb

Overview

A stream reader for ONIX files. A stream reader is the better choice for large XML files: the file is parsed incrementally, so the entire document never has to be held in memory at once.

This class provides forward-only iteration over a single ONIX file, yielding an RBook::ONIX::Product object for each product encountered.
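Internally this is a bounded producer/consumer: one thread parses the file and pushes products onto a queue, while the caller pops them one at a time. The sketch below shows only that pattern, using Ruby's core SizedQueue. It rests on some assumptions. Plain strings stand in for product objects. A nil pushed onto the queue marks the end of the stream, mirroring how #each stops when it pops nil. The method name each_streamed is hypothetical and not part of rbook.

```ruby
# Minimal sketch of the bounded producer/consumer design (not the rbook API).
# Assumptions: strings stand in for Product objects; nil marks end of stream.

def each_streamed(items, capacity: 100)
  queue = SizedQueue.new(capacity) # push blocks while `capacity` items are waiting

  # Producer: in StreamReader this role is played by the REXML parsing thread.
  Thread.new do
    items.each { |item| queue.push(item) }
    queue.push(nil) # end-of-stream sentinel
  end

  # Consumer: pop until the sentinel arrives, yielding each item as it comes.
  while (item = queue.pop)
    yield item
  end
end

seen = []
each_streamed(%w[book-1 book-2 book-3], capacity: 2) { |item| seen << item }
puts seen.inspect # prints ["book-1", "book-2", "book-3"]
```

Because the queue is bounded, a fast parser cannot run arbitrarily far ahead of a slow consumer. Memory use therefore stays proportional to the queue capacity rather than to the file size.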

Basic usage

require 'rbook/onix'
reader = RBook::ONIX::StreamReader.new("some_onix_file.xml")
reader.each do |product|
  puts product.inspect
end


Constructor Details

#initialize(input) ⇒ StreamReader

Creates a new stream reader for the specified file. input may be a path (String) or an open File object; any other argument is rejected with an error.
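The input check can be sketched with a hypothetical helper, open_onix_input, which is not part of rbook. It rejects a bad argument with raise ArgumentError, the idiomatic Ruby approach. By contrast, throw is meant for catch/throw control flow, and in modern Ruby an uncaught throw surfaces as an UncaughtThrowError rather than a descriptive error. The temporary file name is an assumption for the example.

```ruby
require "tmpdir"

# Hypothetical helper, not part of rbook: normalises the constructor argument.
def open_onix_input(input)
  case input
  when String then File.new(input) # a String is treated as a path
  when File   then input           # an already-open File is used as-is
  else
    raise ArgumentError, "Unable to read from path or file: #{input.inspect}"
  end
end

path = File.join(Dir.tmpdir, "onix_input_example.xml")
File.write(path, "<ONIXMessage/>")

from_path = open_onix_input(path)
puts from_path.read # prints "<ONIXMessage/>"
from_path.close

File.open(path) do |file|
  puts open_onix_input(file).equal?(file) # prints "true"
end

begin
  open_onix_input(42)
rescue ArgumentError => e
  puts e.message # prints "Unable to read from path or file: 42"
end
```

Note that a Tempfile is not a File instance (it delegates to one), so a check like this would reject it. Passing File.open(tempfile.path) works instead.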



# File 'lib/rbook/onix/stream_reader.rb', line 99

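def initialize(input)
  # accept either a path to the file or an already-open File
  if input.is_a?(String)
    @input = File.new(input)
  elsif input.is_a?(File)
    @input = input
  else
    # `throw` is for catch/throw control flow; `raise` reports an error
    raise ArgumentError, "Unable to read from path or file"
  end

  # create a sized queue to store each product read from the file;
  # the reader thread blocks once 100 products are waiting to be consumed
  @queue = SizedQueue.new(100)

  # launch a reader thread to process the file and store each
  # product in the queue
  Thread.new do
    begin
      producer = Listener.new(@queue)
      REXML::Document.parse_stream(@input, producer)
    rescue StandardError => e
      # hand parse errors to the consumer; #each re-raises any Exception it pops
      @queue.push(e)
    end
  end
end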
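The Listener class that feeds the queue is not shown here. As an illustration only, the hypothetical SketchProductListener below uses REXML's StreamListener callbacks (tag_start, text, tag_end) to push one Hash per &lt;Product&gt; onto a SizedQueue, recording only that product's RecordReference. It pushes nil when &lt;/ONIXMessage&gt; closes. rbook's own listener produces RBook::ONIX::Product objects instead. The Hash shape and the nil sentinel are assumptions made for this sketch.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener for illustration only; it is NOT rbook's Listener.
# It pushes one Hash per <Product> and a nil sentinel at </ONIXMessage>.
class SketchProductListener
  include REXML::StreamListener # supplies no-op defaults for unused callbacks

  def initialize(queue)
    @queue = queue
    @product = nil     # Hash being built for the current <Product>, if any
    @current_tag = nil # name of the most recently opened element
  end

  def tag_start(name, _attrs)
    @product = {} if name == "Product"
    @current_tag = name
  end

  def text(string)
    # only keep character data that sits directly inside <RecordReference>
    return unless @product && @current_tag == "RecordReference"
    @product[:record_reference] = string.strip
  end

  def tag_end(name)
    @current_tag = nil
    if name == "Product"
      @queue.push(@product) # blocks while the queue is full
      @product = nil
    elsif name == "ONIXMessage"
      @queue.push(nil)      # end-of-stream sentinel
    end
  end
end

xml = <<~XML
  <ONIXMessage>
    <Product><RecordReference>9780000000001</RecordReference></Product>
    <Product><RecordReference>9780000000002</RecordReference></Product>
  </ONIXMessage>
XML

queue = SizedQueue.new(100)
Thread.new do
  begin
    REXML::Document.parse_stream(xml, SketchProductListener.new(queue))
  rescue StandardError => e
    queue.push(e) # let the consumer see parse errors
  end
end

refs = []
while (product = queue.pop)
  raise product if product.is_a?(Exception)
  refs << product[:record_reference]
end
puts refs.inspect # prints ["9780000000001", "9780000000002"]
```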

Instance Method Details

#each ⇒ Object

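Iterates over the file, yielding each product to the block. If no product arrives within 5 seconds (for example, because the file is truncated and never reaches </ONIXMessage>), iteration stops silently rather than blocking forever.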

reader = RBook::ONIX::StreamReader.new("some_onix_file.xml")
reader.each do |product|
  puts product.inspect
end


# File 'lib/rbook/onix/stream_reader.rb', line 125

def each
  # if the ONIX file we're processing has been truncated (no </ONIXMessage>),
  # the next @queue.pop would block indefinitely, so give it a time limit
  obj = nil
  Timeout.timeout(5) { obj = @queue.pop }
  until obj.nil?
    # an Exception on the queue signals a failure in the reader thread
    raise obj if obj.kind_of?(Exception)
    yield obj

    Timeout.timeout(5) { obj = @queue.pop }
  end
rescue Timeout::Error
  # nothing arrived within 5 seconds: treat it as the end of the stream
  # (possibly the source file was truncated or wasn't an XML file)
end
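The timeout guard used by #each can be exercised on its own. The sketch below uses a hypothetical method, each_with_timeout, that pops from any Queue with a per-item time limit, stops on nil, and re-raises queued exceptions. As before, the nil sentinel and the queued-exception convention are assumptions. One deliberate difference from the method above is that Timeout::Error is rescued only around the pop, so a Timeout::Error raised inside the caller's block is not silently swallowed.

```ruby
require "timeout"

# Sketch of the timeout-guarded loop used by #each. Assumptions: nil marks the
# end of the stream, and a producer reports failures by queueing an Exception.
def each_with_timeout(queue, seconds)
  loop do
    item = begin
      Timeout.timeout(seconds) { queue.pop }
    rescue Timeout::Error
      return # nothing arrived in time (e.g. a truncated file): stop quietly
    end

    break if item.nil?                  # normal end of stream
    raise item if item.is_a?(Exception) # surface the producer's error
    yield item
  end
end

# A queue with no nil sentinel, simulating a truncated file.
truncated = Queue.new
truncated.push("book-1")
truncated.push("book-2")

seen = []
each_with_timeout(truncated, 0.1) { |item| seen << item }
puts seen.inspect # prints ["book-1", "book-2"]

failing = Queue.new
failing.push(RuntimeError.new("parse error"))
begin
  each_with_timeout(failing, 0.1) { |_item| }
rescue RuntimeError => e
  puts "caught: #{e.message}" # prints "caught: parse error"
end
```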