Class: Nokogiri::XML::Reader

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/nokogiri/xml/reader.rb,
lib/nokogiri/ffi/xml/reader.rb,
ext/nokogiri/xml_reader.c

Overview

Nokogiri::XML::Reader parses an XML document by moving a cursor through it, one node at a time. The Reader is given an XML document and yields each node to the block passed to #each.

Here is an example of usage:

reader = Nokogiri::XML::Reader(<<-eoxml)
  <x xmlns:tenderlove='http://tenderlovemaking.com/'>
    <tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
  </x>
eoxml

reader.each do |node|

  # node is an instance of Nokogiri::XML::Reader
  puts node.name

end

Note that Nokogiri::XML::Reader#each can only be called once. Once the cursor has moved through the entire document, you must create a new Reader to parse the document again, so capture any information you need during the first iteration.

The Reader is a good choice when you need the speed of a SAX parser but do not want to write a SAX document handler.

Instance Attribute Details

#cstruct ⇒ Object

Returns the FFI struct wrapping the underlying libxml2 xmlTextReader (FFI implementation only).



# File 'lib/nokogiri/ffi/xml/reader.rb', line 6

def cstruct
  @cstruct
end

#encoding ⇒ Object (readonly)

The encoding for the document



# File 'lib/nokogiri/xml/reader.rb', line 37

def encoding
  @encoding
end

#errors ⇒ Object

A list of errors encountered while parsing



# File 'lib/nokogiri/xml/reader.rb', line 34

def errors
  @errors
end

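#reader_callback ⇒ Object

Returns the IO callbacks object that .from_io keeps a reference to so that it is not garbage-collected while the reader is in use (FFI implementation only).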



# File 'lib/nokogiri/ffi/xml/reader.rb', line 7

def reader_callback
  @reader_callback
end

#source ⇒ Object (readonly)

The XML source



# File 'lib/nokogiri/xml/reader.rb', line 40

def source
  @source
end

Class Method Details

.from_io(io, url = nil, encoding = nil, options = 0) ⇒ Object

Create a new Reader that parses the IO object io.



# File 'ext/nokogiri/xml_reader.c', line 516

def self.from_io(io, url=nil, encoding=nil, options=0)
  raise(ArgumentError, "io cannot be nil") if io.nil?

  cb = IoCallbacks.reader(io) # we will keep a reference to prevent it from being GC'd
  reader_ptr = LibXML.xmlReaderForIO(cb, nil, nil, url, encoding, options)
  raise "couldn't create a parser" if reader_ptr.null?

  reader = allocate
  reader.cstruct = LibXML::XmlTextReader.new(reader_ptr)
  reader.send(:initialize, io, url, encoding)
  reader.reader_callback = cb
  reader
end

.from_memory(string, url = nil, encoding = nil, options = 0) ⇒ Object

Create a new Reader that parses the in-memory string string.



# File 'ext/nokogiri/xml_reader.c', line 475

def self.from_memory(buffer, url=nil, encoding=nil, options=0)
  raise(ArgumentError, "string cannot be nil") if buffer.nil?

  memory = FFI::MemoryPointer.new(buffer.length) # we need to manage native memory lifecycle
  memory.put_bytes(0, buffer)
  reader_ptr = LibXML.xmlReaderForMemory(memory, memory.total, url, encoding, options)
  raise(RuntimeError, "couldn't create a reader") if reader_ptr.null?

  reader = allocate
  reader.cstruct = LibXML::XmlTextReader.new(reader_ptr)
  reader.send(:initialize, memory, url, encoding)
  reader
end

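.node_namespaces(ptr) ⇒ Object

Build a hash of the namespace declarations on the element node at ptr, keyed "xmlns" for a default namespace or "xmlns:prefix" otherwise (FFI implementation only).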



# File 'lib/nokogiri/ffi/xml/reader.rb', line 205

def node_namespaces(ptr)
  cstruct = LibXML::XmlNode.new(ptr)
  ahash = {}
  return ahash unless cstruct[:type] == Node::ELEMENT_NODE
  ns = cstruct[:nsDef]
  while ! ns.null?
    ns_cstruct = LibXML::XmlNs.new(ns)
    prefix = ns_cstruct[:prefix]
    key = if prefix.nil? || prefix.empty?
            "xmlns"
          else
            "xmlns:#{prefix}"
          end
    ahash[key] = ns_cstruct[:href] # TODO: encoding?
    ns = ns_cstruct[:next] # TODO: encoding?
  end
  ahash
end

Instance Method Details

#attribute(name) ⇒ Object

Get the value of the attribute named name on the current node, or nil if it is not found.



# File 'ext/nokogiri/xml_reader.c', line 202

def attribute(name)
  return nil if name.nil?
  attr_ptr = LibXML.xmlTextReaderGetAttribute(cstruct, name.to_s)
  if attr_ptr.null?
    # this section is an attempt to workaround older versions of libxml that
    # don't handle namespaces properly in all attribute-and-friends functions
    prefix_ptr = FFI::Buffer.new :pointer
    localname = LibXML.xmlSplitQName2(name, prefix_ptr)
    prefix = prefix_ptr.get_pointer(0)
    if ! localname.null?
      attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, localname.read_string)
      LibXML.xmlFree(localname)
    else
      if prefix.null? || prefix.read_string.length == 0
        attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, nil)
      else
        attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, prefix.read_string)
      end
    end
    LibXML.xmlFree(prefix)
  end
  return nil if attr_ptr.null?

  attr = attr_ptr.read_string
  LibXML.xmlFree(attr_ptr)
  attr
end

#attribute_at(index) ⇒ Object

Get the value of the attribute at the zero-based position index, or nil if there is none.



# File 'ext/nokogiri/xml_reader.c', line 177

def attribute_at(index)
  return nil if index.nil?
  index = index.to_i
  attr_ptr = LibXML.xmlTextReaderGetAttributeNo(cstruct, index)
  return nil if attr_ptr.null?

  attr = attr_ptr.read_string
  LibXML.xmlFree attr_ptr
  attr
end

#attribute_count ⇒ Object

Get the number of attributes for the current node



# File 'ext/nokogiri/xml_reader.c', line 238

def attribute_count
  count = LibXML.xmlTextReaderAttributeCount(cstruct)
  count == -1 ? nil : count
end

#attribute_nodes ⇒ Object

Get the attributes of the current node as a list of attribute nodes.



# File 'lib/nokogiri/xml/reader.rb', line 59

def attribute_nodes
  nodes = attr_nodes
  nodes.each { |v| v.instance_variable_set(:@_r, self) }
  nodes
end

#attributes ⇒ Object

Get a hash of the current node's attributes, mapping each name to its value, merged with the node's namespace declarations.



# File 'lib/nokogiri/xml/reader.rb', line 51

def attributes
  Hash[*(attribute_nodes.map { |node|
    [node.name, node.to_s]
  }.flatten)].merge(namespaces || {})
end

#attributes? ⇒ Boolean

Does this node have attributes?

Returns:

  • (Boolean)


# File 'ext/nokogiri/xml_reader.c', line 108

def attributes?
  #  this implementation of xmlTextReaderHasAttributes explicitly includes
  #  namespaces and properties, because some earlier versions ignore
  #  namespaces.
  node_ptr = LibXML.xmlTextReaderCurrentNode(cstruct)
  return false if node_ptr.null?
  node = LibXML::XmlNode.new node_ptr
  node[:type] == Node::ELEMENT_NODE && (!node[:properties].null? || !node[:nsDef].null?)
end

#default? ⇒ Boolean

Was an attribute generated from the default value in the DTD or schema?

Returns:

  • (Boolean)


# File 'ext/nokogiri/xml_reader.c', line 74

def default?
  LibXML.xmlTextReaderIsDefault(cstruct) == 1
end

#depth ⇒ Object

Get the depth of the current node in the document tree.



# File 'ext/nokogiri/xml_reader.c', line 254

def depth
  val = LibXML.xmlTextReaderDepth(cstruct)
  val == -1 ? nil : val
end

#each(&block) ⇒ Object

Move the cursor through the document, yielding each node to the block.



# File 'lib/nokogiri/xml/reader.rb', line 67

def each(&block)
  while node = self.read
    block.call(node)
  end
end

#inner_xml ⇒ Object

Read the contents of the current node, including child nodes and markup.



# File 'ext/nokogiri/xml_reader.c', line 437

def inner_xml
  string_ptr = LibXML.xmlTextReaderReadInnerXml(cstruct)
  return nil if string_ptr.null?
  string = string_ptr.read_string
  LibXML.xmlFree(string_ptr)
  string
end

#lang ⇒ Object

Get the xml:lang scope within which the node resides.



# File 'ext/nokogiri/xml_reader.c', line 286

def lang
  val = LibXML.xmlTextReaderConstXmlLang(cstruct)
  val.null? ? nil : val.read_string
end

#local_name ⇒ Object

Get the local name of the node



# File 'ext/nokogiri/xml_reader.c', line 350

def local_name
  val = LibXML.xmlTextReaderConstLocalName(cstruct)
  val.null? ? nil : val.read_string
end

#name ⇒ Object

Get the name of the node



# File 'ext/nokogiri/xml_reader.c', line 366

def name
  val = LibXML.xmlTextReaderConstName(cstruct)
  val.null? ? nil : val.read_string
end

#namespace_uri ⇒ Object

Get the URI defining the namespace associated with the node



# File 'ext/nokogiri/xml_reader.c', line 334

def namespace_uri
  val = LibXML.xmlTextReaderConstNamespaceUri(cstruct)
  val.null? ? nil : val.read_string
end

#namespaces ⇒ Object

Get a hash of the namespaces declared on this node.



# File 'ext/nokogiri/xml_reader.c', line 125

def namespaces
  return {} unless attributes?

  ptr = LibXML.xmlTextReaderExpand(cstruct)
  return nil if ptr.null?

  Reader.node_namespaces(ptr)
end

#node_type ⇒ Object

Get the type of the reader's current node.



# File 'ext/nokogiri/xml_reader.c', line 395

def node_type
  LibXML.xmlTextReaderNodeType(cstruct)
end

#outer_xml ⇒ Object

Read the current node and its contents, including child nodes and markup.



# File 'ext/nokogiri/xml_reader.c', line 456

def outer_xml
  string_ptr = LibXML.xmlTextReaderReadOuterXml(cstruct)
  return nil if string_ptr.null?
  string = string_ptr.read_string
  LibXML.xmlFree(string_ptr)
  string
end

#prefix ⇒ Object

Get the shorthand reference to the namespace associated with the node.



# File 'ext/nokogiri/xml_reader.c', line 318

def prefix
  val = LibXML.xmlTextReaderConstPrefix(cstruct)
  val.null? ? nil : val.read_string
end

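#read ⇒ Object

Move the Reader forward through the XML document. Returns the reader itself while it is positioned on a node, nil at the end of the document, and raises an error if parsing fails.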



# File 'ext/nokogiri/xml_reader.c', line 408

def read
  error_list = self.errors

  LibXML.xmlSetStructuredErrorFunc(nil, SyntaxError.error_array_pusher(error_list))
  ret = LibXML.xmlTextReaderRead(cstruct)
  LibXML.xmlSetStructuredErrorFunc(nil, nil)

  return self if ret == 1
  return nil if ret == 0

  error = LibXML.xmlGetLastError()
  if error
    raise SyntaxError.wrap(error)
  else
    raise RuntimeError, "Error pulling: #{ret}"
  end

  nil
end

#state ⇒ Object

Get the state of the reader



# File 'ext/nokogiri/xml_reader.c', line 382

def state
  LibXML.xmlTextReaderReadState(cstruct)
end

#value ⇒ Object

Get the text value of the current node, or nil if it has none.



# File 'ext/nokogiri/xml_reader.c', line 302

def value
  val = LibXML.xmlTextReaderConstValue(cstruct)
  val.null? ? nil : val.read_string
end

#value? ⇒ Boolean

Does this node have a text value?

Returns:

  • (Boolean)


# File 'ext/nokogiri/xml_reader.c', line 91

def value?
  LibXML.xmlTextReaderHasValue(cstruct) == 1
end

#xml_version ⇒ Object

Get the XML version of the document being read



# File 'ext/nokogiri/xml_reader.c', line 270

def xml_version
  val = LibXML.xmlTextReaderConstXmlVersion(cstruct)
  val.null? ? nil : val.read_string
end