Class: Nokogiri::XML::Reader

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/nokogiri/xml/reader.rb,
lib/nokogiri/ffi/xml/reader.rb,
ext/nokogiri/xml_reader.c

Overview

Nokogiri::XML::Reader parses an XML document the way a cursor moves through it: forward only, one node at a time. The Reader is given an XML document and yields each node to the block passed to each.

Here is an example of usage:

reader = Nokogiri::XML::Reader(<<-eoxml)
  <x xmlns:tenderlove='http://tenderlovemaking.com/'>
    <tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
  </x>
eoxml

reader.each do |node|

  # node is an instance of Nokogiri::XML::Reader
  puts node.name

end

Note that Nokogiri::XML::Reader#each can only be called once. Once the cursor has moved through the entire document, you must parse the document again to iterate a second time, so capture any information you need during the first iteration.

The Reader parser is good for when you need the speed of a SAX parser, but do not want to write a Document handler.

Instance Attribute Details

#cstruct ⇒ Object

Returns the value of attribute cstruct.

# File 'lib/nokogiri/ffi/xml/reader.rb', line 6

def cstruct
  @cstruct
end

#encoding ⇒ Object (readonly)

The encoding for the document.

# File 'lib/nokogiri/xml/reader.rb', line 37

def encoding
  @encoding
end

#errors ⇒ Object

A list of errors encountered while parsing.

# File 'lib/nokogiri/xml/reader.rb', line 34

def errors
  @errors
end

#reader_callback ⇒ Object

Returns the value of attribute reader_callback.

# File 'lib/nokogiri/ffi/xml/reader.rb', line 7

def reader_callback
  @reader_callback
end

#source ⇒ Object (readonly)

The XML source.

# File 'lib/nokogiri/xml/reader.rb', line 40

def source
  @source
end

Class Method Details

.from_io(io, url = nil, encoding = nil, options = 0) ⇒ Object

Create a new reader that parses io



# File 'ext/nokogiri/xml_reader.c', line 584

def self.from_io(io, url=nil, encoding=nil, options=0)
  raise(ArgumentError, "io cannot be nil") if io.nil?

  cb = IoCallbacks.reader(io) # we will keep a reference to prevent it from being GC'd
  reader_ptr = LibXML.xmlReaderForIO(cb, nil, nil, url, encoding, options)
  raise "couldn't create a parser" if reader_ptr.null?

  reader = allocate
  reader.cstruct = LibXML::XmlTextReader.new(reader_ptr)
  reader.send(:initialize, io, url, encoding)
  reader.reader_callback = cb
  reader
end

.from_memory(string, url = nil, encoding = nil, options = 0) ⇒ Object

Create a new reader that parses string



# File 'ext/nokogiri/xml_reader.c', line 540

def self.from_memory(buffer, url=nil, encoding=nil, options=0)
  raise(ArgumentError, "string cannot be nil") if buffer.nil?

  # Allocate by byte count, not character count, so multibyte strings fit.
  memory = FFI::MemoryPointer.new(buffer.bytesize) # we need to manage native memory lifecycle
  memory.put_bytes(0, buffer)
  reader_ptr = LibXML.xmlReaderForMemory(memory, memory.total, url, encoding, options)
  raise(RuntimeError, "couldn't create a reader") if reader_ptr.null?

  reader = allocate
  reader.cstruct = LibXML::XmlTextReader.new(reader_ptr)
  reader.send(:initialize, memory, url, encoding)
  reader
end

.node_namespaces(ptr) ⇒ Object



# File 'lib/nokogiri/ffi/xml/reader.rb', line 210

def node_namespaces(ptr)
  cstruct = LibXML::XmlNode.new(ptr)
  ahash = {}
  return ahash unless cstruct[:type] == Node::ELEMENT_NODE
  ns = cstruct[:nsDef]
  while ! ns.null?
    ns_cstruct = LibXML::XmlNs.new(ns)
    prefix = ns_cstruct[:prefix]
    key = if prefix.nil? || prefix.empty?
            "xmlns"
          else
            "xmlns:#{prefix}"
          end
    ahash[key] = ns_cstruct[:href] # TODO: encoding?
    ns = ns_cstruct[:next] # TODO: encoding?
  end
  ahash
end

Instance Method Details

#attribute(name) ⇒ Object

Get the value of attribute named name



# File 'ext/nokogiri/xml_reader.c', line 213

def attribute(name)
  return nil if name.nil?
  attr_ptr = LibXML.xmlTextReaderGetAttribute(cstruct, name.to_s)
  if attr_ptr.null?
    # this section is an attempt to workaround older versions of libxml that
    # don't handle namespaces properly in all attribute-and-friends functions
    prefix_ptr = FFI::Buffer.new :pointer
    localname = LibXML.xmlSplitQName2(name, prefix_ptr)
    prefix = prefix_ptr.get_pointer(0)
    if ! localname.null?
      attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, localname.read_string)
      LibXML.xmlFree(localname)
    else
      if prefix.null? || prefix.read_string.length == 0
        attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, nil)
      else
        attr_ptr = LibXML.xmlTextReaderLookupNamespace(cstruct, prefix.read_string)
      end
    end
    LibXML.xmlFree(prefix)
  end
  return nil if attr_ptr.null?

  attr = attr_ptr.read_string
  LibXML.xmlFree(attr_ptr)
  attr
end

#attribute_at(index) ⇒ Object

Get the value of attribute at index



# File 'ext/nokogiri/xml_reader.c', line 185

def attribute_at(index)
  return nil if index.nil?
  index = index.to_i
  attr_ptr = LibXML.xmlTextReaderGetAttributeNo(cstruct, index)
  return nil if attr_ptr.null?

  attr = attr_ptr.read_string
  LibXML.xmlFree attr_ptr
  attr
end

#attribute_count ⇒ Object

Get the number of attributes for the current node.

# File 'ext/nokogiri/xml_reader.c', line 251

def attribute_count
  count = LibXML.xmlTextReaderAttributeCount(cstruct)
  count == -1 ? nil : count
end

#attribute_nodes ⇒ Object

Get a list of attribute nodes for the current node.

# File 'lib/nokogiri/xml/reader.rb', line 59

def attribute_nodes
  nodes = attr_nodes
  nodes.each { |v| v.instance_variable_set(:@_r, self) }
  nodes
end

#attributes ⇒ Object

Get a hash of attribute names and values for the current node, merged with the node's namespace declarations.

# File 'lib/nokogiri/xml/reader.rb', line 51

def attributes
  Hash[*(attribute_nodes.map { |node|
    [node.name, node.to_s]
  }.flatten)].merge(namespaces || {})
end

#attributes? ⇒ Boolean

Does this node have attributes?

Returns:

  • (Boolean)

# File 'ext/nokogiri/xml_reader.c', line 112

def attributes?
  #  this implementation of xmlTextReaderHasAttributes explicitly includes
  #  namespaces and properties, because some earlier versions ignore
  #  namespaces.
  node_ptr = LibXML.xmlTextReaderCurrentNode(cstruct)
  return false if node_ptr.null?
  node = LibXML::XmlNode.new node_ptr
  node[:type] == Node::ELEMENT_NODE && (!node[:properties].null? || !node[:nsDef].null?)
end

#base_uri ⇒ Object

Get the xml:base of the node.

# File 'ext/nokogiri/xml_reader.c', line 413

def base_uri
  val = LibXML.xmlTextReaderConstBaseUri(cstruct)
  val.null? ? nil : val.read_string
end

#default? ⇒ Boolean

Was an attribute generated from the default value in the DTD or schema?

Returns:

  • (Boolean)

# File 'ext/nokogiri/xml_reader.c', line 74

def default?
  LibXML.xmlTextReaderIsDefault(cstruct) == 1
end

#depth ⇒ Object

Get the depth of the node.

# File 'ext/nokogiri/xml_reader.c', line 269

def depth
  val = LibXML.xmlTextReaderDepth(cstruct)
  val == -1 ? nil : val
end

#each(&block) ⇒ Object

Move the cursor through the document yielding each node to the block



# File 'lib/nokogiri/xml/reader.rb', line 67

def each(&block)
  while node = self.read
    block.call(node)
  end
end

#inner_xml ⇒ Object

Read the contents of the current node, including child nodes and markup. Returns a UTF-8 encoded string.

# File 'ext/nokogiri/xml_reader.c', line 491

def inner_xml
  string_ptr = LibXML.xmlTextReaderReadInnerXml(cstruct)
  return nil if string_ptr.null?
  string = string_ptr.read_string
  LibXML.xmlFree(string_ptr)
  string
end

#lang ⇒ Object

Get the xml:lang scope within which the node resides.

# File 'ext/nokogiri/xml_reader.c', line 305

def lang
  val = LibXML.xmlTextReaderConstXmlLang(cstruct)
  val.null? ? nil : val.read_string
end

#local_name ⇒ Object

Get the local name of the node.

# File 'ext/nokogiri/xml_reader.c', line 377

def local_name
  val = LibXML.xmlTextReaderConstLocalName(cstruct)
  val.null? ? nil : val.read_string
end

#name ⇒ Object

Get the name of the node. Returns a UTF-8 encoded string.

# File 'ext/nokogiri/xml_reader.c', line 395

def name
  val = LibXML.xmlTextReaderConstName(cstruct)
  val.null? ? nil : val.read_string
end

#namespace_uri ⇒ Object

Get the URI defining the namespace associated with the node.

# File 'ext/nokogiri/xml_reader.c', line 359

def namespace_uri
  val = LibXML.xmlTextReaderConstNamespaceUri(cstruct)
  val.null? ? nil : val.read_string
end

#namespaces ⇒ Object

Get a hash of namespaces for the current node.

# File 'ext/nokogiri/xml_reader.c', line 131

def namespaces
  return {} unless attributes?

  ptr = LibXML.xmlTextReaderExpand(cstruct)
  return nil if ptr.null?

  Reader.node_namespaces(ptr)
end

#node_type ⇒ Object

Get the type of the reader's current node.

# File 'ext/nokogiri/xml_reader.c', line 444

def node_type
  LibXML.xmlTextReaderNodeType(cstruct)
end

#outer_xml ⇒ Object

Read the current node and its contents, including child nodes and markup. Returns a UTF-8 encoded string.

# File 'ext/nokogiri/xml_reader.c', line 517

def outer_xml
  string_ptr = LibXML.xmlTextReaderReadOuterXml(cstruct)
  return nil if string_ptr.null?
  string = string_ptr.read_string
  LibXML.xmlFree(string_ptr)
  string
end

#prefix ⇒ Object

Get the shorthand reference to the namespace associated with the node.

# File 'ext/nokogiri/xml_reader.c', line 341

def prefix
  val = LibXML.xmlTextReaderConstPrefix(cstruct)
  val.null? ? nil : val.read_string
end

#read ⇒ Object

Move the Reader forward through the XML document.

# File 'ext/nokogiri/xml_reader.c', line 457

def read
  error_list = self.errors

  LibXML.xmlSetStructuredErrorFunc(nil, SyntaxError.error_array_pusher(error_list))
  ret = LibXML.xmlTextReaderRead(cstruct)
  LibXML.xmlSetStructuredErrorFunc(nil, nil)

  return self if ret == 1
  return nil if ret == 0

  error = LibXML.xmlGetLastError()
  if error
    raise SyntaxError.wrap(error)
  else
    raise RuntimeError, "Error pulling: #{ret}"
  end

  nil
end

#state ⇒ Object

Get the state of the reader.

# File 'ext/nokogiri/xml_reader.c', line 431

def state
  LibXML.xmlTextReaderReadState(cstruct)
end

#value ⇒ Object

Get the text value of the node if present. Returns a UTF-8 encoded string.

# File 'ext/nokogiri/xml_reader.c', line 323

def value
  val = LibXML.xmlTextReaderConstValue(cstruct)
  val.null? ? nil : val.read_string
end

#value? ⇒ Boolean

Does this node have a text value?

Returns:

  • (Boolean)

# File 'ext/nokogiri/xml_reader.c', line 93

def value?
  LibXML.xmlTextReaderHasValue(cstruct) == 1
end

#xml_version ⇒ Object

Get the XML version of the document being read.

# File 'ext/nokogiri/xml_reader.c', line 287

def xml_version
  val = LibXML.xmlTextReaderConstXmlVersion(cstruct)
  val.null? ? nil : val.read_string
end