Class: Nokogiri::XML::Document
- Defined in:
- lib/nokogiri/xml/document.rb,
lib/nokogiri/ffi/xml/document.rb,
ext/nokogiri/xml_document.c,
ext/nokogiri/html_document.c
Overview
Nokogiri::XML::Document is the main entry point for dealing with XML documents. A Document is created by parsing an XML document. See Nokogiri.XML().
For searching a Document, see Nokogiri::XML::Node#css and Nokogiri::XML::Node#xpath.
Direct Known Subclasses
Constant Summary
Constants inherited from Node
Node::ATTRIBUTE_DECL, Node::ATTRIBUTE_NODE, Node::CDATA_SECTION_NODE, Node::COMMENT_NODE, Node::DOCB_DOCUMENT_NODE, Node::DOCUMENT_FRAG_NODE, Node::DOCUMENT_NODE, Node::DOCUMENT_TYPE_NODE, Node::DTD_NODE, Node::ELEMENT_DECL, Node::ELEMENT_NODE, Node::ENTITY_DECL, Node::ENTITY_NODE, Node::ENTITY_REF_NODE, Node::HTML_DOCUMENT_NODE, Node::NAMESPACE_DECL, Node::NOTATION_NODE, Node::PI_NODE, Node::TEXT_NODE, Node::XINCLUDE_END, Node::XINCLUDE_START
Instance Attribute Summary
- #cstruct ⇒ Object
  Returns the value of attribute cstruct.
- #errors ⇒ Object
  A list of Nokogiri::XML::SyntaxError found when parsing a document.
Class Method Summary
- .new(version = default) ⇒ Object
  Create a new document with version (defaults to “1.0”).
- .parse(string_or_io, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML) {|options| ... } ⇒ Object
  Parse an XML file.
- .read_io(io, url, encoding, options) ⇒ Object
  Create a new document from an IO object.
- .read_memory(string, url, encoding, options) ⇒ Object
  Create a new document from a String.
- .recursively_remove_namespaces_from_node(node) ⇒ Object
- .wrap(doc_struct) ⇒ Object
- .wrap_with_error_handling(&block) ⇒ Object
Instance Method Summary
- #add_child(child) ⇒ Object (also: #<<)
- #clone ⇒ Object
- #collect_namespaces ⇒ Object
  Recursively get all namespaces from this node and its subtree and return them as a hash.
- #create_cdata(text) ⇒ Object
  Create a CDATA element containing text.
- #create_element(name, *args, &block) ⇒ Object
  Create an element with name, optionally setting its content and attributes.
- #create_entity(name, type, external_id, system_id, content) ⇒ Object
  Create a new entity named name.
- #create_text_node(text, &block) ⇒ Object
  Create a text node with text.
- #decorate(node) ⇒ Object
  Apply any decorators to node.
- #decorators(key) ⇒ Object
  Get the list of decorators given key.
- #document ⇒ Object
  A reference to self.
- #dup ⇒ Object
  Copy this Document.
- #encoding ⇒ Object
  Get the encoding for this Document.
- #encoding=(encoding) ⇒ Object
  Set the encoding string for this Document.
- #fragment(tags = nil) ⇒ Object
  Create a Nokogiri::XML::DocumentFragment from tags. Returns an empty fragment if tags is nil.
- #initialize(*args) ⇒ Document (constructor)
  :nodoc:
- #name ⇒ Object
  The name of this document.
- #namespaces ⇒ Object
  Get the hash of namespaces on the root Nokogiri::XML::Node.
- #remove_namespaces! ⇒ Object
  Remove all namespaces from all nodes in the document.
- #root ⇒ Object
  Get the root node for this document.
- #root= ⇒ Object
  Set the root element on this document.
- #slop! ⇒ Object
  Explore a document with shortcut methods.
- #url ⇒ Object
  Get the url name for this document.
- #validate ⇒ Object
  Validate this Document against its DTD.
- #version ⇒ Object
  Get the XML version for this Document.
Methods inherited from Node
#<=>, #==, #>, #[], #[]=, #accept, #add_namespace, #add_namespace_definition, #add_next_sibling, #add_previous_sibling, #after, #ancestors, #at, #at_css, #at_xpath, #attribute, #attribute_nodes, #attribute_with_ns, #attributes, #before, #blank?, #cdata?, #child, #children, #children=, #comment?, #content, #content=, #create_external_subset, #create_internal_subset, #css, #css_path, #decorate!, #default_namespace=, #description, #each, #element?, #element_children, #elements, #encode_special_chars, #external_subset, #first_element_child, #fragment?, #has_attribute?, #html?, #inner_html, #inner_html=, #inner_text, #internal_subset, #key?, #keys, #last_element_child, #line, #matches?, #name=, #namespace, #namespace=, #namespace_definitions, #namespace_scopes, #namespaced_key?, #next, #next_element, #next_sibling, #node_name, #node_name=, node_properties, #node_type, #parent, #parent=, #parse, #path, #pointer_id, #previous, #previous_element, #previous_sibling, #read_only?, #remove, #remove_attribute, #replace, #search, #serialize, #set_attribute, #swap, #text, #text?, #to_html, #to_s, #to_xhtml, #traverse, #type, #unlink, #values, #write_html_to, #write_to, #write_xhtml_to, #write_xml_to, #xml?, #xpath
Methods included from PP::Node
Constructor Details
#initialize(*args) ⇒ Document
:nodoc:
# File 'lib/nokogiri/xml/document.rb', line 39
def initialize *args # :nodoc:
  @errors = []
  @decorators = nil
end
Instance Attribute Details
#cstruct ⇒ Object
Returns the value of attribute cstruct.
# File 'lib/nokogiri/ffi/xml/document.rb', line 6
def cstruct
  @cstruct
end
#errors ⇒ Object
A list of Nokogiri::XML::SyntaxError found when parsing a document
# File 'lib/nokogiri/xml/document.rb', line 37
def errors
  @errors
end
Class Method Details
.new(version = default) ⇒ Object
Create a new document with version (defaults to “1.0”).
# File 'ext/nokogiri/xml_document.c', line 305
def new(*args)
  version = args.first || "1.0"
  doc = wrap(LibXML.xmlNewDoc(version))
  doc.send :initialize, *args
  doc
end
.parse(string_or_io, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML) {|options| ... } ⇒ Object
Parse an XML file. string_or_io may be a String, or any object that responds to read and close, such as an IO or StringIO. url is the resource where this document is located. encoding is the encoding that should be used when processing the document. options is a number that sets options in the parser, such as Nokogiri::XML::ParseOptions::RECOVER. See the constants in Nokogiri::XML::ParseOptions.
# File 'lib/nokogiri/xml/document.rb', line 19
def self.parse string_or_io, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML, &block
  options = Nokogiri::XML::ParseOptions.new(options) if Fixnum === options
  # Give the options to the user
  yield options if block_given?

  if string_or_io.respond_to?(:read)
    url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil
    return read_io(string_or_io, url, encoding, options.to_i)
  end

  # read_memory pukes on empty docs
  return new if string_or_io.nil? or string_or_io.empty?

  read_memory(string_or_io, url, encoding, options.to_i)
end
.read_io(io, url, encoding, options) ⇒ Object
Create a new document from an IO object
# File 'ext/nokogiri/xml_document.c', line 187
def self.read_io io, url, encoding, options
  wrap_with_error_handling do
    LibXML.xmlReadIO(IoCallbacks.reader(io), nil, nil, url, encoding, options)
  end
end
.read_memory(string, url, encoding, options) ⇒ Object
Create a new document from a String
# File 'ext/nokogiri/xml_document.c', line 237
def self.read_memory(string, url, encoding, options)
  wrap_with_error_handling do
    LibXML.xmlReadMemory(string, string.length, url, encoding, options)
  end
end
.recursively_remove_namespaces_from_node(node) ⇒ Object
# File 'lib/nokogiri/ffi/xml/document.rb', line 151
def recursively_remove_namespaces_from_node(node)
  node.cstruct[:ns] = nil
  node.children.each do |child|
    recursively_remove_namespaces_from_node(child)
  end
  unless node.cstruct[:nsDef].nil?
    LibXML.xmlFreeNsList(node.cstruct[:nsDef])
    node.cstruct[:nsDef] = nil
  end
end
.wrap(doc_struct) ⇒ Object
# File 'lib/nokogiri/ffi/xml/document.rb', line 108
def wrap doc_struct
  if doc_struct.is_a?(FFI::Pointer)
    # cast native pointers up into a doc cstruct
    return nil if doc_struct.null?
    doc_struct = LibXML::XmlDocument.new(doc_struct)
  end

  doc = self.allocate
  doc.cstruct = doc_struct
  doc.cstruct.ruby_doc = doc
  doc.instance_eval { @decorators = nil; @node_cache = [] }
  doc.send :initialize
  doc
end
.wrap_with_error_handling(&block) ⇒ Object
# File 'lib/nokogiri/ffi/xml/document.rb', line 127
def wrap_with_error_handling(&block)
  error_list = []
  LibXML.xmlInitParser()
  LibXML.xmlResetLastError()
  LibXML.xmlSetStructuredErrorFunc(nil, SyntaxError.error_array_pusher(error_list))

  ptr = yield

  LibXML.xmlSetStructuredErrorFunc(nil, nil)

  if ptr.null?
    error = LibXML.xmlGetLastError()
    if error
      raise SyntaxError.wrap(error)
    else
      raise RuntimeError, "Could not parse document"
    end
  end

  document = wrap(ptr)
  document.errors = error_list
  return document
end
Instance Method Details
#add_child(child) ⇒ Object Also known as: <<
# File 'lib/nokogiri/xml/document.rb', line 197
def add_child child
  raise "Document already has a root node" if root
  if child.type == Node::DOCUMENT_FRAG_NODE
    raise "Document cannot have multiple root nodes" if child.children.size > 1
    super(child.children.first)
  else
    super
  end
end
#clone ⇒ Object
# File 'lib/nokogiri/xml/document.rb', line 179
alias :clone :dup
#collect_namespaces ⇒ Object
Recursively get all namespaces from this node and its subtree and return them as a hash.
For example, given this document:
<root xmlns:foo="bar">
<bar xmlns:hello="world" />
</root>
This method will return:
{ 'xmlns:foo' => 'bar', 'xmlns:hello' => 'world' }
WARNING: this method will clobber duplicate names in the keys. For example, given this document:
<root xmlns:foo="bar">
<bar xmlns:foo="baz" />
</root>
The hash returned will look like this: { 'xmlns:foo' => 'bar' }
Non-prefixed default namespaces (as in “xmlns=”) are not included in the hash.
Note that this is a very expensive operation in the current implementation, as it traverses the entire graph and also has to bring each node across the libxml bridge into a Ruby object.
# File 'lib/nokogiri/xml/document.rb', line 123
def collect_namespaces
  ns = {}
  traverse { |j| ns.merge!(j.namespaces) }
  ns
end
#create_cdata(text) ⇒ Object
Create a CDATA element containing text
# File 'lib/nokogiri/xml/document.rb', line 80
def create_cdata text
  Nokogiri::XML::CDATA.new(self, text.to_s)
end
#create_element(name, *args, &block) ⇒ Object
Create an element with name, optionally setting its content and attributes.
doc.create_element "div" # <div></div>
doc.create_element "div", :class => "container" # <div class='container'></div>
doc.create_element "div", "contents" # <div>contents</div>
doc.create_element "div", "contents", :class => "container" # <div class='container'>contents</div>
doc.create_element("div") { |node| node['class'] = "container" } # <div class='container'></div>
# File 'lib/nokogiri/xml/document.rb', line 53
def create_element name, *args, &block
  elm = Nokogiri::XML::Element.new(name, self, &block)
  args.each do |arg|
    case arg
    when Hash
      arg.each { |k,v|
        key = k.to_s
        if key =~ /^xmlns(:\w+)?$/
          ns_name = key.split(":", 2)[1]
          elm.add_namespace_definition ns_name, v
          next
        end
        elm[k.to_s] = v.to_s
      }
    else
      elm.content = arg
    end
  end
  elm
end
#create_entity(name, type, external_id, system_id, content) ⇒ Object
Create a new entity named name.
type is an integer representing the type of entity to be created, and it defaults to Nokogiri::XML::EntityDecl::INTERNAL_GENERAL. See the constants on Nokogiri::XML::EntityDecl for more information.
external_id, system_id, and content set the External ID, System ID, and content respectively. All of these parameters are optional.
# File 'ext/nokogiri/xml_document.c', line 377
def create_entity(name, entity_type=Nokogiri::XML::EntityDecl::INTERNAL_GENERAL, external_id=nil, system_id=nil, content=nil)
  LibXML.xmlResetLastError()
  ptr = LibXML.xmlAddDocEntity(cstruct, name, entity_type, external_id, system_id, content)
  if ptr.null?
    error = LibXML.xmlGetLastError()
    if error
      raise SyntaxError.wrap(error)
    else
      raise RuntimeError, "Could not create entity"
    end
  end

  Node.wrap(LibXML::XmlEntity.new(ptr))
end
#create_text_node(text, &block) ⇒ Object
Create a text node with text
# File 'lib/nokogiri/xml/document.rb', line 75
def create_text_node text, &block
  Nokogiri::XML::Text.new(text.to_s, self, &block)
end
#decorate(node) ⇒ Object
Apply any decorators to node
# File 'lib/nokogiri/xml/document.rb', line 170
def decorate node
  return unless @decorators
  @decorators.each { |klass,list|
    next unless node.is_a?(klass)
    list.each { |moodule| node.extend(moodule) }
  }
end
#decorators(key) ⇒ Object
Get the list of decorators given key
# File 'lib/nokogiri/xml/document.rb', line 130
def decorators key
  @decorators ||= Hash.new
  @decorators[key] ||= []
end
#document ⇒ Object
A reference to self
# File 'lib/nokogiri/xml/document.rb', line 90
def document
  self
end
#dup ⇒ Object
Copy this Document. An optional depth may be passed in, but it defaults to a deep copy. 0 is a shallow copy, 1 is a deep copy.
# File 'ext/nokogiri/xml_document.c', line 282
def dup deep = 1
  dup_ptr = LibXML.xmlCopyDoc(cstruct, deep)
  return nil if dup_ptr.null?

  # xmlCopyDoc does not preserve document type. wtf?
  cstruct = LibXML::XmlDocumentCast.new(dup_ptr)
  cstruct[:type] = self.type

  self.class.wrap(dup_ptr)
end
#encoding ⇒ Object
Get the encoding for this Document
# File 'ext/nokogiri/xml_document.c', line 157
def encoding
  ptr = cstruct[:encoding]
  ptr.null? ? nil : ptr.read_string
end
#encoding=(encoding) ⇒ Object
Set the encoding string for this Document
# File 'ext/nokogiri/xml_document.c', line 141
def encoding= encoding
  # TODO: if :encoding is already set, then it's probably getting leaked.
  cstruct[:encoding] = LibXML.xmlStrdup(encoding)
end
#fragment(tags = nil) ⇒ Object
Create a Nokogiri::XML::DocumentFragment from tags. Returns an empty fragment if tags is nil.
# File 'lib/nokogiri/xml/document.rb', line 189
def fragment tags = nil
  DocumentFragment.new(self, tags, self.root)
end
#name ⇒ Object
The name of this document. Always returns “document”
# File 'lib/nokogiri/xml/document.rb', line 85
def name
  'document'
end
#namespaces ⇒ Object
Get the hash of namespaces on the root Nokogiri::XML::Node
# File 'lib/nokogiri/xml/document.rb', line 182
def namespaces
  root ? root.namespaces : {}
end
#remove_namespaces! ⇒ Object
Remove all namespaces from all nodes in the document.
This could be useful for developers who either don’t understand namespaces or don’t care about them.
The following example shows a use case, and you can decide for yourself whether this is a good thing or not:
doc = Nokogiri::XML <<-EOXML
<root>
<car xmlns:part="http://general-motors.com/">
<part:tire>Michelin Model XGV</part:tire>
</car>
<bicycle xmlns:part="http://schwinn.com/">
<part:tire>I'm a bicycle tire!</part:tire>
</bicycle>
</root>
EOXML
doc.xpath("//tire").to_s # => ""
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "<part:tire>Michelin Model XGV</part:tire>"
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "<part:tire>I'm a bicycle tire!</part:tire>"
doc.remove_namespaces!
doc.xpath("//tire").to_s # => "<tire>Michelin Model XGV</tire><tire>I'm a bicycle tire!</tire>"
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => ""
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => ""
For more information on why this probably is not a good thing in general, please direct your browser to tenderlovemaking.com/2009/04/23/namespaces-in-xml/
# File 'ext/nokogiri/xml_document.c', line 357
def remove_namespaces!
  self.class.recursively_remove_namespaces_from_node(root)
end
#root ⇒ Object
Get the root node for this document.
# File 'ext/nokogiri/xml_document.c', line 122
def root
  ptr = LibXML.xmlDocGetRootElement(cstruct)
  ptr.null? ? nil : Node.wrap(LibXML::XmlNode.new(ptr))
end
#root= ⇒ Object
Set the root element on this document
# File 'ext/nokogiri/xml_document.c', line 78
def root= new_root
  old_root = nil

  if new_root.nil?
    old_root_ptr = LibXML.xmlDocGetRootElement(cstruct)
    if (! old_root_ptr.null?)
      old_root = Node.wrap(old_root_ptr)
      LibXML.xmlUnlinkNode(old_root.cstruct)
      old_root.cstruct.keep_reference_from_document!
    end
    return new_root
  end

  if new_root.cstruct[:doc] != cstruct[:doc]
    old_root_ptr = LibXML.xmlDocGetRootElement(cstruct)
    new_root_ptr = LibXML.xmlDocCopyNode(new_root.cstruct, cstruct, 1)
    raise RuntimeError, "Could not reparent node (xmlDocCopyNode)" if new_root_ptr.null?
    new_root = Node.wrap(new_root_ptr)
  end
  LibXML.xmlDocSetRootElement(cstruct, new_root.cstruct)
  if old_root_ptr && ! old_root_ptr.null?
    LibXML::XmlNode.new(old_root_ptr).keep_reference_from_document!
  end
  new_root
end
#slop! ⇒ Object
Explore a document with shortcut methods. See Nokogiri::Slop for details.
Note that any nodes that have been instantiated before #slop! is called will not be decorated with sloppy behavior. So, if you’re in irb, the preferred idiom is:
irb> doc = Nokogiri::Slop my_markup
and not
irb> doc = Nokogiri::HTML my_markup
... followed by irb's implicit inspect (and therefore instantiation of every node) ...
irb> doc.slop!
... which does absolutely nothing.
# File 'lib/nokogiri/xml/document.rb', line 159
def slop!
  unless decorators(XML::Node).include? Nokogiri::Decorators::Slop
    decorators(XML::Node) << Nokogiri::Decorators::Slop
    decorate!
  end

  self
end
#url ⇒ Object
Get the url name for this document.
# File 'ext/nokogiri/xml_document.c', line 62
def url
  cstruct[:URL]
end
#validate ⇒ Object
Validate this Document against its DTD. Returns a list of errors on the document, or nil when there is no DTD.
# File 'lib/nokogiri/xml/document.rb', line 138
def validate
  return nil unless internal_subset
  internal_subset.validate self
end
#version ⇒ Object
Get the XML version for this Document
# File 'ext/nokogiri/xml_document.c', line 172
def version
  cstruct[:version]
end