Class: Bio::PhyloXML::Parser
- Includes:
- LibXML
- Defined in:
- lib/bio/db/phyloxml/phyloxml_parser.rb
Overview
Description
Bio::PhyloXML::Parser parses files in phyloXML format.
Requirements
The libxml2 XML parser is required. Install the libxml-ruby bindings from libxml.rubyforge.org, or run:
gem install -r libxml-ruby
Usage
require 'bio'
# Create new phyloxml parser
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
# Print the names of all trees in the file
phyloxml.each do |tree|
  puts tree.name
end
References
www.phyloxml.org/documentation/version_100/phyloxml.xsd.html
Defined Under Namespace
Classes: ClosedPhyloXMLParser
Instance Attribute Summary
-
#other ⇒ Object
readonly
After all the trees have been parsed, any remaining content that is not a phylogeny element is saved in this array of PhyloXML::Other objects.
Class Method Summary
-
.for_io(io, validate = true) ⇒ Object
Initializes LibXML::Reader and reads from the IO until it reaches the first phylogeny element.
-
.open(filename, validate = true) ⇒ Object
Initializes LibXML::Reader and reads the file until it reaches the first phylogeny element.
-
.open_uri(uri, validate = true) ⇒ Object
Initializes LibXML::Reader and reads the data at the URI until it reaches the first phylogeny element.
Instance Method Summary
-
#[](i) ⇒ Object
Access the specified tree in the file.
-
#close ⇒ Object
Closes the LibXML::Reader inside the object.
-
#closed? ⇒ Boolean
If the object is closed by using the close method or equivalent, returns true.
-
#each ⇒ Object
Iterate through all trees in the file.
-
#initialize(str, validate = true) ⇒ Parser
constructor
Initializes LibXML::Reader and reads the PhyloXML-formatted string until it reaches the first phylogeny element.
-
#next_tree ⇒ Object
Parse and return the next phylogeny tree.
Constructor Details
#initialize(str, validate = true) ⇒ Parser
Initializes LibXML::Reader and reads the PhyloXML-formatted string until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
str = File.read("./phyloxml_examples.xml")
p = Bio::PhyloXML::Parser.new(str)
Deprecated usage: reads data from a file, where str is a filename.
p = Bio::PhyloXML::Parser.new("./phyloxml_examples.xml")
Passing a filename is deprecated. Use Bio::PhyloXML::Parser.open(filename) instead.
Arguments:
-
(required) str: PhyloXML-formatted string
-
(optional) validate: Whether to validate the file against the schema. Default value is true. In the source listing below, validation runs only in the deprecated filename branch.
- Returns
-
Bio::PhyloXML::Parser object
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 318

def initialize(str, validate=true)
  @other = []

  return unless str

  # For compatibility, if filename-like string is given,
  # treat it as a filename.
  if /[\<\>\r\n]/ !~ str and File.exist?(str) then
    # assume that str is filename
    warn "Bio::PhyloXML::Parser.new(filename) is deprecated. Use Bio::PhyloXML::Parser.open(filename)."
    filename = _secure_filename(str)
    _validate(:file, filename) if validate
    @reader = XML::Reader.file(filename)
    _skip_leader
    return
  end

  # initialize for string
  @reader = XML::Reader.string(str,
                               { :options =>
                                 LibXML::XML::Parser::Options::NONET })
  _skip_leader
end
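The compatibility branch in the listing above chooses between "filename" and "XML string" with a regular expression plus File.exist?. The following is a minimal sketch of that check in isolation. The helper name filename_like? is hypothetical and not part of the library; only the condition itself is taken from the listing.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the condition in Parser#initialize:
# a string counts as a filename only if it contains none of
# '<', '>', CR, LF and also names an existing file.
def filename_like?(str)
  (/[<>\r\n]/ !~ str) && File.exist?(str)
end

xml = '<phyloxml><phylogeny/></phyloxml>'

# Write the same XML to a temporary file so that a real path exists.
file = Tempfile.new(['example', '.xml'])
file.write(xml)
file.close

puts filename_like?(xml)        # markup characters present: treated as XML text
puts filename_like?(file.path)  # existing path without markup characters
```

One consequence of this design is that a string without markup characters that happens to match an existing path is read as a file, which may be why the page steers callers toward Parser.open for files.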
Instance Attribute Details
#other ⇒ Object (readonly)
After all the trees have been parsed, any remaining content that is not a phylogeny element is saved in this array of PhyloXML::Other objects.
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 76

def other
  @other
end
Class Method Details
.for_io(io, validate = true) ⇒ Object
Initializes LibXML::Reader and reads from the IO until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.for_io($stdin)
Arguments:
-
(required) io: IO object
-
(optional) validate: For IO reader, the “validate” option is ignored and no validation is executed.
- Returns
-
Bio::PhyloXML::Parser object
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 218

def self.for_io(io, validate=true)
  obj = new(nil, validate)
  obj.instance_eval {
    @reader = XML::Reader.io(io,
                             { :options =>
                               LibXML::XML::Parser::Options::NONET })
    _skip_leader
  }
  obj
end
.open(filename, validate = true) ⇒ Object
Initializes LibXML::Reader and reads the file until it reaches the first phylogeny element.
Example: Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
If the optional code block is given, the Bio::PhyloXML::Parser object is passed to the block as an argument. When the block terminates, the parser object is automatically closed, and the open method returns the value of the block.
Example: Get the first tree in the file.
tree = Bio::PhyloXML::Parser.open("example.xml") do |px|
px.next_tree
end
Arguments:
-
(required) filename: Path to the file to parse.
-
(optional) validate: Whether to validate the file against schema or not. Default value is true.
- Returns
-
(without block) Bio::PhyloXML::Parser object
- Returns
-
(with block) the value of the block
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 102

def self.open(filename, validate=true)
  obj = new(nil, validate)
  obj.instance_eval {
    filename = _secure_filename(filename)
    _validate(:file, filename) if validate
    # XML::Parser::Options::NONET for security reason
    @reader = XML::Reader.file(filename,
                               { :options =>
                                 LibXML::XML::Parser::Options::NONET })
    _skip_leader
  }
  if block_given? then
    begin
      ret = yield obj
    ensure
      obj.close if obj and !obj.closed?
    end
    ret
  else
    obj
  end
end
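The block form of Parser.open follows the familiar yield-then-ensure-close idiom. The sketch below reproduces only that control flow with a stand-in class, so it runs without libxml. FakeParser is hypothetical and does no parsing; it exists only to show that the block's value is returned and that close runs even when the block raises.

```ruby
# Stand-in for the parser; it only tracks whether it has been closed.
class FakeParser
  def initialize
    @closed = false
  end

  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end

  # Same shape as Parser.open: yield the object, always close it,
  # and return the block's value. Without a block, return the object.
  def self.open
    obj = new
    return obj unless block_given?

    begin
      yield obj
    ensure
      obj.close unless obj.closed?
    end
  end
end

seen = nil
result = FakeParser.open do |px|
  seen = px
  :first_tree
end
puts result.inspect   # the block's value
puts seen.closed?     # the object was closed after the block

begin
  FakeParser.open do |px|
    seen = px
    raise 'parse failed'
  end
rescue RuntimeError
  puts seen.closed?   # closed even though the block raised
end
```

Without a block, the caller owns the object and is responsible for calling close.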
.open_uri(uri, validate = true) ⇒ Object
Initializes LibXML::Reader and reads the data at the URI until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open_uri("http://www.phyloxml.org/examples/apaf.xml")
If the optional code block is given, the Bio::PhyloXML::Parser object is passed to the block as an argument. When the block terminates, the parser object is automatically closed, and the open_uri method returns the value of the block.
Arguments:
-
(required) uri: (URI or String) URI to the data to parse
-
(optional) validate: For URI reader, the “validate” option is ignored and no validation is executed.
- Returns
-
(without block) Bio::PhyloXML::Parser object
- Returns
-
(with block) the value of the block
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 143

def self.open_uri(uri, validate=true)
  case uri
  when URI
    uri = uri.to_s
  else
    # raises error if not a String
    uri = uri.to_str
    # raises error if invalid URI
    URI.parse(uri)
  end

  obj = new(nil, validate)
  obj.instance_eval {
    @reader = XML::Reader.file(uri)
    _skip_leader
  }
  if block_given? then
    begin
      ret = yield obj
    ensure
      obj.close if obj and !obj.closed?
    end
  else
    obj
  end
end
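The first part of the listing above normalizes its argument: URI objects are converted to strings, and anything else must be String-like and parse as a URI. A minimal sketch of that step on its own, using only Ruby's standard uri library, is shown here. The helper name normalize_uri is hypothetical.

```ruby
require 'uri'

# Hypothetical helper isolating the argument handling of Parser.open_uri.
def normalize_uri(uri)
  case uri
  when URI
    uri.to_s
  else
    str = uri.to_str   # NoMethodError for objects that are not String-like
    URI.parse(str)     # URI::InvalidURIError for malformed URIs
    str
  end
end

puts normalize_uri(URI.parse('http://www.phyloxml.org/examples/apaf.xml'))
puts normalize_uri('http://www.phyloxml.org/examples/apaf.xml')

# Both rejected inputs fail before any reader would be created.
[42, 'http://bad host/'].each do |bad|
  begin
    normalize_uri(bad)
  rescue NoMethodError, URI::InvalidURIError => e
    puts "#{bad.inspect}: #{e.class}"
  end
end
```

Checking the argument up front means a bad URI is reported as a Ruby exception rather than as a failure inside the XML reader.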
Instance Method Details
#[](i) ⇒ Object
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 364

def [](i)
  tree = nil
  (i+1).times do
    tree = self.next_tree
  end
  return tree
end
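Because the listing above implements [] as i+1 calls to next_tree, the index is relative to the reader's current position, and each access consumes trees. The sketch below shows that behaviour with a stand-in backed by an array. ForwardOnly is a hypothetical class, and its next_tree simply hands out queued names instead of parsing XML.

```ruby
# Stand-in for a forward-only parser; next_tree hands out queued names.
class ForwardOnly
  def initialize(names)
    @queue = names.dup
  end

  def next_tree
    @queue.shift   # nil once the queue is empty
  end

  # Same logic as Parser#[] in the listing above.
  def [](i)
    tree = nil
    (i + 1).times { tree = next_tree }
    tree
  end
end

px = ForwardOnly.new(%w[t0 t1 t2 t3])
puts px[1].inspect   # skips "t0" and returns "t1"
puts px[0].inspect   # the position has moved on, so this is "t2", not "t0"
puts px[5].inspect   # past the end: nil
```

For repeated random access, collecting the trees into an Array once and indexing that Array avoids the moving position.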
#close ⇒ Object
Closes the LibXML::Reader inside the object. It also closes the underlying file if the object was created with the Bio::PhyloXML::Parser.open method.
Closing an already closed object, or using a closed object, raises LibXML::XML::Error.
- Returns
-
nil
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 188

def close
  @reader.close
  @reader = ClosedPhyloXMLParser.new
  nil
end
#closed? ⇒ Boolean
If the object is closed by using the close method or equivalent, returns true. Otherwise, returns false.
- Returns
-
true or false
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 198

def closed?
  if @reader.kind_of?(ClosedPhyloXMLParser) then
    true
  else
    false
  end
end
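The close and closed? listings above work together: close replaces the reader with a ClosedPhyloXMLParser instance, and closed? tests for that class. The sketch below illustrates the placeholder idiom under one assumption: that the placeholder raises on every method call, which matches the documented behaviour that using a closed object raises an error. ClosedReader, ClosedError, and Handle are hypothetical names; the real library raises LibXML::XML::Error.

```ruby
require 'stringio'

# Hypothetical error class; stands in for LibXML::XML::Error.
class ClosedError < StandardError; end

# Placeholder that rejects every method it does not define (assumed behaviour).
class ClosedReader
  def method_missing(name, *_args)
    raise ClosedError, "closed object (called #{name})"
  end

  def respond_to_missing?(_name, _include_private = false)
    false
  end
end

# Hypothetical wrapper with the same close/closed? shape as the parser.
class Handle
  def initialize(reader)
    @reader = reader
  end

  def read
    @reader.read
  end

  def close
    @reader.close
    @reader = ClosedReader.new
    nil
  end

  def closed?
    @reader.kind_of?(ClosedReader)
  end
end

h = Handle.new(StringIO.new('<phyloxml/>'))
puts h.closed?
h.close
puts h.closed?
begin
  h.close   # the second close reaches the placeholder and raises
rescue ClosedError => e
  puts e.message
end
```

Swapping in a placeholder keeps the other methods free of explicit "am I closed?" checks, since any later call on the reader fails in one place.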
#each ⇒ Object
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 351

def each
  while tree = next_tree
    yield tree
  end
end
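The each listing above yields until next_tree returns nil, and, going by the Includes list at the top of this page, the class does not mix in Enumerable. Methods such as map can still be reached through Ruby's Object#to_enum. The sketch below shows this with a stand-in; TreeSource and its Tree struct are hypothetical and do no XML parsing.

```ruby
# Stand-in with the same next_tree/each shape as the parser.
class TreeSource
  Tree = Struct.new(:name)

  def initialize(names)
    @queue = names.map { |n| Tree.new(n) }
  end

  def next_tree
    @queue.shift   # nil once all trees have been handed out
  end

  # Same loop as Parser#each in the listing above.
  def each
    while tree = next_tree
      yield tree
    end
  end
end

source = TreeSource.new(%w[alpha beta gamma])

# to_enum wraps each in an Enumerator, which provides map, select, etc.
names = source.to_enum(:each).map(&:name)
puts names.inspect
```

As with the parser, this is a single forward pass: once the trees have been consumed, a second iteration yields nothing.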
#next_tree ⇒ Object
Parse and return the next phylogeny tree. If there are no more phylogeny elements, nil is returned. If the next element is not a phylogeny element, it is saved in PhyloXML::Parser#other and nil is returned.
p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
tree = p.next_tree
- Returns
-
Bio::PhyloXML::Tree
# File 'lib/bio/db/phyloxml/phyloxml_parser.rb', line 381

def next_tree()

  if not is_element?('phylogeny')
    if @reader.node_type == XML::Reader::TYPE_END_ELEMENT
      if is_end_element?('phyloxml')
        return nil
      else
        @reader.read
        @reader.read
        if is_end_element?('phyloxml')
          return nil
        end
      end
    end
    # phyloxml can hold only phylogeny and "other" elements. If this is not
    # phylogeny element then it is other. Also, "other" always comes after
    # all phylogenies
    @other << parse_other
    #return nil for tree, since this is not valid phyloxml tree.
    return nil
  end

  tree = Bio::PhyloXML::Tree.new

  # keep track of current node in clades array/stack. Current node is the
  # last element in the clades array
  clades = []
  clades.push tree

  #keep track of current edge to be able to parse branch_length tag
  current_edge = nil

  # we are going to parse clade iteratively by pointing (and changing) to
  # the current node in the tree. Since the property element is both in
  # clade and in the phylogeny, we need some boolean to know if we are
  # parsing the clade (there can be only max 1 clade in phylogeny) or
  # parsing phylogeny
  parsing_clade = false

  while not is_end_element?('phylogeny') do
    break if is_end_element?('phyloxml')

    # parse phylogeny elements, except clade
    if not parsing_clade

      if is_element?('phylogeny')
        @reader["rooted"] == "true" ? tree.rooted = true : tree.rooted = false
        @reader["rerootable"] == "true" ? tree.rerootable = true : tree.rerootable = false
        parse_attributes(tree, ["branch_length_unit", 'type'])
      end

      parse_simple_elements(tree, [ "name", 'description', "date"])

      if is_element?('confidence')
        tree.confidences << parse_confidence
      end

    end

    if @reader.node_type == XML::Reader::TYPE_ELEMENT
      case @reader.name
      when 'clade'
        #parse clade element

        parsing_clade = true

        node= Bio::PhyloXML::Node.new

        branch_length = @reader['branch_length']

        parse_attributes(node, ["id_source"])

        #add new node to the tree
        tree.add_node(node)
        # The first clade will always be root since by xsd schema phyloxml can
        # have 0 to 1 clades in it.
        if tree.root == nil
          tree.root = node
        else
          current_edge = tree.add_edge(clades[-1], node,
                                       Bio::Tree::Edge.new(branch_length))
        end
        clades.push node
        #end if clade element
      else
        parse_clade_elements(clades[-1], current_edge) if parsing_clade
      end
    end

    #end clade element, go one parent up
    if is_end_element?('clade')

      #if we have reached the closing tag of the top-most clade, then our
      # curent node should point to the root, If thats the case, we are done
      # parsing the clade element
      if clades[-1] == tree.root
        parsing_clade = false
      else
        # set current node (clades[-1) to the previous clade in the array
        clades.pop
      end
    end

    #parsing phylogeny elements
    if not parsing_clade

      if @reader.node_type == XML::Reader::TYPE_ELEMENT
        case @reader.name
        when 'property'
          tree.properties << parse_property

        when 'clade_relation'
          clade_relation = CladeRelation.new
          parse_attributes(clade_relation, ["id_ref_0", "id_ref_1", "distance", "type"])

          #@ add unit test for this
          if not @reader.empty_element?
            @reader.read
            if is_element?('confidence')
              clade_relation.confidence = parse_confidence
            end
          end
          tree.clade_relations << clade_relation

        when 'sequence_relation'
          sequence_relation = SequenceRelation.new
          parse_attributes(sequence_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
          if not @reader.empty_element?
            @reader.read
            if is_element?('confidence')
              sequence_relation.confidence = parse_confidence
            end
          end
          tree.sequence_relations << sequence_relation
        when 'phylogeny'
          #do nothing
        else
          tree.other << parse_other
          #puts "Not recognized element. #{@reader.name}"
        end
      end
    end
    # go to next element
    @reader.read
  end #end while not </phylogeny>
  #move on to the next tag after /phylogeny which is text, since phylogeny
  #end tag is empty element, which value is nil, therefore need to move to
  #the next meaningful element (therefore @reader.read twice)
  @reader.read
  @reader.read

  return tree
end
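The core of the listing above is a stack of clades: each opening clade element is attached to the node on top of the stack and then pushed, and each closing clade element moves back up one level. The sketch below shows only that idea, with a hand-written event list in place of LibXML::Reader. The event tuples and the build_edges helper are hypothetical, and the sketch simplifies the original by popping the root as well.

```ruby
# Each event is [:open, name] or [:close]; these stand in for the start
# and end 'clade' elements that the reader would deliver.
def build_edges(events)
  stack = []   # path from the root clade down to the current clade
  edges = []   # [parent, child] pairs in document order
  root = nil

  events.each do |kind, name|
    case kind
    when :open
      if root.nil?
        root = name                   # the first clade becomes the root
      else
        edges << [stack.last, name]   # attach to the enclosing clade
      end
      stack.push(name)
    when :close
      stack.pop                       # go one parent up
    end
  end

  [root, edges]
end

# Equivalent nesting: root contains A and B; B contains B1 and B2.
events = [
  [:open, 'root'],
  [:open, 'A'], [:close],
  [:open, 'B'],
  [:open, 'B1'], [:close],
  [:open, 'B2'], [:close],
  [:close],
  [:close]
]

root, edges = build_edges(events)
puts root
edges.each { |parent, child| puts "#{parent} -> #{child}" }
```

Keeping an explicit stack lets the parser handle arbitrarily deep nesting in a single forward pass over the reader, without recursion.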