Version: 1.06 12. September, 2003
This is a Ruby library for building trees representing HTML structure.
See the file INSTALL for installation instructions.
Copyright (C) 2003, Johannes Brodwall [email protected] Copyright (C) 2002, Ned Konz [email protected]
License: Ruby's
See http://rubyforge.org/projects/ruby-htmltools for the most recent version.
This project includes SGML-parser, ported from Python by Takahiro Maebashi [email protected] (see: http://www.jin.gr.jp/~nahi/Ruby/html-parser/README.html)
=============
PREREQUISITES
Ruby 1.8
Test::Unit
The tests run using Test::Unit. Test::Unit is part of the standard Ruby install as of 1.8
REXML
XPath support requires REXML. REXML is part of the standard Ruby install as of 1.8
===========
CHANGES
Changes from 1.09:
- Some minor bugfixes
- SGMLParser.src_range makes it very easy to write applications which parse HTML files into components and manipulate the corresponding source code without altering it. (by Philip Dorrell)
Changes from 1.08:
- Fixed xpath script and added tests
- Fixed bug #681 (xhtml)
- Added GemSpec
Changes from 1.07:
- Fixed tc_xpath test_match_all after it was broken by upgrade of REXML.
- Refactored utility code for printing node paths into rexml-nodepath.rb
Changes from 1.06:
- Included stuff that I had forgot to package into the tarball.
Changes from 1.05:
- Updated everything to work with Ruby 1.8.
Changes from 1.04:
-
Made sure that unknown entities and characters are not discarded, in both html/tree.rb and html/xmltree.rb
-
Added handling of DOCTYPE to html/xmltree.rb
Changes from 1.03:
-
Added HTMLTree::XMLParser, which makes a REXML document from the given HTML.
-
Changed HTMLTree::Element::print_on() to write()
-
Made it so that a string or IO can be passed to HTMLTree::Element::dump()
-
Made it so that a string or IO can be passed to HTMLTree::Element::write()
Changes from 1.02:
-
added XPath and XML conversion (needs REXML)
-
Wrapped all code in namespaces. The following class names have changed:
-- in html/element.rb HTMLDocument => HTMLTree::Document HTMLElement => HTMLTree::Element HTMLData => HTMLTree::Data HTMLComment => HTMLTree::Comment HTMLSpecial => HTMLTree::Special
-- in html/tags.rb HTMLTag => HTML::Tag HTMLBlockTag => HTML::BlockTag HTMLInlineTag => HTML::InlineTag HTMLBlockOrInlineTag => HTML::BlockOrInlineTag HTMLEmptyTag => HTML::EmptyTag
-- in html/tree.rb HTMLTreeParser => HTMLTree::Parser
-- in html/stparser.rb StackingParser => HTML::StackingParser
-
added HTMLTree::Element.root()
Changes from 1.01:
-
documented change to sgml-parser.
-
added bin/ebaySearch.rb example
Changes from 1.0:
-
attributes now maintain their order. Though this probably isn't strictly necessary under HTML, it may make it easier to compare document versions.
-
the generated tree now has a top-level node for the document itself, so the DTD can be stored. THIS WILL REQUIRE CODE CHANGES if you have code that assumes that the root node is always . To find the
node, you can use the new methods HTMLTreeParser#html() or HTMLDocument#html_node():html = parser.html()Or, querying the tree:
html = parser.tree.html_node() -
comments are stored in the tree
-
added HTMLElement#print_on() to print a (sub)tree to an IO stream
vim: ts=2 sw=2 et