Module: Nekohtml
Defined in:
- lib/nekohtml.rb
- lib/nekohtml/html_document.rb
Defined Under Namespace
Classes: HtmlDocument, HtmlNode, HtmlNodeList, HtmlThing
Class Method Summary
- .parse(string) ⇒ Object
  Parse the string.
- .parser ⇒ Object
Class Method Details
.parse(string) ⇒ Object
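Parse the string and return an HtmlDocument wrapping the resulting DOM. Raises ArgumentError when string is nil. The method takes no case_sensitive argument; case handling is fixed by .parser, which sets NekoHTML's names/elems property to "lower" so that XPath expressions can use lower-case tag names. The default NekoHTML configuration forces element names to upper case per HTML 4.01, which is a pain.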
    # File 'lib/nekohtml.rb', line 18
    def parse(string)
      if string
        jparser = parser()
        jinput_reader = java.io.StringReader.new(string.to_java_string)
        jinput_source = org.xml.sax.InputSource.new(jinput_reader)
        jparser.parse(jinput_source)
        jdocument = jparser.get_document()
        # We know that the document has successfully been parsed
        # at this point.
        return HtmlDocument.new(jdocument)
      else
        raise ArgumentError.new
      end
    end
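The guard in .parse can be tried out without JRuby or the NekoHTML jars. The block below is only a stand-in under stated assumptions: FakeDocument and ParseSketch are hypothetical names, the Java parsing step is replaced by wrapping the input string, and the error message is added for readability (the real method raises ArgumentError with no message).

```ruby
# Hypothetical stand-in for HtmlDocument; the real class wraps a Java DOM document.
FakeDocument = Struct.new(:source)

module ParseSketch
  # Mirrors the guard in Nekohtml.parse: a truthy argument is accepted,
  # nil (or false) raises ArgumentError.
  def self.parse(string)
    raise ArgumentError, "string must not be nil" unless string

    # The real method runs the NekoHTML parser here; this sketch only wraps the input.
    FakeDocument.new(string)
  end
end

doc = ParseSketch.parse("<p>hello</p>")
puts doc.source

begin
  ParseSketch.parse(nil)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because the guard is a plain truthiness check, an empty string passes it and is handed to the parser; only nil and false are rejected.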
.parser ⇒ Object
    # File 'lib/nekohtml.rb', line 7
    def parser()
      configuration = org.cyberneko.html.HTMLConfiguration.new
      jparser = org.apache.xerces.parsers.DOMParser.new(configuration)
      jparser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower")
      jparser.setFeature("http://xml.org/sax/features/namespaces", false)
      return jparser
    end
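Why the names/elems setting matters can be shown with a toy model in plain Ruby. This is not NekoHTML code: report_names is a hypothetical helper that only imitates how element names would be reported under a "lower" or "upper" setting, and the array lookup stands in for a case-sensitive XPath name test.

```ruby
# Toy model of the names/elems setting: returns tag names as they would be reported.
def report_names(tags, mode)
  case mode
  when "lower" then tags.map(&:downcase)
  when "upper" then tags.map(&:upcase)
  else raise ArgumentError, "unsupported mode: #{mode}"
  end
end

source_tags = %w[Html body P]

lower = report_names(source_tags, "lower")
upper = report_names(source_tags, "upper")

# A case-sensitive lookup for "p" only succeeds when names are lower-cased.
puts lower.include?("p") # prints true
puts upper.include?("p") # prints false
```

With upper-cased names, a lower-case query such as //p would have to be written as //P, which is the inconvenience the "lower" setting avoids.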