Module: Nekohtml

Defined in:
lib/nekohtml.rb,
lib/nekohtml/html_document.rb

Defined Under Namespace

Classes: HtmlDocument, HtmlNode, HtmlNodeList, HtmlThing

Class Method Summary

Class Method Details

.parse(string) ⇒ Object

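Parses the given HTML string and returns an HtmlDocument wrapping the resulting DOM document. Raises ArgumentError if string is nil or false. The parser comes from .parser, which configures NekoHTML to report element names in lower case, so XPath expressions can use lower-case tag names. The default NekoHTML parser forces element names to upper case per HTML 4.01, which is a pain. There is no case_sensitive argument in the signature shown here; element names are always lower-cased.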
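Why element-name case matters for XPath can be shown with a minimal sketch. It does not use Nekohtml itself, which needs JRuby plus the NekoHTML and Xerces jars. As an assumption made only for illustration, it substitutes Ruby's REXML library for the Java DOM, and the two sample documents are made up. XPath name tests are case sensitive, so the case a parser assigns to element names decides which expressions match.

```ruby
require "rexml/document"

# Stand-in for a parser that upper-cases element names
# (hypothetical sample input, not produced by NekoHTML).
upper = REXML::Document.new("<HTML><BODY><P>hello</P></BODY></HTML>")

# Stand-in for a parser that lower-cases element names.
lower = REXML::Document.new("<html><body><p>hello</p></body></html>")

# XPath name tests compare element names exactly, including case.
upper_hits_lower_query = REXML::XPath.match(upper, "//p").size
upper_hits_upper_query = REXML::XPath.match(upper, "//P").size
lower_hits_lower_query = REXML::XPath.match(lower, "//p").size

puts "upper-case doc, //p: #{upper_hits_lower_query}"
puts "upper-case doc, //P: #{upper_hits_upper_query}"
puts "lower-case doc, //p: #{lower_hits_lower_query}"
```

With upper-cased element names, a query written as //p finds nothing and has to be written as //P. Lower-casing the names lets the more common lower-case XPath style work.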



# File 'lib/nekohtml.rb', line 18

def parse(string)
  if string
    jparser = parser()

    jinput_reader = java.io.StringReader.new(string.to_java_string)
    jinput_source = org.xml.sax.InputSource.new(jinput_reader)
    jparser.parse(jinput_source)
    jdocument = jparser.get_document()
    # We know that the document has successfully been parsed 
    # at this point.

    return HtmlDocument.new(jdocument)
  else
    raise ArgumentError.new
  end
end

.parser ⇒ Object



# File 'lib/nekohtml.rb', line 7

def parser()
  configuration = org.cyberneko.html.HTMLConfiguration.new
  jparser = org.apache.xerces.parsers.DOMParser.new(configuration)
  jparser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower")
  jparser.setFeature("http://xml.org/sax/features/namespaces", false)
  return jparser
end