Module: Spidr::Body

Included in:
Page
Defined in:
lib/spidr/body.rb

Instance Method Details

#at(*arguments) ⇒ Nokogiri::HTML::Node, ... Also known as: %

Searches for the first node matching an XPath or CSS path expression.

Examples:

page.at('//title')

Returns:

  • (Nokogiri::HTML::Node, Nokogiri::XML::Node, nil)

    The first matched node. Returns nil if no nodes could be matched, or if the page is not an HTML or XML document.

# File 'lib/spidr/body.rb', line 75

def at(*arguments)
  if doc
    doc.at(*arguments)
  end
end
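
Because #at returns nil both when nothing matches and when the page has no parsed document, callers usually guard the result before calling methods on it. The following is a minimal sketch of that pattern. StubNode and StubPage are hypothetical stand-ins (not part of Spidr) that only mimic the "node or nil" contract, so the snippet runs without fetching a page.

```ruby
# Hypothetical stand-ins, not part of Spidr: they only mimic the
# "node or nil" contract of #at so the guard pattern can be shown.
StubNode = Struct.new(:inner_text)

class StubPage
  # nodes maps an expression to the text of its first match.
  def initialize(nodes)
    @nodes = nodes
  end

  # Returns a StubNode for a known expression, nil otherwise.
  def at(expression)
    text = @nodes[expression]
    StubNode.new(text) if text
  end
end

titled_page = StubPage.new({ '//title' => 'Example Domain' })
binary_page = StubPage.new({}) # e.g. a response with no parsed document

# Safe navigation (&.) skips the call when #at returned nil.
found_title   = titled_page.at('//title')&.inner_text
missing_title = binary_page.at('//title')&.inner_text
```

With a real page object the same guard would apply to any call chained onto the result of #at.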

#body ⇒ String Also known as: to_s

The body of the response.

Returns:

  • (String)

    The body of the response, or an empty String if the response has no body.



# File 'lib/spidr/body.rb', line 11

def body
  (response.body || '')
end

#doc ⇒ Nokogiri::HTML::Document, ...

Returns a parsed document object for HTML, XML, RSS and Atom pages.

Returns:

  • (Nokogiri::HTML::Document, Nokogiri::XML::Document, nil)

    The document that represents HTML or XML pages. Returns nil if the page is not HTML, XML, RSS, or Atom, if the body is empty, or if the page could not be parsed properly.

# File 'lib/spidr/body.rb', line 26

def doc
  unless body.empty?
    begin
      if html?
        @doc ||= Nokogiri::HTML(body, @url.to_s, content_charset)
      elsif (rss? || atom? || xml? || xsl?)
        @doc ||= Nokogiri::XML(body, @url.to_s, content_charset)
      end
    rescue
    end
  end
end
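
The method above combines three ideas: it skips empty bodies, memoizes the parsed document with ||=, and swallows parser errors so that callers see nil instead of an exception. The sketch below reproduces only that control flow. LazyDocument, parse_count, and the injected parser lambdas are hypothetical names introduced for illustration; they stand in for Nokogiri and are not part of Spidr.

```ruby
# Hypothetical illustration, not Spidr code: the parser is injected so the
# control flow can be shown without Nokogiri.
class LazyDocument
  attr_reader :parse_count

  def initialize(body, parser)
    @body = body
    @parser = parser
    @parse_count = 0
  end

  def doc
    return nil if @body.empty?

    begin
      # ||= only evaluates the right-hand side while @doc is nil or false.
      @doc ||= begin
        @parse_count += 1
        @parser.call(@body)
      end
    rescue StandardError
      nil # a failed parse is reported as "no document"
    end
  end
end

upcase_parser  = ->(body) { body.upcase } # stands in for a real parser
failing_parser = ->(_body) { raise ArgumentError, 'malformed input' }

cached = LazyDocument.new('<p>hi</p>', upcase_parser)
first  = cached.doc
second = cached.doc # served from @doc, the parser is not called again

empty  = LazyDocument.new('', upcase_parser).doc
broken = LazyDocument.new('<p>', failing_parser).doc
```

One consequence of this shape is that a failed parse is never cached, so every later call attempts the parse again.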

#search(*paths) ⇒ Array Also known as: /

Searches the document for nodes matching the given XPath or CSS path expressions.

Examples:

page.search('//a[@href]')

Parameters:

  • paths (Array<String>)

    CSS or XPath expressions to search the document with.

Returns:

  • (Array)

    The matched nodes from the document. Returns an empty Array if no nodes were matched, or if the page is not an HTML or XML document.

# File 'lib/spidr/body.rb', line 55

def search(*paths)
  if doc
    doc.search(*paths)
  else
    []
  end
end
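
Since #search falls back to an empty Array rather than nil, its result can be iterated without a nil check. The sketch below illustrates that contract. StubSearchPage is a hypothetical stand-in (not part of Spidr), and plain Hashes play the role of matched nodes so that the snippet runs on its own.

```ruby
# Hypothetical stand-in, not part of Spidr: it mirrors only the
# "matches or empty Array" contract of #search.
class StubSearchPage
  # matches maps an expression to an Array of attribute Hashes,
  # or is nil for a page that has no parsed document.
  def initialize(matches)
    @matches = matches
  end

  def search(expression)
    return [] unless @matches

    @matches.fetch(expression, [])
  end
end

linked_page = StubSearchPage.new(
  { '//a[@href]' => [{ 'href' => '/about' }, { 'href' => '/contact' }] }
)
pdf_page = StubSearchPage.new(nil)

# Both calls can be chained directly; the second simply yields nothing.
links    = linked_page.search('//a[@href]').map { |a| a['href'] }
no_links = pdf_page.search('//a[@href]').map { |a| a['href'] }
```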

#title ⇒ String

The title of the HTML page.

Returns:

  • (String)

    The inner text of the title element of the page. Returns nil if no title element is found.



# File 'lib/spidr/body.rb', line 90

def title
  if (node = at('//title'))
    node.inner_text
  end
end