Class: Html2rss::ItemExtractors::Text

Inherits:
Object
Defined in:
lib/html2rss/item_extractors/text.rb

Overview

Returns the text content of the selected element. This is the default extractor, used when no extractor is explicitly given.

Example HTML structure:

<p>Lorem <b>ipsum</b> dolor ...</p>

YAML usage example:

selectors:
  description:
    selector: p
    extractor: text

Would return:

'Lorem ipsum dolor ...'

Defined Under Namespace

Classes: Options

Instance Method Summary

Constructor Details

#initialize(xml, options) ⇒ Text

Initializes the Text extractor.

Parameters:

  • xml (Nokogiri::XML::Element)
  • options (Options)


# File 'lib/html2rss/item_extractors/text.rb', line 31

def initialize(xml, options)
  @element = ItemExtractors.element(xml, options.selector)
end

Instance Method Details

#get ⇒ String

Returns the text content of the element, with leading and trailing whitespace stripped and each run of whitespace collapsed to a single space.

Returns:

  • (String)

    The text content.



# File 'lib/html2rss/item_extractors/text.rb', line 39

def get
  @element.text.to_s.strip.gsub(/\s+/, ' ')
end
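
The whitespace handling in #get can be tried in isolation. The following is a minimal sketch in plain Ruby, with no Nokogiri involved: it applies the same chain of String calls to a raw string instead of to @element.text. The helper name normalize_text is hypothetical and is not part of html2rss.

```ruby
# Hypothetical helper mirroring the call chain in #get.
# It is not part of html2rss; the name is for illustration only.
def normalize_text(raw)
  # to_s guards against nil, strip trims both ends, and gsub collapses
  # every run of whitespace (spaces, tabs, newlines) into one space.
  raw.to_s.strip.gsub(/\s+/, ' ')
end

raw = "  Lorem \n  ipsum\tdolor ...  "
puts normalize_text(raw)        # prints "Lorem ipsum dolor ..."
puts normalize_text(nil).empty? # prints "true"
```

Because of the to_s call, a nil input yields an empty string rather than raising an error.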