Class: Scrubyt::PreFilterDocument

Inherits:
Object
Defined in:
lib/scrubyt/core/scraping/pre_filter_document.rb

Overview

Applies preprocessing functions to the input document.

Before the document is passed to Hpricot for parsing, it may need transformations that are clumsy, inappropriate, or impossible to perform once the document is loaded.

Class Method Summary

.br_to_newline(doc) ⇒ Object
Replace <br/> tags with newlines.

Class Method Details

.br_to_newline(doc) ⇒ Object

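Replaces <br/> tags with newlines ("\r\n") and non-breaking spaces ("\240") with regular spaces. Returns the filtered string; the doc argument itself is not modified.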



# File 'lib/scrubyt/core/scraping/pre_filter_document.rb', line 9

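def self.br_to_newline(doc)
  # String#gsub returns a new string, so the result must be kept;
  # otherwise the <br> replacement is silently discarded.
  doc = doc.gsub(/<br[ \/]*>/i, "\r\n")
  # Replace non-breaking spaces (byte 0xA0) with regular spaces and return the result.
  doc.tr("\240", " ")
end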
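
The following is a minimal standalone sketch of the filtering described above, not the library's own code. The module name PreFilterSketch and the NBSP constant are hypothetical, and the sketch assumes a UTF-8 input string, so it uses "\u00A0" in place of the byte-oriented "\240" from the original source.

```ruby
# Hypothetical helper for illustration only; not part of scRUBYt!.
module PreFilterSketch
  # Assumption: the input is UTF-8, so the non-breaking space is U+00A0.
  NBSP = "\u00A0"

  # Returns a new string with <br> variants turned into "\r\n"
  # and non-breaking spaces turned into regular spaces.
  def self.br_to_newline(doc)
    doc.gsub(/<br[ \/]*>/i, "\r\n").tr(NBSP, " ")
  end
end

html = "line one<br/>line two<BR >line\u00A0three"
puts PreFilterSketch.br_to_newline(html).inspect
# prints "line one\r\nline two\r\nline three"
```

Chaining gsub and tr keeps the method free of side effects: the caller's string is left untouched, and the filtered copy is what would then be handed to the parser.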