Class: WebPageParser::GuardianPageParserV1
- Inherits: BaseParser
  - Object
  - BaseParser
  - WebPageParser::GuardianPageParserV1
- Defined in: lib/web-page-parser/parsers/guardian_page_parser.rb
Overview
BbcNewsPageParserV1 parses BBC News web pages exactly like the old News Sniffer BbcNewsPage class did. This should only ever be used for backwards compatibility with News Sniffer and is never supplied for use by a factory.
Constant Summary
- ICONV = nil
- TITLE_RE = ORegexp.new('<meta property="og:title" content="(.*)"', 'i')
- DATE_RE = ORegexp.new('<meta property="article:published_time" content="(.*)"', 'i')
- CONTENT_RE = ORegexp.new('article-body-blocks">(.*?)<div id="related"', 'm')
- STRIP_TAGS_RE = ORegexp.new('</?(a|span|div|img|tr|td|!--|table)[^>]*>','i')
- PARA_RE = Regexp.new(/<(p|h2)[^>]*>(.*?)<\/\1>/i)
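The regexp constants above are effectively this parser's whole per-site configuration. The following is a minimal sketch of applying them to a page. It assumes that Ruby's built-in Regexp treats these patterns the same way ORegexp does; ORegexp is assumed here to come from the oniguruma gem, where the 'm' option lets "." match newlines, as Ruby's /m does. The sample HTML is invented. The order of steps (cut out the body, strip layout tags, then collect <p>/<h2> text) is an illustrative assumption and not necessarily what BaseParser#content actually does.

```ruby
# Plain-Ruby stand-ins for the documented constants (assumption: built-in
# Regexp behaves like ORegexp for these particular patterns).
TITLE_RE      = /<meta property="og:title" content="(.*)"/i
DATE_RE       = /<meta property="article:published_time" content="(.*)"/i
CONTENT_RE    = /article-body-blocks">(.*?)<div id="related"/m
STRIP_TAGS_RE = %r{</?(a|span|div|img|tr|td|!--|table)[^>]*>}i
PARA_RE       = %r{<(p|h2)[^>]*>(.*?)</\1>}i

# Invented sample markup; real Guardian pages are far larger.
html = <<~HTML
  <meta property="og:title" content="Example headline" />
  <meta property="article:published_time" content="2012-03-04T10:00:00Z" />
  <div class="article-body-blocks">
  <p>First <a href="/x">link</a> paragraph.</p>
  <h2>Subheading</h2>
  <p>Second paragraph.</p>
  </div>
  <div id="related"></div>
HTML

# String#[](regexp, n) returns capture group n, or nil when there is no match.
title = html[TITLE_RE, 1]
date  = html[DATE_RE, 1]

# Cut out the article body, drop layout tags (a, div, ...), then keep the
# inner text of each <p> or <h2> element in document order.
body       = html[CONTENT_RE, 1].to_s.gsub(STRIP_TAGS_RE, "")
paragraphs = body.scan(PARA_RE).map { |_tag, text| text.strip }

puts title
puts date
puts paragraphs.inspect
```

Note that TITLE_RE and DATE_RE use a greedy "(.*)". They capture up to the last double quote on the matching line, so they rely on each meta tag sitting on its own line with nothing quoted after the content attribute.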
Constants inherited from BaseParser
BaseParser::HTML_ENTITIES_DECODER, BaseParser::KILL_CHARS_RE
Instance Attribute Summary
Attributes inherited from BaseParser
Method Summary
Methods inherited from BaseParser
#content, #date, #decode_entities, #hash, #initialize, #title
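This class defines no methods of its own. It supplies constants and inherits #title, #date and #content from BaseParser. The following sketch shows one way such a design can work: a base class resolves the patterns through self.class, so a subclass that only declares constants gets working accessors. SketchBaseParser and SketchGuardianParser are hypothetical names. This is not the gem's BaseParser, whose constructor options, entity decoding and #hash behaviour are not shown here.

```ruby
# Hypothetical base class: NOT WebPageParser::BaseParser. Accessors look up
# regexp constants on the concrete subclass and memoize their results.
class SketchBaseParser
  def initialize(page)
    @page = page # raw HTML string (assumption: no fetching in this sketch)
  end

  def title
    @title ||= @page[pattern(:TITLE_RE), 1]&.strip
  end

  def date
    @date ||= @page[pattern(:DATE_RE), 1]
  end

  def content
    @content ||= begin
      body = @page[pattern(:CONTENT_RE), 1].to_s.gsub(pattern(:STRIP_TAGS_RE), "")
      body.scan(pattern(:PARA_RE)).map { |_tag, text| text.strip }
    end
  end

  private

  # Resolves e.g. SketchGuardianParser::TITLE_RE when called on that subclass.
  def pattern(name)
    self.class.const_get(name)
  end
end

# Hypothetical subclass: only configuration, mirroring the constant summary.
class SketchGuardianParser < SketchBaseParser
  TITLE_RE      = /<meta property="og:title" content="(.*)"/i
  DATE_RE       = /<meta property="article:published_time" content="(.*)"/i
  CONTENT_RE    = /article-body-blocks">(.*?)<div id="related"/m
  STRIP_TAGS_RE = %r{</?(a|span|div|img|tr|td|!--|table)[^>]*>}i
  PARA_RE       = %r{<(p|h2)[^>]*>(.*?)</\1>}i
end

page = <<~HTML
  <meta property="og:title" content="Example headline" />
  <div class="article-body-blocks"><p>Only paragraph.</p></div>
  <div id="related"></div>
HTML

parser = SketchGuardianParser.new(page)
puts parser.title
puts parser.content.inspect
```

With this layout, supporting another site means adding one more subclass with its own constants, which is the same shape the constant and method summaries on this page suggest.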
Constructor Details
This class inherits a constructor from WebPageParser::BaseParser