Class: Html2rss::AutoSource::Scraper::WordpressApi
- Inherits: Object
  - Object
  - Html2rss::AutoSource::Scraper::WordpressApi
- Includes: Enumerable
- Defined in:
  - lib/html2rss/auto_source/scraper/wordpress_api.rb
  - lib/html2rss/auto_source/scraper/wordpress_api/page_scope.rb
  - lib/html2rss/auto_source/scraper/wordpress_api/posts_endpoint.rb
Overview
Scrapes WordPress sites through their REST API instead of parsing article HTML.
Defined Under Namespace
Classes: PageScope, PostsEndpoint
Constant Summary
- API_LINK_SELECTOR = 'link[rel="https://api.w.org/"][href]'
- CANONICAL_LINK_SELECTOR = 'link[rel="canonical"][href]'
- POSTS_FIELDS = %w[id title excerpt content link date categories].freeze
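The field list above matches the shape of the WordPress REST API's `_fields` query parameter, which trims each post object to the named keys. The following is a minimal sketch under that assumption; this page does not show how html2rss uses `POSTS_FIELDS`. The helper `posts_url`, the `per_page` value, and the example host are hypothetical and not part of the library.

```ruby
require 'uri'

# Copied from the constant summary above.
POSTS_FIELDS = %w[id title excerpt content link date categories].freeze

# Hypothetical helper (not part of html2rss): builds a posts URL from the
# API root that a page advertises via its api.w.org link tag.
# Assumption: POSTS_FIELDS is sent as the `_fields` query parameter.
def posts_url(api_root, per_page: 10)
  uri = URI.join(api_root, 'wp/v2/posts')
  uri.query = URI.encode_www_form(_fields: POSTS_FIELDS.join(','), per_page: per_page)
  uri.to_s
end

puts posts_url('https://example.com/wp-json/')
```

Requesting only the needed fields keeps the response small, which may be why the list is limited to what an article hash needs.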
Class Method Summary
- .articles?(parsed_body) ⇒ Boolean
  Whether the page advertises a WordPress REST API endpoint.
- .options_key ⇒ Object
Instance Method Summary
- #each {|article| ... } ⇒ Enumerator, void
  Yields article hashes from the WordPress posts API.
- #initialize(parsed_body, url:, request_session: nil, **_opts) ⇒ void constructor
Constructor Details
#initialize(parsed_body, url:, request_session: nil, **_opts) ⇒ void
# File 'lib/html2rss/auto_source/scraper/wordpress_api.rb', line 33
def initialize(parsed_body, url:, request_session: nil, **_opts)
  @parsed_body = parsed_body
  @url = Html2rss::Url.from_absolute(url)
  @request_session = request_session
  @page_scope = PageScope.from(parsed_body:, url: @url)
end
Class Method Details
.articles?(parsed_body) ⇒ Boolean
Returns whether the page advertises a WordPress REST API endpoint.
# File 'lib/html2rss/auto_source/scraper/wordpress_api.rb', line 21
def self.articles?(parsed_body)
  return false unless parsed_body

  !parsed_body.at_css(API_LINK_SELECTOR).nil?
end
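The check has three outcomes: a missing document, a document without the API link, and a document with it. The sketch below walks through them with a hypothetical `StubBody` that stands in for a parsed HTML document and responds only to `at_css`. In the real scraper the argument is presumably a parsed HTML document, and the stub is an assumption made so the example runs without extra gems.

```ruby
API_LINK_SELECTOR = 'link[rel="https://api.w.org/"][href]'

# Hypothetical stand-in for a parsed document. It maps a selector string to
# a matching node (or nil), which is all the check below needs.
StubBody = Struct.new(:matches) do
  def at_css(selector)
    matches[selector]
  end
end

# Same logic as the class method above, as a free function for illustration.
def articles?(parsed_body)
  return false unless parsed_body

  !parsed_body.at_css(API_LINK_SELECTOR).nil?
end

wordpress_page = StubBody.new({ API_LINK_SELECTOR => :link_node })
plain_page = StubBody.new({})

p articles?(nil)
p articles?(plain_page)
p articles?(wordpress_page)
```

The early `return false` keeps the method safe to call with a `nil` body, so callers do not need their own guard.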
.options_key ⇒ Object
# File 'lib/html2rss/auto_source/scraper/wordpress_api.rb', line 16
def self.options_key = :wordpress_api
Instance Method Details
#each {|article| ... } ⇒ Enumerator, void
Yields article hashes from the WordPress posts API.
# File 'lib/html2rss/auto_source/scraper/wordpress_api.rb', line 45
def each
  return enum_for(:each) unless block_given?
  return unless (posts = fetch_posts)

  posts.filter_map { article_from(_1) }.each { yield(_1) }
end
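The `#each` pattern above can be tried in isolation: return an enumerator when no block is given, stop quietly when there are no posts, and skip posts that do not convert to an article. The sketch below is illustrative only. `PostList` is a hypothetical class, its `article_from` rule (drop posts without a link) is an assumption, and the real scraper fetches its posts over HTTP.

```ruby
# Hypothetical stand-in for the scraper; posts are supplied directly.
class PostList
  include Enumerable

  def initialize(posts)
    @posts = posts
  end

  def each
    # Without a block, hand back a lazy Enumerator over this same method.
    return enum_for(:each) unless block_given?
    # No posts (e.g. a failed fetch): yield nothing.
    return unless @posts

    # filter_map drops nil results, so unusable posts are skipped.
    @posts.filter_map { article_from(_1) }.each { yield(_1) }
  end

  private

  # Assumption for this sketch: a post without a link is not an article.
  def article_from(post)
    return nil unless post[:link]

    { title: post[:title], url: post[:link] }
  end
end

list = PostList.new([{ title: 'A', link: 'https://example.com/a' }, { title: 'B' }])
p list.map { _1[:title] }
p list.each.class
p PostList.new(nil).to_a
```

Because the class includes `Enumerable`, defining `each` alone is enough to get `map`, `to_a`, `first`, and the rest.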