Class: Html2rss::AutoSource::Scraper::Schema::Base
- Inherits: Object
- Defined in: lib/html2rss/auto_source/scraper/schema/base.rb
Overview
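Base class for Schema.org objects. It reads a parsed Schema.org object by symbol keys and maps it to the article attributes listed in DEFAULT_ATTRIBUTES.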
Constant Summary
- DEFAULT_ATTRIBUTES = %i[id title description url image published_at].freeze
Instance Method Summary
- #call ⇒ Hash
  The scraped article hash with DEFAULT_ATTRIBUTES.
- #description ⇒ Object
- #id ⇒ Object
- #image ⇒ Object
- #initialize(schema_object, url:) ⇒ Base (constructor)
  A new instance of Base.
- #published_at ⇒ Object
- #title ⇒ Object
- #url ⇒ Addressable::URI?
  The URL of the schema object.
Constructor Details
#initialize(schema_object, url:) ⇒ Base
Returns a new instance of Base.
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 16
def initialize(schema_object, url:)
  @schema_object = schema_object # the parsed Schema.org object
  @url = url                     # page URL, used as the base when resolving #url
end
```
Instance Method Details
#call ⇒ Hash
Returns the scraped article hash with DEFAULT_ATTRIBUTES.
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 22
def call
  # Call each attribute reader by name and pair the attribute with its value.
  DEFAULT_ATTRIBUTES.to_h do |attribute|
    [attribute, public_send(attribute)]
  end
end
```
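Because `call` dispatches through `public_send`, a subclass only has to override the attribute readers it wants to change; `call` picks up the override automatically. The following is a self-contained sketch, not the gem's code: `MiniBase` and `HeadlineThing` are hypothetical stand-ins that mimic the documented `initialize` and `call`, and mapping Schema.org `headline` to `title` in a subclass is an illustrative assumption.

```ruby
# Hypothetical stand-in for Schema::Base; it only mirrors the documented pattern.
class MiniBase
  DEFAULT_ATTRIBUTES = %i[id title description url image published_at].freeze

  def initialize(schema_object, url:)
    @schema_object = schema_object
    @url = url
  end

  # Build { attribute => value } by calling each reader by name.
  def call
    DEFAULT_ATTRIBUTES.to_h { |attribute| [attribute, public_send(attribute)] }
  end

  def id = schema_object[:@id]
  def title = schema_object[:title]
  def description = schema_object[:description]
  def url = schema_object[:url]
  def image = nil
  def published_at = schema_object[:datePublished]

  private

  attr_reader :schema_object
end

# Hypothetical subclass: reads Schema.org `headline` instead of `title`.
class HeadlineThing < MiniBase
  def title = schema_object[:headline]
end

item = HeadlineThing.new({ '@id': '/a/1', headline: 'Hello', datePublished: '2024-01-01' },
                         url: 'https://example.com')
p item.call
```

Overriding a single reader keeps the hash shape fixed by DEFAULT_ATTRIBUTES, so every subclass produces the same keys.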
#description ⇒ Object
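Returns the longest of the candidate description fields, compared by string length.
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 31
def description
  # nil candidates become "" via to_s, so any non-empty text wins over them.
  [schema_object[:description], schema_object[:schema_object_body], schema_object[:abstract]]
    .max_by { |desc| desc.to_s.size }
end
```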
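The selection rule can be tried in isolation. This minimal sketch uses a plain Hash in place of the schema object; `longest_description` is a hypothetical helper name and the field values are made up. When every candidate is missing, the result is `nil`.

```ruby
# Mirrors the documented selection: the candidate with the longest string form wins.
def longest_description(schema_object)
  [schema_object[:description], schema_object[:schema_object_body], schema_object[:abstract]]
    .max_by { |desc| desc.to_s.size }
end

sample = { description: 'Short teaser.', abstract: nil,
           schema_object_body: 'A much longer body of text for the item.' }
puts longest_description(sample) # prints "A much longer body of text for the item."
```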
#id ⇒ Object
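Returns the Schema.org @id if present, otherwise the path of #url, otherwise a slug of #title (lowercased, with whitespace runs replaced by "-").
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 28
def id = schema_object[:@id] || url&.path || title.to_s.downcase.gsub(/\s+/, '-')
```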
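The fallback chain can be reproduced with stdlib `URI` standing in for `Addressable::URI` (an assumption made only so the sketch runs without the gem). `fallback_id` is a hypothetical helper that takes the already-resolved URL and title as arguments.

```ruby
require 'uri'

# Hypothetical helper reproducing the documented fallback chain of #id.
# `url` is a URI or nil, standing in for the resolved Addressable::URI.
def fallback_id(schema_object, url, title)
  schema_object[:@id] || url&.path || title.to_s.downcase.gsub(/\s+/, '-')
end

puts fallback_id({ '@id': 'urn:item:1' }, nil, 'Ignored')          # prints "urn:item:1"
puts fallback_id({}, URI('https://example.com/news/42'), 'Ignored') # prints "/news/42"
puts fallback_id({}, nil, "Breaking   News\tToday")                 # prints "breaking-news-today"
```

Ruby treats an empty string as truthy, so a URL whose path is empty ends the chain there instead of falling through to the title slug.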
#image ⇒ Object
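Returns the first entry of images, or nil when there is none (images is a helper not listed in this summary).
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 47
def image = images.first || nil
```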
#published_at ⇒ Object
Returns the raw datePublished value from the schema object; it is not parsed here.
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 48
def published_at = schema_object[:datePublished]
```
#title ⇒ Object
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 29
def title = schema_object[:title]
```
#url ⇒ Addressable::URI?
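Returns the URL of the schema object, resolved against the page URL, or nil when the schema object has no url.
```ruby
# File 'lib/html2rss/auto_source/scraper/schema/base.rb', line 37
def url
  url = schema_object[:url] # local variable shadows this method inside the body
  if url.to_s.empty?
    Log.debug("Schema#Base.url: no url in schema_object: #{schema_object.inspect}")
    return
  end

  # Resolve a possibly relative URL against the page URL given to the constructor.
  Utils.build_absolute_url_from_relative(url, @url)
end
```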
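`Utils.build_absolute_url_from_relative` is internal to html2rss and is not documented on this page. The sketch below assumes it behaves roughly like standard relative-reference resolution and uses stdlib `URI.join` in its place, with `warn` standing in for `Log.debug`. `resolve_schema_url` is a hypothetical name, and the real helper may normalize URLs differently.

```ruby
require 'uri'

# Hypothetical stand-in for #url. Assumes Utils.build_absolute_url_from_relative
# resolves like URI.join; the real helper may normalize differently.
def resolve_schema_url(schema_object, base_url)
  url = schema_object[:url]
  if url.to_s.empty?
    warn "no url in schema_object: #{schema_object.inspect}" # stand-in for Log.debug
    return
  end

  URI.join(base_url, url.to_s)
end

p resolve_schema_url({ url: '/posts/7' }, 'https://example.com/blog/')
p resolve_schema_url({ url: 'https://cdn.example.org/x' }, 'https://example.com/')
p resolve_schema_url({}, 'https://example.com/')
```

An absolute url in the schema object is returned unchanged, while a relative one is interpreted against the page URL, which is why the constructor keeps @url.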