Class: Html2rss::AutoSource::Scraper::Schema::Thing

Inherits:
Object
Defined in:
lib/html2rss/auto_source/scraper/schema/thing.rb

Overview

A Thing is, roughly, the base class for Schema.org schema objects.

Direct Known Subclasses

ItemList, ListItem

Constant Summary

SUPPORTED_TYPES =
%w[
  AdvertiserContentArticle
  AnalysisNewsArticle
  APIReference
  Article
  AskPublicNewsArticle
  BackgroundNewsArticle
  BlogPosting
  DiscussionForumPosting
  LiveBlogPosting
  NewsArticle
  OpinionNewsArticle
  Report
  ReportageNewsArticle
  ReviewNewsArticle
  SatiricalArticle
  ScholarlyArticle
  SocialMediaPosting
  TechArticle
].to_set.freeze
DEFAULT_ATTRIBUTES =
%i[id title description url image published_at].freeze
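SUPPORTED_TYPES is a Set of Schema.org type names. A minimal sketch of how such a set might be used to pick article-like objects out of parsed JSON-LD follows. The helper `supported_type?` is hypothetical (it is not part of html2rss), the constant is trimmed for brevity, and the sketch assumes keys are symbols such as `:@type`. Schema.org allows `@type` to be a single string or an array, so both forms are handled.

```ruby
require 'set'

# Trimmed-down copy of SUPPORTED_TYPES, for illustration only.
SUPPORTED_TYPES = %w[Article BlogPosting NewsArticle TechArticle].to_set.freeze

# Hypothetical helper: true when the object's @type (string or array)
# contains at least one supported type name.
def supported_type?(schema_object)
  Array(schema_object[:@type]).any? { |type| SUPPORTED_TYPES.include?(type) }
end

objects = [
  { :@type => 'NewsArticle', title: 'Hello' },
  { :@type => 'Organization', name: 'ACME' },
  { :@type => %w[Thing BlogPosting], title: 'Post' }
]

supported = objects.select { |obj| supported_type?(obj) }
puts supported.map { |obj| obj[:title] }.inspect
# prints ["Hello", "Post"]
```

Because the constant is a Set, each membership check is a hash lookup rather than a linear scan of the list.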

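Instance Method Summary

  • #call ⇒ Hash
    Returns the scraped article hash with DEFAULT_ATTRIBUTES.
  • #description ⇒ Object
  • #id ⇒ Object
  • #image ⇒ Object
  • #initialize(schema_object, url:) ⇒ Thing (constructor)
    Returns a new instance of Thing.
  • #published_at ⇒ Object
  • #title ⇒ Object
  • #url ⇒ Addressable::URI?
    Returns the URL of the schema object.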

Constructor Details

#initialize(schema_object, url:) ⇒ Thing

Returns a new instance of Thing.

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 37

def initialize(schema_object, url:)
  @schema_object = schema_object
  @url = url
end

Instance Method Details

#call ⇒ Hash

Returns the scraped article hash with DEFAULT_ATTRIBUTES.

Returns:

  • (Hash)

    the scraped article hash with DEFAULT_ATTRIBUTES

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 43

def call
  DEFAULT_ATTRIBUTES.to_h do |attribute|
    [attribute, public_send(attribute)]
  end
end
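#call builds its hash by sending each name in DEFAULT_ATTRIBUTES to the object itself. The following is a minimal stand-in class (`TinyThing` is hypothetical, not the real Thing) that shows the same `Array#to_h` with a block plus `public_send` pattern on two attributes.

```ruby
# Hypothetical stand-in illustrating the #call pattern: every attribute
# name doubles as a public reader method.
class TinyThing
  DEFAULT_ATTRIBUTES = %i[id title].freeze

  def initialize(schema_object)
    @schema_object = schema_object
  end

  def id = @schema_object[:@id]
  def title = @schema_object[:title]

  # Maps each attribute name to the value returned by its reader.
  def call
    DEFAULT_ATTRIBUTES.to_h { |attribute| [attribute, public_send(attribute)] }
  end
end

result = TinyThing.new({ :@id => '/posts/1', title: 'Hello' }).call
p result
```

A subclass can change any single field by overriding its reader; #call picks the override up automatically because it dispatches through `public_send`.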

#description ⇒ Object

Returns the longest of the candidate description fields (description, body text, abstract).

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 61

def description
  # Schema.org stores the article text under the `articleBody` property.
  schema_object.values_at(:description, :articleBody, :abstract)
               .max_by { |string| string.to_s.size }
end
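The "longest candidate wins" rule can be tried on a plain hash. The sample data below is made up. It stores the body under Schema.org's `articleBody` property and leaves `abstract` as nil to show that `to_s` turns a missing value into an empty string, so it never wins.

```ruby
# Made-up schema object: a short teaser, a longer body, no abstract.
schema_object = {
  description: 'Short teaser.',
  articleBody: 'The full body text of the article, which is much longer.',
  abstract: nil
}

# Pick the candidate with the most characters; nil counts as length 0.
description = schema_object.values_at(:description, :articleBody, :abstract)
                           .max_by { |string| string.to_s.size }
puts description
```

If every candidate is nil, `max_by` returns the first element, which is nil, so the method yields nil rather than an empty string.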

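#id ⇒ Object

Returns the schema object's @id, falling back to the path of #url, or nil when both are blank. A non-empty result is memoized.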

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 49

def id
  return @id if defined?(@id)

  id = (schema_object[:@id] || url&.path).to_s

  return if id.empty?

  @id = id
end
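The fallback order in #id can be checked in isolation. This sketch assumes Ruby's stdlib `URI` as a stand-in for `Addressable::URI` (both respond to `#path`). `schema_id` is a hypothetical function name, and its `url` parameter plays the role of the value returned by #url.

```ruby
require 'uri'

# Hypothetical free-function version of #id (without memoization):
# prefer the JSON-LD @id, else the URL path; blank results become nil.
def schema_id(schema_object, url)
  id = (schema_object[:@id] || url&.path).to_s
  id.empty? ? nil : id
end

url = URI('https://example.com/posts/42')
schema_id({ :@id => 'urn:post:42' }, url) # @id wins
schema_id({}, url)                        # falls back to "/posts/42"
schema_id({}, nil)                        # nothing usable, so nil
```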

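#image ⇒ Object

Returns the first image URL, made absolute against the url passed to the constructor, or nil when there is none.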

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 77

def image
  if (image_url = image_urls.first)
    Utils.build_absolute_url_from_relative(image_url, @url)
  end
end
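#image relies on a private `image_urls` helper that this page does not show. The sketch below is a hypothetical version of that helper, not the library's code. It is based on the fact that Schema.org lets `image` be a URL string, an ImageObject with a `url`, or an array of either. `URI.join` stands in for `Utils.build_absolute_url_from_relative`.

```ruby
require 'uri'

# Hypothetical image_urls: normalize the Schema.org `image` value to an
# array of URL strings. A lone Hash is wrapped rather than passed to
# Array(), which would split it into key/value pairs.
def image_urls(schema_object)
  images = schema_object[:image]
  images = [images] unless images.is_a?(Array)
  images.filter_map { |image| image.is_a?(Hash) ? image[:url] : image }
end

# Mirrors #image: take the first candidate and resolve it against the page URL.
def image(schema_object, base_url)
  if (image_url = image_urls(schema_object).first)
    URI.join(base_url, image_url).to_s
  end
end

base = 'https://example.com/news/'
image({ image: '/img/a.jpg' }, base)                # root-relative path
image({ image: [{ url: 'b.png' }, 'c.png'] }, base) # first ImageObject wins
image({}, base)                                     # no image, so nil
```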

#published_at ⇒ Object

Returns the raw datePublished value of the schema object.

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 83

def published_at = schema_object[:datePublished]

#title ⇒ Object

Returns the title value of the schema object.

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 59

def title = schema_object[:title]

#url ⇒ Addressable::URI?

Returns the URL of the schema object.

Returns:

  • (Addressable::URI, nil)

    the URL of the schema object

# File 'lib/html2rss/auto_source/scraper/schema/thing.rb', line 67

def url
  url = schema_object[:url]
  if url.to_s.empty?
    Log.debug("Schema#Thing.url: no url in schema_object: #{schema_object.inspect}")
    return
  end

  Utils.build_absolute_url_from_relative(url, @url)
end
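The behaviour of #url can be checked with a small sketch that uses stdlib `URI.join` in place of `Addressable::URI` and `Utils.build_absolute_url_from_relative`. Both substitutions are assumptions, and the debug logging is left out. `schema_url` is a hypothetical name. A blank `url` yields nil, a relative one is resolved against the page URL, and an absolute one is kept as-is.

```ruby
require 'uri'

# Hypothetical free-function version of #url (logging omitted).
def schema_url(schema_object, page_url)
  url = schema_object[:url]
  return if url.to_s.empty?

  URI.join(page_url, url.to_s)
end

page = 'https://example.com/blog/'
schema_url({ url: 'posts/1' }, page)             # resolved under /blog/
schema_url({ url: 'https://other.org/x' }, page) # absolute URL kept
schema_url({ url: '' }, page)                    # blank, so nil
```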