Class: Jekyll::Embed

Inherits:
Object
Defined in:
lib/jekyll/embed.rb,
lib/jekyll/embed/cache.rb,
lib/jekyll/embed/filter.rb,
lib/jekyll/embed/content.rb

Overview

The idea behind this class is to find the best safe representation of a link. For a YouTube video, that could be a sandboxed iframe: it loads the video and lets you play it while preventing YouTube from calling home and sending data about your users. Other social networks, however, try to take control of their containers by modifying the page; they resist sandboxing and don’t work correctly inside it. For those, we strip unwanted HTML tags such as <script> and return the remaining HTML, which you can style using CSS. Twitter is one such case.

Others are only available through OGP, so we retrieve the metadata and render a template, which you can provide in your own theme too.

We also try microformats, and we would look at Schema.org too, but there doesn’t seem to be a gem for it yet.

If the URL doesn’t provide anything at all we get the URL, title and date of last visit.

Isn’t it nice that the corporations that require us to use OEmbed, OGP, Twitter Cards, Schema.org and other metadata don’t use them themselves?

Also we’re going to use heavy caching so we don’t hit rate limits or lose the representation if the service is down or the URL is removed. We may be tempted to store the resources locally (images, videos, audio) but we have to take into account that people have legitimate reasons to remove media from the Internet.

Defined Under Namespace

Modules: Filter
Classes: Cache, Content

Constant Summary

IFRAME_ATTRIBUTES =

Attributes to apply to each kind of HTML element

%w[allow sandbox referrerpolicy loading height width].freeze
IMAGE_ATTRIBUTES =
%w[referrerpolicy loading height width].freeze
MEDIA_ATTRIBUTES =
%w[controls height width].freeze
A_ATTRIBUTES =
%w[referrerpolicy rel target].freeze
DIRECTIVES =

Directives from Feature Policy

%w[accelerometer ambient-light-sensor autoplay battery camera display-capture document-domain
encrypted-media execution-while-not-rendered execution-while-out-of-viewport fullscreen gamepad geolocation gyroscope layout-animations legacy-image-formats magnetometer microphone midi navigation-override oversized-images payment picture-in-picture publickey-credentials-get speaker-selection sync-xhr usb screen-wake-lock web-share xr-spatial-tracking].freeze
INCLUDE_OGP =

Templates

'{% include ogp.html %}'
INCLUDE_FALLBACK =
'{% include fallback.html %}'
INCLUDE_EMBED =
'{% include embed.html %}'
DEFAULT_CONFIG =

The default referrer policy sends only the origin (the protocol/scheme and domain part, not the full URL) on cross-origin requests, and sends nothing when the remote URL is less secure than the page (HTTPS to HTTP).

The default sandbox restrictions only allow scripts in the context of the iframe and opening new tabs.

{
  'scrub' => %w[form input textarea button fieldset select option optgroup canvas area map],
  'attributes' => {
    'referrerpolicy' => 'strict-origin-when-cross-origin',
    'sandbox' => %w[allow-scripts allow-popups allow-popups-to-escape-sandbox],
    'allow' => %w[fullscreen; gyroscope; picture-in-picture; clipboard-write;],
    'loading' => 'lazy',
    'controls' => true,
    'rel' => %w[noopener noreferrer],
    'target' => '_blank',
    'height' => nil,
    'width' => nil
  }
}.freeze
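
As a rough illustration of how a site's `embed` settings could override these defaults, here is a minimal sketch. `deep_merge` is a hypothetical stand-in written for this example (the gem itself calls `Jekyll::Utils.deep_merge_hashes`, see `.config`), the `site_embed` hash is invented sample data, and the assumption that arrays are replaced rather than concatenated is ours.

```ruby
# Hypothetical stand-in for a recursive hash merge: nested hashes are
# merged key by key, any other value (arrays included) is replaced.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

# Shortened copy of the defaults, enough for the example.
defaults = {
  'scrub' => %w[form input],
  'attributes' => {
    'referrerpolicy' => 'strict-origin-when-cross-origin',
    'loading' => 'lazy',
    'target' => '_blank'
  }
}

# Invented sample of what `site.config['embed']` might contain.
site_embed = { 'attributes' => { 'loading' => 'eager' } }

merged = deep_merge(defaults, site_embed)
puts merged['attributes']['loading']        # eager
puts merged['attributes']['referrerpolicy'] # strict-origin-when-cross-origin
```

Only the overridden key changes; the other attributes keep their default values.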

Class Method Details

.cache ⇒ Jekyll::Embed::Cache



# File 'lib/jekyll/embed.rb', line 276

def cache
  @cache ||= Jekyll::Embed::Cache.new('Jekyll::Embed')
end

.cleanup(html_fragment, url) ⇒ String

Returns:

  • (String)


# File 'lib/jekyll/embed.rb', line 295

def cleanup(html_fragment, url)
  html = Loofah.fragment(html_fragment).scrub!(:prune)

  # Add our own attributes
  html.css('iframe').each do |iframe|
    IFRAME_ATTRIBUTES.each do |attr|
      set_value_for_attr(iframe, attr)
    end

    # Embedding itself require allow-same-origin
    iframe['sandbox'] += allow_same_origin(url)
  end

  html.css('audio, video').each do |media|
    MEDIA_ATTRIBUTES.each do |attr|
      set_value_for_attr(media, attr)
    end

    media['src'] = UrlPrivacy.clean media['src']
  end

  html.css('img').each do |img|
    IMAGE_ATTRIBUTES.each do |attr|
      set_value_for_attr(img, attr)
    end
  end

  html.css('a').each do |a|
    A_ATTRIBUTES.each do |attr|
      set_value_for_attr(a, attr)
    end
  end

  html.css('[src]').each do |element|
    element['src'] = UrlPrivacy.clean(element['src'])
  end

  html.css('[href]').each do |element|
    element['href'] = UrlPrivacy.clean(element['href'])
  end

  # Return the cleaned up HTML as a String
  html.to_s
end

.config ⇒ Hash

Returns:

  • (Hash)


# File 'lib/jekyll/embed.rb', line 178

def config
  @config ||= Jekyll::Utils.deep_merge_hashes(DEFAULT_CONFIG, (site.config['embed'] || {})).tap do |c|
    c['attributes']['allow'].concat (DIRECTIVES - c.dig('attributes', 'allow').join.split(';').map do |s|
                                                    s.split(' ').first
                                                  end).join(" 'none';|").split('|')
  end
end
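
The effect of the block above is to append a `'none'` entry for every Feature Policy directive that the configured `allow` list does not mention. The following is a simplified sketch of that idea, using a shortened directive list and a hypothetical `deny_unlisted` helper; it is an illustration, not the gem's code. Mapping each remaining directive individually gives every entry the `'none';` suffix, including the last one.

```ruby
# Shortened list for the example; the real DIRECTIVES constant is longer.
DIRECTIVES_SAMPLE = %w[autoplay camera fullscreen microphone].freeze

# Hypothetical helper: returns the allow list plus a "'none'" entry for
# every directive in `directives` that the allow list does not mention.
def deny_unlisted(allow, directives)
  # "fullscreen;" or "camera 'self';" -> "fullscreen" / "camera"
  mentioned = allow.map { |entry| entry.delete(';').split.first }
  denied = (directives - mentioned).map { |name| "#{name} 'none';" }
  allow + denied
end

policy = deny_unlisted(%w[fullscreen;], DIRECTIVES_SAMPLE)
puts policy.join(' ')
# fullscreen; autoplay 'none'; camera 'none'; microphone 'none';
```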

.embed(url) ⇒ String

Render the URL as HTML

  1. Try oembed for video and image

  2. If rich oembed, cleanup

  3. If OGP, render templates

  4. Else, render fallback template

Parameters:

  • URL (String)

Returns:

  • (String)

    HTML



# File 'lib/jekyll/embed.rb', line 158

def embed(url)
  raise URI::Error unless url.is_a? String

  url = url.strip

  # Quick check
  raise URI::Error unless url.start_with? 'http'

  # Just to verify the URL is valid
  # TODO: Use Addressable
  URI.parse url

  oembed(url) || ogp(url) || fallback(url) || url
rescue URI::Error
  Jekyll.logger.warn "#{url.inspect} is not a valid URL"

  url
end
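
The `oembed(url) || ogp(url) || fallback(url) || url` chain above can be illustrated in isolation. In this sketch the three lambdas are hypothetical stand-ins for `.oembed`, `.ogp` and `.fallback`; each returns HTML or `nil`, the first non-`nil` result wins, and the bare URL is the last resort.

```ruby
require 'uri'

# Hypothetical stand-ins for the real strategies: each returns an HTML
# String, or nil when it cannot handle the URL.
STRATEGIES = [
  ->(url) { "<iframe src=\"#{url}\"></iframe>" if url.include?('video') },
  ->(url) { "<article>#{url}</article>" if url.include?('article') },
  ->(_url) { nil }
].freeze

def render(url)
  raise URI::Error unless url.is_a?(String)

  url = url.strip
  raise URI::Error unless url.start_with?('http')

  URI.parse(url) # raises URI::InvalidURIError on malformed input

  # First strategy that yields something wins; otherwise return the URL.
  STRATEGIES.each do |strategy|
    html = strategy.call(url)
    return html if html
  end
  url
rescue URI::Error
  # Invalid input is handed back unchanged instead of aborting the build.
  url
end

puts render('https://example.org/video/1') # an <iframe> for the URL
puts render('https://example.org/other')   # the URL itself
puts render('ftp://example.org/file')      # not http: returned unchanged
```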

.fallback(url) ⇒ Object

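Last resort: fetch the page and build a minimal card from its title, meta description, first image and language, rendered through the fallback template. Returns nil if the page can’t be fetched or parsed.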



# File 'lib/jekyll/embed.rb', line 240

def fallback(url)
  cache.getset("fallback+#{url}") do
    html        = Nokogiri::HTML.fragment get(url).body
    element     = html.css('article').first
    element   ||= html.css('section').first
    element   ||= html.css('main').first
    element   ||= html.css('body').first
    title       = html.css('title').first
    description = html.css('meta[name="description"]').first

    context = info.dup
    context[:registers][:page] = payload['page'] = {
      'title' => text(title),
      'description' => text(description),
      'url' => url,
      'image' => element&.css('img')&.first&.public_send(:[], 'src'),
      'locale' => html.css('html')&.first&.public_send(:[], 'lang')
    }

    cleanup fallback_template.render!(payload, context), url
  end
rescue ArgumentError
  Jekyll.logger.warn 'Invalid contents (fallback):', url
  nil
rescue Faraday::Error, Nokogiri::SyntaxError
  nil
end

.get(url) ⇒ Faraday::Response

Parameters:

  • URL (String)

Returns:

  • (Faraday::Response)


# File 'lib/jekyll/embed.rb', line 270

def get(url)
  @get_cache ||= {}
  @get_cache[url] ||= http_client.get url
end
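
The per-URL memoization used here can be observed with a fake client. `FakeClient` and `Fetcher` are invented for the example; the client only counts how often it is called and is not Faraday.

```ruby
# Invented client that records how many requests it receives.
class FakeClient
  attr_reader :requests

  def initialize
    @requests = 0
  end

  def get(url)
    @requests += 1
    "response for #{url}"
  end
end

class Fetcher
  def initialize(client)
    @client = client
  end

  # Same shape as the method above: one request per distinct URL.
  def get(url)
    @get_cache ||= {}
    @get_cache[url] ||= @client.get(url)
  end
end

client  = FakeClient.new
fetcher = Fetcher.new(client)
3.times { fetcher.get('https://example.org/') }
fetcher.get('https://example.org/other')
puts client.requests # 2
```

Both `.ogp` and `.fallback` call `get(url)`, so a memo of this kind means the page is fetched once even when OGP parsing fails and the fallback runs.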

.http_client ⇒ Faraday::Connection

Returns:

  • (Faraday::Connection)


# File 'lib/jekyll/embed.rb', line 281

def http_client
  @http_client ||= Faraday.new do |builder|
    builder.options.timeout = 4
    builder.options.open_timeout = 1
    builder.options.read_timeout = 1
    builder.options.write_timeout = 1
    builder.use FaradayMiddleware::FollowRedirects
    builder.use :http_cache, shared_cache: false, store: cache, serializer: Marshal
  end
end

.oembed(url) ⇒ String, NilClass

Try for OEmbed

Parameters:

  • URL (String)

Returns:

  • (String, NilClass)

    Sanitized HTML or nil



# File 'lib/jekyll/embed.rb', line 190

def oembed(url)
  cache.getset("oembed+#{url}") do
    oembed = OEmbed::Providers.get url

    # Prevent caching of nil?
    raise OEmbed::Error unless oembed.respond_to? :html

    context = info.dup
    context[:registers][:page] = payload['page'] = cleanup(oembed.html, url)

    embed_template.render!(payload, context)
  end
rescue OEmbed::Error
  nil
end
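
Raising inside the `getset` block is what keeps a failed lookup out of the cache: the block never returns, so nothing is stored. Here is a minimal sketch with an in-memory stand-in; `TinyCache`, `LookupFailed` and `lookup` are hypothetical names, and the assumption that `getset` stores only what the block returns is ours.

```ruby
class LookupFailed < StandardError; end

# Hypothetical in-memory cache with a getset in the spirit of the one
# used above: return the stored value, or run the block and store it.
class TinyCache
  def initialize
    @store = {}
  end

  def key?(key)
    @store.key?(key)
  end

  def getset(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

def lookup(cache, url, result)
  cache.getset("oembed+#{url}") do
    # Raising aborts the block, so nothing is written for this key.
    raise LookupFailed if result.nil?

    result
  end
rescue LookupFailed
  nil
end

cache = TinyCache.new
lookup(cache, 'https://a.example/', nil)
lookup(cache, 'https://b.example/', '<iframe></iframe>')
puts cache.key?('oembed+https://a.example/') # false
puts cache.key?('oembed+https://b.example/') # true
```

Because the failure is not stored, a later build can retry the provider instead of reading back a cached `nil`.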

.ogp(url) ⇒ String, NilClass

Try for OGP.

Parameters:

  • URL (String)

Returns:

  • (String, NilClass)


# File 'lib/jekyll/embed.rb', line 209

def ogp(url)
  cache.getset("ogp+#{url}") do
    ogp = OGP::OpenGraph.new get(url).body
    page = {
      locale: ogp.locales.first,
      title: ogp.title,
      url: ogp.url,
      description: ogp.description,
      type: ogp.type,
      data: ogp.data
    }.transform_keys(&:to_s)

    %w[image video audio].each do |attr|
      page[attr] = ogp.public_send(:"#{attr}s").find do |a|
        a && a.url && http?(a.url)
      end&.url
    end

    context = info.dup
    context[:registers][:page] = payload['page'] = page

    cleanup ogp_template.render!(payload, context), url
  end
rescue ArgumentError
  Jekyll.logger.warn 'Invalid contents (OGP):', url
  nil
rescue LL::ParserError, OGP::MalformedSourceError, OGP::MissingAttributeError, Faraday::Error
  nil
end
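
The media selection in the middle of this method keeps, for each kind, the first entry whose URL is HTTP(S). The sketch below shows it with plain structs; `Media`, `first_http_url` and this `http?` are hypothetical stand-ins, since the gem's own `http?` helper is not shown on this page.

```ruby
Media = Struct.new(:url)

# Hypothetical version of the http? check used above.
def http?(url)
  url.start_with?('http://', 'https://')
end

# Returns the URL of the first entry that has an HTTP(S) URL, or nil.
def first_http_url(items)
  items.find { |item| item && item.url && http?(item.url) }&.url
end

images = [
  nil,
  Media.new(nil),
  Media.new('data:image/png;base64,AAAA'),
  Media.new('https://example.org/cover.png')
]

puts first_http_url(images) # https://example.org/cover.png
p first_http_url([])        # nil
```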

.reset ⇒ nil

Reset variables

Returns:

  • (nil)


# File 'lib/jekyll/embed.rb', line 135

def reset
  @allow_same_origin =
    @cache =
      @config =
        @fallback_template =
          @get_cache =
            @http_client =
              @info =
                @ogp_template =
                  @payload =
                    @value_for_attr =
                      nil
end

.site ⇒ Object



# File 'lib/jekyll/embed.rb', line 95

def site
  unless @site
    raise Jekyll::Errors::InvalidConfigurationError,
          'Site is missing, configure with `Jekyll::Embed.site = site`'
  end

  @site
end

.site=(site) ⇒ Jekyll::Site

This is an initializer of sorts

Parameters:

  • (Jekyll::Site)

Returns:

  • (Jekyll::Site)

Raises:

  • (ArgumentError)


# File 'lib/jekyll/embed.rb', line 108

def site=(site)
  raise ArgumentError, 'Site must be a Jekyll::Site' unless site.is_a? Jekyll::Site

  @site = site

  # Add the _includes dir so we can provide default templates that
  # can be overridden locally or by the theme.
  includes_dir = File.expand_path(File.join(__dir__, '..', '..', '_includes'))
  site.includes_load_paths << includes_dir unless site.includes_load_paths.include? includes_dir
  # Since we're embedding, we're allowing iframes
  Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2 << 'iframe'

  reset

  # Other elements that are disallowed
  config['scrub']&.each do |scrub|
    Loofah::HTML5::SafeList::ALLOWED_ELEMENTS_WITH_LIBXML2.delete(scrub)
  end

  payload['embed'] = config['attributes']

  site
end

.text(node) ⇒ Object



# File 'lib/jekyll/embed.rb', line 340

def text(node)
  node&.text&.tr("\n", '')&.tr("\r", '')&.strip&.squeeze(' ')
end
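
To see what this normalization does, the same chain can be applied to plain strings. `normalize` is a hypothetical helper that takes a String instead of a node. Note that line breaks are removed rather than replaced by a space, so words separated only by a newline end up joined.

```ruby
# Same chain as above, applied to a String (or nil) instead of a node.
def normalize(string)
  string&.tr("\n", '')&.tr("\r", '')&.strip&.squeeze(' ')
end

p normalize("  Hello   \n  world  ") # "Hello world"
p normalize("Hello\nworld")          # "Helloworld"
p normalize(nil)                     # nil
```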