Class: Burly::Parsers::HtmlParser Private

Inherits:
Burly::Parser
Defined in:
lib/burly/parsers/html_parser.rb

This class is part of a private API. You should avoid using this class if possible, as it may be removed or changed in the future.

Constant Summary

SRCSET_ATTRIBUTES_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or changed in the future.

A map of HTML srcset attributes and their associated element names.

{
  "imagesrcset" => ["link"],
  "srcset"      => ["img", "source"],
}.freeze
URL_ATTRIBUTES_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or changed in the future.

A map of HTML URL attributes and their associated element names.

{
  "action"     => ["form"],
  "cite"       => ["blockquote", "del", "ins", "q"],
  "data"       => ["object"],
  "formaction" => ["button", "input"],
  "href"       => ["a", "area", "base", "link"],
  "ping"       => ["a", "area"],
  "poster"     => ["video"],
  "src"        => ["audio", "embed", "iframe", "img", "input", "script", "source", "track", "video"],
}.freeze
ATTRIBUTES_XPATHS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or changed in the future.

URL_ATTRIBUTES_MAP.merge(SRCSET_ATTRIBUTES_MAP).flat_map do |attribute, names|
  names.map { |name| "//#{name} / @#{attribute}" }
end
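
As an illustration, the sketch below evaluates the same expression in isolation to show the XPath strings it produces. The two maps are copied from the constants above; the `puts` calls are additions for demonstration only and are not part of the library.

```ruby
# Copies of the two attribute maps documented above.
SRCSET_ATTRIBUTES_MAP = {
  "imagesrcset" => ["link"],
  "srcset"      => ["img", "source"],
}.freeze

URL_ATTRIBUTES_MAP = {
  "action"     => ["form"],
  "cite"       => ["blockquote", "del", "ins", "q"],
  "data"       => ["object"],
  "formaction" => ["button", "input"],
  "href"       => ["a", "area", "base", "link"],
  "ping"       => ["a", "area"],
  "poster"     => ["video"],
  "src"        => ["audio", "embed", "iframe", "img", "input", "script", "source", "track", "video"],
}.freeze

# One XPath expression per (element name, attribute) pair.
ATTRIBUTES_XPATHS =
  URL_ATTRIBUTES_MAP.merge(SRCSET_ATTRIBUTES_MAP).flat_map do |attribute, names|
    names.map { |name| "//#{name} / @#{attribute}" }
  end

puts ATTRIBUTES_XPATHS.size      # 27
puts ATTRIBUTES_XPATHS.first(3)  # //form / @action, //blockquote / @cite, //del / @cite
puts ATTRIBUTES_XPATHS.last      # //source / @srcset
```

Each string selects attribute nodes rather than elements, which is why #parse can read `name` and `value` directly from every match.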

Constants inherited from Burly::Parser

Burly::Parser::URI_PARSER, Burly::Parser::URI_REGEXP

Instance Method Summary

Methods inherited from Burly::Parser

#initialize

Constructor Details

This class inherits a constructor from Burly::Parser

Instance Method Details

#parse ⇒ Array<String>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or changed in the future.

Parse an HTML document for absolute or relative URLs.

Returns:

  • (Array<String>)


# File 'lib/burly/parsers/html_parser.rb', line 42

def parse
  attr_nodes.flat_map do |attr_node|
    if SRCSET_ATTRIBUTES_MAP.key?(attr_node.name)
      urls_from_candidate_strings(attr_node.value.split(/\s*,\s*/))
    else
      attr_node.value.strip
    end
  end
end
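
The srcset branch above splits the attribute value on commas and passes the pieces to `urls_from_candidate_strings`, whose body is not shown on this page. The sketch below illustrates one way such a step could work, assuming each candidate string has the form "URL descriptor" and that the URL is the first whitespace-separated token. The method name `sketch_urls_from_candidates` and the sample value are hypothetical and are not taken from the library.

```ruby
# Hypothetical stand-in for the undocumented helper: keep the first
# whitespace-separated token of each candidate and drop empty candidates.
def sketch_urls_from_candidates(candidate_strings)
  candidate_strings.filter_map { |candidate| candidate.strip.split(/\s+/).first }
end

# Sample srcset value (made up for illustration).
srcset = "small.jpg 480w, medium.jpg 800w, https://example.com/large.jpg 2x"

# Same split as in #parse: commas with optional surrounding whitespace.
candidates = srcset.split(/\s*,\s*/)

p sketch_urls_from_candidates(candidates)
# ["small.jpg", "medium.jpg", "https://example.com/large.jpg"]
```

Note that splitting on every comma is a simplification: a URL that itself contains a comma would be broken into two candidates by this approach.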