Class: Banzai::Filter::SpacedLinkFilter

Inherits: HTML::Pipeline::Filter

Object
HTML::Pipeline::Filter
Banzai::Filter::SpacedLinkFilter

Includes: ActionView::Helpers::TagHelper, Concerns::PipelineTimingCheck

Defined in: lib/banzai/filter/spaced_link_filter.rb

Overview

HTML filter for Markdown links that contain spaces in their URLs.

Based on Banzai::Filter::AutolinkFilter

CommonMark does not allow spaces in the URL portion of a link or image. For example, [example](page slug) is not valid, and neither is ![example](test image.jpg). However, we support this type of link (via Redcarpet), particularly in our wikis, so that wiki pages can be linked easily by their title. This filter adds that functionality.

This is a small extension to the CommonMark spec. If the spec starts allowing spaces in URLs, this filter could be removed.
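To make the gap concrete, here is a minimal sketch of one way a spaced link could be rewritten into a form that CommonMark accepts. It is not taken from this filter's source: the method name `encode_spaced_destination`, the simplified pattern, and the choice to percent-encode spaces are all assumptions for illustration, and the sketch ignores link titles and nested brackets.

```ruby
# Hypothetical helper: percent-encode spaces in the destination of each
# Markdown link or image so the result contains no raw spaces.
def encode_spaced_destination(markdown)
  # Simplified pattern (an assumption): optional "!", link text, destination.
  simple_link = /(!?)\[([^\]]+)\]\(([^)]+)\)/

  markdown.gsub(simple_link) do
    bang, text, dest = Regexp.last_match.captures
    "#{bang}[#{text}](#{dest.strip.gsub(' ', '%20')})"
  end
end

puts encode_spaced_destination('See [example](page slug) and ![example](test image.jpg)')
```

With the two examples from the text above, the destinations become `page%20slug` and `test%20image.jpg`, both of which are ordinary CommonMark link destinations.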

Note: Filter::SanitizationFilter/Filter::SanitizeLinkFilter should always be run at some point after this filter to prevent XSS attacks.

Constant Summary

LINK_OR_IMAGE_PATTERN =

Pattern to match a standard markdown link. Rubular: http://rubular.com/r/2EXEQ49rg5

This pattern is vulnerable to malicious inputs, so use Gitlab::UntrustedRegexp to place bounds on execution time.

Gitlab::UntrustedRegexp.new(
  '(?P<preview_operator>!)?' \
  '\[(?P<text>.+?)\]' \
  '\(' \
    '(?P<new_link>.+?)' \
    '(?P<title>\ ".+?")?' \
  '\)'
)
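The named groups in this pattern can be explored with a stand-in built on Ruby's built-in Regexp. This is only a sketch under stated assumptions: `describe_link` is a hypothetical helper, the `(?P<name>...)` groups are rewritten as `(?<name>...)` because that is how Ruby's own engine spells named groups, and the stand-in offers none of the execution-time bounds that Gitlab::UntrustedRegexp is used for here.

```ruby
# Hypothetical helper for illustration only. The pattern mirrors the structure
# of LINK_OR_IMAGE_PATTERN but uses Ruby's Regexp, which does NOT bound
# execution time on malicious input.
def describe_link(markdown)
  pattern = /(?<preview_operator>!)?\[(?<text>.+?)\]\((?<new_link>.+?)(?<title>\ ".+?")?\)/

  match = pattern.match(markdown)
  return nil unless match

  # Hash of group name => captured string (nil for groups that did not match).
  match.named_captures
end

p describe_link('[example](page slug)')
p describe_link('![example](test image.jpg "A title")')
```

For the first input, `new_link` captures `page slug` and both `preview_operator` and `title` are nil. For the second, `preview_operator` is `!`, `new_link` is `test image.jpg`, and `title` keeps its leading space and quotes.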

IGNORE_PARENTS =

Text matching LINK_OR_IMAGE_PATTERN inside these elements will not be linked.

%w[a code kbd pre script style span[@data-math-style]].to_set

TEXT_QUERY =

The XPath query to use for finding text nodes to parse.

%(descendant-or-self::text()[
  not(#{IGNORE_PARENTS.map { |p| "ancestor::#{p}" }.join(' or ')})
  and contains(., ']\(')
])
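The interpolated part of this query can be hard to read at a glance. The sketch below rebuilds only the `not(...)` clause from the same element list, using a hypothetical helper named `ignore_clause`, so the expanded XPath can be inspected without loading the filter.

```ruby
require 'set'

# Hypothetical helper: build the "not(ancestor::x or ancestor::y ...)" clause
# for a collection of element names, as the interpolation in TEXT_QUERY does.
def ignore_clause(parents)
  "not(#{parents.map { |p| "ancestor::#{p}" }.join(' or ')})"
end

# Element list copied from IGNORE_PARENTS above.
ignore_parents = %w[a code kbd pre script style span[@data-math-style]].to_set

puts ignore_clause(ignore_parents)
```

Each entry becomes one `ancestor::` test, so a text node is skipped when it sits anywhere inside one of those elements. The `span[@data-math-style]` entry carries an XPath predicate, which limits the exclusion to spans that have that attribute. The second condition in the query, `contains(., ']\(')`, appears to act as a cheap pre-check: because the query is written as a `%( )` string, `\(` yields a plain `(`, so only text nodes containing `](` are selected.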

Constants included from Concerns::PipelineTimingCheck

Concerns::PipelineTimingCheck::MAX_PIPELINE_SECONDS

Instance Method Summary

#call ⇒ Object

Methods included from Concerns::PipelineTimingCheck

#exceeded_pipeline_max?

Instance Method Details

#call ⇒ Object

# File 'lib/banzai/filter/spaced_link_filter.rb', line 51

def call
  doc.xpath(TEXT_QUERY).each do |node|
    content = node.to_html

    next unless LINK_OR_IMAGE_PATTERN.match(content)

    html = spaced_link_filter(content)

    next if html == content

    node.replace(html)
  end

  doc
end