Class: MarkdownLint::Doc

Inherits:

Object

Defined in: lib/mdl/doc.rb

Overview

Represents the markdown document that is passed to rule checks.

Instance Attribute Summary

#elements ⇒ Object readonly

The top-level elements of the parsed document.
#lines ⇒ Object readonly

A list of raw markdown source lines.
#offset ⇒ Object readonly

The line count of the front matter stripped from the source, or 0 if none was stripped.
#parsed ⇒ Object readonly

The parsed Kramdown::Document.

Class Method Summary

.new_from_file(filename, ignore_front_matter = false) ⇒ Object

Alternate ‘constructor’ passing in a filename.

Instance Method Summary

#element_line(element) ⇒ Object

Returns the actual source line for a given element.
#element_linenumber(element) ⇒ Object

Returns the line number a given element is located on in the source file.
#element_linenumbers(elements) ⇒ Object

Returns a list of line numbers for all elements passed in.
#element_lines(elements) ⇒ Object

Returns the actual source lines for a list of elements.
#extract_as_text(element) ⇒ Object

Returns the element as plaintext.
#extract_text(element, prefix = '', restore_whitespace = true) ⇒ Object

Extracts the text from an element whose children consist of text elements and other inline elements.
#find_type(type, nested = true) ⇒ Object

Find all elements of a given type, returning their options hash.
#find_type_elements(type, nested = true, elements = @elements) ⇒ Object

Find all elements of a given type, returning a list of the element objects themselves.
#find_type_elements_except(type, nested_except = [], elements = @elements) ⇒ Object

A variation on find_type_elements that allows you to skip drilling down into children of specific element types.
#header_style(header) ⇒ Object

Returns the header ‘style’: :atx (hashes at the beginning of the line), :atx_closed (atx style, but with hashes at the end of the line as well) or :setext (underlined).
#indent_for(line) ⇒ Object

Returns how much a given line is indented.
#initialize(text, ignore_front_matter = false) ⇒ Doc constructor

Create a new document given a string containing the markdown source.
#list_style(item) ⇒ Object

Returns the list style for a list item: :asterisk, :plus, :dash, :ordered or :ordered_paren, depending on which symbol is used to denote the item, or :unknown if none of them matches.
#matching_lines(regex) ⇒ Object

Returns line numbers for lines that match the given regular expression.
#matching_text_element_lines(regex, exclude_nested = [:a]) ⇒ Object

Returns line numbers for lines inside text elements that match the given regular expression.

Constructor Details

#initialize(text, ignore_front_matter = false) ⇒ `Doc`

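Creates a new document from a string containing the markdown source. If ignore_front_matter is true, a front-matter block delimited by `---` lines is removed before parsing, and its line count is stored in offset.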

# File 'lib/mdl/doc.rb', line 29

def initialize(text, ignore_front_matter = false)
  regex = /^---\n(.*?)---\n\n?/m
  if ignore_front_matter && regex.match(text)
    @offset = regex.match(text).to_s.split("\n").length
    text.sub!(regex, '')
  else
    @offset = 0
  end
  # The -1 is to cause split to preserve an extra entry in the array so we
  # can tell if there's a final newline in the file or not.
  @lines = text.split(/\R/, -1)
  @parsed = Kramdown::Document.new(text, :input => 'MarkdownLint')
  @elements = @parsed.root.children
  add_annotations(@elements)
end
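The front-matter handling in the constructor can be tried on its own. The sketch below is plain Ruby with no kramdown dependency; `strip_front_matter` is a hypothetical helper that reuses the constructor's regex and offset arithmetic, but returns the remaining text instead of modifying its argument in place as `sub!` does.

```ruby
# Hypothetical helper: same regex and offset arithmetic as
# MarkdownLint::Doc#initialize, without building a kramdown document.
FRONT_MATTER = /^---\n(.*?)---\n\n?/m

def strip_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [text, 0] unless match

  # Count the lines of the matched front-matter block.
  offset = match.to_s.split("\n").length
  [text.sub(FRONT_MATTER, ''), offset]
end

source = "---\ntitle: Example\n---\n\n# Heading\n"
body, offset = strip_front_matter(source)

# A limit of -1 keeps the trailing empty string, which is how the
# constructor can tell that the file ends with a newline.
lines = body.split(/\R/, -1)

p offset # => 3
p lines  # => ["# Heading", ""]
```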

Instance Attribute Details

#elements ⇒ `Object` (readonly)

The top-level elements of the parsed document (`parsed.root.children`), passed through add_annotations by the constructor.

# File 'lib/mdl/doc.rb', line 14

def elements
  @elements
end

#lines ⇒ `Object` (readonly)

A list of raw markdown source lines. Note that the list is 0-indexed, while line numbers in the parsed source are 1-indexed, so you need to subtract 1 from a line number to get the correct line. The element_line* methods take care of this for you.

# File 'lib/mdl/doc.rb', line 14

def lines
  @lines
end

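#offset ⇒ `Object` (readonly)

The line count of the front matter stripped by the constructor, or 0 when ignore_front_matter is false or no front matter was found.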

# File 'lib/mdl/doc.rb', line 14

def offset
  @offset
end

#parsed ⇒ `Object` (readonly)

The Kramdown::Document built from the source text with `:input => 'MarkdownLint'`.

# File 'lib/mdl/doc.rb', line 14

def parsed
  @parsed
end

Class Method Details

.new_from_file(filename, ignore_front_matter = false) ⇒ `Object`

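Alternate ‘constructor’ that takes a filename instead of a string. Passing '-' reads the markdown from standard input; any other filename is read as UTF-8.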

# File 'lib/mdl/doc.rb', line 48

def self.new_from_file(filename, ignore_front_matter = false)
  if filename == '-'
    new($stdin.read, ignore_front_matter)
  else
    new(File.read(filename, :encoding => 'UTF-8'), ignore_front_matter)
  end
end
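The input selection in new_from_file can be sketched without the rest of the class. `read_markdown_source` is a hypothetical helper; the extra `stdin` parameter is an assumption added only so the standard-input branch can be exercised with a StringIO.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper mirroring new_from_file's input selection: the
# filename '-' means "read standard input", anything else is read as UTF-8.
def read_markdown_source(filename, stdin = $stdin)
  if filename == '-'
    stdin.read
  else
    File.read(filename, :encoding => 'UTF-8')
  end
end

# Tempfile.create yields the open file and returns the block's value.
from_file = Tempfile.create(['example', '.md']) do |f|
  f.write("# From a file\n")
  f.flush
  read_markdown_source(f.path)
end

from_stdin = read_markdown_source('-', StringIO.new("# From stdin\n"))

p from_file  # => "# From a file\n"
p from_stdin # => "# From stdin\n"
```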

Instance Method Details

#element_line(element) ⇒ `Object`

Returns the actual source line for a given element. You can pass in an element object or an options hash here. This is useful if you need to examine the source line directly for your rule to make use of information that isn’t present in the parsed document.

# File 'lib/mdl/doc.rb', line 135

def element_line(element)
  @lines[element_linenumber(element) - 1]
end

#element_linenumber(element) ⇒ `Object`

Returns the line number a given element is located on in the source file. You can pass in either an element object or an options hash here.

# File 'lib/mdl/doc.rb', line 124

def element_linenumber(element)
  element = element.options if element.is_a?(Kramdown::Element)
  element[:location]
end
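The line lookup helpers can be illustrated with plain Ruby objects. In this sketch, `FakeElement` is a hypothetical stand-in for Kramdown::Element (only its options hash matters), and `linenumber_for` / `line_for` are hypothetical copies of element_linenumber and element_line working against a fixed list of lines. It shows both accepted argument forms and the conversion from the 1-indexed `:location` to the 0-indexed line list.

```ruby
# Hypothetical stand-in for Kramdown::Element: only #options matters here.
FakeElement = Struct.new(:type, :options)

LINES = ['# Title', '', 'Some text', '* item'].freeze

# Accept an element or its options hash, like element_linenumber.
def linenumber_for(element)
  element = element.options if element.is_a?(FakeElement)
  element[:location]
end

# :location is 1-indexed while LINES is 0-indexed, hence the - 1.
def line_for(element)
  LINES[linenumber_for(element) - 1]
end

item = FakeElement.new(:li, { :location => 4 })
p linenumber_for(item)         # => 4
p line_for(item)               # => "* item"
p line_for({ :location => 1 }) # => "# Title"

# element_linenumbers and element_lines are plain maps over these helpers.
p [item, { :location => 3 }].map { |e| line_for(e) } # => ["* item", "Some text"]
```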

#element_linenumbers(elements) ⇒ `Object`

Returns a list of line numbers for all elements passed in. You can pass in a list of element objects or a list of options hashes here.

# File 'lib/mdl/doc.rb', line 143

def element_linenumbers(elements)
  elements.map { |e| element_linenumber(e) }
end

#element_lines(elements) ⇒ `Object`

Returns the actual source lines for a list of elements. You can pass in a list of element objects or a list of options hashes here.

# File 'lib/mdl/doc.rb', line 151

def element_lines(elements)
  elements.map { |e| element_line(e) }
end

#extract_as_text(element) ⇒ `Object`

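Returns the text content of an element as a list of plain-text lines. Text and code spans are kept, strong, em, p and a children are flattened recursively, smart quotes become straight quotes, and any other child element is dropped.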

# File 'lib/mdl/doc.rb', line 274

def extract_as_text(element)
  quotes = {
    :rdquo => '"',
    :ldquo => '"',
    :lsquo => "'",
    :rsquo => "'",
  }
  # If anything goes amiss here, e.g. unknown type, then nil will be
  # returned and we'll just not catch that part of the text, which seems
  # like a sensible failure mode.
  element.children.map do |e|
    if e.type == :text || e.type == :codespan
      e.value
    elsif %i{strong em p a}.include?(e.type)
      extract_as_text(e).join("\n")
    elsif e.type == :smart_quote
      quotes[e.value]
    end
  end.join.split("\n")
end
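The traversal in extract_as_text can be traced on a small hand-built tree. `Node` is a hypothetical stand-in for Kramdown::Element with just the fields the method reads, and `as_text` is a hypothetical copy of the method. The `:img` child shows the documented failure mode: an unhandled type maps to nil, which `join` turns into an empty string.

```ruby
# Hypothetical stand-in for Kramdown::Element with the fields the method reads.
Node = Struct.new(:type, :value, :children)

QUOTES = { :rdquo => '"', :ldquo => '"', :lsquo => "'", :rsquo => "'" }.freeze

# Same traversal as extract_as_text: keep text and code spans, recurse into
# strong/em/p/a, map smart quotes, and silently drop anything else (nil).
def as_text(element)
  element.children.map do |e|
    if e.type == :text || e.type == :codespan
      e.value
    elsif %i[strong em p a].include?(e.type)
      as_text(e).join("\n")
    elsif e.type == :smart_quote
      QUOTES[e.value]
    end
  end.join.split("\n")
end

para = Node.new(:p, nil, [
  Node.new(:text, 'Run ', []),
  Node.new(:codespan, 'mdl', []),
  Node.new(:text, ' on ', []),
  Node.new(:smart_quote, :ldquo, []),
  Node.new(:em, nil, [Node.new(:text, 'all', [])]),
  Node.new(:smart_quote, :rdquo, []),
  Node.new(:text, " files.\nDone.", []),
  Node.new(:img, nil, [])
])

p as_text(para) # => ["Run mdl on \"all\" files.", "Done."]
```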

#extract_text(element, prefix = '', restore_whitespace = true) ⇒ `Object`

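Extracts the text from an element whose children consist of text elements and other inline elements, returning a list of lines. If restore_whitespace is true, the first line is replaced with the raw source line (with prefix removed), which restores the leading whitespace stripped by the parser.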

# File 'lib/mdl/doc.rb', line 245

def extract_text(element, prefix = '', restore_whitespace = true)
  quotes = {
    :rdquo => '"',
    :ldquo => '"',
    :lsquo => "'",
    :rsquo => "'",
  }
  # If anything goes amiss here, e.g. unknown type, then nil will be
  # returned and we'll just not catch that part of the text, which seems
  # like a sensible failure mode.
  lines = element.children.map do |e|
    if e.type == :text
      e.value
    elsif %i{strong em p codespan}.include?(e.type)
      extract_text(e, prefix, restore_whitespace).join("\n")
    elsif e.type == :smart_quote
      quotes[e.value]
    end
  end.join.split("\n")
  # Text blocks have whitespace stripped, so we need to add it back in at
  # the beginning. Because this might be in something like a blockquote,
  # we optionally strip off a prefix given to the function.
  lines[0] = element_line(element).sub(prefix, '') if restore_whitespace
  lines
end
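The last step of extract_text, which puts back leading whitespace, is easiest to see on its own. `restore_first_line` is a hypothetical helper that repeats only the `lines[0] = ... .sub(prefix, '')` assignment; the blockquote line and prefix below are made-up inputs.

```ruby
# Hypothetical helper isolating the last step of extract_text: the parser
# strips leading whitespace from a text block, so the first extracted line
# is replaced with the raw source line, minus an optional prefix.
def restore_first_line(extracted, raw_line, prefix = '')
  lines = extracted.dup
  lines[0] = raw_line.sub(prefix, '')
  lines
end

# A paragraph inside a blockquote: the raw line still carries the '> '
# marker, which is passed as the prefix to be removed.
extracted = ['indented text', 'second line']
restored = restore_first_line(extracted, '>    indented text', '> ')
p restored # => ["   indented text", "second line"]
```

Note that String#sub removes only the first occurrence of the prefix, which is why the rest of the raw line, including its extra spaces, survives.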

#find_type(type, nested = true) ⇒ `Object`

Find all elements of a given type, returning their options hash. The options hash has most of the useful data about an element and often you can just use this in your rules.

# Returns [ { :location => 1, :element_level => 2 }, ... ]
elements = find_type(:li)

If nested is set to false, this returns only top level elements of a given type.

# File 'lib/mdl/doc.rb', line 67

def find_type(type, nested = true)
  find_type_elements(type, nested).map(&:options)
end

#find_type_elements(type, nested = true, elements = @elements) ⇒ `Object`

Find all elements of a given type, returning a list of the element objects themselves.

Instead of a single type, you can pass a list of types to find elements of any of those types.

If nested is set to false, this returns only top level elements of a given type.

# File 'lib/mdl/doc.rb', line 81

def find_type_elements(type, nested = true, elements = @elements)
  results = []
  type = [type] if type.instance_of?(Symbol)
  elements.each do |e|
    results.push(e) if type.include?(e.type)
    if nested && !e.children.empty?
      results.concat(find_type_elements(type, nested, e.children))
    end
  end
  results
end
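The effect of the nested flag and of passing a list of types can be checked on a small tree. `El` is a hypothetical stand-in for Kramdown::Element (type plus children), and `find_elements` is a hypothetical copy of find_type_elements that takes the element list explicitly instead of defaulting to @elements.

```ruby
# Hypothetical stand-in for Kramdown::Element: a type plus child elements.
El = Struct.new(:type, :children)

# Same depth-first walk as find_type_elements.
def find_elements(type, nested = true, elements = [])
  results = []
  type = [type] if type.instance_of?(Symbol)
  elements.each do |e|
    results.push(e) if type.include?(e.type)
    if nested && !e.children.empty?
      results.concat(find_elements(type, nested, e.children))
    end
  end
  results
end

# A list nested inside a list item, after a header.
inner = El.new(:ul, [El.new(:li, [])])
outer = El.new(:ul, [El.new(:li, [inner])])
doc = [El.new(:header, []), outer]

p find_elements(:ul, true, doc).length           # => 2
p find_elements(:ul, false, doc).length          # => 1
p find_elements(%i[header li], true, doc).length # => 3
```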

#find_type_elements_except(type, nested_except = [], elements = @elements) ⇒ `Object`

A variation on find_type_elements that allows you to skip drilling down into children of specific element types.

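Instead of a single type, you can pass a list of types to find elements of any of those types.

Unlike find_type_elements, this method always searches nested elements, but it does not descend into the children of elements whose type is listed in nested_except. Such elements can still match themselves if their type is in type.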

# File 'lib/mdl/doc.rb', line 103

def find_type_elements_except(
  type, nested_except = [], elements = @elements
)
  results = []
  type = [type] if type.instance_of?(Symbol)
  nested_except = [nested_except] if nested_except.instance_of?(Symbol)
  elements.each do |e|
    results.push(e) if type.include?(e.type)
    next if nested_except.include?(e.type) || e.children.empty?

    results.concat(
      find_type_elements_except(type, nested_except, e.children),
    )
  end
  results
end
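The difference between skipping an element's children and skipping the element itself shows up with a link inside a paragraph. `Item` is a hypothetical stand-in for Kramdown::Element and `find_elements_except` is a hypothetical copy of find_type_elements_except with an explicit element list.

```ruby
# Hypothetical stand-in for Kramdown::Element: a type plus child elements.
Item = Struct.new(:type, :children)

# Same walk as find_type_elements_except: always recurse, except into the
# children of elements whose type is in nested_except.
def find_elements_except(type, nested_except = [], elements = [])
  results = []
  type = [type] if type.instance_of?(Symbol)
  nested_except = [nested_except] if nested_except.instance_of?(Symbol)
  elements.each do |e|
    results.push(e) if type.include?(e.type)
    next if nested_except.include?(e.type) || e.children.empty?

    results.concat(find_elements_except(type, nested_except, e.children))
  end
  results
end

link = Item.new(:a, [Item.new(:text, [])])
para = Item.new(:p, [Item.new(:text, []), link])

p find_elements_except(:text, [], [para]).length      # => 2
p find_elements_except(:text, :a, [para]).length      # => 1
p find_elements_except(%i[text a], :a, [para]).length # => 2
```

The last call shows that the link itself is still returned; only the text inside it is skipped.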

#header_style(header) ⇒ `Object`

Returns the header ‘style’: :atx (hashes at the beginning of the line), :atx_closed (atx style, but with hashes at the end of the line as well) or :setext (underlined). You need to pass in the header element itself rather than an options hash, because the method calls header.type and raises an error for non-header elements.

# File 'lib/mdl/doc.rb', line 161

def header_style(header)
  if header.type != :header
    raise 'header_style called with non-header element'
  end

  line = element_line(header)
  if line.start_with?('#')
    if line.strip.end_with?('#')
      :atx_closed
    else
      :atx
    end
  else
    :setext
  end
end
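The classification itself only looks at the raw source line. `header_style_for_line` is a hypothetical helper that applies the same checks to a string, skipping the element lookup done by the real method. For a setext header, the element's location points at the text line, not the underline.

```ruby
# Hypothetical helper: the classification header_style applies to the raw
# source line (the real method looks that line up from a :header element).
def header_style_for_line(line)
  if line.start_with?('#')
    line.strip.end_with?('#') ? :atx_closed : :atx
  else
    :setext
  end
end

p header_style_for_line('## Usage')    # => :atx
p header_style_for_line('## Usage ##') # => :atx_closed
p header_style_for_line('Usage')       # => :setext
```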

#indent_for(line) ⇒ `Object`

Returns how much a given line is indented. Each hard tab counts as 8 spaces, regardless of the column it appears in. You need to pass in the raw string here.

# File 'lib/mdl/doc.rb', line 206

def indent_for(line)
  line.match(/^\s*/)[0].gsub("\t", ' ' * 8).length
end
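Because indent_for depends only on its string argument, it can be copied out and tried directly. The third call shows that tabs are not expanded to tab stops: a space followed by a tab counts as 9, not 8.

```ruby
# Copy of indent_for: measure the leading whitespace, counting each tab as 8.
def indent_for(line)
  line.match(/^\s*/)[0].gsub("\t", ' ' * 8).length
end

p indent_for('    code') # => 4
p indent_for("\t  x")    # => 10
p indent_for(" \tx")     # => 9
p indent_for('flush')    # => 0
```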

#list_style(item) ⇒ `Object`

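Returns the list style for a list item: :asterisk, :plus, :dash, :ordered or :ordered_paren, depending on which symbol is used to denote the item, or :unknown if none of them matches. A leading blockquote marker is ignored. You need to pass in the :li element itself rather than an options hash, because the method calls item.type and raises an error for non-list elements.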

# File 'lib/mdl/doc.rb', line 183

def list_style(item)
  raise 'list_style called with non-list element' if item.type != :li

  line = element_line(item).strip.gsub(/^>\s+/, '')
  if line.start_with?('*')
    :asterisk
  elsif line.start_with?('+')
    :plus
  elsif line.start_with?('-')
    :dash
  elsif line.match('[0-9]+\.')
    :ordered
  elsif line.match('[0-9]+\)')
    :ordered_paren
  else
    :unknown
  end
end
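The marker detection can be tried on raw lines. `list_style_for_line` is a hypothetical helper that applies the same checks as list_style to a string, without the element lookup; the second example shows the blockquote marker being stripped first.

```ruby
# Hypothetical helper: list_style's classification of a raw list-item line,
# including the stripping of a leading blockquote marker.
def list_style_for_line(raw_line)
  line = raw_line.strip.gsub(/^>\s+/, '')
  if line.start_with?('*')
    :asterisk
  elsif line.start_with?('+')
    :plus
  elsif line.start_with?('-')
    :dash
  elsif line.match('[0-9]+\.')
    :ordered
  elsif line.match('[0-9]+\)')
    :ordered_paren
  else
    :unknown
  end
end

p list_style_for_line('  * apples') # => :asterisk
p list_style_for_line('> - quoted') # => :dash
p list_style_for_line('1. first')   # => :ordered
p list_style_for_line('2) second')  # => :ordered_paren
```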

#matching_lines(regex) ⇒ `Object`

Returns the 1-indexed line numbers of all source lines that match the given regular expression.

# File 'lib/mdl/doc.rb', line 213

def matching_lines(regex)
  @lines.each_with_index.select { |text, _linenum| regex.match(text) }
        .map do |i|
          i[1] + 1
        end
end
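The same pipeline can be run over any array of strings. `matching_line_numbers` is a hypothetical standalone version that takes the lines as an argument instead of reading @lines; the `+ 1` converts the 0-based index into a 1-indexed line number.

```ruby
# Hypothetical standalone version of matching_lines.
def matching_line_numbers(lines, regex)
  lines.each_with_index
       .select { |text, _index| regex.match(text) }
       .map { |_text, index| index + 1 }
end

source_lines = ['# Title', '', 'TODO: write intro', 'Body', 'todo later']

p matching_line_numbers(source_lines, /TODO/)  # => [3]
p matching_line_numbers(source_lines, /todo/i) # => [3, 5]
```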

#matching_text_element_lines(regex, exclude_nested = [:a]) ⇒ `Object`

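Returns line numbers for lines that match the given regular expression. Only considers text inside ‘text’ elements (i.e. regular markdown text, not code, links or other elements). Text nested inside the element types listed in exclude_nested (links, by default) is skipped, as are text elements that kramdown has no location information for.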

# File 'lib/mdl/doc.rb', line 224

def matching_text_element_lines(regex, exclude_nested = [:a])
  matches = []
  find_type_elements_except(:text, exclude_nested).each do |e|
    first_line = e.options[:location]
    # We'll error out if kramdown doesn't have location information for
    # the current element. It's better to just not match in these cases
    # rather than crash.
    next if first_line.nil?

    lines = e.value.split("\n")
    lines.each_with_index do |l, i|
      matches << (first_line + i) if regex.match(l)
    end
  end
  matches
end
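The line arithmetic for multi-line text elements works as follows: a match on the i-th line of an element's value is reported as its `:location` plus i. In this sketch, `TextEl` is a hypothetical stand-in for a kramdown :text element, and `matching_text_lines` is a hypothetical helper that takes an already-collected list of text elements instead of calling find_type_elements_except.

```ruby
# Hypothetical stand-in for a kramdown :text element.
TextEl = Struct.new(:value, :options)

# Same inner loop as matching_text_element_lines.
def matching_text_lines(text_elements, regex)
  matches = []
  text_elements.each do |e|
    first_line = e.options[:location]
    # Elements without location information are skipped, not reported.
    next if first_line.nil?

    e.value.split("\n").each_with_index do |l, i|
      matches << (first_line + i) if regex.match(l)
    end
  end
  matches
end

texts = [
  TextEl.new("First line\nsecond TODO line", { :location => 3 }),
  TextEl.new('TODO without location', {}),
  TextEl.new('TODO at the end', { :location => 10 })
]

p matching_text_lines(texts, /TODO/) # => [4, 10]
```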
