Class: Kramdown::Parser::Latexish

Inherits:
Kramdown
  • Object
Includes:
Latexish::Bibliographical
Defined in:
lib/kramdown/latexish.rb

Overview

An extension of the Kramdown parser aimed at mathematical articles.

Because of the way the kramdown library is structured, the parser class must live in the module `Kramdown::Parser`, so that the option `:input => "Latexish"` can be passed to `Kramdown::Document` to make it use that parser.

Constant Summary

THEOREM_LIKE_TAGS =

Tags we support for theorem-like environments

[:definition, :postulate, :property, :lemma,
:theorem, :corollary]
SPECIAL_TAGS =

All our special tags defined above

THEOREM_LIKE_TAGS + [:section]
LATEX_INLINE_MATH_RX =

Parse `$…$` spans that do not form a block. We do not need to start the regex with `(?<!\$)` because the scanner is already positioned at the first `$` it encounters.

/\$ (?!\$) (.*?) (?<!\$) \$ (?!\$) /xm
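As a rough illustration of why no leading lookbehind is needed, the sketch below anchors the match at the position of the first `$`. It uses the standard library's `StringScanner` as a stand-in for Kramdown's own scanner, which is an assumption, and the helper name `inline_math_at` is hypothetical.

```ruby
require 'strscan'

# Copied from the constant documented above.
LATEX_INLINE_MATH_RX = /\$ (?!\$) (.*?) (?<!\$) \$ (?!\$) /xm

# Hypothetical helper: returns the math source if an inline $...$ span
# starts exactly at byte offset `pos`, and nil otherwise.
def inline_math_at(text, pos)
  scanner = StringScanner.new(text)
  scanner.pos = pos
  scanner.scan(LATEX_INLINE_MATH_RX) && scanner[1]
end

inline = 'Let $x^2 + 1$ be given.'
block  = 'Consider $$x^2 + 1$$ instead.'

p inline_math_at(inline, inline.index('$')) # "x^2 + 1"
p inline_math_at(block, block.index('$'))   # nil, since "$$" opens a block
```

Both sample strings are plain ASCII, so the character index returned by `String#index` can be used directly as the scanner's byte position.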
SPECIAL_REF_RX =

Regex for reference links to sections, theorem-like environments and equations

/\[ \s* (C|c)ref : \s* ( [^\]]+ ) \s* \]/x
BIB_CITE_RX =

Regex for bibliographic citations

/ \[ \s* cite(p|t) : \s+ ( [^\]]+ ) \s* \]/x
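The two link regexes differ in one small way: `SPECIAL_REF_RX` accepts zero or more whitespace characters after the colon (`\s*`), whereas `BIB_CITE_RX` requires at least one (`\s+`). A minimal sketch of what each one captures follows; the ids and BibTeX keys are made up, and `captures_of` is a hypothetical helper.

```ruby
# Copied from the constants documented above.
SPECIAL_REF_RX = /\[ \s* (C|c)ref : \s* ( [^\]]+ ) \s* \]/x
BIB_CITE_RX = / \[ \s* cite(p|t) : \s+ ( [^\]]+ ) \s* \]/x

# Hypothetical helper: the capture groups of the first match, or nil.
def captures_of(regex, text)
  match = regex.match(text)
  match && match.captures
end

p captures_of(SPECIAL_REF_RX, 'See [Cref: thm-main] above.') # ["C", "thm-main"]
p captures_of(SPECIAL_REF_RX, '[cref:sec-intro]')             # ["c", "sec-intro"]
p captures_of(BIB_CITE_RX, '[citep: knuth84, lamport94]')     # ["p", "knuth84, lamport94"]
p captures_of(BIB_CITE_RX, '[citet:knuth84]')                 # nil, no space after ":"
```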

Class Method Summary

Instance Method Summary

Methods included from Latexish::Bibliographical

#citation_for, #clean_bibtex

Constructor Details

#initialize(source, options) ⇒ Latexish

Initialise the parser

This supports the following options in addition to those supported by the base class:

:language

A symbol identifying the language. Currently supported are :english
and :french (default :english)

:theorem_header_level

A theorem-like environment starts with a header: this option is the level
of that header (default 5)

:auto_number_headers

Whether to automatically number headers (default true)

:no_number

A list of header titles that should not be automatically numbered
(default `[reference_section_name]`, i.e. the localised title of the
reference section)

:bibliography

A `BibTeX::Bibliography` object containing the references to appear
at the end of the document, and which may be cited in the rest of it.
(default nil)

:bibliography_style

A symbol designating the CSL style used to format the reference section.
A complete list can be found
[here](https://github.com/citation-style-language/styles),
where the basename without the extension is the symbol to be passed
(default :apa, for the APA style).

:latex_macros

A list of LaTeX macros that all equations in the document shall be able
to use. To do so they are put in a math block at the beginning of the
document.
(default [])

:hide_latex_macros?

Whether the math block containing the LaTeX macros is completely hidden
when converted to HTML
(default true)
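A hedged sketch of how these options might be gathered in one hash. Only the option names come from the list above; every value is illustrative, and the constant name `LATEXISH_OPTIONS` is hypothetical.

```ruby
# Illustrative values only; the keys are the options documented above.
LATEXISH_OPTIONS = {
  :input => 'Latexish',
  :language => :french,
  :theorem_header_level => 4,
  :auto_number_headers => true,
  :bibliography_style => :apa,
  :latex_macros => ['\newcommand{\R}{\mathbb{R}}'],
  :hide_latex_macros? => true
}.freeze

# With the gems installed, the hash would be passed along these lines:
#   Kramdown::Document.new(source, LATEXISH_OPTIONS).to_html
```

The `:bibliography` option is left out because it needs a `BibTeX::Bibliography` object built elsewhere.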


# File 'lib/kramdown/latexish.rb', line 66

def initialize(source, options)
  super

  # Initialise language and lexical delegate
  @lex = Kramdown::Latexish::Lexical.new(@options[:language] ||= :english)

  # Initialise the rest of our custom options
  @options[:theorem_header_level] ||= 5
  @options[:auto_number_headers] = true if @options[:auto_number_headers].nil?
  @options[:no_number] ||= [reference_section_name]
  @options[:bibliography_style] ||= :apa
  @options[:latex_macros] ||= []
  @options[:hide_latex_macros?] = true if @options[:hide_latex_macros?].nil?

  # Add our new parsers
  @span_parsers.unshift(:latex_inline_math)

  # For parsing theorem environments
  rx = THEOREM_LIKE_TAGS
       .map{|tag| @lex.localise(tag)}
       .map(&:capitalize)
       .join('|')
  rx = rx + '|' + @lex.localise(:abstract).capitalize
  @environment_start_rx = / \A (#{rx}) (?: [ \t] ( \( .+? \) ) )? \s*? \Z /xm
  @environment_end_rx =  / \A \\ (#{rx}) \s*? \Z /xm

  # Last encountered theorem header
  @th = nil

  # For assigning a number to each header
  @next_section_number = []
  @last_header_level = 0

  # For tracking references to our special constructs
  @number_for = {}
  @category_for = {}

  # For numbering theorem-like environments
  @next_theorem_like_number = Hash[THEOREM_LIKE_TAGS.map{|tag| [tag, 0]}]

  # Bibtex keys found in citations
  @cited_bibkeys = Set[]
end
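The constructor uses two different defaulting idioms, and the difference matters for boolean options: `||=` would overwrite an explicit `false`, so the boolean flags are only defaulted when they are `nil`. A minimal sketch with a plain hash follows; `with_defaults` is a hypothetical stand-in, not part of the class.

```ruby
# Hypothetical stand-in for the option defaulting done in #initialize.
def with_defaults(options)
  opts = options.dup
  # `||=` is fine when nil and false both mean "not set"
  opts[:theorem_header_level] ||= 5
  opts[:latex_macros] ||= []
  # An explicit false must survive, so only nil triggers the default
  opts[:auto_number_headers] = true if opts[:auto_number_headers].nil?
  opts[:hide_latex_macros?] = true if opts[:hide_latex_macros?].nil?
  opts
end

p with_defaults({})[:auto_number_headers]                             # true
p with_defaults({ auto_number_headers: false })[:auto_number_headers] # false
```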

Class Method Details

.redefine_parser(name, start_re, span_start = nil, meth_name = "parse_#{name}") ⇒ Object

Redefine a parser previously added with `define_parser`



# File 'lib/kramdown/latexish.rb', line 123

def self.redefine_parser(name, start_re, span_start = nil,
                         meth_name = "parse_#{name}")
  @@parsers.delete(name)
  define_parser(name, start_re, span_start, meth_name)
end
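The delete-then-define pattern can be illustrated with a toy registry. The sketch assumes that `define_parser` refuses to register a name twice, which is why the old entry is removed first. `ToyParser` and its methods are hypothetical and only mimic the shape of the real class.

```ruby
# Toy registry, not Kramdown's actual implementation.
class ToyParser
  @@parsers = {}

  def self.define_parser(name, start_re)
    raise "A parser with the name #{name} already exists!" if @@parsers.key?(name)

    @@parsers[name] = start_re
  end

  # Same idea as redefine_parser above: drop the old entry, then define anew.
  def self.redefine_parser(name, start_re)
    @@parsers.delete(name)
    define_parser(name, start_re)
  end

  def self.start_re(name)
    @@parsers[name]
  end
end

ToyParser.define_parser(:link, /\[/)
ToyParser.redefine_parser(:link, /!?\[/)
p ToyParser.start_re(:link) # /!?\[/
```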

Instance Method Details

#add_abstract(tag, start_idx, start_label, start_loc, end_loc) ⇒ Object

Add an abstract (internal helper method)



# File 'lib/kramdown/latexish.rb', line 224

def add_abstract(tag, start_idx, start_label,
                 start_loc, end_loc)
  els = @tree.children
  header = els[start_idx]

  # Merge header ial's with .abstract
  ial = header.options[:ial] || {}
  update_ial_with_ial(ial, {'class' => 'abstract'})

  # Create a <div> for the abstract
  el = new_block_el(:html_element, 'div', ial,
                    :category => :block, :content_model => :block)

  # Add all the other elements processed after the header paragraph
  el.children += els[start_idx + 1 .. -2]

  # Replace all the elements processed since the header paragraph
  # by our div
  els[start_idx ..] = el
end
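The two slice operations at the end of this method are easier to follow on a plain array. In the sketch below, strings stand in for parsed elements and `Wrapper` is a hypothetical substitute for the `<div>` element; the last array item plays the role of the end-marker paragraph.

```ruby
# Hypothetical substitute for the element built by new_block_el.
class Wrapper
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end
end

def collapse_environment(els, start_idx, name)
  wrapper = Wrapper.new(name)
  # Everything after the start marker, minus the end marker (last element)
  wrapper.children.concat(els[start_idx + 1..-2])
  # Replace the start marker and all that follows with the single wrapper
  els[start_idx..] = wrapper
  els
end

els = ['intro', 'Abstract', 'first paragraph', 'second paragraph', '\Abstract']
collapse_environment(els, 1, 'div.abstract')
p els.size          # 2
p els[1].children   # ["first paragraph", "second paragraph"]
```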

#add_header(level, text, id) ⇒ Object

Auto-numbering of headers

We override this method so that it works with both setext and atx headers out of the box.



# File 'lib/kramdown/latexish.rb', line 249

def add_header(level, text, id)
  # Only h2, h3, … as h1 is for title
  lvl = level - 1
  if lvl > 0
    if @options[:auto_number_headers] && !@options[:no_number].include?(text)
      # Compute the number a la 2.1.3
      if lvl == @last_header_level
        @next_section_number[-1] += 1
      elsif lvl > @last_header_level
        ones = [1]*(lvl - @last_header_level)
        @next_section_number.push(*ones)
      else
        @next_section_number.pop(@last_header_level - lvl)
        @next_section_number[-1] += 1
      end
      @last_header_level = lvl
      nb = @next_section_number.join('.')

      # Prepend it to header text, removing a leading number if any
      text.gsub!(/^\s*[\d.]*\s*/, '')
      text = "#{nb} #{text}"

      # If it has an id, keep track of the association with its number
      if id
        @number_for[id] = nb
        @category_for[id] = :section
      end
    end
  end

  # Let Kramdown handle it now
  super(level, text, id)
end
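The numbering logic can be isolated from the parser. The sketch below reimplements the same three cases (same level, deeper, shallower) in a hypothetical `SectionNumberer` class. Levels here are already shifted by one, as in the method above, so `1` corresponds to an h2. `strip_leading_number` is likewise a hypothetical helper around the same regex.

```ruby
# Hypothetical stand-alone version of the counter kept in
# @next_section_number and @last_header_level.
class SectionNumberer
  def initialize
    @numbers = []
    @last_level = 0
  end

  def next_number(level)
    if level == @last_level
      @numbers[-1] += 1                       # sibling section
    elsif level > @last_level
      @numbers.push(*([1] * (level - @last_level))) # going deeper
    else
      @numbers.pop(@last_level - level)       # back up, then next sibling
      @numbers[-1] += 1
    end
    @last_level = level
    @numbers.join('.')
  end
end

# Removes a hand-written number such as "2." from the start of a title.
def strip_leading_number(text)
  text.gsub(/^\s*[\d.]*\s*/, '')
end

numberer = SectionNumberer.new
p [1, 2, 2, 1, 2].map { |level| numberer.next_number(level) }
# ["1", "1.1", "1.2", "2", "2.1"]
p strip_leading_number('2. Methods') # "Methods"
```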

#add_theorem_like(tag, start_idx, start_label, start_loc, end_loc) ⇒ Object

Add a theorem-like environment (internal helper method)



# File 'lib/kramdown/latexish.rb', line 185

def add_theorem_like(tag, start_idx, start_label,
                     start_loc, end_loc)
  category = @lex.symbolise(tag)
  els = @tree.children
  header = els[start_idx]

  # Merge header ial's with .theorem-like
  ial = header.options[:ial] || {}
  update_ial_with_ial(ial, {'class' => 'theorem-like'})

  # Increment number
  nb = @next_theorem_like_number[category] += 1

  # Process id
  unless (id = ial['id']).nil?
    @number_for[id] = nb
    @category_for[id] = category
  end

  # Create a <section> for the theorem with those ial's
  el = new_block_el(:html_element, 'section', ial,
                    :category => :block, :content_model => :block)

  # Create header and add it in the section
  elh = new_block_el(:header, nil, nil,
                     :level => @options[:theorem_header_level])
  # We can add Kramdown here as this is yet to be seen by the span parsers
  add_text("**#{tag} #{nb}** #{start_label}".rstrip, elh)
  el.children << elh

  # Add all the other elements processed after the header paragraph
  el.children += els[start_idx + 1 .. -2]

  # Replace all the elements processed since the header paragraph
  # by our section
  els[start_idx ..] = el
end
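Each theorem-like category keeps its own counter, so lemmas and theorems are numbered independently. A small sketch of that bookkeeping and of the header text follows. It assumes that downcasing the tag is an adequate stand-in for `@lex.symbolise`, which may not hold for other languages, and `theorem_header` is a hypothetical helper.

```ruby
THEOREM_LIKE_TAGS = [:definition, :postulate, :property, :lemma,
                     :theorem, :corollary].freeze

# One counter per category, all starting at zero.
def new_counters
  Hash[THEOREM_LIKE_TAGS.map { |tag| [tag, 0] }]
end

# Hypothetical helper building the Kramdown source of the header.
def theorem_header(counters, tag, label = nil)
  category = tag.downcase.to_sym # assumption: stand-in for @lex.symbolise
  nb = counters[category] += 1
  "**#{tag} #{nb}** #{label}".rstrip
end

counters = new_counters
puts theorem_header(counters, 'Theorem', '(Pythagoras)') # **Theorem 1** (Pythagoras)
puts theorem_header(counters, 'Lemma')                   # **Lemma 1**
puts theorem_header(counters, 'Theorem')                 # **Theorem 2**
```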

#bibliography ⇒ Object



# File 'lib/kramdown/latexish.rb', line 114

def bibliography
  @options[:bibliography]
end


#handle_bibliographic_citation_link ⇒ Object

# File 'lib/kramdown/latexish.rb', line 370

def handle_bibliographic_citation_link
  return false if bibliography.nil?
  loc = @src.current_line_number
  style = @src[1] == 'p' ? :parenthetical : :textual
  bibkeys = @src[2].split(/\s*,\s*/)
  unless bibkeys.empty?
    # Array of Element's for each key
    elements = bibkeys.map do |key|
      et_al = false
      if key[0] == '*'
        et_al = true
        key = key[1..]
      end
      # Keep track of the keys that have been cited
      @cited_bibkeys << key if bibliography.key?(key)

      el = Element.new(:a, nil, nil, location: loc)
      el.attr['href'] = "##{key}"
      el.children << Element.new(:text,
                                 citation_for(key, style, et_al, loc),
                                 nil,
                                 location: loc)
      el
    end
    # Then we put them together with commas and the word "and"
    conjonction = @lex.and(elements, joined: false) do |word|
      Element.new(:text, word, nil, location: loc)
    end
    # Then output that array of Element's
    @tree.children += conjonction
    # Done
    true
  else
    warning("Empty bibliographic citation at line #{loc}")
    false
  end
end
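The way the two capture groups of `BIB_CITE_RX` are interpreted can be sketched without a bibliography: the letter after `cite` selects the style, and a leading `*` on a key turns on the `et_al` flag. `parse_citation` is a hypothetical helper and the keys are made up.

```ruby
# Hypothetical helper mirroring how the method above reads its captures.
def parse_citation(style_letter, raw_keys)
  style = style_letter == 'p' ? :parenthetical : :textual
  keys = raw_keys.split(/\s*,\s*/).map do |key|
    et_al = key.start_with?('*')
    # Drop the "*" marker from the key itself
    { key: et_al ? key[1..] : key, et_al: et_al }
  end
  [style, keys]
end

style, keys = parse_citation('p', 'knuth84, *lamport94')
p style # :parenthetical
p keys.map { |k| k[:key] }   # ["knuth84", "lamport94"]
p keys.map { |k| k[:et_al] } # [false, true]
```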


#handle_special_ref_link ⇒ Object

# File 'lib/kramdown/latexish.rb', line 308

def handle_special_ref_link
  loc = @src.current_line_number
  capital = @src[1] == 'C'
  if @src[2].nil?
    warning("No reference specified at line #{loc}")
    @tree.children << Element.new(:text, @src[0], nil, location: loc)
  else
    # Group the keys by header category
    ids_for = {}
    @src[2].split(/\s*,\s*/).map do |id|
      (ids_for[@category_for[id] || :undefined] ||= []) << id
    end
    # For each category, and each ids of that category...
    ref_chunks = ids_for.each_with_index.map do |(category, ids), i|
      # Generate the reference for each id
      nums = ids.map do |id|
        case category
        when :undefined
          warning("No element with id '#{id}' at line #{loc}")
          el = Element.new(:text, "¿#{id}?", nil, location: loc)
        when :eqn
          # Referencing equation shall be delegated to Mathjax by client code
          el = Element.new(:text, "\\eqref{#{id}}", nil, location: loc)
        else
          nb = @number_for[id]
          el = Element.new(:a, nil, nil, location: loc)
          el.attr['href'] = "##{id}"
          el.attr['title'] = "#{@lex.localise(category).capitalize} #{nb}"
          el.children << Element.new(:text, nb.to_s, nil, location: loc)
        end
        el
      end
      # Join all the references and put the title in front
      # We don't want "and" to be separated from the following link
      refs = @lex.and(nums, joined: false, nbsp: true) {|word|
        Element.new(:text, word, nil, location: loc)
      }
      if category != :undefined
        form = ids.size == 1 ? :singular : :plural
        label = @lex.localise(category, form)
        label = label.capitalize if capital and i == 0
        label = Element.new(:text, label + '&nbsp;', nil, location: loc)
        [label] + refs
      else
        refs
      end
    end
    # Conjunct again and append all that to the tree
    # This time "and" should get separated from the following label so as
    # not to stress the layout engine when it wraps lines
    references = @lex.and(ref_chunks, joined:false) {|word|
      [Element.new(:text, word, nil, location: loc)]
    }
    .flatten(1)
    @tree.children += references
  end
  true
end
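The grouping and labelling steps can be sketched on plain strings. Everything below is an approximation: `join_with_and` is a hypothetical stand-in for `@lex.and` whose exact output format is assumed, the `LABELS` table replaces `@lex.localise`, and the `:eqn` branch, which emits `\eqref{id}` in the method above, is left out.

```ruby
# Hypothetical stand-in for @lex.and: commas, then "and" before the last item.
def join_with_and(items, word: 'and')
  return items.join if items.size < 2

  "#{items[0..-2].join(', ')} #{word} #{items[-1]}"
end

# Illustrative singular and plural labels.
LABELS = { section: %w[section sections], theorem: %w[theorem theorems] }.freeze

def describe_references(raw_ids, category_for, number_for, capital: false)
  # Group ids by category, keeping the order of first appearance
  groups = raw_ids.split(/\s*,\s*/).group_by { |id| category_for.fetch(id, :undefined) }
  chunks = groups.each_with_index.map do |(category, ids), i|
    refs = ids.map { |id| category == :undefined ? "¿#{id}?" : number_for[id].to_s }
    joined = join_with_and(refs)
    next joined if category == :undefined

    label = LABELS.fetch(category)[ids.size == 1 ? 0 : 1]
    label = label.capitalize if capital && i.zero?
    "#{label} #{joined}"
  end
  join_with_and(chunks)
end

CATEGORY_FOR = { 'thm-main' => :theorem, 'thm-aux' => :theorem, 'sec-intro' => :section }.freeze
NUMBER_FOR = { 'thm-main' => 1, 'thm-aux' => 2, 'sec-intro' => '2.1' }.freeze

puts describe_references('thm-main, sec-intro, thm-aux', CATEGORY_FOR, NUMBER_FOR, capital: true)
# Theorems 1 and 2 and section 2.1
```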

#language ⇒ Object



# File 'lib/kramdown/latexish.rb', line 110

def language
  @options[:language]
end

#parse ⇒ Object

Override `parse` to also produce the LaTeX macros block and the reference section



# File 'lib/kramdown/latexish.rb', line 409

def parse
  super
  produce_latex_macros
  produce_reference_section
end

#parse_block_math ⇒ Object

Override the parsing of block math to gather the `\label`s it contains



# File 'lib/kramdown/latexish.rb', line 416

def parse_block_math
  result = super
  @tree.children.last.value.scan(/\\label\s*\{(.*?)\}/) do
    @category_for[$~[1]] = :eqn
  end
  result
end
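The label extraction relies on `String#scan` with a single capture group. A minimal sketch on a made-up equation block follows; `equation_labels` is a hypothetical helper that returns the labels instead of recording them in `@category_for`.

```ruby
# Hypothetical helper: every \label{...} found in a block of math source.
def equation_labels(math_source)
  math_source.scan(/\\label\s*\{(.*?)\}/).flatten
end

math = <<~'TEX'
  \begin{align}
    a &= b \label{eq:first} \\
    c &= d \label {eq:second}
  \end{align}
TEX

p equation_labels(math) # ["eq:first", "eq:second"]
```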

#parse_latex_inline_math ⇒ Object



# File 'lib/kramdown/latexish.rb', line 133

def parse_latex_inline_math
  parse_inline_math
end

#parse_link ⇒ Object

Parse reference links to sections

We override `parse_link` to check whether we have one of our special reference links or one of our bibliographic citations. If so, we process it; otherwise we let `super` handle it. Kramdown calls this method only once it is ready to handle links, so we can assume that all ids, and therefore all their associated numbers, are known by then.



# File 'lib/kramdown/latexish.rb', line 288

def parse_link
  start_pos = @src.save_pos
  parsed = false
  # Nothing to do if it is an image link
  if @src.peek(1) != '!'
    if @src.scan(SPECIAL_REF_RX)
      parsed = handle_special_ref_link
    elsif @src.scan(BIB_CITE_RX)
      parsed = handle_bibliographic_citation_link
    end
  end
  unless parsed
    @src.revert_pos(start_pos)
    super
  end
end
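The try-then-fall-back structure can be sketched with the standard library's `StringScanner`, used here as an assumed stand-in for Kramdown's scanner: `pos` and `pos=` play the role of `save_pos` and `revert_pos`. The helper `classify_link` is hypothetical and returns a tag instead of building elements.

```ruby
require 'strscan'

SPECIAL_REF_RX = /\[ \s* (C|c)ref : \s* ( [^\]]+ ) \s* \]/x
BIB_CITE_RX = / \[ \s* cite(p|t) : \s+ ( [^\]]+ ) \s* \]/x

# Hypothetical helper: which handler would take the link at the scanner?
def classify_link(scanner)
  start_pos = scanner.pos
  # Image links start with "!" and are always left to the default handler
  if scanner.peek(1) != '!'
    return [:special_ref, scanner[2]] if scanner.scan(SPECIAL_REF_RX)
    return [:citation, scanner[2]] if scanner.scan(BIB_CITE_RX)
  end
  # Nothing special: rewind so the default link parser sees the whole link
  scanner.pos = start_pos
  [:default, nil]
end

p classify_link(StringScanner.new('[cref: thm-main]'))           # [:special_ref, "thm-main"]
p classify_link(StringScanner.new('[citet: knuth84]'))           # [:citation, "knuth84"]
p classify_link(StringScanner.new('[text](http://example.com)')) # [:default, nil]
```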

#parse_paragraph ⇒ Object

Parsing of environments

We override the parsing of paragraphs by detecting the start and end markers of an environment, then reshuffling the elements parsed by `super`.



# File 'lib/kramdown/latexish.rb', line 142

def parse_paragraph
  return false unless super

  # We do indeed have a paragraph: we will return true in any case
  # but we may do some processing beforehand if we find one of our
  # environments
  els = @tree.children
  case els.last.children[0].value
  when @environment_start_rx
    # We have an environment header: keep necessary info
    @th = [els.size - 1, $1, $2, @src.current_line_number]
  when @environment_end_rx
    # We have an end tag: do we have a starting one?
    end_tag = $1
    end_loc = @src.current_line_number
    unless @th
      warning(
        "`\\#{end_tag}` on line #{end_loc} without " \
        "any `#{end_tag}` earlier on")
    else
      # We have a beginning tag: does it match the end tag?
      start_idx, start_tag, start_label, start_loc = @th
      unless end_tag == start_tag
        warning("\\#{end_tag} on line #{end_loc} does not match " \
                "#{start_tag} on line #{start_loc}")
      else
        # We have a valid environment: discriminate
        if @lex.symbolise(start_tag) == :abstract
          add_abstract(start_tag, start_idx, start_label,
                       start_loc, end_loc)
        else
          add_theorem_like(start_tag, start_idx, start_label,
                           start_loc, end_loc)
        end
        # Prepare for a new paragraph
        @th = nil
      end
    end
  end
  true
end
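The marker detection can be tried on a list of paragraph strings. The sketch below builds the two regexes the same way as the constructor, with hard-coded English names in place of the localised ones (an assumption), and pairs start and end markers. Unlike the method above, the hypothetical `find_environments` silently skips unmatched markers instead of emitting warnings.

```ruby
# Assumed English names; the real list is built from @lex.localise.
ENV_NAMES = %w[Definition Postulate Property Lemma Theorem Corollary Abstract].join('|')
ENV_START_RX = / \A (#{ENV_NAMES}) (?: [ \t] ( \( .+? \) ) )? \s*? \Z /xm
ENV_END_RX = / \A \\ (#{ENV_NAMES}) \s*? \Z /xm

# Hypothetical helper: returns one hash per well-formed environment.
def find_environments(paragraphs)
  open_env = nil
  found = []
  paragraphs.each_with_index do |text, idx|
    if (m = ENV_START_RX.match(text))
      # Remember where the environment starts, its tag and optional label
      open_env = [idx, m[1], m[2]]
    elsif (m = ENV_END_RX.match(text)) && open_env
      start_idx, tag, label = open_env
      next unless m[1] == tag

      found << { tag: tag, label: label, range: start_idx..idx }
      open_env = nil
    end
  end
  found
end

paragraphs = ['Intro.', 'Theorem (Pythagoras)', 'Body.', '\Theorem', 'Closing.']
p find_environments(paragraphs)
```

For the sample input this reports a single `Theorem` environment labelled `(Pythagoras)` spanning paragraphs 1 to 3.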

#produce_latex_macros ⇒ Object

Produce math block with LaTeX macros



# File 'lib/kramdown/latexish.rb', line 453

def produce_latex_macros
  macros = @options[:latex_macros]
  unless macros.empty?
    opts = {
      :style => "display:#{@options[:hide_latex_macros?] ? 'none' : 'block'}"
    }
    el = Element.new(
      :html_element, 'div', opts,
      category: :block, content_model: :block)
    macros = (['\text{\LaTeX Macros:}'] + macros).join("\n")
    el.children << Element.new(:math, macros, nil, category: :block)
    # TODO: fix line numbers
    @root.children.prepend(el)
  end
end
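What ends up in the prepended block can be sketched as data. The hypothetical `macro_block` below returns the inline style and the math source as a hash rather than building Kramdown elements; the sample macro is made up.

```ruby
# Hypothetical helper: the style and math source of the macro block,
# or nil when there are no macros to emit.
def macro_block(macros, hide: true)
  return nil if macros.empty?

  {
    style: "display:#{hide ? 'none' : 'block'}",
    math: (['\text{\LaTeX Macros:}'] + macros).join("\n")
  }
end

block = macro_block(['\newcommand{\R}{\mathbb{R}}'])
puts block[:style] # display:none
puts block[:math]
```

Single-quoted strings keep the backslashes of the LaTeX source intact, which is why the macros are written that way.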

#produce_reference_section ⇒ Object

Produce the section containing the bibliographic references at the end of the document



# File 'lib/kramdown/latexish.rb', line 426

def produce_reference_section
  unless @cited_bibkeys.empty?
    cp = CiteProc::Processor.new(style: @options[:bibliography_style],
                                 format: 'html')
    cp.import(bibliography.to_citeproc)
    references = @cited_bibkeys.map {|key|
      html = cp.render(:bibliography, id: key)[0]
      html = clean_bibtex(html)
      html += "\n{: .bibliography-item ##{key}}"
    }
    .join("\n\n")
    biblio = <<~"MD"

      ## #{reference_section_name}

      #{references}
    MD
    # Since we monkey-patched it, this will use this parser
    # and not the default one. In particular, $...$ will produce
    # inline equations
    bib_doc = Kramdown::Document.new(biblio, @options)
    # TODO: fix line numbers
    @root.children += bib_doc.root.children
  end
end
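The Markdown handed to the nested `Kramdown::Document` has a simple shape: a level-2 header followed by one paragraph per cited key, each carrying an attribute list with the class `bibliography-item` and the key as its id. A sketch that builds the same text from already rendered entries follows. `reference_section_markdown` is hypothetical, and the HTML fragment stands in for what CiteProc and `clean_bibtex` would produce.

```ruby
# Hypothetical helper: `rendered` maps each cited key to its HTML entry.
def reference_section_markdown(title, rendered)
  items = rendered.map do |key, html|
    # The attribute list on the next line attaches to the paragraph
    "#{html}\n{: .bibliography-item ##{key}}"
  end
  "\n## #{title}\n\n#{items.join("\n\n")}\n"
end

puts reference_section_markdown('References', { 'knuth84' => '<i>The TeXbook</i>' })
```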

#reference_section_name ⇒ Object



# File 'lib/kramdown/latexish.rb', line 118

def reference_section_name
  @lex.localise(:reference, :plural).capitalize
end