Module: Polytexnic::Preprocessor::Polytex

Includes:
Literal
Included in:
Polytexnic::Preprocessor
Defined in:
lib/polytexnic/preprocessors/polytex.rb

Constant Summary

Constants included from Literal

Literal::CODE_INCLUSION_REGEX, Literal::LANG_REGEX

Instance Method Summary

Methods included from Literal

#cache_display_inline_math, #cache_display_math, #cache_inline_math, #cache_literal, #cache_literal_environments, #cache_unicode, #code_error, #code_language, #code_salt, #element, #equation_element, #hyperrefs, #include_code, #literal_types, #math_environments

Instance Method Details

#cache_code_environments(source) ⇒ Object

Caches Markdown code environments. Included are indented environments, Leanpub-style indented environments, and GitHub-style code fencing.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 202

def cache_code_environments(source)
  output = []
  lines = source.split("\n")
  indentation = ' ' * 4
  while (line = lines.shift)
    if line =~ /\{lang="(.*?)"\}/
      language = $1
      code = []
      while (line = lines.shift) && line.match(/^#{indentation}(.*)$/) do
        code << $1
      end
      code = code.join("\n")
      key = digest(code)
      code_cache[key] = [code, language]
      output << key
      output << line
    elsif line =~ /^```(\w*)(,\s*options:.*)?$/  # highlighted fences
      count = 1
      language = $1.empty? ? 'text' : $1
      options  = $2
      code = []
      while (line = lines.shift) do
        count += 1 if line =~ /^```.+$/
        count -= 1 if line.match(/^```\s*$/)
        break if count.zero?
        code << line
      end
      code = code.join("\n")
      data = [code, language, false, options]
      key = digest(data.join("--"))
      code_cache[key] = data
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end
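
As a rough illustration of the fence branch above, the following sketch replaces each GitHub-style fenced block with a key and stores the code alongside its language. It is simplified (no nesting count, no `options:` suffix), and `sketch_digest`, `cache_fences`, and `fence_cache` are hypothetical names standing in for the library's `digest` helper and `code_cache`, which are defined elsewhere.

```ruby
require 'digest'

# Hypothetical stand-in for the library's digest helper.
def sketch_digest(string)
  Digest::SHA1.hexdigest(string)
end

# Replaces each fenced block with a key and records [code, language] in cache.
def cache_fences(source, cache)
  output = []
  lines = source.split("\n")
  while (line = lines.shift)
    if line =~ /^```(\w*)$/
      language = $1.empty? ? 'text' : $1
      code = []
      while (inner = lines.shift)
        break if inner =~ /^```\s*$/   # closing fence
        code << inner
      end
      body = code.join("\n")
      key = sketch_digest("#{body}--#{language}")
      cache[key] = [body, language]
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end

fence_cache = {}
markdown = "Intro\n```ruby\nputs 'hi'\n```\nOutro"
cleaned = cache_fences(markdown, fence_cache)
# cleaned now has three lines: "Intro", the key, and "Outro".
```

The point of the substitution is that the Markdown converter only ever sees an opaque key, so it cannot reinterpret anything inside the code.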

#cache_image_locations(text) ⇒ Object

Caches the locations of images to be passed through the pipeline. This works around a bug in kramdown, which fails to convert images properly when their location includes a URL.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 181

def cache_image_locations(text)
  # Matches '![Image caption](/path/to/image)'
  text.gsub!(/^\s*(!\[.*?\])\((.*?)\)/) do
    key = digest($2)
    $cache[key] = $2
    "\n#{$1}(#{key})"
  end
end
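
A minimal sketch of the same substitution on a single image line, assuming a local hash (`image_cache`, a hypothetical name) in place of the global `$cache` and SHA1 in place of the library's `digest` helper:

```ruby
require 'digest'

image_cache = {}  # hypothetical stand-in for $cache
text = "![Logo](http://example.com/logo.png)"

result = text.gsub(/^\s*(!\[.*?\])\((.*?)\)/) do
  caption, location = $1, $2
  key = Digest::SHA1.hexdigest(location)  # assumed digest function
  image_cache[key] = location
  "\n#{caption}(#{key})"
end
# The caption is kept; only the location inside the parentheses becomes a key.
```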

#cache_latex_literal(markdown) ⇒ Object

Caches literal LaTeX environments.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 123

def cache_latex_literal(markdown)
  # Add tabular and tabularx support.
  literal_types = Polytexnic::Literal.literal_types +
                  %w[tabular tabularx longtable]
  literal_types.each do |literal|
    regex = /(\\begin\{#{Regexp.escape(literal)}\}
            .*?
            \\end\{#{Regexp.escape(literal)}\})
            /xm
    markdown.gsub!(regex) do
      content = $1
      key = digest(content)
      $cache[key] = content
      key
    end
  end
end
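
To see what gets cached, here is a small sketch that applies the same begin/end pattern to a `tabular` environment. It assumes only the three table types added above (the full list from `Polytexnic::Literal.literal_types` is not reproduced here) and uses a hypothetical local `literal_cache` and SHA1 in place of `$cache` and `digest`.

```ruby
require 'digest'

literal_cache = {}  # hypothetical stand-in for $cache
markdown = <<~'MD'
  Before
  \begin{tabular}{ll}
  a & b
  \end{tabular}
  After
MD

%w[tabular tabularx longtable].each do |literal|
  name = Regexp.escape(literal)
  regex = /(\\begin\{#{name}\}.*?\\end\{#{name}\})/m
  markdown = markdown.gsub(regex) do
    content = $1
    key = Digest::SHA1.hexdigest(content)  # assumed digest function
    literal_cache[key] = content
    key
  end
end
# The whole environment, delimiters included, is now a single key.
```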

#cache_math(text, cache) ⇒ Object

Caches math. Leanpub uses the notation {$$}…{/$$} for both inline and block math, with the only difference being the presence of newlines:

{$$} x^2 {/$$}  % inline

and

{$$}
x^2             % block
{/$$}

I personally hate this notation and convention, so we also support LaTeX-style \( x \) and \[ x^2 - 2 = 0 \] notation.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 276

def cache_math(text, cache)
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/m) do
    key = digest($1 || $2)
    cache[[:block, key]] = $1 || $2
    key
  end
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/m) do
    key = digest($1 || $2)
    cache[[:inline, key]] = $1 || $2
    key
  end
end
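
The sketch below runs the same two regexes on a sample containing all four notations, to show that the block pass (which requires newlines inside `{$$}`) runs first and the inline pass picks up the rest. `sketch_cache_math` is a hypothetical, non-mutating variant that uses SHA1 in place of the library's `digest` helper.

```ruby
require 'digest'

# Non-mutating variant of the method above; returns the cleaned text.
def sketch_cache_math(text, cache)
  text = text.gsub(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/m) do
    body = $1 || $2
    key = Digest::SHA1.hexdigest(body)  # assumed digest function
    cache[[:block, key]] = body
    key
  end
  text.gsub(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/m) do
    body = $1 || $2
    key = Digest::SHA1.hexdigest(body)
    cache[[:inline, key]] = body
    key
  end
end

math_cache = {}
source = <<~'TEXT'
  Inline {$$}x^2{/$$} and block:
  {$$}
  y = mx + b
  {/$$}
  Also \(a\) and \[b\].
TEXT

cleaned = sketch_cache_math(source, math_cache)
kinds = math_cache.keys.map(&:first)
# kinds == [:block, :block, :inline, :inline]
```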

#cache_raw_latex(markdown) ⇒ Object

Caches raw LaTeX commands to be passed through the pipeline.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 142

def cache_raw_latex(markdown)
  command_regex = /(
                    ^[ \t]*\\\w+.*\}[ \t]*$ # Command on line with arg
                    |
                    ~\\ref\{.*?\}     # reference with a tie
                    |
                    ~\\eqref\{.*?\}   # eq reference with a tie
                    |
                    \\[^\s]+\{.*?\}   # command with one arg
                    |
                    \\\w+             # normal command
                    |
                    \\-               # hyphenation
                    |
                    \\[ %&$\#@]       # space or special character
                    |
                    \\\\              # double backslashes
                    )
                  /x
  markdown.gsub!(command_regex) do
    content = $1
    puts content.inspect if debug?
    key = digest(content)
    # Used to speed up has_label? in convert_standalone_image.
    key += $label_salt if content.include?('\label')
    $cache[key] = content

    if content =~ /\{table\}|\\caption\{/
      # Pad tables & captions with newlines for kramdown compatibility.
      "\n#{key}\n"
    else
      key
    end
  end
end
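
To make the alternation order concrete, this sketch scans a sample sentence with a reduced version of the regex. Only three of the alternatives above are kept, so it is an illustration of how matches are selected rather than a substitute for the full pattern; `reduced_regex` and `sample` are hypothetical names.

```ruby
# Reduced version of command_regex with three of its alternatives.
reduced_regex = /
  ~\\ref\{.*?\}      # reference with a tie
  |
  \\[^\s]+\{.*?\}    # command with one arg
  |
  \\\w+              # normal command
/x

sample = 'See Section~\ref{sec:intro} for \emph{details} and \LaTeX.'
tokens = sample.scan(reduced_regex)
# tokens == ['~\ref{sec:intro}', '\emph{details}', '\LaTeX']
```

Because the tie alternative comes first, the `~` is cached together with its `\ref`, so the Markdown converter never sees a bare tilde.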

#convert_code_inclusion(text) ⇒ Object

Adds support for <<(path/to/code) inclusion.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 114

def convert_code_inclusion(text)
  text.gsub!(/^\s*(<<\(.*?\))/) do
    key = digest($1)
    $cache[key] = "%= #{$1}"  # reduce to a previously solved case
    key
  end
end
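
A short sketch of the reduction, assuming a local `inclusion_cache` (hypothetical) in place of `$cache` and SHA1 in place of `digest`. The cached value is the `%=` form, which is what gets restored later in the pipeline.

```ruby
require 'digest'

inclusion_cache = {}  # hypothetical stand-in for $cache
text = "  <<(code/hello.rb)\nProse"

result = text.gsub(/^\s*(<<\(.*?\))/) do
  inclusion = $1
  key = Digest::SHA1.hexdigest(inclusion)  # assumed digest function
  inclusion_cache[key] = "%= #{inclusion}"
  key
end
# Leading whitespace is consumed; the line becomes just the key.
```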

#convert_includegraphics(text) ⇒ Object

Converts \includegraphics to \image inside figures. The reason is that raw \includegraphics is almost always too wide in the PDF. Instead, we use the custom-defined \image command, which is specifically designed to fix this issue.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 252

def convert_includegraphics(text)
  in_figure = false
  newtext = text.split("\n").map do |line|
    line.gsub!('\includegraphics', '\image') if in_figure
    if line =~ /^\s*\\begin\{figure\}/
      in_figure = true
    elsif line =~ /^\s*\\end\{figure\}/
      in_figure = false
    end
    line
  end.join("\n")
  text.replace(newtext)
end
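
The sketch below shows the effect on a small input: only the `\includegraphics` inside the figure environment is rewritten. `sketch_convert_includegraphics` is a hypothetical, non-mutating variant of the method above, and it uses the block form of `gsub` so the replacement string is inserted literally.

```ruby
# Non-mutating variant: returns the converted text instead of replacing in place.
def sketch_convert_includegraphics(text)
  in_figure = false
  text.split("\n").map do |line|
    line = line.gsub('\includegraphics') { '\image' } if in_figure
    if line =~ /^\s*\\begin\{figure\}/
      in_figure = true
    elsif line =~ /^\s*\\end\{figure\}/
      in_figure = false
    end
    line
  end.join("\n")
end

latex = <<~'TEX'
  \includegraphics{images/outside.png}
  \begin{figure}
  \includegraphics{images/inside.png}
  \end{figure}
TEX

converted = sketch_convert_includegraphics(latex)
```

Note that the flag is set after the substitution on each line, so an `\includegraphics` on the same line as `\begin{figure}` would be left alone.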

#remove_kramdown_comments(text) ⇒ Object

Removes comments produced by kramdown. These have the special form of always being at the beginning of the line.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 244

def remove_kramdown_comments(text)
  text.gsub!(/^% (.*)$/, '')
end
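
A quick sketch of what the pattern does and does not match, using a made-up sample string. Only a `%` at the start of a line followed by a space is treated as a kramdown comment; the line itself is emptied rather than deleted.

```ruby
latex = "% kramdown comment\nBody with 50\\% inline\n%no space here"
cleaned = latex.gsub(/^% (.*)$/, '')
# The first line becomes empty; the other two are untouched.
```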

#restore_hashed_content(text) ⇒ Object

Restores raw code from the cache.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 191

def restore_hashed_content(text)
  $cache.each do |key, value|
    # Because of the way backslashes get interpolated, we need to add
    # some extra ones to cover all the cases of hashed LaTeX.
    text.gsub!(key, value.gsub(/\\/, '\\\\\\'))
  end
end
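
The extra backslashes are needed because `gsub` interprets backslash sequences in a replacement string, so a cached `\\` (a LaTeX line break) would otherwise come back as a single backslash. The sketch below compares the naive call, the doubling approach used above, and the block form of `gsub`, which inserts its return value literally. The key and value are made up for illustration.

```ruby
key   = 'abc123'        # hypothetical cache key
value = 'a \\\\ b'      # single-quoted: "a", two backslashes, "b"
text  = 'before abc123 after'

# Naive: the replacement string's escaped backslash collapses to one.
naive = text.gsub(key, value)

# As in restore_hashed_content: double every backslash first.
escaped = text.gsub(key, value.gsub(/\\/, '\\\\\\'))

# Alternative: a block's return value is not scanned for escapes.
block_form = text.gsub(key) { value }
```

Here `naive` contains one backslash, while `escaped` and `block_form` both preserve the original two.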

#restore_math(text, cache) ⇒ Object

Restores the Markdown math. This is easy because we’re running everything through our LaTeX pipeline.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 292

def restore_math(text, cache)
  cache.each do |(kind, key), value|
    case kind
    when :inline
      open  = '\('
      close =  '\)'
    when :block
      open  = '\['
      close = '\]'
    end
    text.gsub!(key, open + value + close)
  end
end
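
A minimal sketch of the restoration step with a hand-built cache, showing how the `[kind, key]` pairs select the delimiters. The keys `k1` and `k2` are placeholders for real digests, and `sketch_restore_math` is a hypothetical non-mutating variant that uses the block form of `gsub`.

```ruby
# Non-mutating variant: returns the restored text.
def sketch_restore_math(text, cache)
  cache.each do |(kind, key), value|
    open, close = kind == :inline ? ['\(', '\)'] : ['\[', '\]']
    text = text.gsub(key) { open + value + close }
  end
  text
end

cache = { [:inline, 'k1'] => 'x^2', [:block, 'k2'] => 'y = mx + b' }
restored = sketch_restore_math('Inline k1 and block k2.', cache)
# restored == 'Inline \(x^2\) and block \[y = mx + b\].'
```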

#to_polytex ⇒ Object

Converts Markdown to PolyTeX. We adopt a unified approach: rather than convert “Markdown” (I use the term loosely*) directly to HTML, we convert it to PolyTeX and then run everything through the PolyTeX pipeline. Happily, kramdown comes equipped with a `to_latex` method that does most of the heavy lifting. The output isn’t as clean as that produced by Pandoc (our previous choice), but it comes with significant advantages: (1) It’s written in Ruby and available as a gem, so its use eliminates an external dependency. (2) It’s the foundation for the “Markdown” interpreter used by Leanpub, so by using it ourselves we ensure greater compatibility with Leanpub books.

* <rant>The number of mutually incompatible markup languages going by the name “Markdown” is truly mind-boggling. Most of them add things to John Gruber’s original Markdown language in an ever-expanding attempt to bolt on the functionality needed to write longer documents. At this point, I fear that “Markdown” has become little more than a marketing term.</rant>



# File 'lib/polytexnic/preprocessors/polytex.rb', line 86

def to_polytex
  math_cache = {}
  cleaned_markdown = cache_code_environments(@source)
  puts cleaned_markdown if debug?
  cleaned_markdown.tap do |markdown|
    convert_code_inclusion(markdown)
    cache_latex_literal(markdown)
    cache_raw_latex(markdown)
    cache_image_locations(markdown)
    puts markdown if debug?
    cache_math(markdown, math_cache)
  end
  puts cleaned_markdown if debug?
  # Override the header ordering, which starts with 'section' by default.
  lh = 'chapter,section,subsection,subsubsection,paragraph,subparagraph'
  kramdown = Kramdown::Document.new(cleaned_markdown, latex_headers: lh)
  puts kramdown.inspect if debug?
  puts kramdown.to_html if debug?
  puts kramdown.to_latex if debug?
  @source = kramdown.to_latex.tap do |polytex|
              remove_kramdown_comments(polytex)
              convert_includegraphics(polytex)
              restore_math(polytex, math_cache)
              restore_hashed_content(polytex)
            end
end
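
The overall shape of the method is cache, convert, restore. The sketch below illustrates why that ordering matters, using a stand-in converter rather than kramdown itself: `fake_to_latex` is a hypothetical lambda that escapes underscores, as a LaTeX converter must do for prose, and `protect` and `restore` are simplified stand-ins for the caching and restoring steps.

```ruby
require 'digest'

# Hypothetical stand-in for the Markdown-to-LaTeX conversion step.
fake_to_latex = ->(text) { text.gsub('_') { '\_' } }

# Replaces \( ... \) math with a key before conversion.
def protect(text, cache)
  text.gsub(/\\\((.*?)\\\)/) do
    body = $1
    key = Digest::SHA1.hexdigest(body)  # assumed digest function
    cache[key] = body
    key
  end
end

# Puts the cached math back after conversion.
def restore(text, cache)
  cache.reduce(text) { |acc, (key, body)| acc.gsub(key) { "\\(#{body}\\)" } }
end

source = 'snake_case and \(x_1\)'
cache = {}
converted   = restore(fake_to_latex.call(protect(source, cache)), cache)
unprotected = fake_to_latex.call(source)
# converted keeps x_1 intact; unprotected escapes the subscript underscore.
```

With caching, the converter escapes the underscore in the prose but never sees the one inside the math.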