Module: Polytexnic::Preprocessor::Polytex
- Includes: Literal
- Included in: Polytexnic::Preprocessor
- Defined in: lib/polytexnic/preprocessors/polytex.rb
Constant Summary
Constants included from Literal
Instance Method Summary
- #cache_code_environments(source) ⇒ Object
  Caches Markdown code environments.
- #cache_image_locations(text) ⇒ Object
  Caches the locations of images to be passed through the pipeline.
- #cache_latex_literal(markdown) ⇒ Object
  Caches literal LaTeX environments.
- #cache_math(text, cache) ⇒ Object
  Caches math.
- #cache_raw_latex(markdown) ⇒ Object
  Caches raw LaTeX commands to be passed through the pipeline.
- #convert_code_inclusion(text) ⇒ Object
  Adds support for <<(path/to/code) inclusion.
- #convert_includegraphics(text) ⇒ Object
  Converts includegraphics to image inside figures.
- #remove_kramdown_comments(text) ⇒ Object
  Removes comments produced by kramdown.
- #restore_hashed_content(text) ⇒ Object
  Restores raw code from the cache.
- #restore_math(text, cache) ⇒ Object
  Restores the Markdown math.
- #to_polytex ⇒ Object
  Converts Markdown to PolyTeX.
Methods included from Literal
#cache_display_inline_math, #cache_display_math, #cache_inline_math, #cache_literal, #cache_literal_environments, #cache_unicode, #code_salt, #element, #equation_element, #hyperrefs, #literal_types, #math_environments
Instance Method Details
#cache_code_environments(source) ⇒ Object
Caches Markdown code environments. Included are indented environments, Leanpub-style indented environments, and GitHub-style code fencing.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 205

    def cache_code_environments(source)
      output = []
      lines = source.split("\n")
      indentation = ' ' * 4
      while (line = lines.shift)
        if line =~ /\{lang="(.*?)"\}/
          language = $1
          code = []
          while (line = lines.shift) && line.match(/^#{indentation}(.*)$/) do
            code << $1
          end
          code = code.join("\n")
          key = digest(code)
          code_cache[key] = [code, language]
          output << key
          output << line
        elsif line =~ /^```([\w+]*)(,\s*options:.*)?$/  # highlighted fences
          count = 1
          language = $1.empty? ? 'text' : $1
          options = $2
          code = []
          while (line = lines.shift) do
            count += 1 if line =~ /^```.+$/
            count -= 1 if line.match(/^```\s*$/)
            break if count.zero?
            code << line
          end
          code = code.join("\n")
          data = [code, language, false, options]
          key = digest(data.join("--"))
          code_cache[key] = data
          output << key
        else
          output << line
        end
      end
      output.join("\n")
    end
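The fence-handling branch can be tried in isolation with a reduced sketch. It is not the module's method: it ignores nested fences, the options suffix, and the Leanpub-style branch, and it assumes SHA1 as a stand-in for the module's digest helper. The names demo_fence and cache_fenced_code are hypothetical.

```ruby
require 'digest'

# Three backticks, built indirectly so this listing stays readable.
def demo_fence
  '`' * 3
end

# Simplified sketch: caches only plain GitHub-style fences.
# `code_cache` is an ordinary Hash supplied by the caller.
def cache_fenced_code(source, code_cache)
  opening = /^#{demo_fence}([\w+]*)$/
  closing = /^#{demo_fence}\s*$/
  output = []
  lines = source.split("\n")
  while (line = lines.shift)
    if line =~ opening
      language = $1.empty? ? 'text' : $1
      code = []
      # Collect lines until the closing fence, which is then discarded.
      while (line = lines.shift) && line !~ closing
        code << line
      end
      body = code.join("\n")
      key = Digest::SHA1.hexdigest("#{body}--#{language}")
      code_cache[key] = [body, language]
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end

code_cache = {}
source = ['Intro', "#{demo_fence}ruby", "puts 'hi'", demo_fence, 'Outro'].join("\n")
cleaned = cache_fenced_code(source, code_cache)
```

After the call, the fenced block has been replaced by a single key line, and the code and its language sit in code_cache under that key.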
#cache_image_locations(text) ⇒ Object
Caches the locations of images to be passed through the pipeline. This works around a kramdown bug: images are not converted properly when their location includes a URL.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 184

    def cache_image_locations(text)
      # Matches '![Image caption](/path/to/image)'
      text.gsub!(/^\s*(!\[.*?\])\((.*?)\)/) do
        key = digest($2)
        $cache[key] = $2
        "\n#{$1}(#{key})"
      end
    end
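A minimal sketch of the same substitution, assuming a local Hash (image_cache, a hypothetical name) in place of the global $cache and SHA1 in place of the module's digest helper:

```ruby
require 'digest'

image_cache = {} # hypothetical stand-in for the global $cache
text = '![Logo](http://example.com/logo.png)'

text.gsub!(/^\s*(!\[.*?\])\((.*?)\)/) do
  caption, location = $1, $2
  key = Digest::SHA1.hexdigest(location) # assumed digest; the module uses its own
  image_cache[key] = location
  "\n#{caption}(#{key})"
end
```

The caption is left in place for kramdown, while the URL is swapped for an opaque key that can be restored later.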
#cache_latex_literal(markdown) ⇒ Object
Caches literal LaTeX environments.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 126

    def cache_latex_literal(markdown)
      # Add tabular and tabularx support.
      literal_types = Polytexnic::Literal.literal_types +
                      %w[tabular tabularx longtable]
      literal_types.each do |literal|
        regex = /(\\begin\{#{Regexp.escape(literal)}\}
                 .*?
                 \\end\{#{Regexp.escape(literal)}\})
                /xm
        markdown.gsub!(regex) do
          content = $1
          key = digest(content)
          $cache[key] = content
          key
        end
      end
    end
#cache_math(text, cache) ⇒ Object
Caches math. Leanpub uses the notation {$$}…{/$$} for both inline and block math, with the only difference being the presence of newlines:
{$$} x^2 {/$$} % inline
and
{$$}
x^2 % block
{/$$}
I personally hate this notation and convention, so we also support LaTeX-style \( x \) and \[ x^2 - 2 = 0 \] notation.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 279

    def cache_math(text, cache)
      text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/m) do
        key = digest($1 || $2)
        cache[[:block, key]] = $1 || $2
        key
      end
      text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/m) do
        key = digest($1 || $2)
        cache[[:inline, key]] = $1 || $2
        key
      end
    end
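To see how the four notations are classified, the two passes can be run on a small string. This sketch reuses the two regular expressions from the listing; demo_cache_math is a hypothetical name, and SHA1 is assumed in place of the module's digest helper.

```ruby
require 'digest'

def demo_cache_math(text, cache)
  # Block pass first: the Leanpub form needs newlines inside the delimiters.
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/m) do
    content = $1 || $2
    key = Digest::SHA1.hexdigest(content)
    cache[[:block, key]] = content
    key
  end
  # Inline pass: whatever math is left is treated as inline.
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/m) do
    content = $1 || $2
    key = Digest::SHA1.hexdigest(content)
    cache[[:inline, key]] = content
    key
  end
  text
end

cache = {}
text = "Inline {$$}x^2{/$$} and \\(y\\), then block:\n{$$}\nz^2\n{/$$}\nand \\[ w \\]"
demo_cache_math(text, cache)
kinds = cache.keys.map(&:first)
```

Running the block pass first matters: if the inline pass ran first, its pattern would also match the newline-delimited Leanpub form and store it as inline math.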
#cache_raw_latex(markdown) ⇒ Object
Caches raw LaTeX commands to be passed through the pipeline.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 145

    def cache_raw_latex(markdown)
      command_regex = /(
                        ^[ \t]*\\\w+.*\}[ \t]*$ # Command on line with arg
                        | ~\\ref\{.*?\}     # reference with a tie
                        | ~\\eqref\{.*?\}   # eq reference with a tie
                        | \\[^\s]+\{.*?\}   # command with one arg
                        | \\\w+             # normal command
                        | \\-               # hyphenation
                        | \\[ %&$\#@]       # space or special character
                        | \\\\              # double backslashes
                        )
                      /x
      markdown.gsub!(command_regex) do
        content = $1
        puts content.inspect if debug?
        key = digest(content)
        # Used to speed up has_label? in convert_standalone_image.
        key += $label_salt if content.include?('\label')
        $cache[key] = content

        if content =~ /\{table\}|\\caption\{/
          # Pad tables & captions with newlines for kramdown compatibility.
          "\n#{key}\n"
        else
          key
        end
      end
    end
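One way to get a feel for what the pattern treats as raw LaTeX is to scan a sample sentence with it. The sketch below copies the regular expression from the listing and only collects the matches; it does no caching, and the sample sentence is made up for illustration.

```ruby
command_regex = /(
  ^[ \t]*\\\w+.*\}[ \t]*$ # Command on line with arg
  | ~\\ref\{.*?\}         # reference with a tie
  | ~\\eqref\{.*?\}       # eq reference with a tie
  | \\[^\s]+\{.*?\}       # command with one arg
  | \\\w+                 # normal command
  | \\-                   # hyphenation
  | \\[ %&$\#@]           # space or special character
  | \\\\                  # double backslashes
  )/x

sample = 'See Section~\ref{sec:intro} for 50\% of \LaTeX.'

# scan returns one single-element array per match because of the outer group.
matches = sample.scan(command_regex).flatten
```

Here the tie is captured together with its reference, the escaped percent sign is picked up by the special-character branch, and \LaTeX falls through to the normal-command branch.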
#convert_code_inclusion(text) ⇒ Object
Adds support for <<(path/to/code) inclusion.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 117

    def convert_code_inclusion(text)
      text.gsub!(/^\s*(<<\(.*?\))/) do
        key = digest($1)
        $cache[key] = "%= #{$1}"  # reduce to a previously solved case
        key
      end
    end
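A small sketch of the substitution, with a local Hash (inclusion_cache, hypothetical) standing in for $cache and SHA1 assumed as the digest. The path code/hello.rb is a made-up example.

```ruby
require 'digest'

inclusion_cache = {} # hypothetical stand-in for the global $cache
text = "Listing:\n<<(code/hello.rb)\nDone."

text.gsub!(/^\s*(<<\(.*?\))/) do
  inclusion = $1
  key = Digest::SHA1.hexdigest(inclusion)
  # Stored with the '%= ' prefix, the form the later pipeline already handles.
  inclusion_cache[key] = "%= #{inclusion}"
  key
end
```

The Markdown-side syntax is thus rewritten into the existing %= inclusion form once the cache is restored, rather than being handled by separate logic.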
#convert_includegraphics(text) ⇒ Object
Converts \includegraphics to \image inside figures. Raw \includegraphics is almost always too wide in the PDF, so we use the custom-defined \image command instead, which is specifically designed to fix this issue.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 255

    def convert_includegraphics(text)
      in_figure = false
      newtext = text.split("\n").map do |line|
        line.gsub!('\includegraphics', '\image') if in_figure
        if line =~ /^\s*\\begin\{figure\}/
          in_figure = true
        elsif line =~ /^\s*\\end\{figure\}/
          in_figure = false
        end
        line
      end.join("\n")
      text.replace(newtext)
    end
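The figure-tracking logic can be checked with a non-mutating variant. demo_convert_includegraphics is a hypothetical name; unlike the method above it returns a new string instead of modifying its argument.

```ruby
def demo_convert_includegraphics(text)
  in_figure = false
  text.split("\n").map do |line|
    # Only lines between \begin{figure} and \end{figure} are rewritten.
    line = line.gsub('\includegraphics', '\image') if in_figure
    if line =~ /^\s*\\begin\{figure\}/
      in_figure = true
    elsif line =~ /^\s*\\end\{figure\}/
      in_figure = false
    end
    line
  end.join("\n")
end

source = [
  '\includegraphics{images/outside.png}',
  '\begin{figure}',
  '\includegraphics{images/inside.png}',
  '\end{figure}'
].join("\n")

converted = demo_convert_includegraphics(source)
```

Only the command inside the figure environment is replaced; the one outside is left untouched.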
#remove_kramdown_comments(text) ⇒ Object
Removes comments produced by kramdown. These always start at the beginning of a line, which is what the pattern relies on.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 247

    def remove_kramdown_comments(text)
      text.gsub!(/^% (.*)$/, '')
    end
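A quick illustration of which lines the pattern removes, using a made-up three-line input:

```ruby
polytex = "% kramdown comment\nText with 100\\% coverage\n  % indented, kept"

# Only a percent sign in column one, followed by a space, counts as a comment.
polytex.gsub!(/^% (.*)$/, '')
```

The first line is emptied (its newline remains), while the escaped percent sign in the middle of a line and the indented comment are both preserved.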
#restore_hashed_content(text) ⇒ Object
Restores raw code from the cache.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 194

    def restore_hashed_content(text)
      $cache.each do |key, value|
        # Because of the way backslashes get interpolated, we need to add
        # some extra ones to cover all the cases of hashed LaTeX.
        text.gsub!(key, value.gsub(/\\/, '\\\\\\'))
      end
    end
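The extra backslashes are needed because gsub interprets backslash sequences in a replacement string. A minimal sketch of the difference, using the literal placeholder 'KEY' (hypothetical; real keys are digests):

```ruby
value = 'a \\\\ b' # the LaTeX text "a \\ b", containing two backslashes
text  = 'KEY'      # hypothetical placeholder for a cached key

# Without escaping, gsub collapses the pair of backslashes into one.
naive = text.gsub('KEY', value)

# Doubling each backslash first lets the original text survive the substitution.
escaped = text.gsub('KEY', value.gsub(/\\/, '\\\\\\'))
```

The block form, text.gsub('KEY') { value }, would also insert the value verbatim, since block results are not scanned for backslash sequences; the listing above uses the escaping approach instead.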
#restore_math(text, cache) ⇒ Object
Restores the Markdown math. This is easy because we’re running everything through our LaTeX pipeline.
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 295

    def restore_math(text, cache)
      cache.each do |(kind, key), value|
        case kind
        when :inline
          open  = '\('
          close = '\)'
        when :block
          open  = '\['
          close = '\]'
        end
        text.gsub!(key, open + value + close)
      end
    end
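A short sketch of the restoration step with a hand-built cache. The keys KEY1 and KEY2 are hypothetical placeholders for digests, and demo_restore_math is a hypothetical name for a copy of the logic that returns the text.

```ruby
def demo_restore_math(text, cache)
  cache.each do |(kind, key), value|
    # Whatever notation was used in the source, output is LaTeX-style.
    opening, closing = kind == :inline ? ['\(', '\)'] : ['\[', '\]']
    text.gsub!(key, opening + value + closing)
  end
  text
end

cache = {
  [:inline, 'KEY1'] => 'x^2',
  [:block,  'KEY2'] => 'x^2 - 2 = 0'
}
restored = demo_restore_math('Inline KEY1 and block KEY2', cache)
```

Both Leanpub-style and LaTeX-style input therefore come out as \( … \) or \[ … \], which is the form the rest of the pipeline expects.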
#to_polytex ⇒ Object
Converts Markdown to PolyTeX. We adopt a unified approach: rather than convert “Markdown” (I use the term loosely*) directly to HTML, we convert it to PolyTeX and then run everything through the PolyTeX pipeline. Happily, kramdown comes equipped with a `to_latex` method that does most of the heavy lifting. The output isn’t as clean as that produced by Pandoc (our previous choice), but it comes with significant advantages: (1) It’s written in Ruby, available as a gem, so its use eliminates an external dependency. (2) It’s the foundation for the “Markdown” interpreter used by Leanpub, so by using it ourselves we ensure greater compatibility with Leanpub books.
* <rant>The number of mutually incompatible markup languages going by the name “Markdown” is truly mind-boggling. Most of them add things to John Gruber’s original Markdown language in an ever-expanding attempt to bolt on the functionality needed to write longer documents. At this point, I fear that “Markdown” has become little more than a marketing term.</rant>
    # File 'lib/polytexnic/preprocessors/polytex.rb', line 87

    def to_polytex
      math_cache = {}
      cleaned_markdown = cache_code_environments(@source)
      (cleaned_markdown, Proc.new { |source| cache_code_environments(source) })
      puts cleaned_markdown if debug?
      cleaned_markdown.tap do |markdown|
        convert_code_inclusion(markdown)
        cache_latex_literal(markdown)
        cache_raw_latex(markdown)
        cache_image_locations(markdown)
        puts markdown if debug?
        cache_math(markdown, math_cache)
      end
      puts cleaned_markdown if debug?
      # Override the header ordering, which starts with 'section' by default.
      lh = 'chapter,section,subsection,subsubsection,paragraph,subparagraph'
      kramdown = Kramdown::Document.new(cleaned_markdown, latex_headers: lh)
      puts kramdown.inspect if debug?
      puts kramdown.to_html if debug?
      puts kramdown.to_latex if debug?
      @source = kramdown.to_latex.tap do |polytex|
                  remove_kramdown_comments(polytex)
                  convert_includegraphics(polytex)
                  restore_math(polytex, math_cache)
                  restore_hashed_content(polytex)
                end
    end
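The overall shape of the method is "cache, convert, restore". The toy sketch below illustrates why that ordering matters; it does not use kramdown. toy_convert is a hypothetical stand-in that only handles *emphasis*, toy_to_polytex is a hypothetical name, and SHA1 is assumed as the digest.

```ruby
require 'digest'

# Hypothetical stand-in for the Markdown converter: handles *emphasis* only.
def toy_convert(markdown)
  markdown.gsub(/\*(.+?)\*/) { "\\emph{#{$1}}" }
end

def toy_to_polytex(source)
  cache = {}
  # 1. Replace inline math with opaque keys so the converter cannot touch it.
  protected_text = source.gsub(/\\\((.*?)\\\)/) do
    content = $1
    key = Digest::SHA1.hexdigest(content)
    cache[key] = content
    key
  end
  # 2. Convert the remaining Markdown.
  converted = toy_convert(protected_text)
  # 3. Put the cached math back (block form inserts the text verbatim).
  cache.each do |key, content|
    converted = converted.gsub(key) { "\\(#{content}\\)" }
  end
  converted
end

result = toy_to_polytex('An *important* product: \(a*b*c\)')
unprotected = toy_convert('\(a*b*c\)')
```

Without the caching step, the asterisks inside the math would be read as emphasis markers, as the unprotected value shows; with it, the math passes through unchanged.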