Class: RDoc::Markup::ToLaTeX
- Inherits:
-
Formatter
- Object
- Formatter
- RDoc::Markup::ToLaTeX
- Includes:
- Text
- Defined in:
- lib/rdoc/markup/to_latex.rb
Overview
This is an RDoc converter/formatter that turns RDoc markup into LaTeX code. It’s intended for use with the RDoc::Generator::Papyrus class, but if you like you can use it on its own (note, however, that this class absolutely depends on RDoc’s parser). To use it, you first have to instantiate this class and then call the #convert method on it with the text you want to convert:
f = RDoc::Markup::ToLaTeX.new
f.convert("A *bold* and +typed+ text.")
Should result in:
A \textbf{bold} and \verb~typed~ text.
If for any reason you just want to escape LaTeX control characters, you may do so by calling the #escape method. See its documentation for an example.
Some parts of this class are heavily inspired by RDoc’s own code, namely:
-
::new
-
#handle_special_TIDYLINK
See each method’s descriptions for more details.
How to write a formatter
RDoc offers an easy-to-adapt visitor pattern for creating new formatters. “Easy” only to a certain extent: as soon as you get into inline formatting, RDoc’s documentation lacks some serious information. Nevertheless, I’ll describe the process of formatting here, even if I reiterate some of the concepts the documentation for class RDoc::Markup::Formatter mentions.
First, you have to derive your class from RDoc::Markup::Formatter and then, somewhat obscurely, include the RDoc::Text module, because this one is responsible for parsing inline markup.
Assuming you already wrote a generator making use of your formatter (because without a generator, writing a formatter is a rather pointless undertaking, as no one instantiates the class), I continue with how RDoc interacts with your formatter class.
So, somewhere in your generator you call the ::new method of your formatter (preferably inside the YourGenerator#formatter method, but I assume you know this as it belongs to writing generators and not formatters). Ensure that this method takes at least one argument called markup, which defaults to nil! So far I haven’t found out what it’s for, but the RDoc::Markup::Formatter::new method expects it, so we should obey. All other arguments are up to you; just ensure that you call super inside your initialize method with the markup parameter as its sole argument.
The next task for your initialize method is to tell RDoc how to cope with the three inline formatting sequences: bold, italic and teletyped text. Call the add_tag method inherited from the Formatter class and pass it one of the following symbols along with the strings you want to put before and after the given sequence:
-
:BOLD
-
:EM
-
:TT
If you want to add so-called “specials” to your formatter (and you’re likely to, as hyperlinks are such specials), you have to dig around in RDoc’s own formatters, namely RDoc::Markup::ToHtml, and find out that there’s an instance variable called @markup that allows you to do this. Call its add_special method with a regular expression that finds your special and a name for it as a symbol. RDoc itself uses the following specials:
@markup.add_special(/((link:|https?:|mailto:|ftp:|www\.)\S+\w)/, :HYPERLINK)
@markup.add_special(/(((\{.*?\})|\b\S+?)\[\S+?\.\S+?\])/, :TIDYLINK)
If you add a special, you have to provide a handle_special_YOURSPECIAL method in your formatter, where YOURSPECIAL corresponds to the symbol you previously passed to the add_special method. This method gets passed an RDoc::Markup::Special object, of which you just need to know the text method that retrieves the text your regular expression matched. Apply whatever you want, and return a string that RDoc will incorporate into the result.
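What such a handler does can be tried out without RDoc. The following is only a sketch under assumptions: FakeSpecial and apply_specials are hypothetical stand-ins for the object and the dispatch RDoc provides, and the \url{} output presumes that the LaTeX document loads the url or hyperref package.

```ruby
# Hypothetical stand-in for the object RDoc passes to the handler;
# only its text method matters here.
FakeSpecial = Struct.new(:text)

# The HYPERLINK pattern quoted above.
HYPERLINK = /((link:|https?:|mailto:|ftp:|www\.)\S+\w)/

# Returns the string that replaces the matched special.
# \url{} assumes the url or hyperref LaTeX package (an assumption).
def handle_special_HYPERLINK(special)
  "\\url{#{special.text}}"
end

# Imitates the dispatch RDoc performs for each match (hypothetical helper).
def apply_specials(text)
  text.gsub(HYPERLINK) { |match| handle_special_HYPERLINK(FakeSpecial.new(match)) }
end

puts apply_specials("See http://www.ruby-lang.org for details.")
```

The block form of gsub is used so that the handler’s return value is inserted literally, without any backreference processing.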
During the formatting process RDoc calls various methods on your formatter; the full list can be seen in the documentation for the class RDoc::Markup::Formatter. Note that those methods should not return a string; in fact, RDoc ignores their return values. You are expected to keep track of your formatted text yourself, e.g. by creating an instance variable @result in your initialize method and filling it with text in the methods called by RDoc.
When everything has been processed, RDoc calls the end_accepting method on your formatter instance. Its return value is expected to be the complete formatting result, so if you used a string instance variable @result as I recommended above, you should return its value from that method.
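The accept-and-collect cycle described above can be sketched without RDoc at all. This is a toy imitation under assumptions, not RDoc’s API: Paragraph, BlankLine, ToyFormatter and its convert driver are hypothetical names that only mimic how RDoc walks a document and calls back into a formatter.

```ruby
# Toy stand-ins for RDoc's document nodes (hypothetical, not RDoc classes).
Paragraph = Struct.new(:text) do
  def accept(visitor)
    visitor.accept_paragraph(self)
  end
end

BlankLine = Class.new do
  def accept(visitor)
    visitor.accept_blank_line(self)
  end
end

# The accept_* methods only append to @result; their return
# values are never used by the driver.
class ToyFormatter
  def start_accepting
    @result = String.new
  end

  def accept_paragraph(par)
    @result << par.text << "\n"
  end

  def accept_blank_line(_line)
    @result.chomp!
    @result << "\n\n"
  end

  # The return value of this method is the complete result.
  def end_accepting
    @result
  end

  # Imitates the loop that RDoc runs for you.
  def convert(nodes)
    start_accepting
    nodes.each { |node| node.accept(self) }
    end_accepting
  end
end

doc = [Paragraph.new("First paragraph."), BlankLine.new, Paragraph.new("Second paragraph.")]
output = ToyFormatter.new.convert(doc)
puts output
```

Only end_accepting hands anything back to the caller, which is why the accumulated string has to live in an instance variable.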
Inline formatting
This isn’t as hard as I made it sound earlier, but you have to know what to do; otherwise you’ll be stuck with paragraphs being treated as a whole, without any inline formatting happening. To achieve this, you have to define a method that initiates the inline formatting process. The corresponding method of RDoc’s HTML formatter is RDoc::Markup::ToHtml#to_html, so you may choose a name fitting that naming scheme (I did so for this formatter as well, but the to_latex method is private). You then call this method inside your accept_paragraph method with the paragraph’s text as its argument. The content of the method cannot be known unless you dig around in RDoc’s formatter sources. It’s the following:
convert_flow(@am.flow(paragraph_text_here))
So, what does this do? It uses the superclass’s (undocumented) instance variable @am, an instance of RDoc::Markup::AttributeManager that is responsible for keeping track of which inline text attributes to apply where. It has this magic method called flow, which takes one argument: the text of the paragraph you want to format. It tokenizes the paragraph into an array of RDoc tokens and plain strings and returns it (yes, this was the inline parsing process). We then take that token array and pass it directly to the convert_flow method (inherited from the Formatter class), which knows how to handle the token sequence and comes back to your formatter instance each time it wants to format something, bold or teletyped text for instance (remember? You defined that with add_tag). If you want to process plain text without any special markup as well (I had to for the LaTeX formatter, because in LaTeX several characters have to be escaped even in unformatted text, e.g. the underscore), you have to provide the method convert_string. It gets passed all strings that don’t have any markup applied; its return value ends up in the final result.
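To make the division of labour between flow, convert_flow and convert_string more tangible, here is a toy imitation. Nothing in it is RDoc’s real API: toy_flow, toy_convert_flow and the :bold_on/:bold_off tokens are hypothetical, and only *word* markup is recognised.

```ruby
# Splits "*word*" markup into plain strings and on/off tokens,
# roughly what the real flow method does with its own token objects.
def toy_flow(text)
  tokens = []
  text.split(/(\*\w+\*)/).each do |piece|
    next if piece.empty?
    if piece =~ /\A\*(\w+)\*\z/
      tokens << :bold_on << $1 << :bold_off
    else
      tokens << piece
    end
  end
  tokens
end

# Called for every plain string; here it only escapes underscores.
def convert_string(str)
  str.gsub("_") { "\\textunderscore{}" }
end

# Turns tokens into tags and sends plain strings through convert_string.
def toy_convert_flow(tokens)
  tokens.map do |token|
    case token
    when :bold_on  then "\\textbf{"
    when :bold_off then "}"
    else convert_string(token)
    end
  end.join
end

puts toy_convert_flow(toy_flow("A *bold* snake_case word."))
```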
Direct Known Subclasses
Constant Summary
- LIST_TYPE2LATEX =
Maps RDoc’s list types to the corresponding LaTeX ones.
{
  :BULLET => ["\\begin{itemize}", "\\end{itemize}"],
  :NUMBER => ["\\begin{enumerate}", "\\end{enumerate}"],
  :LABEL  => ["\\begin{description}", "\\end{description}"],
  :UALPHA => ["\\begin{ualphaenum}", "\\end{ualphaenum}"],
  :LALPHA => ["\\begin{lalphaenum}", "\\end{lalphaenum}"],
  :NOTE   => ["\\begin{description}", "\\end{description}"]
}.freeze
- LATEX_HEADINGS =
LaTeX heading commands. 0 is nil as there is no zeroth heading.
[nil,                      #Dummy, no hash needed with this
 "\\section{%s}",          #h1
 "\\subsection{%s}",       #h2
 "\\subsubsection{%s}",    #h3
 "\\subsubsubsection{%s}", #h4
 "\\microsection*{%s}",    #h5
 "\\paragraph*{%s.} ",     #h6
 "\\subparagraph*{%s}",    #Needed??
 "%s", "%s", "%s", "%s", "%s", "%s"].freeze
- LATEX_OPT_HEADINGS =
LaTeX heading commands for headings with an optional argument to change the TOC entry. Just reach till level 4, because lower headings don’t show up in the TOC at all.
[nil, "\\section[%s]{%s}", "\\subsection[%s]{%s}", "\\subsubsection[%s]{%s}", "\\subsubsubsection[%s]{%s}" ]
- LATEX_SPECIAL_CHARS =
Characters that need to be escaped for LaTeX and their corresponding escape sequences. Note that the order is important; otherwise some things (especially \ and {}) are escaped twice.
{
  /\\/ => "\\textbackslash{}",
  /\$/ => "\\$",
  /#/ => "\\#",
  /%/ => "\\%",
  /\^/ => "\\^",
  /&/ => "\\\\&", #In a gsub replacement string, \& refers to the whole match, so the backslash has to be doubled
  /(?<!textbackslash){/ => "\\{",
  /(?<!textbackslash{)}/ => "\\}",
  /_/ => "\\textunderscore{}",
  /\.{3}/ => "\\ldots{}",
  /~/ => "\\~",
  /©/ => "\\copyright{}",
  /LaTeX/ => "\\LaTeX{}"
}.freeze
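Why the order matters can be checked with a reduced copy of the table. This is a sketch under assumptions: RULES and apply_rules are hypothetical names, and the braces in the patterns are written escaped here, which should match the same characters as the unescaped form above. Ruby hashes keep their insertion order, so each_pair applies the rules from top to bottom.

```ruby
# Reduced copy of the escape table (hypothetical name).
RULES = {
  /\\/ => "\\textbackslash{}",
  /(?<!textbackslash)\{/ => "\\{",
  /(?<!textbackslash\{)\}/ => "\\}",
  /_/ => "\\textunderscore{}"
}.freeze

# Applies the rules in the order in which the hash yields them.
def apply_rules(rules, str)
  result = str.dup
  rules.each_pair { |regexp, escape_seq| result.gsub!(regexp, escape_seq) }
  result
end

# Backslash first: braces introduced by \textbackslash{} are left alone.
puts apply_rules(RULES, "a\\b {x} c_d")

# Reversed order: the backslashes added for the braces get escaped again.
puts apply_rules(RULES.to_a.reverse.to_h, "{x}")
```

The lookbehinds are what keep the braces of an already inserted \textbackslash{} from being escaped a second time; they only help if the backslash rule has run before the brace rules.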
Instance Attribute Summary
-
#heading_level ⇒ Object
readonly
Level relative to which headings are produced from this formatter.
-
#list_in_progress ⇒ Object
readonly
The innermost type of list we’re currently in, or nil if no list is being processed.
-
#result ⇒ Object
(also: #res)
readonly
Contains everything processed so far as a string.
Instance Method Summary
-
#accept_blank_line(line) ⇒ Object
Terminates a paragraph by inserting two newlines.
-
#accept_heading(head) ⇒ Object
Adds a fitting section, subsection, etc.
-
#accept_list_end(list) ⇒ Object
Adds \end{list_type}.
-
#accept_list_item_end(item) ⇒ Object
Adds a terminating \n for a list item if this is necessary (usually the newline is automatically created by processing the list paragraph).
-
#accept_list_item_start(item) ⇒ Object
Adds \item, with the label as its optional argument if the item has one.
-
#accept_list_start(list) ⇒ Object
Adds \begin{list_type}.
-
#accept_paragraph(par) ⇒ Object
Adds par’s text plus newline to the result.
-
#accept_raw(raw) ⇒ Object
Writes the raw thing as-is into the document.
-
#accept_rule(rule) ⇒ Object
Adds a rule.
-
#accept_verbatim(ver) ⇒ Object
Puts ver’s text between \begin{Verbatim} and \end{Verbatim}.
-
#convert_string(str) ⇒ Object
Called for each plaintext string in a paragraph by the #convert_flow method called in #to_latex.
-
#end_accepting ⇒ Object
Last method called.
-
#escape(str) ⇒ Object
Escapes all LaTeX control characters from a string.
-
#handle_special_HYPERLINK(special) ⇒ Object
Handles raw hyperlinks.
-
#handle_special_TIDYLINK(special) ⇒ Object
Method copied from RDoc project and slightly modified.
-
#initialize(heading_level = 0, inputencoding = "UTF-8", markup = nil) ⇒ ToLaTeX
constructor
Instantiates this formatter.
-
#start_accepting ⇒ Object
First method called.
Constructor Details
#initialize(heading_level = 0, inputencoding = "UTF-8", markup = nil) ⇒ ToLaTeX
Instantiates this formatter.
Parameters
- heading_level
-
Minimum heading level. Useful for context-based heading;
a value of 1 indicates that all requested level 2 headings
are turned into level 3 ones; a value of 2 would turn them
into level 4 ones.
- markup
-
Parameter expected by the superclass. TODO: What for?
Return value
A new instance of this class.
Example
f = RDoc::Markup::ToLaTeX.new
puts f.convert("Some *bold* text") #=> Some \textbf{bold} text
Remarks
Some lines of this method have their origin in the RDoc project. See the code for more details.
# File 'lib/rdoc/markup/to_latex.rb', line 234
def initialize(heading_level = 0, inputencoding = "UTF-8", markup = nil)
  super(markup)

  @heading_level = heading_level
  @inputencoding = "UTF-8"
  @result = ""
  @list_in_progress = nil

  #Copied from RDoc 3.8, adds link capabilities
  @markup.add_special(/((link:|https?:|mailto:|ftp:|www\.)\S+\w)/, :HYPERLINK)
  @markup.add_special(/(((\{.*?\})|\b\S+?)\[\S+?\.\S+?\])/, :TIDYLINK)

  #Add definitions for inline markup
  add_tag(:BOLD, "\\textbf{", "}")
  add_tag(:TT, "\\verb~", "~")
  add_tag(:EM, "\\textit{", "}")
end
Instance Attribute Details
#heading_level ⇒ Object (readonly)
Level relative to which headings are produced from this formatter. E.g., if this is 1 and the user requests a level 2 heading, they actually get a level 3 one.
# File 'lib/rdoc/markup/to_latex.rb', line 211
def heading_level
  @heading_level
end
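How the offset shifts headings can be reproduced with a shortened copy of the heading table. A minimal sketch: HEADINGS and heading_for are hypothetical names, and the lookup only mirrors what accept_heading does for headings without verbatim text.

```ruby
# Shortened copy of the heading table; index 0 is unused.
HEADINGS = [nil,
            "\\section{%s}",
            "\\subsection{%s}",
            "\\subsubsection{%s}"].freeze

# Picks the template at heading_level + requested_level and fills in the title.
def heading_for(heading_level, requested_level, title)
  sprintf(HEADINGS[heading_level + requested_level], title)
end

puts heading_for(0, 2, "Usage") # \subsection{Usage}
puts heading_for(1, 2, "Usage") # \subsubsection{Usage}
```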
#list_in_progress ⇒ Object (readonly)
The innermost type of list we’re currently in, or nil if no list is being processed.
# File 'lib/rdoc/markup/to_latex.rb', line 217
def list_in_progress
  @list_in_progress
end
#result ⇒ Object (readonly) Also known as: res
Contains everything processed so far as a string.
# File 'lib/rdoc/markup/to_latex.rb', line 213
def result
  @result
end
Instance Method Details
#accept_blank_line(line) ⇒ Object
Terminates a paragraph by inserting two newlines.
# File 'lib/rdoc/markup/to_latex.rb', line 319
def accept_blank_line(line)
  @result.chomp!
  @result << "\n\n"
end
#accept_heading(head) ⇒ Object
Adds a fitting section, subsection, etc. for the heading.
# File 'lib/rdoc/markup/to_latex.rb', line 325
def accept_heading(head)
  #Verbatim text inside headings is one of LaTeX’s ways to hell.
  #We need to take special care of this by means of fancyvrb’s
  #\SaveVerb command plus suppressing the verbatim inside the TOC.
  hsh = save_verbs(enc(head.text))

  if hsh[:save_verbs].empty?
    @result << sprintf(LATEX_HEADINGS[@heading_level + head.level], hsh[:save_inline]) << "\n"
  else #OK, some fool must have verbatim in the heading...
    @result << hsh[:save_verbs]
    heading = LATEX_OPT_HEADINGS[@heading_level + head.level]
    if heading
      @result << sprintf(heading, hsh[:plain_inline], hsh[:save_inline]) << "\n"
    else #Heading not in TOC
      @result << sprintf(LATEX_HEADINGS[@heading_level + head.level], hsh[:save_inline]) << "\n"
    end
  end
end
#accept_list_end(list) ⇒ Object
Adds \end{list_type}.
# File 'lib/rdoc/markup/to_latex.rb', line 289
def accept_list_end(list)
  @result << LIST_TYPE2LATEX[list.type][1] << "\n"
  @list_in_progress = nil
end
#accept_list_item_end(item) ⇒ Object
Adds a terminating \n for a list item if this is necessary (usually the newline is automatically created by processing the list paragraph).
# File 'lib/rdoc/markup/to_latex.rb', line 314
def accept_list_item_end(item)
  @result << "\n" unless @result.end_with?("\n")
end
#accept_list_item_start(item) ⇒ Object
Adds \item, with the label as its optional argument if the item has one.
# File 'lib/rdoc/markup/to_latex.rb', line 295
def accept_list_item_start(item)
  if item.label
    #Verbatim inside list labels is dangerous!
    hsh = save_verbs(enc(item.label))
    @result << hsh[:save_verbs]

    if @list_in_progress == :NOTE
      @result << "\\item[#{hsh[:save_inline]}:] " #Newline done by ending paragraph
    else
      @result << "\\item[#{hsh[:save_inline]}] " #Newline done by ending paragraph
    end
  else
    @result << "\\item " #Newline done by ending method
  end
end
#accept_list_start(list) ⇒ Object
Adds \begin{list_type}.
# File 'lib/rdoc/markup/to_latex.rb', line 283
def accept_list_start(list)
  @list_in_progress = list.type
  @result << LIST_TYPE2LATEX[list.type][0] << "\n"
end
#accept_paragraph(par) ⇒ Object
Adds par’s text plus newline to the result.
# File 'lib/rdoc/markup/to_latex.rb', line 267
def accept_paragraph(par)
  @result << to_latex(enc(par.text)) << "\n"
end
#accept_raw(raw) ⇒ Object
Writes the raw thing as-is into the document.
# File 'lib/rdoc/markup/to_latex.rb', line 345
def accept_raw(raw)
  @result << raw.parts.join("\n")
end
#accept_rule(rule) ⇒ Object
Adds a rule. The rule’s height is rule.weight pt, the rule’s width \textwidth.
# File 'lib/rdoc/markup/to_latex.rb', line 278
def accept_rule(rule)
  @result << "\\par\\noindent\\rule{\\textwidth}{" << rule.weight.to_s << "pt}\\par\n"
end
#accept_verbatim(ver) ⇒ Object
Puts ver’s text between \begin{Verbatim} and \end{Verbatim}.
# File 'lib/rdoc/markup/to_latex.rb', line 272
def accept_verbatim(ver)
  @result << "\\begin{Verbatim}\n" << enc(ver.text).chomp << "\n\\end{Verbatim}\n"
end
#convert_string(str) ⇒ Object
Called for each plaintext string in a paragraph by the #convert_flow method called in #to_latex.
# File 'lib/rdoc/markup/to_latex.rb', line 356
def convert_string(str)
  if in_tt?
    enc(str)
  else
    escape(enc(str))
  end
end
#end_accepting ⇒ Object
Last method called. Supposed to return the result string.
# File 'lib/rdoc/markup/to_latex.rb', line 262
def end_accepting
  @result
end
#escape(str) ⇒ Object
Escapes all LaTeX control characters from a string.
Parameter
- str
-
The string whose LaTeX control characters are to be escaped.
Return value
A new string with many backslashes. :-)
Example
f = RDoc::Markup::ToLaTeX.new
str = "I paid 20$ to buy the_item #15."
puts f.escape(str) #=> I paid 20\$ to buy the\textunderscore{}item \#15.
# File 'lib/rdoc/markup/to_latex.rb', line 386
def escape(str)
  result = str.dup
  LATEX_SPECIAL_CHARS.each_pair do |regexp, escape_seq|
    result.gsub!(regexp, escape_seq)
  end
  result
end
#handle_special_HYPERLINK(special) ⇒ Object
Handles raw hyperlinks.
# File 'lib/rdoc/markup/to_latex.rb', line 350
def handle_special_HYPERLINK(special)
  make_url(special.text)
end
#handle_special_TIDYLINK(special) ⇒ Object
# File 'lib/rdoc/markup/to_latex.rb', line 367
def handle_special_TIDYLINK(special)
  text = enc(special.text)
  return escape(text) unless text =~ /\{(.*?)\}\[(.*?)\]/ or text =~ /(\S+)\[(.*?)\]/

  label = $1
  url = $2

  make_url url, escape(label)
end
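The two patterns used above can be tried in isolation. A small sketch: split_tidylink is a hypothetical helper that returns the label and the URL instead of passing them on to make_url, which is not shown in this documentation.

```ruby
# Splits a tidy link into [label, url] with the two patterns from
# handle_special_TIDYLINK; returns nil if neither pattern matches.
def split_tidylink(text)
  if text =~ /\{(.*?)\}\[(.*?)\]/ || text =~ /(\S+)\[(.*?)\]/
    [$1, $2]
  end
end

p split_tidylink("{Ruby home}[http://www.ruby-lang.org]") # label with spaces
p split_tidylink("Ruby[http://www.ruby-lang.org]")        # single-word label
p split_tidylink("no link here")                          # neither pattern matches
```

The braced form is tried first, so a label containing spaces is captured as a whole; the second pattern only covers labels without whitespace.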
#start_accepting ⇒ Object
First method called.
# File 'lib/rdoc/markup/to_latex.rb', line 257
def start_accepting
  @result = ""
end