Class: RDocF95::Markup
- Inherits:
-
Object
- Object
- RDocF95::Markup
- Defined in:
- lib/rdoc-f95/markup.rb,
lib/rdoc-f95/markup/lines.rb,
lib/rdoc-f95/markup/inline.rb,
lib/rdoc-f95/markup/to_flow.rb,
lib/rdoc-f95/markup/fragments.rb
Overview
RDocF95::Markup parses plain text documents and attempts to decompose them into their constituent parts. Some of these parts are high-level: paragraphs, chunks of verbatim text, list entries and the like. Other parts happen at the character level: a piece of bold text, a word in code font. This markup is similar in spirit to that used on WikiWiki webs, where folks create web pages using a simple set of formatting rules.
RDocF95::Markup itself does no output formatting: this is left to a different set of classes.
RDocF95::Markup is extendable at runtime: you can add new markup elements to be recognised in the documents that RDocF95::Markup parses.
RDocF95::Markup is intended to be the basis for a family of tools which share the common requirement that simple plain text should be rendered in a variety of different output formats and media. It is envisaged that RDocF95::Markup could be the basis for formatting RDoc style comment blocks, Wiki entries, and online FAQs.
Basic Formatting
-
RDocF95::Markup looks for a document’s natural left margin. This is used as the initial margin for the document.
-
Consecutive lines starting at this margin are considered to be a paragraph.
-
If a paragraph starts with a “*”, “-”, or with “<digit>.”, then it is taken to be the start of a list. The margin is increased to be the first non-space following the list start flag. Subsequent lines should be indented to this new margin until the list ends. For example:
  * this is a list with three paragraphs in
    the first item. This is the first paragraph.

    And this is the second paragraph.

    1. This is an indented, numbered list.
    2. This is the second item in that list

    This is the third conventional paragraph in the
    first list item.

  * This is the second item in the original list
-
You can also construct labeled lists, sometimes called description or definition lists. Do this by putting the label in square brackets and indenting the list body:
  [cat] a small furry mammal that seems to sleep a lot
  [ant] a little insect that is known to enjoy picnics
A minor variation on labeled lists uses two colons to separate the label from the list body:
  cat:: a small furry mammal that seems to sleep a lot
  ant:: a little insect that is known to enjoy picnics
This latter style guarantees that the list bodies’ left margins are aligned: think of them as a two column table.
-
Any line that starts to the right of the current margin is treated as verbatim text. This is useful for code listings. The example of a list above is also verbatim text.
-
A line starting with an equals sign (=) is treated as a heading. Level one headings have one equals sign, level two headings have two, and so on.
-
A line starting with three or more hyphens (at the current indent) generates a horizontal rule. The more hyphens, the thicker the rule (within reason, and if supported by the output device).
-
You can use markup within text (except verbatim) to change the appearance of parts of that text. Out of the box, RDocF95::Markup supports word-based and general markup.
Word-based markup uses flag characters around individual words:
- *word*
-
displays word in a bold font
- _word_
-
displays word in an emphasized font
- +word+
-
displays word in a code font
General markup affects text between a start delimiter and an end delimiter. Not surprisingly, these delimiters look like HTML markup.
- <b>text…</b>
-
displays word in a bold font
- <em>text…</em>
-
displays word in an emphasized font
- <i>text…</i>
-
displays word in an emphasized font
- <tt>text…</tt>
-
displays word in a code font
Unlike conventional Wiki markup, general markup can cross line boundaries. You can turn off the interpretation of markup by preceding the first character with a backslash, so \<b>bold text</b> and \*bold* produce <b>bold text</b> and *bold* respectively.
-
Hyperlinks to the web starting http:, mailto:, ftp:, or www. are recognized. An HTTP URL that references an external image file is converted into an inline <IMG..>. Hyperlinks starting ‘link:’ are assumed to refer to local files whose path is relative to the --op directory.
Hyperlinks can also be of the form label[url], in which case the label is used in the displayed text, and url is used as the target. If label contains multiple words, put it in braces: {multi word label}[url].
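The block-level rules above (natural margin, verbatim text, headings, rules and list starts) can be sketched in plain Ruby. This is a simplified illustration, not the parser's own code: the helper names `natural_margin` and `classify_line` are hypothetical, and labeled lists and nested margins are deliberately left out.

```ruby
# Hypothetical helper: the natural left margin is taken here to be the
# smallest indent found on any non-blank line.
def natural_margin(lines)
  lines.reject { |l| l.strip.empty? }
       .map { |l| l[/\A */].length }
       .min || 0
end

# Hypothetical helper: classify a single line relative to the margin.
# Returns a small array whose first element is the line type.
def classify_line(line, margin)
  return [:blank] if line.strip.empty?

  indent = line[/\A */].length
  return [:verbatim] if indent > margin   # right of the margin

  text = line[margin..-1]
  case text
  when /\A(=+)\s*(.*)/       then [:heading, $1.length]  # level = number of "="
  when /\A(-{3,})\s*\z/      then [:rule, $1.length]     # hyphen count
  when /\A(\*|-|\d+\.)\s+\S/ then [:list]
  else                            [:paragraph]
  end
end

sample = ["  = Title", "  Some text", "  * item", "      code", "  ----", ""]
margin = natural_margin(sample)
sample.each { |l| p [l, classify_line(l, margin)] }
```

The rule check runs before the list check so that a line of hyphens is not mistaken for a "-" bullet.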
Synopsis
This code converts input_string to HTML. The conversion takes place in the convert method, so you can use the same RDocF95::Markup object to convert multiple input strings.
require 'rdoc-f95/markup'
require 'rdoc-f95/markup/to_html'
p = RDocF95::Markup.new
h = RDocF95::Markup::ToHtml.new
puts p.convert(input_string, h)
You can extend the RDocF95::Markup parser to recognise new markup sequences, and to add special processing for text that matches a regular expression. Here we make WikiWords significant to the parser, and also make the sequences {word} and <no>text…</no> signify strike-through text. We then subclass the HTML output class to deal with these:
require 'rdoc-f95/markup'
require 'rdoc-f95/markup/to_html'
class WikiHtml < RDocF95::Markup::ToHtml
  def handle_special_WIKIWORD(special)
    "<font color=red>" + special.text + "</font>"
  end
end
m = RDocF95::Markup.new
m.add_word_pair("{", "}", :STRIKE)
m.add_html("no", :STRIKE)
m.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
h = WikiHtml.new
h.add_tag(:STRIKE, "<strike>", "</strike>")
puts "<body>" + m.convert(ARGF.read, h) + "</body>"
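One way a formatter could route a named special such as :WIKIWORD to a `handle_special_WIKIWORD` method is name-based dispatch. The sketch below assumes that approach and uses hypothetical stand-in classes (`SketchSpecial`, `SketchFormatter`, `SketchWikiFormatter`). It does not load rdoc-f95 and should not be read as the ToHtml implementation.

```ruby
# Hypothetical stand-in for a matched special sequence.
SketchSpecial = Struct.new(:name, :text)

# Hypothetical base formatter: looks for a handler named after the special.
class SketchFormatter
  def convert_special(special)
    method_name = "handle_special_#{special.name}"
    # Assumption: fall back to the raw text when no handler is defined.
    return special.text unless respond_to?(method_name)

    send(method_name, special)
  end
end

# Subclass adds a handler, mirroring the WikiHtml example above.
class SketchWikiFormatter < SketchFormatter
  def handle_special_WIKIWORD(special)
    "<font color=red>" + special.text + "</font>"
  end
end

formatter = SketchWikiFormatter.new
puts formatter.convert_special(SketchSpecial.new(:WIKIWORD, "FrontPage"))
puts formatter.convert_special(SketchSpecial.new(:OTHER, "plain"))
```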
- Author
-
Dave Thomas
- License
-
Ruby license
Defined Under Namespace
Modules: Flow
Classes: AttrChanger, AttrSpan, Attribute, AttributeManager, BlankLine, Formatter, Fragment, Heading, Line, LineCollection, Lines, ListBase, ListEnd, ListItem, ListStart, Paragraph, PreProcess, Rule, Special, ToFlow, ToHtml, ToHtmlCrossref, ToLaTeX, ToTest, ToXHtmlTexParser, Verbatim
Constant Summary
- SPACE =
?\s
- SIMPLE_LIST_RE =
List entries look like:
  *       text
  1.      text
  [label] text
  label:: text
Flag it as a list entry, and work out the indent for subsequent lines.
/^(
  (  \*          (?# bullet)
    |-           (?# bullet)
    |\d+\.       (?# numbered )
    |[A-Za-z]\.  (?# alphabetically numbered )
  )
  \s+
)\S/x
- LABEL_LIST_RE =
/^(
  (  \[.*?\]    (?# labeled )
    |\S.*::     (?# note )
  )(?:\s+|$)
)/x
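To see what these two patterns capture, the sketch below copies them into local constants and reports the list flag and the width of the prefix. That width is what "work out the indent for subsequent lines" refers to. The helper `list_prefix` is hypothetical and is not part of RDocF95::Markup.

```ruby
# Local copies of the two patterns listed above.
SIMPLE_LIST_RE = /^(
  (  \*          (?# bullet)
    |-           (?# bullet)
    |\d+\.       (?# numbered )
    |[A-Za-z]\.  (?# alphabetically numbered )
  )
  \s+
)\S/x

LABEL_LIST_RE = /^(
  (  \[.*?\]    (?# labeled )
    |\S.*::     (?# note )
  )(?:\s+|$)
)/x

# Hypothetical helper: group 2 is the list flag, group 1 is the flag plus
# trailing whitespace, so its length is the indent for following lines.
def list_prefix(line)
  if (m = SIMPLE_LIST_RE.match(line))
    { kind: :simple, flag: m[2], indent: m[1].length }
  elsif (m = LABEL_LIST_RE.match(line))
    { kind: :labeled, flag: m[2], indent: m[1].length }
  end
end

["* text", "12. text", "[cat] a small furry mammal",
 "cat:: a small furry mammal", "plain paragraph"].each do |line|
  p [line, list_prefix(line)]
end
```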
Instance Method Summary
-
#add_html(tag, name) ⇒ Object
Add to the sequences recognized as general markup.
-
#add_special(pattern, name) ⇒ Object
Add to other inline sequences.
-
#add_word_pair(start, stop, name) ⇒ Object
Add to the sequences used to add formatting to an individual word (such as bold).
-
#content ⇒ Object
For debugging, we allow access to our line contents as text.
-
#convert(str, op, block_exceptions = nil) ⇒ Object
We take a string, split it into lines, work out the type of each line, and from there deduce groups of lines (for example all lines in a paragraph).
-
#get_line_types ⇒ Object
For debugging, return the list of line types.
-
#initialize ⇒ Markup
constructor
Take a block of text and use various heuristics to determine its structure (paragraphs, lists, and so on).
Constructor Details
#initialize ⇒ Markup
Take a block of text and use various heuristics to determine its structure (paragraphs, lists, and so on). Invoke an event handler as we identify significant chunks.
# File 'lib/rdoc-f95/markup.rb', line 194
def initialize
  @am = RDocF95::Markup::AttributeManager.new
  @output = nil
end
Instance Method Details
#add_html(tag, name) ⇒ Object
Add to the sequences recognized as general markup.
# File 'lib/rdoc-f95/markup.rb', line 211
def add_html(tag, name)
  @am.add_html(tag, name)
end
#add_special(pattern, name) ⇒ Object
Add to other inline sequences. For example, we could add WikiWords using something like:
parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
Each wiki word will be presented to the output formatter via the accept_special method.
# File 'lib/rdoc-f95/markup.rb', line 224
def add_special(pattern, name)
  @am.add_special(pattern, name)
end
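Before registering a pattern it can help to check what it matches. The following plain-Ruby sketch applies the WikiWord expression from the example to a sample sentence. The constant name `WIKIWORD_RE` and the sample text are illustrative only, and nothing here calls into RDocF95::Markup.

```ruby
# The same expression passed to add_special in the example above.
WIKIWORD_RE = /\b([A-Z][a-z]+[A-Z]\w+)/

text = "See FrontPage and RecentChanges, but not Ruby or HTML."

# Collect every WikiWord: a capital, some lowercase letters, then a second
# capital followed by more word characters.
words = text.scan(WIKIWORD_RE).flatten
p words

# Wrap each match, much as an output formatter's handler might.
puts text.gsub(WIKIWORD_RE) { "<font color=red>#{$1}</font>" }
```

Words with a single capital ("Ruby") or no lowercase run after the first capital ("HTML") are left alone.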
#add_word_pair(start, stop, name) ⇒ Object
Add to the sequences used to add formatting to an individual word (such as bold). Matching entries will generate attributes that the output formatters can recognize by their name.
# File 'lib/rdoc-f95/markup.rb', line 204
def add_word_pair(start, stop, name)
  @am.add_word_pair(start, stop, name)
end
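As a rough picture of what a word pair means, the sketch below turns (start, stop, name) triples into regular expressions and wraps matching words in tags. It is an assumption-laden illustration. The tables `SKETCH_PAIRS` and `SKETCH_TAGS` and the helper `sketch_apply_word_pairs` are hypothetical, and the real work is delegated to the AttributeManager, which generates attributes rather than rewriting text directly.

```ruby
# Hypothetical registry of word pairs, shaped like add_word_pair's arguments.
SKETCH_PAIRS = [["*", "*", :BOLD], ["_", "_", :EM], ["+", "+", :TT]]

# Hypothetical mapping from attribute name to an HTML tag.
SKETCH_TAGS = { BOLD: "b", EM: "em", TT: "tt" }

def sketch_apply_word_pairs(text)
  SKETCH_PAIRS.reduce(text) do |acc, (start, stop, name)|
    tag = SKETCH_TAGS.fetch(name)
    # Match a single word wrapped in the flag characters, and only when the
    # flags are not glued to surrounding word characters.
    pattern = /(?<!\w)#{Regexp.escape(start)}(\w+)#{Regexp.escape(stop)}(?!\w)/
    acc.gsub(pattern) { "<#{tag}>#{$1}</#{tag}>" }
  end
end

puts sketch_apply_word_pairs("a *bold* and _em_ and +code+ word")
```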
#content ⇒ Object
For debugging, we allow access to our line contents as text.
# File 'lib/rdoc-f95/markup.rb', line 489
def content
  @lines.as_text
end
#convert(str, op, block_exceptions = nil) ⇒ Object
We take a string, split it into lines, work out the type of each line, and from there deduce groups of lines (for example all lines in a paragraph). We then invoke the output formatter using a Visitor to display the result.
# File 'lib/rdoc-f95/markup.rb', line 234
def convert(str, op, block_exceptions=nil)
  lines = str.split(/\r?\n/).map { |line| Line.new line }
  @lines = Lines.new lines
  @block_exceptions = block_exceptions

  return "" if @lines.empty?
  @lines.normalize
  assign_types_to_lines
  group = group_lines
  # call the output formatter to handle the result
  #group.each { |line| p line }
  group.accept @am, op
end
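The split, classify and visit flow described here can be illustrated with a toy version of the Visitor pattern. Every class and method name in this sketch (`SketchHeading`, `SketchParagraph`, `SketchHtmlVisitor`, `sketch_convert`) is hypothetical. Unlike the real method, it treats each non-blank line as its own fragment instead of grouping lines.

```ruby
# Hypothetical fragment types; each knows which visitor method to call.
SketchHeading = Struct.new(:level, :text) do
  def accept(visitor)
    visitor.accept_heading(self)
  end
end

SketchParagraph = Struct.new(:text) do
  def accept(visitor)
    visitor.accept_paragraph(self)
  end
end

# Hypothetical output formatter acting as the visitor.
class SketchHtmlVisitor
  def initialize
    @out = []
  end

  def accept_heading(heading)
    @out << "<h#{heading.level}>#{heading.text}</h#{heading.level}>"
  end

  def accept_paragraph(para)
    @out << "<p>#{para.text}</p>"
  end

  def result
    @out.join("\n")
  end
end

# Split on LF or CRLF, classify each non-blank line, then let the visitor
# render the fragments in order.
def sketch_convert(str, visitor)
  fragments = str.split(/\r?\n/).reject { |l| l.strip.empty? }.map do |line|
    if line =~ /\A(=+)\s*(.*)/
      SketchHeading.new($1.length, $2)
    else
      SketchParagraph.new(line)
    end
  end
  fragments.each { |fragment| fragment.accept(visitor) }
  visitor.result
end

puts sketch_convert("= Title\r\nHello world\n", SketchHtmlVisitor.new)
```

Keeping the rendering in the visitor means a different output format only needs a different visitor class, which matches the statement that RDocF95::Markup itself does no output formatting.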
#get_line_types ⇒ Object
For debugging, return the list of line types.
# File 'lib/rdoc-f95/markup.rb', line 497
def get_line_types
  @lines.line_types
end