Class: Text::Reform
- Inherits:
-
Object
- Object
- Text::Reform
- Defined in:
- lib/text/reform.rb
Overview
Introduction
The Text::Reform class is a rewrite of the Perl module of the same name by Damian Conway. Much of this documentation has been copied from the original documentation and adapted to the Ruby version.
The interface is subject to change, since it will undergo major Rubyfication.
Synopsis
require 'text/reform'
f = Text::Reform.new
puts f.format(template, data)
Description
The Reform#format method
Reform#format takes a series of format (or “picture”) strings followed by replacement values, interpolates those values into each picture string, and returns the result.
A picture string consists of sequences of the following characters:
- <
-
Left-justified field indicator. A series of two or more sequential <’s specifies a left-justified field to be filled by a subsequent value. A single < is formatted as the literal character ‘<’.
- >
-
Right-justified field indicator. A series of two or more sequential >’s specifies a right-justified field to be filled by a subsequent value. A single > is formatted as the literal character ‘>’.
- <<>>
-
Fully-justified field indicator. Field may be of any width, and brackets need not balance, but there must be at least 2 ‘<’ and 2 ‘>’.
- ^
-
Centre-justified field indicator. A series of two or more sequential ^’s specifies a centred field to be filled by a subsequent value. A single ^ is formatted as the literal character ‘^’.
- >>.<<<<
-
A numerically formatted field with the specified number of digits to either side of the decimal place. See _Numerical formatting_ below.
- [
-
Left-justified block field indicator. Just like a < field, except it repeats as required on subsequent lines. See below. A single [ is formatted as the literal character ‘[’.
- ]
-
Right-justified block field indicator. Just like a > field, except it repeats as required on subsequent lines. See below. A single ] is formatted as the literal character ‘]’.
- [[]]
-
Fully-justified block field indicator. Just like a <<<>>> field, except it repeats as required on subsequent lines. See below. Field may be of any width, and brackets need not balance, but there must be at least 2 ‘[’ and 2 ‘]’.
- |
-
Centre-justified block field indicator. Just like a ^ field, except it repeats as required on subsequent lines. See below. A single | is formatted as the literal character ‘|’.
- ]]].[[[[
-
A numerically formatted block field with the specified number of digits to either side of the decimal place. Just like a >>>.<<<< field, except it repeats as required on subsequent lines. See below.
- ~
-
A one-character wide block field.
- \
-
Literal escape of next character (e.g. \~ is formatted as ‘~’, not a one character wide block field).
- Any other character
-
That literal character.
Any substitution value which is nil (either explicitly so, or because it is missing) is replaced by an empty string.
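As a rough illustration of how the field indicators above can be recognised, the sketch below reuses the pattern fragments listed under Constant Summary on this page to scan a picture string. The `fields_in` helper is hypothetical and is not part of Text::Reform; the library's own parser may work differently.

```ruby
# Pattern fragments copied from the Constant Summary of this page.
BSPECIALS  = %w{ [ | ] }
LSPECIALS  = %w{ < ^ > }
LJUSTIFIED = "[<]{2,} [>]{2,}"
BJUSTIFIED = "[\\[]{2,} [\\]]{2,}"
BSINGLE    = "~+"
SPECIALS   = [BSPECIALS, LSPECIALS].flatten.map { |spec| Regexp.escape(spec) + "{2,}" }
FIXED_FIELDPAT = [LJUSTIFIED, BJUSTIFIED, BSINGLE, SPECIALS].flatten.join('|')
DECIMAL    = '.'
LNUMERICAL = "[>]+ (?:#{Regexp.escape(DECIMAL)}[<]{1,})"
BNUMERICAL = "[\\]]+ (?: #{Regexp.escape(DECIMAL)} [\\[]{1,})"
FIELDPAT   = [LNUMERICAL, BNUMERICAL, FIXED_FIELDPAT].join('|')

# Hypothetical helper: list the fields found in a picture string.
# The /x flag makes the regexp ignore the spaces inside the fragments.
def fields_in(picture)
  picture.scan(/#{FIELDPAT}/x)
end

p fields_in(" |||| <<<<<<<<<< ")  # => ["||||", "<<<<<<<<<<"]
p fields_in("(]]]]].[[)")         # => ["]]]]].[["]
p fields_in("a < b")              # => [] (a single < is a literal)
```

Numeric patterns are tried before the plain ones, so `]]]]].[[` is reported as one numeric field rather than as two separate block fields.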
Controlling Reform instance options
There are several ways to influence options set in the Reform instance:
-
At creation:
# using a hash
r1 = Text::Reform.new(:squeeze => true)
# using a block
r2 = Text::Reform.new do |rf|
  rf.squeeze = true
  rf.fill = true
end
-
Using accessors:
r = Text::Reform.new
r.squeeze = true
r.fill = true
The Perl way of interleaving option changes with picture strings and data is currently NOT supported.
Controlling line filling
#squeeze causes sequences of spaces or tabs to be replaced with a single space; #fill removes newlines from the input. To minimize all whitespace, you need to specify both options. Hence:
format = "EG> [[[[[[[[[[[[[[[[[[[[["
data = "h e\t l lo\nworld\t\t\t\t\t"
r = Text::Reform.new
r.squeeze = false # default, implied
r.fill = false # default, implied
puts r.format(format, data)
# all whitespace preserved:
#
# EG> h e l lo
# EG> world
r.squeeze = true
r.fill = false # default, implied
puts r.format(format, data)
# only newlines preserved
#
# EG> h e l lo
# EG> world
r.squeeze = false # default, implied
r.fill = true
puts r.format(format, data)
# only spaces/tabs preserved:
#
# EG> h e l lo world
r.fill = true
r.squeeze = true
puts r.format(format, data)
# no whitespace preserved:
#
# EG> h e l lo world
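The effect of the two options on a value can be approximated in plain Ruby. This is only a sketch: `normalize_whitespace` is a hypothetical helper, and it assumes that #fill turns each newline into a space, which the library may implement differently.

```ruby
# Hypothetical helper, not part of Text::Reform: applies the two
# whitespace options to a value before it would be placed in a field.
def normalize_whitespace(value, squeeze: false, fill: false)
  text = value.dup
  text = text.gsub(/\n/, " ") if fill         # assumption: newline becomes a space
  text = text.gsub(/[ \t]+/, " ") if squeeze  # runs of spaces/tabs become one space
  text
end

data = "h e\t l lo\nworld\t\t\t\t\t"
p normalize_whitespace(data, squeeze: true)              # => "h e l lo\nworld "
p normalize_whitespace(data, squeeze: true, fill: true)  # => "h e l lo world "
```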
Whether or not filling or squeezing is in effect, #format can also be directed to trim any extra whitespace from the end of each line it formats, using the #trim option. If this option is specified with a true value, every line returned by #format will automatically have a substitution equivalent to .gsub!(/[ \t]+$/, '') applied to it.
r.format("[[[[[[[[[[[", 'short').length # => 11
r.trim = true
r.format("[[[[[[[[[[[", 'short').length # => 6
It is also possible to control the character used to fill lines that are too short, using the #filler option. If this option is specified the value of the #filler flag is used as the fill string, rather than the default “ ”.
For example:
r.filler = '*'
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4')
prints:
Pay bearer: *******$123.4*******
If the filler string is longer than one character, it is truncated to the appropriate length. So:
r.filler = '-->'
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4')
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$13.4')
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$1.4')
prints:
Pay bearer: -->-->-$123.4-->-->-
Pay bearer: -->-->--$13.4-->-->-
Pay bearer: -->-->--$1.4-->-->--
If the value of the #filler option is a hash, then its :left and :right entries specify separate filler strings for each side of an interpolated value.
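The truncation shown in the centred examples above can be reproduced with a few lines of Ruby. This is a sketch, not the library's code: `filler_run` and `centre_with_filler` are hypothetical names, and it assumes that each side restarts the filler pattern from its first character and that an odd leftover column goes to the left side.

```ruby
# Repeat the filler and cut it to exactly +width+ characters.
def filler_run(filler, width)
  (filler * width)[0, width]
end

# Centre +value+ in a field of +width+ columns using +filler+, which
# may be a string or a hash with :left and :right entries.
def centre_with_filler(value, width, filler)
  left_fill, right_fill =
    filler.is_a?(Hash) ? [filler[:left], filler[:right]] : [filler, filler]
  total = [width - value.length, 0].max
  left  = (total + 1) / 2   # assumption: the extra column goes left
  right = total - left
  filler_run(left_fill, left) + value + filler_run(right_fill, right)
end

puts centre_with_filler('$123.4', 20, '-->')  # -->-->-$123.4-->-->-
puts centre_with_filler('$13.4', 20, '-->')   # -->-->--$13.4-->-->-
puts centre_with_filler('$123.4', 20, :left => '<', :right => '>')
```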
Options
The Perl variant supports option switching during processing of the arguments of a single call to #format. This has been removed while porting to Ruby, since I believe that this does not add to clarity of code. So you have to change options explicitly.
Data argument types and handling
The data part of the call to format can be either in String form, the items being newline separated, or in Array form. The array form can contain any kind of type you want, as long as it supports #to_s.
So all of the following examples return the same result:
# String form
r.format("]]]].[[", "1234\n123")
# Array form
r.format("]]]].[[", [ 1234, 123 ])
# Array with another type
r.format("]]]].[[", [ 1234.0, 123.0 ])
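One way to picture the two data forms is as a list of items. The helper below is hypothetical and is not taken from the library; it only shows that a newline-separated String and an Array reduce to the same kind of list once #to_s is applied.

```ruby
# Hypothetical helper: turn either data form into a list of strings.
def data_items(data)
  data.is_a?(Array) ? data.map(&:to_s) : data.to_s.split("\n")
end

p data_items("1234\n123")      # => ["1234", "123"]
p data_items([1234, 123])      # => ["1234", "123"]
p data_items([1234.0, 123.0])  # => ["1234.0", "123.0"]
```

The Float items differ textually from the Integer ones; in the examples above they produce the same result only because the numeric field rounds and pads the value.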
Multi-line format specifiers and interleaving
By default, if a format specifier contains two or more lines (i.e. one or more newline characters), the entire format specifier is repeatedly filled as a unit, until all block fields have consumed their corresponding arguments. For example, to build a simple look-up table:
values = (1..12).to_a
squares = values.map { |el| sprintf "%.6g", el**2 }
roots = values.map { |el| sprintf "%.6g", Math.sqrt(el) }
logs = values.map { |el| sprintf "%.6g", Math.log(el) }
inverses = values.map { |el| sprintf "%.6g", 1.0/el }
puts reform.format(
" N N**2 sqrt(N) log(N) 1/N",
"=====================================================",
"| [[ | [[[ | [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |" +
"-----------------------------------------------------",
values, squares, roots, logs, inverses
)
The multiline format specifier:
"| [[ | [[[ | [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |" +
"-----------------------------------------------------"
is treated as a single logical line. So #format alternately fills the first physical line (interpolating one value from each of the arrays) and the second physical line (which puts a line of dashes between each row of the table) producing:
N N**2 sqrt(N) log(N) 1/N
=====================================================
| 1 | 1 | 1 | 0 | 1 |
-----------------------------------------------------
| 2 | 4 | 1.41421 | 0.693147 | 0.5 |
-----------------------------------------------------
| 3 | 9 | 1.73205 | 1.09861 | 0.333333 |
-----------------------------------------------------
| 4 | 16 | 2 | 1.38629 | 0.25 |
-----------------------------------------------------
| 5 | 25 | 2.23607 | 1.60944 | 0.2 |
-----------------------------------------------------
| 6 | 36 | 2.44949 | 1.79176 | 0.166667 |
-----------------------------------------------------
| 7 | 49 | 2.64575 | 1.94591 | 0.142857 |
-----------------------------------------------------
| 8 | 64 | 2.82843 | 2.07944 | 0.125 |
-----------------------------------------------------
| 9 | 81 | 3 | 2.19722 | 0.111111 |
-----------------------------------------------------
| 10 | 100 | 3.16228 | 2.30259 | 0.1 |
-----------------------------------------------------
| 11 | 121 | 3.31662 | 2.3979 | 0.0909091 |
-----------------------------------------------------
| 12 | 144 | 3.4641 | 2.48491 | 0.0833333 |
-----------------------------------------------------
This implies that formats and the variables from which they’re filled need to be interleaved. That is, a multi-line specification like this:
puts r.format(
"Passed: ##
[[[[[[[[[[[[[[[ # single format specification
Failed: # (needs two sets of data)
[[[[[[[[[[[[[[[", ##
passes, fails) ## data for previous format
would print:
Passed:
<pass 1>
Failed:
<fail 1>
Passed:
<pass 2>
Failed:
<fail 2>
Passed:
<pass 3>
Failed:
<fail 3>
because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes and fails has been consumed.
Unlike the table example, where this unit filling correctly put a line of dashes between lines of data, in this case the alternation of passes and fails is probably not the desired effect.
Judging by the labels, it is far more likely that the user wanted:
Passed:
<pass 1>
<pass 2>
<pass 3>
Failed:
<fail 1>
<fail 2>
<fail 3>
To achieve that, either explicitly interleave the formats and their data sources:
puts r.format(
"Passed:", ## single format (no data required)
" [[[[[[[[[[[[[[[", ## single format (needs one set of data)
passes, ## data for previous format
"Failed:", ## single format (no data required)
" [[[[[[[[[[[[[[[", ## single format (needs one set of data)
fails) ## data for previous format
or instruct #format to do it for you automagically, by setting the ‘interleave’ flag to true:
r.interleave = true
puts r.format(
"Passed: ##
[[[[[[[[[[[[[[[ # single format
Failed: # (needs two sets of data)
[[[[[[[[[[[[[[[", ##
## data to be automagically interleaved
passes, fails) # as necessary between lines of previous
## format
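The difference between the two behaviours comes down to the order in which lines are emitted. The sketch below models only that ordering with plain arrays; `unit_order` and `interleaved_order` are hypothetical names, both arrays are assumed to have the same length, and no field formatting is performed.

```ruby
passes = ["<pass 1>", "<pass 2>", "<pass 3>"]
fails  = ["<fail 1>", "<fail 2>", "<fail 3>"]

# Default behaviour: the four-line format is filled as one unit,
# taking one value from each array per repetition.
def unit_order(passes, fails)
  passes.zip(fails).flat_map do |passed, failed|
    ["Passed:", "  #{passed}", "Failed:", "  #{failed}"]
  end
end

# With interleave set: each format line is filled until its own
# data is used up before the next format line starts.
def interleaved_order(passes, fails)
  ["Passed:"] + passes.map { |item| "  #{item}" } +
    ["Failed:"] + fails.map { |item| "  #{item}" }
end

puts unit_order(passes, fails)
puts interleaved_order(passes, fails)
```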
How #format hyphenates
Any line with a block field repeats on subsequent lines until all block fields on that line have consumed all their data. Non-block fields on these lines are replaced by the appropriate number of spaces.
Words are wrapped whole, unless they will not fit into the field at all, in which case they are broken and (by default) hyphenated. Simple hyphenation is used (i.e. break at the N-1th character and insert a ‘-’), unless a suitable alternative subroutine is specified instead.
Words will not be broken if the break would leave less than 2 characters on the current line. This minimum can be varied by setting the min_break option to a numeric value indicating the minimum total broken characters (including hyphens) required on the current line. Note that, for very narrow fields, words will still be broken (but unhyphenated). For example:
puts r.format('~', 'split')
would print:
s
p
l
i
t
whilst:
r.min_break = 1
puts r.format('~', 'split')
would print:
s-
p-
l-
i-
t
Alternative breaking strategies can be specified using the “break” option in a configuration hash. For example:
r.break = MyBreaker.new
r.format(fmt, data)
#format expects a user-defined line-breaking strategy to respond to the method #break, which takes three arguments (the string to be broken, the maximum permissible length of the initial section, and the total width of the field being filled). #break must return a list of two strings: the initial (broken) section of the word, and the remainder of the string, respectively.
For example:
class MyBreaker
def break(str, initial, total)
[ str[0, initial-1] + '~', str[initial-1..-1] ]
end
end
r.break = MyBreaker.new
makes ‘~’ the hyphenation character, whilst:
class WrapAndSlop
def break(str, initial, total)
if (initial == total)
str =~ /\A(\s*\S*)(.*)/m
[ $1, $2 ]
else
[ '', str ]
end
end
end
r.break = WrapAndSlop.new
wraps excessively long words to the next line and “slops” them over the right margin if necessary.
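Because a strategy is just an object with a three-argument #break method, it can be exercised on its own before being assigned to r.break. The class below is a hypothetical sketch of the simple hyphenation rule described earlier (break at the N-1th character and append a hyphen); it is not the library's BreakWith class, and its handling of one-column fields is an assumption.

```ruby
# Hypothetical strategy object following the #break protocol.
class SimpleHyphen
  def initialize(hyphen = "-")
    @hyphen = hyphen
  end

  # str:     the text that does not fit
  # initial: columns still free on the current line
  # total:   full width of the field
  # Returns [part for this line, remainder].
  def break(str, initial, total)
    keep = initial - @hyphen.length
    # Too narrow for a hyphen: break without one (assumption).
    return [str[0, initial].to_s, str[initial..-1].to_s] if keep < 1
    [str[0, keep] + @hyphen, str[keep..-1].to_s]
  end
end

head, rest = SimpleHyphen.new.break("methodology", 6, 14)
p [head, rest]  # => ["metho-", "dology"]
```

Checking the returned pair this way makes it easy to confirm that the first part never exceeds `initial` columns and that no characters of the word are lost.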
The Text::Reform class provides four functions to simplify the use of variant hyphenation schemes. Text::Reform::break_wrap returns an instance implementing the “wrap-and-slop” algorithm shown in the last example, which could therefore be rewritten:
r.break = Text::Reform.break_wrap
Text::Reform::break_with takes a single string argument and returns an instance of a class which hyphenates by cutting off the text at the right margin and appending the string argument. Hence the first of the two examples could be rewritten:
r.break = Text::Reform.break_with('~')
The method Text::Reform::break_at takes a single string argument and returns an instance of a class which hyphenates by breaking immediately after that string. For example:
r.break = Text::Reform.break_at('-')
r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")
returns:
"The Newton-
Raphson
methodology"
Note that this differs from the behaviour of Text::Reform::break_with, which would be:
r.break = Text::Reform.break_with('-')
r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")
returns:
"The Newton-R-
aphson metho-
dology"
Choosing the correct breaking strategy depends on your kind of data.
The method Text::Reform::break_hyphenator returns an instance of a class which hyphenates using a Ruby hyphenator. The hyphenator must be provided to the method. At the time of release, there are two implementations of hyphenators available: TeX::Hyphen by Martin DeMello and Austin Ziegler (a Ruby port of Jan Pazdziora’s TeX::Hyphen module); and Text::Hyphen by Austin Ziegler (a significant recoding of TeX::Hyphen to better support non-English languages).
For example:
r.break = Text::Reform.break_hyphenator(hyphenator)
Note that in the previous examples the calls to .break_at, .break_wrap and .break_hyphenator produce instances of the corresponding strategy class.
The algorithm #format uses is:
1. If interleaving is specified, split the first string in the argument list into individual format lines and add a terminating newline (unless one is already present). Otherwise, treat the entire string as a single “line” (like /s does in Perl regexes).
2. For each format line:
2.1. Determine the number of fields and shift that many values off the argument list and into the filling list. If insufficient arguments are available, generate as many empty strings as are required.
2.2. Generate a text line by filling each field in the format line with the initial contents of the corresponding arg in the filling list (and remove those initial contents from the arg).
2.3. Replace any <, >, or ^ fields by an equivalent number of spaces. Splice out the corresponding args from the filling list.
2.4. Repeat from step 2.2 until all args in the filling list are empty.
3. Concatenate the text lines generated in step 2.
Note that, unlike the Perl version of Text::Reform, this version does not currently loop over several format strings in one function call.
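The core of the loop, taking the initial contents of an argument for each line until the argument is empty, can be sketched for a single left-justified block field. `fill_block` is a hypothetical helper: it wraps whole words, cuts over-long words without hyphenating them, and ignores every option, so it is far simpler than what #format does.

```ruby
# Fill one left-justified block field of +width+ columns, repeating
# the line (with its literal +prefix+) until the text is used up.
def fill_block(prefix, width, text)
  remaining = text.strip
  lines = []
  until remaining.empty?
    if remaining.length <= width
      chunk = remaining
      remaining = ""
    else
      cut = remaining.rindex(" ", width)      # last space that still fits
      cut = width if cut.nil? || cut.zero?    # no space: cut the word
      chunk = remaining[0, cut]
      remaining = remaining[cut..-1].lstrip   # remove what was consumed
    end
    lines << prefix + chunk.rstrip.ljust(width)
  end
  lines
end

puts fill_block("EG> ", 5, "a bb ccc dddd")
```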
Reform#format examples
As an example of the use of #format, the following:
count = 1
text = "A big long piece of text to be formatted exquisitely"
output = ''
output << r.format(" |||| <<<<<<<<<< ", count, text)
output << r.format(" ---------------- ",
" ^^^^ ]]]]]]]]]]| ", count+11, text)
results in output:
1 A big lon-
----------------
12 g piece|
of text|
to be for-|
matted ex-|
quisitely|
Note that block fields in a multi-line format string cause the entire multi-line format to be repeated as often as necessary.
Unlike traditional Perl #format arguments, picture strings and arguments cannot be interleaved in the Ruby version. This is partly intentional, to see whether the feature is worth keeping or can be dispensed with. Another example:
report = ''
report << r.format(
'Name Rank Serial Number',
'==== ==== =============',
'<<<<<<<<<<<<< ^^^^ <<<<<<<<<<<<<',
name, rank, serial_number
)
results in:
Name Rank Serial Number
==== ==== =============
John Doe high 314159
Numerical formatting
The “>>>.<<<” and “]]].[[[” field specifiers may be used to format numeric values about a fixed decimal place marker. For example:
puts r.format('(]]]]].[[)', %w{
1
1.0
1.001
1.009
123.456
1234567
one two
})
would print:
( 1.0)
( 1.0)
( 1.00)
( 1.01)
( 123.46)
(#####.##)
(?????.??)
(?????.??)
Fractions are rounded to the specified number of places after the decimal, but only significant digits are shown. That’s why, in the above example, 1 and 1.0 are formatted as “1.0”, whilst 1.001 is formatted as “1.00”.
You can specify that the maximal number of decimal places always be used by giving the configuration option ‘numeric’ the value NUMBERS_ALL_PLACES. For example:
r.numeric = Text::Reform::NUMBERS_ALL_PLACES
puts r.format('(]]]]].[[)', <<EONUMS)
1
1.0
EONUMS
would print:
( 1.00)
( 1.00)
Note that although decimal digits are rounded to fit the specified width, the integral part of a number is never modified. If there are not enough places before the decimal place to represent the number, the entire number is replaced with hashes.
If a non-numeric sequence is passed as data for a numeric field, it is formatted as a series of question marks. This querulous behaviour can be changed by giving the configuration option ‘numeric’ the value NUMBERS_SKIP_NAN, in which case any invalid numeric data is simply ignored. For example:
r.numeric = Text::Reform::NUMBERS_SKIP_NAN
puts r.format('(]]]]].[[)', %w{
1
two three
4
})
would print:
( 1.0)
( 4.0)
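The rules described in this section can be summarised in a small stand-alone function. This is a sketch that reproduces the documented outputs under stated assumptions, not the library's implementation: `format_numeric` and its keyword arguments are hypothetical, the integer part is assumed to be right-justified and the fraction left-justified within their places, and "only significant digits" is read as "at least one decimal, at most as many as were given".

```ruby
# Format +value+ into a numeric field with +int_places+ columns before
# the decimal point and +dec_places+ columns after it.
def format_numeric(value, int_places, dec_places, all_places: false, skip_nan: false)
  text  = value.to_s.strip
  match = /\A[-+]?\d+(?:\.(\d*))?\z/.match(text)
  unless match
    return nil if skip_nan                          # invalid data is ignored
    return "?" * int_places + "." + "?" * dec_places
  end

  rounded = sprintf("%.#{dec_places}f", text.to_f)  # round the fraction
  int_part, fraction = rounded.split(".")
  # The integral part is never modified; overflow becomes hashes.
  return "#" * int_places + "." + "#" * dec_places if int_part.length > int_places

  given = match[1].to_s.length                      # decimals in the input
  shown = all_places ? dec_places : [[given, 1].max, dec_places].min
  int_part.rjust(int_places) + "." + fraction.to_s[0, shown].ljust(dec_places)
end

%w{1 1.001 1.009 123.456 1234567 one}.each do |item|
  puts "(#{format_numeric(item, 5, 2)})"
end
```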
Filling block fields with lists of values
If an argument contains an array, then #format automatically joins the elements of the array into a single string, separating each element with a newline character. As a result, a call like this:
svalues = %w{ 1 10 100 1000 }
nvalues = [1, 10, 100, 1000]
puts r.format(
"(]]]].[[)",
svalues # you could also use nvalues here.
)
will print out
( 1.00)
( 10.00)
(100.00)
(1000.00)
as might be expected.
Note: While String arguments are consumed during the formatting process and will be empty at the end of formatting, array arguments are not. So svalues (nvalues) still contains [1,10,100,1000] after the call to #format.
Headers, footers, and pages
The #format method can also insert headers, footers, and page-feeds as it formats. These features are controlled by the “header”, “footer”, “page_feed”, “page_len”, and “page_num” options.
If the page_num option is set to an Integer value, page numbering will start at that value.
The page_len option specifies the total number of lines in a page (including headers, footers, and page-feeds).
The page_width option specifies the total number of columns in a page.
If the header option is specified with a string value, that string is used as the header of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the header string. When called, the block is passed the current page number.
Likewise, if the footer option is specified with a string value, that string is used as the footer of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the footer string. When called, the footer block is passed the current page number.
Both the header and footer options can also be specified as hashes. In this case the hash entries for keys left, centre (or center), and right specify what is to appear on the left, centre, and right of the header/footer. The entry for the key width specifies how wide the footer is to be. If the width key is omitted, the page_width configuration option (which defaults to 72 characters) is used.
The :left, :centre, and :right values may be literal strings, or blocks (just as a normal header/footer specification may be). See the second example, below.
Another alternative for header and footer options is to specify them as a block that returns a hash. The block is called for each page, then the resulting hash is treated like the hashes described in the preceding paragraph. See the third example, below.
The page_feed option acts in exactly the same way, to produce a page feed which is appended after the footer. But note that the page feed is not counted as part of the page length.
All three of these page components are recomputed at the *start of each new page*, before the page contents are formatted (recomputing the header and footer first makes it possible to determine how many lines of data to format so as to adhere to the specified page length).
When the call to #format is complete and the data has been fully formatted, the footer block is called one last time, with an extra argument of true. The string returned by this final call is used as the final footer.
So for example, a 60-line per page report, starting at page 7, with appropriate headers and footers might be set up like so:
small = Text::Reform.new
r.header = lambda do |page| "Page #{page}\n\n" end
r.footer = lambda do |page, last|
if last
''
else
('-'*50 + "\n" + small.format('>'*50, "...#{page+1}"))
end
end
r.page_feed = "\n\n"
r.page_len = 60
r.page_num = 7
r.format(template, data)
Note that you can’t reuse the r instance of Text::Reform inside the footer; it would end up calling itself recursively until the stack is exhausted.
Alternatively, to set up headers and footers such that the running head is right justified in the header and the page number is centred in the footer:
r.header = { :right => 'Running head' }
r.footer = { :centre => lambda do |page| "page #{page}" end }
r.page_len = 60
r.format(template, data)
The footer in the previous example could also have been specified the other way around, as a block that returns a hash (rather than a hash containing a block):
r.header = { :right => 'Running head' }
r.footer = lambda do |page| { :center => "page #{page}" } end
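The three accepted shapes (a string, a block, or a hash whose values may themselves be blocks) can be resolved with one small routine. The sketch below is hypothetical and independent of the library: `resolve_part` and `render_running_line` are invented names, the final-footer argument is ignored, and overlapping left, centre and right texts are not handled.

```ruby
# A part is either a literal or something callable with the page number.
def resolve_part(value, page)
  value.respond_to?(:call) ? value.call(page).to_s : value.to_s
end

# Turn a header/footer specification into the text for one page.
def render_running_line(spec, page, page_width = 72)
  spec = spec.call(page) if spec.respond_to?(:call)  # block form
  return spec.to_s unless spec.is_a?(Hash)           # plain string form

  width = spec[:width] || page_width
  line  = resolve_part(spec[:centre] || spec[:center], page).center(width)
  left  = resolve_part(spec[:left], page)
  right = resolve_part(spec[:right], page)
  line[0, left.length] = left
  line[width - right.length, right.length] = right unless right.empty?
  line
end

puts render_running_line({ :right => 'Running head' }, 7, 20)
puts render_running_line(lambda { |page| { :center => "page #{page}" } }, 7, 20)
```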
AUTHOR
Original Perl library and documentation: Damian Conway (damian at conway dot org)
Translating everything to Ruby (and leaving a lot of stuff out): Kaspar Schiess (eule at space dot ch)
BUGS
There are undoubtedly serious bugs lurking somewhere in code this funky :-) Bug reports and other feedback are most welcome.
COPYRIGHT
Copyright © 2005, Kaspar Schiess. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Ruby License (see www.ruby-lang.org/en/LICENSE.txt)
Defined Under Namespace
Classes: BreakAt, BreakHyphenator, BreakWith, BreakWrap
Constant Summary
- VERSION =
"0.3.0"
- BSPECIALS =
various regexp parts for matching patterns.
%w{ [ | ] }
- LSPECIALS =
%w{ < ^ > }
- LJUSTIFIED =
"[<]{2,} [>]{2,}"
- BJUSTIFIED =
"[\\[]{2,} [\\]]{2,}"
- BSINGLE =
"~+"
- SPECIALS =
[BSPECIALS, LSPECIALS].flatten.map { |spec| Regexp.escape(spec)+"{2,}" }
- FIXED_FIELDPAT =
[LJUSTIFIED, BJUSTIFIED, BSINGLE, SPECIALS ].flatten.join('|')
- DECIMAL =
TODO: Make this locale dependent
'.'
- LNUMERICAL =
Matches one or more > followed by . followed by one or more <
"[>]+ (?:#{Regexp.escape(DECIMAL)}[<]{1,})"
- BNUMERICAL =
Matches one or more ] followed by . followed by one or more [
"[\\]]+ (?: #{Regexp.escape(DECIMAL)} [\\[]{1,})"
- FIELDPAT =
[LNUMERICAL, BNUMERICAL, FIXED_FIELDPAT].join('|')
- LFIELDMARK =
[LNUMERICAL, LJUSTIFIED, LSPECIALS.map { |l| Regexp.escape(l) + "{2}" } ].flatten.join('|')
- BFIELDMARK =
[BNUMERICAL, BJUSTIFIED, BSINGLE, BSPECIALS.map { |l| Regexp.escape(l) + "{2}" } ].flatten.join('|')
- FIELDMARK =
[LNUMERICAL, BNUMERICAL, BSINGLE, LJUSTIFIED, BJUSTIFIED, LFIELDMARK, BFIELDMARK].flatten.join('|')
- CLEAR_BLOCK =
For use with #header, #footer, and #page_feed; this will clear the header, footer, or page feed block result to be an empty block.
lambda { |*args| "" }
- NUMBERS_NORMAL =
Numbers are printed, leaving off unnecessary decimal places. Non- numeric data is printed as a series of question marks. This is the default for formatting numbers.
0
- NUMBERS_ALL_PLACES =
Numbers are printed, retaining all decimal places. Non-numeric data is printed as a series of question marks.
[[[[[.]] # format 1.0 -> 1.00 1 -> 1.00
1
- NUMBERS_SKIP_NAN =
Numbers are printed as for NUMBERS_NORMAL, but NaN (“not a number”) values are skipped.
2
- NUMBERS_ALL_AND_SKIP =
Numbers are printed as for NUMBERS_ALL_PLACES, but NaN values are skipped.
NUMBERS_ALL_PLACES | NUMBERS_SKIP_NAN
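The four NUMBERS_* values form a pair of bit flags, which is why NUMBERS_ALL_AND_SKIP is defined with a bitwise OR. The sketch below copies the values from the list above; the predicate helpers are hypothetical, and whether the library tests the flags in exactly this way is an assumption.

```ruby
# Values copied from the constant list above.
NUMBERS_NORMAL       = 0
NUMBERS_ALL_PLACES   = 1
NUMBERS_SKIP_NAN     = 2
NUMBERS_ALL_AND_SKIP = NUMBERS_ALL_PLACES | NUMBERS_SKIP_NAN  # 3

# Hypothetical predicates: test a single flag with a bitwise AND.
def all_places?(numeric)
  (numeric & NUMBERS_ALL_PLACES) != 0
end

def skip_nan?(numeric)
  (numeric & NUMBERS_SKIP_NAN) != 0
end

p all_places?(NUMBERS_ALL_AND_SKIP)  # => true
p skip_nan?(NUMBERS_ALL_PLACES)      # => false
```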
Instance Attribute Summary
-
#break ⇒ Object
Break class instance that is used to break words in hyphenation.
-
#fill ⇒ Object
If true, causes newlines to be removed from the input.
-
#filler ⇒ Object
Controls character that is used to fill lines that are too short.
-
#footer ⇒ Object
Proc returning the page footer.
-
#header ⇒ Object
Proc returning page header.
-
#interleave ⇒ Object
This implies that formats and the variables from which they’re filled need to be interleaved.
-
#min_break ⇒ Object
Specifies the minimal number of characters that must be left on a line.
-
#numeric ⇒ Object
Specifies handling method for numerical data.
-
#page_feed ⇒ Object
Proc to be called for page feed text.
-
#page_len ⇒ Object
Specifies the total number of lines in a page (including headers, footers, and page-feeds).
-
#page_num ⇒ Object
Where to start page numbering.
-
#page_width ⇒ Object
Specifies the total number of columns in a page.
-
#squeeze ⇒ Object
If true, causes any sequence of spaces and/or tabs (but not newlines) in an interpolated string to be replaced with a single space.
-
#trim ⇒ Object
Controls trimming of whitespace at end of lines.
Class Method Summary
-
.break_at(bat) ⇒ Object
Takes a bat string as argument, breaks by looking for that substring and breaking just after it.
-
.break_hyphenator(hyphenator) ⇒ Object
Hyphenates with a class that implements the API of TeX::Hyphen or Text::Hyphen.
-
.break_with(hyphen) ⇒ Object
Takes a hyphen string as argument, breaks by inserting that hyphen into the word to be hyphenated.
-
.break_wrap ⇒ Object
Breaks by using a ‘wrap and slop’ algorithm.
Instance Method Summary
-
#__construct_type(str, justifiedPattern) ⇒ Object
Construct a type that can be passed to #replace from last a string.
-
#count_lines(*args) ⇒ Object
Count occurrences of \n (lines) of all strings that are passed as parameter.
-
#debug ⇒ Object
Turn on internal debugging output for the duration of the block.
-
#format(*args) ⇒ Object
Format data according to format.
-
#initialize(options = {}) {|_self| ... } ⇒ Reform
constructor
Create a Text::Reform object.
-
#quote(str) ⇒ Object
Quotes any characters that might be interpreted in str to be normal characters.
-
#replace(format, length, value) ⇒ Object
Replaces a placeholder with the text given.
-
#scanf_remains(value, fstr, &block) ⇒ Object
Using Scanf module, scanf a string and return what has not been matched in addition to normal scanf return.
-
#unchomp(str) ⇒ Object
Adds a \n character to the end of the line unless it already has a \n at the end of the line.
-
#unchomp!(str) ⇒ Object
Adds a \n character to the end of the line unless it already has a \n at the end of the line.
Constructor Details
#initialize(options = {}) {|_self| ... } ⇒ Reform
Create a Text::Reform object. Accepts an optional hash of construction options (this will change to named parameters in Ruby 2.0). After the initial object is constructed (with either the provided or default values), the object will be yielded (as self) to an optional block for further construction and operation.
# File 'lib/text/reform.rb', line 918
def initialize(options = {}) #:yields self:
  @debug = options[:debug] || false
  @header = options[:header] || CLEAR_BLOCK
  @footer = options[:footer] || CLEAR_BLOCK
  @page_feed = options[:page_feed] || CLEAR_BLOCK
  @page_len = options[:page_len] || nil
  @page_num = options[:page_num] || nil
  @page_width = options[:page_width] || 72
  @break = options[:break] || Text::Reform.break_with('-')
  @min_break = options[:min_break] || 2
  @squeeze = options[:squeeze] || false
  @fill = options[:fill] || false
  @filler = options[:filler] || { :left => ' ', :right => ' ' }
  @interleave = options[:interleave] || false
  @numeric = options[:numeric] || 0
  @trim = options[:trim] || false
  yield self if block_given?
end
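The construction pattern used here, an options hash with defaults followed by an optional block that receives the new object, can be tried out in isolation. `MiniReform` below is a hypothetical stand-in with three of the documented options; it is not the real class.

```ruby
# Hypothetical miniature of the constructor pattern.
class MiniReform
  attr_accessor :squeeze, :fill, :page_width

  def initialize(options = {})
    @squeeze    = options[:squeeze] || false
    @fill       = options[:fill] || false
    @page_width = options[:page_width] || 72
    yield self if block_given?   # let the caller finish configuration
  end
end

r = MiniReform.new(:squeeze => true) { |rf| rf.fill = true }
p [r.squeeze, r.fill, r.page_width]  # => [true, true, 72]
```

Hash options are applied first and the block runs last, so a value set in the block overrides the same option passed in the hash.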
Instance Attribute Details
#break ⇒ Object
Break class instance that is used to break words in hyphenation. This class must have a #break method accepting the three arguments str, initial_max_length and maxLength.
You can directly call the break_* methods to produce such a class instance for you; available methods are .break_with, .break_at, .break_wrap, .break_hyphenator.
- Default
-
Text::Reform.break_with(‘-’)
# File 'lib/text/reform.rb', line 806
def break
  @break
end
#fill ⇒ Object
If true, causes newlines to be removed from the input. If you want to squeeze all whitespace, set #fill and #squeeze to true.
- Default
-
false
# File 'lib/text/reform.rb', line 825
def fill
  @fill
end
#filler ⇒ Object
Controls the character that is used to fill lines that are too short. If this attribute has a hash value, the symbols :left and :right store the filler character to use on the left and the right, respectively.
- Default
-
‘ ’ on both sides
# File 'lib/text/reform.rb', line 833
def filler
  @filler
end
#footer ⇒ Object
Proc returning the page footer. This gets called before the page gets formatted to permit calculation of page length.
- Default
-
CLEAR_BLOCK
# File 'lib/text/reform.rb', line 773
def footer
  @footer
end
#header ⇒ Object
Proc returning page header. This is called before the page actually gets formatted to permit calculation of page length.
- Default
-
CLEAR_BLOCK
# File 'lib/text/reform.rb', line 767
def header
  @header
end
#interleave ⇒ Object
This implies that formats and the variables from which they’re filled need to be interleaved. That is, a multi-line specification like this:
print format(
"Passed: ##
[[[[[[[[[[[[[[[ # single format specification
Failed: # (needs two sets of data)
[[[[[[[[[[[[[[[", ##
passes, fails) ## two arrays, data for previous format
would print:
Passed:
<pass 1>
Failed:
<fail 1>
Passed:
<pass 2>
Failed:
<fail 2>
Passed:
<pass 3>
Failed:
<fail 3>
because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes and fails has been consumed.
- Default
-
false
# File 'lib/text/reform.rb', line 878
def interleave
  @interleave
end
#min_break ⇒ Object
Specifies the minimal number of characters that must be left on a line. This prevents breaking of words below its value.
- Default
-
2
# File 'lib/text/reform.rb', line 812
def min_break
  @min_break
end
#numeric ⇒ Object
Specifies handling method for numerical data. Allowed values include:
-
NUMBERS_NORMAL
-
NUMBERS_ALL_PLACES
-
NUMBERS_SKIP_NAN
-
NUMBERS_ALL_AND_SKIP
- Default
-
NUMBERS_NORMAL
# File 'lib/text/reform.rb', line 905
def numeric
  @numeric
end
#page_feed ⇒ Object
Proc to be called for page feed text. This is also called at the start of each page, but does not count towards page length.
- Default
-
CLEAR_BLOCK
# File 'lib/text/reform.rb', line 779
def page_feed
  @page_feed
end
#page_len ⇒ Object
Specifies the total number of lines in a page (including headers, footers, and page-feeds).
- Default
-
nil
# File 'lib/text/reform.rb', line 785
def page_len
  @page_len
end
#page_num ⇒ Object
Where to start page numbering.
- Default
-
nil
# File 'lib/text/reform.rb', line 790
def page_num
  @page_num
end
#page_width ⇒ Object
Specifies the total number of columns in a page.
- Default
-
72
# File 'lib/text/reform.rb', line 795
def page_width
  @page_width
end
#squeeze ⇒ Object
If true, causes any sequence of spaces and/or tabs (but not newlines) in an interpolated string to be replaced with a single space.
- Default
-
false
# File 'lib/text/reform.rb', line 819
def squeeze
  @squeeze
end
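The documented effect of squeeze can be illustrated outside the library. This is a sketch of the described behaviour only, not of how Text::Reform implements it; squeeze_whitespace is a hypothetical helper name.

```ruby
# Hypothetical helper: collapse runs of spaces and tabs into one space,
# leaving newlines untouched, as the description of squeeze states.
def squeeze_whitespace(str)
  str.gsub(/[ \t]+/, ' ')
end

puts squeeze_whitespace("one  two\t\tthree\nfour").inspect
# prints "one two three\nfour"
```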
#trim ⇒ Object
Controls trimming of whitespace at end of lines.
- Default
-
true
# File 'lib/text/reform.rb', line 910
def trim
  @trim
end
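In the #format listing, trimming is applied as a final substitution that removes trailing spaces from every line. A minimal sketch of that step, using a hypothetical helper name trim_line_ends and returning a new string instead of modifying in place:

```ruby
# Hypothetical helper: strip trailing spaces (not tabs) from each line,
# following the substitution shown in the #format listing.
def trim_line_ends(text)
  text.gsub(/[ ]+$/, '')
end

puts trim_line_ends("left   \nright \n").inspect
# prints "left\nright\n"
```

Note that only space characters are matched, so a trailing tab would be kept.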
Class Method Details
.break_at(bat) ⇒ Object
Takes a bat string as argument and returns a BreakAt breaker, which breaks a word by looking for that substring and breaking just after it.
# File 'lib/text/reform.rb', line 1296
def break_at(bat)
  BreakAt.new(bat)
end
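The BreakAt class itself is not listed on this page, so the following is only a sketch of the idea "find the substring and break just after it". The function break_after, its parameters, and its fallback behaviour are assumptions; the only detail taken from the page is that a breaker returns a pair of the broken-off part and what is left, as seen in the #replace listing.

```ruby
# Hypothetical stand-in for a "break at" strategy. Returns [broken, left]:
# the longest prefix of word that ends with marker and fits in max
# characters, followed by the remainder. If no such prefix exists,
# nothing is broken off.
def break_after(word, marker, max)
  return ['', word] if max < marker.length

  idx = word.rindex(marker, max - marker.length)
  return ['', word] if idx.nil?

  cut = idx + marker.length
  [word[0, cut], word[cut..-1]]
end

p break_after("well-known-fact", "-", 8)   # ["well-", "known-fact"]
p break_after("well-known-fact", "-", 11)  # ["well-known-", "fact"]
p break_after("well-known-fact", "-", 3)   # ["", "well-known-fact"]
```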
.break_hyphenator(hyphenator) ⇒ Object
Hyphenates with a class that implements the API of TeX::Hyphen or Text::Hyphen.
# File 'lib/text/reform.rb', line 1307
def break_hyphenator(hyphenator)
  BreakHyphenator.new(hyphenator)
end
Instance Method Details
#__construct_type(str, justifiedPattern) ⇒ Object
Constructs a type that can be passed to #replace from a string: 'J' if str matches justifiedPattern, otherwise str itself.
# File 'lib/text/reform.rb', line 1408
def __construct_type(str, justifiedPattern)
  if str =~ /#{justifiedPattern}/x
    'J'
  else
    str
  end
end
#count_lines(*args) ⇒ Object
Counts the occurrences of \n (lines) in all strings passed as parameters and returns the total.
# File 'lib/text/reform.rb', line 1401
def count_lines(*args)
  args.inject(0) do |sum, el|
    sum + el.count("\n")
  end
end
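As a usage illustration, the same counting logic can be run as a standalone function. The header and footer strings here are invented sample data; in the library the method is called with the real page header and footer.

```ruby
# Standalone copy of the counting logic for illustration.
def count_lines(*strings)
  strings.inject(0) { |sum, s| sum + s.count("\n") }
end

header = "Report\n======\n"  # sample data, two newlines
footer = "-- end --\n"       # sample data, one newline
puts count_lines(header, footer)  # 3
```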
#debug ⇒ Object
Turn on internal debugging output for the duration of the block.
# File 'lib/text/reform.rb', line 1280
def debug
  d = @debug
  @debug = true
  yield
  @debug = d
end
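The listing restores the previous flag only when the block returns normally. A possible variant, shown here purely as a sketch on a hypothetical stand-in class (DebugSketch and debug_enabled are not part of Text::Reform), uses ensure so the flag is also restored when the block raises:

```ruby
# Hypothetical stand-in class illustrating the save/enable/restore pattern.
class DebugSketch
  attr_reader :debug_enabled

  def initialize
    @debug_enabled = false
  end

  # Enable debugging for the duration of the block, then restore the
  # previous setting even if the block raises.
  def debug
    previous = @debug_enabled
    @debug_enabled = true
    yield
  ensure
    @debug_enabled = previous
  end
end

sketch = DebugSketch.new
sketch.debug { puts sketch.debug_enabled }  # true
puts sketch.debug_enabled                   # false
```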
#format(*args) ⇒ Object
Format data according to format.
# File 'lib/text/reform.rb', line 939
def format(*args)
  @page_num ||= 1

  __debug("Acquiring header and footer: ", @page_num)
  header = __header(@page_num)
  footer = __footer(@page_num, false)

  previous_footer = footer

  line_count = count_lines(header, footer)
  hf_count = line_count

  text = header
  format_stack = []

  while (args and not args.empty?) or (not format_stack.empty?)
    __debug("Arguments: ", args)
    __debug("Formats left: ", format_stack)

    if format_stack.empty?
      if @interleave
        # split format in its parts and recombine line by line
        format_stack += args.shift.split(%r{\n}o).collect { |fmtline| fmtline << "\n" }
      else
        format_stack << args.shift
      end
    end

    format = format_stack.shift

    parts = format.split(%r{
      (              # Capture
       \n          | # newline... OR
       (?:\\.)+    | # one or more escapes... OR
       #{FIELDPAT} | # patterns
      )}ox)
    parts << "\n" unless parts[-1] == "\n"
    __debug("Parts: ", parts)

    # Count all fields (inject 0, increment when field) and prepare
    # data.
    field_count = parts.inject(0) do |count, el|
      if (el =~ /#{LFIELDMARK}/ox or el =~ /#{FIELDMARK}/ox)
        count + 1
      else
        count
      end
    end

    if field_count.nonzero?
      data = args.first(field_count).collect do |el|
        if el.kind_of?(Array)
          el.join("\n")
        else
          el.to_s
        end
      end
      # shift all arguments that we have just consumed
      args = args[field_count..-1]
      # Is argument count correct?
      data += [''] * (field_count - data.length) unless data.length == field_count
    else
      data = [[]] # one line of data, contains nothing
    end

    first_line = true
    data_left = true
    while data_left
      idx = 0
      data_left = false

      parts.each do |part|
        # Is part an escaped format literal ?
        if part =~ /\A (?:\\.)+/ox
          __debug("esc literal: ", part)
          text << part.gsub(/\\(.)/, "\1")
        # Is part a once field mark ?
        elsif part =~ /(#{LFIELDMARK})/ox
          if first_line
            type = __construct_type($1, LJUSTIFIED)

            __debug("once field: ", part)
            __debug("data is: ", data[idx])
            text << replace(type, part.length, data[idx])
            __debug("data now: ", data[idx])
          else
            text << (@filler[:left] * part.length)[0, part.length]
            __debug("missing once field: ", part)
          end
          idx += 1
        # Is part a multi field mark ?
        elsif part =~ /(#{FIELDMARK})/ox and part[0, 2] != '~~'
          type = __construct_type($1, BJUSTIFIED)

          __debug("multi field: ", part)
          __debug("data is: ", data[idx])
          text << replace(type, part.length, data[idx])
          __debug("text is: ", text)
          __debug("data now: ", data[idx])
          data_left = true if data[idx].strip.length > 0
          idx += 1
        # Part is a literal.
        else
          __debug("literal: ", part)
          text << part.gsub(/\0(\0*)/, '\1') # XXX: What is this gsub for ?

          # New line ?
          if part == "\n"
            line_count += 1
            if @page_len && line_count >= @page_len
              __debug("\tejecting page: #@page_num")

              @page_num += 1
              page_feed = __pagefeed
              header = __header(@page_num)

              text << footer + page_feed + header
              previous_footer = footer

              footer = __footer(@page_num, false)

              line_count = hf_count = (header.count("\n") + footer.count("\n"))

              header = page_feed + header
            end
          end
        end # multiway if on part
      end # parts.each

      __debug("Accumulated: ", text)

      first_line = false
    end
  end # while args or formats left

  # Adjust final page header or footer as required
  if hf_count > 0 and line_count == hf_count
    # there is a header that we don't need
    text.sub!(/#{Regexp.escape(header)}\Z/, '')
  elsif line_count > 0 and @page_len and @page_len > 0
    # missing footer:
    text << "\n" * (@page_len - line_count) + footer
    previous_footer = footer
  end

  # Replace last footer
  if previous_footer and not previous_footer.empty?
    lastFooter = __footer(@page_num, true)
    footerDiff = lastFooter.count("\n") - previous_footer.count("\n")

    # Enough space to squeeze the longer final footer in ?
    if footerDiff > 0 && text =~ /(#{'^[^\S\n]*\n' * footerDiff}#{Regexp.escape(previous_footer)})\Z/
      previous_footer = $1
      footerDiff = 0
    end

    # If not, create an empty page for it.
    if footerDiff > 0
      @page_num += 1
      lastHeader = __header(@page_num)
      lastFooter = __footer(@page_num, true)

      text << lastHeader
      text << "\n" * (@page_len - lastHeader.count("\n") - lastFooter.count("\n"))
      text << lastFooter
    else
      lastFooter = "\n" * (-footerDiff) + lastFooter
      text[-(previous_footer.length), text.length] = lastFooter
    end
  end

  # Trim text
  text.gsub!(/[ ]+$/m, '') if @trim
  text
end
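One step of the listing that is easy to miss is how the replacement values are prepared before any field is filled: arrays are joined with newlines, everything else is converted to a string, and missing values are padded with empty strings. A minimal standalone sketch of that step follows; prepare_data is a hypothetical helper name, not a library method.

```ruby
# Hypothetical helper reproducing the data preparation step of #format.
def prepare_data(args, field_count)
  data = args.first(field_count).map do |el|
    el.is_a?(Array) ? el.join("\n") : el.to_s
  end
  # Pad with empty strings when fewer values than fields were supplied.
  data + [''] * (field_count - data.length)
end

p prepare_data([%w[ok fine], 42], 3)
# prints ["ok\nfine", "42", ""]
```

This is why a block field can be fed either an array of lines or a single multi-line string: both end up as one newline-separated string.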
#quote(str) ⇒ Object
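Quotes any characters in str that might otherwise be interpreted, so that they are treated as normal characters. As the listing shows, the current implementation returns str unchanged and only prints a warning when debugging is on.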
# File 'lib/text/reform.rb', line 1273
def quote(str)
  puts 'Text::Reform warning: not quoting string...' if @debug
  str
end
#replace(format, length, value) ⇒ Object
Replaces a placeholder with the text given. The format string gives the type of the replace match: when it is exactly two characters long, it indicates a text replace field; when it is longer, it indicates a numeric field.
# File 'lib/text/reform.rb', line 1118
def replace(format, length, value)
  text = ''
  remaining = length
  filled = 0

  __debug("value is: ", value)

  if @fill
    value.sub!(/\A\s*/m, '')
  else
    value.sub!(/\A[ \t]*/, '')
  end

  if value and format.length > 2
    # find length of numerical fields
    if format =~ /([\]>]+)#{Regexp.escape(DECIMAL)}([\[<]+)/
      ilen, dlen = $1.length, $2.length
    end

    # Try to extract a numeric value from +value+
    done = false
    while not done
      num, extra = scanf_remains(value, "%f")
      __debug "Number split into: ", [num, extra]
      done = true

      if extra.length == value.length
        value.sub!(/\s*\S*/, '') # skip offending non number value
        if (@numeric & NUMBERS_SKIP_NAN) > 0 && value =~ /\S/
          __debug("Not a Number, retrying ", value)
          done = false
        else
          text = '?' * ilen + DECIMAL + '?' * dlen
          return text
        end
      end
    end

    num = num.first if num.kind_of?(Array)
    __debug("Finally number is: ", num)

    formatted = "%#{format.length}.#{dlen}f" % num

    if formatted.length > format.length
      text = '#' * ilen + DECIMAL + '#' * dlen
    else
      text = formatted
    end

    __debug("Formatted number is: ", text)

    # Only output significant digits. Unless not all places were
    # explicitly requested or the number has more digits than we just
    # output replace trailing zeros with spaces.
    unless (@numeric & NUMBERS_ALL_PLACES > 0) or num.to_s =~ /#{Regexp.escape(DECIMAL)}\d\d{#{dlen},}$/
      text.sub!(/(#{Regexp.escape(DECIMAL)}\d+?)(0+)$/) do |mv|
        $1 + ' ' * $2.length
      end
    end

    value.replace(extra)
    remaining = 0
  else
    while !((value =~ /\S/o).nil?)
      # Only whitespace remaining ?
      if ! @fill && value.sub!(/\A[ \t]*\n/, '')
        filled = 2
        break
      end
      break unless value =~ /\A(\s*)(\S+)(.*)\z/om;

      ws, word, extra = $1, $2, $3

      # Replace all newlines by spaces when fill was specified.
      nonnl = (ws =~ /[^\n]/o)
      if @fill
        ws.gsub!(/\n/) do |match|
          nonnl ? '' : ' '
        end
      end

      # Replace all whitespace by one space if squeeze was specified.
      lead = @squeeze ? (ws.length > 0 ? ' ' : '') : ws
      match = lead + word
      __debug("Extracted: ", match)
      break if text and match =~ /\n/o

      if match.length <= remaining
        __debug("Accepted: ", match)
        text << match
        remaining -= match.length
        value.replace(extra)
      else
        __debug("Need to break: ", match)
        if (remaining - lead.length) >= @min_break
          __debug("Trying to break: ", match)
          broken, left = @break.break(match, remaining, length)
          text << broken
          __debug("Broke as: ", [broken, left])
          value.replace left + extra

          # Adjust remaining chars, but allow for underflow.
          t = remaining-broken.length
          if t < 0
            remaining = 0
          else
            remaining = t
          end
        end
        break
      end

      filled = 1
    end
  end

  if filled.zero? and remaining > 0 and value =~ /\S/ and text.empty?
    value.sub!(/^\s*(.{1,#{remaining}})/, '')
    text = $1
    remaining -= text.length
  end

  # Justify format?
  if text =~ / /o and format == 'J' and value =~ /\S/o and filled != 2
    # Fully justified
    text.reverse!
    text.gsub!(/( +)/o) do |mv|
      remaining -= 1
      if remaining > 0
        " #{$1}"
      else
        $1
      end
    end while remaining > 0
    text.reverse!
  elsif format =~ /\>|\]/o
    # Right justified
    text[0, 0] = (@filler[:left] * remaining)[0, remaining] if remaining > 0
  elsif format =~ /\^|\|/o
    # Center justified
    half_remaining = remaining / 2
    text[0, 0] = (@filler[:left] * half_remaining)[0, half_remaining]
    half_remaining = remaining - half_remaining
    text << (@filler[:right] * half_remaining)[0, half_remaining]
  else
    # Left justified
    text << (@filler[:right] * remaining)[0, remaining]
  end

  text
end
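The numeric branch of the listing can be hard to follow in full, so here is a simplified sketch of its formatting step alone. It assumes the decimal mark is '.', takes the integer and decimal widths directly instead of parsing a picture, and leaves out the number parsing and the extra check on the number's own digit count. The name format_number and the all_places keyword are hypothetical.

```ruby
# Simplified sketch of numeric field formatting: right-align to the field
# width, show '#' marks on overflow, and blank out trailing zeros unless
# all places were requested. Assumes '.' as the decimal mark.
def format_number(num, ilen, dlen, all_places: false)
  width = ilen + 1 + dlen
  formatted = "%#{width}.#{dlen}f" % num

  # The value does not fit: fill the field with '#' characters.
  return '#' * ilen + '.' + '#' * dlen if formatted.length > width

  unless all_places
    # Keep at least one decimal digit, replace later zeros with spaces.
    formatted = formatted.sub(/(\.\d+?)(0+)$/) { $1 + ' ' * $2.length }
  end
  formatted
end

p format_number(3.5, 3, 3)                    # "  3.5  "
p format_number(3.5, 3, 3, all_places: true)  # "  3.500"
p format_number(12345.678, 3, 2)              # "###.##"
```

Replacing trailing zeros with spaces rather than removing them keeps the decimal marks of successive lines aligned in a column.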
#scanf_remains(value, fstr, &block) ⇒ Object
Uses the Scanf module to scan a string and returns the part that was not matched in addition to the normal scanf result.
# File 'lib/text/reform.rb', line 1388
def scanf_remains(value, fstr, &block)
  if block.nil?
    unless fstr.kind_of?(Scanf::FormatString)
      fstr = Scanf::FormatString.new(fstr)
    end
    [ fstr.match(value), fstr.string_left ]
  else
    value.block_scanf(fstr, &block)
  end
end
#unchomp(str) ⇒ Object
Adds a \n character to the end of str unless it already ends with one. Returns a modified copy of str.
# File 'lib/text/reform.rb', line 1418
def unchomp(str)
  unchomp!(str.dup)
end
#unchomp!(str) ⇒ Object
Adds a \n character to the end of str, in place, unless it already ends with one. An empty string is left unchanged.
# File 'lib/text/reform.rb', line 1424
def unchomp!(str)
  if str.empty? or str[-1] == ?\n
    str
  else
    str << "\n"
  end
end
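The difference between the two variants is that unchomp! modifies its argument while unchomp works on a copy. The sketch below restates both as standalone functions so the behaviour can be tried without the library; it uses end_with? in place of the character comparison in the listing, which is an equivalent check chosen for readability.

```ruby
# Standalone restatement of the two helpers for illustration.
def unchomp!(str)
  # Empty strings and strings already ending in a newline are left alone.
  str << "\n" unless str.empty? || str.end_with?("\n")
  str
end

def unchomp(str)
  unchomp!(str.dup)
end

original = "line"
copy = unchomp(original)
p copy      # "line\n"
p original  # "line"
```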