Module: Typeset

Defined in:
lib/typeset.rb,
lib/typeset/quotes.rb,
lib/typeset/spaces.rb,
lib/typeset/hyphenate.rb,
lib/typeset/ligatures.rb,
lib/typeset/small_caps.rb,
lib/typeset/punctuation.rb,
lib/typeset/hanging_punctuation.rb

Overview

Contains all of our typeset-related class methods. Mix this module into a class, or just call ‘Typeset.typeset’ directly.

Defined Under Namespace

Modules: HangingPunctuation

Constant Summary

DefaultMethods =

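The default typesetting methods and their configuration. Each entry pairs a method name with a flag: true applies the method to each text node of the parsed HTML, false applies it to the whole string. Methods run in the order listed, so add new ones wherever they make sense.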

[
  [:quotes, true],
  [:hanging_punctuation, true],
  [:spaces, true],
  [:small_caps, true],
  [:ligatures, false],
  [:punctuation, false],
  [:hyphenate, true]
]
DefaultOptions =
{
  :disable => [],
  :language => "en_us"
}
Ligatures =

Map of raw text sequences to Unicode ligatures

{
  'ffi' => 'ﬃ',
  'ffl' => 'ﬄ',
  'fi' => 'ﬁ',
  'fl' => 'ﬂ',
  'st' => 'ﬆ',
  'ff' => 'ﬀ',
  'ue' => 'ᵫ'
}
DefaultLigatures =

List of ligatures to process by default

%w{ffi ffl fi fl ff}

Class Method Details

.apply_to_text_nodes(html, &func) ⇒ Object

Parse an HTML fragment with Nokogiri and apply a function to all of the descendant text nodes



# File 'lib/typeset.rb', line 15

def self.apply_to_text_nodes(html, &func)
  doc = Nokogiri::HTML("<div id='rtypeset_internal'>#{html}</div>", nil,"UTF-8",Nokogiri::XML::ParseOptions::NOENT)
  doc.search('//text()').each do |node|
    old_content = node.content
    new_content = func.call(node.content.strip)
    if old_content =~ /^(\s+)/
      new_content = " #{new_content}"
    end
    if old_content =~ /(\s+)$/
      new_content = "#{new_content} "
    end
    node.replace(new_content)
  end
  content = doc.css("#rtypeset_internal")[0].children.map { |child| child.to_html }
  return content.join("")
end
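
The whitespace handling in the loop above can be tried in isolation. The following is a minimal sketch without Nokogiri; `transform_preserving_edges` is a hypothetical helper name, not part of Typeset, and it uses the `\A`/`\z` string anchors where the listing uses `^`/`$`.

```ruby
# Hypothetical helper, not part of Typeset: mirrors how apply_to_text_nodes
# strips a text node, transforms it, and restores one space on each side
# that originally had whitespace.
def transform_preserving_edges(content)
  transformed = yield(content.strip)
  transformed = " #{transformed}" if content =~ /\A\s+/
  transformed = "#{transformed} " if content =~ /\s+\z/
  transformed
end

result = transform_preserving_edges("  hello world\n") { |text| text.upcase }
puts result.inspect
```

In this sketch any run of leading or trailing whitespace, including newlines, comes back as a single space.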

.hanging_punctuation(text, options) ⇒ Object

Add push/pull spans for hanging punctuation to text.



# File 'lib/typeset/hanging_punctuation.rb', line 27

def self.hanging_punctuation(text, options)
  return text if text.length < 2

  aligns = "CcOoYTAVvWwY".split('')
  words = text.split(/\s+/)
  words.each_with_index do |word, i|
    [[aligns, false],
     [HangingPunctuation::SingleWidth, 'single'],
     [HangingPunctuation::DoubleWidth, 'double']].each do |pair|
      pair[0].each do |signal|
        if word[0] == signal
          words[i] = "#{HangingPunctuation.pull(pair[1], signal)}#{word.slice(1,word.length)}"

          if not words[i-1].nil?
            words[i-1] = "#{words[i-1]}#{HangingPunctuation.push(pair[1] ? pair[1] : signal)}"
          end
        end
      end
    end
  end

  return words.join(" ")
end

.hyphenate(text, options) ⇒ Object

Hyphenate text by inserting soft hyphens (U+00AD) at the allowed break points. Specify the language for hyphenation by passing an options hash to your typeset call, e.g.:

Typeset.typeset("do hyphenation on this", {:language => "en_gb"})


# File 'lib/typeset/hyphenate.rb', line 8

def self.hyphenate(text, options)
  options[:language] ||= 'en_us'
  hyphen = Text::Hyphen.new(:language => options[:language], :left => 0, :right => 0)

  text = hyphen.visualise(text, "\u00AD")

  return text
end
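
To make the effect of the soft hyphen (U+00AD) concrete, here is a small sketch that does not use Text::Hyphen at all. The word fragments are supplied by hand and `join_with_soft_hyphens` is a hypothetical helper; real break points depend on the hyphenation patterns for the chosen language.

```ruby
SOFT_HYPHEN = "\u00AD"

# Hypothetical helper: joins hand-picked word fragments with soft hyphens.
def join_with_soft_hyphens(fragments)
  fragments.join(SOFT_HYPHEN)
end

marked = join_with_soft_hyphens(%w[hy phen ation])
puts marked.length               # 13: eleven letters plus two soft hyphens
puts marked.delete(SOFT_HYPHEN)  # hyphenation
```

Because the markers stay in the string, comparing or searching typeset output against plain text may require removing them first.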

.ligatures(text, options) ⇒ Object

Find and replace sequences of text with their Unicode ligature equivalents. Override the set of ligatures to find by passing in a custom options hash, e.g.:

Typeset.typeset("flue", {:ligatures => ["fl", "ue"]})
# -> returns "ﬂᵫ"


# File 'lib/typeset/ligatures.rb', line 21

def self.ligatures(text, options)
  options[:ligatures] ||= DefaultLigatures

  options[:ligatures].each do |ligature|
    text.gsub!(ligature, Ligatures[ligature])
  end

  return text
end
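
The substitution loop above can be sketched on its own. This version is an illustration under assumptions: `LIGATURE_MAP` and `replace_ligatures` are hypothetical names, the map is a local copy of the five default sequences written with Unicode escapes, and the method returns a new string instead of modifying its argument as `gsub!` does.

```ruby
# Hypothetical local copy of part of the mapping, written with escapes.
LIGATURE_MAP = {
  'ffi' => "\uFB03",
  'ffl' => "\uFB04",
  'fi'  => "\uFB01",
  'fl'  => "\uFB02",
  'ff'  => "\uFB00"
}.freeze

# Replaces each listed sequence in order and returns a new string.
def replace_ligatures(text, sequences = LIGATURE_MAP.keys)
  sequences.inject(text.dup) do |result, sequence|
    result.gsub(sequence, LIGATURE_MAP.fetch(sequence))
  end
end

puts replace_ligatures("office")  # "o" + U+FB03 + "ce"
puts replace_ligatures("fluffy")  # U+FB02 + "u" + U+FB00 + "y"
```

Order matters here: longer sequences are listed first, because replacing "fi" before "ffi" would turn "office" into "of" plus U+FB01 plus "ce".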

.punctuation(text, options) ⇒ Object

Make dashes, ellipses, and start/end punctuation a little prettier.



# File 'lib/typeset/punctuation.rb', line 3

def self.punctuation(text, options)
  # Dashes
  text.gsub!('--', '–')
  text.gsub!('–', "\u2009–\u2009")

  # Ellipses
  text.gsub!('...', '…')

  # Non-breaking space for start/end punctuation with spaces.
  start_punc = /([«¿¡\[\(]) /
  if text =~ start_punc
    text.gsub!(start_punc, "#{$1}&nbsp;")
  end
  end_punc = / ([\!\?:;\.,‽»\]\)])/
  if text =~ end_punc
    text.gsub!(end_punc,"&nbsp;#{$1}")
  end

  return text
end
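
In the listing, the replacement strings interpolate `$1`, which Ruby evaluates once before `gsub!` is called, so every replacement reuses the capture from the preceding `=~` match. The sketch below shows an alternative that uses the per-match backreference `\1`. It covers only the non-breaking-space step, and `nbsp_around_punctuation` is a hypothetical name rather than part of Typeset.

```ruby
# Sketch only, not the library's code: inserts &nbsp; between spaced
# punctuation and the adjacent word, using per-match backreferences.
def nbsp_around_punctuation(text)
  text
    .gsub(/([«¿¡\[(]) /, '\1&nbsp;')       # space after opening punctuation
    .gsub(/ ([!?:;.,‽»\])])/, '&nbsp;\1')  # space before closing punctuation
end

puts nbsp_around_punctuation("« Bonjour » ( ok )")
```

Each match keeps its own punctuation mark, so mixed input such as guillemets and parentheses in one string is handled independently.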

.quotes(text, options) ⇒ Object

A poor man’s SmartyPants implementation. Converts single and double quotes, tick marks, backticks, and primes into prettier Unicode equivalents.



# File 'lib/typeset/quotes.rb', line 4

def self.quotes(text, options)
  # Unencode encoded characters, so our regex mess below works
  text.gsub!('&#39;',"\'")
  text.gsub!('&quot;',"\"")

  if text =~ /(\W|^)"(\S+)/
    text.gsub!(/(\W|^)"(\S+)/, "#{$1}\u201c#{$2}") # beginning "
  end
  if text =~ /(\u201c[^"]*)"([^"]*$|[^\u201c"]*\u201c)/
    text.gsub!(/(\u201c[^"]*)"([^"]*$|[^\u201c"]*\u201c)/, "#{$1}\u201d#{$2}") # ending "
  end
  if text =~ /([^0-9])"/
    text.gsub!(/([^0-9])"/, "#{$1}\u201d") # remaining " at end of word
  end
  if text =~ /(\W|^)'(\S)/
    text.gsub!(/(\W|^)'(\S)/, "#{$1}\u2018#{$2}") # beginning '
  end
  if text =~ /([a-z])'([a-z])/i
    text.gsub!(/([a-z])'([a-z])/i, "#{$1}\u2019#{$2}") # contractions and possessives
  end
  if text =~ /((\u2018[^']*)|[a-z])'([^0-9]|$)/i
    text.gsub!(/((\u2018[^']*)|[a-z])'([^0-9]|$)/i, "#{$1}\u2019#{$3}") # ending '
  end
  if text =~ /(\u2018)([0-9]{2}[^\u2019]*)(\u2018([^0-9]|$)|$|\u2019[a-z])/i
    text.gsub!(/(\u2018)([0-9]{2}[^\u2019]*)(\u2018([^0-9]|$)|$|\u2019[a-z])/i, "\u2019#{$2}#{$3}") # abbrev. years like '93
  end
  if text =~ /(\B|^)\u2018(?=([^\u2019]*\u2019\b)*([^\u2019\u2018]*\W[\u2019\u2018]\b|[^\u2019\u2018]*$))/i
    text.gsub!(/(\B|^)\u2018(?=([^\u2019]*\u2019\b)*([^\u2019\u2018]*\W[\u2019\u2018]\b|[^\u2019\u2018]*$))/i, "#{$1}\u2019") # backwards apostrophe
  end
  text.gsub!(/'''/, "\u2034") # triple prime
  text.gsub!(/("|'')/, "\u2033") # double prime
  text.gsub!(/'/, "\u2032")

  # Allow escaped quotes
  text.gsub!('\\\“','\"')
  text.gsub!('\\\”','\"')
  text.gsub!('\\\’','\'')
  text.gsub!('\\\‘','\'')

  return text
end
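
The basic idea behind the double-quote steps can be shown with a deliberately reduced sketch. It is not the library's code: `curl_double_quotes` is a hypothetical name, and the sketch ignores single quotes, primes, entities, and escapes, so a measurement such as 6" would also be given a closing quote.

```ruby
# Reduced sketch, not the library's code: curls straight double quotes only.
def curl_double_quotes(text)
  text
    .gsub(/(\W|^)"(\S)/, "\\1\u201c\\2")  # opening quote before a non-space
    .gsub('"', "\u201d")                   # any remaining quote closes
end

puts curl_double_quotes('She said "hello" and "bye".')
```

The opening pass runs first so that the second pass can treat every straight quote still left in the string as a closing quote.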

.small_caps(text, options) ⇒ Object

Identify likely acronyms, and wrap them in a ‘small-caps’ span.



# File 'lib/typeset/small_caps.rb', line 3

def self.small_caps(text, options)
  words = text.split(" ")
  words.each_with_index do |word, i|
    if word =~ /^\W*([[:upper:]][[:upper:]][[:upper:]]+)\W*/
      leading,trailing = word.split($1)
      words[i] = "#{leading}<span class=\"small-caps\">#{$1}</span>#{trailing}"
    end
  end
  return words.map { |x| x.strip }.join(" ")
end
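
As an illustration of the matching rule (three or more consecutive uppercase letters, optionally preceded by non-word characters), here is a self-contained sketch. `wrap_acronyms` is a hypothetical name, and it uses `String#sub` with a block rather than the `split` approach in the listing.

```ruby
# Sketch only: wraps a leading run of 3+ uppercase letters in each word.
def wrap_acronyms(text)
  text.split(" ").map do |word|
    word.sub(/\A(\W*)([[:upper:]]{3,})/) do
      "#{$1}<span class=\"small-caps\">#{$2}</span>"
    end
  end.join(" ")
end

puts wrap_acronyms("The (NASA) and EU teams")
```

Two-letter words such as "EU" and capitalized words such as "The" are left alone because they do not reach the three-letter threshold.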

.spaces(text, options) ⇒ Object

Replace wide (normal) spaces around math operators with thin spaces (U+2009).



# File 'lib/typeset/spaces.rb', line 3

def self.spaces(text, options)
  text.gsub!(" / ", "\u2009/\u2009")
  text.gsub!(" × ", "\u2009×\u2009")
  text.gsub!(" % ", "\u2009%\u2009")
  text.gsub!(" + ", "\u2009+\u2009")

  return text
end
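
The four substitutions can also be written as a single pass. This is a sketch under assumptions, not the library's code: `THIN_SPACE` and `thin_space_operators` are hypothetical names, and the method returns a new string rather than modifying its argument.

```ruby
THIN_SPACE = "\u2009"

# Sketch only: one pass over the four operators handled by the listing.
def thin_space_operators(text)
  text.gsub(%r{ ([/×%+]) }) { "#{THIN_SPACE}#{$1}#{THIN_SPACE}" }
end

puts thin_space_operators("3 + 4 × 5").inspect
```

Only operators with a normal space on both sides are affected, which matches the literal patterns in the listing.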

.typeset(html, options = Typeset::DefaultOptions) ⇒ Object

The main entry point for Typeset. Pass in raw HTML or text, along with an optional options hash.



# File 'lib/typeset.rb', line 51

def self.typeset(html, options=Typeset::DefaultOptions)
  methods = Typeset::DefaultMethods.dup
  options[:disable] ||= DefaultOptions[:disable]
  methods.reject! { |method| options[:disable].include?(method[0]) }

  methods.each do |func, use_text_nodes|
    new_html = html
    if use_text_nodes
      new_html = Typeset.apply_to_text_nodes(html) { |content| Typeset.send(func, content, options) }
    else
      new_html = Typeset.send(func, html, options).strip
    end
    html = new_html
  end
  return html
end
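
The way `:disable` filters the method list can be sketched without loading the library. `DEFAULT_METHODS` below is a hypothetical local copy of the `DefaultMethods` constant, and `enabled_methods` is an illustrative helper, not part of Typeset.

```ruby
# Hypothetical local copy of Typeset::DefaultMethods.
DEFAULT_METHODS = [
  [:quotes, true],
  [:hanging_punctuation, true],
  [:spaces, true],
  [:small_caps, true],
  [:ligatures, false],
  [:punctuation, false],
  [:hyphenate, true]
].freeze

# Returns the names of the methods that would still run, in order.
def enabled_methods(options = {})
  disabled = options.fetch(:disable, [])
  DEFAULT_METHODS.reject { |name, _flag| disabled.include?(name) }.map(&:first)
end

p enabled_methods(:disable => [:hyphenate, :ligatures])
```

With the real module, the equivalent call would presumably be `Typeset.typeset(html, {:disable => [:hyphenate, :ligatures]})`, based on how the listing reads `options[:disable]`.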