Class: TeX::Hyphen

Inherits:
Object
Defined in:
lib/tex/hyphen.rb

Overview

Introduction

TeX::Hyphen – hyphenate words using TeX’s patterns

Usage

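require 'tex/hyphen'

# Either name the pattern file, style, and margins explicitly...
hyp = TeX::Hyphen.new(:file => 'hyphen.tex', :style => 'czech',
                      :leftmin => 2, :rightmin => 2)
# ...or rely on the bundled hyphen.tex and the default settings.
hyp = TeX::Hyphen.new

word = "representation"
points = hyp.hyphenate(word)  #=> [3, 5, 8, 10]
puts hyp.visualize(word)      # prints rep-re-sen-ta-tion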

Description

The constructor new() creates a new Hyphen object and loads the pattern file into memory. You can then ask the object to hyphenate a word by calling one of its methods. If no file is specified, Donald E. Knuth’s default hyphen.tex, which is included in this module, is used.

Copyright

Copyright © 2003 - 2004 Martin DeMello and Austin Ziegler

Version

0.4.0

Based On

Perl’s TeX::Hyphen

search.cpan.org/author/JANPAZ/TeX-Hyphen-0.140/lib/TeX/Hyphen.pm

Copyright © 1997 - 2002 Jan Pazdziora

Licence

Ruby’s

Constant Summary

VERSION = '0.4.0'
DEFAULT_MIN_LEFT = 2
DEFAULT_MIN_RIGHT = 2


Constructor Details

#initialize(arg = nil, &block) ⇒ Hyphen

A TeX hyphenation object can be constructed using a few different calling conventions. Each of them may also take a constructor block, which gives direct access to @file, @style, @min_left, and @min_right.

arg may be one of Array, Hash, String, or nil:

Array

A four-element array in the form of:

[file, style, min_left, min_right]

Hash

A hash with the following keys:

:file (or: 'file')

:style (or: 'style')

:min_left (or: :leftmin, 'min_left', 'leftmin')

:min_right (or: :rightmin, 'min_right', 'rightmin')

String

This corresponds to @file.

The parameters are:

  • file: The name of the file with the hyphenation patterns. It will be loaded, and the resulting object will hyphenate according to the patterns in that file. If the file is not specified or is nil, the default hyphen.tex (included in the module definition) is used.

  • style: Various languages use special shortcuts to specify the patterns. Instead of doing the full TeX expansion, Ruby code parses the patterns. The style option requires a module found at "tex/hyphen/#{@style}" and uses the parsing functions found in it.

    Currently, Czech (the default, which also works well for English) and German are available. See TeX::Hyphen::Czech for more information, especially if you want to support other languages and styles.

  • min_left: The minimum length of the starting substring that will not be hyphenated. This overrides the default specified in the style file.

  • min_right: The minimum length of the ending substring that will not be hyphenated. This overrides the default specified in the style file.
    
Reference

“Object Construction and Blocks” <www.pragmaticprogrammer.com/ruby/articles/insteval.html>
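
The calling conventions and the constructor block described above can be sketched with a small stand-in class. ConstructorSketch is a hypothetical name used only for illustration; it mirrors the argument handling and the instance_eval step, and does not load any patterns.

```ruby
# Hypothetical stand-in class: it mirrors only the argument handling
# described above, not the pattern loading done by TeX::Hyphen.
class ConstructorSketch
  attr_reader :file, :style, :min_left, :min_right

  def initialize(arg = nil, &block)
    @file = @style = @min_left = @min_right = nil
    case arg
    when Array
      @file, @style, @min_left, @min_right = arg
    when Hash
      @file      = arg[:file] || arg['file']
      @style     = arg[:style] || arg['style']
      @min_left  = arg[:min_left] || arg[:leftmin] || arg['min_left'] || arg['leftmin']
      @min_right = arg[:min_right] || arg[:rightmin] || arg['min_right'] || arg['rightmin']
    when String
      @file = arg
    end
    # The block runs in the context of the new object, so it can set
    # instance variables directly.
    instance_eval(&block) unless block.nil?
  end
end

from_array = ConstructorSketch.new(['hyphen.tex', 'czech', 2, 3])
from_hash  = ConstructorSketch.new('style' => 'german', :leftmin => 3)
from_block = ConstructorSketch.new { @style = 'german'; @min_right = 4 }
```

Because the block is evaluated with instance_eval, it sees the object's instance variables rather than those of the caller, which is why the block form can assign @style and @min_right without accessor methods.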



# File 'lib/tex/hyphen.rb', line 94

def initialize(arg = nil, &block)
  case arg
  when Array
    @file = arg[0]
    @style = arg[1]
    @min_left = arg[2]
    @min_right = arg[3]
  when Hash
    @file = arg[:file] || arg['file']
    @style = arg[:style] || arg['style']
    @min_left = arg[:min_left] || arg[:leftmin] || arg['min_left'] || arg['leftmin']
    @min_right = arg[:min_right] || arg[:rightmin] || arg['min_right'] || arg['rightmin']
  when String
    @file = arg
  when NilClass
    @file = @style = @min_left = @min_right = nil
  end
  @cache  = {}
  @vcache = {}

  instance_eval(&block) unless block.nil?

  if @file.nil?
    data = TeX::HyphenTeX.dup
  else
    data = @file
    @import = true
  end

  @hyphen, @begin_hyphen, @end_hyphen, @both_hyphen, @exception =
    (0...5).map{{}}

  @style ||= 'czech'

  unless @style.nil?
    require "tex/hyphen/#{@style}"
    mod = eval("TeX::Hyphen::#{@style.capitalize}")
    self.extend(mod)

    # Fall back to the style's defaults when no explicit minimum was given.
    @min_left ||= mod::DEFAULT_STYLE_MIN_LEFT
    @min_right ||= mod::DEFAULT_STYLE_MIN_RIGHT
  end

  @min_left ||= DEFAULT_MIN_LEFT
  @min_right ||= DEFAULT_MIN_RIGHT

  data = File.open(@file, 'r') if data == @file
  parsepatterns(data) { |line| line =~ /\\patterns\{/ }
  parsepatterns(data) { |line| not process_patterns(line.chomp) }
  parsepatterns(data) { |line| line =~ /\\hyphenation\{/ }
  parsepatterns(data) { |line| not process_hyphenation(line.chomp) }
  data.close if data.kind_of?(IO)
end

Instance Attribute Details

#min_left ⇒ Object

Allows modification of the minimal starting substring for hyphenation.



# File 'lib/tex/hyphen.rb', line 41

def min_left
  @min_left
end

#min_right ⇒ Object

Allows modification of the minimal ending substring for hyphenation.



# File 'lib/tex/hyphen.rb', line 43

def min_right
  @min_right
end

Instance Method Details

#clear_cache! ⇒ Object

Clears the per-instance caches used by #hyphenate and #visualise.



# File 'lib/tex/hyphen.rb', line 199

def clear_cache!
  @cache.clear
  @vcache.clear
end

#hyphenate(word) ⇒ Object

Returns a list of positions where the word can be divided. For example,

hyp.hyphenate('representation')

returns [3, 5, 8, 10]. If the word has been hyphenated previously, the result is returned from a per-instance cache.
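
To make the meaning of the returned list concrete, here is a minimal sketch that cuts a word at such positions. It assumes each position counts the characters that precede the break, which matches the 'representation' example; split_at is a hypothetical helper, not part of TeX::Hyphen.

```ruby
# Split a word at the given break positions. Each position is taken to be
# the number of characters that precede the break (an assumption based on
# the 'representation' example).
def split_at(word, points)
  bounds = [0] + points + [word.length]
  bounds.each_cons(2).map { |from, to| word[from...to] }
end

pieces = split_at('representation', [3, 5, 8, 10])
puts pieces.inspect  # prints ["rep", "re", "sen", "ta", "tion"]
```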



# File 'lib/tex/hyphen.rb', line 154

def hyphenate(word)
  STDERR.puts "Hyphenating #{word}" if $DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @exception[word]
  return @cache[word] = make_result_list(res) if res

  @result = [0] * (word.length + 1)
  rightstop = word.length - @min_right

  # Walk the word
  (0..rightstop).each do |pos|
    restlength = word.length - pos
    (1..restlength).each do |length|
      substr = word[pos, length]
      updateresult(@hyphen, substr, pos)
      updateresult(@begin_hyphen, substr, pos) if pos == 0
      updateresult(@end_hyphen, substr, pos) if length == restlength
    end
  end

  updateresult(@both_hyphen, word, 0)

  (0..@min_left).each { |i| @result[i] = 0 }
  ((-1 - @min_right)..(-1)).each { |i| @result[i] = 0 }
  @cache[word] = make_result_list(@result)
end
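
The helpers updateresult and make_result_list are not shown on this page. The sketch below illustrates the general TeX pattern technique that a walk like the one above is built on: every substring of the word is looked up in a pattern table, the highest digit seen at each position wins, and odd digits mark permitted breaks. The toy pattern table, the function names, and the simplified margin rule are all assumptions made for illustration; this is not the library's own code.

```ruby
# Turn a pattern such as "hy3ph" into its letters ("hyph") and the digit
# found at each inter-letter position ([0, 0, 3, 0, 0]).
def parse_pattern(pattern)
  letters = pattern.delete('0-9')
  values = [0] * (letters.length + 1)
  index = 0
  pattern.each_char do |ch|
    if ch =~ /\d/
      values[index] = ch.to_i
    else
      index += 1
    end
  end
  [letters, values]
end

# Toy pattern table (hypothetical); real pattern files hold thousands of entries.
TOY_PATTERNS = %w[hy3ph he2n].map { |p| parse_pattern(p) }.to_h

def toy_hyphenate(word, min_left = 2, min_right = 2)
  result = [0] * (word.length + 1)
  (0...word.length).each do |pos|
    (1..(word.length - pos)).each do |length|
      values = TOY_PATTERNS[word[pos, length]]
      next unless values
      # Keep the highest value seen at each position.
      values.each_with_index do |v, i|
        result[pos + i] = [result[pos + i], v].max
      end
    end
  end
  # Odd values mark allowed breaks; even values forbid them.
  points = (0..word.length).select { |i| result[i].odd? }
  # Simplified margin rule for this sketch only.
  points.select { |i| i >= min_left && i <= word.length - min_right }
end

points = toy_hyphenate('hyphen')
puts points.inspect  # prints [2], i.e. hy-phen
```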

#hyphenate_to(word, size) ⇒ Object

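Splits a word at the last hyphenation point that lies before size, so that the first part, including its trailing hyphen, is at most size characters long. Returns a two-element array of the first part and the remainder, or [nil, word] if no such point exists.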



# File 'lib/tex/hyphen.rb', line 206

def hyphenate_to(word, size)
  # reject returns a new array, so the cached list from hyphenate is not modified
  point = hyphenate(word).reject { |e| e >= size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + "-", word[point .. -1]]
  end
end
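
The splitting step can be tried in isolation with a fixed list of points. In this sketch split_to is a hypothetical stand-alone function, and the points array stands in for the list that hyphenate would return.

```ruby
# Stand-alone version of the splitting step; `points` plays the role of
# the list returned by hyphenate.
def split_to(word, points, size)
  point = points.reject { |e| e >= size }.max
  return [nil, word] if point.nil?
  [word[0...point] + '-', word[point..-1]]
end

word   = 'representation'
points = [3, 5, 8, 10]
fits = split_to(word, points, 6)  # last point below 6 is 5
none = split_to(word, points, 3)  # no point below 3
puts fits.inspect  # prints ["repre-", "sentation"]
puts none.inspect  # prints [nil, "representation"]
```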

#visualise(word) ⇒ Object Also known as: visualize

Returns a visualization of the hyphenation points, so:

hyp.visualize('representation')

should return rep-re-sen-ta-tion, at least for English patterns. If the word has been visualised previously, the result is returned from a per-instance cache.



# File 'lib/tex/hyphen.rb', line 188

def visualise(word)
  return @vcache[word] if @vcache.has_key?(word)
  w = word.dup
  hyphenate(w).each_with_index do |pos, n|
    w[pos.to_i + n, 0] = '-' if pos != 0
  end
  @vcache[word] = w
end