Module: HexaPDF::Layout::TextLayouter::SimpleTextSegmentation
- Defined in: lib/hexapdf/layout/text_layouter.rb
Overview
Implementation of a simple text segmentation algorithm.
The algorithm breaks TextFragment objects into objects wrapped by Box, Glue or Penalty items, and inserts additional Penalty items when needed:
- Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.
- Spaces and tabs are wrapped by Glue objects, allowing breaks.
- Non-breaking spaces are wrapped into Penalty objects that prohibit line breaking.
- Hyphens are attached to the preceding text fragment (or form a standalone text fragment) and are followed by a Penalty object to allow a break.
- If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.
- If a zero-width space is encountered, a Penalty object is inserted to allow a break.
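The rules above can be sketched on plain strings. This is a simplified illustration, not the HexaPDF implementation: the `SegmentationSketch` module, its `Box`, `Glue` and `Penalty` structs and the `:mandatory`, `:allowed` and `:prohibited` symbols are hypothetical stand-ins, and the real algorithm works on glyph objects and kerning values rather than characters.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the HexaPDF classes.
module SegmentationSketch
  Box = Struct.new(:text)
  Glue = Struct.new(:text)
  Penalty = Struct.new(:kind, :text)

  NEWLINES = ["\n", "\v", "\f", "\r", "\u{85}", "\u{2028}", "\u{2029}"].freeze
  BREAKS = ([" ", "\t", "-", "\u{00AD}", "\u{00A0}", "\u{200B}"] + NEWLINES).freeze

  # Splits a plain string into Box/Glue/Penalty items following the listed rules.
  def self.segment(text)
    result = []
    buffer = String.new
    flush = lambda do
      result << Box.new(buffer.dup) unless buffer.empty?
      buffer.clear
    end

    chars = text.chars
    chars.each_with_index do |char, index|
      unless BREAKS.include?(char)
        buffer << char
        next
      end

      buffer << char if char == "-" # a hyphen stays attached to the preceding text
      flush.call

      case char
      when " ", "\t" then result << Glue.new(char)
      when "-", "\u{200B}" then result << Penalty.new(:allowed, nil)
      when "\u{00AD}" then result << Penalty.new(:allowed, "-")
      when "\u{00A0}" then result << Penalty.new(:prohibited, " ")
      when "\r"
        # "\r\n" counts as one break: the following "\n" emits the penalty
        result << Penalty.new(:mandatory, nil) unless chars[index + 1] == "\n"
      else
        result << Penalty.new(:mandatory, nil)
      end
    end
    flush.call
    result
  end
end

items = SegmentationSketch.segment("well-known fact")
puts items.map(&:inspect)
```

For "well-known fact" the sketch yields a box for "well-", a penalty allowing a break, a box for "known", a glue item for the space and a box for "fact".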
Constant Summary
- BREAK_CHARS =
Breaks are detected at: space, tab, zero-width space, non-breaking space, hyphen, soft hyphen and any valid Unicode newline separator.
{}
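The page shows only `{}` as the value, but `.call` queries the constant with `BREAK_CHARS.key?(glyph.str)`, so it behaves as a lookup hash keyed by character. A minimal sketch of how such a table might be built from the characters listed above follows; the name `SKETCH_BREAK_CHARS` is hypothetical and the contents are an assumption derived from the description, not the library's actual definition.

```ruby
# Hypothetical reconstruction for illustration; not the library's actual constant.
SKETCH_BREAK_CHARS = [
  " ", "\t", "\u{200B}", "\u{00A0}", "-", "\u{00AD}",            # space-like characters and hyphens
  "\n", "\v", "\f", "\r", "\u{85}", "\u{2028}", "\u{2029}"       # Unicode newline separators
].each_with_object({}) { |char, hash| hash[char] = true }.freeze

SKETCH_BREAK_CHARS.key?("-") # => true
SKETCH_BREAK_CHARS.key?("a") # => false
```

A hash gives a constant-time membership test per glyph, which suits the inner loop that scans for the next break character.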
Class Method Summary
- .call(items) ⇒ Object
Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.
Class Method Details
.call(items) ⇒ Object
Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.
# File 'lib/hexapdf/layout/text_layouter.rb', line 228

def self.call(items)
  result = []
  glues = {}
  penalties = {}
  items.each do |item|
    if item.kind_of?(InlineBox)
      result << Box.new(item)
    else
      i = 0
      while i < item.items.size
        # Collect characters and kerning values until break character is encountered
        box_items = []
        while (glyph = item.items[i]) &&
            (glyph.kind_of?(Numeric) || !BREAK_CHARS.key?(glyph.str))
          box_items << glyph
          i += 1
        end

        # A hyphen belongs to the text fragment
        box_items << glyph if glyph && !glyph.kind_of?(Numeric) && glyph.str == '-'

        unless box_items.empty?
          result << Box.new(item.dup_attributes(box_items.freeze))
        end

        if glyph
          case glyph.str
          when ' '
            result << (glues[item.attributes_hash] ||=
                       Glue.new(item.dup_attributes([glyph].freeze)))
          when "\n", "\v", "\f", "\u{85}", "\u{2029}"
            result << (penalties[item.attributes_hash] ||=
                       Penalty.new(Penalty::PARAGRAPH_BREAK, 0))
          when "\u{2028}"
            result << Penalty.new(Penalty::LINE_BREAK, 0)
          when "\r"
            if !item.items[i + 1] || item.items[i + 1].kind_of?(Numeric) ||
                item.items[i + 1].str != "\n"
              result << (penalties[item.attributes_hash] ||=
                         Penalty.new(Penalty::PARAGRAPH_BREAK, 0))
            end
          when '-'
            result << Penalty::Standard
          when "\t"
            spaces = [item.style.font.decode_utf8(" ").first] * 8
            result << Glue.new(item.dup_attributes(spaces.freeze))
          when "\u{00AD}"
            frag = item.dup_attributes([item.style.font.decode_utf8("-").first].freeze)
            result << Penalty.new(Penalty::Standard.penalty, frag.width, item: frag)
          when "\u{00A0}"
            frag = item.dup_attributes([item.style.font.decode_utf8(" ").first].freeze)
            result << Penalty.new(Penalty::ProhibitedBreak.penalty, frag.width, item: frag)
          when "\u{200B}"
            result << Penalty.new(0)
          end
        end
        i += 1
      end
    end
  end
  result
end
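The method keeps Glue objects for spaces and paragraph-break Penalty objects in hashes keyed by `item.attributes_hash` and fills them with `||=`, so fragments with equal attributes appear to share a single object instead of allocating a new one per space or newline. The sketch below isolates that caching pattern. The `GlueCacheSketch` module, its `Glue` struct and the symbol keys are hypothetical stand-ins, not HexaPDF classes.

```ruby
# Hypothetical stand-in for illustration only; NOT the HexaPDF Glue class.
module GlueCacheSketch
  Glue = Struct.new(:attributes)

  # Returns one glue item per key, reusing the object created for an earlier equal key.
  def self.glues_for(attribute_keys)
    cache = {}
    attribute_keys.map { |key| cache[key] ||= Glue.new(key) }
  end
end

regular_a, bold, regular_b = GlueCacheSketch.glues_for([:regular, :bold, :regular])
regular_a.equal?(regular_b) # => true, the cached object is reused
regular_a.equal?(bold)      # => false, different attributes get their own object
```

Tabs and the line separator U+2028 bypass this cache in the method and always create a new object.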