Module: Prawn::Rtl::Connector::Logic

Defined in:
lib/prawn/rtl/connector/logic.rb

Overview

Handles the logic for Arabic letter connection and contextual form selection.

This module implements the core algorithm for determining which form (isolated, initial, medial, or final) an Arabic character should take based on its surrounding characters. It maintains a mapping from each base Arabic character to the code points of its contextual forms, most of which lie in the Unicode Arabic Presentation Forms-B block (U+FE70 to U+FEFF).

Defined Under Namespace

Classes: CharacterInfo

Constant Summary

@@charinfos = nil

Class Method Details

.add(common, isolated, final, initial, medial, connects, diacritic = false) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Adds a character and its contextual forms to the character mapping.

Parameters:

  • common (String)

    hex code of the base Unicode character, given as a string of hexadecimal digits (for example '0628')

  • isolated (String)

    hex code of the isolated form

  • final (String)

    hex code of the final form

  • initial (String)

    hex code of the initial form

  • medial (String)

    hex code of the medial form

  • connects (Boolean)

    whether this character connects to the next

  • diacritic (Boolean) (defaults to: false)

    whether this character is a diacritic mark



# File 'lib/prawn/rtl/connector/logic.rb', line 197

def self.add(common, isolated, final, initial, medial, connects, diacritic = false)
  charinfo = CharacterInfo.new(
    [common.hex].pack('U'),
    [isolated.hex].pack('U'),
    [final.hex].pack('U'),
    [initial.hex].pack('U'),
    [medial.hex].pack('U'),
    connects,
    diacritic
  )
  @@charinfos[charinfo.common] = charinfo
end
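
The conversion that .add applies to each hex string can be tried on its own. The following is a minimal sketch; codepoint_to_char is a hypothetical helper introduced here for illustration and is not part of the library.

```ruby
# Hypothetical helper (not part of Prawn::Rtl): mirrors the conversion
# that .add performs on each of its hex-string arguments.
def codepoint_to_char(hex_string)
  # String#hex parses the hexadecimal digits into an Integer;
  # Array#pack('U') encodes that code point as a UTF-8 string.
  [hex_string.hex].pack('U')
end

ba_base    = codepoint_to_char('0628') # base letter Ba2
ba_initial = codepoint_to_char('fe91') # its initial presentation form

puts ba_base == "\u0628"    # true
puts ba_initial == "\uFE91" # true
puts ba_base.encoding       # UTF-8
```

Because the packed strings are ordinary UTF-8 characters, they can be used directly as hash keys when looking up characters taken from input text.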

.charinfos ⇒ Hash{String => CharacterInfo}

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns the character information mapping for Arabic characters.

Lazily initializes and returns a hash mapping Arabic Unicode characters to their CharacterInfo objects containing contextual forms.

Returns:

  • (Hash{String => CharacterInfo})

    the character information mapping



# File 'lib/prawn/rtl/connector/logic.rb', line 134

def self.charinfos
  return @@charinfos unless @@charinfos.nil?

  @@charinfos = {}
  add('0627', 'fe8d', 'fe8e', 'fe8d', 'fe8e', false) # Alef
  add('0628', 'fe8f', 'fe90', 'fe91', 'fe92', true)  # Ba2
  add('062a', 'fe95', 'fe96', 'fe97', 'fe98', true)  # Ta2
  add('062b', 'fe99', 'fe9a', 'fe9b', 'fe9c', true)  # Tha2
  add('062c', 'fe9d', 'fe9e', 'fe9f', 'fea0', true)  # Jeem
  add('062d', 'fea1', 'fea2', 'fea3', 'fea4', true)  # 7a2
  add('062e', 'fea5', 'fea6', 'fea7', 'fea8', true)  # 7'a2
  add('062f', 'fea9', 'feaa', 'fea9', 'feaa', false) # Dal
  add('0630', 'feab', 'feac', 'feab', 'feac', false) # Thal
  add('0631', 'fead', 'feae', 'fead', 'feae', false) # Ra2
  add('0632', 'feaf', 'feb0', 'feaf', 'feb0', false) # Zain
  add('0633', 'feb1', 'feb2', 'feb3', 'feb4', true)  # Seen
  add('0634', 'feb5', 'feb6', 'feb7', 'feb8', true)  # Sheen
  add('0635', 'feb9', 'feba', 'febb', 'febc', true)  # 9ad
  add('0636', 'febd', 'febe', 'febf', 'fec0', true)  # 9'ad
  add('0637', 'fec1', 'fec2', 'fec3', 'fec4', true)  # 6a2
  add('0638', 'fec5', 'fec6', 'fec7', 'fec8', true)  # 6'a2
  add('0639', 'fec9', 'feca', 'fecb', 'fecc', true)  # 3ain
  add('063a', 'fecd', 'fece', 'fecf', 'fed0', true)  # 3'ain
  add('0641', 'fed1', 'fed2', 'fed3', 'fed4', true)  # Fa2
  add('0642', 'fed5', 'fed6', 'fed7', 'fed8', true)  # Qaf
  add('0643', 'fed9', 'feda', 'fedb', 'fedc', true)  # Kaf
  add('0644', 'fedd', 'fede', 'fedf', 'fee0', true)  # Lam
  add('0645', 'fee1', 'fee2', 'fee3', 'fee4', true)  # Meem
  add('0646', 'fee5', 'fee6', 'fee7', 'fee8', true)  # Noon
  add('0647', 'fee9', 'feea', 'feeb', 'feec', true)  # Ha2
  add('0648', 'feed', 'feee', 'feed', 'feee', false) # Waw
  add('064a', 'fef1', 'fef2', 'fef3', 'fef4', true)  # Ya2
  add('0621', 'fe80', 'fe80', 'fe80', 'fe80', false) # Hamza
  add('0622', 'fe81', 'fe82', 'fe81', 'fe82', false) # Alef Madda
  add('0623', 'fe83', 'fe84', 'fe83', 'fe84', false) # Alef Hamza Above
  add('0624', 'fe85', 'fe86', 'fe85', 'fe86', false) # Waw Hamza
  add('0625', 'fe87', 'fe88', 'fe87', 'fe88', false) # Alef Hamza Below
  add('0626', 'fe89', 'fe8a', 'fe8b', 'fe8c', true)  # Ya2 Hamza
  add('0629', 'fe93', 'fe94', 'fe93', 'fe94', false) # Ta2 Marbu6a
  add('0640', '0640', '0640', '0640', '0640', true)  # Tatweel
  add('0649', 'feef', 'fef0', 'feef', 'fef0', false) # Alef Layyina
  add('0651', 'fe7c', 'fe7c', 'fe7c', 'fe7d', false, true) # Shadda
  add('0652', 'fe7e', 'fe7e', 'fe7e', 'fe7f', false, true) # Sukun
  add('064e', 'fe76', 'fe76', 'fe76', 'fe77', false, true) # Fatha
  add('0650', 'fe7a', 'fe7a', 'fe7a', 'fe7b', false, true) # Kasra
  add('064f', 'fe78', 'fe78', 'fe78', 'fe79', false, true) # Damma
  add('0653', '0653', '0653', '0653', '0653', false, true) # Madda
  add('064b', 'fe70', 'fe70', 'fe70', 'fe71', false, true) # Fathatan
  add('064d', 'fe74', 'fe74', 'fe74', 'fe74', false, true) # Kasratan
  add('064c', 'fe72', 'fe72', 'fe72', 'fe72', false, true) # Dammatan
  @@charinfos
end
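
The lazy initialization pattern used above can be illustrated in isolation. This is only a sketch under stated assumptions: SketchInfo and SketchTable are hypothetical stand-ins for CharacterInfo and this module, the table holds just two letters, and a module-level instance variable is used in place of the class variable.

```ruby
# Hypothetical stand-in for CharacterInfo; the field names are assumptions.
SketchInfo = Struct.new(:common, :isolated, :final, :initial, :medial,
                        :connects, :diacritic)

# Hypothetical stand-in for the real module, holding only two letters.
module SketchTable
  @table = nil

  # Builds the table on first access and returns the same Hash afterwards.
  def self.table
    return @table unless @table.nil?

    @table = {}
    add('0627', 'fe8d', 'fe8e', 'fe8d', 'fe8e', false) # Alef
    add('0628', 'fe8f', 'fe90', 'fe91', 'fe92', true)  # Ba2
    @table
  end

  # Converts each hex string to a character and stores the entry
  # under its base character.
  def self.add(common, isolated, final, initial, medial, connects, diacritic = false)
    chars = [common, isolated, final, initial, medial].map { |hex| [hex.hex].pack('U') }
    info = SketchInfo.new(*chars, connects, diacritic)
    @table[info.common] = info
  end
end

ba = SketchTable.table["\u0628"]
puts ba.connects                                 # true
puts ba.initial == "\uFE91"                      # true
puts SketchTable.table.key?('a')                 # false
puts SketchTable.table.equal?(SketchTable.table) # true
```

The last line shows the point of the memoization: repeated calls return the identical Hash object instead of rebuilding the table.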

.determine_form(previous_previous_char, previous_char, next_char, next_next_char) ⇒ Symbol

Determines the contextual form of an Arabic character.

Returns the form of the current character (:isolated, :initial, :medial, or :final), given its neighbouring characters. If the character immediately before or after is a diacritic, the character one position further out is used in its place, so a single diacritic does not break a connection. In Arabic, all characters can connect with a previous character, but not all letters can connect with the next character (this is determined by CharacterInfo#connects?).

Parameters:

  • previous_previous_char (String, nil)

    the character two positions before; used in place of previous_char when previous_char is a diacritic

  • previous_char (String, nil)

    the character immediately before

  • next_char (String, nil)

    the character immediately after

  • next_next_char (String, nil)

    the character two positions after; used in place of next_char when next_char is a diacritic

Returns:

  • (Symbol)

    the contextual form (:isolated, :initial, :medial, or :final)



# File 'lib/prawn/rtl/connector/logic.rb', line 74

def self.determine_form(previous_previous_char, previous_char, next_char, next_next_char)
  charinfos = self.charinfos
  # Look past a single diacritic on either side so that it does not break the connection.
  next_char = next_next_char if charinfos[next_char] && charinfos[next_char].diacritic?
  previous_char = previous_previous_char if charinfos[previous_char] && charinfos[previous_char].diacritic?
  if charinfos[previous_char] && charinfos[next_char]
    # If the current character does not connect, its medial form maps to its
    # final form, and its initial form maps to its isolated form.
    charinfos[previous_char].connects? ? :medial : :initial
  elsif charinfos[previous_char] # The next character is not an Arabic character.
    charinfos[previous_char].connects? ? :final : :isolated
  elsif charinfos[next_char] # The previous character is not an Arabic character.
    :initial # If the current character does not connect, its initial form maps to its isolated form.
  else # Neither of the surrounding characters is an Arabic character.
    :isolated
  end
end
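
The branches above can be exercised with a self-contained sketch. LETTERS and sketch_form are hypothetical names introduced for illustration; the miniature table stands in for the full character mapping and records only whether a character connects to the next one and whether it is a diacritic.

```ruby
# Assumed miniature table: character => [connects_to_next, diacritic].
LETTERS = {
  "\u0628" => [true, false],  # Ba2
  "\u0627" => [false, false], # Alef
  "\u064e" => [false, true]   # Fatha
}.freeze

# Hypothetical re-statement of the decision logic against LETTERS.
def sketch_form(prev_prev, prev, nxt, nxt_nxt)
  # Look past a single diacritic on either side.
  nxt = nxt_nxt if LETTERS[nxt]&.last
  prev = prev_prev if LETTERS[prev]&.last
  if LETTERS[prev] && LETTERS[nxt]
    LETTERS[prev].first ? :medial : :initial
  elsif LETTERS[prev]
    LETTERS[prev].first ? :final : :isolated
  elsif LETTERS[nxt]
    :initial
  else
    :isolated
  end
end

ba, alef, fatha = "\u0628", "\u0627", "\u064e"

p sketch_form(nil, nil, ba, nil)   # :initial  (nothing before, Arabic after)
p sketch_form(nil, ba, ba, nil)    # :medial   (connecting letter before, Arabic after)
p sketch_form(nil, ba, nil, nil)   # :final    (connecting letter before, nothing after)
p sketch_form(nil, alef, nil, nil) # :isolated (Alef does not connect forward)
p sketch_form(ba, fatha, nil, nil) # :final    (the Fatha is skipped, Ba2 connects)
```

Note that only one diacritic is skipped on each side, which is why the method takes two characters of context in each direction rather than an arbitrary number.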

.transform(str) ⇒ String

Transforms Arabic text by applying contextual letter forms.

Processes a string character by character, determining the appropriate contextual form for each Arabic letter based on its surrounding characters. Non-Arabic characters pass through unchanged.

Parameters:

  • str (String)

    the text to transform

Returns:

  • (String)

    the transformed text with Arabic letters in their contextual forms



# File 'lib/prawn/rtl/connector/logic.rb', line 99

def self.transform(str)
  res = ''
  charinfos = self.charinfos
  previous_previous_char = nil
  previous_char = nil
  current_char = nil
  next_char = nil
  next_next_char = nil
  consume_character = lambda do |char|
    previous_previous_char = previous_char
    previous_char = current_char
    current_char = next_char
    next_char = next_next_char
    next_next_char = char
    return unless current_char

    if charinfos.key?(current_char)
      form = determine_form(previous_previous_char, previous_char, next_char, next_next_char)
      res += charinfos[current_char].formatted[form]
    else
      res += current_char
    end
  end
  str.each_char { |char| consume_character.call(char) }
  2.times { consume_character.call(nil) }
  res
end
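
The five-character sliding window that transform maintains can be seen more easily in isolation. The following sketch uses a hypothetical helper, contexts, that records the window for each character instead of transforming it; it is an illustration of the technique, not the library's code.

```ruby
# Hypothetical helper: returns, for each character of str, the window
# [previous_previous, previous, current, next, next_next].
def contexts(str)
  window = Array.new(5) # five nil slots; index 2 is the current character
  result = []
  feed = lambda do |char|
    # Shift the window left by one position and append the new character.
    window = window.drop(1) << char
    # Record the window only once a real character sits in the middle slot.
    result << window.dup if window[2]
  end
  str.each_char { |char| feed.call(char) }
  # Feeding two trailing nils flushes the last two characters through
  # the middle slot, as transform does with 2.times.
  2.times { feed.call(nil) }
  result
end

contexts('abc').each { |window| p window }
# [nil, nil, "a", "b", "c"]
# [nil, "a", "b", "c", nil]
# ["a", "b", "c", nil, nil]
```

Each character reaches the middle slot two steps after it is read, which is what gives transform its two characters of lookahead.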