Class: HexaPDF::Font::CMap
- Inherits: Object
  - Object
  - HexaPDF::Font::CMap
- Defined in: lib/hexapdf/font/cmap.rb, lib/hexapdf/font/cmap/parser.rb, lib/hexapdf/font/cmap/writer.rb
Overview
Represents a CMap, a mapping from character codes to CIDs (character IDs) or to their Unicode values.
See: PDF 2.0 s9.7.5, s9.10.3; Adobe Technical Notes #5014 and #5411
Instance Attribute Summary
-
#name ⇒ Object
The name of the CMap.
-
#ordering ⇒ Object
The ordering part of the CMap version.
-
#registry ⇒ Object
The registry part of the CMap version.
-
#supplement ⇒ Object
The supplement part of the CMap version.
-
#wmode ⇒ Object
The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.
Class Method Summary
-
.create_cid_cmap(mapping) ⇒ Object
Returns a string containing a CID CMap that represents the given code to CID mapping.
-
.create_to_unicode_cmap(mapping) ⇒ Object
Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.
-
.for_name(name) ⇒ Object
Creates a new CMap object by parsing a predefined CMap with the given name.
-
.parse(string) ⇒ Object
Creates a new CMap object from the given string which needs to contain a valid CMap file.
-
.predefined?(name) ⇒ Boolean
Returns true if the given name specifies a predefined CMap.
Instance Method Summary
-
#add_cid_mapping(code, cid) ⇒ Object
Adds an individual mapping from character code to CID.
-
#add_cid_range(start_code, end_code, start_cid) ⇒ Object
Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.
-
#add_codespace_range(first, *rest) ⇒ Object
Add a codespace range using an array of ranges for the individual bytes.
-
#add_unicode_mapping(code, string) ⇒ Object
Adds a mapping from character code to Unicode string in UTF-8 encoding.
-
#add_unicode_range_mapping(start_code, end_code, start_values) ⇒ Object
Adds a mapping from a range of character codes to strings starting with the given 16-bit integer values (representing the raw UTF-16BE characters).
-
#initialize ⇒ CMap
constructor
Creates a new CMap object.
-
#read_codes(string) ⇒ Object
Parses the string and returns all character codes.
-
#to_cid(code) ⇒ Object
Returns the CID for the given character code, or 0 if no mapping was found.
-
#to_unicode(code) ⇒ Object
Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.
-
#use_cmap(cmap) ⇒ Object
Add all mappings from the given CMap to this CMap.
Constructor Details
#initialize ⇒ CMap
Creates a new CMap object.
    # File 'lib/hexapdf/font/cmap.rb', line 116
    def initialize
      @codespace_ranges = []
      @cid_mapping = {}
      @cid_range_mappings = []
      @unicode_mapping = {}
      @unicode_range_mappings = []
    end
Instance Attribute Details
#name ⇒ Object
The name of the CMap.
    # File 'lib/hexapdf/font/cmap.rb', line 105
    def name
      @name
    end
#ordering ⇒ Object
The ordering part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 99
    def ordering
      @ordering
    end
#registry ⇒ Object
The registry part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 96
    def registry
      @registry
    end
#supplement ⇒ Object
The supplement part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 102
    def supplement
      @supplement
    end
#wmode ⇒ Object
The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.
    # File 'lib/hexapdf/font/cmap.rb', line 108
    def wmode
      @wmode
    end
Class Method Details
.create_cid_cmap(mapping) ⇒ Object
Returns a string containing a CID CMap that represents the given code to CID mapping.
See: Writer#create_cid_cmap
    # File 'lib/hexapdf/font/cmap.rb', line 91
    def self.create_cid_cmap(mapping)
      Writer.new.create_cid_cmap(mapping)
    end
.create_to_unicode_cmap(mapping) ⇒ Object
Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.
See: Writer#create_to_unicode_cmap
    # File 'lib/hexapdf/font/cmap.rb', line 84
    def self.create_to_unicode_cmap(mapping)
      Writer.new.create_to_unicode_cmap(mapping)
    end
.for_name(name) ⇒ Object
Creates a new CMap object by parsing a predefined CMap with the given name.
Raises an error if the given CMap is not found.
    # File 'lib/hexapdf/font/cmap.rb', line 64
    def self.for_name(name)
      return @cmap_cache[name] if @cmap_cache.key?(name)

      file = File.join(CMAP_DIR, name)
      if File.exist?(file)
        @cmap_cache[name] = parse(File.read(file, encoding: ::Encoding::UTF_8))
      else
        raise HexaPDF::Error, "No CMap named '#{name}' found"
      end
    end
.parse(string) ⇒ Object
Creates a new CMap object from the given string which needs to contain a valid CMap file.
    # File 'lib/hexapdf/font/cmap.rb', line 76
    def self.parse(string)
      Parser.new.parse(string)
    end
.predefined?(name) ⇒ Boolean
Returns true if the given name specifies a predefined CMap.

    # File 'lib/hexapdf/font/cmap.rb', line 57
    def self.predefined?(name)
      File.exist?(File.join(CMAP_DIR, name))
    end
Instance Method Details
#add_cid_mapping(code, cid) ⇒ Object
Adds an individual mapping from character code to CID.
    # File 'lib/hexapdf/font/cmap.rb', line 178
    def add_cid_mapping(code, cid)
      @cid_mapping[code] = cid
    end
#add_cid_range(start_code, end_code, start_cid) ⇒ Object
Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.

    # File 'lib/hexapdf/font/cmap.rb', line 184
    def add_cid_range(start_code, end_code, start_cid)
      @cid_range_mappings << [start_code..end_code, start_cid]
    end
#add_codespace_range(first, *rest) ⇒ Object
Add a codespace range using an array of ranges for the individual bytes.
This means that the first range is checked against the first byte, the second range against the second byte and so on.
    # File 'lib/hexapdf/font/cmap.rb', line 137
    def add_codespace_range(first, *rest)
      @codespace_ranges << [first, rest]
    end
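The per-byte ranges described above can be derived from the start and end code of a codespace range. The following is a minimal sketch, assuming the caller splits both codes bytewise; the helper byte_ranges is hypothetical and not part of HexaPDF.

```ruby
# Hypothetical helper (not part of HexaPDF): splits the start and end code of a
# codespace range into one Range per byte position.
def byte_ranges(start_code, end_code)
  start_code.bytes.zip(end_code.bytes).map { |low, high| low..high }
end

# A two-byte codespace range from <8140> to <9FFC>.
ranges = byte_ranges("\x81\x40".b, "\x9F\xFC".b)

# first is checked against the first byte, rest holds the ranges for the
# following bytes, matching the (first, *rest) parameter list shown above.
first, *rest = ranges
```

With these values, first and rest could be passed on as add_codespace_range(first, *rest).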
#add_unicode_mapping(code, string) ⇒ Object
Adds a mapping from character code to Unicode string in UTF-8 encoding.
    # File 'lib/hexapdf/font/cmap.rb', line 203
    def add_unicode_mapping(code, string)
      @unicode_mapping[code] = string
    end
#add_unicode_range_mapping(start_code, end_code, start_values) ⇒ Object
Adds a mapping from a range of character codes to strings starting with the given 16-bit integer values (representing the raw UTF-16BE characters).
    # File 'lib/hexapdf/font/cmap.rb', line 209
    def add_unicode_range_mapping(start_code, end_code, start_values)
      @unicode_range_mappings << [start_code..end_code, start_values]
    end
#read_codes(string) ⇒ Object
Parses the string and returns all character codes.
An error is raised if the string contains invalid bytes.
    # File 'lib/hexapdf/font/cmap.rb', line 144
    def read_codes(string)
      codes = []
      bytes = string.each_byte

      loop do
        byte = bytes.next
        code = 0

        found = @codespace_ranges.any? do |first_byte_range, rest_ranges|
          next unless first_byte_range.cover?(byte)

          code = (code << 8) + byte
          valid = rest_ranges.all? do |range|
            begin
              byte = bytes.next
            rescue StopIteration
              raise HexaPDF::Error, "Missing bytes while reading codes via CMap"
            end
            code = (code << 8) + byte
            range.cover?(byte)
          end

          codes << code if valid
        end

        unless found
          raise HexaPDF::Error, "Invalid byte while reading codes via CMap: #{byte}"
        end
      end

      codes
    end
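To illustrate how codespace ranges split a byte string into codes of varying length, here is a simplified standalone sketch. It is not the HexaPDF implementation: the method read_codes_sketch, the constant CODESPACE_RANGES and the example ranges are assumptions made for illustration, and the sketch raises ArgumentError on any byte sequence that matches no range.

```ruby
# Standalone sketch, not the HexaPDF implementation: each codespace range is
# stored as [first_byte_range, [ranges for the remaining bytes]].
CODESPACE_RANGES = [
  [0x00..0x80, []],            # single-byte codes
  [0x81..0x9F, [0x40..0xFC]],  # two-byte codes
].freeze

def read_codes_sketch(string, codespace_ranges = CODESPACE_RANGES)
  bytes = string.bytes
  codes = []
  pos = 0
  while pos < bytes.length
    first_byte = bytes[pos]
    match = codespace_ranges.find do |first_range, rest_ranges|
      next false unless first_range.cover?(first_byte)

      following = bytes[pos + 1, rest_ranges.length]
      following.length == rest_ranges.length &&
        rest_ranges.zip(following).all? { |range, byte| range.cover?(byte) }
    end
    raise ArgumentError, "Invalid byte at position #{pos}: #{first_byte}" unless match

    length = 1 + match[1].length
    # Combine the bytes of the code big-endian into a single integer.
    codes << bytes[pos, length].inject(0) { |code, byte| (code << 8) + byte }
    pos += length
  end
  codes
end

codes = read_codes_sketch("\x41\x81\x40\x42".b)
```

Here the bytes 0x81 0x40 are read as the single two-byte code 0x8140, while 0x41 and 0x42 are read as single-byte codes.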
#to_cid(code) ⇒ Object
Returns the CID for the given character code, or 0 if no mapping was found.
    # File 'lib/hexapdf/font/cmap.rb', line 189
    def to_cid(code)
      cid = @cid_mapping.fetch(code, -1)
      if cid == -1
        @cid_range_mappings.reverse_each do |range, start_cid|
          if range.cover?(code)
            cid = start_cid + code - range.first
            break
          end
        end
      end
      (cid == -1 ? 0 : cid)
    end
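The lookup order in the source above (individual mappings first, then ranges searched from the most recently added one, then 0 as fallback) can be sketched with plain data structures. This is a standalone illustration, not the HexaPDF class; the names cid_mapping, cid_range_mappings and lookup as well as the sample values are assumptions.

```ruby
# Standalone sketch of the lookup order, not the HexaPDF class itself.
cid_mapping = {0x20 => 1}
cid_range_mappings = [
  [0x30..0x39, 100],  # added first
  [0x35..0x39, 500],  # added later, overlaps the first range
]

lookup = lambda do |code|
  # Individual mappings take precedence over ranges.
  next cid_mapping[code] if cid_mapping.key?(code)

  # Ranges are searched last-added first, so later ranges win on overlap.
  range, start_cid = cid_range_mappings.reverse_each.find { |r, _| r.cover?(code) }
  range ? start_cid + code - range.first : 0
end

results = [0x20, 0x31, 0x36, 0x41].map(&lookup)
# results == [1, 101, 501, 0]
```

Code 0x36 lies in both ranges and resolves through the later one, and the unmapped code 0x41 falls back to CID 0.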
#to_unicode(code) ⇒ Object
Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.

    # File 'lib/hexapdf/font/cmap.rb', line 215
    def to_unicode(code)
      @unicode_mapping.fetch(code) do
        @unicode_range_mappings.reverse_each do |range, start_values|
          if range.cover?(code)
            str = start_values[0..-2].append(start_values[-1] + code - range.first).
              pack('n*').encode(::Encoding::UTF_8, ::Encoding::UTF_16BE)
            return @unicode_mapping[code] = str
          end
        end
        nil
      end
    end
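The range conversion in the source above increments only the last 16-bit value and then transcodes the packed values from UTF-16BE to UTF-8. The following standalone sketch shows that step in isolation; the method unicode_for and the sample ranges are assumptions for illustration and not part of HexaPDF.

```ruby
# Standalone sketch of the range conversion, not the HexaPDF class itself.
def unicode_for(code, range, start_values)
  # Only the last 16-bit value is incremented by the offset into the range.
  values = start_values[0..-2] << (start_values[-1] + code - range.first)
  # Pack as big-endian 16-bit integers, then transcode UTF-16BE to UTF-8.
  values.pack('n*').encode(Encoding::UTF_8, Encoding::UTF_16BE)
end

# Offset 2 into a range starting at U+0041 ("A") gives "C".
letter = unicode_for(0x12, 0x10..0x1F, [0x0041])

# A surrogate pair: the low surrogate 0xDE00 is incremented to 0xDE01,
# which together with 0xD83D encodes U+1F601.
emoji = unicode_for(0x01, 0x00..0x0F, [0xD83D, 0xDE00])
```

Because the start values are raw UTF-16BE units, characters outside the Basic Multilingual Plane are given as surrogate pairs, as in the second call.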
#use_cmap(cmap) ⇒ Object
Add all mappings from the given CMap to this CMap.
    # File 'lib/hexapdf/font/cmap.rb', line 125
    def use_cmap(cmap)
      @codespace_ranges.concat(cmap.codespace_ranges)
      @cid_mapping.merge!(cmap.cid_mapping)
      @cid_range_mappings.concat(cmap.cid_range_mappings)
      @unicode_mapping.merge!(cmap.unicode_mapping)
      @unicode_range_mappings.concat(cmap.unicode_range_mappings)
    end
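The source above combines hashes with merge! and arrays with concat. A small standalone sketch of what that means for conflicting entries, using plain hashes and arrays with made-up values (none of the variable names below are part of HexaPDF):

```ruby
# Standalone sketch with plain hashes and arrays, not the HexaPDF class itself.
own_mapping   = {0x20 => 1, 0x21 => 2}
other_mapping = {0x21 => 99, 0x22 => 3}
own_mapping.merge!(other_mapping)  # entries of the other CMap win on conflict

own_ranges   = [[0x30..0x39, 100]]
other_ranges = [[0x30..0x39, 500]]
own_ranges.concat(other_ranges)    # the other CMap's ranges are appended
```

Since the lookup methods shown on this page search range mappings in reverse order, ranges appended this way would be consulted before ranges that were already present, so the order in which use_cmap and the add_* methods are called appears to matter.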