Class: HexaPDF::Font::CMap

Inherits: Object

Defined in: lib/hexapdf/font/cmap.rb,
lib/hexapdf/font/cmap/parser.rb,
lib/hexapdf/font/cmap/writer.rb

Overview

Represents a CMap, a mapping from character codes to CIDs (character IDs) or to their Unicode values.

See: PDF2.0 s9.7.5, s9.10.3; Adobe Technical Notes #5014 and #5411

Defined Under Namespace

Classes: Parser, Writer

Constant Summary

CMAP_DIR = File.join(HexaPDF.data_dir, 'cmap')

Instance Attribute Summary

#name ⇒ Object

The name of the CMap.
#ordering ⇒ Object

The ordering part of the CMap version.
#registry ⇒ Object

The registry part of the CMap version.
#supplement ⇒ Object

The supplement part of the CMap version.
#wmode ⇒ Object

The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.

Class Method Summary

.create_cid_cmap(mapping) ⇒ Object

Returns a string containing a CID CMap that represents the given code to CID mapping.
.create_to_unicode_cmap(mapping) ⇒ Object

Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.
.for_name(name) ⇒ Object

Creates a new CMap object by parsing a predefined CMap with the given name.
.parse(string) ⇒ Object

Creates a new CMap object from the given string which needs to contain a valid CMap file.
.predefined?(name) ⇒ Boolean

Returns true if the given name specifies a predefined CMap.

Instance Method Summary

#add_cid_mapping(code, cid) ⇒ Object

Adds an individual mapping from character code to CID.
#add_cid_range(start_code, end_code, start_cid) ⇒ Object

Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.
#add_codespace_range(first, *rest) ⇒ Object

Adds a codespace range using an array of ranges for the individual bytes.
#add_unicode_mapping(code, string) ⇒ Object

Adds a mapping from character code to Unicode string in UTF-8 encoding.
#add_unicode_range_mapping(start_code, end_code, start_values) ⇒ Object

Adds a mapping from a range of character codes to strings starting with the given 16-bit integer values (representing the raw UTF-16BE characters).
#initialize ⇒ CMap constructor

Creates a new CMap object.
#read_codes(string) ⇒ Object

Parses the string and returns all character codes.
#to_cid(code) ⇒ Object

Returns the CID for the given character code, or 0 if no mapping was found.
#to_unicode(code) ⇒ Object

Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.
#use_cmap(cmap) ⇒ Object

Adds all mappings from the given CMap to this CMap.

Constructor Details

#initialize ⇒ `CMap`

Creates a new CMap object.

# File 'lib/hexapdf/font/cmap.rb', line 116

def initialize
  @codespace_ranges = []
  @cid_mapping = {}
  @cid_range_mappings = []
  @unicode_mapping = {}
  @unicode_range_mappings = []
end

Instance Attribute Details

#name ⇒ `Object`

The name of the CMap.



# File 'lib/hexapdf/font/cmap.rb', line 105

def name
  @name
end

#ordering ⇒ `Object`

The ordering part of the CMap version.



# File 'lib/hexapdf/font/cmap.rb', line 99

def ordering
  @ordering
end

#registry ⇒ `Object`

The registry part of the CMap version.



# File 'lib/hexapdf/font/cmap.rb', line 96

def registry
  @registry
end

#supplement ⇒ `Object`

The supplement part of the CMap version.



# File 'lib/hexapdf/font/cmap.rb', line 102

def supplement
  @supplement
end

#wmode ⇒ `Object`

The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.



# File 'lib/hexapdf/font/cmap.rb', line 108

def wmode
  @wmode
end

Class Method Details

.create_cid_cmap(mapping) ⇒ `Object`

Returns a string containing a CID CMap that represents the given code to CID mapping.

See: Writer#create_cid_cmap



# File 'lib/hexapdf/font/cmap.rb', line 91

def self.create_cid_cmap(mapping)
  Writer.new.create_cid_cmap(mapping)
end

.create_to_unicode_cmap(mapping) ⇒ `Object`

Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.

See: Writer#create_to_unicode_cmap



# File 'lib/hexapdf/font/cmap.rb', line 84

def self.create_to_unicode_cmap(mapping)
  Writer.new.create_to_unicode_cmap(mapping)
end

.for_name(name) ⇒ `Object`

Creates a new CMap object by parsing a predefined CMap with the given name.

Raises an error if the given CMap is not found.

# File 'lib/hexapdf/font/cmap.rb', line 64

def self.for_name(name)
  return @cmap_cache[name] if @cmap_cache.key?(name)

  file = File.join(CMAP_DIR, name)
  if File.exist?(file)
    @cmap_cache[name] = parse(File.read(file, encoding: ::Encoding::UTF_8))
  else
    raise HexaPDF::Error, "No CMap named '#{name}' found"
  end
end

.parse(string) ⇒ `Object`

Creates a new CMap object from the given string which needs to contain a valid CMap file.



# File 'lib/hexapdf/font/cmap.rb', line 76

def self.parse(string)
  Parser.new.parse(string)
end

.predefined?(name) ⇒ `Boolean`

Returns true if the given name specifies a predefined CMap.

Returns:

(Boolean)



# File 'lib/hexapdf/font/cmap.rb', line 57

def self.predefined?(name)
  File.exist?(File.join(CMAP_DIR, name))
end

Instance Method Details

#add_cid_mapping(code, cid) ⇒ `Object`

Adds an individual mapping from character code to CID.



# File 'lib/hexapdf/font/cmap.rb', line 178

def add_cid_mapping(code, cid)
  @cid_mapping[code] = cid
end

#add_cid_range(start_code, end_code, start_cid) ⇒ `Object`

Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.



# File 'lib/hexapdf/font/cmap.rb', line 184

def add_cid_range(start_code, end_code, start_cid)
  @cid_range_mappings << [start_code..end_code, start_cid]
end

#add_codespace_range(first, *rest) ⇒ `Object`

Adds a codespace range using an array of ranges for the individual bytes.

This means that the first range is checked against the first byte, the second range against the second byte and so on.



# File 'lib/hexapdf/font/cmap.rb', line 137

def add_codespace_range(first, *rest)
  @codespace_ranges << [first, rest]
end
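
To illustrate the per-byte layout described above, here is a minimal sketch that turns the low and high bounds of a codespace range (for example <8140> and <9FFC>) into one range per byte. The helper name byte_ranges_sketch is hypothetical and not part of HexaPDF; it only assumes that both bounds have the same byte length.

```ruby
# Hypothetical helper (not part of HexaPDF): builds one Range per byte
# position from the low and high bounds of a codespace range.
def byte_ranges_sketch(low, high)
  unless low.bytesize == high.bytesize
    raise ArgumentError, 'bounds must have the same byte length'
  end
  low.bytes.zip(high.bytes).map { |lo, hi| lo..hi }
end

# Bounds <8140> and <9FFC> as binary strings.
low  = [0x81, 0x40].pack('C*')
high = [0x9F, 0xFC].pack('C*')

first, *rest = byte_ranges_sketch(low, high)
# first covers the first byte (0x81..0x9F), rest holds the range for the
# second byte ([0x40..0xFC]). With a CMap instance these values would be
# passed on as cmap.add_codespace_range(first, *rest).
```

Storing the ranges per byte means that a code such as <8130> is rejected because its second byte falls outside 0x40..0xFC, even though the integer 0x8130 lies numerically below 0x9FFC.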

#add_unicode_mapping(code, string) ⇒ `Object`

Adds a mapping from character code to Unicode string in UTF-8 encoding.



# File 'lib/hexapdf/font/cmap.rb', line 203

def add_unicode_mapping(code, string)
  @unicode_mapping[code] = string
end

#add_unicode_range_mapping(start_code, end_code, start_values) ⇒ `Object`

Adds a mapping from a range of character codes to strings starting with the given 16-bit integer values (representing the raw UTF-16BE characters).



# File 'lib/hexapdf/font/cmap.rb', line 209

def add_unicode_range_mapping(start_code, end_code, start_values)
  @unicode_range_mappings << [start_code..end_code, start_values]
end
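
The start_values argument is an array of 16-bit integers rather than a string. As a rough sketch of how such an array could be derived from an ordinary Ruby string, the hypothetical helper below (not part of HexaPDF) encodes the string as UTF-16BE and unpacks it into big-endian 16-bit values.

```ruby
# Hypothetical helper (not part of HexaPDF): converts a string into the
# raw UTF-16BE code units, one 16-bit integer per unit.
def utf16be_values_sketch(string)
  string.encode(Encoding::UTF_16BE).unpack('n*')
end

bmp_values   = utf16be_values_sketch('A')          # single code unit
pair_values  = utf16be_values_sketch("\u{1F600}")  # surrogate pair, two code units
multi_values = utf16be_values_sketch('fi')         # two characters, two code units
```

Characters outside the Basic Multilingual Plane yield two values (a surrogate pair), which is why the argument is an array and not a single integer.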

#read_codes(string) ⇒ `Object`

Parses the string and returns all character codes.

An error is raised if the string contains invalid bytes.

# File 'lib/hexapdf/font/cmap.rb', line 144

def read_codes(string)
  codes = []
  bytes = string.each_byte

  loop do
    byte = bytes.next
    code = 0

    found = @codespace_ranges.any? do |first_byte_range, rest_ranges|
      next unless first_byte_range.cover?(byte)

      code = (code << 8) + byte
      valid = rest_ranges.all? do |range|
        begin
          byte = bytes.next
        rescue StopIteration
          raise HexaPDF::Error, "Missing bytes while reading codes via CMap"
        end
        code = (code << 8) + byte
        range.cover?(byte)
      end

      codes << code if valid
    end

    unless found
      raise HexaPDF::Error, "Invalid byte while reading codes via CMap: #{byte}"
    end
  end

  codes
end
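
The decoding loop can be tried outside of HexaPDF with the stand-alone sketch below. It mirrors the method body shown above but takes the codespace ranges as an argument and raises ArgumentError instead of HexaPDF::Error; the name read_codes_sketch and the sample ranges are assumptions made for illustration.

```ruby
# Stand-alone sketch of the loop above (not HexaPDF's API). Each entry of
# codespace_ranges has the shape [first_byte_range, rest_ranges].
def read_codes_sketch(codespace_ranges, string)
  codes = []
  bytes = string.each_byte # external enumerator; #next raises StopIteration at the end

  loop do # Kernel#loop ends quietly once bytes.next raises StopIteration
    byte = bytes.next
    code = 0

    found = codespace_ranges.any? do |first_byte_range, rest_ranges|
      next unless first_byte_range.cover?(byte)

      code = (code << 8) + byte
      valid = rest_ranges.all? do |range|
        begin
          byte = bytes.next
        rescue StopIteration
          raise ArgumentError, 'Missing bytes while reading codes'
        end
        code = (code << 8) + byte
        range.cover?(byte)
      end

      codes << code if valid
    end

    raise ArgumentError, "Invalid byte while reading codes: #{byte}" unless found
  end

  codes
end

# One single-byte range and one two-byte range (sample values).
sample_ranges = [[0x00..0x80, []], [0x81..0x9F, [0x40..0xFC]]]
sample_codes = read_codes_sketch(sample_ranges, [0x41, 0x81, 0x40].pack('C*'))
# sample_codes holds one single-byte code (0x41) and one two-byte code (0x8140).
```

The first byte alone decides which codespace range applies, so single-byte and multi-byte codes can be mixed in one string.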

#to_cid(code) ⇒ `Object`

Returns the CID for the given character code, or 0 if no mapping was found.

# File 'lib/hexapdf/font/cmap.rb', line 189

def to_cid(code)
  cid = @cid_mapping.fetch(code, -1)
  if cid == -1
    @cid_range_mappings.reverse_each do |range, start_cid|
      if range.cover?(code)
        cid = start_cid + code - range.first
        break
      end
    end
  end
  (cid == -1 ? 0 : cid)
end
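
The lookup order in the method above (individual mappings first, then ranges from the most recently added to the oldest, then 0) can be sketched without HexaPDF as follows. The function name to_cid_sketch and the sample data are hypothetical; the arithmetic is taken from the source shown above.

```ruby
# Stand-alone sketch of the lookup order used by #to_cid (not HexaPDF's API).
def to_cid_sketch(cid_mapping, cid_range_mappings, code)
  return cid_mapping[code] if cid_mapping.key?(code)

  # Ranges added later are checked first.
  cid_range_mappings.reverse_each do |range, start_cid|
    return start_cid + code - range.first if range.cover?(code)
  end

  0 # no mapping found
end

cid_mapping = {0x20 => 1}
cid_ranges  = [[0x20..0x7E, 100], [0x41..0x5A, 500]]

individual = to_cid_sketch(cid_mapping, cid_ranges, 0x20) # individual mapping wins
later      = to_cid_sketch(cid_mapping, cid_ranges, 0x41) # later range wins: 500 + 0
earlier    = to_cid_sketch(cid_mapping, cid_ranges, 0x61) # 100 + (0x61 - 0x20)
missing    = to_cid_sketch(cid_mapping, cid_ranges, 0x80) # falls through to 0
```

Because 0 is returned for unmapped codes, a caller cannot distinguish "no mapping" from an explicit mapping to CID 0 by the return value alone.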

#to_unicode(code) ⇒ `Object`

Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.

# File 'lib/hexapdf/font/cmap.rb', line 215

def to_unicode(code)
  @unicode_mapping.fetch(code) do
    @unicode_range_mappings.reverse_each do |range, start_values|
      if range.cover?(code)
        str = start_values[0..-2].append(start_values[-1] + code - range.first).
          pack('n*').encode(::Encoding::UTF_8, ::Encoding::UTF_16BE)
        return @unicode_mapping[code] = str
      end
    end
    nil
  end
end
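
For a range mapping, only the last 16-bit value is incremented by the offset of the code within the range; the result is then packed as UTF-16BE and converted to UTF-8. The sketch below isolates that step. The name range_to_unicode_sketch and the sample values are hypothetical, and the caching into @unicode_mapping done by the real method is left out.

```ruby
# Stand-alone sketch of the range conversion in #to_unicode (not HexaPDF's API).
def range_to_unicode_sketch(range, start_values, code)
  # Copy all but the last value, then add the offset to the last one.
  values = start_values[0..-2] << (start_values[-1] + code - range.first)
  # 'n*' packs big-endian 16-bit integers, i.e. raw UTF-16BE code units.
  values.pack('n*').encode(Encoding::UTF_8, Encoding::UTF_16BE)
end

# Codes 0x01..0x1A map to strings starting at "A" (0x0041).
single = range_to_unicode_sketch(0x01..0x1A, [0x0041], 0x03)

# Codes 0x10..0x12 map to strings starting at "fi"; only the last unit changes.
multi = range_to_unicode_sketch(0x10..0x12, [0x0066, 0x0069], 0x11)
```

In the real method the computed string is also stored in @unicode_mapping, so a second lookup of the same code is answered by the hash directly.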

#use_cmap(cmap) ⇒ `Object`

Adds all mappings from the given CMap to this CMap.

# File 'lib/hexapdf/font/cmap.rb', line 125

def use_cmap(cmap)
  @codespace_ranges.concat(cmap.codespace_ranges)
  @cid_mapping.merge!(cmap.cid_mapping)
  @cid_range_mappings.concat(cmap.cid_range_mappings)
  @unicode_mapping.merge!(cmap.unicode_mapping)
  @unicode_range_mappings.concat(cmap.unicode_range_mappings)
end
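
Since the method body relies on Hash#merge! and Array#concat, the order of calls appears to matter. The sketch below uses plain hashes and arrays (not CMap objects) with made-up sample data to show the effect: individual mappings of the used CMap overwrite existing entries for the same code, and its ranges are appended last, so a reverse lookup such as the one in #to_cid finds them first.

```ruby
# Plain-Ruby sketch of the merge semantics; the variables stand in for the
# instance variables of two CMap objects and are purely illustrative.
own_mapping  = {0x41 => 10}
used_mapping = {0x41 => 99, 0x42 => 100}
own_mapping.merge!(used_mapping) # the entry for 0x41 is overwritten

own_ranges  = [[0x20..0x7E, 1]]
used_ranges = [[0x20..0x7E, 500]]
own_ranges.concat(used_ranges)   # the used ranges end up last

# A reverse search, as done in #to_cid, hits the appended range first.
winner = own_ranges.reverse_each.find { |range, _start_cid| range.cover?(0x30) }
```

If that reading of the source is right, calling #use_cmap before adding a CMap's own mappings lets the own mappings take precedence over the imported ones.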
