Class: HexaPDF::Serializer

Inherits:
Object
Defined in:
lib/hexapdf/serializer.rb

Overview

Knows how to serialize Ruby objects for a PDF file.

For normal serialization purposes, the #serialize or #serialize_to_io methods should be used. However, if the type of the object to be serialized is known, a specialized serialization method like #serialize_float can be used.

Additionally, an object for encrypting strings and streams while serializing can be set via the #encrypter= method. The assigned object has to respond to #encrypt_string(str, ind_obj) (where the string is part of the indirect object; returns the encrypted string) and #encrypt_stream(stream) (returns a fiber that represents the encrypted stream).
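A toy object that satisfies this interface might look like the sketch below. `ToyEncrypter` is hypothetical, the single-byte XOR is deliberately not real encryption, and modelling the stream as an array of binary chunks is an assumption made so the example runs without HexaPDF.

```ruby
# ToyEncrypter is hypothetical; single-byte XOR only illustrates the interface
# shape and provides no security at all.
class ToyEncrypter
  def initialize(key_byte)
    @key_byte = key_byte
  end

  # Same shape as #encrypt_string(str, ind_obj). A real security handler may use
  # ind_obj (the owning indirect object) to derive a per-object key; ignored here.
  def encrypt_string(str, _ind_obj)
    xor(str)
  end

  # Same shape as #encrypt_stream(stream), except that the stream is modelled as
  # an array of binary chunks (assumption). Each resume of the Fiber returns one
  # encrypted chunk; the resume after the last chunk returns nil.
  def encrypt_stream(chunks)
    Fiber.new do
      chunks.each { |chunk| Fiber.yield(xor(chunk)) }
      nil
    end
  end

  private

  def xor(str)
    str.b.bytes.map { |byte| byte ^ @key_byte }.pack("C*")
  end
end
```

The returned fiber can be drained with a loop such as `while (chunk = fiber.resume)`, which stops once the fiber returns nil.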

How This Class Works

The main public interface consists of the #serialize and #serialize_to_io methods: #serialize accepts an object and returns its serialized form, while #serialize_to_io writes that form to an IO. While an object is being serialized, the individual serialization methods can access it via the @object instance variable (useful if the object is a composed object).

Internally, the #__serialize method is used for invoking the correct serialization method based on the class of a given object. It is also used for serializing individual parts of a composed object.

Therefore the serializer contains one serialization method for each class it needs to serialize. The naming scheme of these methods is based on the class name: The full class name is converted to lowercase, the namespace separator ‘::’ is replaced with a single underscore and the string “serialize_” is then prepended.

Examples:

NilClass                 => serialize_nilclass
TrueClass                => serialize_trueclass
HexaPDF::Object          => serialize_hexapdf_object

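If no serialization method exists for a specific class, the methods derived from its ancestor classes are tried in order. Since every class has BasicObject as an ancestor, unsupported objects end up in #serialize_basicobject, which raises an error.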
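The lookup described above can be sketched without HexaPDF as follows. `MiniSerializer` and `Demo::Point` are hypothetical names, only a handful of types are handled, and the special cases the real dispatcher has for HexaPDF::Stream and HexaPDF::Object are omitted.

```ruby
# Hypothetical namespaced class used to show the "::" => "_" naming rule.
module Demo
  class Point
    attr_reader :x, :y

    def initialize(x, y)
      @x = x
      @y = y
    end
  end
end

# Minimal sketch of class-name based dispatch (not HexaPDF's implementation).
class MiniSerializer
  def initialize
    # Cache class => method name; on a miss, walk the ancestors and take the
    # first derived name this object responds to (private methods included).
    @dispatcher = Hash.new do |cache, klass|
      cache[klass] = klass.ancestors.map { |mod| method_name_for(mod) }
                          .find { |name| respond_to?(name, true) }
    end
  end

  def serialize(obj)
    send(@dispatcher[obj.class], obj)
  end

  private

  # Demo::Point => "serialize_demo_point", NilClass => "serialize_nilclass"
  def method_name_for(mod)
    "serialize_#{mod.name.to_s.downcase.gsub('::', '_')}"
  end

  def serialize_nilclass(_obj)
    "null"
  end

  # There is no serialize_integer or serialize_float here, so both classes fall
  # back to this method via their common ancestor Numeric.
  def serialize_numeric(obj)
    obj.to_s
  end

  def serialize_demo_point(obj)
    "[#{serialize(obj.x)} #{serialize(obj.y)}]"
  end

  # BasicObject is the last ancestor of every class, so unsupported types end up here.
  def serialize_basicobject(obj)
    raise ArgumentError, "No serialization method for #{obj.inspect}"
  end
end
```

With this sketch, `MiniSerializer.new.serialize(Demo::Point.new(1, 2.5))` resolves `serialize_demo_point` directly and `serialize_numeric` for both coordinates, while a Symbol falls through to `serialize_basicobject` and raises.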

See: PDF2.0 s7.3

Constant Summary

NAME_SUBSTS =

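NAME_REGEXP matches all characters that need to be escaped in a PDF name, and this hash maps each of these characters to its escaped form. The value shown below is the initial empty literal; the mapping entries are added after the constant is defined.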

See PDF2.0 s7.3.5

{}
NAME_REGEXP =

/[^!-~&&[^##{Regexp.escape(Tokenizer::DELIMITER)}#{Regexp.escape(Tokenizer::WHITESPACE)}]]/
NAME_CACHE =

Utils::LRUCache.new(1000)
BYTE_IS_DELIMITER =

{40 => true, 47 => true, 60 => true, 91 => true,
 41 => true, 62 => true, 93 => true}.freeze
STRING_ESCAPE_MAP =

{"(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r"}.freeze


Constructor Details

#initialize ⇒ Serializer

Creates a new Serializer object.



# File 'lib/hexapdf/serializer.rb', line 92

def initialize
  @dispatcher = {
    Hash => 'serialize_hash',
    Array => 'serialize_array',
    Symbol => 'serialize_symbol',
    String => 'serialize_string',
    Integer => 'serialize_integer',
    Float => 'serialize_float',
    Time => 'serialize_time',
    TrueClass => 'serialize_trueclass',
    FalseClass => 'serialize_falseclass',
    NilClass => 'serialize_nilclass',
    HexaPDF::Reference => 'serialize_hexapdf_reference',
    HexaPDF::Object => 'serialize_hexapdf_object',
    HexaPDF::Stream => 'serialize_hexapdf_stream',
    HexaPDF::Dictionary => 'serialize_hexapdf_object',
    HexaPDF::PDFArray => 'serialize_hexapdf_object',
    HexaPDF::Rectangle => 'serialize_hexapdf_object',
  }
  @dispatcher.default_proc = lambda do |h, klass|
    h[klass] = if klass <= HexaPDF::Stream
                 "serialize_hexapdf_stream"
               elsif klass <= HexaPDF::Object
                 "serialize_hexapdf_object"
               else
                 method = nil
                 klass.ancestors.each do |ancestor_klass|
                   name = ancestor_klass.name.to_s.downcase
                   name.gsub!(/::/, '_')
                   method = "serialize_#{name}"
                   break if respond_to?(method, true)
                 end
                 method
               end
  end
  @encrypter = false
  @io = nil
  @object = nil
  @in_object = false
end

Instance Attribute Details

#encrypter ⇒ Object

The encrypter to use for encrypting strings and streams. If it is not set (nil or false), strings and streams are not encrypted.

Default: not set (#initialize assigns false)



# File 'lib/hexapdf/serializer.rb', line 89

def encrypter
  @encrypter
end

Instance Method Details

#serialize(obj) ⇒ Object

Returns the serialized form of the given object.

For developers: While the object is serialized, methods can use the instance variable @object to access it.



# File 'lib/hexapdf/serializer.rb', line 137

def serialize(obj)
  @object = obj
  __serialize(obj)
ensure
  @object = nil
end

#serialize_array(obj) ⇒ Object

Serializes an Array object.

See: PDF2.0 s7.3.6



# File 'lib/hexapdf/serializer.rb', line 244

def serialize_array(obj)
  str = +"["
  index = 0
  while index < obj.size
    tmp = __serialize(obj[index])
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
      BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
    index += 1
  end
  str << "]"
end
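The spacing rule used above (a separating space only when neither side is a delimiter) can be tried in isolation. `join_pdf_array` is a hypothetical helper that takes already-serialized element strings; the delimiter byte values are the ones listed in BYTE_IS_DELIMITER.

```ruby
# Byte values of the PDF delimiters ( ) < > [ ] / as listed in BYTE_IS_DELIMITER.
ARRAY_DELIMITER_BYTES = [40, 41, 47, 60, 62, 91, 93].freeze

# Joins already-serialized tokens, inserting a space only when the next token
# does not start with a delimiter and the output so far does not end with one.
def join_pdf_array(tokens)
  str = +"["
  tokens.each do |tmp|
    str << " " unless ARRAY_DELIMITER_BYTES.include?(tmp.getbyte(0)) ||
                      ARRAY_DELIMITER_BYTES.include?(str.getbyte(-1))
    str << tmp
  end
  str << "]"
end
```

For example, `join_pdf_array(["1", "2", "/Name", "(text)", "[3]"])` yields `"[1 2/Name(text)[3]]"`: only the two numbers need a space between them, which keeps the output compact while remaining unambiguous to a PDF tokenizer.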

#serialize_basicobject(obj) ⇒ Object

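Raises an error with a descriptive failure message. This method is reached as the fallback when no more specific serialization method exists for the object's class.

Raises:
HexaPDF::Error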



# File 'lib/hexapdf/serializer.rb', line 155

def serialize_basicobject(obj)
  object_message = if @object.kind_of?(HexaPDF::Object)
                     "#{obj} (part of #{@object.oid},#{@object.gen})"
                   else
                     obj.inspect
                   end
  raise HexaPDF::Error, "No serialization method for #{object_message}"
end

#serialize_date(obj) ⇒ Object

Serializes a Date object by converting it to a Time and delegating to #serialize_time.

See: #serialize_time



# File 'lib/hexapdf/serializer.rb', line 309

def serialize_date(obj)
  serialize_time(obj.to_time)
end

#serialize_datetime(obj) ⇒ Object

Serializes a DateTime object by converting it to a Time and delegating to #serialize_time.

See: #serialize_time



# File 'lib/hexapdf/serializer.rb', line 314

def serialize_datetime(obj)
  serialize_time(obj.to_time)
end

#serialize_falseclass(_obj) ⇒ Object

Serializes the false value.

See: PDF2.0 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 181

def serialize_falseclass(_obj)
  "false"
end

#serialize_float(obj) ⇒ Object

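Serializes a Float object. Non-zero values strictly between -0.0001 and 0.0001 are formatted with six decimal places so that no exponent notation is produced, other finite values are rounded to six decimal places, and NaN or Infinity raise a HexaPDF::Error.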

See: PDF2.0 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 205

def serialize_float(obj)
  if -0.0001 < obj && obj < 0.0001 && obj != 0
    sprintf("%.6f", obj)
  elsif obj.finite?
    obj.round(6).to_s
  else
    raise HexaPDF::Error, "Can't serialize special floating point number #{obj}"
  end
end
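The effect of the small-value branch is easiest to see next to Ruby's default formatting. `pdf_float` below is a hypothetical standalone copy of the branches above; it raises ArgumentError instead of HexaPDF::Error so that it runs without HexaPDF.

```ruby
# Standalone sketch of the serialize_float branches (hypothetical helper name).
def pdf_float(obj)
  if -0.0001 < obj && obj < 0.0001 && obj != 0
    # Ruby's Float#to_s would give e.g. "1.0e-05"; PDF reals have no exponent form.
    sprintf("%.6f", obj)
  elsif obj.finite?
    obj.round(6).to_s
  else
    # NaN and +/-Infinity have no PDF representation.
    raise ArgumentError, "Can't serialize special floating point number #{obj}"
  end
end
```

Here `0.00001.to_s` returns `"1.0e-05"`, whereas `pdf_float(0.00001)` returns `"0.000010"` and `pdf_float(1.0 / 3)` returns `"0.333333"`.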

#serialize_hash(obj) ⇒ Object

Serializes a Hash object (i.e. a PDF dictionary object). Entries whose value is nil or a null object are omitted.

See: PDF2.0 s7.3.7



# File 'lib/hexapdf/serializer.rb', line 260

def serialize_hash(obj)
  str = +"<<"
  obj.each do |k, v|
    next if v.nil? || (v.respond_to?(:null?) && v.null?)
    str << serialize_symbol(k)
    tmp = __serialize(v)
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
      BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
  end
  str << ">>"
end
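A reduced version of the dictionary logic is sketched below. `pdf_dict` is hypothetical and assumes that keys are plain ASCII symbols needing no name escaping and that values are already-serialized PDF tokens or nil.

```ruby
# Byte values of the PDF delimiters ( ) < > [ ] / as listed in BYTE_IS_DELIMITER.
DICT_DELIMITER_BYTES = [40, 41, 47, 60, 62, 91, 93].freeze

# Hypothetical helper: keys are plain ASCII symbols, values are serialized tokens.
def pdf_dict(hash)
  str = +"<<"
  hash.each do |key, value|
    next if value.nil? # nil entries are dropped, as in serialize_hash
    str << "/#{key}"
    str << " " unless DICT_DELIMITER_BYTES.include?(value.getbyte(0)) ||
                      DICT_DELIMITER_BYTES.include?(str.getbyte(-1))
    str << value
  end
  str << ">>"
end
```

For instance, `pdf_dict({Type: "/Page", Count: "3", Parent: nil})` produces `"<</Type/Page/Count 3>>"`. Dropping nil entries is safe because the PDF specification treats a dictionary entry whose value is null the same as an absent entry.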

#serialize_integer(obj) ⇒ Object

Serializes an Integer object.

See: PDF2.0 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 198

def serialize_integer(obj)
  obj.to_s
end

#serialize_nilclass(_obj) ⇒ Object

Serializes the nil value.

See: PDF2.0 s7.3.9



# File 'lib/hexapdf/serializer.rb', line 167

def serialize_nilclass(_obj)
  "null"
end

#serialize_numeric(obj) ⇒ Object

Serializes a Numeric object (either Integer or Float).

This method should be used for cases where it is known that the object is either an Integer or a Float.

See: PDF2.0 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 191

def serialize_numeric(obj)
  obj.kind_of?(Integer) ? obj.to_s : serialize_float(obj)
end

#serialize_string(obj) ⇒ Object

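Serializes a String object as a PDF literal string. If an encrypter is set and the string is part of an indirect object, the string is encrypted first. Otherwise, non-binary strings containing characters other than printable ASCII, tab, CR and LF are converted to UTF-16BE with a byte order mark. In all cases the characters (, ), \ and CR are escaped.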

See: PDF2.0 s7.3.4



# File 'lib/hexapdf/serializer.rb', line 278

def serialize_string(obj)
  obj = if @encrypter && @object.kind_of?(HexaPDF::Object) && @object.indirect?
          encrypter.encrypt_string(obj, @object)
        elsif obj.encoding != Encoding::BINARY
          if obj.match?(/[^ -~\t\r\n]/)
            "\xFE\xFF".b << obj.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
          else
            obj.b
          end
        else
          obj.dup
        end
  obj.gsub!(/[()\\\r]/n, STRING_ESCAPE_MAP)
  "(#{obj})"
end
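The unencrypted path of this method can be exercised on its own with the sketch below. `pdf_literal_string` is a hypothetical name and the encryption branch is left out; the escape table is the STRING_ESCAPE_MAP constant shown above.

```ruby
# Same mapping as STRING_ESCAPE_MAP.
STRING_ESCAPES = {"(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r"}.freeze

# Sketch of serialize_string without the encryption branch (hypothetical name).
def pdf_literal_string(obj)
  obj = if obj.encoding != Encoding::BINARY
          if obj.match?(/[^ -~\t\r\n]/)
            # Non-ASCII text: byte order mark followed by UTF-16BE bytes.
            "\xFE\xFF".b << obj.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
          else
            obj.b
          end
        else
          obj.dup
        end
  # Escape the characters that would otherwise end or alter a literal string.
  obj.gsub!(/[()\\\r]/n, STRING_ESCAPES)
  "(#{obj})"
end
```

So `pdf_literal_string("a(b)")` gives `(a\(b\))`, and a string containing "ä" becomes the bytes `(`, FE, FF, 00, E4, `)`. Escaping also applies to the UTF-16BE bytes, which matters when a code unit happens to contain 0x28, 0x29, 0x5C or 0x0D.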

#serialize_symbol(obj) ⇒ Object

Serializes a Symbol object (i.e. a PDF name object).

See: PDF2.0 s7.3.5



# File 'lib/hexapdf/serializer.rb', line 229

def serialize_symbol(obj)
  NAME_CACHE[obj] ||=
    begin
      str = obj.to_s.dup.force_encoding(Encoding::BINARY)
      str.gsub!(NAME_REGEXP, NAME_SUBSTS)
      str.empty? ? "/ " : "/#{str}"
    end
end
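The escaping rule from PDF2.0 s7.3.5 can be illustrated with the following sketch. `pdf_name` is hypothetical, the delimiter set is taken from the PDF specification rather than from HexaPDF's Tokenizer constants, the hex digits are written in uppercase (HexaPDF's NAME_SUBSTS may use a different case, which PDF readers accept either way), and the LRU caching is omitted.

```ruby
# PDF delimiter characters plus "#", which must always be escaped in names.
NAME_SPECIAL_BYTES = "()<>[]{}/%#".bytes.freeze

# Hypothetical sketch of PDF name serialization with #xx escapes.
def pdf_name(sym)
  escaped = +""
  sym.to_s.each_byte do |byte|
    # Bytes below "!" (this covers all PDF white space) and above "~" are escaped too.
    needs_escape = byte < 33 || byte > 126 || NAME_SPECIAL_BYTES.include?(byte)
    escaped << (needs_escape ? format("#%02X", byte) : byte.chr)
  end
  # An empty name is followed by a space so it is not merged with the next token.
  escaped.empty? ? "/ " : "/#{escaped}"
end
```

With this sketch, `pdf_name(:"A B")` returns `"/A#20B"` and a UTF-8 name such as `:"ä"` is escaped byte by byte to `"/#C3#A4"`.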

#serialize_time(obj) ⇒ Object

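Serializes a Time object as a PDF date string.

The ISO PDF specification and the Adobe PDF reference differ with respect to the supported date format. When converting to a date string, a format suitable for both is output: the UTC offset is written as +HH'mm' (or -HH'mm') and omitted entirely for UTC.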

See: PDF2.0 s7.9.4, ADB1.7 3.8.3



# File 'lib/hexapdf/serializer.rb', line 298

def serialize_time(obj)
  zone = obj.strftime("%z'")
  if zone == "+0000'"
    zone = ''
  else
    zone[3, 0] = "'"
  end
  serialize_string(obj.strftime("D:%Y%m%d%H%M%S#{zone}"))
end
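The offset rewriting performed above can be checked with this standalone sketch. `pdf_date` is a hypothetical name; it returns the bare date string, whereas the method above additionally wraps it via #serialize_string, yielding e.g. `(D:20240102030405+02'00')`.

```ruby
# Sketch of the date formatting in serialize_time (hypothetical helper name).
def pdf_date(time)
  zone = time.strftime("%z'") # e.g. "+0200'"
  if zone == "+0000'"
    zone = ""                 # UTC: the offset is omitted
  else
    zone[3, 0] = "'"          # "+0200'" becomes "+02'00'"
  end
  time.strftime("D:%Y%m%d%H%M%S#{zone}")
end
```

For example, `pdf_date(Time.new(2024, 1, 2, 3, 4, 5, "+02:00"))` returns `"D:20240102030405+02'00'"`, while the same wall-clock time created with `Time.utc` returns `"D:20240102030405"`.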

#serialize_to_io(obj, io) ⇒ Object

Serializes the given object and writes it to the IO.

Also see: #serialize



# File 'lib/hexapdf/serializer.rb', line 147

def serialize_to_io(obj, io)
  @io = io
  @io << serialize(obj).freeze
ensure
  @io = nil
end

#serialize_trueclass(_obj) ⇒ Object

Serializes the true value.

See: PDF2.0 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 174

def serialize_trueclass(_obj)
  "true"
end