Class: HexaPDF::Type::FileSpecification

Inherits:
Dictionary
Defined in:
lib/hexapdf/type/file_specification.rb

Overview

Represents a file specification dictionary.

File specifications are used to refer to other files or URLs from within a PDF file. Simple file specifications are just strings. However, they are automatically converted on access to a full file specification to provide a unified interface.

Working with File Specifications

A file specification may refer to a file or a URL. This can easily be checked with #url?. Regardless of whether the file specification refers to a URL or a file, the #path method returns the “best” usable path for it.

Modifying a file specification should be done via the #path= and #url= methods as they ensure that no obsolescent entries are used and the file specification is consistent.

Finally, since embedded files in a PDF document are always linked to a file specification, this class also provides the embedding and unembedding operations, see #embed and #unembed.

See: PDF2.0 s7.11

Defined Under Namespace

Classes: EFDictionary

Constant Summary

Constants included from DictionaryFields

DictionaryFields::Boolean, DictionaryFields::PDFByteString, DictionaryFields::PDFDate

Instance Attribute Summary

Attributes inherited from Object

#data, #document, #must_be_indirect

Instance Method Summary

Methods inherited from Dictionary

#[], #[]=, define_field, define_type, #delete, #each, each_field, #empty?, field, #key?, #to_hash, type, #type

Methods inherited from Object

#<=>, #==, #cache, #cached?, #clear_cache, deep_copy, #deep_copy, #document?, #eql?, field, #gen, #gen=, #hash, #indirect?, #initialize, #inspect, make_direct, #must_be_indirect?, #null?, #oid, #oid=, #type, #validate, #value, #value=

Constructor Details

This class inherits a constructor from HexaPDF::Object

Instance Method Details

#embed(file_or_io, name: nil, mime_type: nil, register: true) ⇒ Object

:call-seq:

file_spec.embed(filename, name: File.basename(filename), mime_type: nil, register: true)   -> ef_stream
file_spec.embed(io, name:, mime_type: nil, register: true)                                 -> ef_stream

Embeds the given file or IO stream into the PDF file, sets the path and MIME type accordingly and returns the created stream object.

If a file is given, the name option defaults to the basename of the file. However, if an IO object is given, the name argument is mandatory.
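The naming rule can be tried in isolation. The helper below, `resolve_embed_name`, is a hypothetical name introduced only for illustration and is not part of HexaPDF; it is a minimal sketch that mirrors the first lines of the #embed source listing and needs only the Ruby standard library.

```ruby
require 'stringio'

# Hypothetical helper (not part of HexaPDF): reproduces only the name handling
# of #embed so that the rule can be tried without a PDF document.
def resolve_embed_name(file_or_io, name = nil)
  # A String is treated as a filename, so its basename is a usable default.
  name ||= File.basename(file_or_io) if file_or_io.is_a?(String)
  # An IO object carries no reliable filename, so the caller must supply one.
  raise ArgumentError, 'The name argument is mandatory when given an IO object' if name.nil?
  name
end

puts resolve_embed_name('/tmp/data/report.pdf')
puts resolve_embed_name(StringIO.new('hello'), 'notes.txt')

begin
  resolve_embed_name(StringIO.new('hello'))
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

An explicitly passed name always wins, so a file on disk can be embedded under a different name than its basename.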

If there already was a file embedded for this file specification, it is unembedded first.

The embedded file stream automatically uses the FlateDecode filter so that the embedded file is stored compressed.

Options:

name

The name that should be used as path value and when registering.

mime_type

Optionally specifies the MIME type of the file.

register

Specifies whether the embedded file will be added to the EmbeddedFiles name tree under this name. If the name is already taken, its value is overwritten.

The file has to be available until the PDF document gets written because reading and writing are done lazily.



# File 'lib/hexapdf/type/file_specification.rb', line 192

def embed(file_or_io, name: nil, mime_type: nil, register: true)
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  unembed
  self.path = name

  self[:EF] ||= {}
  ef_stream = self[:EF][:UF] = self[:EF][:F] = document.add({Type: :EmbeddedFile})
  ef_stream[:Subtype] = mime_type.to_sym if mime_type
  stat = if file_or_io.kind_of?(String)
           File.stat(file_or_io)
         elsif file_or_io.respond_to?(:stat)
           file_or_io.stat
         end
  if stat
    ef_stream[:Params] = {Size: stat.size, CreationDate: stat.ctime, ModDate: stat.mtime}
  end
  ef_stream.set_filter(:FlateDecode)
  ef_stream.stream = HexaPDF::StreamData.new(file_or_io)

  if register
    (document.catalog[:Names] ||= {})[:EmbeddedFiles] ||= {}
    document.catalog[:Names][:EmbeddedFiles].add_entry(name, self)
  end

  ef_stream
end

#embedded_file? ⇒ Boolean

Returns true if this file specification contains an embedded file.

See: #embedded_file_stream

Returns: (Boolean)



# File 'lib/hexapdf/type/file_specification.rb', line 148

def embedded_file?
  key?(:EF) && !self[:EF].empty?
end

#embedded_file_stream ⇒ Object

Returns the embedded file associated with this file specification, or nil if this file specification references no embedded file.

If there are multiple possible embedded files, the /EF fields are searched in the following order and the first one with a value is used: /UF, /F, /Unix, /Mac, /DOS.



# File 'lib/hexapdf/type/file_specification.rb', line 157

def embedded_file_stream
  return unless key?(:EF)
  ef = self[:EF]
  ef[:UF] || ef[:F] || ef[:Unix] || ef[:Mac] || ef[:DOS]
end

#path ⇒ Object

Returns the path for the referenced file or URL. An empty string is returned if no file specification string is set.

If multiple file specification strings are available, the fields are searched in the following order and the first one with a value is used: /UF, /F, /Unix, /Mac, /DOS.

The encoding of the returned path string is either UTF-8 (for /UF) or BINARY (for /F, /Unix, /Mac and /DOS).
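The lookup order and the separator normalization can be sketched without HexaPDF. In the example below a plain Hash stands in for the file specification dictionary; `LOOKUP_ORDER` and `best_path` are hypothetical names used only for illustration, and the sketch assumes the same two substitutions that appear in the #path source listing.

```ruby
# Hypothetical stand-alone sketch (not part of HexaPDF): a plain Hash stands in
# for the file specification dictionary.
LOOKUP_ORDER = %i[UF F Unix Mac DOS].freeze

def best_path(fields)
  # Pick the first field in lookup order that holds a value.
  key = LOOKUP_ORDER.find { |k| fields[k] }
  raw = key ? fields[key] : ''
  # First unescape "\/" to "/", then turn any remaining backslash into a slash.
  raw.gsub(/\\\//, '/').tr('\\', '/')
end

p best_path({F: 'old.pdf', UF: 'new.pdf'})
p best_path({DOS: 'C:\\files\\a.txt'})
p best_path({})
```

Because every backslash ends up as a slash, an escaped slash inside a file name cannot be told apart from a path separator in the returned string.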



# File 'lib/hexapdf/type/file_specification.rb', line 111

def path
  tmp = (self[:UF] || self[:F] || self[:Unix] || self[:Mac] || self[:DOS] || '').dup
  tmp.gsub!(/\\\//, "/") # PDF2.0 s7.11.2.1 but / in filename is interpreted as separator!
  tmp.tr!("\\", "/") # always use slashes instead of back-slashes!
  tmp
end

#path=(filename) ⇒ Object

Sets the file specification string to the given filename.

Since the /Unix, /Mac and /DOS fields are deprecated, only the /F and /UF fields are set.



# File 'lib/hexapdf/type/file_specification.rb', line 121

def path=(filename)
  self[:UF] = filename
  self[:F] = filename.b
  delete(:FS)
  delete(:Unix)
  delete(:Mac)
  delete(:DOS)
end

#unembed ⇒ Object

Deletes any embedded file streams associated with this file specification. A possible entry in the EmbeddedFiles name tree is also deleted.



# File 'lib/hexapdf/type/file_specification.rb', line 225

def unembed
  return unless key?(:EF)
  self[:EF].each {|_, ef_stream| document.delete(ef_stream) }

  if document.catalog.key?(:Names) && document.catalog[:Names].key?(:EmbeddedFiles)
    tree = document.catalog[:Names][:EmbeddedFiles]
    tree.each_entry.find_all {|_, spec| spec == self }.each do |(name, _)|
      tree.delete_entry(name)
    end
  end
end

#url=(url) ⇒ Object

Sets the file specification string to the given URL and updates the file system entry appropriately.

The provided URL needs to be in an RFC1738 compliant string representation. If it is not, a HexaPDF::Error is raised.
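Whether a string will pass this check can be tested up front with Ruby's own URI parser, which is what the #url= source listing relies on. The helper name `parseable_url?` is hypothetical and not part of HexaPDF; this is a minimal sketch using only the standard library.

```ruby
require 'uri'

# Hypothetical helper (not part of HexaPDF): reports whether Ruby's URI parser
# accepts the string, which is the check #url= performs before storing it.
def parseable_url?(url)
  URI(url)
  true
rescue URI::InvalidURIError
  false
end

puts parseable_url?('https://example.com/manual.pdf')
puts parseable_url?('http://example.com/a b.pdf')
```

A string containing an unescaped space, as in the second call, is rejected by the parser; in #url= the resulting URI::InvalidURIError is re-raised as a HexaPDF::Error.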



# File 'lib/hexapdf/type/file_specification.rb', line 135

def url=(url)
  begin
    URI(url)
  rescue URI::InvalidURIError => e
    raise HexaPDF::Error, e
  end
  self.path = url
  self[:FS] = :URL
end

#url? ⇒ Boolean

Returns true if this file specification references a URL and not a file.

Returns: (Boolean)



# File 'lib/hexapdf/type/file_specification.rb', line 99

def url?
  self[:FS] == :URL
end