Class: Mapi::Pst

Inherits: Object

Includes: Enumerable

Defined in: lib/mapi/pst.rb

Defined Under Namespace

Modules: Desc2, Index2

Classes: Attachment, AttachmentTable, BlockParser, CompressibleEncryption, Desc, Desc64, FormatError, Header, ID2Assoc, ID2Assoc64, ID2Mapping, Index, Index64, Item, RangesIOEncryptable, RangesIOID2, RangesIOIdxChain, RawPropertyStore, RawPropertyStoreTable, Recipient, RecipientTable, TablePtr

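Constant Summary

ToTree = Module.new

this is the index and desc record loading code.
ITEM_COUNT_OFFSET = 0x1f0

more constants from libpst.c; these relate to the index block. offset of the count byte.
LEVEL_INDICATOR_OFFSET = 0x1f3

offset of the byte that indicates node or leaf.
BACKLINK_OFFSET = 0x1f8

offset of the backlink record, which is checked against the expected linku1 value when a block is loaded.
ITEM_COUNT_OFFSET_64 = 0x1e8

counterpart of ITEM_COUNT_OFFSET for the 64 variant; mostly guesses.
LEVEL_INDICATOR_OFFSET_64 = 0x1eb

counterpart of LEVEL_INDICATOR_OFFSET for the 64 variant; a diff of 3 between these 2, as above.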

Instance Attribute Summary

#desc ⇒ Object readonly

Returns the value of attribute desc.
#header ⇒ Object readonly

Returns the value of attribute header.
#idx ⇒ Object readonly

Returns the value of attribute idx.
#io ⇒ Object readonly

Returns the value of attribute io.
#special_folder_ids ⇒ Object readonly

Returns the value of attribute special_folder_ids.

Class Method Summary

.make_property_set(property_list) ⇒ Object

higher level item code.
.unpack(str, unpack_spec) ⇒ Object

unfortunately there is no Q analogue which is little endian only.

Instance Method Summary

#desc_from_id(id) ⇒ Object

looks up a desc record by its id, as idx_from_id does for idx records.
#dump_debug_info ⇒ Object

prints debugging information: the header, file range usage, idx entries, the desc tree and the item tree.
#each(&block) ⇒ Object
#encrypted? ⇒ Boolean
#id2_block_idx_chain(idx) ⇒ Object

corresponds to: _pst_ff_getID2block, _pst_ff_getID2data, _pst_ff_compile_ID.
#idx_from_id(id) ⇒ Object

most access to idx objects will use this function.
#initialize(io) ⇒ Pst constructor

corresponds to: pst_open, pst_load_index.
#inspect ⇒ Object
#load_desc ⇒ Object

corresponds to: _pst_build_desc_ptr, record_descriptor.
#load_desc_rec(offset, linku1, start_val) ⇒ Object

load the flat list of desc records recursively.
#load_idx ⇒ Object

corresponds to: _pst_build_id_ptr.
#load_idx2(idx) ⇒ Object
#load_idx2_rec(idx) ⇒ Object

corresponds to: _pst_build_id2.
#load_idx_rec(offset, linku1, start_val) ⇒ Object

load the flat idx table, which maps ids to file ranges.
#load_xattrib ⇒ Object

corresponds to: pst_load_extended_attributes.
#name ⇒ Object
#pst_parse_item(desc) ⇒ Object

corresponds to: _pst_parse_item.
#pst_read_block_size(offset, size, decrypt = true) ⇒ Object

corresponds to: _pst_read_block_size, _pst_read_block ??, _pst_ff_getIDblock_dec ??, _pst_ff_getIDblock ??.
#root ⇒ Object
#root_desc ⇒ Object
#root_item ⇒ Object
#warn(s) ⇒ Object

forwards to Mapi::Log.warn until logging is properly fixed.

Constructor Details

#initialize(io) ⇒ `Pst`

corresponds to

pst_open
pst_load_index

Raises:

(FormatError) if the header's size field does not equal io.size

# File 'lib/mapi/pst.rb', line 265

def initialize io
	@io = io
	io.pos = 0
	@header = Header.new io.read(Header::SIZE)

	# would prefer this to be in Header#validate, but it doesn't have the io size.
	# should perhaps downgrade this to just be a warning...
	raise FormatError, "header size field invalid (#{header.size} != #{io.size})" unless header.size == io.size

	load_idx
	load_desc
	load_xattrib

	@special_folder_ids = {}
end

Instance Attribute Details

#desc ⇒ `Object` (readonly)

Returns the value of attribute desc.



# File 'lib/mapi/pst.rb', line 260

def desc
  @desc
end

#header ⇒ `Object` (readonly)

Returns the value of attribute header.



# File 'lib/mapi/pst.rb', line 260

def header
  @header
end

#idx ⇒ `Object` (readonly)

Returns the value of attribute idx.



# File 'lib/mapi/pst.rb', line 260

def idx
  @idx
end

#io ⇒ `Object` (readonly)

Returns the value of attribute io.



# File 'lib/mapi/pst.rb', line 260

def io
  @io
end

#special_folder_ids ⇒ `Object` (readonly)

Returns the value of attribute special_folder_ids.



# File 'lib/mapi/pst.rb', line 260

def special_folder_ids
  @special_folder_ids
end

Class Method Details

.make_property_set(property_list) ⇒ `Object`

higher level item code. wraps up the raw properties above, and gives nice objects to work with. handles item relationships too.

# File 'lib/mapi/pst.rb', line 1502

def self.make_property_set property_list
	hash = property_list.inject({}) do |hash, (key, type, value)|
		hash.update PropertySet::Key.new(key) => value
	end
	PropertySet.new hash
end

.unpack(str, unpack_spec) ⇒ `Object`

unfortunately String#unpack has no analogue of Q that is little endian only. this method translates T as an unsigned quad word in little endian byte order, so that the workaround does not pollute the rest of the code.

String#unpack itself is deliberately not overridden, because that would be too hacky and incomplete.

# File 'lib/mapi/pst.rb', line 74

def self.unpack str, unpack_spec
	return str.unpack(unpack_spec) unless unpack_spec['T']
	@unpack_cache ||= {}
	t_offsets, new_spec = @unpack_cache[unpack_spec]
	unless t_offsets
		t_offsets = []
		offset = 0
		new_spec = ''
		unpack_spec.scan(/([^\d])_?(\*|\d+)?/o) do
			num_elems = $1.downcase == 'a' ? 1 : ($2 || 1).to_i
			if $1 == 'T'
				num_elems.times { |i| t_offsets << offset + i }
				new_spec << "V#{num_elems * 2}"
			else
				new_spec << $~[0]
			end
			offset += num_elems
		end
		@unpack_cache[unpack_spec] = [t_offsets, new_spec]
	end
	a = str.unpack(new_spec)
	t_offsets.each do |offset|
		low, high = a[offset, 2]
		a[offset, 2] = low && high ? low + (high << 32) : nil
	end
	a
end
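
The quad word handling can be looked at in isolation. The sketch below is not part of Mapi::Pst: `decode_quad` is a hypothetical helper that rebuilds an unsigned 64-bit little endian value from two 32-bit `V` words in the same way as the listing above. It then compares the result with the `Q<` directive, which Ruby 1.9.3 and later provide.

```ruby
# Hypothetical helper, for illustration only: decode one unsigned 64-bit
# little endian integer from the first 8 bytes of str.
def decode_quad(str)
  low, high = str.unpack('V2')   # two unsigned 32-bit little endian words
  return nil unless low && high  # short input, mirrors the nil case above
  low + (high << 32)
end

bytes = [0x1122334455667788].pack('Q<')
quad = decode_quad(bytes)
puts format('0x%016x', quad)
puts quad == bytes.unpack1('Q<')
```

On a Ruby that supports `Q<`, the two-word reconstruction is only needed when the spec string has to stay compatible with the custom `T` directive.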

Instance Method Details

#desc_from_id(id) ⇒ `Object`

looks up a desc record by its id, as idx_from_id does for idx records

corresponds to:

_pst_getDptr



# File 'lib/mapi/pst.rb', line 748

def desc_from_id id
	@desc_from_id[id]
end

#dump_debug_info ⇒ `Object`

prints debugging information: the header, file range usage, idx entries, the desc tree and the item tree

# File 'lib/mapi/pst.rb', line 1689

def dump_debug_info
	puts "* pst header"
	p header

=begin
Looking at the output of this, for blank-o1997.pst, i see this part:
...
- (26624,516) desc block data (overlap of 4 bytes)
- (27136,516) desc block data (gap of 508 bytes)
- (28160,516) desc block data (gap of 2620 bytes)
...

which confirms my belief that the block size for idx and desc is more likely 512
=end
	if 0 + 0 == 0
		puts '* file range usage'
		file_ranges =
			# these 3 things, should account for most of the data in the file.
			[[0, Header::SIZE, 'pst file header']] +
			@idx_offsets.map { |offset| [offset, Index::BLOCK_SIZE, 'idx block data'] } +
			@desc_offsets.map { |offset| [offset, Desc::BLOCK_SIZE, 'desc block data'] } +
			@idx.map { |idx| [idx.offset, idx.size, 'idx id=0x%x (%s)' % [idx.id, idx.type]] }
		(file_ranges.sort_by { |idx| idx.first } + [nil]).to_enum(:each_cons, 2).each do |(offset, size, name), next_record|
			# i think there is a padding of the size out to 64 bytes
			# which is equivalent to padding out the final offset, because i think the offset is 
			# similarly oriented
			pad_amount = 64
			warn 'i am wrong about the offset padding' if offset % pad_amount != 0
			# so, assuming i'm not wrong about that, then we can calculate how much padding is needed.
			pad = pad_amount - (size % pad_amount)
			pad = 0 if pad == pad_amount
			gap = next_record ? next_record.first - (offset + size + pad) : 0
			extra = case gap <=> 0
				when -1; ["overlap of #{gap.abs} bytes"]
				when  0; []
				when +1; ["gap of #{gap} bytes"]
			end
			# how about we check that padding
			@io.pos = offset + size
			pad_bytes = @io.read(pad)
			extra += ["padding not all zero"] unless pad_bytes == 0.chr * pad
			puts "- #{offset}:#{size}+#{pad} #{name.inspect}" + (extra.empty? ? '' : ' [' + extra * ', ' + ']')
		end
	end

	# i think the idea of the idx, and indeed the idx2, is just to be able to
	# refer to data indirectly, which means it can get moved around, and you just update
	# the idx table. it is simply a list of file offsets and sizes.
	# not sure i get how id2 plays into it though....
	# the sizes seem to be all even. is that a co-incidence? and the ids are all even. that
	# seems to be related to something else (see the (id & 2) == 1 stuff)
	puts '* idx entries'
	@idx.each { |idx| puts "- #{idx.inspect}" }

	# if you look at the desc tree, you notice a few things:
	# 1. there is a desc that seems to be the parent of all the folders, messages etc.
	#    it is the one whose parent is itself.
	#    one of its children is referenced as the subtree_entryid of the first desc item,
	#    the root.
	# 2. typically only 2 types of desc records have idx2_id != 0. messages themselves,
	#    and the desc with id = 0x61 - the xattrib container. everything else uses the
	#    regular ids to find its data. i think it should be reframed as small blocks and
	#    big blocks, but i'll look into it more.
	#
	# idx_id and idx2_id are for getting to the data. desc_id and parent_desc_id just define
	# the parent <-> child relationship, and the desc_ids are how the items are referred to in
	# entryids.
	# note that these aren't unique! eg for 0, 4 etc. i expect these'd never change, as the ids
	# are stored in entryids. whereas the idx and idx2 could be a bit more volatile.
	puts '* desc tree'
	# make a dummy root hold everything just for convenience
	root = Desc.new ''
	def root.inspect; "#<Pst::Root>"; end
	root.children.replace @orphans
	# this still loads the whole thing as a string for gsub. should use the direct output io
	# version.
	puts root.to_tree.gsub(/, (parent_desc_id|idx2_id)=0x0(?!\d)/, '')

	# this is fairly easy to understand, its just an attempt to display the pst items in a tree form
	# which resembles what you'd see in outlook.
	puts '* item tree'
	# now streams directly
	root_item.to_tree STDOUT
end
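
The padding and gap arithmetic in the file range check can be tried on its own. This is a minimal sketch with hypothetical helper names (`padding_for`, `gap_after`) and made-up offsets. It assumes, as the comments in the listing do, that each record is padded out to a multiple of 64 bytes.

```ruby
# Hypothetical helper: bytes of padding needed to round size up to a
# multiple of pad_amount (64 in the listing above).
def padding_for(size, pad_amount = 64)
  (pad_amount - size % pad_amount) % pad_amount
end

# Positive result: unused gap before the next record.
# Negative result: the padded record overlaps the next one.
def gap_after(offset, size, next_offset, pad_amount = 64)
  next_offset - (offset + size + padding_for(size, pad_amount))
end

puts padding_for(516)         # 516 % 64 == 4, so 60 bytes of padding
puts padding_for(512)         # already aligned, no padding
puts gap_after(0, 516, 1024)  # made-up offsets, for illustration
```

The trailing `% pad_amount` folds the "already aligned" case into the same expression, which the listing handles with a separate `pad = 0 if pad == pad_amount` step.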

#each(&block) ⇒ `Object`

# File 'lib/mapi/pst.rb', line 1791

def each(&block)
	root = self.root
	block[root]
	root.each_recursive(&block)
end
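
Because the class includes Enumerable, defining `each` is enough to get `map`, `select`, `count` and the rest. The sketch below shows the same root-first pattern with stand-in classes; `Node` and `Tree` are hypothetical, and the depth-first order of `each_recursive` is an assumption about the real item class, not something this page confirms.

```ruby
# Hypothetical stand-in for an item with children; not the real Mapi::Pst::Item.
Node = Struct.new(:name, :children) do
  # Assumed behaviour: visit every descendant depth-first.
  def each_recursive(&block)
    children.each do |child|
      block[child]
      child.each_recursive(&block)
    end
  end
end

class Tree
  include Enumerable

  def initialize(root)
    @root = root
  end

  # Yield the root first, then every descendant.
  def each(&block)
    block[@root]
    @root.each_recursive(&block)
  end
end

inbox = Node.new('Inbox', [Node.new('message 1', []), Node.new('message 2', [])])
tree = Tree.new(Node.new('root', [inbox, Node.new('Sent', [])]))
puts tree.map(&:name).inspect
puts tree.count
```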

#encrypted? ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/mapi/pst.rb', line 281

def encrypted?
	@header.encrypted?
end

#id2_block_idx_chain(idx) ⇒ `Object`

corresponds to:

_pst_ff_getID2block
_pst_ff_getID2data
_pst_ff_compile_ID

# File 'lib/mapi/pst.rb', line 911

def id2_block_idx_chain idx
	if (idx.id & 0x2) == 0
		[idx]
	else
		buf = idx.read
		type, fdepth, count = buf[0, 4].unpack 'CCv'
		unless type == 1 # libpst.c:3958
			warn 'Error in idx_chain - %p, %p, %p - attempting to ignore' % [type, fdepth, count]
			return [idx]
		end
		# there are 4 unaccounted for bytes here, 4...8
		if header.version_2003?
			ids = Pst.unpack(buf[8, count * 8], "T#{count}")
		else
			ids = buf[8, count * 4].unpack('V*')
		end
		if fdepth == 1
			ids.map { |id| idx_from_id id }
		else
			ids.map { |id| id2_block_idx_chain idx_from_id(id) }.flatten
		end
	end
end
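
The block layout this method reads can be reproduced with `pack` and `unpack`. The sketch below covers only the non-2003 branch (32-bit ids) and builds its own made-up block; `parse_chain_block` is a hypothetical helper, and the ids are arbitrary values chosen for illustration.

```ruby
# Hypothetical helper: split a chain block into its type, depth and id list.
# Layout as read by the listing above: type (1 byte), depth (1 byte),
# count (2 bytes, little endian), 4 unaccounted-for bytes, then the ids.
def parse_chain_block(buf)
  type, fdepth, count = buf[0, 4].unpack('CCv')
  ids = buf[8, count * 4].unpack('V*')
  [type, fdepth, ids]
end

ids_in = [0x24, 0x28, 0x2c]  # made-up ids
block = [1, 1, ids_in.size].pack('CCv') + "\0" * 4 + ids_in.pack('V*')

type, fdepth, ids = parse_chain_block(block)
puts type
puts fdepth
puts ids.inspect
```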

#idx_from_id(id) ⇒ `Object`

most access to idx objects will use this function

corresponds to

_pst_getID



# File 'lib/mapi/pst.rb', line 652

def idx_from_id id
	@idx_from_id[id]
end

#inspect ⇒ `Object`



# File 'lib/mapi/pst.rb', line 1801

def inspect
	"#<Pst name=#{name.inspect} io=#{io.inspect}>"
end

#load_desc ⇒ `Object`

corresponds to

_pst_build_desc_ptr
record_descriptor

# File 'lib/mapi/pst.rb', line 659

def load_desc
	@desc = []
	@desc_offsets = []
	if header.version_2003?
		@desc = Desc64.load_chain io, header
		@desc.each { |desc| desc.pst = self }
	else
		load_desc_rec header.index2, header.index2_count, 0x21
	end

	# first create a lookup cache
	@desc_from_id = {}
	@desc.each do |desc|
		desc.pst = self
		warn "there are duplicate desc records with id #{desc.desc_id}" if @desc_from_id[desc.desc_id]
		@desc_from_id[desc.desc_id] = desc
	end

	# now turn the flat list of loaded desc records into a tree

	# well, they have no parent, so they're more like, the toplevel descs.
	@orphans = []
	# now assign each node to the parents child array, putting the orphans in the above
	@desc.each do |desc|
		parent = @desc_from_id[desc.parent_desc_id]
		# note, besides this, its possible to create other circular structures.
		if parent == desc
			# this actually happens usually, for the root_item it appears.
			#warn "desc record's parent is itself (#{desc.inspect})"
		# maybe add some more checks in here for circular structures
		elsif parent
			parent.children << desc
			next
		end
		@orphans << desc
	end

	# maybe change this to some sort of sane-ness check. orphans are expected
#		warn "have #{@orphans.length} orphan desc record(s)." unless @orphans.empty?
end
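
The step that turns the flat desc list into a tree can be sketched without any PST file. `Rec` and `build_tree` below are hypothetical stand-ins for the desc records and for the second half of the method; the ids other than 0x21 and 0x61 are made up.

```ruby
# Hypothetical stand-in for a desc record.
Rec = Struct.new(:desc_id, :parent_desc_id, :children)

# Attach each record to its parent's children and return the records that
# have no usable parent (missing, or pointing at themselves).
def build_tree(records)
  by_id = {}
  records.each { |rec| by_id[rec.desc_id] = rec }

  orphans = []
  records.each do |rec|
    parent = by_id[rec.parent_desc_id]
    if parent && !parent.equal?(rec)
      parent.children << rec
    else
      orphans << rec
    end
  end
  orphans
end

records = [
  Rec.new(0x21, 0x21, []),     # parent is itself
  Rec.new(0x122, 0x21, []),
  Rec.new(0x8022, 0x122, []),
  Rec.new(0x61, 0, [])         # parent id not present in the list
]
orphans = build_tree(records)
puts orphans.map { |rec| format('0x%x', rec.desc_id) }.inspect
puts records[0].children.map { |rec| format('0x%x', rec.desc_id) }.inspect
```

As in the listing, a record whose parent is itself ends up among the top-level records instead of being attached to its own children list.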

#load_desc_rec(offset, linku1, start_val) ⇒ `Object`

load the flat list of desc records recursively

corresponds to

_pst_build_desc_ptr
record_descriptor

# File 'lib/mapi/pst.rb', line 705

def load_desc_rec offset, linku1, start_val
	@desc_offsets << offset
	
	buf = pst_read_block_size offset, Desc::BLOCK_SIZE, false
	item_count = buf[ITEM_COUNT_OFFSET]

	# not real desc
	desc = Desc.new buf[BACKLINK_OFFSET, 4]
	raise 'blah 1' unless desc.desc_id == linku1

	if buf[LEVEL_INDICATOR_OFFSET] == 0
		# leaf pointers
		raise "have too many active items in index (#{item_count})" if item_count > Desc::COUNT_MAX
		# split the data into item_count desc objects
		buf[0, Desc::SIZE * item_count].scan(/.{#{Desc::SIZE}}/mo).each_with_index do |data, i|
			desc = Desc.new data
			# first entry
			raise 'blah 3' if i == 0 and start_val != 0 and desc.desc_id != start_val
			# this shouldn't really happen i'd imagine
			break if desc.desc_id == 0
			@desc << desc
		end
	else
		# node pointers
		raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX
		# split the data into item_count table pointers
		buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
			table = TablePtr.new data
			# for the first value, we expect the start to be equal note that ids -1, so even for the
			# first we expect it to be equal. thats the 0x21 (dec 33) desc record. this means we assert
			# that the first desc record is always 33...
			raise 'blah 3' if i == 0 and start_val != -1 and table.start != start_val
			# this shouldn't really happen i'd imagine
			break if table.start == 0
			load_desc_rec table.offset, table.u1, table.start
		end
	end
end

#load_idx ⇒ `Object`

corresponds to

_pst_build_id_ptr

# File 'lib/mapi/pst.rb', line 588

def load_idx
	@idx = []
	@idx_offsets = []
	if header.version_2003?
		@idx = Index64.load_chain io, header
		@idx.each { |idx| idx.pst = self }
	else
		load_idx_rec header.index1, header.index1_count, 0
	end

	# we'll typically be accessing by id, so create a hash as a lookup cache
	@idx_from_id = {}
	@idx.each do |idx|
		warn "there are duplicate idx records with id #{idx.id}" if @idx_from_id[idx.id]
		@idx_from_id[idx.id] = idx
	end
end

#load_idx2(idx) ⇒ `Object`

# File 'lib/mapi/pst.rb', line 856

def load_idx2 idx
	if header.version_2003?
		id2 = ID2Assoc64.load_chain idx
	else
		id2 = load_idx2_rec idx
	end
	ID2Mapping.new self, id2
end

#load_idx2_rec(idx) ⇒ `Object`

corresponds to

_pst_build_id2

# File 'lib/mapi/pst.rb', line 867

def load_idx2_rec idx
	# i should perhaps use a idx chain style read here?
	buf = pst_read_block_size idx.offset, idx.size, false
	type, count = buf.unpack 'v2'
	unless type == 0x0002
		raise 'unknown id2 type 0x%04x' % type
		#return
	end
	id2 = []
	count.times do |i|
		assoc = ID2Assoc.new buf[4 + ID2Assoc::SIZE * i, ID2Assoc::SIZE]
		id2 << assoc
		if assoc.table2 != 0
			id2 += load_idx2_rec idx_from_id(assoc.table2)
		end
	end
	id2
end

#load_idx_rec(offset, linku1, start_val) ⇒ `Object`

load the flat idx table, which maps ids to file ranges. this is the recursive helper

corresponds to

_pst_build_id_ptr

# File 'lib/mapi/pst.rb', line 610

def load_idx_rec offset, linku1, start_val
	@idx_offsets << offset

	#_pst_read_block_size(pf, offset, BLOCK_SIZE, &buf, 0, 0) < BLOCK_SIZE)
	buf = pst_read_block_size offset, Index::BLOCK_SIZE, false

	item_count = buf[ITEM_COUNT_OFFSET]
	raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX

	idx = Index.new buf[BACKLINK_OFFSET, Index::SIZE]
	raise 'blah 1' unless idx.id == linku1

	if buf[LEVEL_INDICATOR_OFFSET] == 0
		# leaf pointers
		# split the data into item_count index objects
		buf[0, Index::SIZE * item_count].scan(/.{#{Index::SIZE}}/mo).each_with_index do |data, i|
			idx = Index.new data
			# first entry
			raise 'blah 3' if i == 0 and start_val != 0 and idx.id != start_val
			idx.pst = self
			# this shouldn't really happen i'd imagine
			break if idx.id == 0
			@idx << idx
		end
	else
		# node pointers
		# split the data into item_count table pointers
		buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
			table = TablePtr.new data
			# for the first value, we expect the start to be equal
			raise 'blah 3' if i == 0 and start_val != 0 and table.start != start_val
			# this shouldn't really happen i'd imagine
			break if table.start == 0
			load_idx_rec table.offset, table.u1, table.start
		end
	end
end
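
The listing reads single bytes with `buf[ITEM_COUNT_OFFSET]`. On Ruby 1.9 and later that expression returns a one-character string, not an integer, so the listing presumably targets Ruby 1.8. A minimal sketch of the same reads with `String#getbyte` follows; the 512 byte block size is an assumption made here for illustration, and `item_count` and `leaf?` are hypothetical helpers.

```ruby
ITEM_COUNT_OFFSET      = 0x1f0
LEVEL_INDICATOR_OFFSET = 0x1f3
BLOCK_SIZE             = 512  # assumed block size, for illustration only

# Number of active items recorded in the block.
def item_count(block)
  block.getbyte(ITEM_COUNT_OFFSET)
end

# A level indicator of 0 marks a leaf block; anything else is a node block.
def leaf?(block)
  block.getbyte(LEVEL_INDICATOR_OFFSET) == 0
end

# Build a made-up, otherwise empty block with 5 items at leaf level.
block = ("\0" * BLOCK_SIZE).b
block.setbyte(ITEM_COUNT_OFFSET, 5)

puts item_count(block)
puts leaf?(block)
```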

#load_xattrib ⇒ `Object`

corresponds to

pst_load_extended_attributes

# File 'lib/mapi/pst.rb', line 754

def load_xattrib
	unless desc = desc_from_id(0x61)
		warn "no extended attributes desc record found"
		return
	end
	unless desc.desc
		warn "no desc idx for extended attributes"
		return
	end
	if desc.list_index
	end
	#warn "skipping loading xattribs"
	# FIXME implement loading xattribs
end

#name ⇒ `Object`



# File 'lib/mapi/pst.rb', line 1797

def name
	@name ||= root_item.props.display_name
end

#pst_parse_item(desc) ⇒ `Object`

corresponds to

_pst_parse_item



# File 'lib/mapi/pst.rb', line 1680

def pst_parse_item desc
	Item.new desc, RawPropertyStore.new(desc).to_a
end

#pst_read_block_size(offset, size, decrypt = true) ⇒ `Object`

corresponds to:

_pst_read_block_size
_pst_read_block ??
_pst_ff_getIDblock_dec ??
_pst_ff_getIDblock ??

# File 'lib/mapi/pst.rb', line 774

def pst_read_block_size offset, size, decrypt=true
	io.seek offset
	buf = io.read size
	warn "tried to read #{size} bytes but only got #{buf.length}" if buf.length != size
	encrypted? && decrypt ? CompressibleEncryption.decrypt(buf) : buf
end
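
The seek-then-read pattern can be exercised against a StringIO. The sketch below leaves out decryption; `read_block` is a hypothetical helper. Unlike the listing, it also guards against `IO#read` returning nil at end of file, which would otherwise make the `buf.length` check raise.

```ruby
require 'stringio'

# Hypothetical helper: read size bytes at offset and report whether the
# read came up short.
def read_block(io, offset, size)
  io.seek offset
  buf = io.read(size) || ''  # IO#read returns nil at end of file
  [buf, buf.length != size]
end

io = StringIO.new('0123456789')

buf, short = read_block(io, 2, 4)
puts "#{buf.inspect} short=#{short}"

buf, short = read_block(io, 8, 4)
puts "#{buf.inspect} short=#{short}"
```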

#root ⇒ `Object`



# File 'lib/mapi/pst.rb', line 1784

def root
	root_item
end

#root_desc ⇒ `Object`



# File 'lib/mapi/pst.rb', line 1774

def root_desc
	@desc.first
end

#root_item ⇒ `Object`

# File 'lib/mapi/pst.rb', line 1778

def root_item
	item = pst_parse_item root_desc
	item.type = :root
	item
end

#warn(s) ⇒ `Object`

forwards to Mapi::Log.warn until logging is properly fixed.



# File 'lib/mapi/pst.rb', line 286

def warn s
	Mapi::Log.warn s
end
