Class: Mapi::Pst::BlockParser

Inherits:
Object
Includes:
Types::Constants
Defined in:
lib/mapi/pst.rb

Overview

The job of this class is to take a desc record and enumerate the MAPI properties of the item associated with it.

Corresponds to the following libpst functions:

  • _pst_parse_block

  • _pst_process (in some ways, although perhaps that is closer to Item::Properties#add_property)

Constant Summary

TYPES =
{
	0xbcec => 1,
	0x7cec => 2,
	# type 3 is removed. an artifact of not handling the indirect blocks properly in libpst.
}
PR_SUBJECT =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_SUBJECT' }.first.hex
PR_BODY_HTML =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_BODY_HTML' }.first.hex
IMMEDIATE_TYPES =

These lists are very incomplete. I think they are largely copied from libpst.

[
	PT_SHORT, PT_LONG, PT_BOOLEAN
]
INDIRECT_TYPES =
[
	PT_DOUBLE, PT_OBJECT,
	0x0014, # what's this? probably something like PT_LONGLONG, given the correspondence with the
					# ole variant types. (= VT_I8)
	PT_STRING8, PT_UNICODE, # unicode isn't in libpst, but added here for outlook 2003 down the track
	PT_SYSTIME,
	0x0048, # another unknown (in MAPI, 0x0048 is PT_CLSID)
	0x0102, # this is PT_BINARY vs PT_CLSID
	#0x1003, # these are vector types, but they're commented out for now because i'd expect that
	#0x1014, # there's extra decoding needed that i'm not doing. (probably just need a simple
	#        # PT_* => unpack string mapping for the immediate types, and just do unpack('V*') etc)
	#0x101e,
	#0x1102
]
ID2_ATTACHMENTS =

The attachment and recipient arrays appear to always be stored with these fixed id2 values, which seems strange. Are there other extra streams? One way to find out would be to make a higher-level IO wrapper that carries the id2 value, then diff the available id2 values against the id2 values actually used in the properties of an item.

0x671
ID2_RECIPIENTS =
0x692

Constructor Details

#initialize(desc) ⇒ BlockParser

Returns a new instance of BlockParser.

Raises:

  • (FormatError)

# File 'lib/mapi/pst.rb', line 991

def initialize desc
	raise FormatError, "unable to get associated index record for #{desc.inspect}" unless desc.desc
	@desc = desc
	#@data = desc.desc.read
	if Pst::Index === desc.desc
		#@data = RangesIOIdxChain.new(desc.pst, desc.desc).read
		idxs = desc.pst.id2_block_idx_chain desc.desc
		# this gets me the plain index chain.
	else
		# fake desc
		#@data = desc.desc.read
		idxs = [desc.desc]
	end

	@data_chunks = idxs.map { |idx| idx.read }
	@data = @data_chunks.first

	load_header

	@index_offsets = [@index_offset] + @data_chunks[1..-1].map { |chunk| chunk.unpack('v')[0] }
	@offset_tables = []
	@ignored = []
	@data_chunks.zip(@index_offsets).each do |chunk, offset|
		ignore = chunk[offset, 2].unpack('v')[0]
		@ignored << ignore
#				p ignore
		@offset_tables.push offset_table = []
		# maybe it's ok for there not to be any values?
		raise FormatError if offset == 0
		offsets = chunk[offset + 2..-1].unpack('v*')
		#p offsets
		offsets[0, ignore + 2].each_cons 2 do |from, to|
			#next if to == 0
			raise FormatError, [from, to].inspect if from > to
			offset_table << [from, to]
		end
	end

	@offset_table = @offset_tables.first
	@idxs = idxs

	# now, we may have multiple different blocks
end
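
The offset-table loop in the constructor can be illustrated on a synthetic chunk. This is a minimal sketch, not the library's code: `build_offset_table` is a hypothetical helper, the chunk bytes are made up for the example, and `ArgumentError` stands in for the library's `FormatError`. It only mirrors what the listing does: read a 16-bit count at the index offset, unpack the 16-bit values that follow, and pair up the first `count + 2` of them as `[from, to]` ranges.

```ruby
# Hypothetical helper, not part of Mapi::Pst: mirrors the offset-table loop
# in #initialize for a single chunk.
def build_offset_table(chunk, index_offset)
  raise ArgumentError, 'index offset must be non-zero' if index_offset == 0

  # 16-bit little-endian count stored at the index offset
  count = chunk[index_offset, 2].unpack('v')[0]
  # the 16-bit values that follow; the first count + 2 are used as boundaries
  boundaries = chunk[(index_offset + 2)..-1].unpack('v*')[0, count + 2]

  boundaries.each_cons(2).map do |from, to|
    raise ArgumentError, [from, to].inspect if from > to
    [from, to]
  end
end

# Synthetic chunk (an assumption for illustration, not real PST data):
# 4 filler bytes, two data items, then the table itself.
data  = ("\0" * 4 + 'hello' + 'world!').b
chunk = data + [2, 0, 4, 9, 15].pack('v*')

table = build_offset_table(chunk, data.length)
from, to = table[1]
puts chunk[from...to]
```

Each pair in the resulting table is later used as a byte range into the same chunk, which is how get_data_indirect_io slices out a value.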

Instance Attribute Details

#data ⇒ Object (readonly)

Returns the value of attribute data.



# File 'lib/mapi/pst.rb', line 990

def data
  @data
end

#data_chunks ⇒ Object (readonly)

Returns the value of attribute data_chunks.



# File 'lib/mapi/pst.rb', line 990

def data_chunks
  @data_chunks
end

#desc ⇒ Object (readonly)

Returns the value of attribute desc.



# File 'lib/mapi/pst.rb', line 990

def desc
  @desc
end

#offset_tables ⇒ Object (readonly)

Returns the value of attribute offset_tables.



# File 'lib/mapi/pst.rb', line 990

def offset_tables
  @offset_tables
end

Instance Method Details

#get_data_indirect(offset) ⇒ Object

Based on the value of offset, returns either some data from buf (the block data) or some data from the id2 chain, where offset is a key into a lookup table that is stored as the id2 chain. (I think I may need to create a BlockParser class that wraps up all this mess.)

Corresponds to:

  • _pst_getBlockOffsetPointer

  • _pst_getBlockOffset



# File 'lib/mapi/pst.rb', line 1057

def get_data_indirect offset
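	# note: get_data_indirect_io returns nil when offset is 0, so this line raises
	# NoMethodError in that case.
	return get_data_indirect_io(offset).read

	# the rest of this method is unreachable because of the early return above.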
	if offset == 0
		nil
	elsif (offset & 0xf) == 0xf
		RangesIOID2.new(desc.pst, offset, idx2).read
	else
		low, high = offset & 0xf, offset >> 4
		raise FormatError if low != 0 or (high & 0x1) != 0 or (high / 2) > @offset_table.length
		from, to = @offset_table[high / 2]
		data[from...to]
	end
end

#get_data_indirect_io(offset) ⇒ Object



# File 'lib/mapi/pst.rb', line 1072

def get_data_indirect_io offset
	if offset == 0
		nil
	elsif (offset & 0xf) == 0xf
		if idx2[offset]
			RangesIOID2.new desc.pst, offset, idx2
		else
			warn "tried to get idx2 record for #{offset} but failed"
			return StringIO.new('')
		end
	else
		low, high = offset & 0xf, offset >> 4
		if low != 0 or (high & 0x1) != 0
#				raise FormatError, 
			warn "bad - #{low} #{high} (1)" 
			return StringIO.new('')
		end
		# lets see which block it should come from.
		block_idx, i = high.divmod 4096
		unless block_idx < @data_chunks.length
			warn "bad - block_idx too high (not #{block_idx} < #{@data_chunks.length})"
			return StringIO.new('')
		end
		data_chunk, offset_table = @data_chunks[block_idx], @offset_tables[block_idx]
		if i / 2 >= offset_table.length
			warn "bad - #{low} #{high} - #{i / 2} >= #{offset_table.length} (2)"
			return StringIO.new('')
		end
		#warn "ok  - #{low} #{high} #{offset_table.length}"
		from, to = offset_table[i / 2]
		StringIO.new data_chunk[from...to]
	end
end
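
The way get_data_indirect_io interprets an offset value can be summarised with a small classifier. This is a sketch under stated assumptions: `classify_reference` and its symbol results (`:none`, `:id2`, `:invalid`, `:local`) are hypothetical names introduced here, and the sketch stops short of the bounds checks against `@data_chunks` and `@offset_tables` that the real method performs.

```ruby
# Hypothetical helper, not part of Mapi::Pst: classifies an offset value the
# same way the branches of get_data_indirect_io do.
def classify_reference(offset)
  return [:none] if offset == 0
  # low nibble 0xf: the value is an id2 key, resolved through idx2
  return [:id2, offset] if (offset & 0xf) == 0xf

  low, high = offset & 0xf, offset >> 4
  # any other non-zero low nibble, or an odd high part, is rejected
  return [:invalid] if low != 0 || (high & 0x1) != 0

  # high selects a chunk (in units of 4096) and an entry within it
  block_idx, i = high.divmod(4096)
  [:local, block_idx, i / 2]
end

p classify_reference(0x0)      # [:none]
p classify_reference(0x3f)     # [:id2, 63]
p classify_reference(0x20)     # [:local, 0, 1]
p classify_reference(0x10040)  # [:local, 1, 2]
p classify_reference(0x30)     # [:invalid]
```

For a `:local` result, the second element picks the chunk and the third picks the `[from, to]` pair in that chunk's offset table.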

#handle_indirect_values(key, type, value) ⇒ Object



# File 'lib/mapi/pst.rb', line 1106

def handle_indirect_values key, type, value
	case type
	when PT_BOOLEAN
		value = value != 0
	when *IMMEDIATE_TYPES # not including PT_BOOLEAN which we just did above
		# no processing currently applied (needed?).
	when *INDIRECT_TYPES
		# the value is a pointer
		if String === value # ie, value size > 4 above
			value = StringIO.new value
		else
			value = get_data_indirect_io(value)
		end
		# keep strings as immediate values for now, for compatibility with how i set up
		# Msg::Properties::ENCODINGS
		if value
			if type == PT_STRING8
				value = value.read
			elsif type == PT_UNICODE
				value = Ole::Types::FROM_UTF16.iconv value.read
			end
		end
		# special body_html handling
		if key == PR_BODY_HTML and value
			# to keep the msg code happy, which thinks body_html will be an io
			# although, in 2003 version, they are 0102 already
			value = StringIO.new value unless value.respond_to?(:read)
		end
		# special subject handling
		if key == PR_SUBJECT and value
			ignore, offset = value.unpack 'C2'
			offset = (offset == 1 ? nil : offset - 3)
			value = value[2..-1]
=begin
			index = value =~ /^[A-Z]*:/ ? $~[0].length - 1 : nil
			unless ignore == 1 and offset == index
				warn 'something wrong with subject hack' 
				$x = [ignore, offset, value]
				require 'irb'
				IRB.start
				exit
			end
=end
=begin
new idea:

making sense of the \001\00[156] i've seen prefixing subject. i think its to do with the placement
of the ':', or the ' '. And perhaps an optimization to do with thread topic, and ignoring the prefixes
added by mailers. thread topic is equal to subject with all that crap removed.

can test by creating some mails with bizarre subjects.

subject="\001\005RE: blah blah"
subject="\001\001blah blah"
subject="\001\032Out of Office AutoReply: blah blah"
subject="\001\020Undeliverable: blah blah"

looks like it

=end

			# now what i think, is that perhaps, value[offset..-1] ...
			# or something like that should be stored as a special tag. ie, do a double yield
			# for this case. probably PR_CONVERSATION_TOPIC, in which case i'd write instead:
			# yield [PR_SUBJECT, ref_type, value]
			# yield [PR_CONVERSATION_TOPIC, ref_type, value[offset..-1]]
			# next # to skip the yield.
		end

		# special handling for embedded objects
		# used for attach_data for attached messages. in which case attach_method should == 5,
		# for embedded object.
		if type == PT_OBJECT and value
			value = value.read if value.respond_to?(:read)
			id2, unknown = value.unpack 'V2'
			io = RangesIOID2.new desc.pst, id2, idx2

			# hacky
			desc2 = OpenStruct.new(:desc => io, :pst => desc.pst, :list_index => desc.list_index, :children => [])
			# put nil instead of desc.list_index, otherwise the attachment is attached to itself ad infinitum.
			# should try and fix that FIXME
			# this shouldn't be done always. for an attached message, yes, but for an attached
			# meta file, for example, it shouldn't. difference between embedded_ole vs embedded_msg
			# really.
			# note that in the case where its a embedded ole, you actually get a regular serialized ole
			# object, so i need to create an ole storage object on a rangesioidxchain!
			# eg:
=begin
att.props.display_name # => "Picture (Metafile)"
io = att.props.attach_data
io.read(32).unpack('H*') # => ["d0cf11e0a1b11ae100000.... note the docfile signature.
# plug some missing rangesio holes:
def io.rewind; seek 0; end
def io.flush; raise IOError; end
ole = Ole::Storage.open io
puts ole.root.to_tree

- #<Dirent:"Root Entry">
|- #<Dirent:"\001Ole" size=20 data="\001\000\000\002\000...">
|- #<Dirent:"CONTENTS" size=65696 data="\327\315\306\232\000...">
\- #<Dirent:"\003MailStream" size=12 data="\001\000\000\000[...">
=end
			# until properly fixed, i have disabled this code here, so this will break
			# nested messages temporarily.
			#value = Item.new desc2, RawPropertyStore.new(desc2).to_a
			#desc2.list_index = nil
			value = io
		end
	# this is PT_MV_STRING8, i guess.
	# should probably have the 0x1000 flag, and do the or-ring.
	# example of 0x1102 is PR_OUTLOOK_2003_ENTRYIDS. less sure about that one.
	when 0x101e, 0x1102 
		# example data:
		# 0x802b "\003\000\000\000\020\000\000\000\030\000\000\000#\000\000\000BusinessCompetitionFavorites"
		# this 0x802b would be an extended attribute for categories / keywords.
		value = get_data_indirect_io(value).read unless String === value
		num = value.unpack('V')[0]
		offsets = value[4, 4 * num].unpack("V#{num}")
		value = (offsets + [value.length]).to_enum(:each_cons, 2).map { |from, to| value[from...to] }
		value.map! { |str| StringIO.new str } if type == 0x1102
	else
		name = Mapi::Types::DATA[type].first rescue nil
		warn '0x%04x %p' % [key, get_data_indirect_io(value).read]
		raise NotImplementedError, 'unsupported mapi property type - 0x%04x (%p)' % [type, name]
	end
	[key, type, value]
end
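
The multi-value branch (types 0x101e and 0x1102) is easiest to follow on the example data quoted in the listing. The sketch below is a standalone illustration; `split_multi_value` is a hypothetical helper name, and the input is rebuilt from the example bytes in the source comment rather than read from a PST file.

```ruby
# Hypothetical helper, not part of Mapi::Pst: splits a multi-value blob the
# way the `when 0x101e, 0x1102` branch does.
def split_multi_value(value)
  # 32-bit little-endian element count
  num = value.unpack('V')[0]
  # followed by one 32-bit start offset per element
  offsets = value[4, 4 * num].unpack("V#{num}")
  # each element runs from its offset to the next one (or to the end)
  (offsets + [value.length]).each_cons(2).map { |from, to| value[from...to] }
end

# The example from the listing: 3 strings starting at offsets 16, 24 and 35.
raw = [3, 16, 24, 35].pack('V4') + 'BusinessCompetitionFavorites'
p split_multi_value(raw)  # ["Business", "Competition", "Favorites"]
```

For type 0x1102 the listing additionally wraps each element in a StringIO; that step is left out here.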

#idx2 ⇒ Object

A given desc record may or may not have associated idx2 data. It is loaded lazily here, so it is never requested unless get_data_indirect actually needs to use it.

Raises:

  • (FormatError)

# File 'lib/mapi/pst.rb', line 1037

def idx2
	return @idx2 if @idx2
	raise FormatError, 'idx2 requested but no idx2 available' unless desc.list_index
	# should check this can't return nil
	@idx2 = desc.pst.load_idx2 desc.list_index
end
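
The lazy loading in idx2 is a plain memoisation pattern. The following sketch shows it in isolation; `LazyIdx2` and its loader block are hypothetical stand-ins for the parser and for `desc.pst.load_idx2`, and `ArgumentError` replaces the library's `FormatError`.

```ruby
# Hypothetical stand-in for the parser: the loader block plays the role of
# desc.pst.load_idx2 and is only called on first use.
class LazyIdx2
  def initialize(list_index, &loader)
    @list_index = list_index
    @loader = loader
  end

  def idx2
    return @idx2 if @idx2
    raise ArgumentError, 'idx2 requested but no idx2 available' unless @list_index
    # as in the listing, a nil result would not be cached
    @idx2 = @loader.call(@list_index)
  end
end

calls = 0
lazy = LazyIdx2.new(7) { |index| calls += 1; { index => :loaded } }
lazy.idx2
lazy.idx2
puts calls  # the loader ran once
```

An object built without a list index only fails when idx2 is actually requested, which matches the behaviour described for desc records that have no idx2 data.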

#load_header ⇒ Object

Raises:

  • (FormatError)

# File 'lib/mapi/pst.rb', line 1044

def load_header
	@index_offset, type, @offset1 = data.unpack 'vvV'
	raise FormatError, 'unknown block type signature 0x%04x' % type unless TYPES[type]
	@type = TYPES[type]
end
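
As a rough illustration of load_header, the sketch below decodes the same three fields from a hand-built header. `parse_block_header` and `BLOCK_TYPES` are hypothetical names (the latter copies the two entries of TYPES), the header bytes are invented for the example, and `ArgumentError` again stands in for `FormatError`.

```ruby
# Copy of the two signatures listed in TYPES, under a hypothetical name.
BLOCK_TYPES = { 0xbcec => 1, 0x7cec => 2 }.freeze

# Hypothetical helper, not part of Mapi::Pst: decodes the fields that
# load_header reads from the start of the first data chunk.
def parse_block_header(data)
  # two 16-bit little-endian values followed by one 32-bit value
  index_offset, signature, offset1 = data.unpack('vvV')
  type = BLOCK_TYPES[signature]
  raise ArgumentError, format('unknown block type signature 0x%04x', signature) unless type

  { index_offset: index_offset, type: type, offset1: offset1 }
end

# Invented header bytes for illustration only.
header = parse_block_header([0x0020, 0xbcec, 0x00000040].pack('vvV'))
puts header[:type]          # 1, because the signature is 0xbcec
puts header[:index_offset]  # 32
```

The index offset decoded here is the position at which the constructor starts reading the offset table of the first chunk.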